BATCH-2434 Improve TransactionAwareBufferedWriter #387

marschall · 2015-09-13T17:27:26Z

TransactionAwareBufferedWriter offers several optimization
opportunities. First, it buffers at the char level. Buffering at the byte
level instead saves about 50% memory usage in common cases. Second, it
does not override any of the #write(String) methods, leading to
unnecessary intermediate copies.

* buffer in a ByteArrayOutputStream instead of StringBuilder
* override #write(String) methods to avoid copies

Together these two changes should help to reduce both live set size and
allocation rate.

Issue: BATCH-2434
https://jira.spring.io/browse/BATCH-2434
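
To illustrate the second point, here is a minimal sketch of a buffering `Writer` that overrides `write(String, int, int)`. The class `BufferingWriterSketch` and its `buffered()` helper are hypothetical names for illustration only; this is not the Spring Batch implementation. The base `java.io.Writer#write(String, int, int)` copies the requested range into an intermediate `char[]` before delegating to `write(char[], int, int)`, and the override below skips that step by appending the range directly.

```java
import java.io.Writer;

// Hypothetical stand-in for a buffering writer; not the Spring Batch class.
class BufferingWriterSketch extends Writer {

    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void write(char[] cbuf, int off, int len) {
        // Append the requested range directly.
        buffer.append(cbuf, off, len);
    }

    // Without this override, Writer#write(String, int, int) would first copy
    // the range into an intermediate char[] and then call the method above.
    @Override
    public void write(String str, int off, int len) {
        // StringBuilder#append(CharSequence, int, int) takes an exclusive end index.
        buffer.append(str, off, off + len);
    }

    @Override
    public void flush() {
        // Nothing to flush in this sketch.
    }

    @Override
    public void close() {
        // Nothing to release in this sketch.
    }

    // Illustration-only accessor for the buffered content.
    String buffered() {
        return buffer.toString();
    }
}
```

Because `Writer#write(String)` delegates to `write(String, int, int)`, overriding the three-argument method covers both entry points in this sketch.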

marschall · 2015-09-13T17:46:17Z

I have signed and agree to the terms of the SpringSource Individual Contributor License Agreement.

fmbenhassine · 2019-07-12T11:12:47Z

Hi @marschall ,

Thank you for this PR!

> Buffering at the byte level instead saves about 50% memory usage in common cases

How did you measure this? I would like to be able to measure the improvement of this PR before applying any change.

marschall · 2019-07-13T18:26:03Z

> Hi @marschall ,
>
> Thank you for this PR!
>
> > Buffering at the byte level instead saves about 50% memory usage in common cases
>
> How did you measure this? I would like to be able to measure the improvement of this PR before applying any change.

TransactionAwareBufferedWriter uses a StringBuilder to buffer. StringBuilder uses a char[] to buffer. A char is 16 bits, so we end up using 16 bits for every character.
With the change, TransactionAwareBufferedWriter uses a ByteArrayOutputStream to buffer. ByteArrayOutputStream uses a byte[] to buffer. When a single-byte encoding like ASCII or Latin-1 is used, we end up using only 8 bits for every character. Even with UTF-8, most characters in typical, mostly ASCII data end up using only 8 bits.

This changes a bit with Java 9+ where StringBuilder uses a byte[] with a flexible encoding. When the whole string is Latin-1 then only a single byte is used for storing a character, otherwise two bytes are used as before.

Taking a heap dump while a transaction is active and looking at the retained heap of a TransactionAwareBufferedWriter should reveal the difference on Java 8. On Java 9+, the buffer has to contain at least one non-Latin-1 character such as € for a difference to show; otherwise the two should be equal.
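
The arithmetic behind this reasoning can be sketched as follows. The class and method names are hypothetical, and the numbers only cover the payload of the buffer (not object headers or spare capacity), so this is a rough estimate rather than a measurement of retained heap.

```java
import java.nio.charset.Charset;

// Hypothetical helper for back-of-the-envelope buffer sizes.
class BufferFootprintSketch {

    // Payload size when every character is stored as a 16-bit char,
    // as in a char[]-backed StringBuilder (Java 8).
    static int charBufferBytes(String text) {
        return text.length() * Character.BYTES;
    }

    // Payload size when the text is stored already encoded,
    // as in a ByteArrayOutputStream filled through an encoder.
    static int encodedBytes(String text, Charset charset) {
        return text.getBytes(charset).length;
    }
}
```

For pure ASCII text such as `"id,name,amount"` the char-based payload is 28 bytes and the UTF-8 payload is 14 bytes. A character like € takes 3 bytes in UTF-8 but 2 bytes as a char, so the saving depends on the data.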

fmbenhassine · 2019-07-15T10:32:05Z

Thank you for your answer. I was aware of the theory and just wanted to know how you measured the 50% improvement in memory usage in practice.

I see the PR is opened against the 3.0.x branch. Can you please rebase it on the latest master?

I will add some inline comments about the changes.

.../main/java/org/springframework/batch/support/transaction/TransactionAwareBufferedWriter.java

...tch-infrastructure/src/main/java/org/springframework/batch/item/file/FlatFileItemWriter.java

.../main/java/org/springframework/batch/support/transaction/TransactionAwareBufferedWriter.java

fmbenhassine · 2019-09-05T14:42:30Z

.../main/java/org/springframework/batch/support/transaction/TransactionAwareBufferedWriter.java

+			this.writer = new OutputStreamWriter(outputStream, encoding);
+		}
+
+		int length() throws IOException {


This method is used in org.springframework.batch.support.transaction.TransactionAwareBufferedWriter#getBufferSize. According to its Javadoc, the method should return the size of unflushed buffered data. However, the length method flushes the writer before returning the size of the outputStream. Do we need to flush the writer to get the length of the buffer? A get method should be side-effect free IMO. I tried to remove the writer.flush call, but this makes several tests fail. Wdyt?

marschall · 2019-10-12

The issue is that OutputStreamWriter has an internal buffer in sun.nio.cs.StreamEncoder. If we want to know how many bytes we will write, this is the number of bytes in the ByteArrayOutputStream plus the number of bytes in the OutputStreamWriter / sun.nio.cs.StreamEncoder. There is no way of accessing the number of bytes in the internal buffer of OutputStreamWriter / sun.nio.cs.StreamEncoder. The only way to be sure is to flush those bytes to the ByteArrayOutputStream.
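
The effect described here can be observed with a small, self-contained sketch. The class `EncoderBufferSketch` is a hypothetical name, and the exact number of bytes held back before the flush is an implementation detail of the JDK, so the sketch only reports both sizes rather than assuming a specific value before the flush.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class; not part of Spring Batch.
class EncoderBufferSketch {

    // Returns { size before flush, size after flush } of the byte buffer
    // that sits underneath the OutputStreamWriter.
    static int[] sizesAroundFlush(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (OutputStreamWriter writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
            // Encoded bytes may still be held inside the writer at this point.
            int before = out.size();
            writer.flush();
            // After the flush, all encoded bytes have reached the byte buffer.
            int after = out.size();
            return new int[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a short string, the size before the flush is typically smaller than the size after it, which is why a length computed without flushing can under-report the buffered data.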

fmbenhassine · 2020-04-29T16:22:51Z

@marschall Thank you for all these updates! While I like the improvement suggested here and the build is passing with this PR, I still have a couple of concerns:

According to its Javadoc, TransactionAwareBufferedWriter is designed to be a wrapper for a file channel, and the fact that we are changing its constructor to accept an output stream instead of a file channel makes me nervous (even if an output stream has a file channel associated with it). I feel like we can implement the improvement without introducing this API-breaking change (which we are trying to make non-breaking by deprecating the current constructor in favor of the new one).
TransactionAwareBufferedWriter is used in all file item writers (flat, JSON and XML files), so any hidden regression related to this can have a huge impact. The reason I'm pointing this out is that while we have some pretty good unit tests for this class, I don't see any integration test covering the "file write failed + restart" scenario, where Spring Batch should truncate the file to the last known offset before starting to write items from where it left off in the previous failed run. I can take care of creating such a test suite (a test for each file type JSON/flat/XML + different encodings: single-byte and multi-byte), but I really want to make sure there are no regressions at this level before merging this PR.
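
The "file write failed + restart" scenario can be sketched with plain `java.nio`, independent of Spring Batch. The class `RestartTruncationSketch`, its method names, and the item strings are hypothetical; the sketch only shows the truncate-to-last-offset idea with a multi-byte character in the committed data, not how the framework tracks the offset.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of truncating a file to the last known offset.
class RestartTruncationSketch {

    static String writeFailAndRestart() {
        try {
            Path file = Files.createTempFile("restart-sketch", ".txt");
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
                write(channel, "item1 \u20ac\n");
                // Offset in bytes after the last successfully committed chunk.
                long lastCommittedOffset = channel.position();

                // Simulated partial write of a chunk that later fails.
                write(channel, "item2-partial");

                // On restart: drop everything after the last committed offset.
                channel.truncate(lastCommittedOffset);
                channel.position(lastCommittedOffset);
                write(channel, "item2\n");
            }
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.delete(file);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void write(FileChannel channel, String text) throws IOException {
        ByteBuffer bytes = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        while (bytes.hasRemaining()) {
            channel.write(bytes);
        }
    }
}
```

The offset is counted in bytes, not characters, which is why a test suite for this scenario would want to cover both single-byte and multi-byte encodings.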

@marschall What do you think?

I also would like to have a second opinion on that PR, @mminella What are your thoughts? Anything I could have missed or overlooked?

Thank you both.

marschall · 2020-05-02T11:27:22Z

@benas can you be a bit more specific about what changes you would like to see or which approach you would prefer? A deprecated constructor is provided for backwards compatibility. Should I change the class Javadoc as well?

mminella · 2020-05-04T17:58:24Z

.../main/java/org/springframework/batch/support/transaction/TransactionAwareBufferedWriter.java


 	private FileChannel channel;

+	private OutputStream outputStream;


We would still require a FileOutputStream given the dependency on the FileChannel, so I'd make this a FileOutputStream.

fmbenhassine · 2020-05-06T12:55:53Z

We have been discussing this with Michael and here are our thoughts. Since the PR contains two improvements, I will address them separately.

1. buffer in a ByteArrayOutputStream instead of StringBuilder

> This changes a bit with Java 9+ where StringBuilder uses a byte[] with a flexible encoding.

Indeed, this is basically JEP 254. Since v4.3 will be the last version in the v4 line (as announced last year) and since the next major version will have a Java 9+ baseline anyway, we think the deprecation + additional complexity here is not worth it for the remaining short lifetime of v4.

2. override #write(String) methods to avoid copies

This enhancement makes sense and could be merged. I tried it here and the build passes without any regression.

@marschall Could you please update the PR to keep only the second enhancement? It should be good to merge with that. Thank you upfront.

marschall · 2020-05-06T16:02:07Z

@benas yes I can do this

TransactionAwareBufferedWriter offers a number of optimization potentials. First it creates an unnecessary local, temporary char[] in

* avoid local, temporary char[] in #write(char[], int, int)
* overwrite #write(String) methods to avoid copies

Together these two changes should help to reduce allocation rate.

Issue: BATCH-2434
https://jira.spring.io/browse/BATCH-2434
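
The first item of this commit message can be illustrated by contrasting two ways of appending a range of a `char[]` to a `StringBuilder`. Both methods and the class name are hypothetical, and the copying variant is an assumption about what such code commonly looks like, not a quote of the previous Spring Batch source.

```java
// Hypothetical comparison of two ways to append a char[] range.
class CharRangeAppendSketch {

    // Copying variant: allocates a temporary char[] for the range first.
    static void appendWithCopy(StringBuilder buffer, char[] cbuf, int off, int len) {
        char[] subArray = new char[len];
        System.arraycopy(cbuf, off, subArray, 0, len);
        buffer.append(subArray);
    }

    // Direct variant: lets StringBuilder read the range from the source array.
    static void appendDirect(StringBuilder buffer, char[] cbuf, int off, int len) {
        buffer.append(cbuf, off, len);
    }
}
```

Both variants leave the same content in the buffer; the direct one avoids one short-lived array per call, which is the kind of allocation-rate reduction the commit message refers to.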

marschall · 2020-05-06T18:43:20Z

I pushed the new version.

fmbenhassine · 2020-05-07T14:38:49Z

LGTM. Rebased and merged as 10fd371. Thank you very much for your contribution!

fmbenhassine added status: ready-to-review pr-for: enhancement labels Oct 6, 2018

fmbenhassine added status: waiting-for-feedback and removed status: ready-to-review labels Jul 12, 2019

fmbenhassine self-assigned this Jul 12, 2019

fmbenhassine self-requested a review July 12, 2019 11:13

fmbenhassine requested changes Jul 15, 2019

marschall force-pushed the BATCH-2434 branch from cf3a987 to 88da3c6 Compare August 11, 2019 13:27

marschall changed the base branch from 3.0.x to master August 11, 2019 13:28

marschall force-pushed the BATCH-2434 branch 3 times, most recently from abcccb2 to 694b99e Compare August 11, 2019 13:49

fmbenhassine requested changes Sep 5, 2019

marschall force-pushed the BATCH-2434 branch from 694b99e to fb2feac Compare October 12, 2019 13:42

spring-projects-issues mentioned this pull request Dec 16, 2019

Improve TransactionAwareBufferedWriter efficiency [BATCH-2434] #1166

Closed

fmbenhassine removed the pr-status: waiting-for-feedback label Jan 13, 2020

fmbenhassine added this to the 4.3.0 milestone Apr 16, 2020

mminella reviewed May 4, 2020

fmbenhassine added the status: waiting-for-triage Issues that we did not analyse yet label May 6, 2020

fmbenhassine added status: waiting-for-reporter Issues for which we are waiting for feedback from the reporter and removed status: waiting-for-triage Issues that we did not analyse yet labels May 6, 2020

marschall force-pushed the BATCH-2434 branch from fb2feac to 58a424c Compare May 6, 2020 18:32

fmbenhassine linked an issue May 7, 2020 that may be closed by this pull request

Improve TransactionAwareBufferedWriter efficiency [BATCH-2434] #1166

Closed

fmbenhassine closed this May 7, 2020

fmbenhassine added has: backports Legacy label from JIRA. Superseded by "for: backport-to-x.x.x" and removed status: waiting-for-reporter Issues for which we are waiting for feedback from the reporter labels May 7, 2020

