move Batch deserialization out of the lock #17050
Conversation
Mechanically, this makes sense, and I see no problem with it. I can't think of any way that deferring the deserialization until outside of the lock would cause problems.
I would prefer for the intermediate object to be an internal detail of the Queue class, and to have a name that more accurately describes what it is.
I have opened jsvd#9 to show what I mean by that.
Currently the deserialization is behind `readBatch`'s lock, so any large batch will take time deserializing, causing any other Queue writer (e.g. netty executor threads) and any other Queue reader (pipeline worker) to block. This commit moves the deserialization out of the lock, allowing multiple pipeline workers to deserialize batches concurrently.

- add intermediate batch-holder from `Queue` methods
- make the intermediate batch-holder a private inner class of `Queue` with a descriptive name, `SerializedBatchHolder`

Co-authored-by: Ry Biesemeyer <[email protected]>
💚 Build Succeeded
LGTM!
Thanks @mashhurs for the review and Joao for the work :)
@logstashmachine backport 8.18

@logstashmachine backport 8.17

@logstashmachine backport 8.16
@logstashmachine backport 9.0
@logstashmachine backport 8.x
…gify

* upstream/main: (27 commits)
  - Add Windows 2025 to CI (elastic#17133)
  - Update container acceptance tests with stdout/stderr changes (elastic#17138)
  - entrypoint: avoid polluting stdout (elastic#17125)
  - Fix acceptance test assertions for updated plugin remove (elastic#17126)
  - Fix acceptance test assertions for updated plugin `remove` (elastic#17122)
  - plugins: improve `remove` command to support multiple plugins (elastic#17030)
  - spec: improve ls2ls spec (elastic#17114)
  - allow concurrent Batch deserialization (elastic#17050)
  - CPM handle 404 response gracefully with user-friendly log (elastic#17052)
  - qa: use clean expansion of LS tarball per fixture instance (elastic#17082)
  - Allow capturing heap dumps in DRA BK jobs (elastic#17081)
  - Use centralized source of truth for active branches (elastic#17063)
  - Update logstash_releases.json (elastic#17055)
  - fix logstash-keystore to accept spaces in values when added via stdin (elastic#17039)
  - Don't honor VERSION_QUALIFIER if set but empty (elastic#17032)
  - Release note placeholder might be empty, making parsing lines nil tolerant. (elastic#17026)
  - Fix BufferedTokenizer to properly resume after a buffer full condition respecting the encoding of the input string (elastic#16968)
  - Add short living 9.0 next and update main in CI release version definition. (elastic#17008)
  - Core version bump to 9.1.0 (elastic#16991)
  - Add 9.0 branch to the CI branches definition (elastic#16997)
  - ...





Release Notes
Discussion
Currently the deserialization is behind the readBatch's lock, so any large batch will take time deserializing, causing any other Queue writer (e.g. netty executor threads) and any other Queue reader (pipeline worker) to block.
This commit moves the deserialization out of the lock, allowing multiple pipeline workers to deserialize batches concurrently.
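The lock-narrowing pattern described above can be sketched as follows. This is a simplified, illustrative stand-in, not the actual Logstash PQ code: `SimpleQueue` and its method bodies are hypothetical, and only the inner-holder naming (`SerializedBatchHolder`) mirrors this PR. Under the lock, the reader only takes ownership of the raw serialized bytes; the expensive decoding happens after the lock is released, so multiple readers can decode at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of moving deserialization out of the critical section.
public class SimpleQueue {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<byte[]> elements = new ArrayList<>();

    // Private inner holder: carries raw bytes out of the critical section.
    private static final class SerializedBatchHolder {
        private final List<byte[]> serialized;

        SerializedBatchHolder(List<byte[]> serialized) {
            this.serialized = serialized;
        }

        // Runs OUTSIDE the lock, so many readers can decode concurrently.
        List<String> deserialize() {
            List<String> events = new ArrayList<>();
            for (byte[] raw : serialized) {
                events.add(new String(raw)); // stand-in for real event decoding
            }
            return events;
        }
    }

    public void write(byte[] event) {
        lock.lock();
        try {
            elements.add(event);
        } finally {
            lock.unlock();
        }
    }

    public List<String> readBatch(int limit) {
        SerializedBatchHolder holder;
        lock.lock();
        try {
            // Under the lock: only a cheap hand-off of raw bytes, no decoding.
            int n = Math.min(limit, elements.size());
            holder = new SerializedBatchHolder(new ArrayList<>(elements.subList(0, n)));
            elements.subList(0, n).clear();
        } finally {
            lock.unlock();
        }
        // Expensive deserialization happens after the lock is released.
        return holder.deserialize();
    }
}
```

The holder being a private inner class keeps the "serialized but not yet decoded" state an implementation detail of the queue, so callers of `readBatch` see no API change.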
To test, edit the generator input to produce 100 KB messages, and run the following with PQ enabled:

bin/logstash -e "input { generator { threads => 5 } } output { null {} }"

This PR shows about a 4-5x performance improvement for a 10-worker pipeline.
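The mechanism behind the speedup can be shown with a toy demonstration (illustrative only; this is not Logstash code, and the thread and event counts are arbitrary): several worker threads drain a queue of serialized events, the lock guards only the hand-off of raw bytes, and the decode step runs outside it, so workers decode in parallel instead of queueing behind one lock.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Toy demonstration of concurrent decoding by multiple queue readers.
public class ParallelDecodeDemo {

    static int drainAll(int events, int workerCount) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        Deque<byte[]> queue = new ArrayDeque<>();
        for (int i = 0; i < events; i++) {
            queue.add(("event-" + i).getBytes());
        }

        AtomicInteger decoded = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        for (int w = 0; w < workerCount; w++) {
            workers.submit(() -> {
                while (true) {
                    byte[] raw;
                    lock.lock();
                    try {
                        raw = queue.poll(); // cheap hand-off under the lock
                    } finally {
                        lock.unlock();
                    }
                    if (raw == null) {
                        break; // queue drained
                    }
                    new String(raw);        // "decode" happens outside the lock
                    decoded.incrementAndGet();
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        return decoded.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(drainAll(1000, 4)); // prints 1000
    }
}
```

With the decode step inside the critical section instead, the four workers would make progress one at a time, which is the blocking behavior this PR removes.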