Conversation

@jsvd
Member

@jsvd jsvd commented Feb 11, 2025

Release Notes

  • Improves performance of the Persistent Queue, especially in the case of large events, by moving deserialization out of the exclusive access lock.

Discussion

Currently the deserialization happens behind readBatch's lock, so any large batch takes a long time to deserialize while holding it, blocking every other Queue writer (e.g. netty executor threads) and every other Queue reader (pipeline worker).

This commit moves the deserialization out of the lock, allowing multiple pipeline workers to deserialize batches concurrently.
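As a rough illustration (a minimal sketch, not the actual Logstash `Queue` code; the method names `readSerializedElements` and `deserializeAll` are made up for the example), the change amounts to shrinking the critical section so that only the raw page reads hold the lock:

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class ReadBatchSketch {
    private final ReentrantLock lock = new ReentrantLock();

    // Before: raw reads *and* deserialization run under the lock, so a large
    // batch blocks every other queue reader and writer for the whole duration.
    List<Object> readBatchBefore(int limit) {
        lock.lock();
        try {
            byte[][] raw = readSerializedElements(limit); // fast page reads
            return deserializeAll(raw);                   // slow, still holding the lock
        } finally {
            lock.unlock();
        }
    }

    // After: only the raw read holds the lock; deserialization happens
    // afterwards, so multiple pipeline workers can deserialize concurrently.
    List<Object> readBatchAfter(int limit) {
        final byte[][] raw;
        lock.lock();
        try {
            raw = readSerializedElements(limit);
        } finally {
            lock.unlock();
        }
        return deserializeAll(raw);                       // outside the lock
    }

    // Placeholders standing in for the real page reads and event decoding.
    private byte[][] readSerializedElements(int limit) { return new byte[0][]; }
    private List<Object> deserializeAll(byte[][] raw) { return List.of(); }
}
```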

To test, edit the generator input to produce 100 kB messages and, with the persistent queue (PQ) enabled, run:

bin/logstash -e "input { generator { threads => 5 } } output { null {} }"
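One possible setup for that reproduction (a sketch only; the exact benchmark configuration is not included in the PR) is to point the generator's `message` option at a roughly 100 kB string and enable the persistent queue in `config/logstash.yml`:

```
# config/logstash.yml -- enable the persistent queue for the benchmark
queue.type: persisted
pipeline.workers: 10
```

```
# Pipeline passed via -e or a config file; the message value is a stand-in
# for any ~100 kB string.
input {
  generator {
    threads => 5
    message => "<roughly 100 kB of payload>"
  }
}
output {
  null {}
}
```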

This PR shows about a 4-5x performance improvement for a 10-worker pipeline:

| Metric                      | v8.15.5 | v8.15.5+pr17050 |
|-----------------------------|---------|-----------------|
| Events per Second           | 1031    | 4971            |
| CPU Usage (for 10 cores)    | 111.2%  | 593.1%          |
| Concurrent Deserializations | 1       | 2-5             |

Member

@yaauie yaauie left a comment


Mechanically, this makes sense, and I see no problem with it. I can't think of any way that deferring the deserialization until outside of the lock would cause problems.

I would prefer the intermediate object to be an internal detail of the Queue class, and to have a name that more accurately describes what it is.

I have opened jsvd#9 to show what I mean by that.
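For context, here is a hedged sketch of the shape of that suggestion (not the contents of jsvd#9): the holder stays a private inner class of the queue, callers only ever receive the deserialized batch, and the class name says what the object actually is. Only the `SerializedBatchHolder` name and its placement as a private inner class match the commit that was eventually merged; the field and method details below are hypothetical.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class QueueSketch {
    static final class Event { }                       // stand-in for a Logstash event
    static final class Batch {
        final List<Event> events;
        Batch(List<Event> events) { this.events = events; }
    }

    private final ReentrantLock lock = new ReentrantLock();

    // Public surface is unchanged: callers still receive a fully
    // deserialized Batch and never see the intermediate holder.
    Batch readBatch(int limit) {
        return readSerializedBatch(limit).deserialize();
    }

    // Only this step runs under the lock.
    private SerializedBatchHolder readSerializedBatch(int limit) {
        lock.lock();
        try {
            byte[][] serialized = new byte[0][];       // placeholder for page reads
            return new SerializedBatchHolder(serialized);
        } finally {
            lock.unlock();
        }
    }

    // Private inner class: an internal detail of the queue, named for what it
    // is -- a batch that has been read but not yet deserialized.
    private final class SerializedBatchHolder {
        private final byte[][] serialized;

        SerializedBatchHolder(byte[][] serialized) {
            this.serialized = serialized;
        }

        Batch deserialize() {
            // CPU-heavy decoding happens here, outside the queue lock,
            // so several pipeline workers can run it concurrently.
            return new Batch(List.of());               // placeholder decoding
        }
    }
}
```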

@jsvd jsvd changed the title from "first attempt at moving Batch deserialization out of the lock" to "move Batch deserialization out of the lock" Feb 17, 2025
@jsvd jsvd marked this pull request as ready for review February 17, 2025 09:53
Currently the deserialization is behind the readBatch's lock, so any large batch will take time deserializing, causing any other Queue writer (e.g. netty executor threads) and any other Queue reader (pipeline worker) to block.

This commit moves the deserialization out of the lock, allowing multiple pipeline workers to deserialize batches concurrently.

- return an intermediate batch-holder from `Queue` methods
- make the intermediate batch-holder a private inner class of `Queue` with a descriptive name, `SerializedBatchHolder`

Co-authored-by: Ry Biesemeyer <[email protected]>
@jsvd jsvd force-pushed the unlock_queue_deserialization branch from b4a6109 to a2e7667 Compare February 17, 2025 10:24
@elastic-sonarqube

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

@elasticmachine
Collaborator

💚 Build Succeeded

History

@jsvd jsvd requested a review from yaauie February 17, 2025 15:27
Contributor

@mashhurs mashhurs left a comment


LGTM!

@jlind23

jlind23 commented Feb 17, 2025

Thanks @mashhurs for the review and Joao for the work :)

@jsvd jsvd merged commit 637f447 into elastic:main Feb 17, 2025
7 checks passed
@jsvd
Member Author

jsvd commented Feb 17, 2025

@logstashmachine backport 8.18

@jsvd
Member Author

jsvd commented Feb 17, 2025

@logstashmachine backport 8.17

@jsvd
Member Author

jsvd commented Feb 17, 2025

@logstashmachine backport 8.16

github-actions bot pushed a commit that referenced this pull request Feb 17, 2025
github-actions bot pushed a commit that referenced this pull request Feb 17, 2025
github-actions bot pushed a commit that referenced this pull request Feb 17, 2025
@jsvd
Member Author

jsvd commented Feb 17, 2025

@logstashmachine backport 9.0

@jsvd jsvd deleted the unlock_queue_deserialization branch February 17, 2025 19:20
github-actions bot pushed a commit that referenced this pull request Feb 17, 2025
jsvd added a commit that referenced this pull request Feb 17, 2025
jsvd added a commit that referenced this pull request Feb 17, 2025
jsvd added a commit that referenced this pull request Feb 17, 2025
jsvd added a commit that referenced this pull request Feb 17, 2025
@jsvd
Member Author

jsvd commented Feb 18, 2025

@logstashmachine backport 8.x

github-actions bot pushed a commit that referenced this pull request Feb 18, 2025
jsvd added a commit that referenced this pull request Feb 18, 2025
v1v pushed a commit to v1v/logstash that referenced this pull request Feb 25, 2025
v1v added a commit to v1v/logstash that referenced this pull request Feb 25, 2025