fix(qsnp): OOM caused by large number of failed filter reads at same location #395

holmeso · 2025-08-05T00:51:33Z

Description

This pull request introduces several improvements and bug fixes to the qsnp pipeline, primarily focused on memory management during BAM processing, code modernization, and test cleanup. The most significant update is a safeguard that limits memory usage when large numbers of poor-quality reads map to the same genomic position. Additionally, the codebase is modernized with newer Java idioms, and tests are updated for clarity and conciseness.

Pipeline memory and performance improvements:

Added logic in the Producer class to limit the number of failed filter records per start position to 1000, preventing memory issues when many poor-quality reads map to the same location (Pipeline.java). [1] [2]
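For readers without the diff open, the idea behind the cap can be sketched as follows. This is a minimal illustration, not the code in Pipeline.java: the `Read` record, the class name `FailedFilterCap`, and the method `capFailedFilterReads` are hypothetical, and the sketch assumes coordinate-sorted input and that reads beyond the cap are simply dropped.

```java
import java.util.ArrayList;
import java.util.List;

public class FailedFilterCap {

    /** Hypothetical stand-in for a BAM record; the real pipeline uses its own record type. */
    record Read(int start, boolean passesFilter) {}

    /** Cap described in the PR: at most 1000 failed-filter records per start position. */
    static final int MAX_FAILED_PER_START = 1000;

    /**
     * Keeps every read that passes the filter, and at most maxFailedPerStart
     * failed-filter reads for each start position.
     * Assumes the input is sorted by start position, as a coordinate-sorted BAM would be.
     */
    static List<Read> capFailedFilterReads(List<Read> sortedReads, int maxFailedPerStart) {
        List<Read> kept = new ArrayList<>();
        int currentStart = -1;          // start position currently being counted
        int failedAtCurrentStart = 0;   // failed-filter reads kept at that position

        for (Read read : sortedReads) {
            if (read.start() != currentStart) {
                // New start position: reset the counter.
                currentStart = read.start();
                failedAtCurrentStart = 0;
            }
            if (read.passesFilter()) {
                kept.add(read);
            } else if (failedAtCurrentStart < maxFailedPerStart) {
                failedAtCurrentStart++;
                kept.add(read);
            }
            // Otherwise the read is dropped (an assumption of this sketch),
            // so memory per position stays bounded.
        }
        return kept;
    }
}
```

With this sketch, 1500 failed-filter reads starting at one position are reduced to 1000, while reads that pass the filter and reads at other positions are unaffected.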

Code modernization and cleanup:

Replaced explicit type arguments with the diamond operator (<>) and updated collection usage to use newer Java idioms, such as Stream.toList() (Java 16+) instead of stream().collect(Collectors.toList()) (Pipeline.java, StandardPipelineTest.java). [1] [2] [3] [4]
Simplified verbose code, such as using getFirst() (Java 21+) instead of get(0) for lists, and simplified exception messages (Pipeline.java). [1] [2]
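As a small illustration of the collection idioms mentioned above (not code taken from the PR), the sketch below contrasts the older and newer forms. The class and method names are hypothetical. One behavioral difference is worth keeping in mind when making this swap: `Stream.toList()` returns an unmodifiable list, whereas `Collectors.toList()` makes no guarantee either way. `getFirst()` is left out so the example compiles on Java 16 and later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ModernIdioms {

    /** Older style: explicit collector. */
    static List<String> upperOld(List<String> input) {
        return input.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    /** Newer style (Java 16+): Stream.toList(), which returns an unmodifiable list. */
    static List<String> upperNew(List<String> input) {
        return input.stream()
                .map(String::toUpperCase)
                .toList();
    }

    /** Diamond operator: the type argument is inferred from the declaration. */
    static List<String> sample() {
        List<String> names = new ArrayList<>();
        names.add("chr1");
        names.add("chr2");
        return names;
    }
}
```

Code that later calls `add` or `remove` on the result should keep the collector form or copy the result into a new `ArrayList`.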

Test improvements and cleanup:

Updated assertions in tests to use more descriptive and modern JUnit methods (PipelineTest.java). [1] [2]
Removed unused imports and deprecated or ignored test methods, streamlining the test codebase (PipelineTest.java, StandardPipelineTest.java). [1] [2] [3]

Documentation and naming consistency:

Improved parameter naming and documentation for clarity in method signatures and comments (Pipeline.java). [1] [2]

These changes collectively improve the pipeline's robustness, maintainability, and code clarity. The memory issue relates to a change where we now keep the bases of the failed filter reads, rather than just a count, in the Accumulator object.

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

No new unit tests, but I ran this against the 47 failed SDFTM runs in Cromwell prod and they all succeeded.

Are WDL Updates Required?

Not specifically, but the adamajava version used in the various workflows will need to be updated once this change has been released.

holmeso · 2025-08-05T00:52:39Z

fyi, copilot wrote the bulk of this PR summary

newellf · 2025-08-05T01:14:17Z

Like the PR summary by (mostly!) copilot

fix(qsnp): oOM caused by large number of failed filter reads at same …

bc3936e

…location

newellf approved these changes Aug 5, 2025

holmeso merged commit 2664248 into master Aug 5, 2025
1 check passed

holmeso deleted the qsnp_unfiltered_read_bug branch August 5, 2025 01:26
