perf(qpicard - cram): fork htsjdk to include performance enhancements for CRAM file reading #384
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
The
ByteArrayStopCodecclass inhtsjdk(fork) has been updated to use aPushbackInputStream, which means that reading data from the input stream can be made quicker. A jar file from the htsjdk fork is used byqpicardwhich will impact all code that reads and writes BAM/SAM/CRAM files.Also created a copy of the
ValidateSamFileandSamFileValidatorinqmule. The minor changes to these files allow them to use theAsyncCRAMReaderclass (also inqmule) that readsCRAMRecordsinto a queue.The motivation for this change was to reduce the time that the
ValidateCRAMprocess takes in theFTUB_WGGSSwdl. This was taking over 7 hrs to validate a large CRAM file (~2 billion reads).The included changes reduce this time by around half.
A downside to this approach is that when desired updates appear in
htsjdk, we will need to update the folk and release an updated jar file. We don't do this often so am happy to see how it goes.Another option is to raise a PR with
htsjdkdirectly to see if they are interested in the change, but I am not sure if it is the structure of our CRAMs that are benefitting so much from the change, or if all CRAMs would.Type of change
How Has This Been Tested?
Existing unit tests pass, and code has been run and compared against existing output from existing tools
Are WDL Updates Required?
Not required, but once released, FTUB_WGGSS should be updated to use qmule's ValidateSamFile rather than picards.
Checklist: