fix(qannotate): snpeff records were not being dealt with correctly #342
…en the variation was 1/2. If multiple alts were present (1/2), then nanno was not able to retrieve the snpEff annotation.
String scoreS = score + "/" + (4 + vcfFiles.length);
scoreDist.computeIfAbsent(scoreS, v -> new AtomicInteger()).incrementAndGet();
ps.println(cp.getChromosome() + Constants.TAB + cp.getStartPosition() + Constants.TAB +cp.getName() + Constants.TAB
ps.println(cp.getChromosome() + Constants.TAB + cp.getStartPosition() + Constants.TAB +cp.getRef() + Constants.TAB
+ cp.getAlt() + Constants.TAB + Arrays.stream(p.getLeft()).map(s -> null == s ? "./." : s).collect(Collectors.joining(Constants.TAB_STRING))
+ Constants.TAB + Arrays.stream(p.getRight()).map(s -> null == s ? "." : s).collect(Collectors.joining(Constants.TAB_STRING))
+ Constants.TAB + Arrays.stream(missingACs).map(s -> null == s ? "." : s).collect(Collectors.joining(Constants.TAB_STRING))
Minor query on consistency — is there a rationale for using TAB and TAB_STRING from Constants but not SLASH_STRING, MISSING_DATA, MISSING_GT etc. for the other string literals here?
Complete lack of consistency - using more Constants now (even using static imports)
for (ChrPositionRefAlt cp : recs) {
// logger.info("writing out: " + cp.getName());
ps.println(cp.getChromosome() + Constants.TAB + cp.getStartPosition() + "\t.\t" + cp.getName() + Constants.TAB + cp.getAlt() + "\t.\t.\t.");
ps.println(cp.getChromosome() + Constants.TAB + cp.getStartPosition() + "\t.\t" + cp.getRef() + Constants.TAB + cp.getAlt() + "\t.\t.\t.");
While we're here, is this an opportunity to generate output in a nicer (more explicit) way, given that these look basically like VCF records? Essentially the same concatenation is used a couple more times below too.
Added a method to ChrPositionUtils that takes a ChrPosition (and some other bits of info) and returns a VCF string.
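For readers following the thread, a rough sketch of what such a helper could look like is given here. The PR page does not show the actual signature added to ChrPositionUtils, so the class name `VcfLineSketch`, the method `toVcfLine`, and its parameters are all hypothetical. The column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and the "." placeholder follow the concatenations quoted in the diff above.

```java
final class VcfLineSketch {

    // Local stand-ins for the project's Constants class (names assumed, not the real ones).
    static final String TAB = "\t";
    static final String MISSING_DATA = ".";

    // Hypothetical helper: builds the first eight VCF columns for a single-alt record.
    // ID, QUAL, FILTER and INFO are written as "." when no value is available.
    static String toVcfLine(String chr, int pos, String id, String ref, String alt) {
        return String.join(TAB,
                chr,
                Integer.toString(pos),
                id == null ? MISSING_DATA : id,
                ref,
                alt,
                MISSING_DATA,   // QUAL
                MISSING_DATA,   // FILTER
                MISSING_DATA);  // INFO
    }

    public static void main(String[] args) {
        System.out.println(toVcfLine("chr1", 1000, null, "A", "C"));
    }
}
```

Keeping the column layout in one place means each call site passes values rather than repeating the tab-and-dot concatenation.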
Description
If a vcf record being annotated has more than 1 entry in the alt field, then nanno will split that record into 2 records.
eg.
chr1 1000 A C,T .....
will become
chr1 1000 A C .....
and
chr1 1000 A T .....
However, nanno was using the whole of the alt field in snpEff annotation files, which meant that there would not be a match from snpEff (just for records that had more than 1 entry in the alt field).
The fix involves splitting the 'alt' field when it contains more than one value, allowing the VCF record to be annotated to match against each of these values.
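The idea behind the fix can be sketched as follows. This is not the code from the PR: the class `AltSplitSketch`, its methods, and the `chr:pos:ref>alt` key format are assumptions made for illustration only. It shows how splitting the alt field on commas yields one lookup key per allele, so that each split record can find its own match.

```java
import java.util.ArrayList;
import java.util.List;

final class AltSplitSketch {

    // Splits a VCF ALT field such as "C,T" into its individual alleles.
    static List<String> splitAlts(String altField) {
        List<String> alts = new ArrayList<>();
        for (String alt : altField.split(",")) {
            if (!alt.isEmpty()) {
                alts.add(alt);
            }
        }
        return alts;
    }

    // Hypothetical lookup key; the real matching logic in nanno may differ.
    static String key(String chr, int pos, String ref, String alt) {
        return chr + ":" + pos + ":" + ref + ">" + alt;
    }

    // One key per alt allele, so "C,T" produces a key for C and a key for T.
    static List<String> keysFor(String chr, int pos, String ref, String altField) {
        List<String> keys = new ArrayList<>();
        for (String alt : splitAlts(altField)) {
            keys.add(key(chr, pos, ref, alt));
        }
        return keys;
    }

    public static void main(String[] args) {
        // prints [chr1:1000:A>C, chr1:1000:A>T]
        System.out.println(keysFor("chr1", 1000, "A", "C,T"));
    }
}
```

With the whole alt field used as a single key (`chr1:1000:A>C,T`), neither split record would match, which is the failure described above.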
Other changes:
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
New unit tests, along with testing on cluster
Are WDL Updates Required?
No
Checklist: