@holmeso
Contributor

@holmeso holmeso commented Jan 31, 2024

Description

If a VCF record being annotated has more than one entry in the alt field, nanno splits it into one record per alt value.
For example,
chr1 1000 A C,T .....
will become
chr1 1000 A C .....
and
chr1 1000 A T .....
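
The split can be sketched in Java as follows. This is a minimal illustration, not nanno's code: the `SplitMultiAlt` class, its `VcfRecord` stand-in and `splitAlts` are hypothetical names. Each comma-separated alt value becomes its own record, and the unsplit alt string travels with it, much like the original_alt output field listed under "Other changes".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not nanno's actual implementation.
final class SplitMultiAlt {

    // Minimal stand-in for a VCF record: only the columns needed for the split.
    static final class VcfRecord {
        final String chrom;
        final int pos;
        final String ref;
        final String alt;          // a single alt allele after splitting
        final String originalAlt;  // the alt column as it appeared before splitting

        VcfRecord(String chrom, int pos, String ref, String alt, String originalAlt) {
            this.chrom = chrom;
            this.pos = pos;
            this.ref = ref;
            this.alt = alt;
            this.originalAlt = originalAlt;
        }

        @Override
        public String toString() {
            return chrom + "\t" + pos + "\t" + ref + "\t" + alt + "\toriginal_alt=" + originalAlt;
        }
    }

    // Split one record into one record per comma-separated alt value.
    // A record with a single alt comes back as a list of one.
    static List<VcfRecord> splitAlts(String chrom, int pos, String ref, String alt) {
        List<VcfRecord> out = new ArrayList<>();
        for (String a : alt.split(",")) {
            out.add(new VcfRecord(chrom, pos, ref, a, alt));
        }
        return out;
    }

    public static void main(String[] args) {
        // chr1 1000 A C,T  ->  chr1 1000 A C  and  chr1 1000 A T
        for (VcfRecord r : splitAlts("chr1", 1000, "A", "C,T")) {
            System.out.println(r);
        }
    }
}
```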

However, when reading snpEff annotation files, nanno used the whole of the alt field (e.g. C,T), so the split records (alt C, alt T) found no matching snpEff annotation. Only records with more than one entry in the alt field were affected.

The fix splits the alt field when it contains more than one value, so that the VCF record being annotated can be matched against each of these values individually.
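
A hedged Java sketch of the matching idea: each snpEff output record is indexed under every one of its alt values instead of under the whole alt field, so a split record with a single alt can find its annotation. `SnpEffIndex`, `add`, `lookup` and the colon-separated key are hypothetical. The sketch also relies on the snpEff `ANN` convention that each comma-separated entry begins with the allele it describes. Whether nanno filters entries per allele like this is an assumption, and the `ANN` strings in `main` are shortened, made-up examples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not nanno's actual lookup code.
final class SnpEffIndex {

    private final Map<String, List<String>> annotationsByKey = new HashMap<>();

    // Lookup key built from a SINGLE alt allele.
    static String key(String chrom, int pos, String ref, String alt) {
        return chrom + ":" + pos + ":" + ref + ":" + alt;
    }

    // Index one snpEff output record. Its alt column may hold several
    // comma-separated alleles, so each allele gets its own key. ANN entries are
    // comma-separated and start with their allele ("Allele|Annotation|..."),
    // so only the entries for that allele are stored under each key.
    void add(String chrom, int pos, String ref, String alt, String ann) {
        String[] entries = ann.split(",");
        for (String a : alt.split(",")) {
            List<String> matching = new ArrayList<>();
            for (String e : entries) {
                if (e.startsWith(a + "|")) {
                    matching.add(e);
                }
            }
            annotationsByKey.put(key(chrom, pos, ref, a), matching);
        }
    }

    // Look up annotations for a record that has already been split to one alt.
    List<String> lookup(String chrom, int pos, String ref, String alt) {
        return annotationsByKey.getOrDefault(key(chrom, pos, ref, alt), Collections.emptyList());
    }

    public static void main(String[] args) {
        SnpEffIndex idx = new SnpEffIndex();
        // Indexing under "C" and "T" separately, rather than under "C,T",
        // is what lets the split records below find a match.
        idx.add("chr1", 1000, "A", "C,T",
                "C|missense_variant|MODERATE|GENE1,T|synonymous_variant|LOW|GENE1");
        System.out.println(idx.lookup("chr1", 1000, "A", "C"));
        System.out.println(idx.lookup("chr1", 1000, "A", "T"));
    }
}
```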

Other changes:

  • reverted a change made by the previous PR ("build(all projects): gradle updates, now works with vscode", #341) that removed a build.last step
  • renamed ChrPositionRefAlt.getName() to getRef()
  • added the GATK GT field to nanno output
  • added an original_alt field to nanno output (useful when splitting 1/2 variants)
  • kept the original GATK_AD values rather than manipulating them (the original_alt field makes this possible)
  • updated Executor so that spaces in the classpath (hello IDEA) don't cause the Process to fall over
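
Keeping GATK_AD unchanged works because AD is a Number=R field: the reference depth comes first, followed by one depth per alt in the order of the original alt column. With original_alt available, a split record can still locate its own depth. A minimal Java sketch, with hypothetical names (`AdLookup`, `refAndAltDepth`), assuming numeric AD values (a missing `.` is not handled):

```java
// Hypothetical sketch; not nanno's actual code.
final class AdLookup {

    // Given the unmodified AD value (ref depth first, then one depth per alt in
    // the order of the original alt column), return {refDepth, altDepth} for
    // one alt allele of a split record. Assumes every AD value is numeric.
    static int[] refAndAltDepth(String ad, String originalAlt, String alt) {
        String[] depths = ad.split(",");
        String[] alts = originalAlt.split(",");
        for (int i = 0; i < alts.length; i++) {
            if (alts[i].equals(alt)) {
                // depths[0] is the ref; the i-th alt (0-based) sits at index i + 1
                return new int[] {Integer.parseInt(depths[0]), Integer.parseInt(depths[i + 1])};
            }
        }
        throw new IllegalArgumentException("alt " + alt + " not in original_alt " + originalAlt);
    }

    public static void main(String[] args) {
        // A 1/2 call at chr1 1000 A C,T with AD=5,10,12, viewed from the split "T" record
        int[] d = refAndAltDepth("5,10,12", "C,T", "T");
        System.out.println("ref=" + d[0] + " alt=" + d[1]);
    }
}
```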

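One common way to make an external Java process robust to spaces in the classpath is to hand `ProcessBuilder` a list of arguments instead of a single command string. `Runtime.exec(String)` splits its argument on whitespace with a `StringTokenizer`, so a path like `Idea Projects` would become two arguments. The PR does not say how Executor was changed, so treat this as a sketch only; `ClasspathCommand`, `buildCommand`, the paths and `org.example.Main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch; not the actual Executor code.
final class ClasspathCommand {

    // Each list element is passed to the child process as exactly one argument,
    // so a classpath containing spaces is never split.
    static List<String> buildCommand(String javaBin, String classpath, String mainClass, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    public static void main(String[] args) {
        String cp = "/home/me/Idea Projects/nanno/lib/*";   // note the space
        List<String> cmd = buildCommand("java", cp, "org.example.Main", "--help");

        // ProcessBuilder keeps the list as-is; calling pb.start() would launch it.
        ProcessBuilder pb = new ProcessBuilder(cmd);
        System.out.println("arguments: " + pb.command().size());

        // Runtime.exec(String) tokenizes a single command string like this,
        // which would split the classpath at the space.
        StringTokenizer st = new StringTokenizer(String.join(" ", cmd));
        System.out.println("tokens if passed as one string: " + st.countTokens());
    }
}
```
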
Type of change


  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

New unit tests, along with testing on the cluster

Are WDL Updates Required?

No

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

…en the variation was 1/2

If multiple alts were present (1/2) then nanno was not able to retrieve the snpEff annotation
@delocalizer delocalizer self-assigned this Feb 1, 2024
Comment on lines 170 to 175
 String scoreS = score + "/" + (4 + vcfFiles.length);
 scoreDist.computeIfAbsent(scoreS, v -> new AtomicInteger()).incrementAndGet();
-ps.println(cp.getChromosome() + Constants.TAB + cp.getStartPosition() + Constants.TAB +cp.getName() + Constants.TAB
+ps.println(cp.getChromosome() + Constants.TAB + cp.getStartPosition() + Constants.TAB +cp.getRef() + Constants.TAB
 + cp.getAlt() + Constants.TAB + Arrays.stream(p.getLeft()).map(s -> null == s ? "./." : s).collect(Collectors.joining(Constants.TAB_STRING))
 + Constants.TAB + Arrays.stream(p.getRight()).map(s -> null == s ? "." : s).collect(Collectors.joining(Constants.TAB_STRING))
 + Constants.TAB + Arrays.stream(missingACs).map(s -> null == s ? "." : s).collect(Collectors.joining(Constants.TAB_STRING))
Contributor

Minor query on consistency — is there a rationale for using TAB and TAB_STRING from Constants but not SLASH_STRING, MISSING_DATA, MISSING_GT etc. for the other string literals here?

Contributor Author

Complete lack of consistency - using more Constants now (even using static imports)

 for (ChrPositionRefAlt cp : recs) {
 // logger.info("writing out: " + cp.getName());
-ps.println(cp.getChromosome() + Constants.TAB + cp.getStartPosition() + "\t.\t" + cp.getName() + Constants.TAB + cp.getAlt() + "\t.\t.\t.");
+ps.println(cp.getChromosome() + Constants.TAB + cp.getStartPosition() + "\t.\t" + cp.getRef() + Constants.TAB + cp.getAlt() + "\t.\t.\t.");
Contributor

While we're here, is this an opportunity to generate output in a nicer (more explicit) way, given that these look basically like VCF records? Essentially the same concatenation is used a couple more times below too.

Contributor Author

added a method to ChrPositionUtils that takes a ChrPosition (and some other bits of info) and returns a VCF string
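
A rough sketch of what such a helper could look like. The name `toVcfString`, its parameters and the `MISSING_DATA` constant are assumptions, not the actual ChrPositionUtils API. It builds the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and substitutes `.` for missing values, which reproduces the hand-concatenated line in the diff above.

```java
import java.util.StringJoiner;

// Hypothetical sketch; not the actual ChrPositionUtils method.
final class VcfLineSketch {

    static final String MISSING_DATA = ".";   // VCF placeholder for an empty column

    // Build the eight fixed VCF columns, replacing null with ".".
    static String toVcfString(String chrom, int pos, String id, String ref, String alt,
                              String qual, String filter, String info) {
        StringJoiner sj = new StringJoiner("\t");
        for (String col : new String[] {chrom, Integer.toString(pos), id, ref, alt, qual, filter, info}) {
            sj.add(col == null ? MISSING_DATA : col);
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        // Same output as: chrom + TAB + pos + "\t.\t" + ref + TAB + alt + "\t.\t.\t."
        System.out.println(toVcfString("chr1", 1000, null, "A", "C", null, null, null));
    }
}
```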

@holmeso holmeso merged commit 8ad0466 into master Feb 2, 2024
@holmeso holmeso deleted the nanno_snp_eff_idea branch February 2, 2024 00:14

3 participants