Suggestions to tx_annotate_mt #655
…adinstitute/gnomad_methods into jg/tx_annotate_mt_suggestions # Conflicts: # gnomad/utils/transcript_annotation.py
…adinstitute/gnomad_methods into jg/tx_annotate_mt_suggestions
…/gnomad_methods into jg/tx_annotate_mt_suggestions # Conflicts: # gnomad/utils/transcript_annotation.py
A few more comments/questions
~splicing.contains(ht.vep_processed_csqs.most_severe_consequence)
)

if filter_to_homs:
I removed this option because I don't think it belongs here: this function is about filtering variants based on VEP information. Users can filter their variant list to whatever variants they want before calling this function.
if filter_to_csqs is not None:
filter_to_csqs = [csq for csq in filter_to_csqs if csq not in splice_csqs]
else:
# TODO: Need to modify process consequences to ignore splice variants,
I left this TODO in. Can you make the changes that you think are necessary?
I discussed keeping 'OS' variants with Konrad. We think the current csq_order, which prioritizes 'OS' over 'LC', is better than deleting them, but I kept deletion as an option in the pull-out-worst function that runs after tx annotation.
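To make the trade-off concrete, here is a minimal sketch of ranking by an ordered list versus dropping 'OS' outright. The list `LOF_ORDER`, the function `worst_lof`, and the `drop_os` flag are hypothetical names for illustration only; they are not the actual csq_order or the actual pull-out-worst function.

```python
# Hypothetical ordering for illustration: earlier entries take priority.
# The only property taken from the discussion is that 'OS' ranks above 'LC'.
LOF_ORDER = ["HC", "OS", "LC"]


def worst_lof(labels, drop_os=False):
    """Return the highest-priority label, or None if nothing remains."""
    if drop_os:
        # Optional behavior: discard 'OS' instead of ranking it.
        labels = [label for label in labels if label != "OS"]
    ranked = [label for label in labels if label in LOF_ORDER]
    return min(ranked, key=LOF_ORDER.index) if ranked else None


print(worst_lof(["LC", "OS"]))                # OS
print(worst_lof(["LC", "OS"], drop_os=True))  # LC
```

Keeping the ordering as the default and deletion as an opt-in flag means callers who never set the flag see no change in behavior.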
:param vep_root: Name used for root VEP annotation. Default is 'vep'.
:return: Table of transcript expression information prepared for annotation.
"""
# TODO: Filter to only CDS regions?
Is this handled by filter_to_protein_coding?
No. It seems that they filtered the Gencode GTF to 'CDS' features to make a BED file, and Beryl used a BED file directly. However, I don't get the same number of 'CDS' lines using Konrad's import_gtf code, so I need to dig into it.
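For reference while digging, here is a rough sketch of the GTF-to-BED step in plain Python. The sample records and the helper `gtf_cds_to_bed` are made up for illustration and are not the code that produced the original BED file. The sketch relies on two format facts: the feature type is the third GTF column, and GTF coordinates are 1-based inclusive while BED is 0-based half-open.

```python
# Made-up GTF records: one gene line and three CDS lines, two of which
# cover the same interval on different transcripts.
GTF_TEXT = "\n".join(
    [
        "##description: illustrative records only",
        'chr1\tHAVANA\tgene\t100\t500\t.\t+\t.\tgene_id "G1";',
        'chr1\tHAVANA\tCDS\t120\t200\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
        'chr1\tHAVANA\tCDS\t120\t200\t.\t+\t0\tgene_id "G1"; transcript_id "T2";',
        'chr1\tHAVANA\tCDS\t300\t400\t.\t+\t2\tgene_id "G1"; transcript_id "T1";',
    ]
)


def gtf_cds_to_bed(gtf_text):
    """Return (chrom, start, end) BED-style tuples for every CDS record."""
    intervals = []
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # column 3 holds the feature type
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        intervals.append((chrom, start - 1, end))  # 1-based -> 0-based start
    return intervals


bed = gtf_cds_to_bed(GTF_TEXT)
print(len(bed), len(set(bed)))  # CDS records vs. distinct intervals
```

One thing worth checking, though it may not be the cause here, is whether the two counts being compared are CDS records or distinct intervals, since the same interval can appear once per transcript.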
return ht


def preprocess_variants_for_tx(
I moved most of this to filter_vep_transcript_csqs because I think it fits nicely there.
Great idea! I didn't know that function existed.
splicing = hl.literal(
["splice_acceptor_variant", "splice_donor_variant", "splice_region_variant"]
)
splice_csqs = [
Maybe we should add this as a global in utils.vep, similar to the others like CSQ_CODING_HIGH_IMPACT, CSQ_CODING_MEDIUM_IMPACT, CSQ_CODING_LOW_IMPACT, ...?
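Something along these lines is what I have in mind, as a plain-Python sketch. The constant name `CSQ_SPLICE` and the helper `drop_splice_csqs` are hypothetical; the three terms are the ones already listed in the diff, and the list comprehension mirrors the existing `filter_to_csqs` handling.

```python
# Hypothetical module-level constant; the name is an assumption and would
# sit alongside the existing CSQ_* lists if adopted.
CSQ_SPLICE = [
    "splice_acceptor_variant",
    "splice_donor_variant",
    "splice_region_variant",
]


def drop_splice_csqs(filter_to_csqs):
    """Remove splice consequences from a user-supplied list, if one is given."""
    if filter_to_csqs is None:
        return None
    return [csq for csq in filter_to_csqs if csq not in CSQ_SPLICE]


print(drop_splice_csqs(["missense_variant", "splice_donor_variant"]))
# ['missense_variant']
```

A shared constant would let both the Hail expression and the list filtering read from one definition instead of repeating the literal.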
) -> hl.Table:
"""
Filter variants to those that fall on transcripts of interest.
Prepare a Table of transcript expression information for annotation.
Why is it a table of transcript expression information? I think this is a variant table without any expression information.
No description provided.