jkgoodrich
Contributor

No description provided.

Contributor Author

@jkgoodrich jkgoodrich left a comment

A few more comments/questions

~splicing.contains(ht.vep_processed_csqs.most_severe_consequence)
)

if filter_to_homs:
Contributor Author

I removed this option because I don't think it belongs here: this function is about filtering variants based on VEP info. Users can filter their variant list to whatever variants they want before calling this function.

if filter_to_csqs is not None:
filter_to_csqs = [csq for csq in filter_to_csqs if csq not in splice_csqs]
else:
# TODO: Need to modify process consequences to ignore splice variants,
Contributor Author

I left this TODO in. Can you make the changes you think are necessary?

Contributor

@KoalaQin KoalaQin Jan 3, 2024

I discussed keeping 'OS' variants with Konrad. We think the current csq_order, which prioritizes 'OS' over 'LC', is better than deleting them, but I kept that as an option in the pull-out-worst function after tx annotation.
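To illustrate the ordering idea, here is a minimal pure-Python sketch, not the actual Hail implementation. The names `LOF_PRIORITY`, `worst_lof_label`, and `drop_os` are hypothetical, and placing 'HC' at the front of the list is an assumption; only the 'OS'-before-'LC' ordering comes from the comment above.

```python
# Assumed priority order: earlier entries take priority when picking the worst.
# "OS" ahead of "LC" follows the ordering described above; "HC" leading the
# list is an assumption made for illustration only.
LOF_PRIORITY = ["HC", "OS", "LC"]


def worst_lof_label(labels, drop_os=False):
    """Return the highest-priority label present, or None if none are ranked.

    drop_os is a hypothetical switch for the optional behavior of removing
    'OS' entries instead of ranking them.
    """
    if drop_os:
        labels = [label for label in labels if label != "OS"]
    ranked = [label for label in labels if label in LOF_PRIORITY]
    return min(ranked, key=LOF_PRIORITY.index) if ranked else None


kept = worst_lof_label(["LC", "OS"])
dropped = worst_lof_label(["LC", "OS"], drop_os=True)
print(kept, dropped)
```

With 'OS' ranked, it wins over 'LC'. With the optional drop enabled, 'LC' is what remains.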

:param vep_root: Name used for root VEP annotation. Default is 'vep'.
:return: Table of transcript expression information prepared for annotation.
"""
# TODO: Filter to only CDS regions?
Contributor Author

Is this handled by filter_to_protein_coding?

Contributor

No. It seems they filtered the Gencode GTF to 'CDS' to make a BED file, and Beryl used a BED file directly. However, I don't get the same number of 'CDS' lines using Konrad's import_gtf code, so I need to dig into it.

return ht


def preprocess_variants_for_tx(
Contributor Author

I moved most of this to filter_vep_transcript_csqs because I think it fits nicely there.

Contributor

Great idea! I didn't know that function existed.

splicing = hl.literal(
["splice_acceptor_variant", "splice_donor_variant", "splice_region_variant"]
)
splice_csqs = [
Contributor Author

Maybe we should add this as a global in utils.vep, similar to the others like CSQ_CODING_HIGH_IMPACT, CSQ_CODING_MEDIUM_IMPACT, CSQ_CODING_LOW_IMPACT, ...?
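A minimal sketch of what that could look like, in plain Python. The constant name `CSQ_SPLICE` and the helper `drop_splice_csqs` are hypothetical, chosen only to mirror the existing `CSQ_CODING_*` naming; the three consequence terms are the ones in the snippet above.

```python
# Hypothetical module-level constant; the name CSQ_SPLICE is an assumption
# modeled on the existing CSQ_CODING_* globals, not an existing definition.
CSQ_SPLICE = [
    "splice_acceptor_variant",
    "splice_donor_variant",
    "splice_region_variant",
]


def drop_splice_csqs(csqs):
    """Return csqs without any splice consequence terms, preserving order."""
    return [csq for csq in csqs if csq not in CSQ_SPLICE]


filtered = drop_splice_csqs(
    ["missense_variant", "splice_donor_variant", "synonymous_variant"]
)
print(filtered)
```

A shared constant would let both the `hl.literal(...)` expression and the list comprehension over `filter_to_csqs` draw from one definition instead of repeating the three terms.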

@KoalaQin KoalaQin merged commit 0e2aca0 into qh/tx_annotate_mt Jan 4, 2024
) -> hl.Table:
"""
Filter variants to those that fall on transcripts of interest.
Prepare a Table of transcript expression information for annotation.
Contributor

Why is it a table of transcript expression information? I think this is a variant table without any expression information.

@jkgoodrich jkgoodrich deleted the jg/tx_annotate_mt_suggestions branch January 22, 2024 14:55