-
Notifications
You must be signed in to change notification settings - Fork 31
Copy of: Add gnomad_gks() and get_gks() for extracting gks information for a specified variant #596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
gnomad/resources/grch38/gnomad.py
Outdated
# Select relevant fields, checkpoint, and filter to interval before adding | ||
# annotations | ||
ht = ht.select(ht.freq, ht.info.vrs, ht.popmax) | ||
ht = ht.checkpoint(hl.utils.new_temp_file("vrs_checkpoint", extension="ht")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I can make one last suggestion, could there be an optional argument like disable_checkpoint
to gnomad_gks
that turns off this checkpoint, and a doc string that says it is recommended to leave on for larger datasets?
I'm just observing that for small tables that are more like tens of thousands of variants and <1GB in size and can comfortably fit in memory, the checkpoint adds a lot of time overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats definitely possible. We are also about to make one more change for Popmax (where AF =/= FAF) so yours isn't the last suggestion:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks @matren395
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matren395 I can push a branch with that and create another multi-tiered PR
gnomad/resources/grch38/gnomad.py
Outdated
# Select relevant fields, checkpoint, and filter to interval before adding | ||
# annotations | ||
keep_fields = [ht.freq, ht.info.vrs, ht.popmax] | ||
keep_fields = [ht.freq, ht.info.vrs, ht.popmax, ht.faf] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop ht.pomax (we're pulling faf instead)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could add the code for defining the popmaxFAF (similar to the browser code) above keep_fields, select faf95 here, then within add_gks_va just grab faf95.popmax and faf95. popmax_population if faf95 exists in the struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay that's a lot smarter, thank you!
gnomad/utils/annotations.py
Outdated
"frequency": faf95_dict[-1], | ||
"confidenceInterval": 0.95, | ||
"popFreqId": f"{gnomad_id}.{input_struct.popmax.pop.upper()}", | ||
"popFreqId": f"{gnomad_id}.{faf95_dict.pop.upper()}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where did faf95_dict get the pop struct?
gnomad/utils/annotations.py
Outdated
|
||
faf95_dict = sorted(sorted_faf95, key=lambda x: x[1])[-1] | ||
|
||
if faf95_dict[-1] > 0.0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the browser it looks like if popmaxfaf is 0, it is still displayed, it just doesn't have a hover over, so we should probabaly copy that same behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returns a 0 value now and only gives None for popFreqID
gnomad/resources/grch38/gnomad.py
Outdated
from typing import Optional, Union | ||
|
||
import hail as hl | ||
from gnomad_qc.v3.create_release.prepare_vcf_data_release import FAF_POPS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to maintain proper isort order, move imports from "gnomad_qc" below ones from "gnomad"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's odd, I ran isort on it? maybe running black afterwards messed it back up - i'll address it
gnomad/resources/grch38/gnomad.py
Outdated
) | ||
|
||
|
||
def get_coverage_ht( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete this function
gnomad/utils/annotations.py
Outdated
new_freq = freq_meta_idx.map( | ||
lambda x: hl.bind( | ||
lambda y: y.annotate(AF=hl.or_missing(y.AN > 0, y.AC / y.AN)).select( | ||
lambda y: y.annotate(AF=hl.if_else(y.AN > 0, y.AC / y.AN, 0)).select( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this change for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall touching this manually, but I looked back and this change occurred when I was copying in the code from the old branch and setting up the PR. Apparently something was either ahead or behind of something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but anyways I reverted it back to the or_missing() functionality
463653d
to
b315b68
Compare
rebased with Steve's method! |
requirements.txt
Outdated
@@ -1,4 +1,5 @@ | |||
annoy | |||
ga4gh.vrs[extras] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the version here should probably be pinned so when we release a vrs-python non-alpha/beta on a 1.x or 2.x tag this doesn't break. The version discussed previously was 0.8.4
but assuming vrs-python sticks by semantic versioning it could be something wider.
change name of 'input_dict' to 'input_struct' to reflect the fact that it is an input Hail Struct (even though it can be indexed like a Python dictionary) - and a few capitalization changes in there as well
may have some more renaming to be done or to run Black from command line, tbd Co-authored-by: klaricch <[email protected]>
with freq_index_dict and sex_id fixes
select and checkpoint added
run Black and upgrade example interval to a gene I was testing with - ESPN gene in Chr1 - that doesn't error out at the collect statement
Black's method for reformatting that inline comment is really awkward , huh
thanks for the fix! Co-authored-by: klaricch <[email protected]>
Revise Faf95 work , but this implementation is probably not perfect yet!
d0ac270
to
1d1e517
Compare
Copy of Pull Request "Add gnomad_gks() and get_gks() for extracting gks information for a specified variant" with only relevant changes included. The Git History got extremely confused when consulting with Steve Jahl, and his suggestion in the face of the problem was to copy the relevant changes to a new PR/branch, given how complicated to resolve the GitHistory would have been