-
Notifications
You must be signed in to change notification settings - Fork 31
Add generic constraint functions: annotate_with_mu(), count_variants(), downsampling_counts_expr(), filter_vep_transcript_csqs(), combine_functions(), filter_x_nonpar(), and filter_y_nonpar() #481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
gnomad/utils/constraint.py
Outdated
agg[f"singleton_downsampling_counts_{pop}"] = downsampling_counts_expr( | ||
freq_expr, freq_meta_expr, pop, singleton=True | ||
) | ||
# count the variants that have same combination of `grouping` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double check that my description about counting the variants is accurate!
gnomad/utils/constraint.py
Outdated
else: | ||
return hl.int(freq_expr[i].AC > 0) | ||
|
||
# mark variants met with specified criteria in each downsampling as 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above! Want to make sure the description is accurate :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so the hl.map
will apply _get_criteria
to each downsampling requested. so that will return a list of 1's and 0's for each variant, where if the variant exists in the downsampling (AC >0) and any specified criteria is met then it is given a 1 in the list, otherwise it is given a zero.
Then the hl.agg.array_sum
is a aggregation expression that when applied, will sum up all of these arrays of 1's and 0's to give an array that has one element per downsample in pop
that is a count of variants that exist in the downsampling (AC >0) and, is requested, meet any specified criteria (singleton
or max_af
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little bit more to clarify so we can make the comment more accurate: when you said "that is a count of variants that exist in the downsampling", it counts variants that have the same grouping
annotation right?
gnomad/utils/constraint.py
Outdated
lambda x: x[0] | ||
) | ||
|
||
# variants met with specified criteria will be marked as 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let me know if my understanding of _get_criteria
's job is correct :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some more docstring and comment changes. Code all looks good to me, it's very close!
gnomad/utils/constraint.py
Outdated
lambda x: x[0] | ||
) | ||
|
||
# variants met with specified criteria will be marked as 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided a suggested change, though actually since this is a function, we should probably just have a docstring here instead of the comment.
# variants met with specified criteria will be marked as 1 | |
# Variants meeting specified criteria (`singleton` or `max_af`) if requested or with an AC > 0 will return 1. |
gnomad/utils/constraint.py
Outdated
else: | ||
return hl.int(freq_expr[i].AC > 0) | ||
|
||
# mark variants met with specified criteria in each downsampling as 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so the hl.map
will apply _get_criteria
to each downsampling requested. so that will return a list of 1's and 0's for each variant, where if the variant exists in the downsampling (AC >0) and any specified criteria is met then it is given a 1 in the list, otherwise it is given a zero.
Then the hl.agg.array_sum
is a aggregation expression that when applied, will sum up all of these arrays of 1's and 0's to give an array that has one element per downsample in pop
that is a count of variants that exist in the downsampling (AC >0) and, is requested, meet any specified criteria (singleton
or max_af
).
Co-authored-by: jkgoodrich <[email protected]>
gnomad/utils/constraint.py
Outdated
:param max_af: Maximum variant allele frequency to keep. By default no allele frequency cutoff is applied. | ||
:return: Aggregation Expression for an array of the variant counts in downsamplings for specified population. | ||
""" | ||
# Get indices of dictionaries in meta dictionaries that only have the "downsampling" key with specified "group" and "pop" values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a consistent pattern of adding period at the end of each comment? I remember there are some suggestions in the constraint repo about adding period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, lets move to adding a period at the end of all comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small reformatting things, then I think this is good to go
gnomad/utils/constraint.py
Outdated
:param max_af: Maximum variant allele frequency to keep. By default no allele frequency cutoff is applied. | ||
:return: Aggregation Expression for an array of the variant counts in downsamplings for specified population. | ||
""" | ||
# Get indices of dictionaries in meta dictionaries that only have the "downsampling" key with specified "group" and "pop" values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, lets move to adding a period at the end of all comments
Co-authored-by: jkgoodrich <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!!!
Yay!!! Thank you so much for all the feedback! |
No description provided.