Skip to content

Conversation

averywpx
Copy link
Contributor

No description provided.

@averywpx averywpx requested a review from jkgoodrich September 15, 2022 14:15
@averywpx averywpx self-assigned this Sep 15, 2022
@averywpx averywpx changed the title Add generic constraint functions: annotate_with_mu(), count_variants(), downsampling_counts_expr(), filter_vep_transcript_csqs(), combine_functions() Add generic constraint functions: annotate_with_mu(), count_variants(), downsampling_counts_expr(), filter_vep_transcript_csqs(), combine_functions(), filter_x_nonpar(), and filter_y_nonpar() Sep 15, 2022
@jkgoodrich jkgoodrich self-assigned this Sep 15, 2022
@averywpx averywpx requested a review from jkgoodrich September 22, 2022 16:13
agg[f"singleton_downsampling_counts_{pop}"] = downsampling_counts_expr(
freq_expr, freq_meta_expr, pop, singleton=True
)
# count the variants that have same combination of `grouping`
Copy link
Contributor Author

@averywpx averywpx Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double check that my description about counting the variants is accurate!

else:
return hl.int(freq_expr[i].AC > 0)

# mark variants met with specified criteria in each downsampling as 1
Copy link
Contributor Author

@averywpx averywpx Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above! Want to make sure the description is accurate :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so the hl.map will apply _get_criteria to each downsampling requested. so that will return a list of 1's and 0's for each variant, where if the variant exists in the downsampling (AC >0) and any specified criteria is met then it is given a 1 in the list, otherwise it is given a zero.

Then the hl.agg.array_sum is a aggregation expression that when applied, will sum up all of these arrays of 1's and 0's to give an array that has one element per downsample in pop that is a count of variants that exist in the downsampling (AC >0) and, is requested, meet any specified criteria (singleton or max_af).

Copy link
Contributor Author

@averywpx averywpx Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little bit more to clarify so we can make the comment more accurate: when you said "that is a count of variants that exist in the downsampling", it counts variants that have the same grouping annotation right?

lambda x: x[0]
)

# variants met with specified criteria will be marked as 1
Copy link
Contributor Author

@averywpx averywpx Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know if my understanding of _get_criteria's job is correct :)

@averywpx averywpx requested a review from jkgoodrich October 6, 2022 15:07
Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some more docstring and comment changes. Code all looks good to me, it's very close!

lambda x: x[0]
)

# variants met with specified criteria will be marked as 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provided a suggested change, though actually since this is a function, we should probably just have a docstring here instead of the comment.

Suggested change
# variants met with specified criteria will be marked as 1
# Variants meeting specified criteria (`singleton` or `max_af`) if requested or with an AC > 0 will return 1.

else:
return hl.int(freq_expr[i].AC > 0)

# mark variants met with specified criteria in each downsampling as 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so the hl.map will apply _get_criteria to each downsampling requested. so that will return a list of 1's and 0's for each variant, where if the variant exists in the downsampling (AC >0) and any specified criteria is met then it is given a 1 in the list, otherwise it is given a zero.

Then the hl.agg.array_sum is a aggregation expression that when applied, will sum up all of these arrays of 1's and 0's to give an array that has one element per downsample in pop that is a count of variants that exist in the downsampling (AC >0) and, is requested, meet any specified criteria (singleton or max_af).

:param max_af: Maximum variant allele frequency to keep. By default no allele frequency cutoff is applied.
:return: Aggregation Expression for an array of the variant counts in downsamplings for specified population.
"""
# Get indices of dictionaries in meta dictionaries that only have the "downsampling" key with specified "group" and "pop" values
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a consistent pattern of adding period at the end of each comment? I remember there are some suggestions in the constraint repo about adding period.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, lets move to adding a period at the end of all comments

Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small reformatting things, then I think this is good to go

:param max_af: Maximum variant allele frequency to keep. By default no allele frequency cutoff is applied.
:return: Aggregation Expression for an array of the variant counts in downsamplings for specified population.
"""
# Get indices of dictionaries in meta dictionaries that only have the "downsampling" key with specified "group" and "pop" values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, lets move to adding a period at the end of all comments

@averywpx averywpx requested a review from jkgoodrich October 14, 2022 11:33
Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!!!

@averywpx
Copy link
Contributor Author

Yay!!! Thank you so much for all the feedback!

@averywpx averywpx merged commit 725d4a5 into main Oct 14, 2022
@averywpx averywpx deleted the constraint/count_variants branch November 1, 2022 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants