-
Notifications
You must be signed in to change notification settings - Fork 31
Add pprint globals and a global/row length comparison, updates monoallelic expr in validity checks #630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…alidate_release_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor suggestions but logic LGTM!
I also noticed this in the checks
gnomad_qc/v4/create_release/create_combined_faf_release_ht.py:27:0: E0401: Unable to import 'statsmodels.stats.contingency_tables' (import-error)
not sure if that's an issue or is already taken care of elsewhere
gnomad/assessment/validity_checks.py
Outdated
return True | ||
|
||
|
||
def compare_related_global_and_row_lengths( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this minor, but I read this as applying only to the relatedness fields initially. what about "compare_relevant_global_and_row_lengths"? or "compare_global_and_row_annot_lengths"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point I think I favor "compare_global_and_row_annot_lengths"
Co-authored-by: Katherine Chao <[email protected]>
gnomad/assessment/validity_checks.py
Outdated
pops: List[str] = POPS[CURRENT_MAJOR_RELEASE], | ||
missingness_threshold: float = 0.5, | ||
monoallelic_expr: Optional[hl.expr.BooleanExpression] = None, | ||
monoallelic_expr: Optional[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @ch-kr , I committed suggestions but will update this and L257 above to whatever we decide. You suggested unexpected_gt_expr
and I put forward same_call_site_expr
but mine limits it to these two uses, mono and only het.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also my reasoning for not renaming originally was that it will be a breaking change but seems like a pretty minor one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm bad at naming things 😆 what about gt_check_expr
? is that too general? otherwise same_call_site_expr
works for me since we don't have another use for this field other than monoallelic or only het yet (though good point that renaming again in the future would be another breaking change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few more small thoughts
gnomad/assessment/validity_checks.py
Outdated
return True | ||
|
||
|
||
def compare_global_and_row_annot_lengths( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add an option to compare all rows with a default of False. Just to make it clear it's only comparing the first row to the globals and with support to compare all if wanted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe check_global_and_row_annot_lengths
? or confirm_global_and_row_annot_lengths
gnomad/assessment/validity_checks.py
Outdated
|
||
def compare_global_and_row_annot_lengths( | ||
t: Union[hl.MatrixTable, hl.Table], | ||
len_comp_globals_rows: Dict[str, List[str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename to something like row_to_globals_check
gnomad/assessment/validity_checks.py
Outdated
pops: List[str] = POPS[CURRENT_MAJOR_RELEASE], | ||
missingness_threshold: float = 0.5, | ||
monoallelic_expr: Optional[hl.expr.BooleanExpression] = None, | ||
site_gt_check_expr: Optional[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thought here to just force a dict
Co-authored-by: jkgoodrich <[email protected]>
Backto you @jkgoodrich ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a little change in the handling of all rows
Co-authored-by: jkgoodrich <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This adds pprint of globals and a comparison of associated row and global annotation's lengths. It also opens up monoallelic_expr to accept a dictionary so we can also pass
only_het
. The comp naming is horrible, suggestions welcome.I added the new checks to the validate_release_t as well though we wont use it as currently implemented for v4 because we don't want to write out the annotations required for the global and row annotation comp.