Skip to content

Conversation

mike-w-wilson
Copy link
Contributor

This adds pprint of globals and a comparison of associated row and global annotation's lengths. It also opens up monoallelic_expr to accept a dictionary so we can also pass only_het. The comp naming is horrible, suggestions welcome.

I added the new checks to the validate_release_t as well though we wont use it as currently implemented for v4 because we don't want to write out the annotations required for the global and row annotation comp.

Copy link
Contributor

@ch-kr ch-kr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor suggestions but logic LGTM!

I also noticed this in the checks

gnomad_qc/v4/create_release/create_combined_faf_release_ht.py:27:0: E0401: Unable to import 'statsmodels.stats.contingency_tables' (import-error)

not sure if that's an issue or is already taken care of elsewhere

return True


def compare_related_global_and_row_lengths(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this minor, but I read this as applying only to the relatedness fields initially. what about "compare_relevant_global_and_row_lengths"? or "compare_global_and_row_annot_lengths"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point I think I favor "compare_global_and_row_annot_lengths"

pops: List[str] = POPS[CURRENT_MAJOR_RELEASE],
missingness_threshold: float = 0.5,
monoallelic_expr: Optional[hl.expr.BooleanExpression] = None,
monoallelic_expr: Optional[
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @ch-kr , I committed suggestions but will update this and L257 above to whatever we decide. You suggested unexpected_gt_expr and I put forward same_call_site_expr but mine limits it to these two uses, mono and only het.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also my reasoning for not renaming originally was that it will be a breaking change but seems like a pretty minor one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm bad at naming things 😆 what about gt_check_expr? is that too general? otherwise same_call_site_expr works for me since we don't have another use for this field other than monoallelic or only het yet (though good point that renaming again in the future would be another breaking change)

Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more small thoughts

return True


def compare_global_and_row_annot_lengths(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add an option to compare all rows with a default of False. Just to make it clear it's only comparing the first row to the globals and with support to compare all if wanted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check_global_and_row_annot_lengths? or confirm_global_and_row_annot_lengths


def compare_global_and_row_annot_lengths(
t: Union[hl.MatrixTable, hl.Table],
len_comp_globals_rows: Dict[str, List[str]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename to something like row_to_globals_check

pops: List[str] = POPS[CURRENT_MAJOR_RELEASE],
missingness_threshold: float = 0.5,
monoallelic_expr: Optional[hl.expr.BooleanExpression] = None,
site_gt_check_expr: Optional[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thought here to just force a dict

@mike-w-wilson
Copy link
Contributor Author

Backto you @jkgoodrich !

Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a little change in the handling of all rows

Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mike-w-wilson mike-w-wilson merged commit 0d2e71f into main Oct 19, 2023
@mike-w-wilson mike-w-wilson deleted the mw/validity_check_v4_adds branch October 19, 2023 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants