Skip to content

Conversation

matren395
Copy link
Contributor

Copy of Pull Request "Add gnomad_gks() and get_gks() for extracting gks information for a specified variant" with only relevant changes included. The Git History got extremely confused when consulting with Steve Jahl, and his suggestion in the face of the problem was to copy the relevant changes to a new PR/branch, given how complicated to resolve the GitHistory would have been

# Select relevant fields, checkpoint, and filter to interval before adding
# annotations
ht = ht.select(ht.freq, ht.info.vrs, ht.popmax)
ht = ht.checkpoint(hl.utils.new_temp_file("vrs_checkpoint", extension="ht"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I can make one last suggestion, could there be an optional argument like disable_checkpoint to gnomad_gks that turns off this checkpoint, and a doc string that says it is recommended to leave on for larger datasets?

I'm just observing that for small tables that are more like tens of thousands of variants and <1GB in size and can comfortably fit in memory, the checkpoint adds a lot of time overhead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats definitely possible. We are also about to make one more change for Popmax (where AF =/= FAF) so yours isn't the last suggestion:)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks @matren395

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matren395 I can push a branch with that and create another multi-tiered PR

# Select relevant fields, checkpoint, and filter to interval before adding
# annotations
keep_fields = [ht.freq, ht.info.vrs, ht.popmax]
keep_fields = [ht.freq, ht.info.vrs, ht.popmax, ht.faf]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop ht.pomax (we're pulling faf instead)

Copy link
Contributor

@klaricch klaricch Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could add the code for defining the popmaxFAF (similar to the browser code) above keep_fields, select faf95 here, then within add_gks_va just grab faf95.popmax and faf95. popmax_population if faf95 exists in the struct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay that's a lot smarter, thank you!

"frequency": faf95_dict[-1],
"confidenceInterval": 0.95,
"popFreqId": f"{gnomad_id}.{input_struct.popmax.pop.upper()}",
"popFreqId": f"{gnomad_id}.{faf95_dict.pop.upper()}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did faf95_dict get the pop struct?


faf95_dict = sorted(sorted_faf95, key=lambda x: x[1])[-1]

if faf95_dict[-1] > 0.0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the browser it looks like if popmaxfaf is 0, it is still displayed, it just doesn't have a hover over, so we should probabaly copy that same behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns a 0 value now and only gives None for popFreqID

@klaricch klaricch self-requested a review September 27, 2023 16:31
from typing import Optional, Union

import hail as hl
from gnomad_qc.v3.create_release.prepare_vcf_data_release import FAF_POPS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to maintain proper isort order, move imports from "gnomad_qc" below ones from "gnomad"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's odd, I ran isort on it? maybe running black afterwards messed it back up - i'll address it

)


def get_coverage_ht(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this function

new_freq = freq_meta_idx.map(
lambda x: hl.bind(
lambda y: y.annotate(AF=hl.or_missing(y.AN > 0, y.AC / y.AN)).select(
lambda y: y.annotate(AF=hl.if_else(y.AN > 0, y.AC / y.AN, 0)).select(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this change for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall touching this manually, but I looked back and this change occurred when I was copying in the code from the old branch and setting up the PR. Apparently something was either ahead or behind of something else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but anyways I reverted it back to the or_missing() functionality

@matren395 matren395 requested a review from klaricch September 27, 2023 17:27
@matren395 matren395 force-pushed the dm/variant_report_code_rebranch branch from 463653d to b315b68 Compare September 27, 2023 17:36
@matren395
Copy link
Contributor Author

rebased with Steve's method!

requirements.txt Outdated
@@ -1,4 +1,5 @@
annoy
ga4gh.vrs[extras]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the version here should probably be pinned so when we release a vrs-python non-alpha/beta on a 1.x or 2.x tag this doesn't break. The version discussed previously was 0.8.4 but assuming vrs-python sticks by semantic versioning it could be something wider.

@matren395 matren395 requested a review from klaricch September 28, 2023 19:03
matren395 and others added 8 commits September 28, 2023 15:14
change name of 'input_dict' to 'input_struct' to reflect the fact that it is an input Hail Struct (even though it can be indexed like a Python dictionary) - and a few capitalization changes in there as well
may have some more renaming to be done or to run Black from command line, tbd

Co-authored-by: klaricch <[email protected]>
with freq_index_dict and sex_id fixes
@matren395 matren395 force-pushed the dm/variant_report_code_rebranch branch from d0ac270 to 1d1e517 Compare September 28, 2023 19:14
@matren395 matren395 merged commit f8e4313 into main Sep 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants