-
Notifications
You must be signed in to change notification settings - Fork 31
Add additional strata to freq dict labels #542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this is a really stupid question, but can we change to just something like this:
def make_freq_index_dict(
freq_meta: List[Dict[str, str]],
label_delimiter: str = "_",
sort_order: Optional[List[str]] = SORT_ORDER,
) -> Dict[str, int]:
index_dict = {}
for i, f in enumerate(freq_meta):
if sort_order and len(set(f.keys()) - set(sort_order)) < 1:
index_dict[label_delimiter.join([f[g] for g in sorted(f.keys(), key=(lambda x: sort_order.index(x)) if sort_order else None)])] = i
return index_dict
We could rename it something so that the existing function still exists as is, but I think this is way over complicated for what we need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the other parts look good, do you want to keep the second solution for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested Julia's suggestion, but I think you need to put the addtional_strata in the SORT_ORDER now. Julia prefers adj/raw in the end of all the combos.
Also, this code need a minor change:
def make_freq_index_dict(
freq_meta: List[Dict[str, str]],
label_delimiter: str = "_",
sort_order: Optional[List[str]] = SORT_ORDER,
) -> Dict[str, int]:
index_dict = {}
for i, f in enumerate(freq_meta):
if sort_order and (len(f.keys()) - len(sort_order)) < 1:
index_dict[label_delimiter.join([f[g] for g in sorted(f.keys(), key=(lambda x: sort_order.index(x)) if sort_order else None)])] = i
return index_dict
for x, y in itertools.product( | ||
anchor_val, | ||
make_label_combos(copy_label_groups, label_delimiter=label_delimiter), | ||
make_label_combos(copy_label_groups, sort_order, label_delimiter), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to add additional strata like "gatk_version" in SORT_ORDER
Closing this PR as I added Julia's suggestion to another PR #551 |
We built functionality to add additional strata to the freq array but did not include the ability to build those strata into the freq index dictionary. This seemed like the simplest way to do this. Additional strata are simply added to the end of the label
amr_xx_adj_4.1.0.1
. We historicaly have liked to see adj last so I did put a note that we could insert before the last elemnt in sort order. We could also add the functionality to insert the strata into the SORT_ORDER list at a certain index but this requires knowledge of the order and the order could change based on other groupings. For these reasons, I felt it was best (and easiest) to tack it onto the end of the order or before the last element.