Skip to content

Conversation

mike-w-wilson
Copy link
Contributor

We built functionality to add additional strata to the freq array but did not include the ability to build those strata into the freq index dictionary. This seemed like the simplest way to do this. Additional strata are simply added to the end of the label amr_xx_adj_4.1.0.1. We historicaly have liked to see adj last so I did put a note that we could insert before the last elemnt in sort order. We could also add the functionality to insert the strata into the SORT_ORDER list at a certain index but this requires knowledge of the order and the order could change based on other groupings. For these reasons, I felt it was best (and easiest) to tack it onto the end of the order or before the last element.

Copy link
Contributor

@jkgoodrich jkgoodrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this is a really stupid question, but can we change to just something like this:

def make_freq_index_dict(
    freq_meta: List[Dict[str, str]],
    label_delimiter: str = "_",
    sort_order: Optional[List[str]] = SORT_ORDER,
) -> Dict[str, int]:
    index_dict = {}
    for i, f in enumerate(freq_meta):
        if sort_order and len(set(f.keys()) - set(sort_order)) < 1:
            index_dict[label_delimiter.join([f[g] for g in sorted(f.keys(), key=(lambda x: sort_order.index(x)) if sort_order else None)])] = i

    return index_dict

We could rename it something so that the existing function still exists as is, but I think this is way over complicated for what we need

@jkgoodrich jkgoodrich assigned KoalaQin and unassigned jkgoodrich Jun 1, 2023
@jkgoodrich jkgoodrich requested a review from KoalaQin June 1, 2023 19:00
Copy link
Contributor

@KoalaQin KoalaQin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the other parts look good, do you want to keep the second solution for now?

Copy link
Contributor

@KoalaQin KoalaQin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested Julia's suggestion, but I think you need to put the addtional_strata in the SORT_ORDER now. Julia prefers adj/raw in the end of all the combos.
Also, this code need a minor change:

def make_freq_index_dict(
    freq_meta: List[Dict[str, str]],
    label_delimiter: str = "_",
    sort_order: Optional[List[str]] = SORT_ORDER,
) -> Dict[str, int]:
    index_dict = {}
    for i, f in enumerate(freq_meta):
        if sort_order and (len(f.keys()) - len(sort_order)) < 1:
            index_dict[label_delimiter.join([f[g] for g in sorted(f.keys(), key=(lambda x: sort_order.index(x)) if sort_order else None)])] = i

    return index_dict

for x, y in itertools.product(
anchor_val,
make_label_combos(copy_label_groups, label_delimiter=label_delimiter),
make_label_combos(copy_label_groups, sort_order, label_delimiter),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to add additional strata like "gatk_version" in SORT_ORDER

@mike-w-wilson
Copy link
Contributor Author

Closing this PR as I added Julia's suggestion to another PR #551

@mike-w-wilson mike-w-wilson deleted the mw/freq_dict_additional_strata branch April 3, 2024 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants