-
Notifications
You must be signed in to change notification settings - Fork 77
Implement algorithm for calculating species tree topology distributions. #650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement algorithm for calculating species tree topology distributions. #650
Conversation
1f9280c
to
d0d4242
Compare
Codecov Report
@@ Coverage Diff @@
## master #650 +/- ##
==========================================
+ Coverage 87.71% 87.77% +0.05%
==========================================
Files 23 23
Lines 17360 17468 +108
Branches 3421 3456 +35
==========================================
+ Hits 15227 15332 +105
Misses 1044 1044
- Partials 1089 1092 +3
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #650 +/- ##
==========================================
+ Coverage 87.63% 87.74% +0.10%
==========================================
Files 23 23
Lines 17523 17668 +145
Branches 3450 3498 +48
==========================================
+ Hits 15357 15503 +146
Misses 1062 1062
+ Partials 1104 1103 -1
Continue to review full report at Codecov.
|
3fe1fde
to
9d6cd23
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick feedback here @daniel-goldstein, I've not grokked how it fits together yet or the tests.
Should we follow the pattern of adding wrapper methods on Tree and TreeSequence (and move the docstrings over) to not have to expose the combinatorics module? If so, should we lift the functions over entirely or make the Tree{Sequence} methods one-liners?
Yes, let's one-liner methods to the Tree and TreeSequence classes.
9d6cd23
to
fd6e6fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @daniel-goldstein! The main changes we need to make I think is on the interface, as we discussed.
fd6e6fe
to
90cf5d0
Compare
@jeromekelleher Addressed the changes we discussed. Created a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @daniel-goldstein - a few more minor comments.
90cf5d0
to
192e238
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of more comments @daniel-goldstein.
@petrelharp, do you want to take a look through this, or should I merge? We want to start using this pretty soon, and I think the overall API is probably pretty solid (we should mark it as alpha anyway, so people don't have too high expectations of stability).
192e238
to
be1bc0b
Compare
Looks good! I had some clarifying comments on the docs, and am happy to look again if you want, but don't have to. |
Thanks @petrelharp! |
be1bc0b
to
57dde5f
Compare
OK, great, I'm going to mark this for merging. @daniel-goldstein, can you open issues to track anything we've left dangling/unfinished here please? |
@Mergifyio refresh |
Travis AWOL again! Kicking it. |
Command |
57dde5f
to
88efe1c
Compare
This implements both per-tree and incremental algorithms for computing exact tree topology distributions on tree sequences.
As discussed in my talk last week, this uses a DP approach to compute all possible species topology distributions at each node in the tree, starting from leaves up to roots. The
TopologyTree
class is used to store topology distributions at each node. This approach does not support internal samples, and enforces that samples be a subset of leaves (leaves that are not samples are ignored). This does, nicely, support multiply-rooted trees. As there can't be species topologies that span different roots, we compute the topologies for each root usingTopologyTree
and "add" the resulting distributions together.Since this approach needs only a single upward traversal, the incremental algorithm is straightforward. We maintain a reference to the
TopologyTree
for each node and reconstruct thoseTopologyTree
s on the upward traversals when inserting/removing edge diffs.Should we follow the pattern of adding wrapper methods on
Tree
andTreeSequence
(and move the docstrings over) to not have to expose the combinatorics module? If so, should we lift the functions over entirely or make theTree{Sequence}
methods one-liners?cc @jeromekelleher @hyanwong