Skip to content

Conversation

daniel-goldstein
Copy link
Member

@daniel-goldstein daniel-goldstein commented May 26, 2020

This implements both per-tree and incremental algorithms for computing exact tree topology distributions on tree sequences.

As discussed in my talk last week, this uses a DP approach to compute all possible species topology distributions at each node in the tree, starting from leaves up to roots. The TopologyTree class is used to store topology distributions at each node. This approach does not support internal samples, and enforces that samples be a subset of leaves (leaves that are not samples are ignored). This does, nicely, support multiply-rooted trees. As there can't be species topologies that span different roots, we compute the topologies for each root using TopologyTree and "add" the resulting distributions together.

Since this approach needs only a single upward traversal, the incremental algorithm is straightforward. We maintain a reference to the TopologyTree for each node and reconstruct those TopologyTrees on the upward traversals when inserting/removing edge diffs.

Should we follow the pattern of adding wrapper methods on Tree and TreeSequence (and move the docstrings over) to not have to expose the combinatorics module? If so, should we lift the functions over entirely or make the Tree{Sequence} methods one-liners?

cc @jeromekelleher @hyanwong

@daniel-goldstein daniel-goldstein force-pushed the exact-topology-distributions branch from 1f9280c to d0d4242 Compare May 26, 2020 12:33
@codecov
Copy link

codecov bot commented May 26, 2020

Codecov Report

Merging #650 into master will increase coverage by 0.05%.
The diff coverage is 97.24%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #650      +/-   ##
==========================================
+ Coverage   87.71%   87.77%   +0.05%     
==========================================
  Files          23       23              
  Lines       17360    17468     +108     
  Branches     3421     3456      +35     
==========================================
+ Hits        15227    15332     +105     
  Misses       1044     1044              
- Partials     1089     1092       +3     
Flag Coverage Δ
#c_tests 89.05% <97.24%> (+0.06%) ⬆️
#python_c_tests 91.16% <97.24%> (+0.07%) ⬆️
#python_tests 98.86% <97.24%> (-0.05%) ⬇️
Impacted Files Coverage Δ
python/tskit/combinatorics.py 98.18% <97.24%> (-0.24%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1a1f315...d0d4242. Read the comment docs.

@codecov
Copy link

codecov bot commented May 26, 2020

Codecov Report

Merging #650 into master will increase coverage by 0.10%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #650      +/-   ##
==========================================
+ Coverage   87.63%   87.74%   +0.10%     
==========================================
  Files          23       23              
  Lines       17523    17668     +145     
  Branches     3450     3498      +48     
==========================================
+ Hits        15357    15503     +146     
  Misses       1062     1062              
+ Partials     1104     1103       -1     
Flag Coverage Δ
#c_tests 89.02% <100.00%> (+0.12%) ⬆️
#python_c_tests 91.21% <100.00%> (+0.16%) ⬆️
#python_tests 98.99% <100.00%> (+0.03%) ⬆️
Impacted Files Coverage Δ
python/tskit/__init__.py 100.00% <ø> (ø)
python/tskit/combinatorics.py 99.13% <100.00%> (+0.26%) ⬆️
python/tskit/trees.py 98.20% <100.00%> (+0.01%) ⬆️
python/_tskitmodule.c 83.85% <0.00%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f39c8f7...88efe1c. Read the comment docs.

@daniel-goldstein daniel-goldstein force-pushed the exact-topology-distributions branch 5 times, most recently from 3fe1fde to 9d6cd23 Compare May 26, 2020 17:05
@daniel-goldstein daniel-goldstein marked this pull request as ready for review May 26, 2020 17:06
Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick feedback here @daniel-goldstein, I've not grokked how it fits together yet or the tests.

Should we follow the pattern of adding wrapper methods on Tree and TreeSequence (and move the docstrings over) to not have to expose the combinatorics module? If so, should we lift the functions over entirely or make the Tree{Sequence} methods one-liners?

Yes, let's one-liner methods to the Tree and TreeSequence classes.

@daniel-goldstein daniel-goldstein force-pushed the exact-topology-distributions branch from 9d6cd23 to fd6e6fe Compare May 29, 2020 13:09
Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @daniel-goldstein! The main changes we need to make I think is on the interface, as we discussed.

@daniel-goldstein daniel-goldstein force-pushed the exact-topology-distributions branch from fd6e6fe to 90cf5d0 Compare May 31, 2020 17:53
@daniel-goldstein
Copy link
Member Author

@jeromekelleher Addressed the changes we discussed. Created a TopologyCounter which is indexable by a tuple of sample set indexes that's returned from count_topologies, and a PartialTopologyCounter that handles most of the heavy lifting in the algorithm. With these in place there wasn't much point to a TopologyTree, so the per-node state in the counting topologies methods are TopologyCounters.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @daniel-goldstein - a few more minor comments.

@daniel-goldstein daniel-goldstein force-pushed the exact-topology-distributions branch from 90cf5d0 to 192e238 Compare June 2, 2020 03:52
Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of more comments @daniel-goldstein.

@petrelharp, do you want to take a look through this, or should I merge? We want to start using this pretty soon, and I think the overall API is probably pretty solid (we should mark it as alpha anyway, so people don't have too high expectations of stability).

@daniel-goldstein daniel-goldstein force-pushed the exact-topology-distributions branch from 192e238 to be1bc0b Compare June 2, 2020 13:20
@petrelharp
Copy link
Contributor

Looks good! I had some clarifying comments on the docs, and am happy to look again if you want, but don't have to.

@daniel-goldstein
Copy link
Member Author

Thanks @petrelharp!

@daniel-goldstein daniel-goldstein force-pushed the exact-topology-distributions branch from be1bc0b to 57dde5f Compare June 2, 2020 14:38
@jeromekelleher
Copy link
Member

OK, great, I'm going to mark this for merging.

@daniel-goldstein, can you open issues to track anything we've left dangling/unfinished here please?

@benjeffery
Copy link
Member

@Mergifyio refresh

@benjeffery
Copy link
Member

Travis AWOL again! Kicking it.

@mergify
Copy link
Contributor

mergify bot commented Jun 3, 2020

Command refresh: success

@benjeffery benjeffery force-pushed the exact-topology-distributions branch from 57dde5f to 88efe1c Compare June 3, 2020 13:06
@mergify mergify bot merged commit 5958f0e into tskit-dev:master Jun 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants