Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up subproblem solver using faster version of BUILD for trees #59

Open
bredelings opened this issue Apr 15, 2019 · 4 comments
Open

Comments

@bredelings
Copy link
Contributor

bredelings commented Apr 15, 2019

(2018) Fast Compatibility Testing for Rooted Phylogenetic Trees

This paper allows compatibility testing directly on a collection of trees without first decomposing them into splits. It should allow performing this test in O(M*log^2(M)) where M=vertices+edges for all the trees T[i] in a "profile". I can't remember how bad our current implementation is (quadratic? cubic?) but this should be much better, especially since it eliminates the bottleneck step, which is to perform tons of set intersections between large sets.

Implementation issues:

  • This algorithm doesn't deal with incertae sedis taxa. This only affects the taxonomy though.

  • The current version of the subproblem solver incrementally add splits to a set in combine( ). We could instead walk incrementally root-ward from a tree we are testing and remove splits if they are bad. If we think that most splits are good we could (for example) initially try to add a whole tree, and then try to add its subtrees incrementally if that fails, and perhaps avoid doing B tests for a tree with B branches.

  • The algorithm relies on dynamic graph algorithms described in Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity
    The BOOST graph library only implements stuff under incremental edge insertion and does not allow edge deletion.

@bredelings
Copy link
Contributor Author

I suspect that implementing a not-quite-as-fast version of the connectivity test would still be a speedup.

@bredelings
Copy link
Contributor Author

bredelings commented Apr 16, 2019

This is now being developed on the faster-solver branch. The plan is to get a minimal working version, and speed up things that seems to be taking the most time. This should take less time than beginning by implementing the full HDT fully-dynamic graph connectivity stuff.

@bredelings
Copy link
Contributor Author

I've done initial coding to use HDT-0. It required

  • representing connected components of the graph by spanning trees
  • representing spanning trees by Euler tours of the spanning tree
  • representing Euler tours using a Treap instead of a linked list, since a Treap should allow log(n) time to check if two nodes are in the same Euler tour.
    This is currently on branch treap-solver. It is no longer terribly slow, but is still slower than the original as of 11/4/2020.

Other approaches to speed up the subtree solver involve trying to avoid redoing the BUILD problem completely for every new branch that we try and test. However, an initial approach using memoization on branch 'use-dict-in-solver' did not appear to produce a speedup either. So this is hard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant