[Paper] Phylogenetic Stochastic Mapping without Matrix Exponentiation

This is very kind of you, @nicolas_lartill, but we are not competent enough to do it on our own :slight_smile: Seriously though, Jan and I need to decide which platform to use for going forward with this. PhyloBayes was our first though, but we are afraid to mess with it without someone holding our hands.

I haven’t thought about local proposals, but my feeling, in line with @nicolas_lartill’s thoughts, is that even local rearrangements of phylogenies are better to be accompanied by global updates of the substitution history.

We could certainly think about it. The code of the old version of phylobayes is close to inextricable. On the other hand, the mpi version is a bit more rationalized (although still not totally compliant with current object-oriented programming practice), so it could be used as a possible environment.

in fact, once you realize that resampling V ~ p(V | W, tau) or discarding the V component while performing topological updates are mcmc-equivalent, as I mentioned above, and that the usual caching of conditional likelihoods at the nodes of the tree becomes available in the latter case, then all this means that you can resample the topology in a computationally efficient manner while integrating over all configurations of the V component.

I moved 3 posts to an existing topic: Matrix strategies to avoid rate matrix exponentiation

This is good to know, @nicolas_lartill. I didn’t know about the mpi version of PhyloBayes is so different from the old version.

@mlandis, you mentioned that you plan to work on this using RevBayes as a platform – any thoughts on PhyloBayes vs RevBayes as a platform for moving forward? We wouldn’t want to steal your thunder, but would be happy to help of course. There are plenty of projects to go around in terms of ejecting matrix exponentiation/diagonalization from various phylogenetic models.

I get you point, @nicolas_lartill – clever! Although this is all conditional on the number and locations of real and virtual jumps – I don’t have a sense of how frequently they would have to be updated, but you are probably right in that one does not have to update W immediately following a topology update.

We use the branch indices to map the figure onto our notation in text:

The tree in plot 4 of Figure 1 has four branches so the collection of state labels is V = {v1,v2,v3,v4}, where v1 = (1, 1), v2 = (1, 2), v3 = (3, 3), and v4 = (3, 1). The collection of intertransition times is W = {w1, w2, w3, w4}, where w1 = (1.6, 1.6), w2 = (0.64, 2.56), w3 = (7, 1), and w4 = (2.4,2.4). The locations/times of each virtual jump on branch i is represented by a vector ui = (ui1, …, ui(mi−ni)). For example, the distance from the parent node of branch i to the dth virtual jump is uid. The collection of the ui vectors, fully determined by (V,W), is denoted by U. In plot 4 of Figure 1 the collection of virtual jump times is U = {u1, u2, u3, u4} whereu1 =(1.6),u2 =(),u3 =(7),andu4 =().

It is pretty sad that Jan and I were working on this for more than 2 years (we noticed Rao and Teh’s original paper almost immediately as it came out in print), but still cannot explain the method well.

Ha, I didn’t mean to make you feel badly! I just thought for a while that the numbers were telling me something special about panel 3. I suggest just having a separate panel with the branch numberings.

A separate uncolored tree with branch numbers is a great idea. Thanks, @ematsen!

This is great! In addition to improving from O(N^3) to O(N^2) for dense matrices and even to O(N) for sparse enough matrices, the continuous time Bayesian network (CTBN) section in a later section of the Rao-Teh paper (perhaps not implemented here) will allow O(<< N) stochastic mapping with structured (e.g. covarion-like) continuous time Markov processes with combinatorially large state spaces. We are currently looking at a process with N = 61*2^20 ~ 60M states!

Jan and I did not touch CTBN, Markov blankets, etc., so this is not implemented in the paper above. It is great that you find these tools useful for your enormous state space, @argriffing! Soon enough we will be modeling evolution on the state space, where each element is the whole DNA sequence!

Of course there’s no stolen thunder when brainstorming! Either framework suits me, though I’d prefer RevBayes if only for my familiarity with its code. Since you mentioned interest in future projects, I also think it provides a flexible framework for exploring new models (not to imply otherwise of PhyloBayes). In any case, I anticipate identifying good MCMC proposals will be a greater challenge than the software implementation.

1 Like