Sampling from the SPR random walk on rooted trees


#1

Fellow babblers,

@cwhidden and I would like to sample from the subtree-prune-regraft (SPR) random walk on rooted phylogenetic trees. Does anyone know an easy way to do this? Chris could roll his own, but I’ll bet that there is an easy solution out there.

If we wanted to sample from the random walk on unrooted trees, we could sample from the MrBayes prior. BEAST is rooted, which is nice, but has non-uniform priors on topologies. @mlandis would this be easy with revBayes?

Thanks!


#2

The priors in BEAST are uniform on histories, also known as ranked trees. A ranked tree is a rooted topology plus an ordering of the internal nodes which increases towards the tips. To me, these seem the most natural objects to study in the context of rooted trees. Do you need to use unranked trees?


#3

@ematsen, @cwhidden : Running under MCMC under the prior to sample from a diffusion on tree-space sounds like a great idea. BEAST, MrBayes, and RevBayes would all work nicely for this. Feel free to email if you’d like help doing this through RevBayes.

One thing to consider is SPR may be applied without constraint on an unrooted tree. For time trees (rooted trees with branch lengths proportional to time), whether a branch is a valid regrafting point depends both on the subtree’s ancestor’s age and the start and end age of the receiving branch. e.g. A subtree with age 100 cannot regraft to a branch originating at age 1.

So SPR proposals for time trees often impose additional constraints on regraft points (see fixed node-height prune regraft, FNPR), or impose constraints and resample the attachment node’s age (Wilson-Balding). For more, see Hoehna, Defoin-Platel, and @alexei_drummond’s work on tree proposals. You can imagine other types of SPR proposals that produce valid states for a time tree, some of which would perform poorly in practice, but I wanted to mention that the type of diffusion you impose on time-tree-space will interact with your prior distribution on divergence times. Of course that might be exactly what you’re interested in!


#4

Thanks, Graham and Michael!

I should have been a little more specific. The set of objects that interest us are rooted phylogenetic tree topologies without branch lengths or any other extra information. The random walk of interest to us is the MCMC using rooted subtree-prune-regraft (SPR) moves corresponding to the uniform distribution on rooted phylogenetic tree topologies.

I belive this can be constructed MCMC-style by uniformly sampling a neighbor s of a given t (with respect to SPR) and accepting such a proposal with probability min(1, d_t / d_s), where d_t is the degree of t and d_s is the degree of s. I’d think MrBayes is doing the unrooted equivalent when sampling from the prior and is limited to SPR moves. However, we are interested in rooted trees and rooted SPR moves.

Is that more clear? Is that something that can be done easily with RevBayes?


#5

@ematsen: Here are two simple .Rev files that will sample trees under a prior: an unrooted tree with SPR, and a time tree using various proposals. These work with the language implemented in the development branch, which will most likely be the final language definition.

I know you mentioned you’d like these SPR proposals to disregard node ages for rooted trees. This is not currently implemented since time-free rooted trees appear rarely in phylogenetics, but I’d be glad to add it if RevBayes looks like an easy solution for you.


#6

@mlandis, thank you very much for your complete response.

We will have a look and a think. Your point is well taken regarding time-free rooted trees, but that’s what we can do right now given our mathematical “technology.”


#7

Maybe you are interested in the wrong objects :slight_smile: If the trees are rooted then all simple phylogenetic models I know (e.g. coalescent and birth-death trees with single-time sampling) generate uniform priors on ranked rooted trees (as @mlandis points out).


#8

The prior is not actually a problem. We can set things up to have any Markov proposal and acceptance mechanism, and hence we can get any prior on topologies, such as the induced prior on rooted phylogenetic trees coming from ranked phylogenetic topologies. Thus, if there are any proposals in either RB or BEAST that induce Markov proposals on rooted phylogenetic topologies, then we would be good to go.

Note that this isn’t a trivial criterion, as a proposal that uses the real-valued branch lengths or times in a nontrivial way will probably not be Markov when projected down to rooted topologies. Indeed, I don’t think that the standard RB or BEAST proposals would be.

Thanks, all. This has been quite helpful.


#9

Once you have learned to love ranked rooted trees, you might like this paper

Song, Y.: Properties of subtree-prune-and-regraft operations on totally-ordered phylogenetic trees. Annals of Combinatorics 10, 129–146 (2006)

(totally-ordered = ranked)


#10

I’m already a fan. They make many things easier, and are interesting in their own right, e.g.

Thanks for the tip, Graham!