Upcoming talk on phylogenetic MCMC mixing using SPR distance


@cwhidden and I have been applying his awesome SPR techniques to understand what keeps phylogenetic MCMC chains from mixing. Here’s a talk about it. My talk is (in theory) due today, but I would love to get any feedback to improve the talk and our upcoming writeup of the work.

We thank @beiko, @koadman, and @cmccoy for providing helpful feedback, but this work builds on a big literature, with great work from @hoehna, @alexei_drummond, @trayc7, and many others.


Nice slides @ematsen! Almost don’t need to hear the talk :wink: I like the visualisation on slide 19. But without hearing your talk I don’t know how that differs from the technique used for the visualisation on slide 28, or if they are just different data sets. For those amongst us inclined towards time-trees rather than their uncouth unrooted brethren, is this work mainly applicable to one or the other? Are your examples on time-trees or unrooted trees? From our experience there can be some distinctive difficulties in sampling the space of time-trees compared to unrooted trees. You could also follow @mathmomike’s lead and finish with a list of specific open questions in this space!


Hi @alexei_drummond. I’m glad you like the visualizations. Slide 28 is indeed a different data set and we are looking at many more.

We’ve been focusing on unrooted trees with MrBayes for now and these examples are from unrooted trees. However, the methods themselves are applicable to both time-trees and unrooted trees. In fact, SPR distances are much faster to compute with rooted trees. I don’t think time-trees will make it into our first paper, as we are exploring many ideas that couldn’t fit into the talk (slide 29), but I’m very interested in comparing both.


I like the party time list on slide 29. The area that we would be especially interested in is using these ideas to build better tree proposal moves for time-trees in BEAST2.


Looks great @cwhidden and @ematsen!

Do you have any metrics for which edges in the SPR graph are the most heavily traversed? You might get at this by looking at the frequency of accepted MCMC proposals for T_i -> T_j. (Perhaps you already recorded this in your commute time statistics.) Some SPR paths must allow significantly faster passage between modes than a path chosen at random. It’d be interesting to see what the major thoroughfares between treeburbs are.
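To make the T_i -> T_j idea concrete, here is a minimal sketch of tallying transitions between consecutive sampled topologies. It assumes each sample has already been reduced to a hashable topology label; the labels and the `transition_counts` helper are hypothetical, not anything from the slides. If the chain is thinned, consecutive samples may be several SPR moves apart, so these counts would only approximate single-proposal traffic.

```python
from collections import Counter

def transition_counts(topologies):
    """Count moves between consecutive sampled topologies.

    `topologies` is a sequence of hashable topology labels, one per
    MCMC sample. Self-transitions (the chain staying put) are ignored.
    """
    counts = Counter()
    for src, dst in zip(topologies, topologies[1:]):
        if src != dst:
            counts[(src, dst)] += 1
    return counts

# Hypothetical chain of topology labels, for illustration only.
chain = ["T1", "T1", "T2", "T1", "T2", "T3", "T3", "T2"]
counts = transition_counts(chain)

# List directed edges from most to least heavily traversed.
for (src, dst), n in counts.most_common():
    print(f"{src} -> {dst}: {n}")
```

The directed counts could then be used as edge weights when drawing the graph.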


Developing better tree proposal moves is one of our goals as well. We’re starting to look at examples like slide 26, although I don’t yet see an easy way to move between those two trees. This motivated the commute time approach (as you suggested in your 2012 paper) and we are trying to see which trees are difficult to reach and how the MCMC moves between them.

I’m currently running some long golden runs with MrBayes on a different microbial dataset. I think I’ll have to fire up BEAST when those finish.


@mlandis, yes we have been looking at the frequency of traffic between trees as well and have tried visualizing this by making the edges thicker in proportion to the number of transitions. This also couldn’t fit into the talk and is very much a work in progress.

One issue with directly looking at tree transitions is that sampling density affects the quality of tree transition statistics, but some of these tree islands require a large number of iterations to reach (and the chain then gets stuck in them for a large number of iterations). I was filling up all of @ematsen’s hard drive space before I switched to sampling every 100 or 1000 iterations. Commute time statistics are an alternative way of looking at this.
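As a rough illustration of what a commute time statistic could look like on a thinned chain, here is a minimal sketch. It takes the commute time between trees a and b to be the number of samples needed to go from a to b and back to a, and estimates it from completed round trips in the sampled sequence. The labels and the `commute_times` helper are hypothetical, and this is not necessarily how the statistics in the talk are computed.

```python
def commute_times(chain, a, b):
    """Return lengths (in samples) of completed a -> b -> a round trips.

    `chain` is a sequence of hashable topology labels; `a` and `b`
    must be distinct. A trip starts at the first visit to `a`, must
    reach `b`, and ends at the next visit to `a`.
    """
    trips = []
    start = None        # index where the current trip left a
    reached_b = False   # whether b has been visited on this trip
    for i, tree in enumerate(chain):
        if start is None:
            if tree == a:
                start = i
        elif tree == b:
            reached_b = True
        elif tree == a and reached_b:
            trips.append(i - start)
            start = i            # the next trip starts here
            reached_b = False
    return trips

# Hypothetical chain of topology labels, for illustration only.
chain = ["A", "C", "B", "C", "A", "A", "B", "A"]
trips = commute_times(chain, "A", "B")
mean_commute = sum(trips) / len(trips) if trips else float("nan")
print(trips, mean_commute)
```

Because this only needs to know when the chain visits a and b, it tolerates thinning better than per-move transition counts, although a very sparse sample can still miss short visits.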

Also, that last comment was addressed to @alexei_drummond, in case it wasn’t clear from the context.


Interesting talk! Are you doing NNI moves or SPR moves for your exploration? The text says SPR, but the image used on the slides is of the 5-leaf NNI-space. --Katherine


Urk! We are using SPR moves. I’ll have to make it clear that we are just using this graph as an example of the sort of graph that SPR moves will create.

Thanks, everyone, for your great feedback. I’ll have to tune the presentation up a bit before presenting to a phylogenetics audience! The upcoming audience is a bunch of folks who work on defense-related science projects.