This is an archived static version of the original phylobabble.org discussion site.

Interesting phylogenetic search heuristics

ematsen

I thought it would be fun to have a discussion about interesting optimization heuristics for phylogenetic inference. Please post to contribute! Here’s one from the parsimony literature:

The Parsimony Ratchet (Nixon, Cladistics 1999).

1. Generate a starting tree (e.g., a “Wagner” tree followed by some level of branch swapping or not).
2. Randomly select a subset of characters, each of which is given additional weight (e.g., add 1 to the weight of each selected character).
3. Perform branch swapping (e.g., “branch-breaking” or TBR) on the current tree using the reweighted matrix, keeping only one (or a few) trees.
4. Set all weights for the characters to the “original” weights (typically, equal weights).
5. Perform branch swapping (e.g., branch-breaking or TBR) on the current tree (from step 3) keeping one (or a few) trees.
6. Return to step 2.

Steps 2–6 are considered to be one iteration, and typically, 50–200 or more iterations are performed. The number of characters to be sampled for reweighting in step 2 is determined by the user; I have found that between 5 and 25% of the characters provide good results in most cases.

In this context, a “weight” is a per-column multiplier of the parsimony score used in the grand parsimony total.
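For anyone who wants to play with the idea, here is a minimal Python sketch of the ratchet loop described above. It rests on several assumptions that are not in Nixon's description: trees are rooted nested tuples, columns are scored with Fitch's algorithm, and nearest-neighbor interchange (NNI) stands in for branch-breaking/TBR swapping. The function names and the toy alignment are made up for illustration, and only a single tree is kept at each step.

```python
import random

def column_changes(tree, seqs, col):
    """Fitch pass for one column: return (state set, number of changes)."""
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {seqs[tree][col]}, 0
    left_states, left_n = column_changes(tree[0], seqs, col)
    right_states, right_n = column_changes(tree[1], seqs, col)
    shared = left_states & right_states
    if shared:
        return shared, left_n + right_n
    return left_states | right_states, left_n + right_n + 1

def score(tree, seqs, weights):
    """Weighted parsimony: per-column change count times column weight."""
    return sum(w * column_changes(tree, seqs, col)[1]
               for col, w in enumerate(weights))

def nni_neighbors(tree):
    """Yield every tree one rooted NNI move away (assumed stand-in for TBR)."""
    if isinstance(tree, str):
        return
    left, right = tree
    if not isinstance(left, str):   # swap a grandchild with the right child
        a, b = left
        yield ((a, right), b)
        yield ((b, right), a)
    if not isinstance(right, str):  # swap a grandchild with the left child
        c, d = right
        yield ((left, c), d)
        yield ((left, d), c)
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)

def hill_climb(tree, seqs, weights):
    """Accept improving NNI moves until no neighbor scores lower."""
    current = score(tree, seqs, weights)
    improved = True
    while improved:
        improved = False
        for candidate in nni_neighbors(tree):
            s = score(candidate, seqs, weights)
            if s < current:
                tree, current, improved = candidate, s, True
                break
    return tree

def ratchet(start, seqs, iterations=50, fraction=0.2, rng=None):
    """Alternate swapping under perturbed and original weights."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    n_cols = len(next(iter(seqs.values())))
    original = [1] * n_cols
    k = max(1, round(fraction * n_cols))
    tree = hill_climb(start, seqs, original)           # step 1
    best, best_score = tree, score(tree, seqs, original)
    for _ in range(iterations):
        weights = list(original)
        for col in rng.sample(range(n_cols), k):       # step 2: upweight subset
            weights[col] += 1
        tree = hill_climb(tree, seqs, weights)         # step 3
        tree = hill_climb(tree, seqs, original)        # steps 4 and 5
        s = score(tree, seqs, original)
        if s < best_score:
            best, best_score = tree, s
    return best, best_score

# Toy alignment (invented data), one string of states per taxon.
seqs = {"A": "AAGGT", "B": "AAGGC", "C": "ACGTC",
        "D": "CCTTC", "E": "CCTTT", "F": "CATGT"}
start = ((("A", "D"), ("B", "E")), ("C", "F"))
best, best_score = ratchet(start, seqs)
print(best, best_score)
```

The perturbed search in step 3 is allowed to make the equal-weights score worse, which is the point: it moves the current tree off its local optimum, and the search under the original weights then settles wherever it lands. The best tree under the original weights is tracked separately so that nothing is lost along the way.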

I think this one is interesting, and I don’t know of anything like it being used in the likelihood literature. Does anyone else? Seems to me that the closest would be a love child between heated chains and the bootstrap.

josephwb

@rutgeraldo has his “Likelihood Ratchet”, but I don’t know where it might be implemented.

ematsen

Pretty neat!

ncbi.nlm.nih.gov

Accelerated likelihood surface exploration: the likelihood ratchet.

RA Vos, Systematic biology, Jun 2003

The existence of multiple likelihood maxima necessitates algorithms that explore a large part of the tree space. However, because of computational constraints, stepwise addition-based tree-searching methods do not allow for this exploration in reasonable time. Here, I present an algorithm that increases the speed at which the likelihood landscape can be explored. The iterative algorithm combines the computational speed of distance-based tree construction methods to arrive at approximations of the global optimum with the accuracy of optimality criterion based branch-swapping methods to improve on the result of the starting tree. The algorithm moves between local optima by iteratively perturbing the tree landscape through a process of reweighting randomly drawn samples of the underlying sequence data set. Tests on simulated and real data sets demonstrated that the optimal solution obtained using stepwise addition-based heuristic searches was found faster using the algorithm presented here. Tests on a previously published data set that established the presence of tree islands under maximum likelihood demonstrated that the algorithm identifies the same tree islands in a shorter amount of time than that needed using stepwise addition. The algorithm can be readily applied using standard software for phylogenetic inference.

ematsen

Here’s a new one from @Alexis_RAxML & co:

ncbi.nlm.nih.gov

An Efficient Independence Sampler for Updating Branches in Bayesian Markov chain Monte Carlo Sampling of Phylogenetic Trees.

AJ Aberer, A Stamatakis and F Ronquist, Systematic biology, Jan 2016

Sampling tree space is the most challenging aspect of Bayesian phylogenetic inference. The sheer number of alternative topologies is problematic by itself. In addition, the complex dependency between branch lengths and topology increases the difficulty of moving efficiently among topologies. Current tree proposals are fast but sample new trees using primitive transformations or re-mappings of old branch lengths. This reduces acceptance rates and presumably slows down convergence and mixing. Here, we explore branch proposals that do not rely on old branch lengths but instead are based on approximations of the conditional posterior. Using a diverse set of empirical data sets, we show that most conditional branch posteriors can be accurately approximated via a [Formula: see text] distribution. We empirically determine the relationship between the logarithmic conditional posterior density, its derivatives, and the characteristics of the branch posterior. We use these relationships to derive an independence sampler for proposing branches with an acceptance ratio of ~90% on most data sets. This proposal samples branches between 2× and 3× more efficiently than traditional proposals with respect to the effective sample size per unit of runtime. We also compare the performance of standard topology proposals with hybrid proposals that use the new independence sampler to update those branches that are most affected by the topological change. Our results show that hybrid proposals can sometimes noticeably decrease the number of generations necessary for topological convergence. Inconsistent performance gains indicate that branch updates are not the limiting factor in improving topological convergence for the currently employed set of proposals. However, our independence sampler might be essential for the construction of novel tree proposals that apply more radical topology changes.

It’s related to some ideas I presented in a talk at Evolution 2013 and which we’re finally getting around to writing up.
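
To make the independence-sampler idea from the abstract concrete, here is a rough Python sketch of a Metropolis-Hastings update in which the proposed branch length is drawn without reference to the current one. Everything specific in it is an assumption: the proposal family is elided in this archive ("[Formula: see text]"), so the Gamma proposal below is only an illustrative choice, and `log_target` is a hypothetical stand-in for a conditional branch-length posterior, not anything from the paper.

```python
import math
import random

def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def independence_sampler(log_target, shape, scale, x0, n_steps, rng):
    """Metropolis-Hastings where proposals ignore the current state."""
    x, samples, accepted = x0, [], 0
    for _ in range(n_steps):
        proposal = rng.gammavariate(shape, scale)  # independent of x
        # Hastings ratio: [target(x') * q(x)] / [target(x) * q(x')]
        log_ratio = (log_target(proposal) + gamma_logpdf(x, shape, scale)
                     - log_target(x) - gamma_logpdf(proposal, shape, scale))
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_ratio:
            x, accepted = proposal, accepted + 1
        samples.append(x)
    return samples, accepted / n_steps

def log_target(b):
    # Hypothetical unnormalized log posterior for one branch length:
    # proportional to a Gamma(shape=3, scale=0.05) density, mean 0.15.
    return 2.0 * math.log(b) - b / 0.05

rng = random.Random(1)
samples, acceptance = independence_sampler(
    log_target, shape=2.5, scale=0.06, x0=0.1, n_steps=5000, rng=rng)
mean = sum(samples) / len(samples)
print(f"acceptance rate {acceptance:.2f}, mean branch length {mean:.3f}")
```

Because the proposal does not depend on the current value, the acceptance rate is governed entirely by how closely the proposal density matches the target. In this toy the two are deliberately close, so most proposals are accepted.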