Speeding up Felsenstein's pruning algorithm


#1

Dear All,

A simple trick to speedup the pruning algorithm to compute tree likelihood exploiting the time-reversible property of Markov models is described in the supplementary material of this paper in MBE: https://doi.org/10.1093/molbev/msx281. This trick was implemented in RAxML and IQ-TREE. The gist is to speed up branch length estimation, which dominates the runtime. The runtime reduction is 4, 20 and 61 times for DNA, protein and codon models, respectively.

Regards, Minh


#2

So this is a speedup of the pruning algorithm itself, not a speedup of the bootstrap method? It would greatly speed up the computation of the likelihood of a single tree when there was no bootstrapping occurring?


#3

As I understand it is a speed-up of the edge length optimisation, e.g. inside the Newton-Raphson, not of the pruning algorithm. By the way I do the same inside phangorn.

Regards, Klaus


#4

A quick reading of the manuscript suggests that the speedup is by using trees encountered during the search for the ML tree on the full data, saving them, and evaluating their likelihoods with the bootstrapped data. Since the likelihoods for individual sites would have been computed for those trees when first encountered, their likelihood on the bootstrapped data would be quick to compute. If my reading is right, this is a good speedup, but most emphatically not “A simple trick to speedup the pruning algorithm to compute tree likelihood exploiting the time-reversible property of Markov models”. Unless I am misunderstanding what is proposed.


#5

I retract what I said. I read the Supplementary materials for his paper (they are accessible even if you can’t get the body of the paper). A very nice spectral-decomposition method that speeds up likelihood computations by a factor of the number of states in the character (4 for nucleotide sequences, 20 for amino acids, 61 for codon models). The coefficients of the spectral decomposition are then what is pruned. Cool!


#6

Dear Joe, Yes that’s exactly whole point under this trick. It looks simple once you write all the equations down and has a substantial speedup for state-rich models like protein or codon. I’m just surprised that this approach was never documented anywhere. Minh


#7

Yes that’s right. However, the pruning algorithm in turn needs the branch lengths to have the right ML scores. So in the end it boosts the pruning algorithm.

Minh