[Paper] Phycas: Software for Bayesian Phylogenetic Analysis

I had no idea that this was coming down the pipe, but was excited to see a paper come out describing Phycas (http://www.phycas.org) from its authors, including @mtholder. Code is at https://github.com/plewis/phycas.

I haven’t played with Phycas, but I can describe what I think of as its “killer feature,” which is the ability to use a prior incorporating polytomous trees. That means that the software can return trees that look like this:

where I’ve put an arrow pointing at the polytomy. In this case, the polytomy shows three descendants of a given lineage.
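For readers who think in Newick strings, here is a minimal sketch of what a polytomy looks like as data. It is not Phycas code: `parse_newick` and `polytomies` are hypothetical helpers written for this illustration, and the parser assumes a rooted tree with no branch lengths, no internal node labels, and no quoted names.

```python
def parse_newick(s):
    """Parse a simple Newick string into nested lists; leaves are strings.

    Assumption: no branch lengths, internal labels, or quoted names.
    """
    s = s.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
            return children
        # Otherwise read a leaf label up to the next delimiter.
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return parse_node()


def polytomies(node):
    """Return every internal node (as its child list) with more than two children."""
    if isinstance(node, str):
        return []
    found = [node] if len(node) > 2 else []
    for child in node:
        found.extend(polytomies(child))
    return found


# The clade (C,D,E) has three descendants, i.e. it is a polytomy.
tree = parse_newick("((A,B),(C,D,E));")
print(polytomies(tree))
```

A fully resolved tree such as `((A,B),(C,D));` gives an empty list under the same rooted convention.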

I would argue that such a representation is a more honest one (a “shrunken” estimate for @nicolas_lartill and friends). That is, if there is no information to resolve an internal node, then an unresolved tree is returned. One often sees such nodes when people collapse nodes in ML trees that have low bootstrap support. However, I think that it’s better than that because it’s rolled into the actual inference, meaning that the overall likelihoods are properly estimated. In a field where we are already in statistically tenuous territory, with the number of continuous parameters being on the order of the number of independent data points (to say nothing of the discrete estimation part), having fewer parameters when appropriate is refreshing.
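To make the contrast concrete, this is roughly what the post hoc practice of collapsing low-support nodes amounts to. It is only a sketch of that after-the-fact step, not of what Phycas does during inference. The `Node` class, the `collapse` function, and the threshold of 70 are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # bootstrap support for the branch above this node
    children: List["Node"] = field(default_factory=list)


def collapse(node, threshold):
    """Splice the children of low-support internal nodes into their parent."""
    new_children = []
    for child in node.children:
        collapse(child, threshold)  # handle the subtree first (post-order)
        is_internal = bool(child.children)
        if is_internal and child.support is not None and child.support < threshold:
            new_children.extend(child.children)  # drop the weak node, keep its children
        else:
            new_children.append(child)
    node.children = new_children
    return node


# ((A,B)95,(C,(D,E)40)90): the (D,E) grouping has only 40% support.
de = Node(support=40, children=[Node("D"), Node("E")])
cde = Node(support=90, children=[Node("C"), de])
ab = Node(support=95, children=[Node("A"), Node("B")])
root = collapse(Node(children=[ab, cde]), threshold=70)

print([c.name for c in cde.children])
```

After collapsing, the clade that contained C and (D,E) has C, D, and E as direct children. The likelihood was still computed on the fully resolved tree, which is exactly the shortcoming described above.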

In the Bayesian phylogenetics world, not allowing such multifurcations can cause some trouble, as was the subject of some research in the mid-2000s. A good culmination of that work is this paper by Z Yang:

The point of this work is that if the data truly doesn’t have any signal concerning an unresolved node, and we use a Bayesian phylogenetic inference package that doesn’t allow multifurcations (every one except for Phycas as far as I know), the data set may (with non-vanishing probability) give very high confidence that one of the resolutions is correct. This is shown in this figure from Yang’s paper:

See the deep red in the corners? That’s showing that in replicate data sets there is a substantial probability that one of the resolutions will be very highly supported.

So, I can’t comment on Phycas’ usability, runtimes, etc., and Conditional Predictive Ordinates sound nice, but for me, proper support for polytomies is the headlining feature of Phycas. If anyone tries it out, please post here!

I concur regarding polytomies, but I’m coming at this from a morphological/paleontological point of view (well that’s my POV always, unless I’m talking to paleontologists…).

Because a morphotaxon could persist through multiple branching events with little morphological change (i.e. an ancestor having many descendants), we would expect morphological phylogenetics to be replete with such ‘intrinsically unresolvable’ polytomies, and the situation only gets worse as we add fossils (or have a better sampled fossil record).

Although I wasn’t the first one to notice this effect, I wrote up some simulations regarding the impact of this phenomenon here:

Very cool. I was interested in looking at polytomies after reading this paper by Richard Neher and Oskar Hallatschek:

In it they show that rapidly adapting populations will have an unusual coalescent process in which there are frequent “true” polytomies where one individual in the population spawns multiple progeny that have descendants in the present-day sample, like so:

They refer to these events as “multiple mergers”.

Huh, interesting! In a sense, trvb, it’s the same ultimate dynamic as what I mentioned with regard to morphology: if branching that produces sampled descendants occurs more quickly than characters are changing, then ‘true’ polytomies are inevitable.