Number of bootstrap replicates for short alignments


#1

Hi, I have been struggling for days with a phylogenetic tree of around 90 short (domain-based) sequences in which the support values for many branches are really low (<30 in many cases). I have noticed that the aLRT supports are good and do not correlate with the bootstrap results.

I was wondering whether the number of replicates (100 in my case) could have an effect in short alignments (~250 sites), or whether there is an alternative way to measure the support of specific branches in very unstable alignments. Note that even small variations in the software used to align the sequences lead to different topologies.

Any suggestion on how to address this type of problem?

thanks!

PS: I am using phyml for the tree (4 rate categories) and mafft-linsi for the alignment.


#2

My gut reaction is that the splits with inconsistent results across software versions and low bootstrap support cannot be reliable, even if they have high aLRT support. If you want to investigate further, I suggest simulating alignments using your estimated trees (and reintroducing any gaps) and seeing how the support values perform.
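The "reintroducing any gaps" step can be sketched as below. This is only a minimal illustration, assuming the simulated alignment has the same sequence names and the same number of columns as the original; the function name `reintroduce_gaps` and the toy sequences are hypothetical, and the simulation itself would be done with whatever sequence simulator you prefer.

```python
def reintroduce_gaps(original, simulated, gap_chars="-?"):
    """Copy the gap pattern of the original alignment onto a simulated one.

    Both arguments are dicts mapping sequence name -> aligned sequence string.
    Assumption: both alignments have the same names and the same length.
    """
    masked = {}
    for name, sim_seq in simulated.items():
        orig_seq = original[name]
        if len(orig_seq) != len(sim_seq):
            raise ValueError(f"length mismatch for {name}")
        # Keep the original gap character wherever the real data had a gap,
        # otherwise keep the simulated residue.
        masked[name] = "".join(
            o if o in gap_chars else s for o, s in zip(orig_seq, sim_seq)
        )
    return masked


# Toy example (made-up sequences, for illustration only)
original = {"seqA": "MK-LV", "seqB": "M--LI"}
simulated = {"seqA": "MRTLV", "seqB": "MKSLI"}
masked = reintroduce_gaps(original, simulated)
print(masked)
```

Running the same tree inference and support calculation on the masked simulated alignments then shows how often each support measure recovers the splits of the tree you simulated from.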


#3

We designed an empirical method to assess/decide how many BS replicates are necessary a couple of years ago and implemented it in RAxML: http://link.springer.com/chapter/10.1007/978-3-642-02008-7_13#page-1

Generally, for datasets like the one you describe, I’d expect at least 500 replicates to be required. However, that only gets you to the point where the support values converge to stable values, and those can still be low. I assume that you will simply have to accept the low values because there is not enough data. I’d also explore how many topologically distinct but statistically not significantly different ML trees you obtain on the original alignment.
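As a rough illustration of why more replicates stabilise a support value without raising it, here is a minimal sketch that treats each replicate as an independent yes/no trial (a binomial assumption). This is not the convergence criterion implemented in RAxML; the function name `bootstrap_support_se` is hypothetical.

```python
import math


def bootstrap_support_se(support, replicates):
    """Standard error of a support proportion estimated from `replicates` resamples.

    Assumption: each replicate independently recovers the branch with
    probability `support` (simple binomial model).
    """
    return math.sqrt(support * (1.0 - support) / replicates)


# Sampling noise around a branch with 30% support
for replicates in (100, 500, 1000):
    se = bootstrap_support_se(0.30, replicates)
    print(f"{replicates} replicates: 30% support, 1 SE = {100 * se:.1f} points")
```

Under this assumption, going from 100 to 500 replicates roughly halves the noise in the estimate, but the value it settles on is still determined by the data.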

Alexis


#4

[quote="Alexis_RAxML, post:3, topic:353"] explore how many topologically distinct but statistically not significantly different ML trees you obtain on the original alignment.[/quote]

@Alexis_RAxML, is there a way to explore those topologies in RAxML? As RAxML searches the tree space, it would be good if there were a way to ask it to save all the close-to-optimal topologies it encounters to a file. These could then be tested later using options that are already available in RAxML.


#5

@smirarab, @Alexis_RAxML gave me some useful hints about this in the raxml mailing list: https://groups.google.com/forum/#!topic/raxml/Rd-WyAbVKxw


#6

This sounds like a problem that would also be well served by a Bayesian approach. In particular, the dataset is small enough to run the neat new Bayesian approach to partitioning in BEAST2.

Regardless of the models used, a Bayesian approach would provide useful information on the posterior distribution of topologies.


#7

@rob_lanfear, thanks. I already used a Bayesian approach (MrBayes), and the posterior probabilities are indeed much higher than the bootstrap support for the same branches. I will definitely take a look at the new BEAST2!


#8

@jhcepas: beware of comparing posterior probabilities and bootstrap support. The former are generally higher, because the two measure different things. So the fact that the posterior probabilities are higher probably doesn’t tell you whether the other methods are working well or not. A rough rule of thumb is that you want posterior probabilities to be >95% before making any strong interpretation of a node (others may disagree, and it’s always better to compare trees in some formal way rather than just interpret support values of any kind on a single ‘best’ tree).

Rob