This is an archived static version of the original discussion site.

Tree-based measure of species longevity?


Hi all,

does anyone know of a reasonable way to come up with estimates of species longevity within clades? By which I mean, how long a species / lineage sticks around before it either goes extinct or speciates?

The general idea is that one might have a (near?) complete, dated phylogeny with a bunch of genera in it and that the method I’m after would get me an estimate for average species longevity for each genus.

I guess you could say, crudely, that average species longevity is more or less proportional to average interior branch length (if we pretend that a species is an internode) but when extinction rates are high we’d overestimate, especially for the older internodes.
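
A minimal sketch of that crude internode idea, assuming a toy dated tree written as nested tuples (the tree, its branch lengths, and the helper name `interior_branch_lengths` are all hypothetical, made up for illustration). It averages only the branches that end in a speciation node. Terminal branches are left out because they are cut off by the present rather than by an event.

```python
from statistics import mean

# Hypothetical toy tree: each node is (branch_length_to_parent, [children]).
# Tips have an empty child list. Branch lengths are in millions of years.
toy_tree = (0.0, [
    (2.0, [
        (1.0, [(3.0, []), (3.0, [])]),
        (4.0, []),
    ]),
    (6.0, []),
])

def interior_branch_lengths(node, is_root=True):
    """Collect lengths of branches that end in a speciation node (internodes)."""
    length, children = node
    lengths = []
    # An internode has descendants and is not the root stem.
    if children and not is_root:
        lengths.append(length)
    for child in children:
        lengths.extend(interior_branch_lengths(child, is_root=False))
    return lengths

internode_lengths = interior_branch_lengths(toy_tree)
mean_internode = mean(internode_lengths)
print(internode_lengths, mean_internode)
```

This only gives a number proportional to longevity under the "species = internode" assumption. It does nothing about the extinction bias mentioned above.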

Isn’t there some clever R package for this?



p.s. apologies for the ill thought-out question. Equally ill thought-out answers and opinions are very welcome.


A lot of this has been done in virology, in order to measure aspects of the epidemiology over time, for viruses. How a “species” of virus is defined is very fuzzy, but likewise it is also a lot more fuzzy than most people realize even for mammals and other vertebrates. But anyway, the big problem with all of this is that the methods require good sampling over time, and the actual sampling is usually not good, in terms of such things as random sampling and “well mixed populations” from which to sample, and so on.

With most organisms there is also the problem that gene trees do not equate to species trees, and if you use many genes for the species tree, recombination, introgression and other problems soon make you wonder what you are measuring. For one example, if we take a very solidly worked out phylogeny such as Human-Chimpanzee-Gorilla, and we include Neanderthal genomes, and one Denisova man genome, plus two species of Chimpanzee (Pan troglodytes and Pan paniscus), how do we decide how many lineages of Chimpanzees have gone extinct the way Neanderthal and Denisova man did? Humans are interested in humans, so we worked very hard to get the Neanderthal and Denisova genomes; we did not work so hard to find extinct Chimpanzee or Gorilla lineages.


Hi Brian,

thanks for the response and for thinking along!

How a “species” of virus is defined is very fuzzy, but likewise it is also a lot more fuzzy than most people realize even for mammals and other vertebrates.

Absolutely. Thinking this through you pretty quickly end up having to consider how speciation might have happened (e.g. by budding off versus by a split where both descendants diverge).

Humans are interested in humans, so we worked very hard to get the Neanderthal and Denisova genomes, we did not work so hard to find extinct Chimpanzee or Gorilla lineages.

Indeed. And we’ve worked even less hard to find species that aren’t so much like us.

Returning to the crude idea of looking at internodes, the problem you’re pointing out is essentially that there are biases in taxon sampling, right? With higher sampling you’re going to split up the internodes more and therefore end up with lower estimates of “species longevity” (whatever that really is).
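
To make the sampling effect concrete, here is a small sketch that assumes a hypothetical four-tip toy tree stored as nested tuples (tip names, branch lengths, and the helpers `prune` and `internodes` are invented for illustration). Dropping one tip merges two internodes into one longer internode, so the sparser tree yields a higher "longevity" estimate.

```python
from statistics import mean

# Hypothetical toy tree: each node is (name, branch_length_to_parent, [children]).
full_tree = ("root", 0.0, [
    ("A", 2.0, [
        ("B", 1.0, [("t1", 3.0, []), ("t2", 3.0, [])]),
        ("t3", 4.0, []),
    ]),
    ("t4", 6.0, []),
])

def prune(node, keep):
    """Return a copy of the tree restricted to the tip names in `keep`."""
    name, length, children = node
    if not children:
        return node if name in keep else None
    kept = [c for c in (prune(ch, keep) for ch in children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # A node left with one child is no longer a visible split:
        # merge its branch into the child's branch.
        child_name, child_length, grandchildren = kept[0]
        return (child_name, length + child_length, grandchildren)
    return (name, length, kept)

def internodes(node, is_root=True):
    """Lengths of branches that end in a speciation node, excluding the root."""
    _, length, children = node
    out = [length] if children and not is_root else []
    for child in children:
        out += internodes(child, is_root=False)
    return out

sparse_tree = prune(full_tree, keep={"t1", "t2", "t4"})  # t3 goes unsampled
print(mean(internodes(full_tree)), mean(internodes(sparse_tree)))
```

On this toy tree the mean internode goes from 1.5 with all four tips to 3.0 once t3 is dropped.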



Yes, it is the bias in taxon sampling, more than the density of overall sampling, that throws things off. I was thinking this morning about studies of the amphibians, which have been very extensively sampled with a high motivation for identifying new species. The explosion in the number of different species pretty much coincides with the flowering plants and insects if I recall right, and this makes sense if most salamanders and frogs eat insects. The amphibians predated the dinosaurs and I suspect many went extinct at the KT boundary event, not because of lack of “fitness” but just bad luck to have lived in environments most devastated at that time.

With mammals, speciation of squirrels and mice is more frequent than speciation of wolves and horses because the small animals don’t have as much motivation to travel hundreds of miles and swim rivers and all. Birds travel a lot but tend to get picky about mate choices with songs and fancy feathers and all, so different lineages of birds have different reasons for speciating (causes of speciation) than the mammals do, on average.

The DNA/protein sequence analysis methods used in recent years to study this type of thing are called “coalescent analyses”. A good paper on amphibians is “Global patterns of diversification in the history of modern amphibians” by Kim Roelants et al.

I get the impression that a lot of molecular biologists, and even more computer scientists who do bioinformatics, think that because they can “see” patterns of evolutionary history in the genes, the genes are directly involved in the “fitness” of the organisms. But usually the real story is about habitats, climates, catastrophes and other things and not the new loop in the 16S rRNA or whatever we see in the gene we have picked to look at.


So, I’m going to come at this from the paleo-end (because I’m a paleontologist), and I think the what-about-budding issue is particularly critical. With really great fossil records, we estimate species duration just by, well, measuring morpho-species duration in the fossil record. Like this recent piece, on my favorite group, which are far too extinct for us to have any species concept other than the morphological:

Or this piece, about durations of fossil mammals:

But I’m really wary about reconstructing species duration down-tree, at least with current methods. For what it’s worth, Wagner and Erwin (1995) tried estimating how much of the morphological ‘speciation’ in the fossil record is due to budding, bifurcation, etc., and found mainly support for budding.

And this whole budding versus bifurcation thing, it’s a big headache, too. One issue I became involved in is that if you have morphologically static ancestors with descendant species via budding, you should expect your morphological cladograms to be full of polytomies:

Plus, if you have some morphologically unchanging taxon, how do you code that for phylogenetic analysis, like tip-dating with fossils? Do you treat every single find of some morpho-species that has ‘persisted’ since the Paleozoic (and there are some! plays horror music) as independent OTUs with the same morphological characters? Sounds like a nice way to break the Markov model badly.

For what it’s worth, I’ve written some simulation code for dealing with this mess, where species are treated as these persistent units with possible buddings, or branchings or ‘anagenesis’ between them (because it turns out the inferences we’d make about the fossil record might be pretty different if we think a different pattern of differentiation is involved). You can find it in function simFossilRecord, in R package paleotree. Perhaps simulations of such patterns might be useful to your situation?
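
As a rough illustration of why the assumed mode of speciation matters for longevity, here is a standalone Python sketch. It is not paleotree's simFossilRecord and does not reproduce its interface. The function name `mean_species_duration`, the rates, and the two mode labels are assumptions made up for this example. Each species faces a constant speciation rate `lam` and extinction rate `mu`. Under budding the parent survives a speciation event, while under bifurcation the parent ends and is replaced by two daughters.

```python
import random

def mean_species_duration(lam, mu, mode, n_species=20000, seed=1):
    """Average lifetime of a species under constant rates (toy model).

    mode="budding":     speciation leaves the parent species alive.
    mode="bifurcation": speciation ends the parent species.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_species):
        age = 0.0
        while True:
            # Waiting time to the next event (speciation or extinction).
            age += rng.expovariate(lam + mu)
            if rng.random() < mu / (lam + mu):
                break  # extinction ends the species in both modes
            if mode == "bifurcation":
                break  # parent is replaced by its two daughters
            # budding: the parent persists, so keep waiting
        total += age
    return total / n_species

lam, mu = 0.2, 0.1  # hypothetical rates per lineage per million years
budding = mean_species_duration(lam, mu, "budding")
bifurcation = mean_species_duration(lam, mu, "bifurcation")
print(budding, bifurcation)
```

With these rates the expected durations are 1/mu = 10 under budding and 1/(lam + mu), about 3.3, under bifurcation, so the same rates imply roughly a threefold difference in species longevity depending on which mode is assumed.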


Excellent responses, thanks so much! This will take me a while to digest. I had come across some of the fossil papers already and they are definitely something I also want to look at, if anything at least as a way to validate tree-based longevity estimates. Thanks!