This is an archived static version of the original discussion site.

DNA or protein sequence identity vs true phylogenetic distance


One quick and easy way to visualize the DNA distances in a data set is the DAMBE function, under Graphics, that plots transitions and transversions against F84 phylogenetic distance for each pairwise comparison in the data set. I have attached an example here, showing the plots for Primate T-Cell Leukemia Viruses and Primate Lentiviruses. We know that the HIV-1 M group, with F84 distances of less than 0.15 here, represents roughly 100 years of evolution. But when HIV-1 is compared to SIV from African Green Monkey (distances > 0.5), the estimated age of the common ancestor is in the range of millions of years.
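For readers without DAMBE at hand, the kind of table behind such a plot can be sketched in a few lines of Python. This is only an illustration, not DAMBE's implementation: it uses the simpler Kimura two-parameter (K80) distance as a stand-in for F84, and the function names and the toy alignment are made up for the example.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_proportions(seq_a, seq_b):
    """Return (P, Q): proportions of sites that differ by a transition
    and by a transversion, over the sites where both bases are A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


def k80_distance(p, q):
    """Kimura two-parameter distance (a stand-in for F84, not the same model)."""
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0 or b <= 0:
        return math.inf  # too saturated for the correction to be defined
    return -0.5 * math.log(a) - 0.25 * math.log(b)


def pairwise_table(alignment):
    """One row per pair: (name_a, name_b, P, Q, K80 distance)."""
    rows = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        p, q = ts_tv_proportions(seq_a, seq_b)
        rows.append((name_a, name_b, p, q, k80_distance(p, q)))
    return rows


# Toy alignment, invented purely for illustration
toy = {
    "seq1": "ACGTACGTACGT",
    "seq2": "ACGTACGTACGC",
    "seq3": "GCGTACATTCGC",
}
for row in pairwise_table(toy):
    print(row)
```

Plotting the P and Q columns against the distance column, one point per pair, gives the same kind of picture: transitions that level off while transversions keep climbing are the usual sign of saturation.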

As far as I know, we have no method based on DNA or protein distances that could extrapolate from about 100 years at 15% (phylogenetically corrected) distance to more than 100,000 years at 50% (phylogenetically corrected) distance, let alone millions of years. Silent sites become more than saturated with mutations, while at the same time many other sites remain absolutely invariant over time. Thus, the “molecular clock” has to be calibrated in a time/distance range that is applicable to the data.
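To make the size of the gap concrete, here is the arithmetic for a naive strict clock, assuming (as the numbers above do) that a corrected distance of 0.15 corresponds to about 100 years and that distance grows linearly with time. The function name is hypothetical.

```python
def linear_clock_age(distance, calib_distance, calib_years):
    """Age implied by a strict linear clock calibrated at one point."""
    rate = calib_distance / calib_years  # distance units per year
    return distance / rate


# Calibration from the post: about 0.15 distance in about 100 years
age = linear_clock_age(0.50, calib_distance=0.15, calib_years=100)
print(f"naive age at distance 0.50: about {age:.0f} years")
```

A straight-line extrapolation lands in the hundreds of years, several orders of magnitude short of the 100,000 to millions of years indicated for the HIV-1/SIV comparison.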

The same plots can be made for mammals, vertebrates, etc. Between the most distant mammals, DNA distances are more than saturated with mutations in mitochondrial DNA, but not in nuclear genes. DNA distances in nuclear genes become saturated when comparing vertebrates (fish, amphibians, reptiles, birds, mammals, etc.), and at those distances the mitochondrial genomes are easy to align but quite misleading for “molecular clock” methods.
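What "saturated" means numerically can be shown with the simplest substitution model. The sketch below uses Jukes-Cantor (JC69), not F84, and assumes every site evolves at the same rate, which real data violate; it is only meant to show how the observed proportion of differing sites flattens out as the true distance grows.

```python
import math


def jc69_expected_p(d):
    """Expected proportion of differing sites at true distance d
    (substitutions per site) under Jukes-Cantor."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_corrected_distance(p):
    """Invert the formula above; undefined once p reaches 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = jc69_expected_p(d)
    print(f"true d = {d:4.1f}   expected observed p = {p:.3f}")
```

As the observed proportion approaches the 0.75 ceiling, a tiny change in p corresponds to a very large change in the corrected distance, so any clock calibrated on such data carries little real information about time.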