Several burning questions



It is unclear if this forum is frequented as much as it used to be. Let’s see if this post goes off echoing into the abyss. Is anyone out there knowledgeable and / or working on the following topics?

  1. Accounting for co-evolution in phylogenetic models. It is my (naive) understanding that phylogenetic models are built on the core assumption of mutational independence. Recent work built on the assumption that some mutations are co-dependent (co-evolution) has resulted in significant progress in protein structure modeling from sequences alone ( Said differently, how is co-evolution (within a protein family, between protein families, between nucleotides / amino acids / codons / etc) being taken into consideration in phylogenetic models nowadays?

  2. VAE hyperbolic Poincaré embeddings via VAE for phylogeny. Surely, some of you in this crowd encountered this recent paper from DeepMind ( I attended NeurIPs recently and was informed that as enticing as Figure 1 is, it is apparently only applicable to phylogenetic tree construction if the ancestral nodes are part of the dataset as well… Anybody else thinking about this!?

  3. "Ground truths" for phylogenetic trees. As a newcomer to this field - please forgive me if this is already an obvious answer… - but it is unclear if there exist empirical benchmarks by which phylogenetic tree models can be compared against. I imagine there are sub-topic and / or niche “positive controls” for phylogenetic tree construction but unlike the protein world, which has a biannual protein structure prediction competition (CASP), a field-wide benchmark / metric / competition for phylogeny seems to be absent…

Dylan (email:


Hi Dylan. Interesting questions. I won’t claim to be an expert on these topics (relative to the other people in this forum), but I can give a few thoughts for the sake of discussion:

  1. I’m not aware of co-evolution being built into existing mainstream phylogenetic algorithms. It would be cool to be able to test that hypothesis with phylogenetic models. For bacteria, it might be tricky to distinguish co-evolution from horizontal gene transfer (assuming the sites are close together on the genome).
  2. Not familiar with it.
  3. Simulations are often used to test tree models, but I’m not aware of any empirical benchmark. I can think of two big differences between phylogenetic algorithms and protein folding algorithms.

a) Protein folding algorithms were pretty bad when these benchmarks were established. Phylogenetic algorithms are pretty good. I think an empirical dataset would be best if it illustrated a specific, important problems for which there currently is no good solution.

b) Protein folding algorithms are predictive, while phylogenetic algorithms are usually historical inferences. I can’t think of many scenarios where we’d have good ground truths, aside from bacterial long term evolution experiments (where we could sequenced several isolates from the population at several time points, then see how well an algorithm could recreate the properties of that population based on isolates from the final timepoint) – but I don’t know if this would actually be challenging.


I agree with @adamr on the points he addressed.

Regarding point 2, my impression was that the Poincare VAE does not reconstruct a phylogenetic tree. It is a VAE that does a good job of respecting hierarchical structure when it exists. Note that in figures like this one:


the blue is the true underlying tree.

If you had ancestral sequences then I guess you could join together sequences in the latent space, but I don’t see this being more accurate than just joining sequences by sequence similarity.


Hi Adam, thanks for the response.

  1. Co-evolution: Interesting! My understanding is that co-variation between sequences can confound the co-variation that we currently assume to be co-evolution. At this point, some colleagues and I believe we could test our ability to disentangle these two phenomena by seeing how well phylogeny-resolved co-evolution models improves our protein structure prediction capacity as compared against phylogeny-unresolved co-evolution models. TBD, maybe you’ll hear about it sometime soon :slight_smile:. On the flip side, co-evolution-resolved phylogeny models could also be something we could show… Though it is unclear what case studies (molecular clock proteins???) would be good to try this on. If you have an idea - or anyone else out there - please reach out! :smiley:

  2. Empirical benchmarking phylogeny models. While I am more comfortable in domains where there exist explicit pos/neg controls, I guess phylogeny is a branch of science where the truth is more elusive… Forgive me if this is extra pedantic, but you state that “phylogenetic algorithms are pretty good”. How do we know this? Whereas with protein structure determination, where our predictions can be evaluated against a physical structure, it is unclear how we can evaluate a phylogeny model improvement. I guess that was my overarching question. How do we know our phylogeny models are good models?


Hi, yes, that makes sense. I think what makes this work especially exciting to think about is that VAEs can also be thought of as generative models. To me, the ultimate generative model is evolution itself (quoting Dana Pe’er at NeurIPS 2019 here). Unlike some of the current generative model applications, e.g: celebrity faces, creating such a generative model with appropriate biological constraints is quite philosophically appealing. Showing however, that such a model models evolution well, would be challenging.

In general, these latent embeddings / manifold approaches (UMAP, for instance) come across as promising alternative approaches for constructing phylogenetic relationships. Again, I am somewhat of a newcomer to the field so forgive me if this all sounds like wild speculation…