Andreas Wagner has a new paper on analyzing influenza sequence data with a super simple network approach based on Hamming distance.
Proceedings. Biological Sciences, July 2014
Networks of evolving genotypes can be constructed from the worldwide time-resolved genotyping of pathogens like influenza viruses. Such genotype networks are graphs where neighbouring vertices (viral strains) differ in a single nucleotide or amino acid. A rich trove of network analysis methods can help understand the evolutionary dynamics reflected in the structure of these networks. Here, I analyse a genotype network comprising hundreds of influenza A (H3N2) haemagglutinin genes. The network is rife with cycles that reflect non-random parallel or convergent (homoplastic) evolution. These cycles also show patterns of sequence change characteristic for strong and local evolutionary constraints, positive selection and mutation-limited evolution. Such cycles would not be visible on a phylogenetic tree, illustrating that genotype network analysis can complement phylogenetic analyses. The network also shows a distinct modular or community structure that reflects temporal more than spatial proximity of viral strains, where lowly connected bridge strains connect different modules. These and other organizational patterns illustrate that genotype networks can help us study evolution in action at an unprecedented level of resolution.
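To make the construction concrete, here is a minimal sketch of a genotype network as the abstract defines it: strains are vertices, and two strains are joined when their aligned sequences differ at exactly one site (Hamming distance 1). This is my own illustration, not Wagner's code; the function names and the toy alignment are hypothetical, and it assumes the sequences are already aligned to equal length.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def genotype_network(seqs: dict[str, str]) -> dict[str, set[str]]:
    """Build adjacency sets: two strains are neighbours iff they differ at one site."""
    adj = {name: set() for name in seqs}
    # Compare every unordered pair once (quadratic, fine for hundreds of strains).
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        if hamming(s1, s2) == 1:
            adj[n1].add(n2)
            adj[n2].add(n1)
    return adj


# Hypothetical toy alignment, not real H3N2 haemagglutinin data.
toy = {"s1": "ACGT", "s2": "ACGA", "s3": "TCGA", "s4": "TCGT", "s5": "GGGG"}
net = genotype_network(toy)
for strain, neighbours in sorted(net.items()):
    print(strain, sorted(neighbours))
```

In this toy set, s1 through s4 form a closed loop (s1-s2-s3-s4-s1), while s5 differs from everything at three or more sites and stays isolated. Real networks built this way are sparse, since most pairs of strains differ at many sites.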
He ends up with plots like:
Fundamentally non-phylogenetic, this approach doesn’t try to reconstruct evolutionary history; instead it gives a simple overview of genetic relationships, linking any two sequences that differ at a single site. Andreas suggests that these graphs make it easy to detect convergent evolution that would not be apparent in a strictly branching tree.
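One way to see why cycles cannot show up on a tree: a tree (or forest) is by definition acyclic, so its cyclomatic number E − V + C (edges minus vertices plus connected components) is always 0. A genotype network can have a positive value. Below is a rough sketch, assuming an undirected graph with no duplicate edges or self-loops; the function name and the two-site genotypes are hypothetical illustrations, not taken from the paper. If two lineages pick up the same pair of mutations in opposite orders (AA→AT→TT and AA→TA→TT), the four genotypes close a loop.

```python
def independent_cycles(nodes, edges):
    """Cyclomatic number E - V + C of an undirected graph, via union-find.

    Assumes no duplicate edges and no self-loops. A forest always gives 0.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the component root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(parent)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components: merge them
            parent[ru] = rv
            components -= 1
    return len(edges) - len(parent) + components


# Hypothetical two-site genotypes; edges connect genotypes one mutation apart.
genotypes = ["AA", "AT", "TA", "TT"]
network_edges = [("AA", "AT"), ("AA", "TA"), ("AT", "TT"), ("TA", "TT")]
tree_edges = [("AA", "AT"), ("AA", "TA"), ("AT", "TT")]  # a tree must drop one link

print(independent_cycles(genotypes, network_edges))  # 1
print(independent_cycles(genotypes, tree_edges))     # 0
```

The tree has to attach TT to only one parent, so the parallel route through the other intermediate disappears. That lost edge is exactly the kind of signal the network keeps.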
I don’t have a good intuition for how these sorts of graphs translate to trees and vice versa. Does this seem like it’s a useful addition to constructing a tree or more of a distraction?