This is an archived static version of the original phylobabble.org discussion site.

Visualizing conflicting trees with different sampling?

josephwb

Anyone know the best way to visualize conflicting tree topologies with incomplete overlap in taxon sampling?

What we've got: a complete tree (say, a species tree) and a whack of gene trees, which may or may not have complete taxon sampling. We want a figure with a single set of taxon labels that all trees map to. We're ignoring edge lengths, as things get messy very quickly. If a gene tree does not contain taxa from both sides of the basal split of the species tree, we don't want its root to start at the species-tree root but somewhere more tipward; otherwise, relationships get obscured.
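One way to get the "start more tipward" behavior is to anchor each gene tree's root at the species-tree node that is the MRCA of the taxa the gene tree contains. A minimal Python sketch, assuming trees are nested tuples of taxon labels (a stand-in for a real Newick parser) and that every gene-tree taxon also appears in the species tree; `leaves` and `anchor_clade` are hypothetical helpers, not part of DensiTree or any package mentioned in this thread:

```python
def leaves(tree):
    """Leaf labels under a nested-tuple tree; a bare string is a single leaf."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def anchor_clade(species_tree, gene_taxa):
    """Smallest species-tree clade containing every gene-tree taxon, plus its depth.

    Depth counts edges below the species-tree root, so a gene tree spanning the
    basal split is anchored at depth 0 and anything narrower starts further tipward.
    """
    gene_taxa = set(gene_taxa)
    node, depth = species_tree, 0
    while not isinstance(node, str):
        # Descend into the one child (if any) that still holds all gene taxa.
        child = next((c for c in node if gene_taxa <= leaves(c)), None)
        if child is None:
            break
        node, depth = child, depth + 1
    return node, depth


species = ((("a", "b"), "c"), ("d", "e"))
print(anchor_clade(species, {"a", "b", "c"}))  # ((('a', 'b'), 'c'), 1)
```

The returned depth could then serve as the horizontal offset at which that gene tree's root is drawn. `leaves` is recomputed at every step, which is fine for a figure-sized tree but would need caching for very large ones.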

DensiTree is something we have explored, but it doesn't seem to work well with uneven sampling across trees. We've also been playing with R code graciously provided by @liamjrevell, and we may be able to get it to do what we want, but I thought I would check with the phylo-timaliids to see if something already exists.

Thanks! JWB.

jhcepas

I recently added an option to the ETE toolkit to calculate the Robinson-Foulds distance between trees with different taxon sampling. These improvements are still part of a pre-release version of ETE available here (usable, but not yet heavily tested or documented). You may want to take a look at the tree.robinson_foulds() function.
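To make the "different sampling" part concrete: a common approach is to prune both trees to the taxa they share and compare only the clusters that survive. The sketch below does that in plain Python for rooted trees written as nested tuples; it is an illustration of the idea, not ETE's implementation, and the function names are hypothetical. With ETE itself, tree.robinson_foulds() is the place to look.

```python
def leaf_set(tree):
    """Leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clusters(tree, keep):
    """Non-trivial clusters of a rooted nested-tuple tree, restricted to taxa in `keep`.

    Intersecting every cluster with `keep` gives the clusters of the tree pruned
    to `keep`; singletons and the full set are dropped as uninformative.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node]) & keep
        below = frozenset().union(*(walk(child) for child in node))
        if 1 < len(below) < len(keep):
            found.add(below)
        return below

    walk(tree)
    return found


def rf_shared(t1, t2):
    """Rooted Robinson-Foulds distance over the taxa common to both trees."""
    shared = leaf_set(t1) & leaf_set(t2)
    return len(clusters(t1, shared) ^ clusters(t2, shared)), sorted(shared)
```

For example, `rf_shared(((("a", "b"), "c"), ("d", "e")), (("a", "c"), ("b", "x")))` compares the trees only on a, b and c, where they disagree (ab versus ac), giving a distance of 2.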

Also, the pre-release version includes several scripts for comparing trees:

  1. ete diff: a command-line tool that shows the differences between two topologies. I am still working on possible improvements to how differences are detected; currently it uses the Euclidean distance between the contents of two clades. At some point, I would like to connect its output to the ETE tree visualization features, so differences can be nicely displayed.
  2. ete dist: calculates the RF distance between trees.
  3. ete maptrees: calculates distances between a bunch of gene trees and a reference species tree. It allows missing species and the decomposition of duplicated gene families.
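The description of ete diff leaves open exactly how clade contents are compared. One plausible reading, sketched here as an assumption rather than as ETE's actual code, treats each clade as a 0/1 membership vector over all taxa; the Euclidean distance between two such vectors is then the square root of the number of taxa found in one clade but not the other. The helper names are hypothetical.

```python
import math


def clades(tree):
    """Leaf sets of every internal node of a nested-tuple tree, in postorder."""
    out = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        out.append(below)
        return below

    walk(tree)
    return out


def euclid(c1, c2, taxa):
    """Euclidean distance between the 0/1 membership vectors of two clades."""
    return math.sqrt(sum(((t in c1) - (t in c2)) ** 2 for t in taxa))


def closest_matches(t1, t2):
    """Map each clade of t1 that is missing from t2 to its nearest t2 clade."""
    c1, c2 = clades(t1), clades(t2)
    taxa = sorted(frozenset().union(*c1, *c2))
    result = {}
    for clade in c1:
        if clade not in c2:
            best = min(c2, key=lambda other: euclid(clade, other, taxa))
            result[clade] = (best, euclid(clade, best, taxa))
    return result
```

Comparing `((("a", "b"), "c"), "d")` with `((("a", "c"), "b"), "d")`, the only unmatched clade {a, b} is paired with {a, b, c} at distance 1.0, which is the kind of "nearest differing clade" report a diff tool could then highlight.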

As I said, all programs should be functional, but they are still a work in progress…

brhollan

I know of two approaches to this. The first is the Z-closure method:

Huson, Daniel H., et al. “Phylogenetic super-networks from partial trees.” IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 1.4 (2004): 151-158.

which is implemented in SplitsTree4. It starts with partial splits and uses a closure rule to build up a set of complete splits, which are then displayed as a splits graph.
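The closure rule can be sketched in a few lines. The version below is based on the Z-rule as described in the Huson et al. paper (for partial splits A1|B1 and A2|B2 with A1∩A2, A2∩B1 and B1∩B2 non-empty and A1∩B2 empty, extend them to A1|(B1∪B2) and (A1∪A2)|B2), applied until no split grows. It is an illustration under that reading, not SplitsTree4's implementation, and it stops short of drawing the splits graph; splits are (side, side) pairs of taxon sets and the function names are hypothetical.

```python
from itertools import permutations


def _z_step(splits):
    """Apply the Z-rule once, to the first pair where it enlarges a split."""
    for i, j in permutations(range(len(splits)), 2):
        for a1, b1 in (splits[i], splits[i][::-1]):      # both orientations of split i
            for a2, b2 in (splits[j], splits[j][::-1]):  # both orientations of split j
                if a1 & a2 and a2 & b1 and b1 & b2 and not (a1 & b2):
                    # Accept the move only if some split actually grows; since
                    # splits never shrink, this guarantees termination.
                    if b2 - b1 or a1 - a2:
                        splits[i], splits[j] = (a1, b1 | b2), (a1 | a2, b2)
                        return True
    return False


def z_closure(partial_splits, taxa):
    """Extend partial splits with the Z-rule, then keep those covering all taxa."""
    splits = [(frozenset(a), frozenset(b)) for a, b in partial_splits]
    while _z_step(splits):
        pass
    full = frozenset(taxa)
    return {frozenset([a, b]) for a, b in splits if a | b == full}
```

For instance, a tree on {a, b, c, d} with split ab|cd and a tree on {b, c, d, e} with split bc|de close to the complete splits ab|cde and abc|de.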

The second is quartet imputation:

Holland, B., Conner, G., Huber, K., & Moulton, V. (2007). Imputing supertrees and supernetworks from quartets. Systematic Biology, 56(1), 57-67.

It begins by filling in the taxa that are missing from each tree in a way that tries to maximise the number of quartets that agree with the other trees in the input set. Then you can do what you like with the resulting set of trees, usually a consensus tree or consensus network, but no doubt you could also feed them into DensiTree.
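To make the imputation step more concrete, here is a deliberately small sketch of placing one missing taxon: try attaching it to every edge of the incomplete tree and keep the placement whose quartets agree most often with the other input trees. This is a greedy toy version written from the description above, not the algorithm or code from Holland et al. (2007); trees are nested tuples converted to sets of splits, and every function name is hypothetical.

```python
from itertools import combinations


def splits_of(tree):
    """All splits (trivial ones included) of a nested-tuple tree read as unrooted."""
    clusters = []

    def walk(node):
        if isinstance(node, str):
            c = frozenset([node])
        else:
            c = frozenset().union(*(walk(child) for child in node))
        clusters.append(c)
        return c

    everything = walk(tree)
    return {frozenset([c, everything - c]) for c in clusters if c != everything}


def taxa_of(splits):
    """Taxon set of a tree given as splits (the two sides of any split)."""
    return frozenset().union(*next(iter(splits)))


def quartet(splits, q):
    """Topology ab|cd the tree displays on the four taxa q, or None if unresolved."""
    for split in splits:
        side, other = tuple(split)
        pair = q & side
        if len(pair) == 2 and len(q & other) == 2:
            return frozenset([pair, q - pair])
    return None


def insert_taxon(splits, edge, x):
    """Splits after attaching new leaf x to the middle of `edge` (the split A|B)."""
    a, b = tuple(edge)
    new = {frozenset([a | {x}, b]), frozenset([a, b | {x}]),
           frozenset([frozenset([x]), a | b])}
    for split in splits - {edge}:
        c, d = tuple(split)
        if c <= a or c <= b:   # c hangs off one end of the edge, so x lands with d
            new.add(frozenset([c, d | {x}]))
        else:                  # otherwise d hangs off one end, so x lands with c
            new.add(frozenset([c | {x}, d]))
    return new


def impute(splits, x, others):
    """Place missing taxon x on the edge whose quartets best agree with `others`."""
    taxa = taxa_of(splits)
    best, best_score = None, -1
    # Sort edges so ties are broken the same way on every run.
    for edge in sorted(splits, key=lambda s: sorted(sorted(side) for side in s)):
        candidate = insert_taxon(splits, edge, x)
        score = 0
        for other in others:
            other_taxa = taxa_of(other)
            if x not in other_taxa:
                continue
            # Only quartets containing x depend on where x goes.
            for trio in combinations(sorted(taxa & other_taxa), 3):
                q = frozenset(trio) | {x}
                topology = quartet(other, q)
                if topology is not None and topology == quartet(candidate, q):
                    score += 1
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Scoring only quartets that include the missing taxon is enough for choosing a placement, because every other quartet is the same whichever edge is used. Several missing taxa could be handled by calling `impute` repeatedly, although the result may then depend on the insertion order.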

Cheers, B

josephwb

Thanks @jhcepas. I think I worded my request poorly. I am not looking to perform analyses on the various trees; I just want to display them on a common plot. Basically: DensiTree, but with incompletely sampled trees.

But the tree metrics you mention sound very useful. Good to know these are available.

josephwb

Thanks @brhollan. We have thought about networks, but they just aren't what we are looking for.

taxonbytes

Here's the output of Euler/X for two conflicting, differently sampled trees of the lemur clade Cheirogaleoidae (1993 and 2005 analyses). However, it uses all input names. http://taxonbytes.org/using-the-euler-x-toolkit-to-align-taxonomies-introductory-notes/