Genesis SW for post-processing phylogenetic placements


#1

Dear All,

For those of you using the Evolutionary Placement Algorithm (EPA) or @ematsen's pplacer tool, there is now a toolkit available at http://genesis-lib.org/ to

  • Read, manipulate and write .jplace files
  • Extract, filter and merge placements
  • Calculate distance measures (e.g., Earth Movers Distance)
  • Visualize read abundances on the branches of the tree
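For readers new to the format: a .jplace file is plain JSON with a "tree" string, a "fields" list naming the columns of each placement row, and a "placements" list of pqueries. The following is a minimal sketch of reading, filtering and merging placements using only Python's standard json module. It is not the genesis API. The function names, the 0.5 threshold and the example document are hypothetical.

```python
import json

def read_jplace(text):
    """Parse .jplace JSON text into (tree, pqueries).

    Each pquery is a dict with its read names and a list of placements; every
    placement is a dict keyed by the column names listed in the "fields" entry.
    """
    doc = json.loads(text)
    fields = doc["fields"]
    pqueries = []
    for pq in doc["placements"]:
        placements = [dict(zip(fields, row)) for row in pq["p"]]
        if "n" in pq:
            names = pq["n"] if isinstance(pq["n"], list) else [pq["n"]]
        else:
            # "nm" stores [name, multiplicity] pairs; only the names are kept here.
            names = [name for name, _multiplicity in pq.get("nm", [])]
        pqueries.append({"names": names, "placements": placements})
    return doc["tree"], pqueries

def filter_by_lwr(pqueries, min_lwr):
    """Keep placements with like_weight_ratio >= min_lwr; drop emptied pqueries."""
    kept = []
    for pq in pqueries:
        placements = [p for p in pq["placements"] if p["like_weight_ratio"] >= min_lwr]
        if placements:
            kept.append({"names": pq["names"], "placements": placements})
    return kept

def merge(sample_a, sample_b):
    """Merge two (tree, pqueries) samples; only valid on the same reference tree."""
    tree_a, pqs_a = sample_a
    tree_b, pqs_b = sample_b
    if tree_a != tree_b:
        raise ValueError("samples were placed on different reference trees")
    return tree_a, pqs_a + pqs_b

# Hypothetical example document with two pqueries.
EXAMPLE = json.dumps({
    "tree": "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3}){4};",
    "fields": ["edge_num", "likelihood", "like_weight_ratio",
               "distal_length", "pendant_length"],
    "placements": [
        {"p": [[0, -1200.5, 0.8, 0.05, 0.01], [1, -1201.9, 0.2, 0.1, 0.02]],
         "n": ["read_1"]},
        {"p": [[3, -980.2, 1.0, 0.15, 0.03]], "nm": [["read_2", 4]]},
    ],
    "version": 3,
    "metadata": {},
})

tree, pqueries = read_jplace(EXAMPLE)
confident = filter_by_lwr(pqueries, 0.5)
print([pq["names"][0] for pq in confident], [len(pq["placements"]) for pq in confident])
# ['read_1', 'read_2'] [1, 1]
```

Keeping each placement as a dict keyed by "fields" avoids hard-coding the column order, which the format leaves to the "fields" entry.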

Cheers,

Alexis


#2

Could you describe the differences from our guppy, which does all of those things?

http://matsen.github.io/pplacer/generated_rst/guppy.html#list-of-subcommands


#3

The main difference is that genesis is a library, for both C++ and Python. This makes it flexible and offers different use cases:

  • Interactively inspecting and manipulating Pqueries (and other data types).
  • Writing custom scripts or programs for specific purposes, e.g., count some quantity, convert a file, output some information, etc.
  • Adding or modifying functionality, e.g., new visualizations, new distance measures, etc.

In addition, the functions currently available differ from those guppy offers. I did not want to create a copy of or replacement for guppy, but instead to offer new, complementary methods. For example:

  • Simulation of Pqueries, i.e., artificially creating them according to some distribution and with specific properties. I’m currently working on that part again, so there will be more of it in the near future.
  • The earth mover’s distance is different from guppy’s Kantorovich-Rubinstein distance: instead of moving around masses that sit on the edges of the tree, the masses of single placements (given by their like_weight_ratio) are moved around. It is thus more fine-grained.
  • The visualization of placements and trees is customizable. For example, the placement density can be shown using colour-coded tree edges, but this could easily be extended to make the edges thicker instead.
  • It is generally meant for experimenting and playing around with data. For example, the tree data structure is customizable, so that arbitrary data can be stored on the nodes and edges of the tree.
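The idea in the earth mover’s distance bullet above, moving the masses of single placements weighted by their like_weight_ratio, can be made concrete with the classic tree formulation of the distance. The cost is the integral over the whole tree of the absolute net mass that has to cross each point. Below is a minimal sketch under stated assumptions. It is not genesis’s or guppy’s code. The parent-array tree encoding, the function name and the convention that a placement's offset is measured from the child (lower) end of its edge are all hypothetical. How that offset maps onto the jplace distal_length field is left open here.

```python
from collections import defaultdict

def tree_emd(parent, length, sample_p, sample_q):
    """Earth mover's distance between two placement samples on a rooted tree.

    parent[v] is the parent of node v (-1 for the root); length[v] is the length
    of the edge from v up to parent[v]. Edges are identified by their child node v.
    A sample is a list of (edge, offset, mass) tuples; offset is measured from the
    child end of the edge (an assumption of this sketch) and mass is, e.g., the
    like_weight_ratio of a single placement.
    """
    def normalized(sample):
        total = sum(mass for _, _, mass in sample)
        return [(e, off, mass / total) for e, off, mass in sample]

    # Signed point masses per edge: +mass for sample P, -mass for sample Q.
    on_edge = defaultdict(list)
    for e, off, mass in normalized(sample_p):
        on_edge[e].append((off, mass))
    for e, off, mass in normalized(sample_q):
        on_edge[e].append((off, -mass))

    # Children lists and a pre-order traversal starting at the root.
    children = [[] for _ in parent]
    root = None
    for v, par in enumerate(parent):
        if par < 0:
            root = v
        else:
            children[par].append(v)
    preorder, stack = [], [root]
    while stack:
        v = stack.pop()
        preorder.append(v)
        stack.extend(children[v])

    below = [0.0] * len(parent)  # net signed mass in the subtree below node v
    distance = 0.0
    # Reversed pre-order visits every node after all of its descendants.
    for v in reversed(preorder):
        if v == root:
            continue
        # Walk up edge v: the mass crossing a point is the net mass below it.
        flow, pos = below[v], 0.0
        for off, mass in sorted(on_edge[v]):
            distance += abs(flow) * (off - pos)
            flow += mass
            pos = off
        distance += abs(flow) * (length[v] - pos)
        below[parent[v]] += flow
    return distance

# Two leaves at distance 1.0 + 2.0 from each other; all mass moves across the root.
print(tree_emd([-1, 0, 0], [0.0, 1.0, 2.0], [(1, 0.0, 1.0)], [(2, 0.0, 1.0)]))
# 3.0
```

Because each placement contributes its own point mass, a pquery whose like_weight_ratio is split over several edges contributes proportionally to each of them, which is exactly the per-placement granularity the bullet describes.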

There are also some helpful out-of-the-box tools. Currently, there are two demo programs (http://doc.genesis-lib.org/demos.html): one for visualizing placements and one for extracting them from clades of the tree. I will add more in the future.
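The clade extraction mentioned above can be illustrated with a small, hypothetical sketch. A pquery is assigned to a clade when at least a threshold fraction of its like_weight_ratio falls on edges inside that clade, and it is marked as uncertain otherwise. This is an assumption about how such an extraction could work, not a description of the genesis demo. The tree encoding (a parent array with edges named by their child node), the function names and the 0.95 default are illustrative only.

```python
def edges_in_subtree(parent, clade_root):
    """Edges (named by their child node) whose child lies in the subtree of clade_root.

    The edge leading into clade_root itself is included; clade_root must not be the root.
    """
    edges = set()
    for v in range(len(parent)):
        u = v
        # Climb towards the root until we hit clade_root or run past the root (-1).
        while u != -1 and u != clade_root:
            u = parent[u]
        if u == clade_root:
            edges.add(v)
    return edges

def assign_to_clades(pqueries, clades, threshold=0.95):
    """Assign each pquery to the clade holding >= threshold of its like_weight_ratio.

    pqueries: list of (name, [(edge, like_weight_ratio), ...]) with positive total weight.
    clades: dict mapping a clade name to a set of edge ids (assumed disjoint).
    """
    result = {clade: [] for clade in clades}
    result["uncertain"] = []
    for name, placements in pqueries:
        total = sum(lwr for _, lwr in placements)
        best, best_share = None, 0.0
        for clade, edges in clades.items():
            share = sum(lwr for edge, lwr in placements if edge in edges) / total
            if share > best_share:
                best, best_share = clade, share
        if best is not None and best_share >= threshold:
            result[best].append(name)
        else:
            result["uncertain"].append(name)
    return result

# Hypothetical tree: root 0 with children 1 and 2; leaves 3, 4 under 1 and 5, 6 under 2.
parent = [-1, 0, 0, 1, 1, 2, 2]
clades = {"A": edges_in_subtree(parent, 1), "B": edges_in_subtree(parent, 2)}
reads = [("r1", [(3, 0.6), (4, 0.39), (5, 0.01)]),
         ("r2", [(3, 0.5), (5, 0.5)]),
         ("r3", [(6, 1.0)])]
print(assign_to_clades(reads, clades))
# {'A': ['r1'], 'B': ['r3'], 'uncertain': ['r2']}
```

Normalizing by each pquery's total weight keeps the threshold meaningful even after low-weight placements have been filtered out.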

As a final remark, I am open to feature requests. Maybe it’s just a three-liner, and it is always good to get a feel for what kind of functionality the community needs.


#4

That’s cool, @lczech! Thanks for the update. I should also say that in general it’s a good thing for there to be at least two software packages doing the same thing.

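Regarding your point that the earth mover’s distance moves the masses of single placements rather than masses sitting on the edges of the tree: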

I’m not sure I quite understand. Guppy’s KR distance can move around the mass of individual placements, split by their likelihood weight ratio or posterior probability.


#5

Oh, that’s right. Somehow, I had in mind that the KR distance works on edges (maybe because of the figures showing thickened edges). Anyway, I have now reread the paper (http://arxiv.org/abs/1005.1699), and it seems my procedure is quite close to yours. Sorry for the confusion.