iPython Notebook for Phylogenetics


#1

Firstly - sorry, this is not an announcement but a suggestion/call for a collaborative project. I have been using iPython Notebook for manipulating data and plotting and think it would be a great environment for phylogenetics. For it to work, it would need a coherent library with standardised objects for storing trees etc., some visualisation tools for trees and alignments, and some embedded tree-building/alignment software.

For the former, the obvious choice (I think) would be to use DendroPy, by @jeetsukumaran and @mtholder, but other options may be available. Then a good set of (possibly D3-based, JavaScript) visualisation tools could be built in at the Notebook end (the alternative would be to add plotting routines built on top of matplotlib). Extensions could be added by using standard Python package management.

Any thoughts on this? My primary motivation is to replace the various software packages I use for teaching and produce a coherent framework.


#2

Between an extensible visualization library and Dendropy, we might finally see more phylogenetics models in Python.

For visualization, Rick Ree was working on an iPython visualization library called ivy, and demonstrated some compelling use cases, but I haven’t seen any new developments lately.

You also might find Jake Vanderplas’ work and perspective on integrating d3 and matplotlib interesting: http://jakevdp.github.io/blog/2014/01/10/d3-plugins-truly-interactive/. (This link also contains links to other interactive Javascript visualization libraries.)


#3

I recently added iPython notebook support to the ETE toolkit, which is a Python library for handling and plotting trees. Although it overlaps with DendroPy in many aspects, the focus of ETE is not on phylogenetic computation but on the exploration of trees, their annotation and inline visualization. You could easily combine them.

There is a comprehensive tutorial and some figure examples, in case you want to take a look.


#4

I think this is a great idea.

For the parser/container/data model back-end, with full disclosure of obvious biases, I think DendroPy would work well for the phylogenetics domain. We are currently working on the 4.x series release (both tree reading and character parsing performance are now orders of magnitude better than the 3.x series, and scaling is O(1) for the former as opposed to O(n)!!!), and are open to incorporating changes to support playing better with other tools in the stack. Let me know if there is anything specific that you need if you do decide to go this route. An abstraction layer between the DendroPy data object model and other tools in the stack comes to mind …

While I like and use matplotlib, I am resigned to the fact that installing it natively on Macs is going to be a nightmare for some time to come, even for someone with reasonably good sys-admin skills and experience. The task is somewhat less painful if you use a package manager such as Homebrew (and by “somewhat less painful” I mean a nightmare you eventually wake up from, as opposed to a nightmare that never ends). And with every OS X release it seems to get worse. Yosemite’s pending release is already resulting in rumblings in various forums. Of course, one can always use pre-packaged Python distributions such as Enthought etc., but this still places a whole lot of unnecessary burden on the end user. All of this is a shame: I like matplotlib. (In fact, I like the whole numpy + scipy + matplotlib stack, and the first two are now pretty straightforward to build and install natively; if only matplotlib were as well …)

So, what it boils down to is: I like, support, and encourage the use of alternate, non-Python platforms for visualizations. JavaScript/ECMAScript seems like a truly excellent choice. It means that we can potentially use the browser as a front-end, not to mention the possibility of visualization over the web or natively on (some) mobile devices with minor enhancements. I know this was not a declared objective, but I think it is a substantial side-benefit while achieving the primary goal of visualization with the minimum of hassle for the end user and developers.


#5

My attraction to this idea was the very fact that the iPython server can be on a central machine (it would need authentication) and access can be through the browser (say, by a room of students). The rendering would be best done in JavaScript, but it would be easy to pass a JSON tree as an output of DendroPy to be rendered on the client (there are plenty of JS tree renderers). This blog post shows how the two can communicate (bidirectionally, if needed):

http://jakevdp.github.io/blog/2013/06/01/ipython-notebook-javascript-python-communication/
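
As a rough illustration of the "JSON tree" idea, here is a minimal sketch that turns a plain Newick string into nested JSON using only the standard library. It is not how DendroPy or ETE do it; in practice one of those would handle the parsing, and this toy parser ignores quoted labels, comments and NEXUS blocks. The function name `newick_to_dict` and the key names (`name`, `length`, `children`) are assumptions for the example; `children` is chosen because d3's hierarchy layouts look for that key by default.

```python
import json


def newick_to_dict(newick):
    """Parse a simple Newick string into nested dicts.

    Assumption: unquoted labels, optional ":length" suffixes, no comments.
    """
    text = newick.strip().rstrip(";")
    pos = 0  # current read position in `text`

    def parse_node():
        nonlocal pos
        node = {"name": "", "length": None, "children": []}
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                node["children"].append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses in Newick string")
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # end of this clade
                    break
                else:
                    raise ValueError("unexpected character %r" % text[pos])
        # Optional label, read up to the next structural character.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node["name"] = text[start:pos].strip()
        # Optional branch length after ":".
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node["length"] = float(text[start:pos])
        return node

    return parse_node()


tree = newick_to_dict("((A:0.1,B:0.2)AB:0.3,C:0.4);")
tree_json = json.dumps(tree)  # string that could be handed to a JS renderer
```

The resulting `tree_json` string is what would be passed to the browser side; how it gets there (inline display, a comm channel, a file) is left open here.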

Would be good to have some sort of progress indicator/task manager for very long tasks like running trees in an external package.


#6

I use Python for phylogenetics, with numpy/scipy/networkx for array-based and graph-based algorithms.

Last I checked, the phylogenetics-specific libraries were insufficient for my purposes, but if they were improved then I would use them. For example, if the beagle library or its pytbeaglehon wrapper were very easy to use with Python then I’d probably use it all the time. I don’t use visualization as much as I should, but if nice Python visualization libraries for phylogenetics were available then I’d probably use them too.


#7

ETE is already able to produce custom SVG images of trees and alignments (this is in fact the method used for the inline IPython visualization). I have recently been playing with JavaScript + d3 + ETE, and it does not look very difficult to combine them. I uploaded an example here.


#8

Just wanted to chime in and suggest checking out PyCogent for fitting models to trees in Python http://pycogent.org/

The documentation is pretty inadequate, but responses on the forum have been pretty quick for me (NB I am in the same timezone as one of the main forum respondents, which most will not be). I should say though that the lack of documentation seems to be partly a symptom of the fact that the package can do a lot.


#9

This is very nice. Have you ever thought of making a webapp whereby users could upload a tree, and the app would return an obfuscated URL which displays that tree? I think this would be very popular, as installing a tree viewer is a hurdle for non-phylogenetic collaborators to explore trees. One might actually generate the url from a compressed tree format, which would then get decompressed and displayed. Thus there would be no storage associated with it. (Might get long though…)
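
To make the "URL generated from a compressed tree" idea concrete, here is a minimal sketch using only the Python standard library (`zlib` plus URL-safe base64). The function names and the `BASE_URL` value are hypothetical, chosen for illustration; nothing here reflects an existing service.

```python
import base64
import zlib

# Hypothetical viewer address; the tree travels in the URL fragment,
# so nothing needs to be stored on a server.
BASE_URL = "https://example.org/view#"


def tree_to_token(newick):
    """Compress a Newick string and encode it with URL-safe characters."""
    compressed = zlib.compress(newick.encode("utf-8"), 9)
    # Strip "=" padding; it is restored in token_to_tree.
    return base64.urlsafe_b64encode(compressed).decode("ascii").rstrip("=")


def token_to_tree(token):
    """Invert tree_to_token: restore padding, decode, decompress."""
    padded = token + "=" * (-len(token) % 4)
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")


def tree_url(newick):
    """Build a shareable URL that fully encodes the tree."""
    return BASE_URL + tree_to_token(newick)
```

Calling `len(tree_url(newick))` on a real tree gives a quick feel for how long such URLs get as the number of tips grows.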


#10

Hi Eric, I had a shot at doing something similar to this with IcyTree, which can also render “uploaded” trees in newick/nexus format and networks in extended newick format. (Quotes because IcyTree is entirely client-side JavaScript.) However, I quickly ran into the ~2000 character limit on URLs, as my URLs need to completely encode the file. I suppose jhcepas’ nice server-side program wouldn’t necessarily suffer from this problem though.


#11

@jhcepas’ ete2 just blew my socks off:

With little Pandas experience and no ETE experience, that took about 25 minutes.


#12

Thanks @ematsen, I hope I can make it more portable soon by getting rid of the Qt4 dependencies. I am also working on alternative ways of displaying large alignments and other types of metadata.

Regarding the website tool: sure, I will try to open a login-based system so tree figures can be shared (obfuscated URLs would be limited by a fixed length).