This is an archived static version of the original discussion site.

Visualising big trees (again)


Being a sucker for punishment, I keep returning to the problem of visualising big trees. My latest quick-and-dirty demo uses Google Maps and generates image tiles using spatial queries on a MySQL database. Blog post here and live demo here.

There are two algorithmic/design issues to solve for this to be really usable. The first is reducing the number of nodes drawn at each zoom level (ideally to something roughly constant per 256 x 256 pixel tile); the second is determining which labels should be displayed at which zoom level. I’d welcome thoughts on both. I recall some work has been done on the first problem as part of the TreeJuxtaposer project.
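
To make the first problem concrete, here is a minimal sketch of one naive way to cap the number of nodes per tile. It is not the TreeJuxtaposer method. It assumes each node has a position expressed as a fraction of the full drawing and a weight (say, the number of descendant leaves), and it keeps only the heaviest nodes in each tile. The function name, the tuple layout and the default budget are all made up for illustration.

```python
import heapq
from collections import defaultdict


def thin_nodes(nodes, zoom, budget=50):
    """Keep at most `budget` nodes per tile at the given zoom level.

    nodes: iterable of (node_id, x, y, weight), with x and y in [0, 1)
           as fractions of the full drawing (an assumed layout).
    Returns a dict mapping (tile_x, tile_y) to node ids, heaviest first.
    """
    tiles_per_axis = 2 ** zoom  # map tiles double along each axis per zoom level
    per_tile = defaultdict(list)
    for node_id, x, y, weight in nodes:
        tile = (int(x * tiles_per_axis), int(y * tiles_per_axis))
        per_tile[tile].append((weight, node_id))
    return {
        tile: [node_id for _, node_id in heapq.nlargest(budget, entries)]
        for tile, entries in per_tile.items()
    }
```

Because the budget applies per tile, the number of nodes drawn in any 256 x 256 tile stays bounded however far you zoom out, while zooming in spreads the same nodes over more tiles and lets more of them through.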


Thinking way outside the box here, but maybe this is (finally, after about 280 yrs) a good use case for ranks? Zoom levels 17+ show species names, zoom 13-16 show genera, zoom 0 shows domains, etc. Ranked groups are the names most people will recognize, and they’re organized to be useful collections at varying scales. Maybe with some post hoc adjusting: e.g., of all the families to show in this window at this zoom, show no more than 7, and make them the ones with the greatest diversity.
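
Roughly what I have in mind, as a sketch only. The species, genus and domain bands are the ones above; the family and order bands are placeholders I made up, and `Taxon`, `rank_for_zoom` and `labels_for_view` are hypothetical names. "Diversity" is taken to mean species count.

```python
from typing import NamedTuple


class Taxon(NamedTuple):
    name: str
    rank: str
    n_species: int  # stand-in for "diversity"


# (minimum zoom, rank), highest zoom first. The family and order rows are
# invented placeholders; only species, genus and domain come from the post.
ZOOM_BANDS = [
    (17, "species"),
    (13, "genus"),
    (8, "family"),
    (1, "order"),
    (0, "domain"),
]


def rank_for_zoom(zoom: int) -> str:
    """Return the rank whose names are shown at this zoom level."""
    for min_zoom, rank in ZOOM_BANDS:
        if zoom >= min_zoom:
            return rank
    raise ValueError("zoom must be >= 0")


def labels_for_view(taxa_in_view, zoom, max_labels=7):
    """Pick at most `max_labels` names of the right rank, most diverse first."""
    rank = rank_for_zoom(zoom)
    candidates = [t for t in taxa_in_view if t.rank == rank]
    candidates.sort(key=lambda t: t.n_species, reverse=True)
    return [t.name for t in candidates[:max_labels]]
```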

For the drawing, why not cache the generated tiles? First time someone looks at a particular area at a particular zoom, it takes a while to render, but after that the cached image is used. Doesn’t take much more time on the back end and makes the app faster as it gets used. You could also have a script that pre-caches tiles in the background, depending on your overall server load.
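
A bare-bones sketch of that caching idea, assuming tiles are addressed by (zoom, x, y) and stored as files on disk. `render` stands in for whatever actually draws a tile (e.g. the spatial query plus image generation); `get_tile` and `precache` are hypothetical names, not part of any existing code.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def get_tile(cache_dir: Path, zoom: int, x: int, y: int,
             render: Callable[[int, int, int], bytes]) -> bytes:
    """Return the tile image, rendering it only on the first request."""
    path = cache_dir / str(zoom) / str(x) / f"{y}.png"
    if path.exists():
        return path.read_bytes()
    data = render(zoom, x, y)  # the slow step, done once per tile
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file and rename it, so a concurrent request
    # never reads a half-written tile.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.replace(tmp_name, path)
    return data


def precache(cache_dir: Path, zoom: int,
             render: Callable[[int, int, int], bytes]) -> None:
    """Background job: fill the cache for every tile at one zoom level."""
    tiles_per_axis = 2 ** zoom
    for x in range(tiles_per_axis):
        for y in range(tiles_per_axis):
            get_tile(cache_dir, zoom, x, y, render)
```

The number of tiles quadruples with each zoom level, so pre-caching probably only makes sense for the lower zooms, with the deeper ones filled in on demand.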


Tangential, but…

“…maybe this is (finally, after about 280 yrs) a good use case for ranks…”

‘Hssssss,’ said the paleontologist.

More on topic, yeah, I think zooming named clades is probably the way to go here. After all, isn’t the common solution we’ve been using since we started making large trees a few decades ago just to collapse diverse groups when they clutter up the figure we want to present?

Just a thought, but perhaps even named paraphyletic groups might be useful in some cases (…oh, yes, I went there).


I guess the problem with ranks is the assumption that they will match a particular zoom level, and that there will be names with ranks in all parts of the tree. What I’d like to do is compute from the tree which bits should be labelled, then apply appropriate labels. But it would be good to label key taxa as well (e.g., model organisms).
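
As a rough sketch of "compute it from the tree": label a clade once it is tall enough on screen to carry a label, and always label the key taxa. This assumes a rectangular drawing with leaves evenly spaced along one axis, and that the whole tree fills one 256-pixel tile at zoom 0. The function name, the 12-pixel threshold and the input layout are all invented for illustration.

```python
def labels_at_zoom(clades, zoom, total_leaves, key_taxa=frozenset(),
                   min_px=12, tile_size=256):
    """Choose which names to show at a zoom level.

    clades: dict mapping a name to its number of leaves (tips count as 1).
    A name is shown if its clade spans at least `min_px` pixels at this
    zoom, or if it is one of the `key_taxa` (e.g. model organisms).
    """
    # Pixels available per leaf: the drawing is tile_size * 2**zoom pixels tall.
    px_per_leaf = tile_size * 2 ** zoom / total_leaves
    chosen = {name for name, n_leaves in clades.items()
              if n_leaves * px_per_leaf >= min_px}
    # Key taxa are shown regardless of size, provided they are in the tree.
    return chosen | (set(key_taxa) & set(clades))
```

This makes no use of ranks, so it still works in parts of the tree where few names have one.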

Re caching, the tiles are cached in the browser, but yes, server-side caching would help.


One reason for the Google Maps route is that the infrastructure for zooming a fixed drawing space already exists. Approaches that distort the space or change the tree structure itself are going to require more work. All possible approaches, of course.