Monday, March 27, 2006

Quantum treemaps


One of the things that keeps bothering me is the lack of compelling ways to visualise information in phylogenetic databases. Trees themselves are, I feel, pretty awful objects to work with. They are large, and displaying them takes up a lot of screen real estate. Yet, in many ways, the more one sees of the tree the less one gains from the experience. For example, CAIDA's Walrus tool (right), used by Tim Hughes to display large trees, looks fabulous, but is it useful? By which I mean, can we use it to find out about stuff, or do we just spin it around and go "ohh, isn't it pretty?"

Treemaps are another tool that I've looked at, but I've never been terribly impressed by them. However, quantum treemaps, described in Ordered and Quantum Treemaps: Making Effective Use of 2D Space to Display Hierarchies, look potentially useful. To quote from the paper describing them:


The goal of the Quantum Treemap algorithm is similar to other treemap algorithms, but instead of generating rectangles of arbitrary aspect ratios, it generates rectangles with widths and heights that are integer multiples of a given elemental size. In this manner, it always generates rectangles in which a grid of elements of the same size can be layed out. Furthermore, all the grids of elements will align perfectly with rows and columns of elements running across the entire series of rectangles. It is this basic element size that cannot be made any smaller that led to the name of Quantum Treemaps
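
The quantisation step described in the quote can be sketched roughly as follows. This is not the algorithm from the paper or Bederson's code, just a minimal illustration: the function name `quantize` and the brute-force search are assumptions made for this example. Given a number of equal-sized elements and a desired aspect ratio, it picks a whole number of columns and rows that holds every element.

```python
import math

def quantize(n_items, target_aspect):
    """Return (cols, rows) for a grid that holds n_items elements.

    The grid's width and height are whole multiples of the element size,
    and its aspect ratio (cols / rows) is as close as possible to
    target_aspect. Illustrative brute-force search only.
    """
    if n_items < 1:
        raise ValueError("need at least one item")
    best = None
    for cols in range(1, n_items + 1):
        rows = math.ceil(n_items / cols)  # enough rows to fit everything
        # Compare aspect ratios on a log scale so that 2:1 and 1:2
        # count as equally far from 1:1.
        error = abs(math.log((cols / rows) / target_aspect))
        if best is None or error < best[0]:
            best = (error, cols, rows)
    return best[1], best[2]

# Eight thumbnails in a box twice as wide as it is tall.
print(quantize(8, 2.0))
```

Because the grid can only grow in whole elements, it may contain a few empty cells (seven items still need a 4 x 2 grid here), which is one reason the final size of the layout is hard to predict in advance.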



Quantum treemaps have made their way into Photomesa.

So, here's my thought. What if we used a quantum treemap to browse TreeBASE? Suppose we have a mapping between TreeBASE taxa and the NCBI taxonomy (or any other taxonomy, it doesn't really matter). If we then have some notion of what taxa each TreeBASE study is mainly about, then we could display a quantum treemap of studies rooted at any node in the NCBI taxonomy. For example, studies on mammals, grouped by order. The point here is not to see the tree, but to navigate through the studies using the tree.
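
As a rough sketch of the grouping step, assume the taxonomy is stored as a child-to-parent dictionary (with the root as its own parent) and that each study has already been assigned a single taxon. The study identifiers and the function name `group_by_child` below are hypothetical.

```python
from collections import defaultdict

def group_by_child(study_taxon, parent, node):
    """Group studies under the immediate children of `node`.

    study_taxon: study id -> taxon the study is mainly about
    parent:      taxon -> parent taxon (the root is its own parent)
    Studies that do not fall below `node` are left out.
    """
    groups = defaultdict(list)
    for study, taxon in study_taxon.items():
        current = taxon
        # Climb until we reach a child of `node` or hit the root.
        while parent[current] != node and parent[current] != current:
            current = parent[current]
        if parent[current] == node and current != node:
            groups[current].append(study)
    return dict(groups)

# Toy taxonomy and made-up study ids, for illustration only.
parent = {
    "root": "root", "Mammalia": "root", "Aves": "root",
    "Primates": "Mammalia", "Rodentia": "Mammalia",
    "Homo": "Primates", "Mus": "Rodentia",
}
studies = {"S1": "Homo", "S2": "Mus", "S3": "Primates", "S4": "Aves"}
print(group_by_child(studies, parent, "Mammalia"))
```

Each group would then become one rectangle in the treemap, sized by the number of studies it contains.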

Whereas treemaps usually display a nested hierarchy, my sense is that quantum treemaps are used to display the children of a node, rather than the whole tree. I think this is because the final size of a quantum treemap is unpredictable.

The mapping of TreeBASE names to NCBI tax_ids is not trivial, but I've got most of one done. Mapping studies to taxa needs a little thought. One approach is to take a tree from a study, relabel it with NCBI tax_ids, then find the least common ancestor in the NCBI taxonomy of the centroid of the tree. The idea is that this is in the core of the tree, and hence should capture what the tree is about. Finding the LCA of the root would be an obvious thing to do, but if one has a tree comprising mostly vertebrates, but rooted with a bacterium, then the root LCA is the root of life, which isn't a terribly accurate summary of the tree.
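
The difference between the two summaries can be sketched as below. Several things here are assumptions made for illustration: the taxonomy is a toy child-to-parent dictionary with names instead of real tax_ids, trees are nested tuples, and "centroid" is read as the internal node whose removal leaves the smallest largest piece, counted in leaves.

```python
# Toy taxonomy: child -> parent, with the root as its own parent.
PARENT = {
    "root": "root",
    "Bacteria": "root", "Vertebrata": "root",
    "Mammalia": "Vertebrata", "Aves": "Vertebrata",
    "human": "Mammalia", "mouse": "Mammalia",
    "cow": "Mammalia", "dog": "Mammalia",
    "chicken": "Aves", "E. coli": "Bacteria",
}

def lineage(taxon, parent=PARENT):
    """Path from the taxonomy root down to taxon."""
    path = [taxon]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path[::-1]

def lca(taxa, parent=PARENT):
    """Deepest taxonomy node shared by the lineages of all taxa."""
    common = None
    for level in zip(*(lineage(t, parent) for t in taxa)):
        if len(set(level)) != 1:
            break
        common = level[0]
    return common

def leaves(node):
    """Leaf labels below a node (a leaf is a string, a clade a tuple)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def centroid(tree):
    """Internal node whose removal leaves the smallest largest piece."""
    total = len(leaves(tree))
    best, best_score = tree, total
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        sizes = [len(leaves(child)) for child in node]
        # Pieces: one per child subtree, plus everything outside this node.
        score = max(sizes + [total - sum(sizes)])
        if score < best_score:
            best, best_score = node, score
        stack.extend(node)
    return best

# Mostly mammals, one bird, rooted with a bacterium.
tree = (((("human", "mouse"), ("cow", "dog")), "chicken"), "E. coli")
print(lca(leaves(tree)))            # summary from the root
print(lca(leaves(centroid(tree))))  # summary from the centroid
```

In this toy case the root summary is the root of the taxonomy, while the centroid summary is Mammalia, which is closer to what the tree is mostly about.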

I've been playing with generating quantum treemaps, based on a C++ port of some Java code written by Ben Bederson. The next step is to try and bolt this together into a demo of how this might be used to navigate TreeBASE.


Tuesday, March 21, 2006

OpenSearch

Not a huge fan of IE, but this post on David Patten's blog nicely illustrates the ease of use of A9's OpenSearch with IE 7.

I'd previously played with OpenSearch as a quick way to integrate biodiversity sources, and put together a couple that have been registered with A9 (search for "taxonomy" and you'll find them). It's essentially adding a few tags to RSS or Atom feeds, coupled with a simple way to describe the search engine.
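
For a flavour of the "simple way to describe the search engine", here is a minimal sketch that builds a description document using element names from the OpenSearch 1.1 specification. The search URL, short name, and the helper `description` are hypothetical, and this is not one of the registered descriptions mentioned above.

```python
import xml.etree.ElementTree as ET

# Namespace used by OpenSearch 1.1 description documents.
NS = "http://a9.com/-/spec/opensearch/1.1/"

def description(short_name, text, template):
    """Build an OpenSearch description document as a string."""
    ET.register_namespace("", NS)  # serialise with a default namespace
    root = ET.Element(f"{{{NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{NS}}}Description").text = text
    # {searchTerms} is replaced by the client with the user's query.
    ET.SubElement(root, f"{{{NS}}}Url",
                  {"type": "application/rss+xml", "template": template})
    return ET.tostring(root, encoding="unicode")

# Hypothetical taxonomy search engine returning RSS.
doc = description(
    "Taxonomy",
    "Search a taxonomic name database",
    "http://example.org/search?q={searchTerms}&format=rss",
)
print(doc)
```

The feed returned by the search URL then carries a few extra elements in the same namespace, such as `totalResults`, `startIndex`, and `itemsPerPage`.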

Perhaps it's time to play with this a little more. It would be a very simple way to open up some data.

(Via A9 Developer Blog.)

Currently playing in iTunes: Wonderwall by Oasis

Firefox Extension for Turning Built-in SVG on and off

A quick Google found this Firefox extension for turning built-in SVG on and off, posted on or maybe something uplifting.

Really useful little extension, because Firefox SVG support is actually very good (not "pretty awful", as I first wrote). I just discovered that Firefox couldn't handle my original SVG, but once I put in the namespaces as attributes of the svg tag, everything worked fine. Must remember to engage brain before typing...

The Adobe SVG plugin is much nicer (this is still true). This extension enables the user to switch between Firefox and Adobe. Pity there is nothing like this for Camino, a very nice Gecko-based browser for Mac OS X that I've gotten used to. Because it uses the same rendering engine as Firefox, Camino does the same nice job of SVG (once the namespaces are included).
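
The namespace point can be illustrated with a small sketch. The SVG content is made up; the two namespace URIs are the standard SVG and XLink ones. An XML parser only treats the root element as SVG when the namespace is declared on the svg tag.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Namespaces declared as attributes of the svg tag.
with_ns = f'''<svg xmlns="{SVG_NS}" xmlns:xlink="{XLINK_NS}" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="steelblue"/>
</svg>'''

# Same drawing without the declarations: just an element called "svg".
without_ns = '''<svg width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="steelblue"/>
</svg>'''

for label, text in (("with namespace", with_ns), ("without", without_ns)):
    root = ET.fromstring(text)
    is_svg = root.tag == f"{{{SVG_NS}}}svg"
    print(label, "->", root.tag, "| recognised as SVG:", is_svg)
```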

Currently playing in iTunes: Fire and Rain by James Taylor

Monday, March 20, 2006

Fun and games with WebDot and Fedora Core 4


Well, that was fun. I've just installed AT&T's WebDot, a Tcl CGI program for generating images of graphs on the fly using Graphviz. Although there are RPMs for Fedora Core 4 available from AT&T's site, they didn't work. Here's what I did to get this working:


  • Installed Graphviz from the RPM.

  • Installed WebDot from the source tarball, not the RPM.

  • Failing to read the instructions (like, who does that?) I neglected to install graphviz-tcl-2.8-1.fc4.i386.rpm (Graphviz Tcl). I tried to do this, but it complained about a lack of Tk.

  • For reasons unbeknownst to me, I didn't have Tk installed (but did have Tcl), so off to the Fedora Core FTP server to grab tk-8.4.9-3.i386.rpm.

  • Now graphviz-tcl-2.8-1.fc4.i386.rpm installs happily, but still no joy.

  • Running Webdot from the command line

    cd /var/www/cgi-bin
    ./webdot

    produced some meaningful error messages at last. The script couldn't find a Tcl shell because the script was looking for Tcl 8.3, and Fedora Core 4 has 8.4. Hence, I edited the first line of the script accordingly. Then it complained about not finding libtcldot.so.0.0.0. Turned out the script thought this library was in /usr/lib/graphviz/, whereas the RPM had put it in /usr/lib/graphviz/tcl (sigh). So, the first two lines now look like this:

    #!/usr/bin/tclsh8.4
    set LIBTCLDOT /usr/lib/graphviz/tcl/libtcldot.so.0.0.0


  • OK, the webdot script now runs, but no images of graphs. One useful hint is to try and view the images WebDot generates, even if they look like missing images (i.e., there doesn't seem to be anything there). Sometimes useful error messages are displayed; in this case I saw a message about missing fonts.

  • Having gone through this particular version of hell a while ago when putting WebDot on a RedHat 8 box, I remembered we need some TrueType fonts. AT&T have these on the Graphviz web site, but as a tarball with no instructions on what to do with them. I still had the RPM I used on the RedHat 8 box (you can grab it here), so I installed this (which puts the fonts into /usr/X11R6/lib/X11/fonts/truetype/).

  • Now, it works (yay!).


Isn't software fun? By the way, the nice icon in this post comes from pixelglow's wonderful port of Graphviz to Mac OS X. After the fun with Fedora, it's back to my Mac...

Tuesday, March 14, 2006

Biodiversity Informatics Visualization


The Human-Computer Interaction Lab at the University of Maryland has produced numerous nice visualisation tools, and has a page devoted to biodiversity visualisation.


We are building information retrieval and analysis interfaces for the rapidly expanding domain of biodiversity and ecological databases. Biodiversity databases contain organism-related information such as distribution, taxonomy, natural history, and conservation data. They are as complex as molecular and medical biology resources, yet serve a broad audience as do general-use digital libraries. We began by developing an interactive tree visualization (TaxonTree) for Kingdom Animalia. We also developed a prototype allowing coupled interaction with two trees (DoubleTree). We are currently working on developing other methods of visualizing both hierarchical and non-hierarchical biodiversity information (TreePlus and EcoLens), leveraging prior research on digital libraries and on bioinformatics. This has involved exploring ontologies and biodiversity data management in collaboration with the Animal Diversity Web (ADW) and the SPIRE project, and tree-reasoning with Kevin Omland at UMBC.

Monday, March 13, 2006

Least Common Ancestor (LCA) queries

My suspicion is that most queries biologists would make concerning trees are fundamentally LCA queries (as opposed to pattern matching, for example). For instance, "find me trees where group x is monophyletic" is an LCA query. Hence, LCA algorithms are of special interest (especially if they can be incorporated in a database). One recent LCA algorithm due to Bender and Farach-Colton (PDF here) has an implementation available on SourceForge, courtesy of Muhammad Ahsan Yusuf and Zack Ramjan.
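
To make the monophyly example concrete: in a rooted tree, a group is monophyletic exactly when the set of leaves below the LCA of its members is the group itself. The sketch below uses a naive recursive search on nested tuples, not the Bender and Farach-Colton algorithm, and the function names are made up for this example.

```python
def leaf_set(node):
    """Set of leaf labels below a node (a leaf is a string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))

def lca_subtree(node, group):
    """Smallest subtree of node containing every member of group, or None."""
    if not group <= leaf_set(node):
        return None
    if not isinstance(node, str):
        for child in node:
            found = lca_subtree(child, group)
            if found is not None:
                return found  # a child already holds the whole group
    return node

def is_monophyletic(group, tree):
    """True if the leaves below the group's LCA are exactly the group."""
    group = frozenset(group)
    clade = lca_subtree(tree, group)
    return clade is not None and leaf_set(clade) == group

tree = ((("human", "chimp"), "mouse"), ("chicken", "lizard"))
print(is_monophyletic({"human", "chimp"}, tree))  # a clade in this tree
print(is_monophyletic({"human", "mouse"}, tree))  # LCA also contains chimp
```

The naive search recomputes leaf sets at every step, which is fine for a sketch but is exactly the cost that constant-time LCA algorithms are designed to avoid.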

There is a recent paper on this work (and the problem of LCAs in directed acyclic graphs) by Bender et al. in Journal of Algorithms (doi:10.1016/j.jalgor.2005.08.001).

Wednesday, March 01, 2006

All My Eye

All My Eye is a new blog by staff at Ingenta involved in RDF and related projects. Worth keeping an eye on, especially as the Ingenta feeds are metadata rich, and have been used by uBio's RSS project.

Genome browsers and chronograms



Continuing the theme of visualising phylogenies, one thing which strikes me is the parallel between genome browsers that display annotation "tracks" (such as the UCSC Genome Browser) and illustrations of "chronograms" with geological periods and accompanying data, such as sea levels, isotope levels, etc. In my haste I couldn't find an example with a sea-level track, but I know they exist. The chronogram at right comes from Steppan et al. (doi:10.1111/j.1095-8312.2003.00274.x). In both cases there is a natural co-ordinate system (genome location and time, respectively) going from left to right, and annotations that can be added using the same frame of reference.



Hence, wouldn't it be cool™ if we had a database of phylogenies that could be queried by time slices (see my earlier post on interval queries), and which would display a phylogeny together with user-selected annotation tracks (obtained, say, from external geological databases)?
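
A time-slice query of this kind could be sketched as a simple interval test. The assumptions here are that every branch is stored with the ages at which it starts and ends (in millions of years ago, so the older number comes first), and the tree and ages are invented for illustration.

```python
def lineages_at(branches, age):
    """Branches that exist at a single point in time."""
    return sorted(name for name, (older, younger) in branches.items()
                  if younger <= age <= older)

def lineages_in(branches, older, younger):
    """Branches whose time span overlaps the slice from older to younger."""
    return sorted(name for name, (b_old, b_young) in branches.items()
                  if b_young <= older and b_old >= younger)

# Made-up chronogram: branch name -> (start age, end age) in Myr ago.
# The root splits at 30 into A and a stem; the stem splits at 20 into
# B and C; C goes extinct at 5.
branches = {
    "A": (30, 0),
    "stem": (30, 20),
    "B": (20, 0),
    "C": (20, 5),
}

print(lineages_at(branches, 25))
print(lineages_at(branches, 2))
print(lineages_in(branches, 22, 18))
```

An annotation track such as sea level could be stored against the same time axis and filtered with the same test.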