Saturday, June 11, 2011

Mendeley Hack4Knowledge: towards an "ego wall"

I'm taking a virtual part in Mendeley's Hack4Knowledge event. I'm using this as a chance to explore some ideas about building novel interfaces to bibliographic data in Mendeley. One idea is to display a user's entire library in one screen. I think the user interfaces employed by most bibliographic software are too conservative and there are some cool things that could be done. For example, see A fluid treemap interface for personal digital libraries (doi:10.1145/1065385.1065512, PDF available from CiteSeer).

One idea I'm playing with is to display all a Mendeley user's papers as a quantum treemap, with thumbnails of the papers and "badges" indicating, for example, how many readers each paper has. The idea is that at a glance you can see all your publications, and which ones are being read the most. You can think of it as an "ego wall" — a quick way to see what others think about your work. Below is part of my library. You can see the full treemap here as an SVG file. Imagine this as an iPad interface to a user's Mendeley library.

Wall

Eventually I'll make this live. I'm not doing this yet because the script to create the visualisation is slow, owing to the multiple requests I need to make to get the necessary information. I have to get the list of a user's papers from Mendeley, then I call the API for each paper to get basic bibliographic details. I then have to screen scrape the corresponding paper's web page to get the thumbnail and the paper's UUID, which I can then use to get the readership stats via yet another API call. Sigh.
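To make the cost of that chain of requests concrete, here is a rough Python sketch of the pipeline described above. It is not the actual script: the four fetch functions are hypothetical placeholders passed in by the caller, and the dictionary keys ("title", "url", "thumbnail", "uuid") are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List


def build_wall_entries(
    list_paper_ids: Callable[[], List[str]],
    get_details: Callable[[str], Dict],
    scrape_page: Callable[[str], Dict],
    get_readers: Callable[[str], int],
) -> List[Dict]:
    """Collect what the wall needs for every paper in a library.

    All four callables are hypothetical stand-ins for the real requests.
    """
    entries = []
    for paper_id in list_paper_ids():        # one request for the whole library
        details = get_details(paper_id)      # one API request per paper
        page = scrape_page(details["url"])   # one page fetch per paper
        readers = get_readers(page["uuid"])  # one more API request per paper
        entries.append({
            "title": details["title"],
            "thumbnail": page["thumbnail"],
            "readers": readers,
        })
    return entries


def request_count(n_papers: int) -> int:
    """Requests made by build_wall_entries for a library of n_papers."""
    return 1 + 3 * n_papers
```

Under these assumptions a library of 200 papers needs 601 requests, made one after another, which is why caching the results (or a richer API) would matter before putting this live.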

Anyway, this is enough hacking for one day. Hope to spend some more time on this project tomorrow.



Wednesday, June 08, 2011

Adding Solr to BioStor: searching for real


Prompted by the appearance on the BHL blog of an article about BioStor I've been thinking about how to improve what is basically a fairly clunky tool.

One major weakness is searching the collection of nearly 40,000 articles extracted from BHL. Note the word "extracted." BioStor isn't a tool like PubMed or Google Scholar where the goal is to find articles on a topic. Instead it addresses a more specific question, namely whether a given article is contained in an item scanned by BHL. Confusion about this was one reason publication of my paper on BioStor (doi:10.1186/1471-2105-12-187) took so long to pass through the review stage.

However, users (myself included) expect to be able to search for articles. So, it's time to explore ways to make it easier to find articles within the BioStor database. I've junked the previous pretty crappy code I wrote and have started to play with the Solr search engine. I'd experimented with Solr a while ago, but other stuff got in the way. Today I've managed to add it to BioStor and do a preliminary indexing of the articles in BioStor. So far I'm only indexing basic bibliographic metadata, and displaying the first 30 hits, but already it's making it much easier to find interesting stuff in BioStor.
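For readers who haven't used Solr, a query of this kind boils down to an HTTP request to its select handler. The Python sketch below only builds the request URL. The parameters q, rows, wt, facet and facet.field are standard Solr query parameters, but the base URL and the field names in the usage example are assumptions, not BioStor's actual setup.

```python
from urllib.parse import urlencode


def solr_search_url(base_url, query, rows=30, facet_fields=()):
    """Build a URL for Solr's select handler.

    base_url is a hypothetical core URL; facet_fields are optional field
    names that Solr should return counts for.
    """
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return base_url.rstrip("/") + "/select?" + urlencode(params)
```

For example, `solr_search_url("http://localhost:8983/solr/biostor", "Darwin frogs", facet_fields=["year", "journal"])` asks for the first 30 hits plus counts per year and per journal, assuming the index has fields with those names.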

Solr also supports faceted searching (i.e., clustering results by categories such as year, author, journal). I don't do much with this yet, but there's clearly a lot of scope. I could also add taxonomic names, and even the OCR text to Solr, greatly expanding the ability to find articles. But that's for the future. For now, here are some interesting searches:




I wrote that: asserting authorship using the Mendeley API

Inspired by the forthcoming Hack4Knowledge I've put together a service that enables you to assert that you are the author of a paper using the Mendeley API.

If you are impatient, give it a try at:

http://iphylo.org/~rpage/hack4knowledge/iwrotethat/

To use it you need a Mendeley account. When you go to "I wrote that" you will be asked to connect to your Mendeley account. Once you've done that, enter the DOI or PubMed ID of a paper and, if the paper is in your Mendeley library and flagged as a paper you've authored, you should see something like this:

Wrote

The site can be a little sluggish as it needs to go through all of your publications one by one until it finds a match.

Why?
Imagine you have a web database that includes publications, and you want people to join your site as users. If they have publications in your database, you'd like your users to be able to say "I'm the author of those papers" or, more generally, the author you have as "Roderic D. M. Page" is me.

One way to do this would be to enable the users to sign in to your site using Mendeley (see my blog post Mendeley connect). Once they've done that, the user could select a publication and say "that's mine". How do we test this assertion? Well, if the user is indeed the author it is likely that they will have added it to their "My Publications" section in their Mendeley library. So, we can use the Mendeley API to get a list of the author's publications and see whether the publication they claim is, in fact, one of theirs.

The inspiration for this came from tools like Google Analytics, where in order to add the tool to your web site you need to convince Google that you own the site. One way to do this is to add some text supplied by Google to the HTML for your site, on the assumption that only you can do this (because it's your site). In the same way, only you can add papers to your Mendeley library. Of course, I'm assuming that Mendeley users are being trustworthy when they add papers to "My Publications" (i.e., they're not claiming authorship on papers they didn't write).

How?
This hack uses Mendeley's OAuth support (the same technology used by Twitter and Facebook to connect to other sites) to enable you to connect your Mendeley account to the "I wrote that" application (note that my app never sees your account name or password). I use the Mendeley API user authored method to get a list of your publications, and user library document details to retrieve details of each publication. I then compare the DOI or PMID you supplied with each publication, until I find one that matches. If none matches, then I've no evidence you authored that paper.
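The matching step can be sketched in a few lines of Python. This is an illustration rather than the code behind the site: it assumes each publication has already been fetched as a dictionary with optional "doi" and "pmid" keys (hypothetical names), and the DOI normalisation is an extra step added here so that differently formatted DOIs still compare equal.

```python
from typing import Dict, Iterable, Optional


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip common prefixes so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in ("doi:", "http://dx.doi.org/", "https://doi.org/"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def find_authored(identifier: str, publications: Iterable[Dict]) -> Optional[Dict]:
    """Return the first publication whose DOI or PMID matches, else None.

    publications may be a generator that fetches details one paper at a
    time, so the search stops as soon as a match turns up.
    """
    wanted_doi = normalise_doi(identifier)
    wanted_pmid = identifier.strip()
    for pub in publications:
        doi = pub.get("doi")
        if doi and normalise_doi(doi) == wanted_doi:
            return pub
        pmid = pub.get("pmid")
        if pmid is not None and str(pmid) == wanted_pmid:
            return pub
    return None
```

A result of None corresponds to the "no evidence you authored that paper" case. Passing a generator rather than a list keeps the one-by-one behaviour described above, which is also why a match near the end of a long publication list takes a while.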

Moan
No post about the Mendeley API would be complete without a moan about the state of the API. Apart from the fact that there is no function to directly find a publication in your library by DOI or PMID (hence I have to look at them all), there is virtually no support for retrieving any details about the user. For example, I wanted to brighten the web page up a little by adding a picture of the Mendeley user once they've logged in. There is no API function for this, nor a function to retrieve an identifier or URL for the user. Hence, in order to get a picture I screen scrape (yes, screen scrape) the Mendeley web page for the reference to get the URL for the linked author of the paper, then scrape the author's profile page and extract the URL for the image. This is insane. Please, please can we have a better API?

Thursday, June 02, 2011

Would you give me a grant? An experiment in Open Science

I would like to know what you think of a grant proposal I plan to submit to the UK Natural Environment Research Council at the end of the month. The proposal takes the notion of "dark taxa" explored in an earlier blog post and outlines three things I'd like to do:
  1. Quantify the extent of dark taxa (taxa in GenBank that don't have scientific names)
  2. Determine how many dark taxa are genuinely new species (as opposed to taxa that are known to science but simply haven't been labelled with their proper names)
  3. Explore what we can learn about a taxon's biology even if it lacks a scientific name (e.g., the "symbiome")

Given that I discuss most of my ideas on this blog, and deposit preprints in Nature Precedings before the corresponding manuscript is published, it seems a logical extension to make grant proposals open as well. So you can view the proposal on Google Docs, and you can add comments, if you wish.



Any feedback or suggestions are welcome. Do you think this is fundable? Have I made a good case for the proposed research? Is it interesting, or is it obvious, or has it already been done? Let me know what you think.