Friday, September 28, 2012

Reading the Biodiversity Heritage Library using Readmill

tl;dr Readmill might be a great platform for shared annotation and correction of Biodiversity Heritage Library content.

Thinking about accessing the taxonomic literature, I started revisiting previous ideas. One is DeepDyve (see DeepDyve - renting scientific articles). Imagine not having to pay large sums for an article, but being able to rent it. Yes, open access would be great, but ultimately it's all a question of money (who pays and when). The challenge is to find the mix of models that encourages people to digitise the relevant literature. Instead of publishers insisting we pay US$30 for an article, how about renting it for the short time we actually need to read it?

Another model is unglue.it, a Kickstarter-like company that seeks to raise funds to digitise e-Books and make them freely available. unglue.it runs campaigns where people pledge donations, and if sufficient pledges are made the book's rights-holder has the book digitised and released DRM-free.

Looking at unglue.it I stumbled across Readmill, "a curious community of readers, highlighting and sharing the books they love." Readmill has an iPad app where you can highlight passages of text and add your own annotation. These annotations can be shared, and multiple people can read and comment on the same book. Imagine doing this on BHL content. You could highlight parts of the text where the OCR has failed, and provide a correction. You could highlight taxonomic names that automatic parsers have missed, geographic localities, cited literature, etc. All within a nice, social app.

Even better, Readmill has an API. You can retrieve highlights and the comments on those highlights. So, if someone flags a sentence as mangled OCR and provides a correction, that correction could be harvested and fed back to, say, BHL. These corrections could be used to improve searches, as well as the text delivered when generating searchable PDFs, etc.
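To make the harvesting idea concrete, here is a minimal Python sketch of what might happen once highlights have been retrieved. It does not call the Readmill API at all. The dictionary fields (`content`, `comment`) and the convention that a correction is a comment starting with `OCR:` are my own assumptions for illustration, not anything Readmill defines.

```python
# Hypothetical convention: a reader marks a correction by starting
# the comment on a highlight with this prefix.
CORRECTION_PREFIX = "OCR:"


def apply_corrections(ocr_text, highlights):
    """Apply reader-supplied corrections to a page of OCR text.

    `highlights` is a list of dicts with assumed keys:
      content - the highlighted passage, exactly as it appears in the text
      comment - the reader's comment on that passage
    Returns the corrected text and the list of highlights that were applied.
    """
    corrected = ocr_text
    applied = []
    for highlight in highlights:
        comment = highlight.get("comment", "")
        if not comment.startswith(CORRECTION_PREFIX):
            continue  # an ordinary comment, not a correction
        fix = comment[len(CORRECTION_PREFIX):].strip()
        passage = highlight["content"]
        # Only apply unambiguous corrections: the passage must occur exactly once.
        if fix and corrected.count(passage) == 1:
            corrected = corrected.replace(passage, fix)
            applied.append(highlight)
    return corrected, applied


page = "Tbe genus Rana was descrihed by Linnaeus."
highlights = [
    {"content": "Tbe genus", "comment": "OCR: The genus"},
    {"content": "descrihed", "comment": "OCR: described"},
    {"content": "Linnaeus", "comment": "Nice to see the original author."},
]
corrected, applied = apply_corrections(page, highlights)
print(corrected)
```

Skipping passages that occur more than once is deliberately cautious. A real harvester would want character offsets from the highlight itself, if the API supplies them, rather than matching on text.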

You can even add highlights via the API, so we could upload an ePub book and then add all the taxonomic names found by uBio or NetiNeti. Users could see which bits of text are probably names, and correct any mistakes along the way. Instead of facing a blank canvas, readers would already have annotations to start with.
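As a rough sketch of the pre-seeding step, the Python below turns a list of candidate names into highlight records with character offsets. The list of names stands in for the output of a name finder such as uBio or NetiNeti, which is not called here, and the record fields (`content`, `start`, `end`, `comment`) are hypothetical. Actually posting the records to Readmill is left out, since I haven't worked out what the API expects.

```python
import re


def build_highlights(text, names):
    """Locate each candidate name in the text and describe it as a highlight.

    `names` is assumed to come from a name-finding service. Each returned
    dict uses hypothetical field names: the matched passage, its start and
    end character offsets, and a default comment for the reader.
    """
    highlights = []
    for name in names:
        # Match whole words only, so "Rana" does not match inside "Ranavirus".
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            highlights.append({
                "content": match.group(0),
                "start": match.start(),
                "end": match.end(),
                "comment": "Possible taxonomic name",
            })
    # Return highlights in reading order.
    highlights.sort(key=lambda h: h["start"])
    return highlights


text = "Rana temporaria is common; Bufo bufo less so."
for highlight in build_highlights(text, ["Bufo bufo", "Rana temporaria"]):
    print(highlight["start"], highlight["end"], highlight["content"])
```

Keeping offsets alongside the text means a reader's later correction can be tied back to an exact position in the book, rather than to a string that might occur several times.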

Building an app from scratch to read and annotate BHL content would be a major undertaking. From my cursory initial look I wonder if Readmill might just provide the platform we need to clean up and annotate key parts of the BHL corpus?