Really, You Can Annotate Anything

Well, any text, that is. Hypothesis is working with various partners on image and video annotation, but this post is about the range of texts that you can annotate using the app. For a long time, I’ve limited my pitch to teachers, telling them that they could collaboratively annotate readings with their students if the texts were online in some format. And really that format had to be HTML or PDF. And really, if a PDF, that PDF had to be OCRed–meaning not just an image of the text, but an image in which the text has been “optically recognized.”

OCRed PDF

If you deal with PDFs online, you’ve probably noticed that some are different from others. Some are really just images. You cannot select text. You cannot search text. And, of course, you cannot annotate text. In other cases, though, there’s a layer of text on top of the image. These PDFs have been OCRed and then the text has been recombined with the image. It’s a pretty cool technology that has come a long way in the last few years.
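If you have a folder of PDFs and want to know which ones already have that text layer, a rough check is easy to script. The sketch below is only a heuristic and rests on assumptions that are mine, not part of the workflow described in this post: it assumes the third-party `pypdf` package is installed, and the function names and the 20-character threshold are made up for illustration.

```python
from typing import Iterable, List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return whatever text can be extracted from each page of the PDF."""
    # Assumption: the third-party pypdf package is available (pip install pypdf).
    # It is imported here so the rest of the file loads without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # An image-only page typically yields an empty string.
    return [(page.extract_text() or "") for page in reader.pages]


def has_text_layer(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Guess whether a PDF is selectable/searchable.

    Returns True if any page has at least `min_chars` non-whitespace
    characters. The threshold is an arbitrary illustrative value.
    """
    for text in page_texts:
        non_space = "".join(text.split())
        if len(non_space) >= min_chars:
            return True
    return False
```

Calling `has_text_layer(extract_page_texts("reading.pdf"))` (with a hypothetical file name) should return `False` for a plain scan and `True` for an OCRed PDF, though a scan with a stray stamp or header of real text could fool it.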

I remember making course packets when I taught my own courses at UT-Austin while in grad school. I spent a long time curating these tomes–making sure my selection of readings was exhaustive, diverse, and a little quirky. I’d make Xeroxes of articles and then bring them to the copy shop, and the copy shop would make bound paper course packets available to students enrolled in my courses. They’d also give you a CD with a PDF copy of all the texts together.

Course packet binding options from Jenn’s Copy on Guadalupe

Using OCR technology, I could now create a version of this packet with the text recognizable digitally. Students would be able to search for key terms. They could copy and paste passages for their essays. More to the point, though, using Hypothesis I could go in and add my notes to the texts to guide and provoke my students as they did their reading–like editing my very own textbook or anthology! They could add their own private notes as they read. And they could have conversations in the margins with each other as well.

Text is not selectable on a regular PDF

An OCRed PDF in which text is selectable

Before the semester began I was talking with Lee Skallerup Bessette about using Hypothesis in her fall courses–if you read Prof Hacker, you may remember Lee asking other teacher-scholars last spring about their uses of collaborative annotation in the classroom. Lee wanted to have students collaboratively annotate selections from a contemporary novel she was teaching. I advised her it would probably be too difficult to do so given that she’d have to copy, OCR, and host the text. But then I actually tried to do it myself.

The key revelation for me was how easy it is to OCR a text. There are some free or partially free web-hosted converters that I have yet to try. I used Adobe Acrobat Pro. My test case was a Xerox of Joan Didion’s essay “Goodbye to All That” needed by a professor at NYU. Here’s the original PDF. Here’s the OCRed version. The conversion took less than a minute, tinkering with the workflow included. Open the latter, activate the Hypothesis extension, and you can annotate immediately. Do it! (If you haven’t installed the extension yet, you can use this “via” proxy link to play along.)
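If you don't have Acrobat Pro, the same step can probably be scripted. The sketch below swaps in a different tool from the one I used: the open-source `ocrmypdf` command-line program, which I'm assuming is installed and on your PATH. The function names and file names are hypothetical.

```python
import subprocess
from typing import List


def build_ocr_command(src: str, dst: str, language: str = "eng") -> List[str]:
    """Build the ocrmypdf command that turns a scanned PDF into a searchable one."""
    # --skip-text leaves pages that already contain text untouched.
    # -l selects the OCR language (here English by default).
    return ["ocrmypdf", "--skip-text", "-l", language, src, dst]


def ocr_pdf(src: str, dst: str, language: str = "eng") -> None:
    """Run OCR on `src` and write the searchable result to `dst`."""
    # Assumption: the ocrmypdf executable is installed; check=True raises
    # CalledProcessError if the tool reports a failure.
    subprocess.run(build_ocr_command(src, dst, language), check=True)
```

Something like `ocr_pdf("didion-scan.pdf", "didion-ocr.pdf")` would then stand in for the Acrobat step, with the output file being the one you host and annotate.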

Annotating the OCRed PDF using Hypothesis
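The “via” proxy link mentioned above is just the address of the document with `https://via.hypothes.is/` in front of it, so students who haven't installed the extension can still annotate. Here is a minimal sketch of building such links for a reading list; the helper name and the example URL are hypothetical.

```python
VIA_PREFIX = "https://via.hypothes.is/"


def via_link(document_url: str) -> str:
    """Return a Hypothesis proxy link for a web page or hosted PDF."""
    # The proxy expects a full web address, so reject anything else early.
    if not document_url.startswith(("http://", "https://")):
        raise ValueError("expected a full http(s) URL, got: " + document_url)
    return VIA_PREFIX + document_url


if __name__ == "__main__":
    # Hypothetical location of an OCRed reading.
    print(via_link("https://example.edu/readings/didion-ocr.pdf"))
```

You could paste the resulting link straight into a syllabus or course page.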

The ease of OCRing PDFs opens up a lot of possibilities for teachers using Hypothesis. Your texts don’t need to be digitally native, like news articles and other material already published on the web. If you can figure out where to host the content (more on this in a future post), you can Xerox any text and make it available for students to annotate using Hypothesis. This means that anything in print–anything in your local library, any text not yet digitized, all the contemporary literature not yet in the public domain–is potentially annotatable online: copy selections (see copyright note below), OCR the PDFs, and upload them to the web. (Some copiers and scanners in libraries have OCR built in, so steps one and two may be combined.)

This scanner at the UT-Austin Library allows you to select "searchable PDF" and email copies to yourself

A note on copyright: I’m not saying to copy copyrighted books and publish them on the web for all to access. It’s easy enough to make these OCRed PDFs available only to your students through a password-protected website–just make them log in to your WordPress page, for example, and host the content there. I believe–and I’m not a copyright lawyer; in fact, I avoid thinking about copyright like I avoid thinking about taxes–that sharing texts (or at least parts of them) in this way for educational purposes falls under “fair use” guidelines (see 2.2 here). And if you are already copying and distributing texts through your university library or copy shop, then the permissions have presumably already been granted.