A coalition for scholarly annotation

On April 17th, the Annotating All Knowledge coalition gathered in Portland to begin the work of defining, designing, and implementing a common framework for scholarly collaboration. Members of the coalition include publishers, platform providers, librarians, standards makers, and technologists who share a common interest in annotation of all scholarly content for the benefit of scientists, humanists, students, teachers, and citizens.

Coalition members had already agreed on a set of key principles. Annotation must be based on open standards and open-source software. It must be possible to annotate key formats including HTML, PDF, EPUB, images, video, and data. Any standard annotation client must be able to work with any standard annotation server. And finally, the ecosystem should enable readers to choose from a variety of opt-in annotation layers.

The goals of the workshop were to review existing uses of annotation, discuss anticipated uses, consider opportunities and challenges from the perspective of both publishers and implementers, converge on a definition of interoperability, and identify next steps.

The survey of existing uses began with UCSD’s Anita Bandrowski, who presented an overview of SciBot, a tool that’s being used today to validate Research Resource Identifiers in scientific papers. Sebastian Karcher, who works with the Qualitative Data Repository at Syracuse, discussed an annotation-enhanced workflow for sharing, reusing, and citing qualitative data. GigaScience’s Nicole Nigoy presented the results of the Giga-Curation Challenge at Biocuration 2016. Saman Ehsan, from the Center for Open Science, highlighted the role annotation can play when researchers work together to reproduce studies in psychology. Mendeley’s William Gunn described annotation of research data as not merely a supplement to scholarly work, but potentially a primary activity. John Inglis, executive director of Cold Spring Harbor Laboratory Press, envisioned an annotation layer for bioRxiv. And Europe PMC’s Jo McEntyre showed an experimental system that mines text for entities (e.g. biomolecules) and automatically creates annotation layers that explain and interconnect them.

The bulk of the day was devoted to a series of group discussions about the user stories that open annotation can enable, and about the nature of the interoperable ecosystem in which those stories will play out. The wide-ranging conversations reflected the diversity of the assembled group, which included journal publishers, repository maintainers, librarians, researchers interested in both qualitative and quantitative methods, builders of publishing platforms, and representatives of standards bodies. Here are some of the user stories that emerged:

  • As an author of a paper, I want to invite a small group of colleagues to provide feedback.

  • As a pre-print publisher, I want to create and manage an overlay journal.

  • As a journal publisher, I want to implement a peer review system that brings authors and reviewers together in role-defined groups.

  • As a university librarian, I want students’ dissertations to be annotated openly so they carry forward an expectation that things should work that way.

  • As a biomedical researcher, I want to use human curation of the literature to develop examples we can use to train our text mining algorithms.

  • As a journal publisher, I want to create safe spaces for commentary so that authors and expert readers can participate comfortably.

  • As a vision-impaired user, I want annotation tools to be accessible to me.

  • As an annotator, I want to be able to assign DOIs to individual contributions, or to sets of them.

  • As an author, I want annotation to integrate with my preferred writing tool so that annotations can flow back into the text.

  • As a scientist, I want to participate in a journal club that travels around the web, annotating papers in many different publishing systems.

  • As an author, I want to annotate my own published work, noting errata and supplying new context.

  • As an editor, I want to document key editorial decisions and expose them to the public.

  • As a scientist, I want to annotate everywhere using my ORCID identity.

  • As a researcher, I want control over the annotations I make in various contexts so I can assemble this work into a coherent form and claim credit for it.

  • As a publisher, I want to commission expert annotators whose commentary adds value to my content.

  • As a user, I want to be able to upvote or downvote annotations so I can help improve the quality of discussion.

  • As a funder, I want to bring grant reviewers together in private annotation spaces where discussion is tightly coupled to the documents they are reviewing.

  • As a publisher, I want to be able to review and moderate annotations in my own authoritative layer.

  • As an author, I want to be able to annotate and interconnect the references I cite.

  • As a publisher, I understand that readers may view my content through a variety of lenses, but I want to present my own annotation layer as the authoritative one.

  • As the developer of a journal publishing system, I want to deliver a bundled annotation service that integrates with my identity system.

  • As a publisher, I want to project a curated view of a paper’s peer-review process onto the published paper.

While these stories were being discussed, other groups focused on interoperability. The touchstone example was email. We take it for granted that many kinds of email clients can work with many kinds of email servers. Clearly annotation clients and servers should work the same way. The W3C’s Web Annotation Working Group has laid a good foundation that defines a basic data model for annotations, and an API for exchanging them. To flesh out basic interop, we’ll also need to define how users authenticate, discover annotation layers, and carry individual and group identities across systems.
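To make the W3C foundation concrete, here is a minimal sketch of an annotation in the Web Annotation Data Model's JSON serialization, built in Python. The target URL and quoted text are invented for illustration; the field names (`@context`, `type`, `body`, `target`, `TextQuoteSelector`) come from the W3C model.

```python
import json

# A minimal annotation following the W3C Web Annotation Data Model.
# The source URL and selected text below are hypothetical examples.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "This claim needs a citation.",
        "format": "text/plain",
    },
    "target": {
        "source": "https://example.org/articles/123",
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "annotation must be based on open standards",
        },
    },
}

# Any standard client or server can serialize and exchange this structure.
print(json.dumps(annotation, indent=2))
```

Because the model is plain JSON-LD, any conforming client can produce a structure like this and any conforming server can store and relay it, which is exactly the email-style interoperability the group has in mind.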

In the scholarly world, of course, annotation will often require precision and semantic richness that we don’t expect in email. How will we relate annotations to domain-specific ontologies? We can’t expect to standardize semantics anytime soon. But we can design for progressive enhancement and graceful degradation. In a conversation between an enhanced client and an enhanced server, annotations about a gene sequence may convey meaning that’s unintelligible to basic clients and servers. That’s OK so long as all clients and servers can exchange basic annotations and none are confused by enhancements.
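One way to picture graceful degradation: a basic client simply ignores enhancement fields it doesn't understand, while still handling the core annotation. This is a rough sketch, not a specified behavior; the enhancement field name (`geneOntologyTerm`) and the set of "basic" fields are assumptions for illustration.

```python
# Hypothetical core fields a basic client understands; enhanced clients
# may attach additional domain-specific fields on top of these.
BASIC_FIELDS = {"@context", "type", "body", "target", "creator", "created"}

def degrade(annotation: dict) -> dict:
    """Return the annotation with unrecognized enhancement fields dropped."""
    return {k: v for k, v in annotation.items() if k in BASIC_FIELDS}

# An enhanced annotation: the "geneOntologyTerm" field is an invented
# example of semantic enrichment a basic client cannot interpret.
enhanced = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "BRCA1 variant discussed here."},
    "target": {"source": "https://example.org/articles/123"},
    "geneOntologyTerm": "GO:0006281",
}

# A basic client sees a perfectly valid core annotation; the
# enhancement degrades away without confusing anyone.
basic = degrade(enhanced)
print(sorted(basic))
```

The point is that enhancements ride along as extra structure: enhanced clients and servers interpret them, while basic ones pass them through or drop them without breaking.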

Brooks Hanson, Director of Publications for the American Geophysical Union, summed up the day with a list of goals for a scholarly annotation layer:

  • It must be built on an open but standard framework that enables global discovery and community enrichment.
  • It must support granular annotation of elements in all key formats, and across different representations of the same content (e.g. PDF vs HTML).
  • There must be a diversity of interoperable annotation systems.
  • These systems must be fully accessible to humans, who may need assistance to use them, and machines that will use APIs to create and mine annotations.
  • It must be possible to identify people, groups, and resources in global ways, so that sharing, discovery, and interconnection can span repositories and annotation services.

These are lofty goals. The energy and buzz in the room, all day long, reflected a shared sense that they are achievable, and that we’ve made demonstrable progress. What’s next? A number of subgroups talked about trials and prototypes. We’ll track those here as they emerge. Dr. Maryann Martone will be providing an update on the coalition’s progress to date at the I Annotate conference. Interested parties can join the coalition working group to monitor progress. Meanwhile, we’ll keep doing our part to help bring an ecosystem of scholarly annotation to fruition.