Guidance for Web Publishers

By Ed Summers | 10 June, 2013

Hypothesis enables you to annotate the Web. However, some things work better if we know a little more about the Web resource that you are annotating.

For example, imagine you are annotating the second page of a New York Times article. You probably want to see your annotation when you are looking at the article later as a single page, right? Or perhaps you’ve annotated the HTML for a PLOS ONE article. Wouldn’t you like to see those annotations when you are looking at the PDF version of the same article? If annotations were only associated with the URL you happened to be looking at in your browser then the scenarios above would not work, because the documents being annotated all have different URLs.

Fortunately, there are some simple, widely adopted techniques publishers can use to make their content work well in these annotation scenarios. If you are a Web publisher and would like your website to work well with Hypothesis, the guidance offered below may be helpful. These practices are endorsed by major search engines like Google and Bing, social media sites like Facebook and Pinterest, as well as scholarship tools like Mendeley and Zotero. So you won’t just be getting your site’s content to work well with Hypothesis and annotation; you will be making it fit better into the entire Web.

Canonical Links

To simplify the job of Web crawlers, and to give publishers more control over how their content shows up in search results, the major search engines support canonical links. Google introduced the canonical link, which was later specified in RFC 6596 and is now used widely on the Web to indicate the preferred version of a Web resource. This can be handy for Search Engine Optimization when the number of links to a page significantly changes its rank in search results, and when there is a preferred URL for a resource that you would like search results to link to.

As an example, take a look at the HTML for page 2 of this ArsTechnica article. Do you see how it contains this <link> element?

<head>
...
<link rel="canonical" href="http://arstechnica.com/information-technology/2013/05/native-level-performance-on-the-web-a-brief-examination-of-asm-js/" />
...
</head>

This <link> element lets ArsTechnica establish a single URL for the multi-page article, so when you click on the article in Google search results you are taken to that URL. Also, the relevancy ranking (or Google Juice) for the article reflects the total number of inbound links to the article. Canonical links are good for SEO, but they’re also good for Hypothesis, since they allow us to cluster your annotations around a single URL and show them to you in whatever view of the resource you happen to be looking at in your browser. Please consider adding canonical links to your pages if it’s possible for there to be more than one URL for your resources.
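To make the clustering idea concrete, here is a minimal sketch of how a client might read a canonical link and group annotations under it. It is an illustration only, not a description of how Hypothesis is implemented: the function names, the example.com URLs, and the (url, note) annotation shape are all assumptions made for this example.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel holds space-separated tokens that are matched case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


def canonical_url(html, page_url):
    """Return the canonical URL declared in html, or page_url if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    if parser.canonical is None:
        return page_url
    # A relative href is resolved against the URL of the page that declared it.
    return urljoin(page_url, parser.canonical)


def group_by_canonical(annotations, pages):
    """Group (url, note) pairs by the canonical URL of the page they were made on.

    pages is a hypothetical mapping from page URL to that page's HTML.
    """
    groups = defaultdict(list)
    for url, note in annotations:
        groups[canonical_url(pages[url], url)].append(note)
    return dict(groups)
```

With this sketch, a note made on page 2 of an article and a note made on its single-page view end up in the same group, provided both pages declare the same canonical URL.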

Alternate Links

Alternate links are a defined part of HTML, and allow you to indicate when a Web resource is available in another representation, at another URL. Alternate links have been used for years by the blogging community for auto-discovery of syndicated feeds (RSS or Atom) associated with a website. They are also useful in situations where you would like to let clients know that a PDF view of a Web page is available, as in the HTML for this PeerJ article, which contains this link:

<link rel="alternate" type="application/pdf" href="http://peerj.com/articles/1.pdf">

Alternate links are hints to a Web client about closely related resources that might be of interest, and are generally a good thing. They are especially important to Hypothesis, since we allow you to annotate documents in multiple formats (e.g. HTML and PDF), and it is useful to show all annotations for a document independent of its serialization.
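As a rough illustration of how a Web client could discover these hints, the sketch below collects the alternate links in a page together with their media types. The class and function names are hypothetical, and this is not meant to describe how Hypothesis itself reads pages.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collects (media type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        # "alternate stylesheet" names an alternative style sheet, not another
        # representation of the document, so it is skipped here.
        if "alternate" in rels and "stylesheet" not in rels and attr.get("href"):
            self.alternates.append((attr.get("type"), attr["href"]))


def alternate_links(html):
    """Return every alternate representation declared in html, in document order."""
    parser = AlternateLinkParser()
    parser.feed(html)
    return parser.alternates
```

Given the PeerJ link above, alternate_links would return a single pair of "application/pdf" and the PDF URL, which a client could then treat as another view of the same document.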

Structured Metadata

As a result of the rapid expansion of the Web, there are many ways to share structured metadata in your Web pages. Patterns for structured data often reflect the requirements of an application, the needs of a particular community, and sometimes the generalized nature of the Web at large. When you annotate a page, Hypothesis looks for this metadata to help identify the resource that you are annotating. Metadata is essential for understanding the types of resources that are being annotated, and for presenting these annotations later. Because of the important role that annotation plays in scholarship and research, Hypothesis’ initial support for metadata is based on what Mendeley and Google Scholar provide. But we are always looking for new types of metadata to support. Hypothesis currently supports:

  • Highwire Press
  • Eprints
  • PRISM
  • Dublin Core
  • Facebook’s OpenGraph Protocol
  • Twitter Cards

As an example, here is some Highwire Press metadata that is included in this article from PLOS ONE:

<head>
...
<meta name="citation_publisher" content="Public Library of Science"/>
<meta name="citation_title" content="Neural Substrate of Cold-Seeking Behavior in Endotoxin Shock"/>
<meta name="citation_journal_title" content="PLOS ONE" />
<meta name="citation_author" content="Maria C Almeida"/>
<meta name="citation_author" content="Alexandre A Steiner"/>
<meta name="citation_author" content="Luiz G S Branco"/>
<meta name="citation_author" content="Andrej A Romanovsky"/>
<meta name="citation_doi" content="10.1371/journal.pone.0000001"/>
<meta name="citation_date" content="2006/12/20"/>
<meta name="citation_pdf_url" content="http://dx.plos.org/10.1371/journal.pone.0000001.pdf" />
<meta name="citation_issn" content="1932-6203"/>
<meta name="citation_issue" content="1"/>
<meta name="citation_volume" content="1"/>
...
</head>
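Tags like these are straightforward for software to read. The following sketch, which assumes nothing beyond the citation_ naming pattern shown above, gathers them into a dictionary. Repeated names such as citation_author are kept as lists so that author order is preserved. The names used in the code are hypothetical and do not reflect Hypothesis internals.

```python
from collections import defaultdict
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collects the content of <meta> elements whose name starts with "citation_"."""

    def __init__(self):
        super().__init__()
        self.fields = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if name.startswith("citation_") and content:
            # A name may repeat (one tag per author), so values accumulate.
            self.fields[name].append(content)


def citation_metadata(html):
    """Return a dict mapping each citation_* name to its list of values."""
    parser = CitationMetaParser()
    parser.feed(html)
    return dict(parser.fields)
```

A client could use the citation_doi or citation_pdf_url values from the result as additional identifiers for the document being annotated.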

Publishing metadata about your content is a useful thing to do. We encourage you to publish what makes sense in your context, and to let us know if there are types of metadata that you would like us to support.
