2 Matching Annotations
  1. May 2022
    1. The new lines you mention really are present in the text content of the element. HTML tags are not being replaced by new lines, they just get omitted entirely. If you look at the textContent property of the <p> element you selected in the browser console, and you'll see the same new lines. Also if you select the text and run window.getSelection().getRangeAt(0).toString() in the browser console you'll see the same new lines. In summary, this is working as it is currently expected to. What I think may have been surprising here is that the captured text is not the same as what would be copied to the clipboard. When copying to the clipboard, new lines in the source get replaced with spaces, and <br> tags get converted to new lines. Browser specifications distinguish the original text content of HTML "in the source" as returned by element.textContent from the text content "as rendered" returned by element.innerText. Hypothesis has always captured quotes from and searched for quotes in the "source" text content rather than the "rendered" text. This behavior causes issues with line breaks as well. It might make sense for us to look at capturing the rendered text (as copied to the clipboard) rather than the source text in future. We'd need to be careful to handle all the places where this distinction comes up, and also make sure that all existing annotations anchor properly. Also we should talk to other parties interested in the Web Annotations specifications to discuss how this impacts interoperability.
      What I think may have been surprising here is that the captured text is not the same as what would be copied to the clipboard. When <mark>copying to the clipboard, <mark style="background-color: #8000314f">new lines in the source</mark> get <mark style="background-color:#00800030">replaced with spaces</mark>, and <br> tags get converted to new lines</mark>. </br> <mark>Browser specifications distinguish <mark style="background-color: #00800036">the original text content of HTML "in the source"</mark> as returned by <mark style="background-color: #00800036"/>element.textContent</mark> from <mark style="background-color: #ffa500a1">the text content "as rendered" returned by element.innerText.</mark></mark> Hypothesis has always captured quotes from and searched for quotes in the "source" text content rather than the "rendered" text.
    1. Differences from innerText Don't get confused by the differences between Node.textContent and HTMLElement.innerText. Although the names seem similar, there are important differences: textContent gets the content of all elements, including <script> and <style> elements. In contrast, innerText only shows "human-readable" elements. textContent returns every element in the node. In contrast, innerText is aware of styling and won't return the text of "hidden" elements. Moreover, since innerText takes CSS styles into account, reading the value of innerText triggers a reflow to ensure up-to-date computed styles. (Reflows can be computationally expensive, and thus should be avoided when possible.) Both textContent and innerText remove child nodes when altered, but altering innerText in Internet Explorer (version 11 and below) also permanently destroys all descendant text nodes. It is impossible to insert the nodes again into any other element or into the same element after doing so.