      Not going to be the prettiest version, but at least somewhat OCR'd for annotating!



    1. Titi Lucreti Cari De Rerum Natura Libri Sex: With a Translation and Notes. Volume 1. Edited by H. A. J. Munro. Lucretius

      Testing out the OCR functionality of docdrop.org.

      I'm noticing that the PDF fingerprint of this text somehow matches that of other texts, since there are a lot of unrelated annotations on this page.

      Is docdrop doing something squirrelly with the fingerprint @dwhly?

    1. Apart from a basic segmenter taken from OCRopus, a trainable line extractor is in the process of being implemented. Full trainability of layout analysis is of utmost importance to a truly universal OCR system, as text layout and its semantics vary widely across time and space, e.g. hand-crafted methods for printed Latin text are unlikely to work reliably on Arabic text or manuscripts with extensive interlinear annotation.

      WIP implementation of line segmentation in kraken.
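
      For anyone wondering what a "hand-crafted" line segmenter looks like, here is a minimal sketch of the classic horizontal projection profile approach. This is not kraken's or OCRopus's implementation, just an illustration; the function name `segment_lines`, the `min_ink` threshold, and the toy `page` data are all made up for the example. It assumes an already binarized page (1 = ink, 0 = background) with straight, well-separated lines, which is exactly the assumption that breaks down on manuscripts with interlinear annotation.

```python
from typing import List, Tuple


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Split a binarized page into text lines using a horizontal projection profile.

    `image` is a list of pixel rows where 1 means ink and 0 means background.
    Returns (start_row, end_row) pairs, with end_row exclusive.
    `min_ink` is a hypothetical threshold: rows with fewer ink pixels count as blank.
    """
    # Count the ink pixels in every row of the page.
    profile = [sum(row) for row in image]

    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # first inked row of a new line
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # blank row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # line runs to the bottom edge
    return lines


# Toy 5x4 "page": two text lines separated by one blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(segment_lines(page))
```

      A fixed rule like this has nothing to learn from data, which is why a trainable line extractor is attractive for layouts where lines touch, curve, or interleave.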

    1. Adobe AcrobatPro.

      gImageReader is an excellent open source alternative. It runs on both Windows and Linux, and it provides a simple (yet powerful) GUI frontend to Google's robust open source OCR engine, Tesseract.

      I think an open source tool such as this is a better fit for the open annotation ecosystem that Hypothesis promotes, which is based on libre software and standards, than a proprietary (and expensive) tool such as Adobe AcrobatPro.
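
      If you would rather skip the GUI, Tesseract can also be called directly. Below is a rough sketch, assuming the `tesseract` command-line tool is installed and on your PATH and that the Latin language data (`lat`) is available. The helper names `build_tesseract_command` and `ocr_to_pdf` and the file name `page-001.png` are hypothetical, chosen only for this example. The trailing `pdf` argument asks Tesseract to write a PDF with a text layer, which is what you need for annotating.

```python
import shutil
import subprocess
from typing import List


def build_tesseract_command(image_path: str, output_base: str, lang: str = "lat") -> List[str]:
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG pdf

    Tesseract appends the ".pdf" extension to OUTPUTBASE itself.
    The default language "lat" assumes the Latin traineddata is installed.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def ocr_to_pdf(image_path: str, output_base: str, lang: str = "lat") -> str:
    """Run Tesseract on one page image and return the path of the PDF it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return output_base + ".pdf"


# Show the command that would be run for a hypothetical scanned page.
print(" ".join(build_tesseract_command("page-001.png", "page-001")))
```

      Calling `ocr_to_pdf("page-001.png", "page-001")` would then produce `page-001.pdf`, provided the image exists and Tesseract is installed.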

