11 Matching Annotations
  1. Feb 2021
  2. Jan 2021
    1. Apart from a basic segmenter taken from OCRopus, a trainable line extractor is in the process of being implemented. Full trainability of layout analysis is of utmost importance to a truly universal OCR system, as text layout and its semantics vary widely across time and space, e.g. hand-crafted methods for printed Latin text are unlikely to work reliably on Arabic text or manuscripts with extensive interlinear annotation.

      Work-in-progress (WIP) implementation of line segmentation in kraken.

  3. Oct 2020
  4. Jul 2020
  5. Apr 2020
    1. Adobe AcrobatPro.

      gImageReader is an excellent open source alternative. It runs on both Windows and Linux, and it provides a simple (yet powerful) GUI frontend to Tesseract, the robust open source OCR engine long sponsored by Google.

      I think an open source tool such as this is a better fit for the open annotation ecosystem that Hypothesis promotes, which is based on libre software and standards, than a proprietary (and expensive) tool such as Adobe AcrobatPro.

  6. Apr 2019
  7. Sep 2015
  8. Aug 2015