13 Matching Annotations
  1. Mar 2023
    1. Google Books .pdf document equivalence problem #7884

      I've noticed on a couple of .pdf documents from Google Books that their fingerprints, the lack thereof, or some other glitch in establishing document equivalence seem to clash, creating orphaned annotations.

      For example, the downloadable .pdf of Geyer's Stationer 1904 found at https://www.google.com/books/edition/Geyer_s_Stationer/L507AQAAMAAJ?hl=en&gbpv=0 currently has 109 orphaned annotations caused by this issue.

      See also a specific annotation on this document: https://hypothes.is/a/vNmUHMB3Ee2VKgt4yhjofg

  2. Jan 2022
    1. But Google also uses optical character recognition to produce a second version, for its search engine to use, and this double process has some quirks. In a scriptorium lit by the sun, a scribe could mistakenly transcribe a “u” as an “n,” or vice versa. Curiously, the computer makes the same mistake. If you enter qualitas—an important term in medieval philosophy—into Google Book Search, you’ll find almost two thousand appearances. But if you enter “qnalitas” you’ll be rewarded with more than five hundred references that you wouldn’t necessarily have found.

      I wonder how much Captcha technology may have helped to remedy this in the intervening years?
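      The "u"/"n" confusion described in the quote suggests one workaround on the searcher's side: expand a query into its likely OCR misreadings before searching. The following is a minimal sketch under that assumption; the function name `ocr_variants` and the single confusable pair are hypothetical illustrations, not anything Google Book Search is known to do.

```python
from itertools import product


def ocr_variants(term, confusable=("u", "n")):
    """Return every spelling of `term` obtained by swapping the two
    confusable letters at each position where either one occurs."""
    a, b = confusable
    # Each position offers both letters if it holds a confusable one,
    # otherwise only the original character.
    options = [(a, b) if ch in (a, b) else (ch,) for ch in term]
    return sorted({"".join(combo) for combo in product(*options)})


variants = ocr_variants("qualitas")
print(variants)
```

      Searching for each variant in turn would surface the "qnalitas" hits alongside the correctly transcribed ones. Note that the number of variants doubles with every confusable letter in the term, so this is only practical for short queries.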

  3. Jul 2021
    1. Finding these kinds of sites can be tough, especially if you’re looking for authentic 1990s sites and not retro callbacks, since Google seems to refuse to show you pages from over 10 years ago.

      I think I've read this bit about Google forgetting from Tim Bray(?) before. It would be useful to have additional corroboration for it.

      Not being able to rely on Google means that one's own personal repositories of data in a commonplace book become far more valuable in the search proposition. This means that Google search is more of a discovery mechanism than a source of the sort of personalized search people may be looking for.

  4. May 2021
    1. So the proper unit for this kind of exploratory, semantic search is not the file, but rather something else, something I don't quite have a word for: a chunk or cluster of text, something close to those little quotes that I've assembled in DevonThink. If I have an eBook of Manuel DeLanda's on my hard drive, and I search for "urban ecosystem" I don't want the software to tell me that an entire book is related to my query. I want the software to tell me that these five separate paragraphs from this book are relevant. Until the tools can break out those smaller units on their own, I'll still be assembling my research library by hand in DevonThink.

      Search on documents returning something in the neighborhood of 500 words or so seems to be the right amount of information. One wants a few paragraphs related to an idea and not an entire book which takes longer to scan.

      Google search does this type of search, and it's what Google Books attempts to do as well when searching specifically there.
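      The idea of searching chunks rather than files can be sketched in a few lines. This is a minimal illustration, assuming plain-text documents whose paragraphs are separated by blank lines and a simple all-terms-must-appear match; the names `split_into_chunks`, `search_chunks`, and the sample `library` are hypothetical, and this is not how DevonThink or Google Books actually works.

```python
import re


def split_into_chunks(text):
    """Split a document into paragraph-sized chunks on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def search_chunks(documents, query, limit=5):
    """Return up to `limit` (title, chunk) pairs whose chunk contains
    every term of the query, ignoring case."""
    terms = query.lower().split()
    hits = []
    for title, text in documents.items():
        for chunk in split_into_chunks(text):
            lowered = chunk.lower()
            if all(term in lowered for term in terms):
                hits.append((title, chunk))
    return hits[:limit]


# Hypothetical sample data standing in for an eBook on disk.
library = {
    "Sample book": (
        "Cities grow like reefs.\n\n"
        "An urban ecosystem recycles energy and matter.\n\n"
        "Rivers shape trade routes."
    ),
}

results = search_chunks(library, "urban ecosystem")
for title, chunk in results:
    print(f"{title}: {chunk}")
```

      The result points at the one relevant paragraph rather than at the whole book, which is the unit of retrieval the quoted passage is asking for.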

    1. But in response there has been no serious attempt by digital media developers to engage in a constructive public dialogue with historians of information and leading librarians. There is, perhaps, a reason for this. As Geoffrey Nunberg starkly revealed in 2009 in the Chronicle of Higher Education, Google cannot celebrate the history of indexing and cataloguing because it would draw attention to its matrix of errors. As of yet, Google Books does not work as an accurate system of cataloguing and searching for books. Nunberg showed that the seemingly clunky nineteenth-century Library of Congress Classification system is still more accurate. So intellectual history can still offer practical models and lessons to the titans of the Web.

      The Information emperor has no clothes.

  5. Jul 2019
    1. Kahle has been critical of Google's book digitization, especially of Google's exclusivity in restricting other search engines' digital access to the books they archive. In a 2011 talk Kahle described Google's 'snippet' feature as a means of tip-toeing around copyright issues, and expressed his frustration with the lack of a decent loaning system for digital materials. He said the digital transition has moved from local control to central control, non-profit to for-profit, diverse to homogeneous, and from "ruled by law" to "ruled by contract". Kahle stated that even public-domain material published before 1923, and not bound by copyright law, is still bound by Google's contracts and requires permission to be distributed or copied. Kahle reasoned that this trend has emerged for a number of reasons: distribution of information favoring centralization, the economic cost of digitizing books, the issue of library staff without the technical knowledge to build these services, and the decision of the administrators to outsource information services.

  6. Apr 2019
    1. technology companies have made it work that way. Ebook stores from Amazon, Apple, Google, Kobo, Barnes and Noble all follow broadly the same rules. You’re buying a licence to read, not a licence to own.

      Bear in mind that this "ownership" is common practice with Amazon, Apple, Google, Kobo, Barnes and Noble, and others as well.

      It's not this way with DRM-free books, which you can download and reuse as you would physical books.

  7. Aug 2017
    1. without downloading or reading them.

      This is cool but the "reading of them" is the more radical proposition.

    2. Many of the university’s holdings “were invisible to the world,” Coleman says. Google’s involvement promised to change that.

      An important point for those who might immediately dismiss anything Google-related.

    3. “It’s hard to imagine going through a day doing the work we academics do without touching something that wouldn’t be there without Google Book Search,”

      But this is a statement that would align with Somers's lament above, no?

    4. a persistent cultural challenge: how to balance copyright and fair use and keep everybody—authors, publishers, scholars, librarians—satisfied. That work still lies ahead.

      I'll be very interested to see how this gets negotiated moving forward.

  8. Jun 2017
    1. literature became data

      Doesn't this obfuscate the process? Literature became digital. Digital enables a wide range of further activity to take place on top of literature, including, perhaps, its datafication.