20 Matching Annotations
  1. Last 7 days
    1. We're going back to the basics today for the non-technical people to explain what an “index” is and why indexes are important to making your search engine work cost-effectively at scale. Imagine you walked into a library back in the day before computers and asked the librarian to find you every book that mentioned the word "gazebo". You would probably get some pretty weird looks because it would be horribly inefficient for the librarian to go through every single book in the library to satisfy your obscure query. It would likely take months or even years to do a single query. Now imagine you asked them for every book in the library by “Hunter S Thompson”. That would be a piece of cake, but why? That’s because the library maintains an index of all the books that come in by title, author, etc. Each index is just a list of possible values that people would be searching for. In our example, the author index is an alphabetical list of author names with the specific book names and locations where you can find the whole book, so you can get all the other information contained in the book. The index is built before any search is ever made. When a new book comes into the library, the librarian breaks out those old index cards and adds it to the related indexes before the book ever hits the shelves. We use this same technique when working with data at scale. Let’s circle back to that first query for the word "gazebo". Why wouldn’t the library maintain an index for literally every word ever? Imagine a library filled with more index cards than books. It would be virtually unusable. The entry for a common word like “the” would likely contain the names of every book in the library, rendering that index completely useless. I have seen databases where the indexes are twice the size of the data actually being indexed, and that quickly hits diminishing returns.
It is a delicate balance for people like me who engineer these giant scalable search engines: getting the performance we need without flooding our virtual library (the database) with unneeded indexes.

      via u/schematical at https://reddit.com/user/schematical/comments/1oe41bx/what_is_a_database_index_as_explained_to_a_1930s/
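
The library analogy in the quoted passage can be sketched in a few lines of Python. This is a minimal illustration, not the poster's implementation: the catalogue, the placeholder titles, and the function names (`build_author_index`, `scan_for_word`, `build_word_index`) are all invented here. It contrasts an author index built before any search with a full scan of every book, and shows why an index on every word bloats: the entry for "the" ends up listing every book.

```python
from collections import defaultdict

# Hypothetical catalogue invented for illustration: (title, author, text).
BOOKS = [
    ("Title A", "Hunter S Thompson", "the road ran through the desert"),
    ("Title B", "Hunter S Thompson", "the gazebo stood behind the bar"),
    ("Title C", "Some Other Author", "the plans for a gazebo"),
]


def build_author_index(books):
    """Built once, before any search: author -> titles (the index cards)."""
    index = defaultdict(list)
    for title, author, _text in books:
        index[author].append(title)
    return index


def scan_for_word(books, word):
    """No index: read the text of every single book, like the librarian."""
    return [title for title, _author, text in books if word in text.split()]


def build_word_index(books):
    """An index on literally every word: word -> set of titles."""
    index = defaultdict(set)
    for title, _author, text in books:
        for word in text.split():
            index[word].add(title)
    return index


author_index = build_author_index(BOOKS)
by_author = author_index["Hunter S Thompson"]  # one lookup, no scanning
with_gazebo = scan_for_word(BOOKS, "gazebo")   # touches every book

word_index = build_word_index(BOOKS)
the_entry = word_index["the"]                  # lists every book in the library
```

The author lookup costs one dictionary access regardless of library size, while the word scan grows with the total amount of text. The word index makes "gazebo" fast too, but only by storing an entry for every distinct word, including ones like "the" that narrow nothing down.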

      Perhaps it's a question of the "long search" versus the "short search"? Long searches with proper connecting tissue are more often the thing that produces innovation out of serendipity and this is the thing of greatest value versus "What time does the Superbowl start?". How do you build a database index to improve the "long search"?

      See, for example Keith Thomas' problem: https://hyp.is/DFLyZljJEe2dD-t046xWvQ/www.lrb.co.uk/the-paper/v32/n11/keith-thomas/diary

  2. Jul 2025
  3. Aug 2024
  4. Jun 2024
  5. Apr 2024
    1. t our index would refer us to electricity wherever mentioned in the text of our literature if usable information is given, and it should also tell us something more—the aspect under which it is treated in each case. Whether and from what aspect information is usable, that we must decide for ourselves.

      Kaiser speaks here of the issue of missing index entries in commercial and even library-based indexes versus the personal indexing of one's own card index/zettelkasten.

      Some of the problem comes down to a question of scale as well as semantics, but there's also something tied up with the levels of specificity, from broad category headwords to more specific ones, and finally down to the level of individual ideas. Some of this can be seen in the levels of specificity within the Syntopicon, though there aren't any (?, double-check) links from one idea directly to another.

      Note that while there may be direct links from a single idea to another, there is still infinite space by which one can interpose additional ideas between them.

    2. That is not the case. It is true, a variety of published indexes, catalogues and bibliographies to periodical and other literature exists, but they do not and cannot meet our individual case, for 1 Every individual moves in a sphere of his own and covers individual ground such as a printed index cannot touch. 2 Printed indexes although they give usable information, cannot go sufficiently into details, they must study above all the common requirements of a number of subscribers sufficiently large to assure their existence and continuance (apart from the question of advertising).

      Kaiser's argument for why building a personal index of notes is more valuable than relying on the indexes of others.

      Note that this answer still stands firmly even after the advent of the Mundaneum, Google, and other digital search methods (not to mention his statement about ignoring advertising, which obviously had irksome aspects even in 1911). Our needs and desires are idiosyncratic, so our personal indexes are going to be eminently more valuable to us over time because of these idiosyncrasies. Sure, you could just Google it, but Google answers stand alone and don't build you toward insight without the added work of creating your own index.

      Some of this is bound up in the idea that your own personal notes are far more valuable than the notes someone else may have taken and passed along to you.

  6. Mar 2024
    1. Registers refer to the materials and help to locate them, indexes refer to the information contained in these materials. As their function, so their construction is quite distinct. Both however treat the same materials, only in different ways. In some offices no indexes may be required.

      Does this fit in with his prior definitions of these things?

    2. Accuracy This is one of the chief claims of the card system. 63 To increase accuracy in filing, the materials are always arranged numerically. We thereby approach as nearly as possible to mathematical exactness. The advantages of the card system become more and more apparent as the files increase in bulk, and accuracy must remain a constant factor in all work connected with it. It will also bring its reward in the smooth working of the files and the immediate accessibility of anything required. In accuracy might be included consistency, which is indispensable for effective work (356).

      In modern, digital settings, the work of appropriately indexing content is lost in exchange for other forms of organization (tagging, for example). This means one is less reliant on an index for looking up material and more reliant on concordance search of particular words within an ever-growing corpus of collected knowledge.

      Over time and with scale, simple tagging may become overwhelming as a search method for finding the requisite material, even when one knows it exists.

      As a result a repository may do better in the long run with a small handful of carefully applied rules from the start.

    3. The text in this book is numbered by paragraphs and where a subject is treated in more than one place, the numbers in brackets indicate the additional paragraphs bearing on the subject under discussion.

      ¶5

      The book is ostensibly in the form of a card index, with numbered paragraphs laid out in running order to create a book. The index is also keyed to these paragraph numbers rather than to pages, as has traditionally been done.

      As a result, one could cut up the book (or two copies to get both sides) and turn it back into a card index with very little work.
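
The structure described here (numbered paragraphs, bracketed numbers as cross-references, and an index keyed to paragraph numbers) can be modelled roughly as below. This is only a sketch: the paragraph wording, the `SUBJECT_INDEX` entries, and the function names are invented placeholders, not Kaiser's text. Only the bracketed-number convention, such as "(356)", is taken from the quoted passages.

```python
import re

# Hypothetical "cards": paragraph number -> paragraph text (wording invented).
PARAGRAPHS = {
    5: "Placeholder paragraph on how the text is numbered.",
    63: "Placeholder paragraph on accuracy and consistency (356).",
    356: "Placeholder paragraph on consistency.",
}

# Hypothetical subject index keyed to paragraph numbers rather than pages.
SUBJECT_INDEX = {
    "accuracy": [63],
    "consistency": [63, 356],
}


def cross_references(text):
    """Return the bracketed paragraph numbers mentioned in a paragraph."""
    return [int(number) for number in re.findall(r"\((\d+)\)", text)]


def cards_for(subject):
    """Pull the 'cards' (number, text) filed under a subject heading."""
    return [(n, PARAGRAPHS[n]) for n in SUBJECT_INDEX.get(subject, [])]


links = {n: cross_references(text) for n, text in PARAGRAPHS.items()}
consistency_cards = cards_for("consistency")
```

Because every entry points at a paragraph number, each paragraph can be lifted out as a standalone card without breaking the index or the cross-references, which is what makes the cut-up idea workable.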

  7. Jun 2023
    1. Dr Lisa van Aardenne, the chief scientist of the University of Cape Town’s climate system analysis group, discussed the use and utility of thermal stress indices. She pointed out that, by the definitions of the universal thermal climate index, much of Africa is under heat stress most days of the year.  Van Aardenne noted that these indices have been developed from a European perspective and do not align with the reality on the ground in Africa. She added: “I’m very concerned that these indices are not fit for purpose here.”

      So for Africa, the figures are so bad that they always look like they're in an emergency? I'm guessing the impact would be that people are more likely to ignore them

  8. Jan 2023
    1. CREATE INDEX Statement

      ABOUT INDEXES: Indexes can make queries faster, much like looking for a term in the index of a textbook instead of skimming through all the pages in the book to find all the references to the term. However, indexes take storage space, and the DBMS must maintain the index as rows are inserted, updated and deleted in the table.
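
The trade-off in this note can be observed with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: SQLite stands in for whatever DBMS the quoted documentation describes, and the `books` table, the `idx_books_author` index, and the sample rows are invented for illustration. It asks the query planner how it would run the same query before and after `CREATE INDEX`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in DBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
)
# Hypothetical sample rows: 1000 titles spread across 50 authors.
conn.executemany(
    "INSERT INTO books (title, author) VALUES (?, ?)",
    [(f"Title {i}", f"Author {i % 50}") for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)  # 4th column holds the detail text


query = "SELECT title FROM books WHERE author = ?"

plan_before = query_plan(query, ("Author 7",))  # full table scan

# The index costs storage and must be maintained on every write.
conn.execute("CREATE INDEX idx_books_author ON books (author)")

plan_after = query_plan(query, ("Author 7",))   # search via the index

matches = conn.execute(query, ("Author 7",)).fetchall()
```

Printing `plan_before` and `plan_after` shows the planner switching from a scan of the table to a search that uses `idx_books_author`. The exact wording of the plan text varies between SQLite versions.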

  9. Jun 2020
  10. Jun 2017
  11. Dec 2016
    1. The real benefit of JSONB: Indexes We want our application to be fast. Without indexes, the database is forced to go from record to record (a table scan), checking to see if a condition is true. It’s no different with JSON data. In fact, it’s most likely worse since Postgres has to step into each JSON document as well.

      This solves the problem of the last implementation I handled, where json (not jsonb) data was stored in Postgres.