45 Matching Annotations
  1. Sep 2019
    1. To address the availability concern, new architectures were developed to minimize the impact of partitions. For instance, splitting data sets into smaller ranges called shards can minimize the amount of data that is unavailable during partitions. Furthermore, mechanisms to automatically alter the roles of various cluster members based on network conditions allow them to regain availability quickly

      Qualities of NewSQL - mainly minimisation of the impact of partitions
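
      The idea of splitting a data set into ranges (shards) can be sketched as a sorted list of range boundaries. A key is routed to the shard whose range contains it. This is a minimal illustration that assumes string keys and hypothetical shard names. Real NewSQL systems also rebalance and replicate shards.

```python
import bisect

# Hypothetical range boundaries: shard i holds keys in [BOUNDS[i-1], BOUNDS[i]).
# Keys below "g" go to shard-0, keys in ["g", "n") to shard-1, and so on.
BOUNDS = ["g", "n", "t"]
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(key: str) -> str:
    """Return the shard whose key range contains `key`."""
    # bisect_right counts the boundaries <= key, which is the index of the owning range.
    return SHARDS[bisect.bisect_right(BOUNDS, key)]

# If shard-1 is cut off by a partition, only keys in ["g", "n") become
# unavailable; the other ranges can still be served.
unavailable = {"shard-1"}
for user in ["alice", "henry", "zoe"]:
    status = "unavailable" if shard_for(user) in unavailable else "available"
    print(user, shard_for(user), status)
```

      The smaller each range, the less data a single partition takes offline.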

    2. typically less flexible and generalized than their more conventional relational counterparts. They also usually only offer a subset of full SQL and relational features, which means that they might not be able to handle certain kinds of usage. Many NewSQL implementations also store a large part of or their entire dataset in the computer's main memory. This improves performance at the cost of greater risk to unpersisted changes

      Differences between NewSQL and relational databases:

      • typically less flexible and generalized
      • usually only offer a subset of full SQL and relational features, which means that they might not be able to handle certain kinds of usage.
      • many NewSQL implementations also store a large part of or their entire dataset in the computer's main memory. This improves performance at the cost of greater risk to unpersisted changes.
    3. using a mixture of different database types is the best approach for handling the data of your projects

      Mixing different database types is often a good approach.

      For example:

      • store user information - relational databases
      • configuration values - in-memory key-value store
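
      Here is a minimal sketch of mixing database types in one application. It assumes that SQLite (from Python's standard library) stands in for the relational store. A plain in-process dictionary stands in for an in-memory key-value store such as Redis. Table and key names are hypothetical.

```python
import sqlite3

# Relational store for structured, long-lived user records.
users_db = sqlite3.connect(":memory:")
users_db.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)
users_db.execute("INSERT INTO users (email) VALUES (?)", ("ada@example.com",))

# Stand-in for an in-memory key-value store holding configuration values.
config = {"feature.signup_enabled": "true", "cache.ttl_seconds": "300"}

def signup_allowed() -> bool:
    # Simple lookup by key; no schema or query planning involved.
    return config.get("feature.signup_enabled") == "true"

count = users_db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count, signup_allowed())
```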
    4. best suited for use cases with high volumes of relational data in distributed, cloud-like environments

      Best use case for NewSQL

    5. CAP theorem is a statement about the trade offs that distributed databases must make between availability and consistency. It asserts that in the event of a network partition, a distributed database can choose either to remain available or remain consistent, but it cannot do both. Cluster members in a partitioned network can continue operating, leading to at least temporary inconsistency. Alternatively, at least some of the disconnected members must refuse to alter their data during the partition to ensure data consistency

      CAP theorem as it applies to distributed databases

      CAP
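
      The choice described in the quote can be made concrete with a toy model. It has two replicas, a partition flag, and a mode that decides what a replica does with a write it cannot propagate. The class and the "CP"/"AP" mode names are hypothetical simplifications, not how any particular database implements this.

```python
class Replica:
    """Toy replica; `mode` is a hypothetical switch between CP and AP behaviour."""

    def __init__(self, mode: str):
        self.mode = mode          # "CP": prefer consistency, "AP": prefer availability
        self.data = {}
        self.peer = None
        self.partitioned = False  # True while the link to the peer is down

    def write(self, key, value) -> bool:
        if not self.partitioned:
            # Healthy network: apply the write on both replicas.
            self.data[key] = value
            self.peer.data[key] = value
            return True
        if self.mode == "CP":
            # Refuse the write so the replicas cannot diverge.
            return False
        # AP: stay available by accepting locally; the peer is now stale.
        self.data[key] = value
        return True

def make_pair(mode: str):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b

results = {}
for mode in ("CP", "AP"):
    a, b = make_pair(mode)
    a.write("x", 1)
    a.partitioned = b.partitioned = True   # network partition begins
    accepted = a.write("x", 2)
    results[mode] = (accepted, a.data["x"], b.data["x"])

print(results)
```

      In CP mode the write is rejected and both replicas still agree. In AP mode the write succeeds, but the replicas disagree until the partition heals.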

    6. NewSQL databases: bringing modern scalability and performance to the traditional relational pattern

      NewSQL databases - designed with scalability and modern performance requirements in mind. They follow relational structure and semantics but are built using more modern, scalable designs. Rose in popularity in the 2010s.

      Examples:

      • MemSQL
      • VoltDB
      • Spanner
      • Calvin
      • CockroachDB
      • FaunaDB
      • YugabyteDB
    7. aggregate queries like summing, averaging, and other analytics-oriented processes can be difficult or impossible

      Disadvantage of column databases

    8. Column-family databases are good when working with applications that require great performance for row-based operations and high scalability

      Advantage of column databases. They also keep the data for a row clustered together on the same machine, simplifying data sharding and scaling.

    9. it helps to think of column family databases as key-value databases where each key (row identifier) returns a dictionary of arbitrary attributes and their values (the column names and their values)

      Tip to remember the idea of column databases
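
      The tip can be taken literally. A column family behaves like a dictionary from row keys to dictionaries of column names and values, and each row may carry a different set of columns. This is a minimal sketch with hypothetical row keys and columns. Real stores such as Cassandra or HBase add sorting, versioning, and persistence.

```python
from collections import defaultdict

# Column family: row key -> {column name -> value}
users = defaultdict(dict)

users["user:1"].update({"name": "Ada", "email": "ada@example.com"})
users["user:2"].update({"name": "Linus", "city": "Helsinki", "shell": "bash"})

def get_columns(row_key: str, *columns: str) -> dict:
    """Return only the requested columns that exist in this row."""
    row = users.get(row_key, {})  # .get avoids creating empty rows in the defaultdict
    return {c: row[c] for c in columns if c in row}

# Each row defines its own schema: "user:1" has no "city", "user:2" has no "email".
print(get_columns("user:2", "name", "email", "city"))
```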

    10. Column-family databases: databases with flexible columns to bridge the gap between relational and document databases

      Column-family databases - also called non-relational column stores, wide-column databases, or column databases. Rose in popularity in the 2000s. They look very similar to relational databases. They are organised into structures called column families, which contain rows of data; each row defines its own format, so each row in a column family effectively has its own schema.

      Examples:

      • Cassandra
      • HBase

      Diagram of column-family database structure

    11. querying for the connection between two users of a social media site in a relational database is likely to require multiple table joins and therefore be rather resource intensive. This same query would be straightforward in a graph database that directly maps connections

      For connection queries such as those on social media, graph databases are a better fit than relational ones
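
      In a graph model the connection query becomes a traversal rather than a series of joins. This minimal sketch assumes the social graph is held as an in-memory adjacency list with hypothetical user names. Breadth-first search returns the shortest chain of connections between two users.

```python
from collections import deque

# Hypothetical connections stored as adjacency lists.
graph = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": ["fay"],
    "eve": [],
    "fay": [],
}

def connection(start: str, goal: str):
    """Return the shortest path of users from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if user == goal:
            # Walk back through predecessors to rebuild the path.
            path = []
            while user is not None:
                path.append(user)
                user = previous[user]
            return path[::-1]
        for friend in graph.get(user, []):
            if friend not in previous:
                previous[friend] = user
                queue.append(friend)
    return None

print(connection("ann", "fay"))
```

      In a relational schema, each additional hop would typically mean another join against a table of connections.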

    12. Graph databases are most useful when working with data where the relationships or connections are highly important

      Major use of graph databases

    13. network databases require step-by-step traversal to travel between items and are limited in the types of relationships they can represent.

      Difference between network databases (pre-relational) and graph databases (NoSQL)

    14. Graph databases: mapping relationships by focusing on how connections between data are meaningful

      Graph databases - establish connections using the concepts of nodes, edges, and properties. Rose in popularity in the 2000s.

      Examples:

      • Neo4j
      • JanusGraph
      • Dgraph

      Diagram of a graph database structure
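
      The node/edge/property model can be sketched with plain data classes. This is an illustrative in-memory structure with hypothetical labels and properties. It is not the storage format or query language of Neo4j, JanusGraph, or Dgraph.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str
    target: str
    kind: str                      # relationship type, e.g. "KNOWS"
    properties: dict = field(default_factory=dict)

nodes = {
    "p1": Node("p1", "Person", {"name": "Ada"}),
    "p2": Node("p2", "Person", {"name": "Grace"}),
    "c1": Node("c1", "Company", {"name": "Acme"}),
}
edges = [
    Edge("p1", "p2", "KNOWS", {"since": 2015}),   # edges carry properties too
    Edge("p1", "c1", "WORKS_AT", {"role": "engineer"}),
]

def neighbours(node_id: str, kind: str):
    """Follow outgoing edges of one type and return the target nodes' names."""
    return [nodes[e.target].properties["name"]
            for e in edges if e.source == node_id and e.kind == kind]

print(neighbours("p1", "KNOWS"))
```

      The linear scan over `edges` keeps the sketch short. Graph databases instead store or index adjacency so that following a connection does not require searching.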

    15. Document databases: Storing all of an item's data in flexible, self-describing structures

      Document databases - also known as document-oriented databases or document stores. They share the basic access and retrieval semantics of key-value stores. Rose in popularity in 2009.

      They also use keys to uniquely identify data, so the line between advanced key-value stores and document databases can be fairly unclear.

      Instead of storing arbitrary blobs of data, document databases store data in structured formats called documents, often using formats like JSON, BSON, or XML.

      Examples:

      • MongoDB
      • RethinkDB
      • Couchbase

      Diagram of document database

    16. Document databases are a good choice for rapid development because you can change the properties of the data you want to save at any point without altering existing structures or data. You only need to backfill records if you want to. Each document within the database stands on its own with its own system of organization. If you're still figuring out your data structure and your data is mainly composed of discrete entries that don't include a lot of cross references, a document database might be a good place to start. Be careful, however, as the extra flexibility means that you are responsible for maintaining the consistency and structure of your data, which can be extremely challenging

      Pros and cons of document databases
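
      The flexibility, and the responsibility that comes with it, shows up as soon as documents written at different times have different shapes. This minimal sketch uses Python dictionaries as stand-ins for stored JSON documents. The field names are hypothetical.

```python
import json

# Two documents in the same collection, written before and after a new
# "phone" field was introduced. No migration was needed to add it.
stored = [
    '{"_id": 1, "name": "Ada"}',
    '{"_id": 2, "name": "Grace", "phone": "555-0100"}',
]
docs = [json.loads(s) for s in stored]

# Readers must cope with missing fields themselves...
phones = [d.get("phone", "unknown") for d in docs]

# ...or the application can optionally backfill older documents.
for d in docs:
    d.setdefault("phone", None)

print(phones, docs[0])
```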

    17. Though the data within documents is organized within a structure, document databases do not prescribe any specific format or schema

      Because the data is organised within documents rather than stored as opaque blobs (as in key-value stores), the content of a document database can be queried and analysed.
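
      The difference in queryability can be sketched by storing data both ways. One copy is an opaque blob under a key. The other is a set of structured documents whose fields can be filtered. The in-memory structures and field names are illustrative assumptions, not any specific database's API.

```python
import json

# Key-value style: the store only sees bytes. To filter on "total", the
# application would have to fetch and decode every blob itself.
kv_store = {"order:1": json.dumps({"customer": "ada", "total": 40}).encode()}

# Document style: fields are visible, so queries can match on them.
orders = [
    {"_id": 1, "customer": "ada", "total": 40, "items": [{"sku": "A1"}]},
    {"_id": 2, "customer": "bob", "total": 15, "items": [{"sku": "B7"}]},
    {"_id": 3, "customer": "ada", "total": 99, "items": [{"sku": "B7"}]},
]

def find(predicate):
    """Return documents matching a predicate, like a simple query filter."""
    return [doc for doc in orders if predicate(doc)]

big_ada_orders = find(lambda d: d["customer"] == "ada" and d["total"] > 50)
with_b7 = find(lambda d: any(i["sku"] == "B7" for i in d["items"]))
print([d["_id"] for d in big_ada_orders], [d["_id"] for d in with_b7])
```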

    18. Key-value stores are often used to store configuration data, state information, and any data that might be represented by a dictionary or hash in a programming language. Key-value stores provide fast, low-complexity access to this type of data

      Use and advantages of key-value stores

    19. Key-value databases: simple, dictionary-style lookups for basic storage and retrieval

      Key-value databases - one of the simplest database types. Initially introduced in the 1970s (rose in popularity between 2000 and 2010). They work by storing arbitrary data accessible through a specific key.

      • to store data, you provide a key and the blob of data you wish to save, for example a JSON object, an image, or plain text.
      • to retrieve data, you provide the key and will then be given the blob of data back.

      Examples:

      • Redis
      • memcached
      • etcd

      Diagram of key-value store
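
      The store/retrieve cycle described above amounts to a couple of operations on a map from keys to opaque blobs. This is a minimal in-memory sketch, and the class and method names are hypothetical. Real systems such as Redis, memcached, or etcd add persistence, expiry, replication, or networking on top.

```python
from typing import Dict, Optional

class KeyValueStore:
    """Maps keys to opaque byte blobs; the store never inspects the values."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        self._data[key] = blob

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

store = KeyValueStore()
store.put("config:theme", b"dark")               # plain text
store.put("session:abc", b'{"user_id": 42}')     # a JSON object, stored as bytes
print(store.get("config:theme"))
```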

    20. NoSQL databases: modern alternatives for data that doesn't fit the relational paradigm

      NoSQL databases - the name stands for either "non-SQL" or "not only SQL", the latter clarifying that some of them allow SQL-like querying.

      Four types:

      • Key-value
      • Document
      • Graph
      • Column-family
    21. relational databases are often a good fit for any data that is regular, predictable, and benefits from the ability to flexibly compose information in various formats. Because relational databases work off of a schema, it can be more challenging to alter the structure of data after it is in the system. However, the schema also helps enforce the integrity of the data, making sure values match the expected formats, and that required information is included. Overall, relational databases are a solid choice for many applications because applications often generate well-ordered, structured data

      Pros and cons of relational databases
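
      The schema's role in enforcing integrity can be shown with SQLite from Python's standard library. The table declares constraints, and rows that violate them are rejected. The table and column names are hypothetical. SQLite is loosely typed by default, so the `CHECK` constraint below does the value validation explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        id    INTEGER PRIMARY KEY,
        name  TEXT    NOT NULL,                               -- required information
        grade INTEGER NOT NULL CHECK (grade BETWEEN 1 AND 12) -- expected format
    )
""")

conn.execute("INSERT INTO students (name, grade) VALUES (?, ?)", ("Ada", 10))

rejected = []
for row in [(None, 9), ("Grace", 15)]:
    try:
        conn.execute("INSERT INTO students (name, grade) VALUES (?, ?)", row)
    except sqlite3.IntegrityError as err:
        # NOT NULL and CHECK violations both raise IntegrityError.
        rejected.append(str(err))

count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(len(rejected), count)
```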

    22. querying language called SQL, or structured query language, was created to access and manipulate data stored with that format

      SQL was created for relational databases

    23. Relational databases: working with tables as a standard solution to organize well-structured data

      Relational databases - the oldest general-purpose database type still widely used today. They make up the majority of databases currently used in production. Initially introduced in 1969.

      They organise data using tables - structures that impose a schema on the records that they hold.

      • each column has a name and a data type
      • each row represents an individual record

      Examples:

      • MySQL
      • MariaDB
      • PostgreSQL
      • SQLite

      Diagram of relational schema used to map entities for a school
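
      Tables with named, typed columns and one record per row can be sketched with a small, hypothetical school schema in SQLite. The example also shows how data can be composed in different formats: a join combines rows from separate tables at query time, without storing the combined form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE courses  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE enrolments (
        student_id INTEGER REFERENCES students(id),
        course_id  INTEGER REFERENCES courses(id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO courses  VALUES (1, 'Maths'), (2, 'History');
    INSERT INTO enrolments VALUES (1, 1), (1, 2), (2, 1);
""")

# The join reshapes three tables into "who takes what" at query time.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrolments e
    JOIN students s ON s.id = e.student_id
    JOIN courses  c ON c.id = e.course_id
    ORDER BY s.name, c.title
""").fetchall()
print(rows)
```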

    24. database schema is a description of the logical structure of a database or the elements it contains. Schemas often include declarations for the structure of individual entries, groups of entries, and the individual attributes that database entries are comprised of. These may also define data types and additional constraints to control the type of data that may be added to the structure

      Database schema

    25. Network databases: mapping more flexible connections with non-hierarchical links

      Network databases - built on the foundation provided by hierarchical databases, adding flexibility. Initially introduced in the late 1960s. Instead of always having a single parent, as in hierarchical databases, network database entries can have more than one parent, which effectively allows them to model more complex relationships.

      Examples:

      • IDMS

      They have a graph-like structure.

      Diagram of a network database
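
      Multiple parents, and the link-by-link navigation they imply, can be sketched with records that hold explicit links to their owners. This is a loose illustration in the spirit of network databases such as IDMS, with hypothetical record names. It is not IDMS's actual data manipulation language.

```python
# Each record lists its parent (owner) records; unlike a tree, there can be several.
records = {
    "dept:eng":   {"parents": []},
    "project:db": {"parents": []},
    "emp:ada":    {"parents": ["dept:eng", "project:db"]},  # two parents
    "emp:grace":  {"parents": ["dept:eng"]},
}

def children(owner: str):
    """Navigate from an owner to its member records by following the links."""
    return sorted(name for name, rec in records.items() if owner in rec["parents"])

print(records["emp:ada"]["parents"], children("dept:eng"), children("project:db"))
```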

    26. Hierarchical databases: using parent-child relationships to map data into trees

      Hierarchical databases - the next evolution in database development after flat files. Initially introduced in the 1960s. They encode a relationship between items where every record has a single parent.

      Examples:

      • Filesystems
      • DNS
      • LDAP directories

      They have a tree-like structure.

      Diagram of a hierarchical database

    27. Hierarchical databases are not used much today due to their limited ability to organize most data and because of the overhead of accessing data by traversing the hierarchy

      Hierarchical databases aren't used as much anymore
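
      When every record has exactly one parent, the data forms a tree. Reaching a record means walking the path down from the root, much like resolving a filesystem path. This is a minimal sketch with a hypothetical nested-dictionary tree.

```python
# Each node has exactly one parent; children are nested under it.
tree = {
    "etc": {"hosts": "127.0.0.1 localhost"},
    "home": {"ada": {"notes.txt": "hello"}},
}

def lookup(path: str):
    """Walk from the root one level at a time; each step is a traversal."""
    node = tree
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(path)
        node = node[part]
    return node

print(lookup("/home/ada/notes.txt"))
```

      A record that naturally belongs under two parents cannot be placed in this structure without duplicating it. Network databases were designed to remove that limitation.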

    28. The first flat file databases represented information in regular, machine parse-able structures within files. Data is stored in plain text, which limits the type of content that can be represented within the database itself. Sometimes, a special character or other indicator is chosen to use as a delimiter, or marker for when one field ends and the next begins. For example, a comma is used in CSV (comma-separated values) files, while colons or white-space are used in many data files in Unix-like systems

      Flat-file databases - the first type of database, with a simple data structure for organising small amounts of local data.

      Examples:

      • /etc/passwd and /etc/fstab on Linux and Unix-like systems
      • CSV files
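
      Each line of a flat file is one record, and a delimiter separates the fields. Because of this, Python's standard `csv` module can read both comma- and colon-delimited formats. In this minimal sketch, the sample lines are made up in the shape of `/etc/passwd`: name, password placeholder, UID, GID, comment, home directory, shell.

```python
import csv
import io

passwd_text = "root:x:0:0:root:/root:/bin/bash\nada:x:1000:1000:Ada:/home/ada:/bin/sh\n"
csv_text = "name,role\nAda,admin\nGrace,user\n"

FIELDS = ["name", "password", "uid", "gid", "comment", "home", "shell"]

# Colon-delimited: every ':' ends a field. QUOTE_NONE stops '"' being treated specially.
users = [dict(zip(FIELDS, row))
         for row in csv.reader(io.StringIO(passwd_text), delimiter=":",
                               quoting=csv.QUOTE_NONE)]

# Comma-delimited with a header row.
roles = list(csv.DictReader(io.StringIO(csv_text)))

print(users[1]["home"], roles[0]["role"])
```

      Nothing links `roles` to `users`. Any connection between the two files has to be made by the program reading them.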
    29. Some advantages of this format

      Advantages of flat-file format:

      • have a robust, flexible toolkit
      • easily managed without specialised software
      • easy to understand and work with
    30. While flat file databases are simple, they are very limited in the level of complexity they can handle

      Disadvantages of flat-file databases:

      • the system that reads or manipulates the data cannot easily make connections between the data represented
      • usually don't have any kind of user or data concurrency features either
      • usually only practical for systems with small read or write requirements; for example, many operating systems use flat files to store configuration data
  2. Dec 2018
  3. Aug 2018
  4. May 2018
  5. Oct 2017
    1. MySQL’s replication architecture means that if bugs do cause table corruption, the problem is unlikely to cause a catastrophic failure.

      I can't follow the reasoning here. I guess the corruption isn't guaranteed to be copied the way Postgres's physical (WAL-based) replication would copy it, but it seems entirely possible to trigger similar or identical corruption, because the replica executes the same logical statement with a similar implementation.
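
      The concern can be illustrated with a toy model. Physical replication ships the primary's resulting state. Logical replication ships the statement and re-executes it on the replica. If the bug lives in the shared execution code, both replicas end up corrupted. The buggy function is hypothetical, and the sketch does not model MySQL's or Postgres's actual replication code.

```python
def buggy_apply(table: dict, key: str, value: int) -> None:
    """Hypothetical storage bug: writing key "a" also clobbers key "b"."""
    table[key] = value
    if key == "a":
        table["b"] = -1  # the corruption

def physical_replication(statement):
    primary, replica = {"b": 7}, {"b": 7}
    buggy_apply(primary, *statement)
    replica.clear()
    replica.update(primary)            # copy the primary's resulting state
    return primary, replica

def logical_replication(statement):
    primary, replica = {"b": 7}, {"b": 7}
    buggy_apply(primary, *statement)
    buggy_apply(replica, *statement)   # replay the statement with the same code
    return primary, replica

print(physical_replication(("a", 1)))
print(logical_replication(("a", 1)))
```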

    2. The bug we ran into only affected certain releases of Postgres 9.2 and has been fixed for a long time now. However, we still find it worrisome that this class of bug can happen at all. A new version of Postgres could be released at any time that has a bug of this nature, and because of the way replication works, this issue has the potential to spread into all of the databases in a replication hierarchy.

      Not really a criticism of Postgres so much as it is a criticism of software in general.

  6. Aug 2017
  7. Jun 2016
    1. If the RRID is well-formed, and if the lookup found the right record, a human validator tags it a valid RRID — one that can now be associated mechanically with occurrences of the same resource in other contexts. If the RRID is not well-formed, or if the lookup fails to find the right record, a human validator tags the annotation as an exception and can discuss with others how to handle it. If an RRID is just missing, the validator notes that with another kind of exception tag.

      Sounds a lot like the way reference managers work. In many cases, people keep the invalid or badly-formed results.
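
      The triage in the quote can be sketched as a function that returns one of three tags. A human validator would still confirm each result. Everything specific here is an assumption for illustration. The regular expression is a simplified, hypothetical notion of a well-formed RRID, and the registry is a stand-in dictionary rather than the real resolver.

```python
import re

# Hypothetical, simplified pattern: "RRID:" followed by a prefix, "_", and an ID.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9_:-]+)")

# Stand-in for a resolver lookup (made-up entry).
REGISTRY = {"AB_123456": "Example antibody"}

def triage(text: str) -> str:
    """Return 'valid', 'exception', or 'missing' for one annotated passage."""
    if "RRID" not in text:
        return "missing"            # no RRID given at all
    match = RRID_PATTERN.search(text)
    if match is None:
        return "exception"          # not well-formed
    if match.group(1) not in REGISTRY:
        return "exception"          # lookup failed to find a record
    return "valid"

print(triage("anti-X (RRID:AB_123456)"), triage("RRID: ??"), triage("anti-Y"))
```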

  8. Apr 2016
  9. Jan 2016
  10. Dec 2015
    1. Data gathering is ubiquitous in science. Giant databases are currently being mined for unknown patterns, but in fact there are many (many) known patterns that simply have not been catalogued. Consider the well-known case of medical records. A patient’s medical history is often known by various individual doctor-offices but quite inadequately shared between them. Sharing medical records often means faxing a hand-written note or a filled-in house-created form between offices.
    2. I will use a mathematical tool called ologs, or ontology logs, to give some structure to the kinds of ideas that are often communicated in pictures like the one on the cover. Each olog inherently offers a framework in which to record data about the subject. More precisely it encompasses a database schema, which means a system of interconnected tables that are initially empty but into which data can be entered.
  11. May 2015
  12. Oct 2014
    1. This in turn means that Redis Cluster does not have to take meta data in the data structures in order to attempt a value merge, and that the fancy commands and data structures supported by Redis are also supported by Redis Cluster. So no additional memory overhead, no API limits, no limits in the amount of elements a value can contain, but less safety during partitions.

      A solid trade-off, I think, and says a lot about the intended use cases.

  13. Sep 2014
    1. Fast restart. If a server is temporarily taken down, this capability restores the index from a saved copy, eliminating delays due to index rebuilding.

      This point seems to be in direct contradiction to the claim above that "Indexes (primary and secondary) are always stored in DRAM for fast access and are never stored on Solid State Drives (SSDs) to ensure low wear."

    2. Unlike other databases that use the linux file system that was built for rotational drives, Aerospike has implemented a log structured file system to access flash – raw blocks on SSDs – directly.

      Does this really mean to suggest that Aerospike bypasses the Linux block device layer? Is there a kernel driver? Does this mean I can't use any filesystem I want and know how to administer? Is the claim that the "linux file system" (which I take to mean, I guess, the virtual file system layer) was "built for rotational drives" even accurate? We've had RAM disks for a long, long time. And before that we've had log-structured filesystems, too, and even devices that aren't random access, like tape drives. Seems like dubious claims all around.