78 Matching Annotations
  1. Last 7 days
  2. Jul 2020
  3. Jun 2020
    1. Normalize the database for this case if your data is going to be modified multiple times
    2. normalizing our database will help us. What does normalizing mean? Well, it simply means separating our information as much as we can

      This directly contradicts Firebase's official advice, which is to denormalize the structure by duplicating some of the data: https://youtu.be/lW7DWV2jST0?t=378

    1. Denormalization is a database optimization technique in which we add redundant data to one or more tables
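
      A minimal SQL sketch of that trade-off, using hypothetical customers and orders tables: the normalized form stores the customer's name exactly once, while the denormalized form copies it into orders so reads can skip the join, at the cost of keeping every copy in sync on update.

        -- Normalized: each fact lives in exactly one place
        CREATE TABLE customers (
          id   INTEGER PRIMARY KEY,
          name TEXT NOT NULL
        );
        CREATE TABLE orders (
          id          INTEGER PRIMARY KEY,
          customer_id INTEGER NOT NULL REFERENCES customers(id),
          total       NUMERIC NOT NULL
        );
        -- Reads need a join:
        SELECT o.id, c.name, o.total
        FROM orders o JOIN customers c ON c.id = o.customer_id;

        -- Denormalized: duplicate the name into orders to avoid the join,
        -- accepting that renaming a customer now touches two tables.
        ALTER TABLE orders ADD COLUMN customer_name TEXT;
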
    1. Deadlocks are a classic problem in transactional databases, but they are not dangerous unless they are so frequent that you cannot run certain transactions at all. Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
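
      To make "always prepared to re-issue a transaction" concrete, the textbook deadlock looks roughly like this (hypothetical accounts table; the two sessions' statements are interleaved in time):

        -- Session A:
        BEGIN;
        UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- A locks row 1

        -- Session B, concurrently:
        BEGIN;
        UPDATE accounts SET balance = balance - 10 WHERE id = 2;  -- B locks row 2

        -- Session A:
        UPDATE accounts SET balance = balance + 10 WHERE id = 2;  -- waits for B

        -- Session B:
        UPDATE accounts SET balance = balance + 10 WHERE id = 1;  -- waits for A: deadlock
        -- The server detects the cycle and rolls back one of the two transactions;
        -- the application catches that error and simply re-runs its transaction.
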
    1. transaction calls can be nested. By default, this makes all database statements in the nested transaction block become part of the parent transaction. For example, the following behavior may be surprising:

         User.transaction do
           User.create(username: 'Kotori')
           User.transaction do
             User.create(username: 'Nemu')
             raise ActiveRecord::Rollback
           end
         end

       creates both “Kotori” and “Nemu”. The reason is that the ActiveRecord::Rollback exception in the nested block does not issue a ROLLBACK. Since these exceptions are captured in transaction blocks, the parent block does not see it and the real transaction is committed.

      How is this okay??

      When would it ever be the desired/intended behavior for a raise ActiveRecord::Rollback to have absolutely no effect? What good is the transaction then??

      What happened to the principle of least surprise?

      Is there any reason we shouldn't just always use requires_new: true?

      If, like they say, the inner transaction "become[s] part of the parent transaction", then if anything, it should roll back the parent transaction too — not roll back nothing.
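
      For what it's worth, requires_new: true makes Rails emulate a genuinely nested transaction with a savepoint, so the inner rollback really does undo the inner work. A rough sketch of what that looks like in SQL (table, column, and savepoint names are illustrative):

        BEGIN;                                  -- outer transaction
        INSERT INTO users (username) VALUES ('Kotori');

        SAVEPOINT active_record_1;              -- inner block with requires_new: true
        INSERT INTO users (username) VALUES ('Nemu');
        ROLLBACK TO SAVEPOINT active_record_1;  -- raise ActiveRecord::Rollback undoes only this part

        COMMIT;                                 -- only 'Kotori' is persisted

      Without requires_new, no savepoint is ever issued, which is why the inner rollback has nothing to roll back to.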

    2. One workaround is to begin a transaction on each class whose models you alter:
  4. May 2020
    1. I think you should normalize if you feel that introducing update or insert anomalies can severely impact the accuracy or performance of your database application.  If not, then determine whether you can rely on the user to recognize and update the fields together. There are times when you’ll intentionally denormalize data.  If you need to present summarized or compiled data to a user, and that data is very time consuming or resource intensive to create, it may make sense to maintain this data separately.

      When to normalize and when to denormalize. The key is to think about UX; in this case the factors are DB integrity (don't create errors that annoy users) and speed (don't make users wait for what they want).
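
      One concrete form of that intentional denormalization is a precomputed summary the application reads directly. A minimal sketch, in Postgres syntax with a hypothetical orders table:

        -- Expensive aggregate, computed up front and refreshed on a schedule
        CREATE MATERIALIZED VIEW customer_order_totals AS
        SELECT customer_id,
               COUNT(*)   AS order_count,
               SUM(total) AS lifetime_total
        FROM orders
        GROUP BY customer_id;

        -- Cheap read for the user-facing page:
        SELECT * FROM customer_order_totals WHERE customer_id = 42;

        -- Re-run the expensive query only when freshness matters:
        REFRESH MATERIALIZED VIEW customer_order_totals;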

    2. Can database normalization be taken too far?  You bet!  There are times when it isn’t worth the time and effort to fully normalize a database.  In our example you could argue to keep the database in second normal form, that the CustomerCity to CustomerPostalCode dependency isn’t a deal breaker.

      Normalization has diminishing returns

    3. Now each column in the customer table is dependent on the primary key.  Also, the columns don’t rely on one another for values.  Their only dependency is on the primary key.

      Columns depending on the primary key, and not on one another, is how you get to 2NF and 3NF

    4. A table is in third normal form if: it is in 2nd normal form, and it contains only columns that are non-transitively dependent on the primary key

      3NF Definition
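
      Applied to the CustomerCity / CustomerPostalCode example above, reaching 3NF means pulling the transitive dependency (PostalCode determines City) out into its own table. A rough DDL sketch, with hypothetical table names:

        -- 2NF version: CustomerCity depends on CustomerPostalCode, not on the key alone
        CREATE TABLE Customers_2NF (
          CustomerID         INTEGER PRIMARY KEY,
          CustomerName       TEXT,
          CustomerCity       TEXT,
          CustomerPostalCode TEXT
        );

        -- 3NF version: the transitive dependency lives in its own table
        CREATE TABLE PostalCodes (
          PostalCode TEXT PRIMARY KEY,
          City       TEXT
        );
        CREATE TABLE Customers (
          CustomerID         INTEGER PRIMARY KEY,
          CustomerName       TEXT,
          CustomerPostalCode TEXT REFERENCES PostalCodes(PostalCode)
        );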

  5. Apr 2020
    1. From a narratological perspective, it would probably be fair to say that most databases are tragic. In their design, the configuration of their user interfaces, the selection of their contents, and the indexes that manage their workings, most databases are limited when set against the full scope of the field of information they seek to map and the knowledge of the people who created them. In creating a database, we fight against the constraints of the universe – the categories we use to sort out the world; the limitations of time and money and technology – and succumb to them.

      databases are tragic!

    1. columnar databases are well-suited for OLAP-like workloads (e.g., data warehouses) which typically involve highly complex queries over all data (possibly petabytes). However, some work must be done to write data into a columnar database. Transactions (INSERTs) must be separated into columns and compressed as they are stored, making it less suited for OLTP workloads. Row-oriented databases are well-suited for OLTP-like workloads which are more heavily loaded with interactive transactions. For example, retrieving all data from a single row is more efficient when that data is located in a single location (minimizing disk seeks), as in row-oriented architectures. However, column-oriented systems have been developed as hybrids capable of both OLTP and OLAP operations, with some of the OLTP constraints column-oriented systems face mediated using (amongst other qualities) in-memory data storage.[6] Column-oriented systems suitable for both OLAP and OLTP roles effectively reduce the total data footprint by removing the need for separate systems

      Typical applications (adding new user data, or even retrieving a user's data) are better served by a (standard) row-oriented DB. Typical analytics operations, such as even a simple AVG over a whole column, are much slower because the elements of the same column are stored far away from each other in a traditional row-oriented DB, hence increasing disk-access time. (See the query sketch further below.)

    2. seek time is incredibly long compared to the other bottlenecks in computers
    3. Operations that retrieve all the data for a given object (the entire row) are slower. A row-based system can retrieve the row in a single disk read, whereas numerous disk operations to collect data from multiple columns are required from a columnar database.
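
      The workload split is easy to see at the query level (hypothetical events table): the analytic query touches one column of every row, which is exactly the columnar layout's strength, while the point lookup wants every column of one row, which favors row storage.

        -- OLAP-style: scan a single column across (possibly) billions of rows.
        -- A column store reads only the compressed, contiguous latency_ms column.
        SELECT AVG(latency_ms) FROM events;

        -- OLTP-style: fetch the whole row for one key.
        -- A row store answers this with a single seek; a column store has to
        -- reassemble the row from every column file.
        SELECT * FROM events WHERE event_id = 123456;
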
    1. Relational databases are designed around joins, and optimized to do them well. Unless you have a good reason not to use a normalized design, use a normalised design. jsonb and things like hstore are good for when you can't use a normalized data model, such as when the data model changes rapidly and is user defined. If you can model it relationally, model it relationally. If you can't, consider json etc.
    2. Joins are not expensive. Who said it to you? As basically the whole concept of relational databases revolves around joins (from a practical point of view), these products are very good at joining. The normal way of thinking is starting with properly normalized structures and going into fancy denormalizations and similar stuff when the performance really needs it on the reading side. JSON(B) and hstore (and EAV) are good for data with unknown structure.
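
      A small Postgres sketch of that split (hypothetical products table): the stable, well-understood fields stay as ordinary columns, and only the user-defined, fast-changing part goes into jsonb.

        CREATE TABLE products (
          id         SERIAL  PRIMARY KEY,
          name       TEXT    NOT NULL,
          price      NUMERIC NOT NULL,            -- known structure: model it relationally
          attributes JSONB   NOT NULL DEFAULT '{}' -- user-defined structure: schemaless
        );

        -- Querying into the schemaless part:
        SELECT name, price
        FROM products
        WHERE attributes ->> 'color' = 'red';

        -- A GIN index keeps those lookups reasonable:
        CREATE INDEX products_attributes_idx ON products USING GIN (attributes);
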
  6. Mar 2020
    1. I chose all my scholarly journals, I put them together. I chose some YouTube videos; they were –
       IF:  Mm-hmm.
       CF:  – like, a bunch of TED talks.

      Compiling research materials.

      Is there room for us to think about the iterative process? Can we work with instructors to "reward" (or assign) students to alternate between searching, reading, and writing?

    2. And – And I seen how – I saw how many, um, scholarly journals or how many sources came up for it, right? Um, number of sources. Right. And then, if I – if I felt like it wasn’t enough for me to thoroughly talk about the topic, I would move on. Right? So, when I did segregation, there – like, I guess, like, my specific topic was modern-day, so there wasn’t really much about it. Right? So, not much info. Right? And then, when I did gentrification, there were a lot, right?

      This part of the process is interesting to me. Links topic selection to search (seemingly a single search).

      It also seems a little misguided. What can we do in our lessons that could make tiny changes to this attitude?

  7. Oct 2019
  8. Sep 2019
    1. The problem with the annotation notion is that it's the first time that we consider a piece of data which is not merely a projection of data already present in the message store: it is out-of-band data that needs to be stored somewhere.

      Could it go in the same, schemaless datastore?

    2. many of the searches we want to do could be accomplished with a database that was nothing but a glorified set of hash tables

      Hello SQL and clojure.set ns! ;P

    3. There are objects, sets of objects, and presentation tools. There is a presentation tool for each kind of object; and one for each kind of object set.

      Very Clojure-y mood; makes me think of Clojure REBL (browser), which in turn is inspired by the Smalltalk browser and was extracted from Datomic (which is inspired by RDF, mentioned above!)

  9. May 2019
  10. Oct 2018
  11. Sep 2018
  12. Apr 2018
    1. The takeaway from the article: choose a document-oriented database only when the data can be treated as a self-contained document

  13. Dec 2017
  14. alleledb.gersteinlab.org
    1. AlleleDB is a repository, providing genomic annotation of cis-regulatory single nucleotide variants (SNVs) associated with allele-specific binding (ASB) and expression (ASE).
  15. Nov 2017
    1. select top 1 * from newsletters where IsActive = 1 order by PublishDate desc

      This doesn't require a full table scan or a join operation. That's just COOL
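
      Strictly, it avoids a full scan only when an index covers both the filter and the sort; with one in place, the TOP 1 becomes a single index seek. A sketch in SQL Server syntax (the TOP keyword suggests that dialect; the index name is made up):

        CREATE NONCLUSTERED INDEX IX_Newsletters_IsActive_PublishDate
            ON newsletters (IsActive, PublishDate DESC);

        -- With the index, the engine seeks to IsActive = 1 and reads the newest
        -- entry directly, instead of scanning and sorting the whole table.
        SELECT TOP 1 * FROM newsletters WHERE IsActive = 1 ORDER BY PublishDate DESC;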

    1. They have a very simplistic view of the activity being monitored, distilling it down into only a few dimensions for the rule to interrogate

      The number of dimensions needs to be large. In normal database systems these dimensions are few in number.

  16. Aug 2017
    1. Football Leaks, which consists of 1.9 terabytes of information and some 18.6 million documents, ranging from player contracts to emails revealing secret transfer fees and wages, is the largest leak in the history of sport.

      "Football Leaks, which consists of 1.9 terabytes of information and some 18.6 million documents, ranging from player contracts to emails revealing secret transfer fees and wages, is the largest leak in the history of sport."

      A pity this information is not available to the public.

      Given the limited release of documents, is it really the largest leak in the history of sport?

      The ICIJ offshore database may not be complete, but there is at least something, and it is searchable.

      Hopefully EIC will also follow this example.

  17. Jun 2017
    1. The vertices and edges of a graph, known as Atoms, are used to represent not only "data", but also "procedures"; thus, many graphs are executable programs as well as data structures.

      Rohan indicated that procedures are also part of the graph. Let us find out why.

  18. May 2017
  19. Mar 2017
    1. Genome Sequence Archive (GSA)

      Database URL is here: http://gsa.big.ac.cn/

      Note: metadata is INSDC format, but this database isn't part of the INSDC, so you'll still need to submit your data to one of those databases to meet internationally recognised mandates

  20. Feb 2017
  21. Jan 2016
  22. Mar 2015
    1. ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node. We hope ImageNet will become a useful resource for researchers, educators, students and all of you who share our passion for pictures.
  23. Dec 2014
  24. Sep 2014
    1. Most systems available today use a single database.

      Even after all these years...this is essentially still true.

  25. Jan 2014
    1. The initial inputs for deriving quantitative information of gene expression and embryonic morphology are raw image data, either of fluorescent proteins expressed in live embryos or of stained fluorescent markers in fixed material. These raw images are then analyzed by computational algorithms that extract features, such as cell location, cell shape, and gene product concentration. Ideally, the extracted features are then recorded in a searchable database, an atlas, that researchers from many groups can access. Building a database with quantitative graphical and visualization tools has the advantage of allowing developmental biologists who lack specialized skills in imaging and image analysis to use their knowledge to interrogate and explore the information it contains.

      1) Initial input is raw image data
      2) Feature extraction on raw image data
      3) Extracted features stored in a shared, searchable database
      4) Database available to researchers from many groups
      5) Quantitative graphical and visualization tools allow access to those without specialized skill in imaging and image analysis