231 Matching Annotations
  1. Last 7 days
    1. The top reasons why a product fails:

      • It’s too complicated – simplicity is king
      • It doesn’t spread by word-of-mouth
      • It doesn’t take advantage of the power of iteration
      • The founder is too fearful of creating something novel
      • It’s not launched into a community

      Think about it before developing a commercial product

    1. When you run a PyTorch/TensorFlow model, most of the work isn’t actually being done in the framework itself, but rather by third party kernels. These kernels are often provided by the hardware vendor, and consist of operator libraries that higher-level frameworks can take advantage of. These are things like MKLDNN (for CPU) or cuDNN (for Nvidia GPUs). Higher-level frameworks break their computational graphs into chunks, which can then call these computational libraries. These libraries represent thousands of man hours of effort, and are often optimized for the architecture and application to yield the best performance

      What happens behind the scenes when you run ML frameworks

    2. At their core, PyTorch and Tensorflow are auto-differentiation frameworks

      auto-differentiation = taking the derivative of some function. It can be implemented in many ways, so most ML frameworks choose "reverse-mode auto-differentiation" (known as "backpropagation")
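
      As a minimal sketch (my own example, not from the article), this is what reverse-mode auto-differentiation looks like through PyTorch's autograd:

      import torch

      x = torch.tensor(3.0, requires_grad=True)
      y = x ** 2 + 2 * x        # forward pass builds the computation graph
      y.backward()              # reverse-mode auto-differentiation (backpropagation)
      print(x.grad)             # dy/dx = 2x + 2 = tensor(8.)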

    3. Jax is built by the same people who built the original Autograd, and features both forward- and reverse-mode auto-differentiation. This allows computation of higher order derivatives orders of magnitude faster than what PyTorch/TensorFlow can offer

      Jax
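
      A tiny sketch (my own, using the public jax.grad API) of how composing grads yields higher-order derivatives:

      import jax

      f = lambda x: x ** 3
      d2f = jax.grad(jax.grad(f))   # second derivative by composing grad with itself
      print(d2f(2.0))               # d2/dx2 of x^3 = 6x = 12.0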

    4. the transition from TensorFlow 1.0 to 2.0 will be difficult and provides a natural point for companies to evaluate PyTorch

      Chance of faster transition to PyTorch in industry

    5. At the API level, TensorFlow eager mode is essentially identical to PyTorch’s eager mode, originally made popular by Chainer. This gives TensorFlow most of the advantages of PyTorch’s eager mode (ease of use, debuggability, etc.). However, this also gives TensorFlow the same disadvantages. TensorFlow eager models can’t be exported to a non-Python environment, they can’t be optimized, they can’t run on mobile, etc. This puts TensorFlow in the same position as PyTorch, and they resolve it in essentially the same way - you can trace your code (tf.function) or reinterpret the Python code (Autograph).

      Tensorflow Eager
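
      A minimal sketch (assuming TensorFlow 2.x) of how tf.function traces eager-style code into a graph:

      import tensorflow as tf

      @tf.function                       # traces the Python function into an exportable, optimizable graph
      def square(x):
          return x * x

      print(square(tf.constant(3.0)))    # eager-looking call that runs through the traced graph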

    6. Once your PyTorch model is in this IR, we gain all the benefits of graph mode. We can deploy PyTorch models in C++ without a Python dependency, or optimize them.

    7. Tracing takes a function and an input, records the operations that were executed with that input, and constructs the IR. Although straightforward, tracing has its downsides: it can’t capture control flow that didn’t execute. For example, it can’t capture the false block of a conditional if it executed the true block

      Tracing mode in PyTorch

    8. Script mode takes a function/class, reinterprets the Python code and directly outputs the TorchScript IR. This allows it to support arbitrary code, however it essentially needs to reinterpret Python

      Script mode in PyTorch

    9. The PyTorch JIT is an intermediate representation (IR) for PyTorch called TorchScript. TorchScript is the “graph” representation of PyTorch. You can turn a regular PyTorch model into TorchScript by using either tracing or script mode.

      PyTorch JIT
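
      A small sketch (my own example, not from the article) showing both ways of producing TorchScript and why tracing misses control flow:

      import torch

      def f(x):
          if x.sum() > 0:        # data-dependent control flow
              return x + 10
          return x - 10

      traced = torch.jit.trace(f, torch.ones(3))    # records only the branch taken for this input
      scripted = torch.jit.script(f)                # reinterprets the Python, keeps both branches

      print(traced(-torch.ones(3)))    # wrong branch: tensor([9., 9., 9.])
      print(scripted(-torch.ones(3)))  # correct branch: tensor([-11., -11., -11.])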

    10. On the other hand, industry has a litany of restrictions/requirements

      TensorFlow's requirements:

      • no Python <--- overhead of the Python runtime might be too much to take
      • mobile <--- Python can't be embedded in the mobile binary
      • serving <--- no-downtime updates of models, switching between models seamlessly, etc.
    11. Researchers care about how fast they can iterate on their research, which is typically on relatively small datasets (datasets that can fit on one machine) and run on <8 GPUs. This is not typically gated heavily by performance considerations, but by their ability to quickly implement new ideas. On the other hand, industry considers performance to be of the utmost priority. While 10% faster runtime means nothing to a researcher, that could directly translate to millions of savings for a company

      Researchers value how fast they can iterate on their research.

      Industry values performance because it translates directly into money.

    12. TensorFlow came out years before PyTorch, and industry is slower to adopt new technologies than researchers

      Reason why PyTorch wasn't previously more popular than TensorFlow

    13. TensorFlow is still the dominant framework. For example, based on data [2] [3] from 2018 to 2019, TensorFlow had 1541 new job listings vs. 1437 job listings for PyTorch on public job boards, 3230 new TensorFlow Medium articles vs. 1200 PyTorch, 13.7k new GitHub stars for TensorFlow vs 7.2k for PyTorch, etc

      Nowadays, the numbers still play against PyTorch

    14. TensorFlow will always have a captive audience within Google/DeepMind, but I wonder whether Google will eventually relax this

      PyTorch is becoming so much more favoured that maybe one day it will even replace TensorFlow at Google's offices

    15. Why do researchers love PyTorch?
      • simplicity <--- Pythonic, easily integrates with the Python ecosystem
      • great API <--- TensorFlow has switched APIs many times
      • performance <--- it's not so clear if it's faster than TensorFlow
    16. In 2018, PyTorch was a minority. Now, it is an overwhelming majority, with 69% of CVPR using PyTorch, 75+% of both NAACL and ACL, and 50+% of ICLR and ICML

    17. every major conference in 2019 has had a majority of papers implemented in PyTorch

      Legend:

      • CVPR, ICCV, ECCV - computer vision conferences
      • NAACL, ACL, EMNLP - NLP conferences
      • ICML, ICLR, NeurIPS - general ML conferences

      PyTorch vs TensorFlow

    18. In 2019, the war for ML frameworks has two remaining main contenders: PyTorch and TensorFlow. My analysis suggests that researchers are abandoning TensorFlow and flocking to PyTorch in droves. Meanwhile in industry, Tensorflow is currently the platform of choice, but that may not be true for long
      • in research: PyTorch > TensorFlow
      • in industry: TensorFlow > PyTorch
    19. From the early academic outputs Caffe and Theano to the massive industry-backed PyTorch and TensorFlow

      It's not easy to track all the ML frameworks

      Caffe, Theano ---> PyTorch, TensorFlow

    1. principle of least privilege states that any process, user or program has only the privileges it needs to do its job

      Principle of least privilege

    2. If you really want to impress your security consultant, then casually mention Kerckhoffs Principle which is a more formal way of saying ‘security through obscurity is not sufficient’

      Kerckhoffs Principle

    3. Hashing is the process of turning one set of data into another through a reproducible algorithm

      Hashing

    4. A symmetric key is a key that is ‘the same’ on both sides of the communication

      Symmetric key

    5. asymmetric key is one where access to the key used to encrypt the message does not imply access to decrypt the message

      Asymmetric key

    6. Authorization is the process of determining whether you are allowed to do something or not

      Authorisation

    7. Security through obscurity is security that relies on keeping the design of a system secret. In other words, if the design of your system were to become public, it would be easy to compromise

      Security Through Obscurity

    8. Role-Based Access Control gives permission to a more abstract entity called a role. Rather than giving access to that user directly, you give the user access to the role, and then that role has the access permissions set for it

      Role-Based Access Control (RBAC)

    9. This is why it’s important to ‘salt‘ your hash with a secret key so that knowledge of the hash algorithm isn’t enough to crack a lot of passwords

      Improving hashing algorithms
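
      A minimal sketch of salted password hashing with the standard library (the quote above suggests a secret-key salt; this version uses a random per-password salt stored alongside the hash):

      import hashlib, os

      password = b"correct horse battery staple"
      salt = os.urandom(16)                                            # random salt, stored next to the hash
      digest = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)  # slow, salted hash
      print(salt.hex(), digest.hex())                                  # knowing the algorithm alone isn't enough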

    10. Encryption vs Encoding

      Encoding - converting some data into some other format

      Encryption - involves needing some secret or secure process to get access to the data, like a private 'key' that you store in your ~/.ssh folder
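
      A quick illustration (my own sketch): encoding is reversible by anyone, while encryption needs a secret.

      import base64

      encoded = base64.b64encode(b"hello")    # encoding: no secret needed to reverse it
      print(base64.b64decode(encoded))        # b'hello'
      # encryption, by contrast, needs a key to reverse (e.g. a private key kept in ~/.ssh)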

    1. for many women, purple can induce soothing and calmness with the image of royalty

      purple colour

    2. Colours and emotion 

      Colours and emotion

    3. Both men and women have blue as their top colour

      Blue wins

    4. The colours black & white have opposing meaning in western and eastern cultures

      west culture:

      • black = finality, death, formality
      • white = purity, peace

      east culture:

      • white = death
      • black = wealth, health, prosperity
    5. Think of each colour in the context of its environment. For example, if you have mostly grey, white or muted colours on your website, then make your call-to-action button(s) green or red

      There is no universal guide to choosing website colours. Go with your own intuition

    6. our brain processes visuals 60,000x faster than text
    7. Colour is such a pervasive part of everything we encounter visually in our world, it evokes emotions which in turn drives decision making

      Effect of colour

    8. When creating or refining your brand identity, think about pairing your main colour with a complementary colour or use the 3 grouping guides below

      colour harmony

    9. We are using colour to communicate the value of our product or service
      • Red – Passionate, Aggressive, Important
      • Orange – Playful, Energetic, Cheap
      • Yellow – Happy, Friendly, Warning
      • Green – Natural, Stable, Prosperous
      • Blue – Serene, Trustworthy, Inviting
      • Violet – Luxurious, Mysterious, Romantic
      • Pink – Feminine, Young, Innocent
      • Black – Powerful, Sophisticated, Edgy
      • Brown – Earthy, Sturdy, Rustic
      • Grey – Neutral, Formal, Gloomy
      • White – Clean, Virtuous, Healthy
      • Beige – Accentuates surrounding colours
    10. our eyes can only pick up certain light wavelengths

      We can only pick up the visible spectrum of colours

    11. The theory of colour is a discipline that stretches back to at least the 15th century. It encompasses chemistry, physics and mathematics to effectively explain colour

      The theory of colour

    12. There are 2 primary colour systems (to reproduce colour) we use on a daily basis additive & subtractive. Anything that emits light (sun, screen or projector) uses additive and everything else reflects colour and uses subtractive colour

      2 primary colour systems

    13. The colour wheel is where you need to start when planning a colour scheme or branding for your business and for sales and marketing campaigns. The colour wheel consists of primary, secondary and tertiary colours.
      1. Primary: can not be made from any other colours
      2. Secondary: formed by mixing the primary colours
      3. Tertiary: formed by mixing primary and secondary colours

      colour wheel
    14. round shapes are more trustworthy & straight sharp edges are more striking
  2. Oct 2019
    1. “When scientists analyze people who appear to have tremendous self-control, it turns out those individuals aren’t all that different from those who are struggling. Instead, “disciplined” people are better at structuring their lives in a way that does not require heroic willpower and self-control.” - Atomic Habits

      Where all the self-control comes from...

    2. “The professional has learned that success, like happiness, comes as a by-product of work.” - The War of Art

      Where all the happiness comes from...

    3. “To write a great book, you must first become the book.”  - Atomic Habits
    4. I could and probably will write an entire article (or 10) on how I approach SEO research, but for the purpose of this article, I’ll just share that I use these tools: Ahrefs, Keyword Planner, and Keywords Everywhere.

      SEO tools to consider while writing a blog post:

    5. Conversations, books and articles, personal experiences, and random Google searches

      Sources of ideas

    6. What is something that I can uniquely contribute?

      Good question to ask while searching for idea to write about

    7. I set an open invitation to my brain to take note of anything interesting. More specifically, I look for anything within the intersection of: (1) interesting, (2) doesn’t exist online in that exact form, and (3) something I can contribute uniquely to

      Ideation - look for topics to write about: Idea Ikigai

    8. In my opinion, writing should never be something that you do just to check off of your to-do list. At its core, writing is a mechanism to try to deliver value to others. And over time, I’ve defined that as my core goal.

      Indeed, writing for others is a great devotion

    9. There’s a great tool called Draftback, which essentially lets you “go back in time to look over your own shoulder as you write”

      Draftback Chrome Extension allows you to "rewatch" your writing process

    10. Writing Process Steps: Ideation (Passive), Tracking (Active), Outline (Active), Idea Arbitrage (Passive), Research (Active), Writing (Active)

      Proposition of steps to follow while writing. Active state requires a "hard-core execute mode".

      1. Ideation (passive)
      2. Tracking (active)
      3. Outline (active)
      4. Idea Arbitrage (passive)
      5. Research (active)
      6. Writing (active)
    11. Practice makes any seemingly impossible task familiar

      Well said

    1. dplyr in R also lets you use a different syntax for querying SQL databases like Postgres, MySQL and SQLite, which is also in a more logical order
    2. It’s just that it often makes sense to write code in the order JOIN / WHERE / GROUP BY / HAVING. (I’ll often put a WHERE first to improve performance though, and I think most database engines will also do a WHERE first in practice)

      Pandas code is usually written in this order:

      1. JOIN
      2. WHERE
      3. GROUP BY
      4. HAVING

      Example:

      1. df = thing1.join(thing2)  # like a JOIN
      2. df = df[df.created_at > 1000]  # like a WHERE
      3. df = df.groupby('something').agg(num_yes=('yes', 'sum'))  # like a GROUP BY
      4. df = df[df.num_yes > 2]  # like a HAVING, filtering on the result of a GROUP BY
      5. df = df[['num_yes', 'something1']]  # pick the columns I want to display, like a SELECT
      6. df = df.sort_values('something', ascending=True)  # ORDER BY
      7. df = df[:30]  # LIMIT
    3. LINQ (a querying syntax in C# and VB.NET) uses the order FROM ... WHERE ... SELECT

      LINQ starts queries in a different order:

      1. FROM
      2. WHERE
      3. SELECT
    4. Database engines in practice don’t actually run queries by joining, and then filtering, and then grouping, because they implement a bunch of optimizations that reorder things to make the query run faster, as long as reordering things won’t change the results of the query

      SQL queries are run by database engines in different order than we write them down

    5. SELECT isn’t the first thing, it’s like the 5th thing!

      Order of SQL queries:

      1. FROM/JOIN and all the ON conditions
      2. WHERE
      3. GROUP BY
      4. HAVING
      5. SELECT (including window functions)
      6. ORDER BY
      7. LIMIT
    1. When we talk about abstraction levels, we can classify code in 3 levels: high: getAddress; medium: inactiveUsers = Users.findInactives; low: .split(" ")

      3 abstraction levels:

      1. high: getAddress
      2. medium: inactiveUsers = Users.findInactives
      3. low: .split(" ")

      Explanation:

      • The high level abstractions are functions that you create, like searchForSomething()
      • The medium level are methods on your objects, like account.unverifyAccount
      • The low level are methods that the language provides, like map, to_downcase and so on
    2. Uncle Bob, in Clean Code, argues that the best order to write code is: Write unit tests. Create code that works. Refactor to clean the code.

      Best order to write code (according to Uncle Bob):

      1. Write unit tests.
      2. Create code that works.
      3. Refactor to clean the code.
    3. Sometimes it's useful to explain some code (how much time do you spend trying to figure out a regex pattern when you see one?), but 99% of the time, comments could be avoided.

      Generally try to avoid (avoid != forbid) comments.

      Comments:

      1. Become outdated, confusing your future self (Yesterday I lost some precious time because of an outdated comment)
      2. Could be replaced by a better-named variable/function/class.
      3. They pollute the code unnecessarily.
    4. There is also another maxim that says you should write the same code a maximum of 3 times. The third time, you should consider refactoring and reducing duplication

      Avoid repeating the same code over and over

    5. a way to decrease the switch/if/else is to use polymorphism

      It's better to avoid excessive switch/if/else statements

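      A small sketch (my own, not from the article) of replacing an if/else chain with polymorphism:

      # if/else version: every new shape means editing this function
      def area_if(shape, **dims):
          if shape == "circle":
              return 3.14159 * dims["r"] ** 2
          elif shape == "square":
              return dims["side"] ** 2

      # polymorphic version: each class knows how to compute its own area
      class Circle:
          def __init__(self, r):
              self.r = r

          def area(self):
              return 3.14159 * self.r ** 2

      class Square:
          def __init__(self, side):
              self.side = side

          def area(self):
              return self.side ** 2

      print(area_if("circle", r=2), area_if("square", side=3))
      for shape in (Circle(2), Square(3)):
          print(shape.area())   # same results, no if/else needed to dispatch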

    6. The ideal is not to mix the abstraction levels in only one function.

      Try not to mix abstraction levels inside a single function

    7. In the ideal world, functions should have 1 or 2 levels of indentation

      Functions in the ideal world shouldn't be long

    8. Should be verbs, and not nouns, because methods represent actions that objects must do

      Methods names = verbs

    9. Should be nouns, and not verbs, because classes represent concrete objects

      Class names = nouns

    10. int d could be int days

      When naming things, focus on giving meaningful names, that you can pronounce and are searchable. Also, avoid prefixes

    11. naming things, write better functions and a little about comments. Next, I intend to talk about formatting, objects and data structures, how to handle errors, boundaries (how to deal with someone else's code), unit testing and how to organize your classes better. I know that it'll be missing an important topic about code smells

      Ideas to consider while developing clean code:

      • naming things
      • better functions
      • comments
      • formatting
      • objects and data structures
      • handling errors
      • boundaries (handling someone else's code)
      • unit testing
      • organising classes
      • code smells
    1. read Knuth, or Pragmatic Programming, or Clean Code, or some other popular book

      Classic programming related books

    2. Considering that there are still a ton of COBOL jobs out there, there is no particular technology that you need to know

      Right, there is no single specific technology that you need to learn

    3. Senior developers are more cautious, thoughtful, pragmatic, practical and simple in their approaches to solving problems.

      Interesting definition of senior devs

    4. "The Big Picture" is one of those things that people say a whole lot but can mean so many different things. Going through all of these articles, they tend to mean any (or all) of these things

      Thinking about The Big Picture:

      1. The Business Stuff - how to meet KPIs or the current big deadline or whatever.
      2. The User Stuff - how to actually provide value to the people who use what you make.
      3. The Technology Stuff - how to build something that will last a long time.
    5. The vertical bar on the letter T represents the depth of related skills and expertise in a single field, whereas the horizontal bar is the ability to collaborate across disciplines with experts in other areas and to apply knowledge in areas of expertise other than one's own.

      T shaped knowledge

    1. Don’t commit directly to the master or development branches.
      • Don’t hold up work by not committing local branch changes to remote branches.
      • Never commit application secrets in public repositories.
      • Don’t commit large files in the repository; this will increase the size of the repository. Use Git LFS for large files.
      • Don’t create one pull request addressing multiple issues.
      • Don’t work on multiple issues in the same branch. If a feature is dropped, it will be difficult to revert changes.
      • Don’t reset a branch without committing/stashing your changes. If you do so, your changes will be lost.
      • Don’t do a force push until you’re extremely comfortable performing this action.
      • Don’t modify or delete public history.

      Git Don'ts

    2. Create a Git repository for every new project.
      • Always create a new branch for every new feature and bug.
      • Regularly commit and push changes to the remote branch to avoid loss of work.
      • Include a gitignore file in your project to avoid unwanted files being committed.
      • Always commit changes with a concise and useful commit message.
      • Utilize git-submodule for large projects.
      • Keep your branch up to date with development branches.
      • Follow a workflow like Gitflow. There are many workflows available, so choose the one that best suits your needs.
      • Always create a pull request for merging changes from one branch to another.
      • Always create one pull request addressing one issue.
      • Always review your code once by yourself before creating a pull request.
      • Have more than one person review a pull request. It’s not necessary, but it is a best practice.
      • Enforce standards by using pull request templates and adding continuous integrations.
      • Merge changes from the release branch to master after each release.
      • Tag the master sources after every release.
      • Delete branches if a feature or bug fix is merged to its intended branches and the branch is no longer required.
      • Automate general workflow checks using Git hooks.
      • Include read/write permission access control to repositories to prevent unauthorized access.
      • Add protection for special branches like master and development to safeguard against accidental deletion.

      Git Dos

  3. Sep 2019
    1. To address the availability concern, new architectures were developed to minimize the impact of partitions. For instance, splitting data sets into smaller ranges called shards can minimize the amount of data that is unavailable during partitions. Furthermore, mechanisms to automatically alter the roles of various cluster members based on network conditions allow them to regain availability quickly

      Qualities of NewSQL - mainly minimisation of the impact of partitions

    2. typically less flexible and generalized than their more conventional relational counterparts. They also usually only offer a subset of full SQL and relational features, which means that they might not be able to handle certain kinds of usage. Many NewSQL implementations also store a large part of or their entire dataset in the computer's main memory. This improves performance at the cost of greater risk to unpersisted changes

      Differences between NewSQL and relational databases:

      • typically less flexible and generalized
      • usually only offer a subset of full SQL and relational features, which means that they might not be able to handle certain kinds of usage.
      • many NewSQL implementations also store a large part of or their entire dataset in the computer's main memory. This improves performance at the cost of greater risk to unpersisted changes.
    3. using a mixture of different database types is the best approach for handling the data of your projects

      Many times mixing different databases is a good approach.

      For example:

      • store user information - relational databases
      • configuration values - in-memory key-value store
    4. best suited for use cases with high volumes of relational data in distributed, cloud-like environments

      Best suit of NewSQL

    5. CAP theorem is a statement about the trade offs that distributed databases must make between availability and consistency. It asserts that in the event of a network partition, a distributed database can choose either to remain available or remain consistent, but it cannot do both. Cluster members in a partitioned network can continue operating, leading to at least temporary inconsistency. Alternatively, at least some of the disconnected members must refuse to alter their data during the partition to ensure data consistency

      CAP Theorem relating to distributed databases

      CAP

    6. NewSQL databases: bringing modern scalability and performance to the traditional relational pattern

      NewSQL databases - designed with scalability and modern performance requirements. Follow the relational structure and semantics, but are built using more modern, scalable design. Rise in popularity in 2010s.

      Examples:

      • MemSQL
      • VoltDB
      • Spanner
      • Calvin
      • CockroachDB
      • FaunaDB
      • yugabyteDB
    7. aggregate queries like summing, averaging, and other analytics-oriented processes can be difficult or impossible

      Disadvantage of column databases

    8. Column-family databases are good when working with applications that require great performance for row-based operations and high scalability

      Advantage of column databases. They also collect row data in a cluster on the same machine, simplifying data sharding and scaling

    9. it helps to think of column family databases as key-value databases where each key (row identifier) returns a dictionary of arbitrary attributes and their values (the column names and their values)

      Tip to remember the idea of column databases

    10. Column-family databases: databases with flexible columns to bridge the gap between relational and document databases

      Column-family databases - also called non-relational column stores, wide-column databases or column databases. Rise in popularity in the 2000s. They look quite similar to relational databases. They have a structure called column families, which contain rows of data, each of which defines its own format. Therefore, each row in a column family defines its own schema.

      Examples:

      • Cassandra
      • HBase

      Diagram of column-family database structure

    11. querying for the connection between two users of a social media site in a relational database is likely to require multiple table joins and therefore be rather resource intensive. This same query would be straightforward in a graph database that directly maps connections

      Social media prefers graph databases over relational ones

    12. Graph databases are most useful when working with data where the relationships or connections are highly important

      Major use of graph databases

    13. network databases require step-by-step traversal to travel between items and are limited in the types of relationships they can represent.

      Difference between network databases (an older model) and graph databases (NoSQL)

    14. Graph databases: mapping relationships by focusing on how connections between data are meaningful

      Graph databases - establishes connections using the concepts of nodes, edges, and properties. Rise in popularity in 2000s.

      Examples:

      • Neo4j
      • JanusGraph
      • Dgraph

      Diagram of a graph database structure

    15. Document databases: Storing all of an item's data in flexible, self-describing structures

      Document databases - also known as document-oriented databases or document stores, share the basic access and retrieval semantics of key-value stores. Rise in popularity in 2009.

      They also use keys to uniquely identify data; therefore, the line between advanced key-value stores and document databases can be fairly unclear.

      Instead of storing arbitrary blobs of data, document databases store data in structured formats called documents, often using formats like JSON, BSON, or XML.

      Examples:

      • MongoDB
      • RethinkDB
      • Couchbase

      Diagram of document database

    16. Document databases are a good choice for rapid development because you can change the properties of the data you want to save at any point without altering existing structures or data. You only need to backfill records if you want to. Each document within the database stands on its own with its own system of organization. If you're still figuring out your data structure and your data is mainly composed of discrete entries that don't include a lot of cross references, a document database might be a good place to start. Be careful, however, as the extra flexibility means that you are responsible for maintaining the consistency and structure of your data, which can be extremely challenging

      Pros and cons of document databases

    17. Though the data within documents is organized within a structure, document databases do not prescribe any specific format or schema

      Therefore, unlike in key-value stores, the content stored in document databases can be queried and analysed

    18. Key-value stores are often used to store configuration data, state information, and any data that might be represented by a dictionary or hash in a programming language. Key-value stores provide fast, low-complexity access to this type of data

      Use and advantages of key-value stores

    19. Key-value databases: simple, dictionary-style lookups for basic storage and retrieval

      Key-value databases - one of the simplest database types. Initially introduced in 1970s (rise in popularity: 2000-2010). Work by storing arbitrary data accessible through a specific key.

      • to store data, you provide a key and the blob of data you wish to save, for example a JSON object, an image, or plain text.
      • to retrieve data, you provide the key and will then be given the blob of data back.

      Examples:

      • Redis
      • memcached
      • etcd

      Diagram of key-value store
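
      As a small sketch of the store/retrieve-by-key workflow (assuming a local Redis server and the redis-py client):

      import json
      import redis

      r = redis.Redis()
      r.set("user:42:config", json.dumps({"theme": "dark"}))    # store a blob of data under a key
      print(json.loads(r.get("user:42:config")))                # retrieve the blob with the same key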

    20. NoSQL databases: modern alternatives for data that doesn't fit the relational paradigm

      NoSQL databases - stands for either non-SQL or not only SQL to clarify that sometimes they allow SQL-like querying.

      4 types:

      • Key-value
      • Document
      • Graph
      • Column-family
    21. relational databases are often a good fit for any data that is regular, predictable, and benefits from the ability to flexibly compose information in various formats. Because relational databases work off of a schema, it can be more challenging to alter the structure of data after it is in the system. However, the schema also helps enforce the integrity of the data, making sure values match the expected formats, and that required information is included. Overall, relational databases are a solid choice for many applications because applications often generate well-ordered, structured data

      Pros and cons of relational database

    22. querying language called SQL, or structured query language, was created to access and manipulate data stored with that format

      SQL was created for relational databases

    23. Relational databases: working with tables as a standard solution to organize well-structured data

      Relational databases - oldest general purpose database type still widely used today. They comprise the majority of databases currently used in production. Initially introduced in 1969.

      They organise data using tables - structures that impose a schema on the records that they hold.

      • each column has a name and a data type
      • each row represents an individual record

      Examples:

      • MySQL
      • MariaDB
      • PostgreSQL
      • SQLite

      Diagram of relational schema used to map entities for a school

    24. database schema is a description of the logical structure of a database or the elements it contains. Schemas often include declarations for the structure of individual entries, groups of entries, and the individual attributes that database entries are comprised of. These may also define data types and additional constraints to control the type of data that may be added to the structure

      Database schema

    25. Network databases: mapping more flexible connections with non-hierarchical links

      Network databases - built on the foundation provided by hierarchical databases by adding additional flexibility. Initially introduced in late 1960s. Instead of always having a single parent, as in hierarchical databases, network database entries can have more than one parent, which effectively allows them to model more complex relationships.

      Examples:

      • IDMS

      They have a graph-like structure.

      Diagram of a network database

    26. Hierarchical databases: using parent-child relationships to map data into trees

      Hierarchical databases - the next evolution in database development. Initially introduced in 1960s. They encode a relationship between items where every record has a single parent.

      Examples:

      • Filesystems
      • DNS
      • LDAP directories

      They have a tree-like structure.

      Diagram of a hierarchical database

    27. Hierarchical databases are not used much today due to their limited ability to organize most data and because of the overhead of accessing data by traversing the hierarchy

      Hierarchical databases aren't used as much anymore

    28. The first flat file databases represented information in regular, machine parse-able structures within files. Data is stored in plain text, which limits the type of content that can be represented within the database itself. Sometimes, a special character or other indicator is chosen to use as a delimiter, or marker for when one field ends and the next begins. For example, a comma is used in CSV (comma-separated values) files, while colons or white-space are used in many data files in Unix-like systems

      Flat-file databases - 1st type of databases with a simple data structure for organising small amounts of local data.

      Examples:

      • /etc/passwd and /etc/fstab on Linux and Unix-like systems
      • CSV files
    29. Some advantages of this format

      Advantages of flat-file format:

      • has robust, flexible toolkit
      • easily managed without specialised software
      • easy to understand and work with
    30. While flat file databases are simple, they are very limited in the level of complexity they can handle

      Disadvantages of flat-file databases:

      • system that reads or manipulates the data cannot make easy connections between the data represented
      • usually don't have any type of user or data concurrency features either
      • usually only practical for systems with small read or write requirements. For example, many operating systems use flat-files to store configuration data
    1. One of the earliest interactions with a customer is to signal a lack of trust. The bartender is suggesting the customer will leave without paying.

      Show your trust to the customer

    2. I think many business owners do not consider the subconscious effects on how they manage their business

      Don't forget about the subconscious effects

    3. In another example, there is a plaza near where I work that has a number of eating options but my preferred is a sandwich place at the far end of the building. When I pull in I often look for parking in front, if there is not I might need to go to the opposite end of the plaza to find parking. While walking to the sandwich shop I pass by 3 other places to eat. I almost never make it to the sandwich shop. I always convince myself to try something different before getting to the front door of my go-to lunch spot

      That's what can happen if your customers need to take a long walk from the other end of the parking lot

    4. the experience his customers have while visiting him was very important to him and it started when they pulled into the parking lot. In his view, they should have front row parking

      Why to park far away from your own business office

    1. "Secrecy is the first law of Magic" - from Julia Cameron's Artist's Way. Talking about your project seems to spread your energy in every direction other than towards completion.

      Inspiring comment

    2. Four different tests of 63 people found that those who kept their intentions private were more likely to achieve them

      Speaking about numbers

    3. W. Mahler found that if a person announced the solution to a problem, and was acknowledged by others, it was now in the brain as a “social reality”, even if the solution hadn’t actually been achieved
    4. Announcing your plans to others satisfies your self-identity just enough that you’re less motivated to do the hard work needed

      Tests done since 1933 show that people who talk about their intentions are less likely to make them happen

    1. the __ methods allow us to interact with core concepts of the python language. You can see them also as a mechanism of implementing behaviours, interface methods.

      __ methods

    2. Dunder or magic methods are methods that start and end with a double underscore, like __init__ or __str__. These methods are the mechanism we use to interact directly with Python's data model

      Dunder methods
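
      A minimal sketch (my own example) of how dunder methods hook a class into Python's data model:

      class Playlist:
          def __init__(self, songs):        # called when the object is constructed
              self.songs = songs

          def __len__(self):                # makes len(playlist) work
              return len(self.songs)

          def __str__(self):                # makes print(playlist) readable
              return f"Playlist with {len(self)} songs"

      p = Playlist(["a", "b"])
      print(len(p), p)                      # 2 Playlist with 2 songs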

    3. use pyenv. With it, you will be able to have any version you want at your disposal, very easily.

      pyenv allows you to easily switch between Python versions

    1. Pretty much the same things as the engineering blog, but make sure to skim through the “Issues” section to see if you can find anything else interesting.

      Things to look for in open source projects

    2. What projects/products have they developed recently? And more importantly, what led them to build these things? What business challenges or goals drove the project? What technical challenges drove it?

      Things to look for in the developer's blogs

    3. The list of product(s): is there anything similar you’ve worked on that shows you understand the business problems and domain? Anything similar you’ve worked on where you helped make UX and/or feature decisions (where you stepped outside your developer-world bubble)? Anything similar that you had to develop a unique technical solution for? You want to show you can understand the business/product side of things and translate that into technical solutions.
      • List of customers (companies love to list customer logos!): while on the surface this might not seem that helpful, it can actually provide useful information. Is there a particular type of customer they have that you have developed solutions for before (i.e. government, insurance, etc.)? Any specific customer you have built products for before?
      • News section: companies will often talk about new customers, recent acquisitions, and new product developments here. This will give you a sense of where the company is headed, and it is really useful to bring up in interviews as it shows you understand the current state of the company.

      Things to look for on company websites

    4. Things to look for are:
      • What are recent things they’ve worked on or tools they’ve built? What are they working on now?
      • Projects/products: what projects/products do they mention that you would be working on?
      • Do they mention any specific technologies you have experience with? Not Node/React etc., but, for example, performance testing tools -> this suggests they have a lot of traffic and need to profile their services, something you would be a good fit for if you have that experience.
      • Anything they explicitly mention they need help with? Sometimes job postings will say things like “We just had a huge increase in users and need to hire another developer to help us re-architect some of our core services”.

      Things to look for in job postings

    5. At a very high level, it is:

      Better approach to look for jobs:

      Step 1: figure out what the companies’ problems are – research the company website, engineering blog, etc. to find out what these problems are.

      Step 2: show how you can help solve those problems – create your “pitch” (whether this is a resume and quick paragraph email, or something in person, the approach is the same) by showing how your skills and experience will help solve their problems.

    6. The job search process is a sales process – one in which you are selling your skills and experience
    1. Summary

      In doing a code review, you should make sure that:

      • The code is well-designed.
      • The functionality is good for the users of the code.
      • Any UI changes are sensible and look good.
      • Any parallel programming is done safely.
      • The code isn’t more complex than it needs to be.
      • The developer isn’t implementing things they might need in the future but don’t know they need now.
      • Code has appropriate unit tests.
      • Tests are well-designed.
      • The developer used clear names for everything.
      • Comments are clear and useful, and mostly explain why instead of what.
      • Code is appropriately documented (generally in g3doc).
      • The code conforms to our style guides.
    1. Continuous Delivery for Machine Learning end-to-end process

      end-to-end process

    2. We chose to use GoCD as our Continuous Delivery tool, as it was built with the concept of pipelines as a first-class concern

      GoCD - open source Continuous Delivery tool

    3. A deployment pipeline automates the process for getting software from version control into production, including all the stages, approvals, testing, and deployment to different environments

      Deployment pipeline

    4. example of how to combine different test pyramids for data, model, and code in CD4ML

      Combining tests for data (purple), model (green) and code (blue) testing

    5. There are different types of testing that can be introduced in the ML workflow.

      Automated tests for ML system:

      • validating data
      • validating component integration
      • validating the model quality
      • validating model bias and fairness
    6. Another approach is to use a tool like H2O to export the model as a POJO in a JAR Java library, which you can then add as a dependency in your application. The benefit of this approach is that you can train the models in a language familiar to Data Scientists, such as Python or R, and export the model as a compiled binary that runs in a different target environment (JVM), which can be faster at inference time

      H2O - export models trained in Python/R as a POJO in JAR

    7. In order to formalise the model training process in code, we used an open source tool called DVC (Data Science Version Control). It provides similar semantics to Git, but also solves a few ML-specific problems:

      DVC - transform model training process into code.

      Advantages:

      • it has multiple backend plugins to fetch and store large files on an external storage outside of the source control repository;
      • it can keep track of those files' versions, allowing us to retrain our models when the data changes;
      • it keeps track of the dependency graph and commands used to execute the ML pipeline, allowing the process to be reproduced in other environments;
      • it can integrate with Git branches to allow multiple experiments to co-exist
    8. Machine Learning pipeline for our Sales Forecasting problem, and the 3 steps to automate it with DVC

      Sales Forecasting process

    9. process that takes input data through a series of transformation stages, producing data as output

      Data pipeline

    10. common functional silos in large organizations can create barriers, stifling the ability to automate the end-to-end process of deploying ML applications to production

      Common ML process (leading to delays and frictions)

      ML process

    11. Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.

      Continuous Delivery for Machine Learning (CD4ML) (long definition)

      Basic principles:

      • software engineering approach
      • cross-functional team
      • producing software based on code, data, and ml models
      • small and safe increments
      • reproducible and reliable software release
      • short adaptation cycles
    12. Continuous Delivery for Machine Learning (CD4ML) is the discipline of bringing Continuous Delivery principles and practices to Machine Learning applications.

      Continuous Delivery for Machine Learning (CD4ML)

    13. "Continuous Delivery is the ability to get changes of all types — including new features, configuration changes, bug fixes, and experiments — into production, or into the hands of users, safely and quickly in a sustainable way". -- Jez Humble and Dave Farley

      Continuous Delivery

    1. The confidence of knowing that once something is added to Anki it won't be forgotten is intoxicating

      Intoxicating

    2. And for the last three years, I've added EVERYTHING to Anki. Bash aliases, IDE Shortcuts, programming APIs, documentation, design patterns, etc. Having done that, I wouldn't recommend adding EVERYTHING

      Put just the relevant information into Anki

    3. Habit: Whenever I search StackOverflow, I'll immediately create a flashcard of my question and the answer(s) into Anki.

      Example habit to make a flashcard

    4. Anki seems more common among software engineers
    5. Engineers are creatures of habit. Make reviewing your flashcard app your first work task (or the train, the toilet right before Candy Crush). Stop StackOverflowing "how do i amend my git commit" five times every month.

      Spaced repetition is a solution to googling the same thing 5 times a month

    6. Outside of medical students and language learning apps like Duolingo, spaced repetition isn't common. It's not as cool as cramming, but it works. Medical students use it to memorize those awful thousand page textbooks. Duolingo uses it because it's effective

      The most popular appliers of spaced repetition:

      1. Medical students
      2. Duolingo users
    7. But Why Option 3?

      Why spaced repetition is superior to cramming (reviewing just a week before the exam):

      1. Cramming rarely works after it passes from short-term memory. How many cram sessions do you remember from high school?
      2. Evenly spaced reminders sort-of works, but you'd have to review all your knowledge at every interval, which doesn't sound scalable/fun/have a social life.
      3. Our brains work best with exponentially spaced reminders.
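
      A toy sketch (my own, assuming a simple doubling schedule) of what exponentially spaced reminders look like:

      from datetime import date, timedelta

      def review_dates(start, reviews=6, first_gap_days=1):
          gap = first_gap_days
          when = start
          for _ in range(reviews):
              when += timedelta(days=gap)
              yield when
              gap *= 2                      # each interval doubles: 1, 2, 4, 8, ... days

      print(list(review_dates(date(2019, 10, 1))))
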
    8. Spaced repetition is a remembering technique that will remind you concepts at spaced intervals to maximize memory retention efficiently

      Spaced repetition

    9. Kyle had a super hero ability. Photographic memory in API syntax and documentation. I wanted that and I was jealous. My career was stuck and something needed to change. And so I began a dedicated journey into spaced repetition. Every day for three years, I spent one to three hours in spaced repetition

      Spaced repetition as a tool for photographic memory in API syntax and documentation

    1. “I think SM is only good for a small minority of learners. But they will probably value it very much.”

      I totally agree with it

    2. “Anki is a tool and SuperMemo is a lifestyle.”

      Anki vs SuperMemo

    3. Using either SRS has already given you a huge edge over not using any SRS:
      • No SRS: 70 hours
      • Anki: 10 hours
      • SuperMemo: 6 hours

      The difference between using any SRS (whether it’s Anki or SM) and not using one is huge, but the difference between Anki and SM is not

      It doesn't matter as much which SRS you're using. It's most important to use one of them at least

    4. A: Read an article from start to finish. ONLY THEN do you import parts into Anki for remembering.
      B: Incremental Reading: interleaving between reading and remembering

      Two algorithms (A and B) for studying

    5. in SM, learning and remembering are blended into one: you read (learn) and review (remember) at the same time. Incremental Reading is essentially “spaced repetition-ing” your reading

      Super Memo combines learning + remembering

    6. In Anki, you are only doing the remembering part. You are not reading anything new in Anki

      Anki is for remembering

    7. Learning = reading and understanding new things. Remembering = memorizing what you learned

      Learning vs remembering

    1. Another common component to profile is the memory usage. The purpose is to find memory leaks and optimize the memory usage in your Python programs

      Memory usage can be tracked with pympler or objgraph libraries

    2. The profile module gives similar results with similar commands. Typically, you switch to profile if cProfile isn’t available

      cProfile > profile. Use profile only when cProfile isn't available

    3. Line profiling, as the name suggests, means to profile your Python code line by line

      Line profiling

    4. If a method has an acceptable speed but is so frequently called that it becomes a huge time sink, you would want to know this from your profiler

      We also want to measure the frequency of method calls. cProfile can highlight the number of function calls and how many of those are native calls

    5. method profiling tool like cProfile (which is available in the Python language), the timing metrics for methods can show you statistics, such as the number of calls (shown as ncalls), total time spent in the function (tottime), time per call (tottime/ncalls and shown as percall), cumulative time spent in a function (cumtime), and cumulative time per call (quotient of cumtime over the number of primitive calls and shown as percall after cumtime)

      cProfile is one of the Python tools to measure method execution time. Specifically:

      • number of calls (shown as ncalls)
      • total time spent in the function (tottime)
      • time per call (tottime/ncalls and shown as percall)
      • cumulative time spent in a function (cumtime)
      • cumulative time per call (quotient of cumtime over the number of primitive calls and shown as percall after cumtime)
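
      A minimal sketch of getting those statistics with cProfile (slow() is just a made-up example function):

      import cProfile

      def slow():
          return sum(i * i for i in range(1_000_000))

      # prints ncalls, tottime, percall, cumtime and percall for every function involved
      cProfile.run("slow()", sort="cumtime")
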
    6. what parts of the software do we profile (measure its performance metrics)

      Most profiled parts of the software:

      • Method or function (most common)
      • Lines (similar to method profiling, but doing it line by line)
      • Memory (memory usage)
    7. its purpose is to dump Python tracebacks explicitly on a fault, after a timeout, or on a user signal

      faulthandler, in contrast to trace, tracks specific events and has slightly better documentation

    8. The purpose of trace module is to “monitor which statements and functions are executed as a program runs to produce coverage and call-graph information

      Purpose of trace module

    9. trace and faulthandler modules cover basic tracing

      Basic Python libraries for tracing

    10. tracing is a special use case of logging in order to record information about a program’s execution

      Tracing (more for software devs) is very similar to event logging (more for system administrators)

    1. First up, regular citizens who download copyrighted content from illegal sources will not be criminalized. This means that those who obtain copies of the latest movies from the Internet, for example, will be able to continue doing so without fear of reprisals. Uploading has always been outlawed and that aspect has not changed.

      In Switzerland you will be able to download, but not upload, pirated content

    1. So why are map, filter and reduce useful?

      Advantages of map, filter, reduce:

      • don't have to manually loop over array
      • chain together for short, straightforward array transformations
      • can reuse callback functions and compose them together
    2. During each loop of reduce, you can get the result of the last loop and the next element in the array. Change the result, then return it for the next loop iteration. When you're done, you have the completed collection

      reduce

    3. This lets you transform each element into something new (or keep it the same). Types don't have to be the same: you can return objects, strings, numbers - anything!

      map

    4. It loops over the elements, passing each one to a callback function. You can return true to include that element in the new array, or false to exclude it

      filter

    5. map, filter and reduce can all operate independently, or be chained together

      map, filter, reduce
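
      The article is about JavaScript arrays, but the same chained pattern in Python (shown here only for illustration) looks like:

      from functools import reduce

      nums = [1, 2, 3, 4, 5]
      evens = list(filter(lambda n: n % 2 == 0, nums))        # keep only some elements -> [2, 4]
      squares = list(map(lambda n: n * n, evens))             # transform each element -> [4, 16]
      total = reduce(lambda acc, n: acc + n, squares, 0)      # fold into a single value -> 20
      print(total)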

    1. 75 countries already using the technology

      75 countries already use facial recognition

    1. git config --global alias.s status

      Replace git status with git s:

      git config --global alias.s status
      

      It will modify config in .gitconfig file.

      Other set of useful aliases:

      [alias]
        s = status
        d = diff
        co = checkout
        br = branch
        last = log -1 HEAD
        cane = commit --amend --no-edit
        lo = log --oneline -n 10
        pr = pull --rebase
      

      You can apply them (^) with:

      git config --global alias.s status
      git config --global alias.d diff
      git config --global alias.co checkout
      git config --global alias.br branch
      git config --global alias.last "log -1 HEAD"
      git config --global alias.cane "commit --amend --no-edit"
      git config --global alias.pr "pull --rebase"
      git config --global alias.lo "log --oneline -n 10"
      
    2. alias g=git
      alias g=git
      

      This command will let you type g s in your shell to check git status

    1. However, if you believe that you are indistractable, you empower yourself to respond more healthily to whatever distractions get in your way

      How to be Indistractable (summary)

    2. identity pact is another way to change your response to distractions. Your self-image has a profound impact on your behavior

      Identity pact - give yourself a new identity, like "vegetarian" to force yourself to stop eating meat

    3. price pact puts money on the line. If you stick to your intended behavior, you keep the cash. If you get distracted, you forfeit your funds

      Price pact - make an agreement with a friend that you will give them a lot of money if you don't finish what you set out to do

    4. effort pact is a kind of precommitment that involves increasing the amount of effort required to do something you don’t want to do

      Effort pact is one example of a precommitment. You can use the "Forest" app, for example, to help you with it

    5. you can take back your smartphone in four steps
      1. Remove needless apps.
      2. Shift the usage of mobile apps to desktop.
      3. Rearrange icons on your screen.
      4. Adjust notifications.
    6. The right approach is to ask whether the external trigger is serving you, or whether you are serving it

      Way to decide if it's better to eliminate the trigger or not

    7. research shows that ignoring a call or message can be just as distracting as responding to one

      Notifications are one of the main sources of distraction

    8. After all, the most important people in your life deserve better than the leftover time in your day

      That's influential...

    9. Go ahead and scroll through social media, but at allotted times

      Don't stress about purely working on your values. Allocate the time for the other activities, but only do them at the allocated time

    10. Turn your values into time

      Don't just talk about your values, but invest them into time

    11. Don’t pick your goals, pick your values

      Pick your values instead of goals. Otherwise others will dictate your activities and use your time. Example values:

      • being a contributing member of a team
      • being a loving parent
      • being in an equitable marriage
      • seeking wisdom
    12. “leaves on a stream” method. Imagine yourself beside a stream, on which leaves gently float by. Place each thought and negative feeling in your mind on one leaf and watch them float away

      Use the "leaves on a stream" method when facing distraction. Put the distracting thoughts on the leaves and let them float away. Apart from that:

      • identify things that prompt the distraction
      • log how you feel at that time
    13. You can’t control how you feel, but you can learn to control how you react to the way you feel. To start, you can change how you think about the bad feelings that can lead to distraction.

      We lack control over our feelings, but not over our reactions

    14. The truth is, we overuse video games, social media, and our cell phones not just for the pleasure they provide, but because they free us from psychological discomfort

      Root cause of human behaviour is the desire to escape discomfort

    15. The opposite of “distraction” is “traction.” Traction is any action that moves us towards what we really want. Tractions are actions, done with intent.

      Traction

    16. distractions aren’t necessarily your fault, they are your responsibility

      Learn to become indistractable

    1. People who frequently consume artificially sweetened drinks biologically look 5-10 years older compared to those who don’t
    1. 3 lessons in branding

      3 lessons in branding:

      • “At the end of the day, brands are about trust”
      • All great brands are authentic, credible, and aspirational

        If you check all three of these boxes, there’s no reason why you can’t enter a new product vertical

      • To put your brand on steroids, attach yourself to a celebrity or influencer

    1. 16 MOOCs offered by the University of Pennsylvania through a company called Coursera. They found that completion rates averaged at around 4 percent.

      Completion rate of most MOOCs is fairly small

    2. The UT System invested $5 million in edX and committed to spending another $5 million on course development. It was in good company; Harvard University and the Massachusetts Institute of Technology had each ponied up $30 million to launch the service. Rice University, Texas’ elite private institution, also partnered with edX.

      Money invested in creation of edX

    1. 400 sized probability sample (a small random sample from the whole population) is often better than a millions sized administrative sample (of the kind you can download from gov sites). The reason is that an arbitrary sample (as opposed to a random one) is very likely to be biased, and, if large enough, a confidence interval (which actually doesn't really make sense except for probability samples) will be so narrow that, because of the bias, it will actually rarely, if ever, include the true value we are trying to estimate. On the other hand, the small, random sample will be very likely to include the true value in its (wider) confidence interval
    1. Some of the best and most affordable premium laptops on the market are now shipping with Linux pre-installed. More specifically they’re shipping with Deepin

      Huawei started selling laptops with Deepin

    1. Segway was $5,000 at launch in 2001, which is a whopping $7,248 in inflation adjusted dollars. Here in 2019, cheap $200 to $300 electric scooters are basically the transformational technology the Segway was supposed to be, aren't they

      Great decrease in the price of electric scooters

    2. Did I mention there's a bluetooth connection, and an app, and it is possible to hack the M365 firmware?

      M365 scooter is not fully secure

    3. There are some challenges with electric scooters, starting with the biggest one: your local government has no idea how to regulate the darn things.

      Regulating electric scooters brings a ton of questions