2,548 Matching Annotations
  1. Mar 2020
    1. We’ve always lived in a world which we didn’t completely understand but now we’re living in a world designed by us – for Pedro, that’s actually an improvement

      We never fully understood our surroundings, but now we have a great ability to modify them

    2. But at the end of the day, what we know about neuroscience today is not enough to determine what we do in AI, it’s only enough to give us ideas.  In fact it’s a two way street – AI can help us to learn how the brain works and this loop between the two disciplines is a very important one and is growing very rapidly

      Neuroscience can help us understand AI, and vice versa

    3. Pedro believes that success will come from unifying the different major types of learning and their master algorithms –not just combining, but unifying them such that “it feels like using one thing”

      Interesting point of view on designing the master algorithm

    4. if you look at the number  of connections that the state of the art machine learning systems for some of these problems have, they’re more than many animals – they have many hundreds of millions or billions of connections

      State-of-the-art ML systems are composed of hundreds of millions or billions of connections (more than many animals have)

    5. There was this period of a couple of hundred years where we understood our technology. Now we just have to learn to live in a world where we don’t understand the machines that work for us, we just have to be confident they are working for us and doing their best

      Should we just accept the fact that machines will rule the world with a mysterious intelligence?

    1. When you run a PyTorch/TensorFlow model, most of the work isn’t actually being done in the framework itself, but rather by third party kernels. These kernels are often provided by the hardware vendor, and consist of operator libraries that higher-level frameworks can take advantage of. These are things like MKLDNN (for CPU) or cuDNN (for Nvidia GPUs). Higher-level frameworks break their computational graphs into chunks, which can then call these computational libraries. These libraries represent thousands of man hours of effort, and are often optimized for the architecture and application to yield the best performance

      What happens behind the scenes when you run ML frameworks

    2. Jax is built by the same people who built the original Autograd, and features both forward- and reverse-mode auto-differentiation. This allows computation of higher order derivatives orders of magnitude faster than what PyTorch/TensorFlow can offer

      Jax

    3. TensorFlow will always have a captive audience within Google/DeepMind, but I wonder whether Google will eventually relax this

      PyTorch is generally so much more favored that it may one day replace TensorFlow even at Google's offices

    4. At their core, PyTorch and Tensorflow are auto-differentiation frameworks

      auto-differentiation = taking the derivative of some function. It can be implemented in many ways, so most ML frameworks choose "reverse-mode auto-differentiation" (better known as "backpropagation")
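
      A minimal sketch of reverse-mode auto-differentiation with PyTorch's autograd (the function and values are illustrative):

      >>> import torch
      >>> x = torch.tensor(3.0, requires_grad=True)
      >>> y = x ** 2 + 2 * x  # forward pass records the computation graph
      >>> y.backward()        # reverse-mode pass, i.e. backpropagation
      >>> x.grad              # dy/dx = 2x + 2 evaluated at x = 3
      tensor(8.)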

    5. At the API level, TensorFlow eager mode is essentially identical to PyTorch’s eager mode, originally made popular by Chainer. This gives TensorFlow most of the advantages of PyTorch’s eager mode (ease of use, debuggability, and etc.) However, this also gives TensorFlow the same disadvantages. TensorFlow eager models can’t be exported to a non-Python environment, they can’t be optimized, they can’t run on mobile, etc. This puts TensorFlow in the same position as PyTorch, and they resolve it in essentially the same way - you can trace your code (tf.function) or reinterpret the Python code (Autograph).

      Tensorflow Eager
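
      A small sketch of eager mode next to a tf.function-traced graph (the function and values are illustrative):

      >>> import tensorflow as tf
      >>> def square(x):
      ...     return x * x
      >>> square(tf.constant(3.0))  # eager: runs immediately, easy to debug
      <tf.Tensor: shape=(), dtype=float32, numpy=9.0>
      >>> graph_square = tf.function(square)  # traces Python into an exportable graph
      >>> graph_square(tf.constant(3.0))
      <tf.Tensor: shape=(), dtype=float32, numpy=9.0>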

    6. The PyTorch JIT is an intermediate representation (IR) for PyTorch called TorchScript. TorchScript is the “graph” representation of PyTorch. You can turn a regular PyTorch model into TorchScript by using either tracing or script mode.

      PyTorch JIT
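
      A minimal sketch of both ways to turn a regular function into TorchScript (the function is illustrative):

      >>> import torch
      >>> def double(x):
      ...     return x * 2
      >>> traced = torch.jit.trace(double, torch.ones(3))  # tracing: run once, record the ops
      >>> scripted = torch.jit.script(double)              # script mode: compile the Python source
      >>> traced(torch.tensor([1., 2., 3.]))
      tensor([2., 4., 6.])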

    7. Tracing takes a function and an input, records the operations that were executed with that input, and constructs the IR. Although straightforward, tracing has its downsides. For example, it can’t capture control flow that didn’t execute. For example, it can’t capture the false block of a conditional if it executed the true block

      Tracing mode in PyTorch
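
      A small sketch of the control-flow pitfall (illustrative function; PyTorch emits a TracerWarning here):

      >>> import torch
      >>> def f(x):
      ...     if x.sum() > 0:  # only the branch taken during tracing is recorded
      ...         return x * 2
      ...     return x - 1
      >>> traced = torch.jit.trace(f, torch.ones(3))  # records the `true` branch only
      >>> traced(-torch.ones(3))  # still computes x * 2; the `false` branch is gone
      tensor([-2., -2., -2.])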

    8. On the other hand, industry has a litany of restrictions/requirements

      Industry requirements that TensorFlow addresses:

      • no Python <--- overhead of the Python runtime might be too much to take
      • mobile <--- Python can't be embedded in the mobile binary
      • serving <--- no-downtime updates of models, switching between models seamlessly, etc.
    9. TensorFlow is still the dominant framework. For example, based on data [2] [3] from 2018 to 2019, TensorFlow had 1541 new job listings vs. 1437 job listings for PyTorch on public job boards, 3230 new TensorFlow Medium articles vs. 1200 PyTorch, 13.7k new GitHub stars for TensorFlow vs 7.2k for PyTorch, etc

      For now, the numbers still favor TensorFlow over PyTorch

    1. Merlin returns the following labels: facial_hair, beard, screenshot, chin, human, film

      Types of features Merlin Video can generate from a single trailer frame.

      The final result of feature collecting and ordering is shown in the original post.

    2. The obvious choice was Cloud Machine Learning Engine (Cloud ML Engine), in conjunction with the TensorFlow deep learning framework

      Merlin Video is powered by:

      • Cloud Machine Learning Engine - automating infrastructure (resources, provisioning and monitoring)
      • TensorFlow
      • Cloud Dataflow and Data Studio - Dataflow generates reports in Data Studio
      • BigQuery and BigQueryML - used in a final step to merge Merlin’s millions of customer predictions with other data sources to create useful reports and to quickly prototype media plans for marketing campaigns
    3. custom model learns the temporal sequencing of labels in the movie trailer

      Temporal sequencing - the timing and ordering of different shots (e.g. long or short).

      Temporal sequencing can convey information on:

      • movie type
      • movie plot
      • roles of the main characters
      • filmmakers' cinematographic choices.

      When combined with historical customer data, sequencing analysis can be used to create predictions of customer behavior.

      arxiv paper on Merlin Video

    4. The elasticity of Cloud ML Engine allowed the data science team to iterate and test quickly, without compromising the integrity of the deep learning model

      Cloud ML Engine reduced the deployment time from months to days

    5. The first challenge is the temporal position of the labels in the trailer: it matters when the labels occur in the trailer. The second challenge is the high dimensionality of this data

      2 challenges in labelling video clips: the temporal position of the labels and the high dimensionality of the data

    6. When it comes to movies, analyzing text taken from a script is limiting because it only provides a skeleton of the story, without any of the additional dynamism that can entice an audience to see a movie

      Analysing the movie script alone isn't enough to predict the movie's overall attractiveness to the audience

    7. model is trained end-to-end, and the loss of the logistic regression is back-propagated to all the trainable components (weights). Merlin’s data pipeline is refreshed weekly to account for new trailer releases

      How the model is trained and where it sits in the pipeline

    8. After a movie’s release, we are able to process the data on which movies were previously seen by that audience. The table below shows the top 20 actual moviegoer audiences (Comp ACT) compared to the top 20 predicted audiences (Comp PRED)

      How the Merlin model is validated

    1. It’s possible to check whether a variable refers to it with the comparison operators == and !=

      Checking against None with == and !=

      >>> x, y = 2, None
      >>> x == None
      False
      >>> y == None
      True
      >>> x != None
      True
      >>> y != None
      False
      

      More Pythonic way by using is and is not:

      >>> x is None
      False
      >>> y is None
      True
      >>> x is not None
      True
      >>> y is not None
      False
      
    2. Python allows defining getter and setter methods similarly as C++ and Java

      Getters and Setters in Python:

      >>> class C:
      ...     def get_x(self):
      ...         return self.__x
      ...     def set_x(self, value):
      ...         self.__x = value
      

      Get and set the state of the object:

      >>> c = C()
      >>> c.set_x(2)
      >>> c.get_x()
      2
      
    3. In almost all cases, you can use the range to get an iterator that yields integers

      Iterating over Sequences and Mappings

      >>> x = [1, 2, 4, 8, 16]
      >>> for i in range(len(x)):
      ...     print(x[i])
      ... 
      1
      2
      4
      8
      16
      

      better way of iterating over a sequence:

      >>> for item in x:
      ...     print(item)
      ... 
      1
      2
      4
      8
      16
      
    4. Sometimes you need both the items from a sequence and the corresponding indices

      Iterating with indices:

      >>> for i in range(len(x)):
      ...     print(i, x[i])
      ... 
      0 1
      1 2
      2 4
      3 8
      4 16
      

      Better way by using enumerate:

      >>> for i, item in enumerate(x):
      ...     print(i, item)
      ... 
      0 1
      1 2
      2 4
      3 8
      4 16
      
    5. But what if you want to iterate in the reversed order? Of course, the range is an option again

      Iterating over a reversed order:

      >>> for i in range(len(x)-1, -1, -1):
      ...     print(x[i])
      ... 
      16
      8
      4
      2
      1
      

      More elegant way:

      >>> for item in x[::-1]:
      ...     print(item)
      ... 
      16
      8
      4
      2
      1
      

      Pythonic way of reversing an order:

      >>> for item in reversed(x):
      ...     print(item)
      ... 
      16
      8
      4
      2
      1
      
    6. it’s often more elegant to define and use properties, especially in simple cases

      Defining some properties (considered to be more Pythonic):

      >>> class C:
      ...     @property
      ...     def x(self):
      ...         return self.__x
      ...     @x.setter
      ...     def x(self, value):
      ...         self.__x = value
      

      Result:

      >>> c = C()
      >>> c.x = 2
      >>> c.x
      2
      
    7. Python has a very flexible system of providing arguments to functions and methods. Optional arguments are a part of this offer. But be careful: you usually don’t want to use mutable optional arguments

      Avoiding mutable optional arguments:

      >>> def f(value, seq=[]):
      ...     seq.append(value)
      ...     return seq
      

      If you don't provide seq, f() appends a value to an empty list and returns something like [value]:

      >>> f(value=2)
      [2]
      

      Don't be fooled. This option isn't fine...

      >>> f(value=4)
      [2, 4]
      >>> f(value=8)
      [2, 4, 8]
      >>> f(value=16)
      [2, 4, 8, 16]
      
    8. Iterating over a dictionary yields its keys

      Iterating over a dictionary:

      >>> z = {'a': 0, 'b': 1}
      >>> for k in z:
      ...     print(k, z[k])
      ... 
      a 0
      b 1
      

      Applying method .items():

      >>> for k, v in z.items():
      ...     print(k, v)
      ... 
      a 0
      b 1
      

      You can also use the methods .keys() and .values()

    9. what if you want to iterate over two or more sequences? Of course, you can use the range again

      Iterating over two or more sequences:

      >>> y = 'abcde'
      >>> for i in range(len(x)):
      ...     print(x[i], y[i])
      ... 
      1 a
      2 b
      4 c
      8 d
      16 e
      

      Better solution by applying zip:

      >>> for item in zip(x, y):
      ...     print(item)
      ... 
      (1, 'a')
      (2, 'b')
      (4, 'c')
      (8, 'd')
      (16, 'e')
      

      Combining it with unpacking:

      >>> for x_item, y_item in zip(x, y):
      ...     print(x_item, y_item)
      ... 
      1 a
      2 b
      4 c
      8 d
      16 e
      
    10. Python allows you to chain the comparison operations. So, you don’t have to use and to check if two or more comparisons are True

      Chaining <--- checking if two or more operations are True

      >>> x = 4
      >>> x >= 2 and x <= 8
      True
      

      More compact (mathematical) form:

      >>> 2 <= x <= 8
      True
      >>> 2 <= x <= 3
      False
      
    11. Unpacking can be used for the assignment to multiple variables in more complex cases

      Unpacking <--- assign even more variables

      >>> x = (1, 2, 4, 8, 16)
      >>> a = x[0]
      >>> b = x[1]
      >>> c = x[2]
      >>> d = x[3]
      >>> e = x[4]
      >>> a, b, c, d, e
      (1, 2, 4, 8, 16)
      

      more readable approach:

      >>> a, b, c, d, e = x
      >>> a, b, c, d, e
      (1, 2, 4, 8, 16)
      

      even cooler (* collects values not assigned to others):

      >>> a, *y, e = x
      >>> a, e, y
      (1, 16, [2, 4, 8])
      
    12. Python doesn’t have real private class members. However, there’s a convention that says that you shouldn’t access or modify the members beginning with the underscore (_) outside their instances. They are not guaranteed to preserve the existing behavior.

      Avoiding accessing protected class members Consider the code:

      >>> class C:
      ...     def __init__(self, *args):
      ...         self.x, self._y, self.__z = args
      ... 
      >>> c = C(1, 2, 4)
      

      The instances of class C have three data members: .x, ._y, and ._C__z (because __z has two leading underscores, unlike _y). If a member’s name begins with a double underscore (dunder), it becomes mangled, that is, renamed to ._ClassName__member. That’s why you have ._C__z instead of .__z.

      Now, it's quite OK to access/modify .x directly:

      >>> c.x  # OK
      1
      

      You can also access ._y, from outside its instance, but it's considered a bad practice:

      >>> c._y  # Possible, but a bad practice!
      2
      

      You can’t access .__z because it’s mangled, but you can still access or modify ._C__z:

      >>> c.__z  # Error!
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      AttributeError: 'C' object has no attribute '__z'
      >>> c._C__z  # Possible, but even worse!
      4
      >>>
      
    13. what if an exception occurs while processing your file? Then my_file.close() is never executed. You can handle this with exception-handling syntax or with context managers

      A `with` block to handle exceptions:

      >>> with open('filename.csv', 'w') as my_file:
      ...     # do something with `my_file`
      

      Using the with block means that the special methods .__enter__() and .__exit__() are called, even in the case of exceptions

    14. Python code should be elegant, concise, and readable. It should be beautiful. The ultimate resource on how to write beautiful Python code is Style Guide for Python Code or PEP 8

      Write beautiful Python code with

      • Style Guide for Python Code
      • PEP 8

      PEP 8 provides the style guide for Python code, and PEP 20 (the Zen of Python) represents the principles of the language
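
      The PEP 20 principles ship with the interpreter itself, so you can print them any time:

      >>> import this
      The Zen of Python, by Tim Peters
      
      Beautiful is better than ugly.
      Explicit is better than implicit.
      ...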

    15. Python also supports chained assignments. So, if you want to assign the same value to multiple variables, you can do it in a straightforward way

      Chained assignments <--- assign the same value to multiple variables:

      >>> x = 2
      >>> y = 2
      >>> z = 2
      

      More elegant way:

      >>> x, y, z = 2, 2, 2
      

      Chained assignments:

      >>> x = y = z = 2
      >>> x, y, z
      (2, 2, 2)
      
    16. open a file and process it

      Open a file:

      >>> my_file = open('filename.csv', 'w')
      >>> # do something with `my_file`
      

      Close the file to properly free the resources it holds:

      >>> my_file = open('filename.csv', 'w')
      >>> # do something with `my_file`
      >>> my_file.close()
      
    17. You can keep away from that with some additional logic. One of the ways is this:

      You can avoid the problem:

      >>> def f(value, seq=[]):
      ...     seq.append(value)
      ...     return seq
      

      with some additional logic:

      >>> def f(value, seq=None):
      ...     if seq is None:
      ...         seq = []
      ...     seq.append(value)
      ...     return seq
      

      Shorter version (careful: `if not seq` also replaces any falsy argument, such as a caller's empty list, so the `seq is None` check above is safer):

      >>> def f(value, seq=None):
      ...     if not seq:
      ...         seq = []
      ...     seq.append(value)
      ...     return seq
      

      The result:

      >>> f(value=2)
      [2]
      >>> f(value=4)
      [4]
      >>> f(value=8)
      [8]
      >>> f(value=16)
      [16]
      
    18. The Pythonic way is to exploit the fact that zero is interpreted as False in a Boolean context, while all other numbers are considered as True

      Comparing to zero - Pythonic way:

      >>> bool(0)
      False
      >>> bool(-1), bool(1), bool(20), bool(28.4)
      (True, True, True, True)
      

      Using if item instead of if item != 0:

      >>> for item in x:
      ...     if item:
      ...         print(item)
      ... 
      1
      2
      3
      4
      

      You can also use if not item instead of if item == 0

    19. When you have numeric data, and you need to check if the numbers are equal to zero, you can but don’t have to use the comparison operators == and !=

      Comparing to zero:

      >>> x = (1, 2, 0, 3, 0, 4)
      >>> for item in x:
      ...     if item != 0:
      ...         print(item)
      ... 
      1
      2
      3
      4
      
    20. it’s a good practice to use properties when you can and C++-like getters and setters when you have to

      Use

      • properties <--- when you can
      • getters and setters <--- when you have to
    1. First of all, write a script that carries out the task in a sequential fashion. Secondly, transform the script so that it carries out the task using the map function. Lastly, replace map with a neat function from the concurrent.futures module

      Concurrent Python programs in 3 steps (steps 2 and 3 are sketched after this list):

      1. Write a script that carries out the task in a sequential fashion.
      2. Transform the script so that it carries out the task using the map function.
      3. Replace map with a neat function from the concurrent.futures module.
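
      A minimal sketch of steps 2 and 3, assuming a hypothetical I/O-bound task (the function name and URL are illustrative):

      import urllib.request
      from concurrent.futures import ThreadPoolExecutor

      def fetch_status(url):
          # the task: download a page and return its HTTP status
          with urllib.request.urlopen(url) as response:
              return response.status

      urls = ['https://example.com'] * 3

      # Step 2: the sequential version expressed with map
      statuses = list(map(fetch_status, urls))

      # Step 3: drop-in concurrency - executor.map has the same shape as map
      with ThreadPoolExecutor(max_workers=3) as executor:
          statuses = list(executor.map(fetch_status, urls))

      For CPU-bound work the same shape works with ProcessPoolExecutor, as a later note points out.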
    2. Python is a poor choice for concurrent programming. A principal reason for this is the ‘Global Interpreter Lock’ or GIL. The GIL ensures that only one thread accesses Python objects at a time, effectively preventing Python from being able to distribute threads onto several CPUs by default

      Python isn't the best choice for concurrent programming

    3. Introducing multiprocessing now is a cinch; I just replace ThreadPoolExecutor with ProcessPoolExecutor in the previous listing

      Replacing multithreading with multiprocessing:

      replace ThreadPoolExecutor with ProcessPoolExecutor

    4. As the name suggests, multiprocessing spawns processes, while multithreading spawns threads. In Python, one process can run several threads. Each process has its proper Python interpreter and its proper GIL. As a result, starting a process is a heftier and more time-consuming undertaking than starting a thread.

      Reason for multiprocessing being slower than multithreading:

      Starting a process is heavier and more time-consuming than starting a thread, because each process gets its own Python interpreter and its own GIL

    1. its purpose is to dump Python tracebacks explicitly on a fault, after a timeout, or on a user signal

      faulthandler, in contrast to trace, dumps tracebacks only on specific events (a fault, a timeout, or a user signal) and has slightly better documentation

    2. what parts of the software do we profile (measure its performance metrics)

      Parts of the software we typically profile:

      • Method or function (most common)
      • Lines (similar to method profiling, but doing it line by line)
      • Memory (memory usage)
    3. The profile module gives similar results with similar commands. Typically, you switch to profile if cProfile isn’t available

      cProfile > profile. Use profile only when cProfile isn't available

    4. The purpose of trace module is to “monitor which statements and functions are executed as a program runs to produce coverage and call-graph information

      Purpose of trace module

    5. method profiling tool like cProfile (which is available in the Python language), the timing metrics for methods can show you statistics, such as the number of calls (shown as ncalls), total time spent in the function (tottime), time per call (tottime/ncalls and shown as percall), cumulative time spent in a function (cumtime), and cumulative time per call (quotient of cumtime over the number of primitive calls and shown as percall after cumtime)

      cProfile is one of the Python tools to measure method execution time (a minimal usage sketch follows this list). Specifically:

      • number of calls (shown as ncalls)
      • total time spent in the function (tottime)
      • time per call (tottime/ncalls and shown as percall)
      • cumulative time spent in a function (cumtime)
      • cumulative time per call (quotient of cumtime over the number of primitive calls and shown as percall after cumtime)
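
      A minimal sketch of method profiling with cProfile (the workload function is illustrative):

      import cProfile

      def work():
          # an arbitrary CPU-bound workload to profile
          return sum(i * i for i in range(100_000))

      # prints a table with ncalls, tottime, percall and cumtime per function
      cProfile.run('work()')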
    6. tracing is a special use case of logging in order to record information about a program’s execution

      Tracing (more for software devs) is very similar to event logging (more for system administrators)

    7. If a method has an acceptable speed but is so frequently called that it becomes a huge time sink, you would want to know this from your profiler

      We also want to measure the frequency of method calls. cProfile can highlight the number of function calls and how many of those are native calls

    1. It’s just that it often makes sense to write code in the order JOIN / WHERE / GROUP BY / HAVING. (I’ll often put a WHERE first to improve performance though, and I think most database engines will also do a WHERE first in practice)

      Pandas code is usually written in this SQL-like order:

      1. JOIN
      2. WHERE
      3. GROUP BY
      4. HAVING

      Example:

      1. df = thing1.join(thing2) # like a JOIN
      2. df = df[df.created_at > 1000] # like a WHERE
      3. df = df.groupby('something').agg(num_yes=('yes', 'sum')) # like a GROUP BY
      4. df = df[df.num_yes > 2] # like a HAVING, filtering on the result of a GROUP BY
      5. df = df[['num_yes', 'something1', 'something']] # pick the columns I want to display, like a SELECT
      6. df = df.sort_values('something', ascending=True) # like an ORDER BY
      7. df[:30] # like a LIMIT
    1. vectors here have length 7 instead of 6 because of the extra 0 element at the beginning. This is an inconsequential detail - Keras reserves index 0 and never assigns it to any word.

      Keras always reserves index 0 and never assigns any word to it; therefore, even when we have 6 words, we end up with vectors of length 7:

      [0. 1. 1. 1. 0. 0. 0.]
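
      A small sketch reproducing this with the Keras Tokenizer (the three-word text is illustrative):

      >>> from tensorflow.keras.preprocessing.text import Tokenizer
      >>> tokenizer = Tokenizer(num_words=7)
      >>> tokenizer.fit_on_texts(['the cat sat'])
      >>> tokenizer.texts_to_matrix(['the cat sat'])  # column 0 is reserved and never set
      array([[0., 1., 1., 1., 0., 0., 0.]])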

    1. # Creating a 5x5 matrix arr = [[i for i in range(5)] for j in range(5)] arr >>> [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

      Nested for loop using list comprehension to come up with 5x5 matrix:

      arr = [[i for i in range(5)] for j in range(5)]
      arr
      >>> [[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]]
      
    2. x = [2,45,21,45] y = {i:v for i,v in enumerate(x)} print(y) >>> {0: 2, 1: 45, 2: 21, 3: 45}

      Dictionary comprehension in Python to create a simple dictionary:

      x = [2,45,21,45]
      y = {i:v for i,v in enumerate(x)}
      print(y)
      >>> {0: 2, 1: 45, 2: 21, 3: 45}
      
    3. x = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]] arr = [i for j in x for i in j] print(arr) >>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

      Flattening a multi-dimensional matrix into a 1-D array using list comprehension:

      x = [[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]
      
      arr = [i for j in x for i in j]
      print(arr)
      >>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
      
    4. arr = [i for i in range(10) if i % 2 == 0] print(arr) >>> [0, 2, 4, 6, 8] arr = ["Even" if i % 2 == 0 else "Odd" for i in range(10)] print(arr) >>> ['Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd']

      2 examples of conditional statements in list comprehension:

      arr = [i for i in range(10) if i % 2 == 0]
      print(arr)
      >>> [0, 2, 4, 6, 8]
      

      and:

      arr = ["Even" if i % 2 == 0 else "Odd" for i in range(10)]
      print(arr)
      >>> ['Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd']
      
    1. the __ methods allow us to interact with core concepts of the python language. You can see them also as a mechanism of implementing behaviours, interface methods.

      __ methods

    2. Dunder or magic method, are methods that start and end with double _ like __init__ or __str__. This kind of methods are the mechanism we use to interact directly with python's data model

      Dunder methods
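
      A minimal sketch of implementing behaviours through dunder methods (the Vector class is illustrative):

      class Vector:
          def __init__(self, x, y):
              self.x, self.y = x, y
      
          def __add__(self, other):  # makes the + operator work on Vectors
              return Vector(self.x + other.x, self.y + other.y)
      
          def __str__(self):  # used by print() and str()
              return f'Vector({self.x}, {self.y})'
      
      print(Vector(1, 2) + Vector(3, 4))  # Vector(4, 6)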

    1. Dutch programmer Guido van Rossum designed Python in 1991, naming it after the British television comedy Monty Python's Flying Circus because he was reading the show's scripts at the time.

      Origins of Python name

    1. supporting this field is extremely easy If you keep raw data, it's just a matter of adding a getter method to the Article class.

      Supporting a new field in raw JSON is much easier than in a relational database:

      @property
      def highlights(self) -> Sequence[Highlight]:
          default = [] # defensive to handle older export formats that had no annotations
          jsons = self.json.get('annotations', default)
          return list(map(Highlight, jsons))
      
    2. query language doesn't necessarily mean a database. E.g. see pandas which is capable of what SQL is capable of, and even more convenient than SQL for our data exploration purposes.

      A query language doesn't necessarily imply a database. For example, see pandas

    3. cachew lets you cache function calls into an sqlite database on your disk in a matter of single decorator (similar to functools.lru_cache). The difference from functools.lru_cache is that cached data is persisted between program runs, so next time you call your function, it will only be a matter of reading from the cache.

      cachew tool isolates the complexity of database access patterns in a Python library
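
      A minimal sketch, assuming the decorator interface shown in cachew's README (the data class and cache path are illustrative):

      from dataclasses import dataclass
      from typing import Iterator

      from cachew import cachew

      @dataclass
      class Measurement:
          value: float

      @cachew('/tmp/measurements.sqlite')  # results persist between program runs
      def measurements() -> Iterator[Measurement]:
          # imagine a slow API call or heavy computation here
          for i in range(5):
              yield Measurement(value=float(i))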

    1. We only need to use global keyword in a function if we want to do assignments / change them. global is not needed for printing and accessing.

      global inside a function is:

      • required if we want to assign/change variable
      • not required if we just want to print/access a variable.

      Example code:

      # This function modifies the global variable 's'
      def f():
          global s
          print(s)
          s = "Look for Geeksforgeeks Python Section"
          print(s)
      
      # Global scope
      s = "Python is great!"
      f()
      print(s)
      
    1. The nonlocal keyword is used to work with variables inside nested functions, where the variable should not belong to the inner function. Use the keyword nonlocal to declare that the variable is not local.

      nonlocal is used to declare variables inside the nested functions.

      Example (if nonlocal wouldn't be used, the output would be "John", not "hello"):

      def myfunc1():
        x = "John"
        def myfunc2():
          nonlocal x
          x = "hello"
        myfunc2()
        return x
      
      print(myfunc1())
      
    1. Here, I rewrite this example using the walrus operator

      How to reduce Python expressions using the walrus operator (:=). It's used to both assign and evaluate variable names in a single expression, thus reducing repetition.

      Before:

      count = fresh_fruit.get('lemon', 0)
      if count:
          ...
      

      After:

      if count := fresh_fruit.get('lemon', 0):
          ...
      
    1. During each loop of reduce, you can get result of the last loop, and the next element in the array Change the result, then return it for the next loop iteration When you're done, you have the completed collection

      How reduce works, step by step
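
      The same idea with Python's functools.reduce (the values are illustrative):

      from functools import reduce

      # each step receives the result of the previous step and the next element
      total = reduce(lambda result, item: result + item, [1, 2, 3, 4], 0)
      print(total)  # 10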

    2. So why are map, filter and reduce useful?

      Advantages of map, filter, reduce:

      • don't have to manually loop over array
      • chain together for short, straightforward array transformations
      • can reuse callback functions and compose them together
    1. Each year the flu infects between 2 to 11% of the population. 11The infectiousness of a disease is measured by its R0 value. R0 is a measure of how many people will catch a disease from one infected person. The R0 value of the flu is 1.28. The R0 of COVID-19 is assumed to be between 1.4–3.8

      R0 - measure of how many people will catch a disease from one infected person.

      R0 (Flu): 1.28

      R0 (COVID-19): 1.4 - 3.8 (but some say 4.7 - 7.12)

    2. cases outside of china are doubling roughly every 5 days. The reported rate of infection in China is lower, but China has taken drastic quarantine measures, including welding people inside apartment buildings.

      Doubling tendency of COVID-19

    3. In The Atlantic, Marc Lipsitch, a leading epidemiologist at Harvard reported that “that within the coming year, some 40 to 70 percent of people around the world will be infected with the virus that causes COVID-19”. This was last week and while ridiculed at the time, his assumptions are now the generally accepted position among epidemiologists.

      40 - 70 % of people will be infected with COVID-19 (it might even increase)

    1. 10-50 minutes in natural spaces was the most effective to improve mood, focus and physiological markers like blood pressure and heart rate. “It’s not that there’s a decline after 50 minutes, but rather that the physiological and self-reported psychological benefits tend to plateau after that,”

      Nature therapy

      • 10 to 50 minutes in nature was found to be most effective to improve mood, focus, blood pressure, and heart rate
      • After 50 minutes the effects tend to plateau
  2. Feb 2020
    1. We’re mapping a human health tragedy that may get way worse before it subsides. Do we really want the map to be screaming bright red? Red is a very emotive colour. It has meaning. It can easily connotate danger, and death, which is still statistically extremely rare for coronavirus.

      Why using red on a choropleth map might not be the best choice

    1. Even ignoring quality of life you are looking at a 3-7 fold return on every minute you spend exercising in extended life,[1] perhaps even exceeding that if you are making optimal use of your time. Something just clicked and I was consistent since reading this. Even 1-fold return would worth it: basically you gain free consciousness (quality of thinking is shit when I exercise, but it's better than nothing).

      Exercising extends your lifespan; the return obviously isn't linear, but it's still worth the effort

    2. Annotation is the process of adding a layer of personal (with a potential for sharing) metadata on top of existing content such as highlights, notes, anchors etc.

      Just like I'm doing it here now (annotation) ;)

    3. Human brains seem to be best for generating new ideas. I want to learn more, think faster, distract less, interact and visualize, effortlessly remember everything; not memorize and do routine information processing, which computers seem better at.

      Why Personal Knowledge Management (PKM) is important

    4. Extended mind is the idea that your mind isn't limited by your physical body, but augmented by external means. Paper notes & diaries, data in your computer/phone/cloud – all these things contribute to your cognitive process, thinking and help you navigate the landscape of ideas. One of the biggest motivations to start dumping my brain out here, write and share is to offload my mind a bit and interact with people even in my physical absence.

      Extended mind - the idea that our mind doesn't reside only in our physical body

    5. I can't imagine going on a run without my HR monitor, because whatever the benefit exercise has and however exhausting run would be, at least I'll have a data point. learning about routine and trying to optimize routine is also way more fun than doing the actual routine Human body is fragile and needs constant care, but it's still a fascinating mechanism.

      Quantifying self during the run is also my main use. It adds a layer of motivation when I see just the timestamps of all my journeys

    1. quantified self refers both to the cultural phenomenon of self-tracking with technology and to a community of users and makers of self-tracking tools who share an interest in “self-knowledge through numbers.”[1] Quantified Self practices overlap with the practice of lifelogging and other trends that incorporate technology and data acquisition into daily life, often with the goal of improving physical, mental, and/or emotional performance.

      Quantified self relates to all of us managing our life with Mi Bands, Fitbits, etc.

    1. code below instantiates a client, opens a session, and runs an insertion query

      Start by instantiating a client and opening a session. Afterwards, you can begin to run queries
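
      A minimal sketch, assuming the Grakn Python client API from that era (the keyspace name and query are illustrative):

      from grakn.client import GraknClient

      with GraknClient(uri="localhost:48555") as client:
          with client.session(keyspace="experiment") as session:
              # run an insertion query inside a write transaction
              with session.transaction().write() as tx:
                  tx.query('insert $u isa user, has name "alice";')
                  tx.commit()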

    2. We just created a Grakn keyspace experiment and defined its schema.

      It's good to have a single Grakn keyspace per application (the outermost container for data in a Grakn knowledge graph, corresponding closely to a relational database)

    3. Now that we have the schema ready, the next step is to load it into Grakn.

      After defining schema, it needs to be loaded into Grakn:

      1) Place schema.gql in the container volume, such as db/schema.gql.

      2) Run:

      docker exec -ti grakn bash -c '/grakn-core-all-linux/grakn console --keyspace experiment --file /grakn-core-all-linux/server/db/schema.gql'
      

      3) Observe a similar result:

      Loading: /grakn-core-all-linux/server/db/schema.gql
      ...
      {}
      Successful commit: schema.gql
      
    4. An attribute can be abstract if you never assign it directly and use it only as a parent type. Entities can be abstract, too, if they are never instantiated.

      Attributes and entities can be abstract

    5. There's just one more step – defining the attribute types

      Don't forget to define attribute types, such as:

      name sub attribute,
          datatype string;
      
      address sub attribute,
          datatype string;
      
      timestamp sub attribute, abstract,
          datatype date;
      
          created sub timestamp;
          last-modified sub timestamp;
          last-accessed sub timestamp;
          penalty-until sub timestamp;
      
      url sub attribute,
          datatype string;
      

      ...

    6. We've ended up with three entities: user, badge and location. How to glue them together? Using relations.

      Use of relations to glue different entities:

      location-of-user sub relation,
          relates located-user,
          relates user-location;
      
      achievements sub relation,
          has score,
          relates contributor,
          relates award;
      
    7. Some things are common to multiple users, like a location (e.g. Austin, TX, USA) or the types of badges they've been awarded (bronze, silver, gold). We'll model locations and badges as separate entities.

      Modelling separate entities:

      location sub entity,
          key address,
          plays user-location;
      
      badge sub entity,
          key color,
          plays award;
      
    1. Imagine that you're using a database to export them, so your schema is: TABLE Article(STRING id, STRING url, STRING title, DATETIME added). One day, the developers expose highlights (or annotations) from the private API and your export script stats receiving it in the response JSON. It's quite useful data to have! However, your database can't just magically change to conform to the new field.

      The relational model can sometimes tie your hands, unlike JSON

    2. if necessary use databases as an intermediate layer to speed access up and as an additional interface to your data Nothing wrong with using databases for caching if you need it!

      You may want to use databases for:

      • speeding access up
      • creating additional layer
      • caching
    3. I want to argue very strongly against forcing the data in the database, unless it's really inevitable.

      After scraping some data, don't reach for a database immediately, unless you're dealing with a really large stream of data

    1. But I think we read a lot of self-help because we need to. As I’ve already mentioned, we need lots of examples to drive this wisdom home. We should be more forgiving of self-help (the genre) and more forgiving of ourselves. Putting wisdom into practice takes requires reading, reflection, and practice—but it’s worth it.

      Read lots of self-help as it's worth it

    2. no business book can predict what sorts of situations (businesses, market conditions, etc.) the reader will encounter, so instead it offers general, obvious-sounding rules.

      Business books

    3. We feel divinely inspired while reading Minimalism, but when it’s time to actually cull our wardrobes, it turns out we have good reasons for keeping everything!

      Read less, apply more!

    4. We’re embarrassed by self-help because (at its best) it’s full of banal platitudes—but these are platitudes because they’re so general.

      We tend to be embarrassed by self-help

    5. So which is it? Is our obsession with self-help embarrassing or admirable? Is self-help snake oil or salvation? I’m going to argue that it’s both.

      Self-help can be good and bad

    6. But what if you wanted wisdom, not knowledge? Are there books that contain wisdom? In other words, are there books that give you general-purpose, one-size-fits-all advice for navigating life? Of course there is! It’s called self-help.

      Self-help books are there to make you wise, not knowledgeable

    7. “a wise person knows what to do in most situations, while a [knowledgeable]1 person knows what to do in situations where few others could.” In other words, wise people are moderately successful in many domains, while knowledgeable people are very successful in a few.

      ~ Paul Graham

    1. there are two sources of feeling like a noob: being stupid, and doing something novel. Our dislike of feeling like a noob is our brain telling us "Come on, come on, figure this out."

      Two sources of being a noob

    2. the more of a noob you are locally, the less of a noob you are globally.For example, if you stay in your home country, you'll feel less of a noob than if you move to Farawavia, where everything works differently. And yet you'll know more if you move. So the feeling of being a noob is inversely correlated with actual ignorance.

      Being a noob

    1. In the end, it didn’t matter if the students walked more than even 15,000 steps; they still gained weight
      • Step-counting experiment tracked caloric intake and weight of 120 participants throughout 24 weeks of walking different amounts of steps a day for 6 days per week.
      • Participants gained 1.5 kg (3.5 lbs) on average during the experiment, even those who walked 15,000 steps a day.
      • While the number of steps didn’t affect their weight, they did notice positive changes in their physical activities, and reduced sedentary time.
      • Overall, the more steps we walk, the better.
      • Step-a-day goals are great to get up and get moving, but if you need to lose weight, you must do other things as well.
    1. Scientists at EPFL in Switzerland have shown that you are more likely to initiate a voluntary decision as you exhale.

      We make conscious decisions when we breathe out, says new study involving 52 people pressing a button, monitored with brain, heart and lung sensors.

    1. A combination of split(), substring(), removePrefix(), removeSuffix() is usually enough.

      Sometimes the following functions are more than enough for your string matching problems

  3. Jan 2020
    1. When you get tired of thinking about a piece of work and feeling bad for not finishing it yet, go "screw it, let's do it" and start with something, anything.

      One way of starting to do what we postpone

    1. Level 0 is no automation whatsoever. Level 1 is partial assistance with certain aspects of driving, like lane keep assist or adaptive cruise control. Level 2 is a step up to systems that can take control of the vehicle in certain situations, like Tesla's Autopilot or Cadillac's Super Cruise, while still requiring the driver to pay attention. Get past that and we enter the realm of speculation: Level 3 promises full computer control under defined conditions during a journey, Level 4 expands that to start-to-finish autonomous tech limited only by virtual safeguards like a geofence, and Level 5 is the total hands-off, go-anywhere-at-the-push-of-a-button experience.

      Description of 6 levels defining autonomous cars:

      • Level 0 - no automation.
      • Level 1 - partial assistance with certain aspects of driving, like lane keep assist or adaptive cruise control.
      • Level 2 - systems that can take control of the vehicle in certain situations, like Tesla's Autopilot or Cadillac's Super Cruise, while still requiring the driver to pay attention.
      • Level 3 - promises full computer control under defined conditions during a journey.
      • Level 4 - expands that to start-to-finish autonomous tech limited only by virtual safeguards like a geofence.
      • Level 5 - total hands-off, go-anywhere-at-the-push-of-a-button experience.
    1. I have observed something else under the sun. The fastest runner doesn’t always win the race, and the strongest warrior doesn’t always win the battle. The wise sometimes go hungry, and the skillful are not necessarily wealthy. And those who are educated don’t always lead successful lives. It is all decided by chance, by being in the right place at the right time. — Ecclesiastes 9:11
    2. They simply possess the willpower and drive to observe people, get to know people, appear in gatherings that involve people that are aligned with their goals, and connect people with one another.

      A skill that sometimes puts even the smartest minds below you

    1. Summarizing a paper in your own words restructures the content to focus on learning rather than novelty.

      Scientific papers are written to convey novelty, so early readers might wrongly assume that this is the way to speak in everyday scientific communication

    2. Blogging has taught me how to read a paper because explaining something is a more active form of understanding. Now I summarize the main contribution in my own words, write out the notation and problem setup, define terms, and rederive the main equations or results. This process mimics the act of presenting and is great practice for it.

      Why teaching others/blogging is so valuable for learning new topics

    3. When I first started teaching myself to program, I felt that I had no imagination. I couldn’t be creative because I was too focused on finding the syntax bug or reasoning about program structure. However, with proficiency came creativity. Programming became less important than what I was building and why.

      While learning, don't worry about creativity; it comes after gaining proficiency (a knowledge base)

    4. In my opinion the reason most people fail to do great research is that they are not willing to pay the price in self-development. Say some new field opens up that combines field XXX and field YYY. Researchers from each of these fields flock to the new field. My experience is that virtually none of the researchers in either field will systematically learn the other field in any sort of depth. The few who do put in this effort often achieve spectacular results.

      I think we all know that...

    5. Many of us have done this on exams, hoping for partial credit by stitching together the outline of a proof or using the right words in an essay with the hopes that the professor connects the dots for us.

      We often communicate with jargon we don't understand just to pretend we know something

    1. Since water is denser than air, and the reflection is diffuse. A lot of light is internally reflected, thereof, increasing the probability of absorption at surface.

      The light is reflected back inside the water, because of the total internal reflection:

      • water is denser than air
      • angle of incidence is greater than the so-called critical angle

    2. This is because the light now has a layer of water to go through. And due to the reflectance of water, not all light at the air-liquid-interface (border between air and water) goes through the water. Some of it is reflected.

      Wet things become darker because the layer of water reflects part of the incoming light, so not all of it is transmitted through.

      The probability of light getting transmitted is: 1 - R1 (reflectance at the air-liquid interface)

    3. There are two types of reflection (two ways the wave can be thrown back). Specular Diffuse

      Two types of reflection:

      1. specular - light leaves the surface at the same angle it hits it
      2. diffuse - hitting light is scattered into all angles when reflected
    1. ericb 12 days ago | unvote [-] * Better googling. Time-restricted, url restricted, site restricted searches. Search with the variant parts of error messages removed.* Read the source of upstream dependencies. Fix or fork them if needed.* They're better at finding forks with solutions and gleaning hints from semi-related issues.* Formulate more creative hypothesis when obvious lines of investigation run out. The best don't give up.* Dig in to problems with more angles of investigation.* Have more tools in their toolbelt for debugging like adding logging, monkey-patching, swapping parts out, crippling areas to rule things out, binary search of affected code areas.* Consider the business.* Consider user-behavior.* Assume hostile users (security-wise).* Understand that the UI is not a security layer. Anything you can do with PostMan your backend should handle.* Whitelist style-security over blacklist style.* See eventual problems implied by various solutions.* "The Math."

      What do top engineers do that others don't?

      • Better googling. Time-restricted, url restricted, site restricted searches. Search with the variant parts of error messages removed.
      • Read the source of upstream dependencies. Fix or fork them if needed.
      • They're better at finding forks with solutions and gleaning hints from semi-related issues.
      • Formulate more creative hypothesis when obvious lines of investigation run out. The best don't give up.
      • Dig in to problems with more angles of investigation.
      • Have more tools in their toolbelt for debugging like adding logging, monkey-patching, swapping parts out, crippling areas to rule things out, binary search of affected code areas.
      • Consider the business.
      • Consider user-behavior.
      • Assume hostile users (security-wise).
      • Understand that the UI is not a security layer. Anything you can do with PostMan your backend should handle.
      • Whitelist style-security over blacklist style.
      • See eventual problems implied by various solutions.
      • "The Math."
    1. technology diffused more easily along lines of latitude than along lines of longitude because climate changed more rapidly along lines of longitude making it more difficult for both humans and technologies to adapt

      Technology diffuses more easily along lines of latitude than along lines of longitude

    2. Anything that’s invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.

      Before 15 it's just a normal part of how the world works; after 35 it's against the natural order of things

    3. "One of history’s few iron laws is that luxuries tend to become necessities and to spawn new obligations. Once people get used to a certain luxury, they take it for granted. Then they begin to count on it. Finally they reach a point where they can’t live without it.”

      Be careful what becomes your necessity to live

    4. A 90s study showed that women preferred the scents of men whose immune systems were most different from their own immune-system genes. Evolutionarily this makes sense as, children should be healthier if their parents’ genes vary, protecting them from more pathogens.

      Why women are attracted to the scent of men with immune systems different from their own

    5. Here’s Warren Buffett: “Cola has no taste memory. You can drink one at 9am, 11am, 5pm. You can't do that with cream soda, root beer, orange, grape. You get sick of them after a while. The average person drinks 64 ounces of liquid per day, and you can have all 64 ounces of that be Coke.”Same with Doritos, Cheetos, most popular junk food. They are engineered to overcome “sensory-specific satiety” and to give a sense of “vanishing caloric density.”

      Why chips and Coca-Cola are so addictive:

      their taste is engineered to vanish (no taste memory, "vanishing caloric density")

    6. Humor treads at the frontier of consciousness. When a comic finds a funny joke, they are unearthing a truth that people are only kind of aware of, but the whole room grasps that everybody else is aware of the truth, and laughter ensues.

      Humor as a comprehension

    7. “Twitter is the most amazing networking and learning network ever built.For someone whose pursuing their dream job, or chasing a group of mentors or peers, it’s remarkable. In any given field, 50-80% of the top experts in that field are on Twitter and they’re sharing ideas, and you can connect to them or follow them in your personal feed.If you get lucky enough and say something they find interesting, they might follow you, and the reason this becomes super interesting is that unlocks direct message, and now all of a sudden you can communicate directly or electronically with that individual. Very, very powerful.If you’re not using Twitter, you’re missing out.” — Bill Gurley

      I cannot agree more on this, since I finally accumulated a great network on Twitter. It's important to hit the bell icon next to the profiles we value the most, so that we never miss their new content

    8. "The easiest way to be discovered right now in technology and perhaps many fields is to create your own independent blog and write. There is a huge dearth in availability of good, current, first party content today.The single most important advice I can give to actually write is to write.The thing that happens which you don’t see until you write is that your content engages some of the smartest people who are lurking around the internet. And they reach out to you."

      Totally agree with this ;)

    9. By some estimates, more than 50,000 pieces of artwork are stolen each year, amounting to annual losses of around $6 to $8 billion globally. This makes art theft one of the most valuable criminal enterprises around, exceeded only by drug trafficking and arms dealing.

      Art crime is even more serious than we think

    10. I've noticed a weird pattern: In most of the best marriages I see, one person is an early-bird, and the other is a night-owl. They have opposite circadian rhythms.I think this is healthy. The two partners get the benefits of time together and time alone, which helps the marriage.

      Opposite circadian rhythms as a marker of a great marriage

    11. look around and figure out who you want to be on your team. Figure out the people around you that you want to work with for the rest of your life. Figure out the people who are smart & awesome, who share your values, who get things done — and maybe most important, who you like to be with and who you want to help win. And treat them right, always. Look for ways to help, to work together, to learn. Because in 20 years you’ll all be in amazing places doing amazing things.

      One of the best pieces of life advice one can get

  4. Dec 2019
  5. unix4lyfe.org
    1. If you want to store a humanly-readable time (e.g. logs), consider storing it along with Unix time, not instead of Unix time.

      Human-readable times in logs should be stored along with Unix time, not instead of it

    2. Unix time: Measured as the number of seconds since epoch (the beginning of 1970 in UTC). Unix time is not affected by time zones or daylight saving

      Unix time - # of seconds since 1970 in UTC
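
      A small sketch of storing both forms side by side (the log format is illustrative):

      import time
      from datetime import datetime, timezone

      unix_ts = time.time()  # seconds since the epoch, immune to time zones and DST
      human = datetime.fromtimestamp(unix_ts, tz=timezone.utc).isoformat()
      print(f"{human} ({unix_ts:.0f})")  # human-readable time stored along with Unix time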