We’ve always lived in a world which we didn’t completely understand but now we’re living in a world designed by us – for Pedro, that’s actually an improvement
We never really understood our surroundings, but now we have a great capacity to modify them
But at the end of the day, what we know about neuroscience today is not enough to determine what we do in AI, it’s only enough to give us ideas. In fact, it’s a two-way street – AI can help us learn how the brain works, and this loop between the two disciplines is a very important one and is growing very rapidly
Neuroscience can help us understand AI and the opposite
Pedro believes that success will come from unifying the different major types of learning and their master algorithms –not just combining, but unifying them such that “it feels like using one thing”
Interesting point of view on designing the master algorithm
if you look at the number of connections that the state of the art machine learning systems for some of these problems have, they’re more than many animals – they have many hundreds of millions or billions of connections
State-of-the-art ML systems are composed of hundreds of millions or billions of connections (more than many animals)
There was this period of a couple of hundred years where we understood our technology. Now we just have to learn to live in a world where we don’t understand the machines that work for us; we just have to be confident they are working for us and doing their best
Should we just accept the fact that machines will rule the world with a mysterious intelligence?
When you run a PyTorch/TensorFlow model, most of the work isn’t actually being done in the framework itself, but rather by third party kernels. These kernels are often provided by the hardware vendor, and consist of operator libraries that higher-level frameworks can take advantage of. These are things like MKLDNN (for CPU) or cuDNN (for Nvidia GPUs). Higher-level frameworks break their computational graphs into chunks, which can then call these computational libraries. These libraries represent thousands of man hours of effort, and are often optimized for the architecture and application to yield the best performance
What happens behind when you run ML frameworks
Jax is built by the same people who built the original Autograd, and features both forward- and reverse-mode auto-differentiation. This allows computation of higher order derivatives orders of magnitude faster than what PyTorch/TensorFlow can offer
Jax
TensorFlow will always have a captive audience within Google/DeepMind, but I wonder whether Google will eventually relax this
Generally, PyTorch is becoming so favored that it may one day replace TensorFlow even at Google's offices
At their core, PyTorch and Tensorflow are auto-differentiation frameworks
auto-differentiation = taking the derivative of some function. It can be implemented in many ways, but most ML frameworks choose "reverse-mode auto-differentiation" (known as "backpropagation")
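A toy scalar sketch of reverse-mode auto-differentiation (not any framework's actual implementation; real systems operate on tensors and build the graph the same way):

```python
# Toy reverse-mode autodiff for scalars: each op records how to
# propagate gradients backward, mirroring what ML frameworks do at scale.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # (parent_var, local_gradient) pairs
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, seed=1.0):
        self.grad += seed
        for parent, local_grad in self.parents:
            parent.backward(seed * local_grad)

x = Var(3.0)
y = Var(2.0)
z = x * y + x          # z = x*y + x, so dz/dx = y + 1 = 3, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 3.0 3.0
```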
Once your PyTorch model is in this IR, we gain all the benefits of graph mode. We can deploy PyTorch models in C++ without a Python dependency, or optimize them.
At the API level, TensorFlow eager mode is essentially identical to PyTorch’s eager mode, originally made popular by Chainer. This gives TensorFlow most of the advantages of PyTorch’s eager mode (ease of use, debuggability, etc.). However, this also gives TensorFlow the same disadvantages. TensorFlow eager models can’t be exported to a non-Python environment, they can’t be optimized, they can’t run on mobile, etc. This puts TensorFlow in the same position as PyTorch, and they resolve it in essentially the same way - you can trace your code (tf.function) or reinterpret the Python code (Autograph).
Tensorflow Eager
TensorFlow came out years before PyTorch, and industry is slower to adopt new technologies than researchers
Reason why PyTorch wasn't previously more popular than TensorFlow
The PyTorch JIT is an intermediate representation (IR) for PyTorch called TorchScript. TorchScript is the “graph” representation of PyTorch. You can turn a regular PyTorch model into TorchScript by using either tracing or script mode.
PyTorch JIT
In 2018, PyTorch was in the minority. Now, it is an overwhelming majority, with 69% of CVPR papers using PyTorch, 75+% of both NAACL and ACL, and 50+% of ICLR and ICML
Tracing takes a function and an input, records the operations that were executed with that input, and constructs the IR. Although straightforward, tracing has its downsides: it can’t capture control flow that didn’t execute. For example, it can’t capture the false block of a conditional if it executed the true block
Tracing mode in PyTorch
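A toy illustration of why tracing misses untaken branches (a made-up mini-tracer, not TorchScript itself):

```python
# Toy "tracer": records each primitive op executed for a given input.
# Like real tracing JITs, it only sees the branch that actually ran.
def trace(fn, x):
    ops = []
    def mul(a, b):
        ops.append(f"mul({a}, {b})")
        return a * b
    def add(a, b):
        ops.append(f"add({a}, {b})")
        return a + b
    result = fn(x, mul, add)
    return result, ops

def f(x, mul, add):
    if x > 0:            # control flow is invisible to the trace
        return mul(x, 2)
    return add(x, 10)

print(trace(f, 3))   # (6, ['mul(3, 2)'])  -- the else-branch is never recorded
print(trace(f, -1))  # (9, ['add(-1, 10)'])
```

A trace captured with x=3 would replay `mul(x, 2)` for every input, which is exactly the pitfall described above.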
On the other hand, industry has a litany of restrictions/requirements
Industry's requirements:
TensorFlow is still the dominant framework. For example, based on data [2] [3] from 2018 to 2019, TensorFlow had 1541 new job listings vs. 1437 job listings for PyTorch on public job boards, 3230 new TensorFlow Medium articles vs. 1200 PyTorch, 13.7k new GitHub stars for TensorFlow vs 7.2k for PyTorch, etc
Nowadays, the numbers still play against PyTorch
every major conference in 2019 has had a majority of papers implemented in PyTorch
the transition from TensorFlow 1.0 to 2.0 will be difficult and provides a natural point for companies to evaluate PyTorch
Chance of faster transition to PyTorch in industry
Merlin returns the following labels: facial_hair, beard, screenshot, chin, human, film
Types of features Merlin Video can generate from a single trailer frame.
Final result of feature collecting and ordering:
The obvious choice was Cloud Machine Learning Engine (Cloud ML Engine), in conjunction with the TensorFlow deep learning framework
Merlin Video is powered by:
custom model learns the temporal sequencing of labels in the movie trailer
Temporal sequencing - times of different shots (e.g. long or short).
Temporal sequencing can convey information on:
When combined with historical customer data, sequencing analysis can be used to create predictions of customer behavior.
The elasticity of Cloud ML Engine allowed the data science team to iterate and test quickly, without compromising the integrity of the deep learning model
Cloud ML Engine reduced the deployment time from months to days
Architecture flow diagram for Merlin
The first challenge is the temporal position of the labels in the trailer: it matters when the labels occur in the trailer. The second challenge is the high dimensionality of this data
2 challenges that we find in labelling video clips: occurrence and volume of labels
When it comes to movies, analyzing text taken from a script is limiting because it only provides a skeleton of the story, without any of the additional dynamism that can entice an audience to see a movie
Analysing movie script isn't enough to predict the overall movie's attractiveness to the audience
20th Century Fox has been using this tool since the release of The Greatest Showman in 2017, and continues to use it to inform their latest releases
The Merlin Video tool is used nowadays by 20th Century Fox
model is trained end-to-end, and the loss of the logistic regression is back-propagated to all the trainable components (weights). Merlin’s data pipeline is refreshed weekly to account for new trailer releases
Way the model is trained and located in the pipeline
After a movie’s release, we are able to process the data on which movies were previously seen by that audience. The table below shows the top 20 actual moviegoer audiences (Comp ACT) compared to the top 20 predicted audiences (Comp PRED)
Way of validating the Merlin model
Visual Question Answering (VQA): answering open-ended questions about images. VQA is interesting because it requires combining visual and language understanding.
Visual Question Answering (VQA) = visual + language understanding
Most VQA models would use some kind of Recurrent Neural Network (RNN) to process the question input
The standard approach to performing VQA looks something like this: Process the image. Process the question. Combine features from steps 1/2. Assign probabilities to each possible answer.
Approach to handle VQA problems:
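A toy sketch of the four steps with stand-in feature extractors (all numbers are hypothetical; real systems use a CNN for the image and an RNN/transformer for the question, and the projection weights are learned):

```python
import math

def image_features(image):        # step 1 (stand-in for a CNN)
    return [0.2, 0.9, 0.4]

def question_features(question):  # step 2 (stand-in for an RNN)
    return [0.5, 0.1, 0.8]

def combine(img, qst):            # step 3: element-wise product is one common choice
    return [i * q for i, q in zip(img, qst)]

def answer_probs(features, n_answers=3):  # step 4: softmax over candidate answers
    # a fixed projection stands in for learned weights here
    logits = [sum(features) * (k + 1) for k in range(n_answers)]
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = answer_probs(combine(image_features(None), question_features(None)))
print(probs)  # probabilities over candidate answers, summing to 1
```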
It’s possible to check whether a variable refers to it with the comparison operators == and !=
Checking against None with == and !=
>>> x, y = 2, None
>>> x == None
False
>>> y == None
True
>>> x != None
True
>>> y != None
False
More Pythonic way by using is and is not:
>>> x is None
False
>>> y is None
True
>>> x is not None
True
>>> y is not None
False
Python allows defining getter and setter methods, similar to C++ and Java
Getters and Setters in Python:
>>> class C:
...     def get_x(self):
...         return self.__x
...     def set_x(self, value):
...         self.__x = value
Get and set the state of the object:
>>> c = C()
>>> c.set_x(2)
>>> c.get_x()
2
In almost all cases, you can use the range to get an iterator that yields integers
Iterating over Sequences and Mappings
>>> x = [1, 2, 4, 8, 16]
>>> for i in range(len(x)):
... print(x[i])
...
1
2
4
8
16
better way of iterating over a sequence:
>>> for item in x:
... print(item)
...
1
2
4
8
16
Sometimes you need both the items from a sequence and the corresponding indices
Iterating with indices:
>>> for i in range(len(x)):
... print(i, x[i])
...
0 1
1 2
2 4
3 8
4 16
Better way by using enumerate:
>>> for i, item in enumerate(x):
... print(i, item)
...
0 1
1 2
2 4
3 8
4 16
But what if you want to iterate in the reversed order? Of course, the range is an option again
Iterating over a reversed order:
>>> for i in range(len(x)-1, -1, -1):
... print(x[i])
...
16
8
4
2
1
More elegant way:
>>> for item in x[::-1]:
... print(item)
...
16
8
4
2
1
Pythonic way of reversing an order:
>>> for item in reversed(x):
... print(item)
...
16
8
4
2
1
it’s often more elegant to define and use properties, especially in simple cases
Defining some properties (considered to be more Pythonic):
>>> class C:
...     @property
...     def x(self):
...         return self.__x
...     @x.setter
...     def x(self, value):
...         self.__x = value
Result:
>>> c = C()
>>> c.x = 2
>>> c.x
2
Python has a very flexible system of providing arguments to functions and methods. Optional arguments are a part of this offer. But be careful: you usually don’t want to use mutable optional arguments
Avoiding mutable optional arguments:
>>> def f(value, seq=[]):
...     seq.append(value)
...     return seq
If you don't provide seq, f() appends a value to an empty list and returns something like [value]:
>>> f(value=2)
[2]
Don't be fooled. This isn't fine: the same default list is reused and mutated across calls...
>>> f(value=4)
[2, 4]
>>> f(value=8)
[2, 4, 8]
>>> f(value=16)
[2, 4, 8, 16]
Iterating over a dictionary yields its keys
Iterating over a dictionary:
>>> z = {'a': 0, 'b': 1}
>>> for k in z:
... print(k, z[k])
...
a 0
b 1
Applying method .items():
>>> for k, v in z.items():
... print(k, v)
...
a 0
b 1
You can also use the methods .keys() and .values()
following the rules called The Zen of Python or PEP 20
The Zen of Python or PEP 20 <--- rules followed by Python
You can use unpacking to assign values to your variables
Unpacking <--- assign values
>>> a, b = 2, 'my-string'
>>> a
2
>>> b
'my-string'
what if you want to iterate over two or more sequences? Of course, you can use the range again
Iterating over two or more sequences:
>>> y = 'abcde'
>>> for i in range(len(x)):
... print(x[i], y[i])
...
1 a
2 b
4 c
8 d
16 e
Better solution by applying zip:
>>> for item in zip(x, y):
... print(item)
...
(1, 'a')
(2, 'b')
(4, 'c')
(8, 'd')
(16, 'e')
Combining it with unpacking:
>>> for x_item, y_item in zip(x, y):
... print(x_item, y_item)
...
1 a
2 b
4 c
8 d
16 e
None is a special and unique object in Python. It serves a similar purpose to null in C-like languages
None (Python) ==similar== Null (C)
Python allows you to chain the comparison operations. So, you don’t have to use and to check if two or more comparisons are True
Chaining <--- checking if two or more operations are True
>>> x = 4
>>> x >= 2 and x <= 8
True
More compact (mathematical) form:
>>> 2 <= x <= 8
True
>>> 2 <= x <= 3
False
Unpacking can be used for the assignment to multiple variables in more complex cases
Unpacking <--- assign even more variables
>>> x = (1, 2, 4, 8, 16)
>>> a = x[0]
>>> b = x[1]
>>> c = x[2]
>>> d = x[3]
>>> e = x[4]
>>> a, b, c, d, e
(1, 2, 4, 8, 16)
more readable approach:
>>> a, b, c, d, e = x
>>> a, b, c, d, e
(1, 2, 4, 8, 16)
even cooler (* collects values not assigned to others):
>>> a, *y, e = x
>>> a, e, y
(1, 16, [2, 4, 8])
the most concise and elegant way to swap variables
Unpacking <--- swap values
>>> a, b = b, a
>>> a
'my-string'
>>> b
2
Python doesn’t have real private class members. However, there’s a convention that says that you shouldn’t access or modify the members beginning with the underscore (_) outside their instances. They are not guaranteed to preserve the existing behavior.
Avoiding accessing protected class members. Consider the code:
>>> class C:
...     def __init__(self, *args):
...         self.x, self._y, self.__z = args
...
>>> c = C(1, 2, 4)
The instances of class C have three data members: .x, ._y, and ._C__z (because __z has two leading underscores, unlike _y). If a member’s name begins with a double underscore (dunder), it becomes mangled, that is, modified. That’s why you have ._C__z instead of .__z.
Now, it's quite OK to access/modify .x directly:
>>> c.x # OK
1
You can also access ._y from outside its instance, but it's considered a bad practice:
>>> c._y # Possible, but a bad practice!
2
You can’t access .__z because it’s mangled, but you can access or modify ._C__z:
>>> c.__z # Error!
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'C' object has no attribute '__z'
>>> c._C__z # Possible, but even worse!
4
>>>
what if an exception occurs while processing your file? Then my_file.close() is never executed. You can handle this with exception-handling syntax or with context managers
Using a with block to handle exceptions:
>>> with open('filename.csv', 'w') as my_file:
... # do something with `my_file`
Using the with block means that the special methods .__enter__() and .__exit__() are called, even in the case of exceptions
Python code should be elegant, concise, and readable. It should be beautiful. The ultimate resource on how to write beautiful Python code is Style Guide for Python Code or PEP 8
Write beautiful Python code with PEP 8
PEP 8 provides the style guide for Python code, and PEP 20 represents the principles of Python language
Python also supports chained assignments. So, if you want to assign the same value to multiple variables, you can do it in a straightforward way
Chained assignments <--- assign the same value to multiple variables:
>>> x = 2
>>> y = 2
>>> z = 2
More elegant way:
>>> x, y, z = 2, 2, 2
Chained assignments:
>>> x = y = z = 2
>>> x, y, z
(2, 2, 2)
open a file and process it
Open a file:
>>> my_file = open('filename.csv', 'w')
>>> # do something with `my_file`
Close the file to properly release the resource:
>>> my_file = open('filename.csv', 'w')
>>> # do something with `my_file`
>>> my_file.close()
The author of the class probably begins the names with the underscore(s) to tell you, “don’t use it”
_ and __ prefixes <--- don't touch!
You can keep away from that with some additional logic. One of the ways is this:
You can keep away from the problem:
>>> def f(value, seq=[]):
...     seq.append(value)
...     return seq
with some additional logic:
>>> def f(value, seq=None):
...     if seq is None:
...         seq = []
...     seq.append(value)
...     return seq
Shorter version (note: if not seq also replaces an empty list passed in by the caller, which may not be what you want):
>>> def f(value, seq=None):
...     if not seq:
...         seq = []
...     seq.append(value)
...     return seq
The result:
>>> f(value=2)
[2]
>>> f(value=4)
[4]
>>> f(value=8)
[8]
>>> f(value=16)
[16]
The Pythonic way is to exploit the fact that zero is interpreted as False in a Boolean context, while all other numbers are considered as True
Comparing to zero - Pythonic way:
>>> bool(0)
False
>>> bool(-1), bool(1), bool(20), bool(28.4)
(True, True, True, True)
Using if item instead of if item != 0:
>>> for item in x:
... if item:
... print(item)
...
1
2
3
4
You can also use if not item instead of if item == 0
When you have numeric data, and you need to check if the numbers are equal to zero, you can but don’t have to use the comparison operators == and !=
Comparing to zero:
>>> x = (1, 2, 0, 3, 0, 4)
>>> for item in x:
... if item != 0:
... print(item)
...
1
2
3
4
it’s a good practice to use properties when you can and C++-like getters and setters when you have to
Use properties when you can, and C++-like getters and setters when you have to
The reason that Julia is fast (ten to 30 times faster than Python) is because it is compiled and not interpreted
Compared to the speed of Python, Julia seems to be even faster than Scala
Scala is ten times faster than Python
Interesting estimation
First of all, write a script that carries out the task in a sequential fashion. Secondly, transform the script so that it carries out the task using the map function. Lastly, replace map with a neat function from the concurrent.futures module
Concurrent Python programs in 3 steps:
1) Write a script that carries out the task sequentially.
2) Transform the script to use the map function.
3) Replace map with a function from the concurrent.futures module.
The Python standard library makes it fairly easy to create threads and processes
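The three steps might look like this minimal sketch using the standard library (`work` is a made-up stand-in task):

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    return n * n  # stand-in for an I/O- or CPU-bound task

numbers = [1, 2, 3, 4]

# Step 1: sequential version
sequential = [work(n) for n in numbers]

# Step 2: the same task expressed with map
mapped = list(map(work, numbers))

# Step 3: swap in an executor's map
# (for CPU-bound work, ProcessPoolExecutor sidesteps the GIL)
with ThreadPoolExecutor(max_workers=4) as executor:
    concurrent = list(executor.map(work, numbers))

print(sequential == mapped == concurrent)  # True
```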
Fortunately, there is a workaround for concurrent programming in Python
Python is a poor choice for concurrent programming. A principal reason for this is the ‘Global Interpreter Lock’ or GIL. The GIL ensures that only one thread accesses Python objects at a time, effectively preventing Python from being able to distribute threads onto several CPUs by default
Python isn't the best choice for concurrent programming
Introducing multiprocessing now is a cinch; I just replace ThreadPoolExecutor with ProcessPoolExecutor in the previous listing
Replacing multithreading with multiprocessing:
replace ThreadPoolExecutor with ProcessPoolExecutor
this article merely scratches the surface. If you want to dig deeper into concurrency in Python, there is an excellent talk titled Thinking about Concurrency by Raymond Hettinger on the subject. Make sure to also check out the slides whilst you’re at it
Learn more about concurrency:
As the name suggests, multiprocessing spawns processes, while multithreading spawns threads. In Python, one process can run several threads. Each process has its proper Python interpreter and its proper GIL. As a result, starting a process is a heftier and more time-consuming undertaking than starting a thread.
Reason multiprocessing is slower to start than multithreading:
Each process gets its own interpreter and GIL, so starting a process is heftier and more time-consuming than starting a thread
its purpose is to dump Python tracebacks explicitly on a fault, after a timeout, or on a user signal
faulthandler, in contrast to tracing, tracks specific events and has slightly better documentation
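A minimal sketch of dumping a traceback on demand with faulthandler (writing to a temp file because faulthandler needs a real file descriptor, not a StringIO):

```python
import faulthandler
import tempfile

# Dump the current Python traceback(s) explicitly to a file.
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f)
    f.seek(0)
    dump = f.read()

print(dump.splitlines()[0])  # thread header line, then one "File ..." line per frame
```

faulthandler.enable() would arrange the same dump automatically on a crash (segfault, etc.).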
what parts of the software do we profile (measure its performance metrics)
Most profiled parts of the software:
Line profiling, as the name suggests, means to profile your Python code line by line
Line profiling
The profile module gives similar results with similar commands. Typically, you switch to profile if cProfile isn’t available
cProfile > profile. Use profile only when cProfile isn't available
Another common component to profile is the memory usage. The purpose is to find memory leaks and optimize the memory usage in your Python programs
Memory usage can be tracked with pympler or objgraph libraries
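Besides pympler and objgraph, the standard library's tracemalloc illustrates the idea; a minimal sketch (the exact byte counts are machine-dependent):

```python
import tracemalloc

tracemalloc.start()

# Allocate roughly 1 MB and keep it alive so it shows up in the stats.
data = [bytes(1000) for _ in range(1000)]

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current} B, peak: {peak} B")

# Top allocation sites, grouped by source line:
for stat in tracemalloc.take_snapshot().statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```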
The purpose of the trace module is to “monitor which statements and functions are executed as a program runs to produce coverage and call-graph information”
Purpose of trace module
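A minimal sketch of the trace module counting executed lines (the traced function is made up for illustration):

```python
import trace

def multiples_of_three(n):
    return [i for i in range(n) if i % 3 == 0]

# count=1 records how many times each line ran; trace=0 keeps stdout quiet
tracer = trace.Trace(count=1, trace=0)
result = tracer.runfunc(multiples_of_three, 10)
print(result)  # [0, 3, 6, 9]

# results().counts maps (filename, line_number) -> execution count
for (filename, lineno), hits in tracer.results().counts.items():
    print(filename, lineno, hits)
```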
With a method profiling tool like cProfile (available in the Python standard library), the timing metrics for methods can show you statistics such as the number of calls (shown as ncalls), total time spent in the function (tottime), time per call (tottime/ncalls, shown as percall), cumulative time spent in a function (cumtime), and cumulative time per call (the quotient of cumtime over the number of primitive calls, shown as percall after cumtime)
cProfile is one of the Python tools to measure method execution time. Specifically:
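A minimal sketch showing those columns for a made-up slow_sum function:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# The report's header shows the ncalls / tottime / percall / cumtime columns.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```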
tracing is a special use case of logging in order to record information about a program’s execution
Tracing (more for software devs) is very similar to event logging (more for system administrators)
If a method has an acceptable speed but is so frequently called that it becomes a huge time sink, you would want to know this from your profiler
We also want to measure the frequency of method calls. cProfile can highlight the number of function calls and how many of those are native calls
trace and faulthandler modules cover basic tracing
Basic Python libraries for tracing
It’s just that it often makes sense to write code in the order JOIN / WHERE / GROUP BY / HAVING. (I’ll often put a WHERE first to improve performance though, and I think most database engines will also do a WHERE first in practice)
Pandas code is often written in this clause order:
JOIN
WHERE
GROUP BY
HAVING
Example:
df = thing1.join(thing2)  # like a JOIN
df = df[df.created_at > 1000]  # like a WHERE
df = df.groupby('something').agg(num_yes=('yes', 'sum'))  # like a GROUP BY
df = df[df.num_yes > 2]  # like a HAVING, filtering on the result of a GROUP BY
df = df[['num_yes', 'something1', 'something']]  # pick the columns I want to display, like a SELECT
df = df.sort_values('something', ascending=True)[:30]  # ORDER BY and LIMIT
vectors here have length 7 instead of 6 because of the extra 0 element at the beginning. This is an inconsequential detail - Keras reserves index 0 and never assigns it to any word.
Keras always reserves 0 and never assigns any word to it; therefore, even when we have 6 words, we end up with the length of 7:
[0. 1. 1. 1. 0. 0. 0.]
Nested for loop using list comprehension to come up with 5x5 matrix:
arr = [[i for i in range(5)] for j in range(5)]
arr
>>> [[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]]
List comprehension in Python to create a simple dictionary:
x = [2,45,21,45]
y = {i:v for i,v in enumerate(x)}
print(y)
>>> {0: 2, 1: 45, 2: 21, 3: 45}
Flattening a multi-dimensional matrix into a 1-D array using list comprehension:
x = [[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]]
arr = [i for j in x for i in j]
print(arr)
>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 examples of conditional statements in list comprehension:
arr = [i for i in range(10) if i % 2 == 0]
print(arr)
>>> [0, 2, 4, 6, 8]
and:
arr = ["Even" if i % 2 == 0 else "Odd" for i in range(10)]
print(arr)
>>> ['Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd']
use pyenv. With it, you can have any Python version you want at your disposal, very easily.
pyenv allows you to easily switch between Python versions
the __ methods allow us to interact with core concepts of the Python language. You can also see them as a mechanism for implementing behaviours, like interface methods.
__ methods
Dunder (or magic) methods are methods that start and end with a double underscore, like __init__ or __str__. These methods are the mechanism we use to interact directly with Python's data model
Dunder methods
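A quick illustration with a hypothetical Playlist class: defining dunder methods hooks the class into built-ins like len(), indexing, and str():

```python
class Playlist:
    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):              # hooks into len(...)
        return len(self._songs)

    def __getitem__(self, index):   # hooks into indexing and iteration
        return self._songs[index]

    def __str__(self):              # hooks into str(...) and print(...)
        return f"Playlist({len(self._songs)} songs)"

p = Playlist(["a", "b", "c"])
print(len(p))    # 3
print(p[0])      # a
print(str(p))    # Playlist(3 songs)
```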
Dutch programmer Guido van Rossum designed Python in 1991, naming it after the British television comedy Monty Python's Flying Circus because he was reading the show's scripts at the time.
Origins of Python name
supporting this field is extremely easy. If you keep the raw data, it's just a matter of adding a getter method to the Article class.
Way of supporting a new field in JSON is much easier than in a relational database:
@property
def highlights(self) -> Sequence[Highlight]:
    default = []  # defensive, to handle older export formats that had no annotations
    jsons = self.json.get('annotations', default)
    return list(map(Highlight, jsons))
query language doesn't necessarily mean a database. E.g. see pandas which is capable of what SQL is capable of, and even more convenient than SQL for our data exploration purposes.
Query language, not always = database. For example, see pandas
cachew lets you cache function calls into an sqlite database on your disk in a matter of single decorator (similar to functools.lru_cache). The difference from functools.lru_cache is that cached data is persisted between program runs, so next time you call your function, it will only be a matter of reading from the cache.
cachew tool isolates the complexity of database access patterns in a Python library
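For contrast with cachew's persistent sqlite cache, here is the standard-library functools.lru_cache it is compared to (in-memory only, so the cache is lost between program runs):

```python
from functools import lru_cache

calls = []  # track how many times the body actually runs

@lru_cache(maxsize=None)  # in-memory cache; cachew persists to sqlite instead
def expensive(n):
    calls.append(n)
    return n * n

print(expensive(4))  # 16  (computed)
print(expensive(4))  # 16  (served from the cache)
print(calls)         # [4] -- the function body ran only once
```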
We only need to use global keyword in a function if we want to do assignments / change them. global is not needed for printing and accessing.
global inside a function is needed only for assignment, not for reading:
Example code:
# This function modifies the global variable 's'
def f():
    global s
    print(s)
    s = "Look for Geeksforgeeks Python Section"
    print(s)

# Global scope
s = "Python is great!"
f()
print(s)
The nonlocal keyword is used to work with variables inside nested functions, where the variable should not belong to the inner function. Use the keyword nonlocal to declare that the variable is not local.
nonlocal is used inside nested functions to refer to a variable from the enclosing (non-global) scope.
Example (if nonlocal weren't used, the output would be "John", not "hello"):
def myfunc1():
    x = "John"
    def myfunc2():
        nonlocal x
        x = "hello"
    myfunc2()
    return x

print(myfunc1())
Here, I rewrite this example using the walrus operator
How to reduce Python expressions using the walrus operator (:=). It's used to both assign and evaluate variable names in a single expression, thus reducing repetition.
Before:
count = fresh_fruit.get('lemon', 0)
if count:
    ...
After:
if count := fresh_fruit.get('lemon', 0):
    ...
map, filter and reduce can all operate independently, or be chained together
During each loop of reduce, you get the result of the last loop and the next element in the array. Change the result, then return it for the next loop iteration. When you're done, you have the completed collection
reduce
This lets you transform each element into something new (or keep it the same). Types don't have to stay the same: you can return objects, strings, numbers - anything!
map
It loops over the elements, passing each one to a callback function. You can return true to include that element in the new array, or false to exclude it
filter
So why are map, filter and reduce useful?
Advantages of map, filter, reduce:
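A minimal example chaining the three (sum of even squares):

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

squared = map(lambda n: n * n, numbers)           # transform each element
evens = filter(lambda n: n % 2 == 0, squared)     # keep a subset
total = reduce(lambda acc, n: acc + n, evens, 0)  # fold into a single value

print(total)  # 4 + 16 + 36 = 56
```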
The formula China is using is current total deaths/current confirmed cases.
Mortality rate of COVID-19 used by China =
\(\dfrac{currentTotalDeaths}{currentConfirmedCases}\)
Each year the flu infects between 2 and 11% of the population. The infectiousness of a disease is measured by its R0 value. R0 is a measure of how many people will catch a disease from one infected person. The R0 value of the flu is 1.28. The R0 of COVID-19 is assumed to be between 1.4–3.8
R0 - measure of how many people will catch a disease from one infected person.
R0 (Flu): 1.28
R0 (COVID-19): 1.4 - 3.8 (but some say 4.7 - 7.12)
cases outside of china are doubling roughly every 5 days. The reported rate of infection in China is lower, but China has taken drastic quarantine measures, including welding people inside apartment buildings.
Doubling tendency of COVID-19
“confirmed cases” is the lowest possible number of cases in that country. It’s the lower bound of an estimate.
What confirmed cases actually means
In The Atlantic, Marc Lipsitch, a leading epidemiologist at Harvard reported that “that within the coming year, some 40 to 70 percent of people around the world will be infected with the virus that causes COVID-19”. This was last week and while ridiculed at the time, his assumptions are now the generally accepted position among epidemiologists.
40 - 70 % of people will be infected with COVID-19 (it might even increase)
Death Rate
Death rate of COVID-19:
The Virus may persist on surfaces for up to 9 days, on some surfaces up to 27 days
Coronavirus' lifespan
10-50 minutes in natural spaces was the most effective to improve mood, focus and physiological markers like blood pressure and heart rate. “It’s not that there’s a decline after 50 minutes, but rather that the physiological and self-reported psychological benefits tend to plateau after that,”
Nature therapy
Steven Hoober shed light on the three ways users hold their phones.
3 most popular ways people use their smartphones:
We’re mapping a human health tragedy that may get way worse before it subsides. Do we really want the map to be screaming bright red? Red is a very emotive colour. It has meaning. It can easily connote danger, and death, which is still statistically extremely rare for coronavirus.
Why using a red colour on choropleth map might be not the best choice
you cannot map totals using a choropleth thematic mapping technique. The reason is simple. Each of the areas on the map is a different size, and has a different number of people in it.
Why using choropleth thematic mapping isn't a good idea for a Covid-19 map
Learn how to write good emails, how to present your work, how to phrase your questions in a thoughtful manner.
Important soft skills to be learned by junior devs
I want to have a choice whether to forget or remember events, and I'd like to be able to potentially access forgotten ones.
It's better to lifelog than not if we have a choice
Even ignoring quality of life, you are looking at a 3-7 fold return in extended life on every minute you spend exercising,[1] perhaps even exceeding that if you are making optimal use of your time. Something just clicked, and I have been consistent since reading this. Even a 1-fold return would be worth it: basically you gain free consciousness (quality of thinking is poor while I exercise, but it's better than nothing).
While exercising, you get additional lifespan, which obviously isn't linear, but still worth the effort
Exobrain, or "second brain", or "brain dump" is something like public wiki where you can keep your notes, ideas and thoughts.
Exobrain examples:
Annotation is the process of adding a layer of personal (with a potential for sharing) metadata on top of existing content such as highlights, notes, anchors etc.
Just like I'm doing it here now (annotation) ;)
Human brains seem to be best for generating new ideas. I want to learn more, think faster, distract less, interact and visualize, effortlessly remember everything; not memorize and do routine information processing, which computers seem better at.
Why Personal Knowledge Management (PKM) is important
Extended mind is the idea that your mind isn't limited by your physical body, but augmented by external means. Paper notes & diaries, data in your computer/phone/cloud – all these things contribute to your cognitive process, thinking and help you navigate the landscape of ideas. One of the biggest motivations to start dumping my brain out here, write and share is to offload my mind a bit and interact with people even in my physical absence.
Extended mind - idea that our mind doesn't only resist in our physical body
I can't imagine going on a run without my HR monitor, because whatever the benefit of exercise and however exhausting the run, at least I'll have a data point. Learning about a routine and trying to optimize it is also way more fun than doing the actual routine. The human body is fragile and needs constant care, but it's still a fascinating mechanism.
Quantifying myself during runs is also my main use case. It adds a layer of motivation when I see the timestamps of all my journeys
quantified self refers both to the cultural phenomenon of self-tracking with technology and to a community of users and makers of self-tracking tools who share an interest in “self-knowledge through numbers.”[1] Quantified Self practices overlap with the practice of lifelogging and other trends that incorporate technology and data acquisition into daily life, often with the goal of improving physical, mental, and/or emotional performance.
Quantified self relates to all of us managing our life with Mi Bands, Fitbits, etc.
Its automated reasoning sounds great
You can also use automated reasoning inside Grakn (machine learning)
In Grakn, these algorithms are built-in as native functionalities of the language.
In Grakn, distributed analytics are built-in
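As a hedged illustration of what a built-in inference rule looks like, here is a sketch of a Grakn 1.x Graql rule over the location-of-user relation defined elsewhere in these notes. The neighbourship relation, its neighbour role, and the rule name are all invented for this example; the syntax follows the Grakn 1.x docs but may differ between versions:

```graql
define

# Hypothetical rule: two distinct users who share a location are
# inferred to be in a (made-up) neighbourship relation.
users-in-same-location-are-neighbours sub rule,
  when {
    (located-user: $x, user-location: $loc) isa location-of-user;
    (located-user: $y, user-location: $loc) isa location-of-user;
    $x != $y;
  }, then {
    (neighbour: $x, neighbour: $y) isa neighbourship;
  };
```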
code below instantiates a client, opens a session, and runs an insertion query
Start by instantiating a client and opening a session. Afterwards, you can begin to run queries
We will use the Python client API to interact with Grakn.
After setting up a Grakn keyspace, you can interact with it using the Python client API
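The code the highlight refers to wasn't captured in these notes. Below is a minimal sketch of what it might look like, assuming the grakn-client package (Grakn 1.x) and the experiment keyspace mentioned in these notes; the user entity and name attribute are illustrative, and the API surface may differ between Grakn versions:

```python
# Sketch only: the network part needs a running Grakn server, so the import
# is kept local and the query builder is a pure, testable function.

def build_insert(name):
    """Build a Graql insertion query for a hypothetical `user` entity."""
    return f'insert $u isa user, has name "{name}";'

def run_insert(name, keyspace="experiment", uri="localhost:48555"):
    from grakn.client import GraknClient  # requires grakn-client installed
    with GraknClient(uri=uri) as client:                    # instantiate a client
        with client.session(keyspace=keyspace) as session:  # open a session
            with session.transaction().write() as tx:       # write transaction
                tx.query(build_insert(name))                # run the insertion
                tx.commit()

print(build_insert("alice"))  # insert $u isa user, has name "alice";
```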
We just created a Grakn keyspace called experiment and defined its schema.
It's good to have a single Grakn keyspace per application (the keyspace is the outermost container for data in a Grakn knowledge graph, corresponding closely to a relational database)
Now that we have the schema ready, the next step is to load it into Grakn.
After defining schema, it needs to be loaded into Grakn:
1) Place schema.gql in the container volume, such as db/schema.gql.
2) Run:
docker exec -ti grakn bash -c '/grakn-core-all-linux/grakn console --keyspace experiment --file /grakn-core-all-linux/server/db/schema.gql'
3) Observe a similar result:
Loading: /grakn-core-all-linux/server/db/schema.gql
...
{}
Successful commit: schema.gql
An attribute can be abstract if you never assign it directly and use it only as a parent type. Entities can be abstract, too, if they are never instantiated.
Attributes and entities can be abstract
There's just one more step – defining the attribute types
Don't forget to define attribute types, such as:
name sub attribute,
datatype string;
address sub attribute,
datatype string;
timestamp sub attribute, abstract,
datatype date;
created sub timestamp;
last-modified sub timestamp;
last-accessed sub timestamp;
penalty-until sub timestamp;
url sub attribute,
datatype string;
...
Attributes can be assigned to anything, including relations.
We've ended up with three entities: user, badge and location. How do we glue them together? Using relations.
Use of relations to glue different entities:
location-of-user sub relation,
relates located-user,
relates user-location;
achievements sub relation,
has score,
relates contributor,
relates award;
Some things are common to multiple users, like a location (e.g. Austin, TX, USA) or the types of badges they've been awarded (bronze, silver, gold). We'll model locations and badges as separate entities.
Modelling separate entities:
location sub entity,
key address,
plays user-location;
badge sub entity,
key color,
plays award;
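A hedged sketch of querying this schema with Graql, in Grakn 1.x syntax; it assumes the user entity has a name attribute, which isn't shown in these highlights:

```graql
# Fetch each user's name together with the address of their location
match
  $u isa user, has name $n;
  (located-user: $u, user-location: $l) isa location-of-user;
  $l has address $a;
get $n, $a;
```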
Graql — Grakn's query language that allows you to model, query and reason over data.
Graql is part of Grakn
What can we have in a schema? There are three types of things
In schema we can have three types of things:
schema – a skeleton structure that represents the logical view of the entire knowledge graph
Before using JSON data in the database, we need to define its schema
To keep things nice and platform agnostic, let’s go with Docker.
You can run Grakn with Docker using 3 simple commands (check below this highlight)
Grakn is a knowledge base for intelligent systems. A quick look at their developer site and examples piqued my curiosity
Of the three primary color channels, red, green and blue, green contributes the most to luminosity.
Green colour vs red and blue (RGB)
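The weighting can be made concrete with the Rec. 709 luma coefficients (a linear-space approximation; exact relative luminance also requires gamma-decoding the sRGB channels first):

```python
# Per-channel weights from the Rec. 709 luma formula: green carries ~71%
# of perceived brightness, red ~21%, blue ~7%.
def luminance(r, g, b):
    """Approximate relative luminance of an RGB color (channels 0-255)."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

print(luminance(0, 255, 0))   # pure green: 182.376
print(luminance(255, 0, 0))   # pure red:    54.213
print(luminance(0, 0, 255))   # pure blue:   18.411
```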
Imagine that you're exporting articles to a database, so your schema is: TABLE Article(STRING id, STRING url, STRING title, DATETIME added). One day, the developers expose highlights (or annotations) from the private API and your export script starts receiving them in the response JSON. It's quite useful data to have! However, your database can't just magically change to conform to the new field.
The relational model can sometimes tie your hands, unlike JSON
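A minimal sketch of the rigidity in question, using Python's stdlib sqlite3 with a JSON payload column (the table and field names are made up for illustration): a new highlights field flows through without a migration, where a fixed column layout would have required an ALTER TABLE.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# One rigid column for identity, one flexible JSON payload column
con.execute("CREATE TABLE article (id TEXT PRIMARY KEY, payload TEXT)")

# Old export: the API didn't return highlights yet
con.execute("INSERT INTO article VALUES (?, ?)",
            ("a1", json.dumps({"url": "http://x", "title": "T"})))

# New export: highlights appear in the response JSON -- no schema change needed
con.execute("INSERT INTO article VALUES (?, ?)",
            ("a2", json.dumps({"url": "http://y", "title": "U",
                               "highlights": ["a useful quote"]})))

row = json.loads(con.execute(
    "SELECT payload FROM article WHERE id = 'a2'").fetchone()[0])
print(row["highlights"])  # ['a useful quote']
```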
Storage saved by using a database instead of plaintext is marginal and not worth the effort.
Databases save some space compared to plaintext, but the saving is marginal
If necessary, use databases as an intermediate layer to speed up access and as an additional interface to your data. Nothing wrong with using databases for caching if you need it!
You may want to use databases as a caching layer or an extra interface to your data
I want to argue very strongly against forcing the data in the database, unless it's really inevitable.
After scraping some data, don't immediately reach for a database, unless you're dealing with a large stream of data
black and white with monotones
Hues <--- pure colors
Tints <--- hue + white
Tones <--- hue + grey
Shades <--- hue + black
Basic Color Schemes
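The three derived forms can be sketched as linear mixes in RGB (a rough illustration of the definitions above; real color work is usually done in HSL or a similar space):

```python
def mix(color, other, t=0.5):
    """Move each RGB channel a fraction t of the way toward `other`."""
    return tuple(round(c + (o - c) * t) for c, o in zip(color, other))

WHITE, GREY, BLACK = (255, 255, 255), (128, 128, 128), (0, 0, 0)

def tint(hue, t=0.5):   return mix(hue, WHITE, t)   # hue + white
def tone(hue, t=0.5):   return mix(hue, GREY, t)    # hue + grey
def shade(hue, t=0.5):  return mix(hue, BLACK, t)   # hue + black

red = (255, 0, 0)
print(tint(red))   # (255, 128, 128)
print(tone(red))   # (192, 64, 64)
print(shade(red))  # (128, 0, 0)
```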
Warm colors exhibit energy and joy (best for personal messages), while cool colors convey calmness and peace (best for office use).
Warm and Cool colors
But I think we read a lot of self-help because we need to. As I’ve already mentioned, we need lots of examples to drive this wisdom home. We should be more forgiving of self-help (the genre) and more forgiving of ourselves. Putting wisdom into practice requires reading, reflection, and practice—but it’s worth it.
Read lots of self-help as it's worth it
Advice can either be arcane or obvious, general or specific.
Aim to share the good self-help as the bad self-help might not apply to everyone.
no business book can predict what sorts of situations (businesses, market conditions, etc.) the reader will encounter, so instead it offers general, obvious-sounding rules.
Business books
We feel divinely inspired while reading Minimalism, but when it’s time to actually cull our wardrobes, it turns out we have good reasons for keeping everything!
Read less, apply more!
We’re embarrassed by self-help because (at its best) it’s full of banal platitudes—but these are platitudes because they’re so general.
We tend to be embarrassed by self-help
Wisdom: easily understood, widely applicable, hard to implement (self-help). Knowledge: difficult to learn, narrowly useful, easy to implement (textbooks).
Wisdom vs Knowledge
So which is it? Is our obsession with self-help embarrassing or admirable? Is self-help snake oil or salvation? I’m going to argue that it’s both.
Self-help can be good and bad
But what if you wanted wisdom, not knowledge? Are there books that contain wisdom? In other words, are there books that give you general-purpose, one-size-fits-all advice for navigating life? Of course there is! It’s called self-help.
Self-help books are there to make you wise, not knowledgable
“a wise person knows what to do in most situations, while a [knowledgeable] person knows what to do in situations where few others could.” In other words, wise people are moderately successful in many domains, while knowledgeable people are very successful in a few.
~ Paul Graham
there are two sources of feeling like a noob: being stupid, and doing something novel. Our dislike of feeling like a noob is our brain telling us "Come on, come on, figure this out."
Two sources of being a noob
the more of a noob you are locally, the less of a noob you are globally. For example, if you stay in your home country, you'll feel less of a noob than if you move to Farawavia, where everything works differently. And yet you'll know more if you move. So the feeling of being a noob is inversely correlated with actual ignorance.
Being a noob
In the end, it didn’t matter if the students walked even more than 15,000 steps; they still gained weight
Scientists at EPFL in Switzerland have shown that you are more likely to initiate a voluntary decision as you exhale.
We make conscious decisions when we breathe out, says new study involving 52 people pressing a button, monitored with brain, heart and lung sensors.
A combination of split(), substring(), removePrefix(), removeSuffix() is usually enough.
Sometimes the following functions are more than enough for your string matching problems
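The highlight lists Kotlin-flavored helpers; Python 3.9+ has near-direct analogues (str.removeprefix / str.removesuffix alongside split and slicing). A small sketch with made-up inputs:

```python
url = "https://example.com/path/page.html"

host = url.removeprefix("https://").split("/")[0]    # "example.com"
name = url.rsplit("/", 1)[-1].removesuffix(".html")  # "page"
path = url.removeprefix("https://" + host)           # "/path/page.html"

print(host, name, path)
```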
When you get tired of thinking about a piece of work and feeling bad for not finishing it yet, go "screw it, let's do it" and start with something, anything.
One way of starting to do what we postpone
When you know that you don't have to make the greatest thing ever right from the start, it's easier to start. And then it's easier to continue.
Apply the MVP principle to start doing
Level 0 is no automation whatsoever. Level 1 is partial assistance with certain aspects of driving, like lane keep assist or adaptive cruise control. Level 2 is a step up to systems that can take control of the vehicle in certain situations, like Tesla's Autopilot or Cadillac's Super Cruise, while still requiring the driver to pay attention. Get past that and we enter the realm of speculation: Level 3 promises full computer control under defined conditions during a journey, Level 4 expands that to start-to-finish autonomous tech limited only by virtual safeguards like a geofence, and Level 5 is the total hands-off, go-anywhere-at-the-push-of-a-button experience.
Description of 6 levels defining autonomous cars:
The CEO of Volkswagen's autonomous driving division recently admitted that Level 5 autonomy—that's full computer control of the vehicle with zero limitations—might actually never happen.
The majority of the time, the only way to break into a circle is for someone within that circle to speak positively on your behalf.
Who speaks on your behalf?!
I have observed something else under the sun. The fastest runner doesn’t always win the race, and the strongest warrior doesn’t always win the battle. The wise sometimes go hungry, and the skillful are not necessarily wealthy. And those who are educated don’t always lead successful lives. It is all decided by chance, by being in the right place at the right time. — Ecclesiastes 9:11
They simply possess the willpower and drive to observe people, get to know people, appear in gatherings that involve people that are aligned with their goals, and connect people with one another.
A skill that sometimes puts even the smartest minds below you
This quote from Richard Feynman is at the top of my blog’s landing page: I learned very early the difference between knowing the name of something and knowing something.
Inspiration to maintain a research blog
Summarizing a paper in your own words restructures the content to focus on learning rather than novelty.
Scientific papers are written to convey novelty; hence, some early readers might mistakenly assume that this is the right way to speak in everyday scientific settings
Blogging has taught me how to read a paper because explaining something is a more active form of understanding. Now I summarize the main contribution in my own words, write out the notation and problem setup, define terms, and rederive the main equations or results. This process mimics the act of presenting and is great practice for it.
Why teaching others/blogging has a great value in terms of learning new topics
When I first started teaching myself to program, I felt that I had no imagination. I couldn’t be creative because I was too focused on finding the syntax bug or reasoning about program structure. However, with proficiency came creativity. Programming became less important than what I was building and why.
While learning, don't worry about the creativity, which shall come after gaining proficiency (knowledge base)
In my opinion the reason most people fail to do great research is that they are not willing to pay the price in self-development. Say some new field opens up that combines field XXX and field YYY. Researchers from each of these fields flock to the new field. My experience is that virtually none of the researchers in either field will systematically learn the other field in any sort of depth. The few who do put in this effort often achieve spectacular results.
I think we all know that...
Many of us have done this on exams, hoping for partial credit by stitching together the outline of a proof or using the right words in an essay with the hopes that the professor connects the dots for us.
Often we tend to communicate using jargon we don't understand, just to pretend we know something
Since water is denser than air and the reflection is diffuse, a lot of light is internally reflected, thereby increasing the probability of absorption at the surface.
The light is reflected back inside the water, because of the total internal reflection:
This is because the light now has a layer of water to go through. And due to the reflectance of water, not all light at the air-liquid-interface (border between air and water) goes through the water. Some of it is reflected.
Wet things become darker because the water layer's reflectance doesn't let all of the light transmit through it.
The probability of light being transmitted is 1 - R1, where R1 is the reflectance at the air-liquid interface
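The 1 - R1 figure can be made concrete with the Fresnel reflectance at normal incidence, using n = 1.0 for air and n ≈ 1.33 for water (a simplification: real reflectance also depends on the angle of incidence):

```python
# Fresnel reflectance at normal incidence: R = ((n1 - n2) / (n1 + n2))**2
def reflectance(n1, n2):
    return ((n1 - n2) / (n1 + n2)) ** 2

R1 = reflectance(1.0, 1.33)  # air (n=1.0) -> water (n=1.33)
print(round(R1, 3))          # 0.02: only ~2% is reflected at the surface
print(round(1 - R1, 3))      # 0.98: the fraction transmitted into the water
```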
There are two types of reflection (two ways the wave can be thrown back): specular and diffuse.
Two types of reflection:
* Better googling: time-restricted, URL-restricted, site-restricted searches; search with the variant parts of error messages removed.
* Read the source of upstream dependencies; fix or fork them if needed.
* Better at finding forks with solutions and gleaning hints from semi-related issues.
* Formulate more creative hypotheses when obvious lines of investigation run out. The best don't give up.
* Dig into problems with more angles of investigation.
* Have more tools in their toolbelt for debugging, like adding logging, monkey-patching, swapping parts out, crippling areas to rule things out, binary search of affected code areas.
* Consider the business.
* Consider user behavior.
* Assume hostile users (security-wise): understand that the UI is not a security layer; anything you can do with Postman, your backend should handle.
* Prefer whitelist-style security over blacklist-style.
* See eventual problems implied by various solutions.
* Do "the math."
What do top engineers do that others don't?
technology diffused more easily along lines of latitude than along lines of longitude because climate changed more rapidly along lines of longitude making it more difficult for both humans and technologies to adapt
Technology adapts better across latitude than longitude
Anything that’s invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.
Before 15 it's just a normal part of the world; after 35 it's against the natural order of things
"One of history’s few iron laws is that luxuries tend to become necessities and to spawn new obligations. Once people get used to a certain luxury, they take it for granted. Then they begin to count on it. Finally they reach a point where they can’t live without it.”
Be careful what becomes your necessity to live
A 90s study showed that women preferred the scents of men whose immune systems were most different from their own immune-system genes. Evolutionarily this makes sense, as children should be healthier if their parents’ genes vary, protecting them from more pathogens.
Why women are attracted to the scent of men with dissimilar immune genes
Between 2011 and 2013, China used 50% more cement than the United States did in the entire 20th century. Of the world’s 100 highest bridges, 81 are in China, including some unfinished ones.
China's infrastructure is growing amazingly fast
Here’s Warren Buffett: “Cola has no taste memory. You can drink one at 9am, 11am, 5pm. You can't do that with cream soda, root beer, orange, grape. You get sick of them after a while. The average person drinks 64 ounces of liquid per day, and you can have all 64 ounces of that be Coke.” Same with Doritos, Cheetos, most popular junk food. They are engineered to overcome “sensory-specific satiety” and to give a sense of “vanishing caloric density.”
Why chips and coca-cola are addicting:
the taste is vanishing
One of the tasks of true friendship is to listen compassionately and creatively to the hidden silences. Often secrets are not revealed in words, they lie concealed in the silence between words.
Listen to the silence
Video Games are a Booming Industry: the video game industry generates more revenue than movies and music.
Revenue raised by videogames > films + music
Humor treads at the frontier of consciousness. When a comic finds a funny joke, they are unearthing a truth that people are only kind of aware of, but the whole room grasps that everybody else is aware of the truth, and laughter ensues.
Humor as a comprehension
Highest Grossing Media Franchises
What type of media sells the most:
How the Sun Moves
“Twitter is the most amazing networking and learning network ever built. For someone who's pursuing their dream job, or chasing a group of mentors or peers, it’s remarkable. In any given field, 50-80% of the top experts in that field are on Twitter and they’re sharing ideas, and you can connect to them or follow them in your personal feed. If you get lucky enough and say something they find interesting, they might follow you, and the reason this becomes super interesting is that it unlocks direct messages, and now all of a sudden you can communicate directly or electronically with that individual. Very, very powerful. If you’re not using Twitter, you’re missing out.” — Bill Gurley
I cannot agree more with this, since I've finally accumulated a great network on Twitter. It's important to hit the bell icon next to the profiles we value most, so that we never miss their new content
80% of Americans live, work, and hang out in the pink areas — 3.6 percent of the landmass of the lower 48 states.
Map where 80% of Americans' life is
"The easiest way to be discovered right now in technology and perhaps many fields is to create your own independent blog and write. There is a huge dearth in availability of good, current, first party content today. The single most important advice I can give to actually write is to write. The thing that happens which you don’t see until you write is that your content engages some of the smartest people who are lurking around the internet. And they reach out to you."
Totally agree with this ;)
By some estimates, more than 50,000 pieces of artwork are stolen each year, amounting to annual losses of around $6 to $8 billion globally. This makes art theft one of the most valuable criminal enterprises around, exceeded only by drug trafficking and arms dealing.
Art crime is even more serious than we think
Where is Wealth Concentrated?
Map of wealth concentration
This map shows the relative density of commercial shipping around the world. The darker the color, the busier the route.
Map of global shipping routes
I've noticed a weird pattern: In most of the best marriages I see, one person is an early-bird, and the other is a night-owl. They have opposite circadian rhythms. I think this is healthy. The two partners get the benefits of time together and time alone, which helps the marriage.
Opposite circadian rhythms as a marker of a great marriage
look around and figure out who you want to be on your team. Figure out the people around you that you want to work with for the rest of your life. Figure out the people who are smart & awesome, who share your values, who get things done — and maybe most important, who you like to be with and who you want to help win. And treat them right, always. Look for ways to help, to work together, to learn. Because in 20 years you’ll all be in amazing places doing amazing things.
One of the best pieces of life advice one can get
It goes completely against what most believe, but out of all major energy sources, nuclear is the safest
Nuclear energy as the safest energy
Studies show it takes ~23 minutes to deeply refocus on something when you get disturbed
if you care at all about storing timestamps in MySQL, store them as integers and use the UNIX_TIMESTAMP() and FROM_UNIXTIME() functions.
MySQL does not store the timezone offset
The system clock is inaccurate. You're on a network? Every other system's clock is differently inaccurate.
System clocks are inaccurate
A time format without an offset is useless.
Always use timezone offset
If you want to store a humanly-readable time (e.g. logs), consider storing it along with Unix time, not instead of Unix time.
Logs should be stored along with Unix time
When storing a timestamp, store Unix time. It's a single number.
Store Unix timestamps (single numbers)
Most of your code shouldn't be dealing with timezones or local time, it should be passing Unix time around.
Most of your code should use Unix time
Unix time: Measured as the number of seconds since epoch (the beginning of 1970 in UTC). Unix time is not affected by time zones or daylight saving
Unix time - # of seconds since 1970 in UTC
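The definition maps directly onto Python's stdlib, e.g.:

```python
from datetime import datetime, timezone

# Unix time 0 is the epoch: the beginning of 1970 in UTC
dt = datetime.fromtimestamp(0, tz=timezone.utc)
print(dt.isoformat())  # 1970-01-01T00:00:00+00:00

# Round-trip a human-readable UTC time back to a single integer
back = int(datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp())
print(back)  # 1577836800
```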
GMT is still used: it's the British national timezone in winter. In summer it becomes BST.
GMT and BST in Britain