2,469 Matching Annotations
  1. Mar 2020
    1. Caffeine increases the ability to focus and problem solve, but a new study by a University of Arkansas researcher indicates it doesn’t stimulate creativity.
    1. SAS is also industry standard in other areas, like: military, energy, banking, econometrics.

      SAS is an industry standard not only in pharma

    1. The new virus is genetically 96% identical to a known coronavirus in bats and 86-92% identical to a coronavirus in pangolin. Therefore, the transmission of a mutated virus from animals to humans is the most likely cause of the appearance of the new virus.

      Source of COVID-19

    1. When asked why Wuhan was so much higher than the national level, the Chinese official replied that it was for lack of resources, citing as an example that there were only 110 critical care beds in the three designated hospitals where most of the cases were sent.

      Wuhan's rate then = 4.9%

      National rate = 2.1%

    2. 80% of food in the UK is imported. How about your country? Your town?

      Food for thought

    3. The South Korean government is delivering food parcels to those in quarantine. Our national and local governments need to quickly organise the capacity and resources required to do this. Japanese schools are scheduled to be closed for March.

      Food delivery in South Korea and closing schools in Japan

    4. The limited availability of beds in Wuhan raised their mortality rate from 0.16% to 4.9%. This is why the Chinese government built a hospital in a week. Are our governments capable of doing the same?

      Case of Wuhan

    5. The UK population is 67 million people, that’s 5.4 million infected. Current predictions are that 80% of the cases will be mild. If 20% of those people require hospitalization for 3–6 weeks? That’s 1,086,176 people. Do you know how many beds the NHS has? 140,000

      There will be a great lack of beds

    6. Evolving to be observant of direct dangers to ourselves seems to have left us terrible at predicting second and third-order effects of events. When worrying about earthquakes we think first of how many people will die from collapsing buildings and falling rubble. Do we think of how many will die due to destroyed hospitals?

      Thinking of second and third-order effects of events

    7. Can you guess the number of people that have contracted the flu this year that needed hospitalisation in the US? 0.9%

      0.9% of flu cases that required hospitalisation vs 20% of COVID-19

    8. The UK has 2.8 million people over the age of 85. The US has 12.6 million people over the age of 80. Trump told people not to worry because 60,000 people a year die of the flu. If just 25% of the US over-80s cohort get infected, given current mortality rates that’s 466,200 deaths in that age group alone, with the assumption that the healthcare system has the capacity to handle all of the infected.

      Interesting calculation of probable deaths among people > 80. Basically, if at least 25% of over-80s in the US get infected, we might see almost 8x more deaths than from flu
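
      A quick back-of-the-envelope check of that figure, using only the numbers quoted above (the fatality rate is implied by them, not taken from an external source):

      over_80 = 12_600_000           # US residents over the age of 80
      infected = 0.25 * over_80      # 25% of that cohort
      deaths = 466_200               # figure quoted in the article
      print(round(deaths / infected, 3))  # 0.148, i.e. an implied ~14.8% fatality rate for 80+
      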

    1. in XML the ordering of elements is significant. Whereas in JSON the ordering of the key-value pairs inside objects is meaningless and undefined

      XML vs JSON (ordering elements)

    2. XML is a document markup language; JSON is a structured data format, and to compare the two is to compare apples and oranges

      XML vs JSON

    3. dictionary (a piece of structured data) can be converted into n different possible documents (XML, PDF, paper or otherwise), where n is the number of possible permutations of the elements in the dictionary

      Dictionary

    4. The correct way to express a dictionary in XML is something like this

      Correct dictionary in XML:

      <root>
        <item>
          <key>Name</key>
          <value>John</value>
        </item>
        <item>
          <key>City</key>
          <value>London</value>
        </item>
      </root>
      
    5. Broadly speaking, XML excels at annotating corpuses of text with structure and metadata

      The right use of XML

    6. XML has no notion of numbers (or booleans, or other data types), any numbers represented are just considered more text

      Numbers in XML

    7. To date, the only XML schemas I have seen which I would actually consider a good use of XML are XHTML and DocBook

      Good use of XML

    1. A particular strength of JSON is its support for nested data structures

      JSON can facilitate arrays, such as:

      "favorite_movies": [ "Diehard", "Shrek" ]
      
    2. JSON’s origins as a subset of JavaScript can be seen with how easily it represents key/value object data. XML, on the other hand, optimizes for document tree structures, by cleanly separating node data (attributes) from child data (elements)

      JSON for key/value object data

      XML for document tree structures (clearly separating node data (attributes) from child data (elements))

    3. The advantages of XML over JSON for trees becomes more pronounced when we introduce different node types. Assume we wanted to introduce departments into the org chart above. In XML, we can just use an element with a new tag name
    4. JSON is well-suited for representing lists of objects with complex properties. JSON’s key/value object syntax makes it easy. By contrast, XML’s attribute syntax only works for simple data types. Using child elements to represent complex properties can lead to inconsistencies or unnecessary verbosity.

      JSON works well for lists of objects with complex properties. XML, not so much
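
      A tiny illustrative sketch (the names and values are made up) of such a list serialised with Python's json module; each object's properties can themselves be lists or nested objects without any extra ceremony:

      import json

      people = [
          {"name": "Alice", "favorite_movies": ["Diehard", "Shrek"],
           "address": {"city": "London", "zip": "N1"}},
          {"name": "Bob", "favorite_movies": ["Alien"],
           "address": {"city": "Oslo", "zip": "0150"}},
      ]
      print(json.dumps(people, indent=2))
      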

    5. UI layouts are represented as component trees. And XML is ideal for representing tree structures. It’s a match made in heaven! In fact, the most popular UI frameworks in the world (HTML and Android) use XML syntax to define layouts.

      XML works great for displaying UI layouts

    6. XML may not be ideal to represent generic data structures, but it excels at representing one particular structure: the tree. By separating node data (attributes) from parent/child relationships, the tree structure of the data shines through, and the code to process the data can be quite elegant.

      XML is good for representing tree structured data
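
      A minimal sketch of that elegance, using Python's standard xml.etree.ElementTree on a hypothetical org chart (the element and attribute names are illustrative): node data lives in attributes, parent/child relationships in the nesting, and a short recursive walk processes the whole tree:

      import xml.etree.ElementTree as ET

      xml = """
      <employee name="Ada" title="CTO">
        <employee name="Grace" title="Engineer"/>
        <employee name="Linus" title="Engineer">
          <employee name="Guido" title="Intern"/>
        </employee>
      </employee>
      """

      def walk(node, depth=0):
          # print each node, indented by its depth in the tree
          print("  " * depth + f"{node.attrib['name']} ({node.attrib['title']})")
          for child in node:
              walk(child, depth + 1)

      walk(ET.fromstring(xml))
      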

    1. Here is a high level comparison of the tools we reviewed above:

      Comparison of Delta Lake, Apache Iceberg and Apache Hive:

    2. To address Hadoop’s complications and scaling challenges, Industry is now moving towards a disaggregated architecture, with Storage and Analytics layers very loosely coupled using REST APIs.

      Approaches used to address Hadoop's shortcomings

    3. Hive is now trying to address consistency and usability. It facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage.

      Apache Hive offers:

      • Streaming ingest of data - allowing readers to get a consistent view of data and avoiding too many files
      • Slow changing dimensions - dimensions of table change slowly over time
      • Data restatement - supported via INSERT, UPDATE, and DELETE
      • Bulk updates with SQL MERGE
    4. Delta Lake is an open-source platform that brings ACID transactions to Apache Spark™. Delta Lake is developed by Spark experts, Databricks. It runs on top of your existing storage platform (S3, HDFS, Azure) and is fully compatible with Apache Spark APIs.

      Delta Lake offers (see the sketch after this list):

      • ACID transactions on Spark
      • Scalable metadata handling
      • Streaming and batch unification
      • Schema enforcement
      • Time travel
      • Upserts and deletes
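
      A minimal PySpark sketch of two of those features (ACID writes and time travel), assuming a SparkSession `spark` already configured with the Delta Lake package; the path is hypothetical:

      # Write a tiny DataFrame as a Delta table (works on S3, HDFS, Azure or local paths)
      df = spark.range(0, 5)
      df.write.format("delta").mode("overwrite").save("/tmp/delta/events")

      # Time travel: read the table as it looked at an earlier version
      v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta/events")
      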
    5. Apache Iceberg is an open table format for huge analytic data sets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Iceberg is focussed towards avoiding unpleasant surprises, helping evolve schema and avoid inadvertent data deletion.

      Apache Iceberg offers:

      • Schema evolution (add, drop, update, rename)
      • Hidden partitioning
      • Partition layout evolution
      • Time travel (reproducible queries)
      • Version rollback (resetting tables)
    6. in this disaggregated model, users can choose to use Spark for batch workloads for analytics, while Presto for SQL heavy workloads, with both Spark and Presto using the same backend storage platform.

      Disaggregated model allows more flexible choice of tools

    7. These projects sit between the storage and analytical platforms and offer strong ACID guarantees to the end user while dealing with the object storage platforms in a native manner.

      Solutions to the disaggregated models:

      • Delta Lake
      • Apache Iceberg
      • Apache Hive
    8. rise of Hadoop as the defacto Big Data platform and its subsequent downfall. Initially, HDFS served as the storage layer, and Hive as the analytics layer. When pushed really hard, Hadoop was able to go up to few 100s of TBs, allowed SQL like querying on semi-structured data and was fast enough for its time.

      Hadoop's HDFS and Hive couldn't keep up with ever-larger data sets

    9. Disaggregated model means the storage system sees data as a collection of objects or files. But end users are not interested in the physical arrangement of data, they instead want to see a more logical view of their data.

      File or Tables problem of disaggregated models

    10. ACID stands for Atomicity (an operation either succeeds completely or fails, it does not leave partial data), Consistency (once an application performs an operation the results of that operation are visible to it in every subsequent operation), Isolation (an incomplete operation by one user does not cause unexpected side effects for other users), and Durability (once an operation is complete it will be preserved even in the face of machine or system failure).

      ACID definition

    11. Currently this may be possible using version management of object store, but that as we saw earlier is at a lower layer of physical detail which may not be useful at higher, logical level.

      Change management issue of disaggregated models

    12. Traditionally Data Warehouse tools were used to drive business intelligence from data. Industry then recognized that Data Warehouses limit the potential of intelligence by enforcing schema on write. It was clear that all the dimensions of data-set being collected could not be thought of at the time of data collection.

      Data Warehouses were later replaced with Data Lakes to cope with the volume of big data

    13. As explained above, users are no longer willing to consider inefficiencies of underlying platforms. For example, data lakes are now also expected to be ACID compliant, so that the end user doesn’t have the additional overhead of ensuring data related guarantees.

      SQL Interface issue of disaggregated models

    14. Commonly used Storage platforms are object storage platforms like AWS S3, Azure Blob Storage, GCS, Ceph, MinIO among others. While analytics platforms vary from simple Python & R based notebooks to Tensorflow to Spark, Presto to Splunk, Vertica and others.

      Commonly used storage platforms:

      • AWS S3
      • Azure Blob Storage
      • GCS
      • Ceph
      • MinIO

      Commonly used analytics platforms:

      • Python & R based notebooks
      • TensorFlow
      • Spark
      • Presto
      • Splunk
      • Vertica
    15. Data Lakes that are optimized for unstructured and semi-structured data, can scale to PetaBytes easily and allowed better integration of a wide range of tools to help businesses get the most out of their data.

      Data Lake definition / what it offers us:

      • support for unstructured and semi-structured data.
      • scalability to PetaBytes and higher
      • SQL like interface to interact with the stored data
      • ability to connect various analytics tools as seamlessly as possible
      • modern data lakes are generally a combination of decoupled storage and analytics tools
    1. dplyr in R also lets you use a different syntax for querying SQL databases like Postgres, MySQL and SQLite, which is also in a more logical order
    2. It’s just that it often makes sense to write code in the order JOIN / WHERE / GROUP BY / HAVING. (I’ll often put a WHERE first to improve performance though, and I think most database engines will also do a WHERE first in practice)

      Pandas code is usually written in this order:

      1. JOIN
      2. WHERE
      3. GROUP BY
      4. HAVING

      Example:

      1. df = thing1.join(thing2) # like a JOIN
      2. df = df[df.created_at > 1000] # like a WHERE
      3. df = df.groupby('something').agg(num_yes=('yes', 'sum')) # like a GROUP BY
      4. df = df[df.num_yes > 2] # like a HAVING, filtering on the result of a GROUP BY
      5. df = df[['num_yes', 'something1', 'something']] # pick the columns I want to display, like a SELECT
      6. df = df.sort_values('something', ascending=True) # ORDER BY
      7. df[:30] # LIMIT
    1. We save all of this code, the ui object, the server function, and the call to the shinyApp function, in an R script called app.R

      The same basic structure for all Shiny apps:

      1. ui object.
      2. server function.
      3. call to the shinyApp function.

      ---> examples <---

    2. ui

      UI example of a Shiny app (check the code below)

    3. server

      Server example of a Shiny app (check the code below):

      • random distribution is plotted as a histogram with the requested number of bins
      • code that generates the plot is wrapped in a call to renderPlot
    4. I want to get the selected number of bins from the slider and pass that number into a python method and do some calculation/manipulation (return: “You have selected 30bins and I came from a Python Function”) inside of it then return some value back to my R Shiny dashboard and view that result in a text field.

      Using Python scripts inside R Shiny (in 6 steps):

      1. In ui.R create a text output: textOutput("textOutput") (after plotOutput()).
      2. In server.R create the handler: output$textOutput <- renderText({ }).
      3. Create python_ref.py containing the Python function you want to call (a hypothetical sketch follows after this list).
      4. Import the reticulate library: library(reticulate).
      5. Use source_python() to make the Python script available in R.
      6. Make sure you have these files in your directory:
      • app.R
      • python_ref.py

      and that you've imported the reticulate package into the R environment and sourced the script inside your R code.

      Hit run.
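
      A hypothetical sketch of what python_ref.py could contain (the function name and message are illustrative, not taken from the original post):

      # python_ref.py
      def python_text(bins):
          """Receive the selected number of bins from Shiny and return a message."""
          return f"You have selected {int(bins)} bins and I came from a Python Function"
      

      In server.R it would then be sourced with reticulate::source_python("python_ref.py") and called inside renderText() with the slider's input value.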

    5. Currently Shiny is far more mature than Dash. Dash doesn’t have a proper layout tool yet, and also not build in theme, so if you are not familiar with Html and CSS, your application will not look good (You must have some level of web development knowledge). Also, developing new components will need ReactJS knowledge, which has a steep learning curve.

      Shiny > Dash:

      • Dash isn't as mature yet
      • Shiny has many more layout options, whereas in Dash you need to use HTML and CSS
      • developing new components in Dash requires ReactJS knowledge (not so easy)
    6. You can host standalone apps on a webpage or embed them in R Markdown documents or build dashboards. You can also extend your Shiny apps with CSS themes, Html widgets, and JavaScript actions.

      Typical tools used for working with Shiny

    7. You can either create one R file named app.R and create two separate components (ui and server) inside that file, or create two R files named ui.R and server.R

    1. Vaex supports Just-In-Time compilation via Numba (using LLVM) or Pythran (acceleration via C++), giving better performance. If you happen to have a NVIDIA graphics card, you can use CUDA via the jit_cuda method to get even faster performance.

      Tools supported by Vaex

    2. virtual columns. These columns just house the mathematical expressions, and are evaluated only when required

      Virtual columns

    3. displaying a Vaex DataFrame or column requires only the first and last 5 rows to be read from disk

      Vaex tries to go over the entire dataset with as few passes as possible

    4. Why is it so fast? When you open a memory mapped file with Vaex, there is actually no data reading going on. Vaex only reads the file metadata

      Vaex only reads the file metadata:

      • location of the data on disk
      • data structure (number of rows, columns...)
      • file description
      • and so on...
    5. When filtering a Vaex DataFrame no copies of the data are made. Instead only a reference to the original object is created, on which a binary mask is applied

      Filtering Vaex DataFrame works on reference to the original data, saving lots of RAM
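
      A minimal sketch of both ideas (lazy filtering and virtual columns), using Vaex's small built-in example dataset; the column names come from that dataset, not from the article:

      import vaex

      df = vaex.example()

      # Filtering copies nothing: it only keeps a boolean mask over the original data
      selected = df[df.x > 0]

      # A virtual column stores the expression and is evaluated lazily, when needed
      df['r'] = (df.x**2 + df.y**2)**0.5
      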

    6. If you are interested in exploring the dataset used in this article, it can be used straight from S3 with Vaex. See the full Jupyter notebook to find out how to do this.

      Example of EDA in Vaex ---> Jupyter Notebook

    7. Vaex is an open-source DataFrame library which enables the visualisation, exploration, analysis and even machine learning on tabular datasets that are as large as your hard-drive. To do this, Vaex employs concepts such as memory mapping, efficient out-of-core algorithms and lazy evaluations.

      Vaex - a library to manage datasets as large as your HDD, thanks to:

      • memory mapping
      • efficient out-of-core algorithms
      • lazy evaluations.

      All wrapped in a Pandas-like API

    8. The first step is to convert the data into a memory mappable file format, such as Apache Arrow, Apache Parquet, or HDF5

      Before opening data with Vaex, we need to convert it into a memory-mappable file format (e.g. Apache Arrow, Apache Parquet or HDF5). This way, 100 GB of data can be loaded in Vaex in 0.052 seconds!

      Example of converting CSV ---> HDF5.
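
      A minimal sketch of that conversion, with hypothetical file names (convert=True writes an HDF5 file next to the CSV and streams the CSV in chunks, so it never has to fit in RAM):

      import vaex

      # One-off conversion: produces big_file.csv.hdf5 alongside the CSV
      df = vaex.from_csv('big_file.csv', convert=True, chunk_size=5_000_000)

      # Later sessions just memory-map the converted file, which is near-instant
      df = vaex.open('big_file.csv.hdf5')
      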

    9. The describe method nicely illustrates the power and efficiency of Vaex: all of these statistics were computed in under 3 minutes on my MacBook Pro (15", 2018, 2.6GHz Intel Core i7, 32GB RAM). Other libraries or methods would require either distributed computing or a cloud instance with over 100GB to perform the same computations.

      Possibilities of Vaex

    10. AWS offers instances with Terabytes of RAM. In this case you still have to manage cloud data buckets, wait for data transfer from bucket to instance every time the instance starts, handle compliance issues that come with putting data on the cloud, and deal with all the inconvenience that come with working on a remote machine. Not to mention the costs, which although start low, tend to pile up as time goes on.

      AWS as a solution for analysing data too big for RAM (like the 30-50 GB range). Even then, it comes with drawbacks:

      • managing cloud data buckets
      • waiting for data transfer from bucket to instance every time the instance starts
      • handling compliance issues that come with putting data on the cloud
      • dealing with remote machines
      • costs
    1. git config --global alias.s status

      Replace git status with git s:

      git config --global alias.s status
      

      It will modify config in .gitconfig file.

      Other set of useful aliases:

      [alias]
        s = status
        d = diff
        co = checkout
        br = branch
        last = log -1 HEAD
        cane = commit --amend --no-edit
        lo = log --oneline -n 10
        pr = pull --rebase
      

      You can apply them (^) with:

      git config --global alias.s status
      git config --global alias.d diff
      git config --global alias.co checkout
      git config --global alias.br branch
      git config --global alias.last "log -1 HEAD"
      git config --global alias.cane "commit --amend --no-edit"
      git config --global alias.pr "pull --rebase"
      git config --global alias.lo "log --oneline -n 10"
      
    2. alias g=git
      alias g=git
      

      This command will let you type g s in your shell to check git status

    1. The best commit messages I’ve seen don’t just explain what they’ve changed: they explain why

      Proper commits:

      • explains the reason for the change
      • is searchable (contains the error message)
      • tells a story (explains investigation process)
      • makes everyone a little smarter
      • builds compassion and trust (adds an extra bit of human context)
    1. If you use practices like pair or mob programming, don't forget to add your coworkers names in your commit messages

      It's good to give a shout-out to developers who collaborated on the commit. For example:

      $ git commit -m "Refactor usability tests.
      >
      >
      Co-authored-by: name <name@example.com>
      Co-authored-by: another-name <another-name@example.com>"
      
    2. I'm fond of the gitmoji commit convention. It relies on categorizing commits using emojis. I'm a visual person so it fits well for me, but I understand this convention is not made for everyone.

      You can add gitmojis (emojis) in your commits, such as:

      :recycle: Make core independent from the git client (#171)
      :whale: Upgrade Docker image version (#167)
      

      which will render on GitHub/GitLab as:

      ♻️ Make core independent from the git client (#171)
      🐳 Upgrade Docker image version (#167)
      
    3. Separate subject from body with a blank line.
       Limit the subject line to 50 characters.
       Capitalize the subject line.
       Do not end the subject line with a period.
       Use the imperative mood in the subject line.
       Wrap the body at 72 characters.
       Use the body to explain what and why vs. how.

      7 rules of good commit messages.

      >more info<

    1. Don’t commit directly to the master or development branches.
       Don’t hold up work by not committing local branch changes to remote branches.
       Never commit application secrets in public repositories.
       Don’t commit large files in the repository. This will increase the size of the repository. Use Git LFS for large files. Learn more about what Git LFS is and how to utilize it in this advanced Learning Git with GitKraken tutorial.
       Don’t create one pull request addressing multiple issues.
       Don’t work on multiple issues in the same branch. If a feature is dropped, it will be difficult to revert changes.
       Don’t reset a branch without committing/stashing your changes. If you do so, your changes will be lost.
       Don’t do a force push until you’re extremely comfortable performing this action.
       Don’t modify or delete public history.

      Git Don'ts

    2. Create a Git repository for every new project. Learn more about what a Git repo is in this beginner Learning Git with GitKraken tutorial.
       Always create a new branch for every new feature and bug.
       Regularly commit and push changes to the remote branch to avoid loss of work.
       Include a gitignore file in your project to avoid unwanted files being committed.
       Always commit changes with a concise and useful commit message.
       Utilize git-submodule for large projects.
       Keep your branch up to date with development branches.
       Follow a workflow like Gitflow. There are many workflows available, so choose the one that best suits your needs.
       Always create a pull request for merging changes from one branch to another. Learn more about what a pull request is and how to create them in this intermediate Learning Git with GitKraken tutorial.
       Always create one pull request addressing one issue.
       Always review your code once by yourself before creating a pull request.
       Have more than one person review a pull request. It’s not necessary, but is a best practice.
       Enforce standards by using pull request templates and adding continuous integrations. Learn more about enhancing the pull request process with templates.
       Merge changes from the release branch to master after each release.
       Tag the master sources after every release.
       Delete branches if a feature or bug fix is merged to its intended branches and the branch is no longer required.
       Automate general workflow checks using Git hooks. Learn more about how to trigger Git hooks in this intermediate Learning Git with GitKraken tutorial.
       Include read/write permission access control to repositories to prevent unauthorized access.
       Add protection for special branches like master and development to safeguard against accidental deletion.

      Git Dos

    1. To add the .gitattributes to the repo first you need to create a file called .gitattributes into the root folder for the repo.

      With such a content of .gitattributes:

      *.js    eol=lf
      *.jsx   eol=lf
      *.json  eol=lf
      

      the end of line will be the same for everyone

    2. On the Windows machine the default for the line ending is a Carriage Return Line Feed (CRLF), whereas on Linux/MacOS it's a Line Feed (LF).

      That is why you might want to use .gitattributes to prevent such differences.

      On a Windows machine, if the endOfLine property is set to lf:

      {
        "endOfLine": "lf"
      }
      

      On the Windows machine the developer will encounter linting issues from Prettier.

    3. The above commands will now update the files for the repo using the newly defined line ending as specified in the .gitattributes.

      Use these lines to update the current repo files:

      git rm --cached -r .
      git reset --hard
      
    1. I feel great that all of my posts are now safely saved in version control and markdown. It’s a relief for me to know that they’re no longer an HTML mess inside of a MySQL database, but markdown files which are easy to read, write, edit, share, and backup.

      Good feeling of switching to GatsbyJS

    2. However, I realized that a static site generator like Gatsby utilizes the power of code/data splitting, pre-loading, pre-caching, image optimization, and all sorts of performance enhancements that would be difficult or impossible to do with straight HTML.

      Benefits of mixing HTML/CSS with some JavaScript (GatsbyJS):

      • code/data splitting
      • pre-loading
      • pre-caching
      • image optimisation
      • performance enhancements impossible with HTML
    3. I’ll give you the basics of what I did in case you also want to make the switch.

      (check the text below this highlight for a great guide of migrating from WordPress to GatsbyJS)

    4. A few things I really like about Gatsby

      Main benefits of GatsbyJS:

      • No page reloads
      • Image optimisation
      • Pre-fetch resources
      • Bundling and minification
      • Server-side rendered, at build time
      • Articles are saved in beautiful Markdown
      • With Netlify, your site automatically updates when you push to the repo
    5. I had over 100 guides and tutorials to migrate, and in the end I was able to move everything in 10 days, so it was far from the end of the world.

      If you're smart, you can move from WordPress to GatsbyJS in ~ 10 days

    6. There are a lot of static site generators to choose from. Jekyll, Hugo, Next, and Hexo are some of the big ones, and I’ve heard of and looked into some interesting up-and-coming SSGs like Eleventy as well.

      Other static site generators to consider, apart from GatsbyJS:

      • Jekyll
      • Hugo
      • Next
      • Hexo
      • Eleventy

    7. There is a good amount of prerequisite knowledge required to set up a Gatsby site - HTML, CSS, JavaScript, ES6, Node.js development environment, React, and GraphQL are the major ones.

      There's a bit of technologies to be familiar with before setting up a GatsbyJS blog:

      • HTML
      • CSS
      • JavaScript
      • ES6
      • Node.js
      • React
      • GraphQL

      but you can be fine with the Gatsby Getting Started Tutorial

    1. Gatsby is SEO friendly – it is part of the JAMStack after all!

      With Gatsby you don't have to worry about SEO

    2. Gatsby is a React based framework which utilises the powers of Webpack and GraphQL to bundle real React components into static HTML, CSS and JS files. Gatsby can be plugged into and used straight away with any data source you have available, whether that is your own API, Database or CMS backend (Spoiler Alert!).

      Good GatsbyJS explanation in a single paragraph

    1. The combination of WordPress, React, Gatsby and GraphQL is just that - fun

      Intriguing combination of technologies.

      Keep an eye on the post author, who is going to discuss these technologies in upcoming posts

    1. A: Read an article from start to finish. ONLY THEN do you import parts into Anki for remembering B: Incremental Reading: interleaving between reading and remembering

      Two algorithms (A and B) for studying

    2. “I think SM is only good for a small minority of learners. But they will probably value it very much.”

      I totally agree with it

    3. In Anki, you are only doing the remembering part. You are not reading anything new in Anki

      Anki is for remembering

    4. Using either SRS has already given you a huge edge over not using any SRS: No SRS: 70 hours Anki: 10 hours SuperMemo: 6 hours The difference between using any SRS (whether it’s Anki or SM) and not using is huge, but the difference between Anki or SM is not

      It doesn't matter as much which SRS you're using. It's most important to use one of them at least

    5. “Anki is a tool and SuperMemo is a lifestyle.”

      Anki vs SuperMemo

    1. The Cornell Note-taking System

      The Cornell Note-taking System resembles the combination of active learning and spaced repetition, just like Anki

    1. And for the last three years, I've added EVERYTHING to Anki. Bash aliases, IDE Shortcuts, programming APIs, documentation, design patterns, etc. Having done that, I wouldn't recommend adding EVERYTHING

      Put just the relevant information into Anki

    2. Habit: Whenever I search StackOverflow, I'll immediately create a flashcard of my question and the answer(s) into Anki.

      Example habit to make a flashcard

    3. The confidence of knowing that once something is added to Anki it won't be forgotten is intoxicating

      Intoxicating

    4. Kyle had a super hero ability. Photographic memory in API syntax and documentation. I wanted that and I was jealous. My career was stuck and something needed to change. And so I began a dedicated journey into spaced repetition. Every day for three years, I spent one to three hours in spaced repetition

      Spaced repetition as a tool for photographic memory in API syntax and documentation

    1. First up, regular citizens who download copyrighted content from illegal sources will not be criminalized. This means that those who obtain copies of the latest movies from the Internet, for example, will be able to continue doing so without fear of reprisals. Uploading has always been outlawed and that aspect has not changed.

      In Switzerland you will be able to download, but not upload pirate content

    1. The point of nbdev is to bring the key benefits of IDE/editor development into the notebook system, so you can work in notebooks without compromise for the entire lifecycle
    1. 'Directed' means that the edges of the graph only move in one direction, where future edges are dependent on previous ones.

      Meaning of "directed" in Directed Acyclic Graph

    2. Several cryptocurrencies use DAGs rather than blockchain data structures in order to process and validate transactions.

      DAG vs Blockchain:

      • DAG transactions are linked to each other rather than grouped into blocks
      • DAG transactions can be processed simultaneously with others
      • DAGs lessen the bottleneck on transaction throughput; in a blockchain, throughput is limited by how many transactions fit in a single block
    3. graph data structure that uses topological ordering, meaning that the graph flows in only one direction, and it never goes in circles.

      Simple definition of Directed Acyclic Graph (DAG)

    4. 'Acyclic' means that it is impossible to start at one point of the graph and come back to it by following the edges.

      Meaning of "acyclic" in Directed Acyclic Graph
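
      A small Python sketch of both properties, using the standard-library graphlib (Python 3.9+); the graph itself is hypothetical:

      from graphlib import TopologicalSorter, CycleError

      # Directed: each key lists the nodes it depends on, so edges point one way
      dag = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
      print(list(TopologicalSorter(dag).static_order()))  # e.g. ['a', 'b', 'c', 'd']

      # Acyclic: adding an edge from 'a' back to 'd' creates a cycle, so it is no longer a DAG
      try:
          list(TopologicalSorter({**dag, "a": {"d"}}).static_order())
      except CycleError:
          print("cycle detected")
      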

    1. when AUC is 0.5, it means model has no class separation capacity whatsoever.

      If AUC = 0.5

    2. ROC is a probability curve and AUC represents degree or measure of separability. It tells how much model is capable of distinguishing between classes.

      ROC & AUC

    3. In multi-class model, we can plot N number of AUC ROC Curves for N number classes using One vs ALL methodology. So for Example, If you have three classes named X, Y and Z, you will have one ROC for X classified against Y and Z, another ROC for Y classified against X and Z, and a third one of Z classified against Y and X.

      Using AUC ROC curve for multi-class model
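
      A short scikit-learn sketch of both the binary case and the one-vs-rest multi-class case described above; the data is synthetic and the model choice is just for illustration:

      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split

      # Binary: score with the predicted probability of the positive class
      X, y = make_classification(n_samples=500, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
      print(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

      # Multi-class: one ROC AUC per class (one-vs-rest), averaged
      Xm, ym = make_classification(n_samples=500, n_classes=3, n_informative=6, random_state=0)
      clfm = LogisticRegression(max_iter=1000).fit(Xm, ym)
      print(roc_auc_score(ym, clfm.predict_proba(Xm), multi_class='ovr'))
      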

    4. When AUC is approximately 0, model is actually reciprocating the classes. It means, model is predicting negative class as a positive class and vice versa

      If AUC ≈ 0

    5. AUC near to the 1 which means it has good measure of separability.

      If AUC ≈ 1

    1. LR is nothing but the binomial regression with logit link (or probit), one of the numerous GLM cases. As a regression - itself it doesn't classify anything, it models the conditional (to linear predictor) expected value of the Bernoulli/binomially distributed DV.

      Logistic Regression - the ultimate definition (it's not a classification algorithm!)

      It's used for classification only once we set a probability threshold (e.g. 50%).

    1. BOW is often used for Natural Language Processing (NLP) tasks like Text Classification. Its strengths lie in its simplicity: it’s inexpensive to compute, and sometimes simpler is better when positioning or contextual info aren’t relevant

      Usefulness of BOW:

      • simplicity
      • low on computing requirements
    2. Notice that we lose contextual information, e.g. where in the document the word appeared, when we use BOW. It’s like a literal bag-of-words: it only tells you what words occur in the document, not where they occurred

      The analogy behind using bag term in the bag-of-words (BOW) model.

    3. Here’s my preferred way of doing it, which uses Keras’s Tokenizer class

      Keras's Tokenizer Class - Victor's preferred way of implementing BOW in Python

    4. vectors here have length 7 instead of 6 because of the extra 0 element at the beginning. This is an inconsequential detail - Keras reserves index 0 and never assigns it to any word.

      Keras always reserves 0 and never assigns any word to it; therefore, even when we have 6 words, we end up with the length of 7:

      [0. 1. 1. 1. 0. 0. 0.]
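
      A minimal sketch of that approach, with three toy sentences (illustrative, and giving vectors of the same shape as above):

      from tensorflow.keras.preprocessing.text import Tokenizer

      docs = ["the cat sat", "the cat sat in the hat", "the cat with the hat"]

      tokenizer = Tokenizer()
      tokenizer.fit_on_texts(docs)

      # 'binary' marks word presence; index 0 is reserved by Keras and never used,
      # so the vectors have length vocab_size + 1
      bow = tokenizer.texts_to_matrix(docs, mode='binary')
      print(bow.shape)  # (3, 7) for a 6-word vocabulary
      print(bow[0])     # expected: [0. 1. 1. 1. 0. 0. 0.]
      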

    1. process that takes input data through a series of transformation stages, producing data as output

      Data pipeline

    1. Softmax turns arbitrary real values into probabilities

      Softmax function -

      • outputs of the function are in range [0,1] and add up to 1. Hence, they form a probability distribution
      • the calculation involves e (the mathematical constant) and operates on n numbers: $$s(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$ (see the sketch after this list)
      • the bigger the value, the higher its probability
      • lets us answer classification questions with probabilities, which are more useful than simpler answers (e.g. binary yes/no)
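
      A minimal NumPy sketch of the formula above:

      import numpy as np

      def softmax(x):
          x = np.asarray(x, dtype=float)
          # subtract the max for numerical stability; the result is unchanged
          e = np.exp(x - x.max())
          return e / e.sum()

      print(softmax([-1.0, 0.0, 3.0, 5.0]))  # sums to 1; larger inputs get higher probabilities
      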
    1. 1. Logistic regression IS a binomial regression (with logit link), a special case of the Generalized Linear Model. It doesn't classify anything *unless a threshold for the probability is set*. Classification is just its application.
       2. Stepwise regression is by no means a regression. It's a (flawed) method of variable selection.
       3. OLS is a method of estimation (among others: GLS, TLS, (RE)ML, PQL, etc.), NOT a regression.
       4. Ridge, LASSO - it's a method of regularization, NOT a regression.
       5. There are tens of models for the regression analysis. You mention mainly linear and logistic - it's just the GLM! Learn the others too (link in a comment).
       STOP with the "17 types of regression every DS should know". BTW, there're 270+ statistical tests. Not just t, chi2 & Wilcoxon

      5 clarifications to common misconceptions shared over data science cheatsheets on LinkedIn

    1. 400 sized probability sample (a small random sample from the whole population) is often better than a millions sized administrative sample (of the kind you can download from gov sites). The reason is that an arbitrary sample (as opposed to a random one) is very likely to be biased, and, if large enough, a confidence interval (which actually doesn't really make sense except for probability samples) will be so narrow that, because of the bias, it will actually rarely, if ever, include the true value we are trying to estimate. On the other hand, the small, random sample will be very likely to include the true value in its (wider) confidence interval
    1. An exploratory plot is all about you getting to know the data. An explanatory graphic, on the other hand, is about telling a story using that data to a specific audience.

      Exploratory vs Explanatory plot

    1. Here’s a very simple example of how a VQA system might answer the question “what color is the triangle?”
      1. Look for shapes and colours using CNN.
      2. Understand the question type with NLP.
      3. Determine strength for each possible answer.
      4. Convert each answer strength to % probability
    2. Visual Question Answering (VQA): answering open-ended questions about images. VQA is interesting because it requires combining visual and language understanding.

      Visual Question Answering (VQA) = visual + language understanding

    3. Most VQA models would use some kind of Recurrent Neural Network (RNN) to process the question input
      • Most VQA models will use an RNN to process the question input
      • For easier VQA datasets, a BOW vector fed into a standard (feedforward) NN can be enough
    4. The standard approach to performing VQA looks something like this: Process the image. Process the question. Combine features from steps 1/2. Assign probabilities to each possible answer.

      Approach to handle VQA problems:

      animation
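
      A compact Keras sketch of those four steps, roughly in the spirit of the post; the shapes, layer sizes and constants are hypothetical:

      from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Flatten,
                                           Dense, Multiply)
      from tensorflow.keras.models import Model

      VOCAB_SIZE = 100   # hypothetical size of the question BOW vocabulary
      NUM_ANSWERS = 13   # hypothetical number of possible answers

      # 1. Process the image with a small CNN
      img_in = Input(shape=(64, 64, 3))
      x = Conv2D(8, 3, activation='relu')(img_in)
      x = MaxPooling2D()(x)
      x = Conv2D(16, 3, activation='relu')(x)
      x = MaxPooling2D()(x)
      x = Flatten()(x)
      x = Dense(32, activation='tanh')(x)

      # 2. Process the question (here a simple BOW vector) with a feedforward net
      q_in = Input(shape=(VOCAB_SIZE,))
      q = Dense(32, activation='tanh')(q_in)

      # 3. Combine image and question features
      merged = Multiply()([x, q])

      # 4. Assign a probability to each possible answer
      out = Dense(NUM_ANSWERS, activation='softmax')(merged)

      model = Model(inputs=[img_in, q_in], outputs=out)
      model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])
      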

    1. Script mode takes a function/class, reinterprets the Python code and directly outputs the TorchScript IR. This allows it to support arbitrary code, however it essentially needs to reinterpret Python

      Script mode in PyTorch

    2. In 2019, the war for ML frameworks has two remaining main contenders: PyTorch and TensorFlow. My analysis suggests that researchers are abandoning TensorFlow and flocking to PyTorch in droves. Meanwhile in industry, Tensorflow is currently the platform of choice, but that may not be true for long
      • in research: PyTorch > TensorFlow
      • in industry: TensorFlow > PyTorch
    3. Why do researchers love PyTorch?
      • simplicity <--- pythonic like, easily integrates with its ecosystem
      • great API <--- TensorFlow used to switch API many times
      • performance <--- it's not so clear if it's faster than TensorFlow
    4. Researchers care about how fast they can iterate on their research, which is typically on relatively small datasets (datasets that can fit on one machine) and run on <8 GPUs. This is not typically gated heavily by performance considerations, but by their ability to quickly implement new ideas. On the other hand, industry considers performance to be of the utmost priority. While 10% faster runtime means nothing to a researcher, that could directly translate to millions of savings for a company

      Researchers value how fast they can iterate on their research.

      Industry values performance, as it translates directly into money.

    5. From the early academic outputs Caffe and Theano to the massive industry-backed PyTorch and TensorFlow

      It's not easy to track all the ML frameworks

      Caffe, Theano ---> PyTorch, TensorFlow

    6. When you run a PyTorch/TensorFlow model, most of the work isn’t actually being done in the framework itself, but rather by third party kernels. These kernels are often provided by the hardware vendor, and consist of operator libraries that higher-level frameworks can take advantage of. These are things like MKLDNN (for CPU) or cuDNN (for Nvidia GPUs). Higher-level frameworks break their computational graphs into chunks, which can then call these computational libraries. These libraries represent thousands of man hours of effort, and are often optimized for the architecture and application to yield the best performance

      What happens behind when you run ML frameworks

    7. Jax is built by the same people who built the original Autograd, and features both forward- and reverse-mode auto-differentiation. This allows computation of higher order derivatives orders of magnitude faster than what PyTorch/TensorFlow can offer

      Jax

    8. TensorFlow will always have a captive audience within Google/DeepMind, but I wonder whether Google will eventually relax this

      Generally, PyTorch is gaining so much favour that one day it may even replace TensorFlow at Google's offices

    9. Once your PyTorch model is in this IR, we gain all the benefits of graph mode. We can deploy PyTorch models in C++ without a Python dependency , or optimize it.

    10. At their core, PyTorch and Tensorflow are auto-differentiation frameworks

      auto-differentiation = taking the derivative of some function. It can be implemented in many ways, so most ML frameworks choose "reverse-mode auto-differentiation" (known as "backpropagation")

    11. At the API level, TensorFlow eager mode is essentially identical to PyTorch’s eager mode, originally made popular by Chainer. This gives TensorFlow most of the advantages of PyTorch’s eager mode (ease of use, debuggability, and etc.) However, this also gives TensorFlow the same disadvantages. TensorFlow eager models can’t be exported to a non-Python environment, they can’t be optimized, they can’t run on mobile, etc. This puts TensorFlow in the same position as PyTorch, and they resolve it in essentially the same way - you can trace your code (tf.function) or reinterpret the Python code (Autograph).

      Tensorflow Eager

    12. TensorFlow came out years before PyTorch, and industry is slower to adopt new technologies than researchers

      Reason why PyTorch wasn't previously more popular than TensorFlow

    13. The PyTorch JIT is an intermediate representation (IR) for PyTorch called TorchScript. TorchScript is the “graph” representation of PyTorch. You can turn a regular PyTorch model into TorchScript by using either tracing or script mode.

      PyTorch JIT

    14. In 2018, PyTorch was a minority. Now, it is an overwhelming majority, with 69% of CVPR using PyTorch, 75+% of both NAACL and ACL, and 50+% of ICLR and ICML

    15. Tracing takes a function and an input, records the operations that were executed with that input, and constructs the IR. Although straightforward, tracing has its downsides. For example, it can’t capture control flow that didn’t execute. For example, it can’t capture the false block of a conditional if it executed the true block

      Tracing mode in PyTorch
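
      A small sketch contrasting the two modes; the module is hypothetical and chosen to show the control-flow limitation of tracing:

      import torch
      import torch.nn as nn

      class Gate(nn.Module):
          def forward(self, x):
              if x.sum() > 0:   # data-dependent control flow
                  return x * 2
              return x - 1

      m = Gate()
      example = torch.ones(3)

      traced = torch.jit.trace(m, example)   # records only the branch taken for `example`
      scripted = torch.jit.script(m)         # compiles both branches into TorchScript IR

      traced.save("gate_traced.pt")          # deployable from C++ without a Python runtime
      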

    16. On the other hand, industry has a litany of restrictions/requirements

      Industry's requirements (which TensorFlow targets):

      • no Python <--- overhead of the Python runtime might be too much to take
      • mobile <--- Python can't be embedded in the mobile binary
      • serving <--- no-downtime updates of models, switching between models seamlessly, etc.
    17. TensorFlow is still the dominant framework. For example, based on data [2] [3] from 2018 to 2019, TensorFlow had 1541 new job listings vs. 1437 job listings for PyTorch on public job boards, 3230 new TensorFlow Medium articles vs. 1200 PyTorch, 13.7k new GitHub stars for TensorFlow vs 7.2k for PyTorch, etc

      Nowadays, the numbers still play against PyTorch

    18. every major conference in 2019 has had a majority of papers implemented in PyTorch

      Legend:

      • CVPR, ICCV, ECCV - computer vision conferences
      • NAACL, ACL, EMNLP - NLP conferences
      • ICML, ICLR, NeurIPS - general ML conferences

      Interactive version

      PyTorch vs TensorFlow

    19. the transition from TensorFlow 1.0 to 2.0 will be difficult and provides a natural point for companies to evaluate PyTorch

      Chance of faster transition to PyTorch in industry

    1. For the application of machine learning in finance, it’s still very early days. Some of the stuff people have been doing in finance for a long time is simple machine learning, and some people were using neural networks back in the 80s and 90s.   But now we have a lot more data and a lot more computing power, so with our creativity in machine learning research, “We are so much in the beginning that we can’t even picture where we’re going to be 20 years from now”

      We are just in time to apply modern ML techniques to financial industry

    2. ability to learn from data e.g. OpenAI and the Rubik’s Cube and DeepMind with AlphaGo required the equivalent of thousands of years of gameplay to achieve those milestones

      Even while making the perfect algorithm, we have to expect long hours of learning

    3. Pedro’s book “The Master Algorithm” takes readers on a journey through the five dominant paradigms of machine learning research on a quest for the master  algorithm. Along the way, Pedro wanted to abstract away from the mechanics so that a broad audience, from the CXO to the consumer, can understand how machine learning is shaping our lives

      "The Master Algorithm" book seems to be too abstract in such a case; however, it covers the following 5 paradigms:

      • Rule based learning (Decision trees, Random Forests, etc)
      • Connectivism (neural networks, etc)
      • Bayesian (Naive Bayes, Bayesian Networks, Probabilistic Graphical Models)
      • Analogy (KNN & SVMs)
      • Unsupervised Learning (Clustering, dimensionality reduction, etc)
    4. We’ve always lived in a world which we didn’t completely understand but now we’re living in a world designed by us – for Pedro, that’s actually an improvement

      We never really understood the surroundings, but now we have a great impact to modify it

    5. But at the end of the day, what we know about neuroscience today is not enough to determine what we do in AI, it’s only enough to give us ideas.  In fact it’s a two way street – AI can help us to learn how the brain works and this loop between the two disciplines is a very important one and is growing very rapidly

      Neuroscience can help us understand AI, and vice versa

    6. Pedro believes that success will come from unifying the different major types of learning and their master algorithms –not just combining, but unifying them such that “it feels like using one thing”

      Interesting point of view on designing the master algorithm

    7. if you look at the number  of connections that the state of the art machine learning systems for some of these problems have, they’re more than many animals – they have many hundreds of millions or billions of connections

      State of the art ML systems are composed of millions or billions of connections (close to humans)

    8. There was this period of a couple of 100 years where we understood our technology.  Now we just have to learn live in a world where we don’t understand the machines that work  for us, we just have to be confident they are working for us and doing their best

      Should we just accept the fact that machines will rule the world with a mysterious intelligence?

    1. team began its analysis on YouTube 8M, a publicly available dataset of YouTube videos

      YouTube 8M - a public dataset of YouTube videos. With it, we can analyse video features like:

      • color
      • illumination
      • many types of faces
      • thousands of objects
      • several landscapes
    2. The trailer release for a new movie is a highly anticipated event that can help predict future success, so it behooves the business to ensure the trailer is hitting the right notes with moviegoers. To achieve this goal, the 20th Century Fox data science team partnered with Google’s Advanced Solutions Lab to create Merlin Video, a computer vision tool that learns dense representations of movie trailers to help predict a specific trailer’s future moviegoing audience

      Merlin Video - computer vision tool to help predict a specific trailer's moviegoing audience

    3. pipeline also includes a distance-based “collaborative filtering” (CF) model and a logistic regression layer that combines all the model outputs together to produce the movie attendance probability

      other elements of pipeline:

      • collaborative filtering (CF) model
      • logistic regression layer
    4. Merlin returns the following labels: facial_hair, beard, screenshot, chin, human, film

      Types of features Merlin Video can generate from a single trailer frame.

      Final result of feature collecting and ordering:

    5. The obvious choice was Cloud Machine Learning Engine (Cloud ML Engine), in conjunction with the TensorFlow deep learning framework

      Merlin Video is powered by:

      • Cloud Machine Learning Engine - automating infrastructure (resources, provisioning and monitoring)
      • TensorFlow
      • Cloud Dataflow and Data Studio - Dataflow generates reports in Data Studio
      • BigQuery and BigQueryML - used in a final step to merge Merlin’s millions of customer predictions with other data sources to create useful reports and to quickly prototype media plans for marketing campaigns
    6. custom model learns the temporal sequencing of labels in the movie trailer

      Temporal sequencing - times of different shots (e.g. long or short).

      Temporal sequencing can convey information on:

      • movie type
      • movie plot
      • roles of the main characters
      • filmmakers' cinematographic choices.

      When combined with historical customer data, sequencing analysis can be used to create predictions of customer behavior.

      arxiv paper on Merlin Video

    7. The elasticity of Cloud ML Engine allowed the data science team to iterate and test quickly, without compromising the integrity of the deep learning model

      Cloud ML Engine reduced the deployment time from months to days

    8. Architecture flow diagram for Merlin

    9. The first challenge is the temporal position of the labels in the trailer: it matters when the labels occur in the trailer. The second challenge is the high dimensionality of this data

      2 challenges in labelling video clips: the temporal position of labels and the high dimensionality of the data

    10. When it comes to movies, analyzing text taken from a script is limiting because it only provides a skeleton of the story, without any of the additional dynamism that can entice an audience to see a movie

      Analysing movie script isn't enough to predict the overall movie's attractiveness to the audience

    11. 20th Century Fox has been using this tool since the release of The Greatest Showman in 2017, and continues to use it to inform their latest releases

      The Merlin Video tool is used nowadays by 20th Century Fox

    12. model is trained end-to-end, and the loss of the logistic regression is back-propagated to all the trainable components (weights). Merlin’s data pipeline is refreshed weekly to account for new trailer releases

      Way the model is trained and located in the pipeline

    13. After a movie’s release, we are able to process the data on which movies were previously seen by that audience. The table below shows the top 20 actual moviegoer audiences (Comp ACT) compared to the top 20 predicted audiences (Comp PRED)

      Way of validating the Merlin model

    1. It’s possible to check whether a variable refers to it with the comparison operators == and !=

      Checking against None with == and !=

      >>> x, y = 2, None
      >>> x == None
      False
      >>> y == None
      True
      >>> x != None
      True
      >>> y != None
      False
      

      More Pythonic way by using is and is not:

      >>> x is None
      False
      >>> y is None
      True
      >>> x is not None
      True
      >>> y is not None
      False
      
    2. Python allows defining getter and setter methods similarly as C++ and Java

      Getters and Setters in Python:

      >>> class C:
      ...     def get_x(self):
      ...         return self.__x
      ...     def set_x(self, value):
      ...         self.__x = value
      

      Get and set the state of the object:

      >>> c = C()
      >>> c.set_x(2)
      >>> c.get_x()
      2
      
    3. In almost all cases, you can use the range to get an iterator that yields integers

      Iterating over Sequences and Mappings

      >>> x = [1, 2, 4, 8, 16]
      >>> for i in range(len(x)):
      ...     print(x[i])
      ... 
      1
      2
      4
      8
      16
      

      better way of iterating over a sequence:

      >>> for item in x:
      ...     print(item)
      ... 
      1
      2
      4
      8
      16
      
    4. Sometimes you need both the items from a sequence and the corresponding indices

      Iterating with indices:

      >>> for i in range(len(x)):
      ...     print(i, x[i])
      ... 
      0 1
      1 2
      2 4
      3 8
      4 16
      

      Better way by using enumerate:

      >>> for i, item in enumerate(x):
      ...     print(i, item)
      ... 
      0 1
      1 2
      2 4
      3 8
      4 16
      
    5. But what if you want to iterate in the reversed order? Of course, the range is an option again

      Iterating over a reversed order:

      >>> for i in range(len(x)-1, -1, -1):
      ...     print(x[i])
      ... 
      16
      8
      4
      2
      1
      

      More elegant way:

      >>> for item in x[::-1]:
      ...     print(item)
      ... 
      16
      8
      4
      2
      1
      

      Pythonic way of reversing an order:

      >>> for item in reversed(x):
      ...     print(item)
      ... 
      16
      8
      4
      2
      1
      
    6. it’s often more elegant to define and use properties, especially in simple cases

      Defining some properties (considered to be more Pythonic):

      >>> class C:
      ...     @property
      ...     def x(self):
      ...         return self.__x
      ...     @x.setter
      ...     def x(self, value):
      ...         self.__x = value
      

      Result:

      >>> c = C()
      >>> c.x = 2
      >>> c.x
      2
      
    7. Python has a very flexible system of providing arguments to functions and methods. Optional arguments are a part of this offer. But be careful: you usually don’t want to use mutable optional arguments

      Avoiding mutable optional arguments:

      >>> def f(value, seq=[]):
      ...     seq.append(value)
      ...     return seq
      

      If you don't provide seq, f() appends a value to an empty list and returns something like [value]:

      >>> f(value=2)
      [2]
      

      Don't be fooled. This option isn't fine...

      >>> f(value=4)
      [2, 4]
      >>> f(value=8)
      [2, 4, 8]
      >>> f(value=16)
      [2, 4, 8, 16]
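
      The usual remedy (a standard Python idiom) is to default to None and create a fresh list inside the function:

      >>> def f(value, seq=None):
      ...     if seq is None:
      ...         seq = []
      ...     seq.append(value)
      ...     return seq
      ... 
      >>> f(value=2)
      [2]
      >>> f(value=4)  # no leftover state from the previous call
      [4]
      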
      
    8. Iterating over a dictionary yields its keys

      Iterating over a dictionary:

      >>> z = {'a': 0, 'b': 1}
      >>> for k in z:
      ...     print(k, z[k])
      ... 
      a 0
      b 1
      

      Applying method .items():

      >>> for k, v in z.items():
      ...     print(k, v)
      ... 
      a 0
      b 1
      

      You can also use the methods .keys() and .values()

    9. following the rules called The Zen of Python or PEP 20

      The Zen of Python or PEP 20 <--- rules followed by Python
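
      You can print these principles straight from the interpreter:

      >>> import this
      The Zen of Python, by Tim Peters

      Beautiful is better than ugly.
      Explicit is better than implicit.
      [output truncated]
      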

    10. You can use unpacking to assign values to your variables

      Unpacking <--- assign values

      >>> a, b = 2, 'my-string'
      >>> a
      2
      >>> b
      'my-string'
      
    11. what if you want to iterate over two or more sequences? Of course, you can use the range again

      Iterating over two or more sequences:

      >>> y = 'abcde'
      >>> for i in range(len(x)):
      ...     print(x[i], y[i])
      ... 
      1 a
      2 b
      4 c
      8 d
      16 e
      

      Better solution by applying zip:

      >>> for item in zip(x, y):
      ...     print(item)
      ... 
      (1, 'a')
      (2, 'b')
      (4, 'c')
      (8, 'd')
      (16, 'e')
      

      Combining it with unpacking:

      >>> for x_item, y_item in zip(x, y):
      ...     print(x_item, y_item)
      ... 
      1 a
      2 b
      4 c
      8 d
      16 e
      
    12. None is a special and unique object in Python. It has a similar purpose, like null in C-like languages

      None (Python) ==similar== Null (C)

    13. Python allows you to chain the comparison operations. So, you don’t have to use and to check if two or more comparisons are True

      Chaining <--- checking if two or more comparisons are True

      >>> x = 4
      >>> x >= 2 and x <= 8
      True
      

      More compact (mathematical) form:

      >>> 2 <= x <= 8
      True
      >>> 2 <= x <= 3
      False
      
    14. Unpacking can be used for the assignment to multiple variables in more complex cases

      Unpacking <--- assign even more variables

      >>> x = (1, 2, 4, 8, 16)
      >>> a = x[0]
      >>> b = x[1]
      >>> c = x[2]
      >>> d = x[3]
      >>> e = x[4]
      >>> a, b, c, d, e
      (1, 2, 4, 8, 16)
      

      A more readable approach:

      >>> a, b, c, d, e = x
      >>> a, b, c, d, e
      (1, 2, 4, 8, 16)
      

      Even cooler (* collects the values not assigned to other names):

      >>> a, *y, e = x
      >>> a, e, y
      (1, 16, [2, 4, 8])
      
    15. the most concise and elegant variables swap

      Unpacking <--- swap values

      >>> a, b = b, a
      >>> a
      'my-string'
      >>> b
      2
      
    16. Python doesn’t have real private class members. However, there’s a convention that says that you shouldn’t access or modify the members beginning with the underscore (_) outside their instances. They are not guaranteed to preserve the existing behavior.

      Avoiding access to protected class members. Consider the code:

      >>> class C:
      ...     def __init__(self, *args):
      ...         self.x, self._y, self.__z = args
      ... 
      >>> c = C(1, 2, 4)
      

      The instances of class C have three data members: .x, ._y, and ._C__z (because __z begins with two underscores, unlike _y). If a member's name begins with a double underscore (dunder), it gets mangled, that is, modified. That's why you have ._C__z instead of .__z.

      Now, it's quite OK to access/modify .x directly:

      >>> c.x  # OK
      1
      

      You can also access ._y from outside its instance, but it's considered bad practice:

      >>> c._y  # Possible, but a bad practice!
      2
      

      You can't access .__z because it's mangled, but you can access or modify ._C__z:

      >>> c.__z # Error!
      Traceback (most recent call last):
      File "", line 1, in 
      AttributeError: 'C' object has no attribute '__z'
      >>> c._C__z # Possible, but even worse!
      4
      >>>
      
    17. what if an exception occurs while processing your file? Then my_file.close() is never executed. You can handle this with exception-handling syntax or with context managers

      A `with` block to handle exceptions:

      >>> with open('filename.csv', 'w') as my_file:
      ...     # do something with `my_file`
      

      Using the with block means that the special methods .__enter__() and .__exit__() are called, even in the case of exceptions

    18. Python code should be elegant, concise, and readable. It should be beautiful. The ultimate resource on how to write beautiful Python code is Style Guide for Python Code or PEP 8

      Write beautiful Python code with

      • Style Guide for Python Code
      • PEP 8

      PEP 8 provides the style guide for Python code, and PEP 20 represents the principles of the Python language

    19. Python also supports chained assignments. So, if you want to assign the same value to multiple variables, you can do it in a straightforward way

      Chained assignments <--- assign the same value to multiple variables:

      >>> x = 2
      >>> y = 2
      >>> z = 2
      

      More elegant way:

      >>> x, y, z = 2, 2, 2
      

      Chained assignments:

      >>> x = y = z = 2
      >>> x, y, z
      (2, 2, 2)
      
    20. open a file and process it

      Open a file:

      >>> my_file = open('filename.csv', 'w')
      >>> # do something with `my_file`
      

      Close the file to properly free up the resources it uses:

      >>> my_file = open('filename.csv', 'w')
      >>> # do something with `my_file`
      >>> my_file.close()
      
    21. The author of the class probably begins the names with the underscore(s) to tell you, “don’t use it”

      __ <--- don't touch!

    22. You can keep away from that with some additional logic. One of the ways is this:

      You can avoid the problem. Instead of:

      >>> def f(value, seq=[]):
      ...     seq.append(value)
      ...     return seq
      

      use some additional logic:

      >>> def f(value, seq=None):
      ...     if seq is None:
      ...         seq = []
      ...     seq.append(value)
      ...     return seq
      

      Shorter version (note that `not seq` also replaces an explicitly passed empty list, unlike the `is None` check):

      >>> def f(value, seq=None):
      ...     if not seq:
      ...         seq = []
      ...     seq.append(value)
      ...     return seq
      

      The result:

      >>> f(value=2)
      [2]
      >>> f(value=4)
      [4]
      >>> f(value=8)
      [8]
      >>> f(value=16)
      [16]
      
    1. The reason that Julia is fast (ten to 30 times faster than Python) is because it is compiled and not interpreted

      Julia seems to be even faster than Scala when both are compared against Python

    2. Scala is ten times faster than Python

      Interesting estimation

    1. First of all, write a script that carries out the task in a sequential fashion. Secondly, transform the script so that it carries out the task using the map function. Lastly, replace map with a neat function from the concurrent.futures module

      Concurrent Python programs in 3 steps (see the sketch after this list):

      1. Write a script that carries out the task in a sequential fashion.
      2. Transform the script so that it carries out the task using the map function.
      3. Replace map with a neat function from the concurrent.futures module.
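
      A minimal sketch of the three steps with a made-up, trivial task (the `fetch` function and the URLs are illustrative, not taken from the article):

      from concurrent.futures import ThreadPoolExecutor
      
      def fetch(url):
          # placeholder for the real work, e.g. downloading `url`
          return len(url)
      
      urls = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']
      
      # 1. Sequential version
      results = [fetch(u) for u in urls]
      
      # 2. The same task expressed with map
      results = list(map(fetch, urls))
      
      # 3. The same call shape, but concurrent: executor.map is a drop-in replacement for map
      with ThreadPoolExecutor(max_workers=3) as executor:
          results = list(executor.map(fetch, urls))
      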
    2. Python standard library makes it fairly easy to create threads and processes

      Fortunately, there is a workaround for concurrent programming in Python

    3. Python is a poor choice for concurrent programming. A principal reason for this is the ‘Global Interpreter Lock’ or GIL. The GIL ensures that only one thread accesses Python objects at a time, effectively preventing Python from being able to distribute threads onto several CPUs by default

      Python isn't the best choice for concurrent programming

    4. Introducing multiprocessing now is a cinch; I just replace ThreadPoolExecutor with ProcessPoolExecutor in the previous listing

      Replacing multithreading with multiprocessing:

      replace ThreadPoolExecutor with ProcessPoolExecutor
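
      A self-contained sketch of that swap (the `square` function is illustrative). With processes, the function must live at module level so it can be pickled, and the entry point should be guarded on platforms that spawn processes:

      from concurrent.futures import ProcessPoolExecutor
      
      def square(n):
          return n * n
      
      if __name__ == '__main__':
          with ProcessPoolExecutor(max_workers=3) as executor:
              print(list(executor.map(square, [1, 2, 4, 8, 16])))
      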

    5. this article merely scratches the surface. If you want to dig deeper into concurrency in Python, there is an excellent talk titled Thinking about Concurrency by Raymond Hettinger on the subject. Make sure to also check out the slides whilst you’re at it

      Learn more about concurrency:

    1. its purpose is to dump Python tracebacks explicitly on a fault, after a timeout, or on a user signal

      faulthandler, in contrast to the trace module, tracks specific events and has slightly better documentation
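
      A minimal sketch of switching it on (standard library; the 5-second timeout is an arbitrary choice for illustration):

      import faulthandler
      
      # Dump the Python tracebacks of all threads on a fault (e.g. a segfault)
      faulthandler.enable()
      
      # Also dump a traceback if the program is still running after 5 seconds
      # (exit=False keeps the program running after the dump)
      faulthandler.dump_traceback_later(5, exit=False)
      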

    2. what parts of the software do we profile (measure its performance metrics)

      Most profiled parts of the software:

      • Method or function (most common)
      • Lines (similar to method profiling, but doing it line by line)
      • Memory (memory usage)
    3. Line profiling, as the name suggests, means to profile your Python code line by line

      Line profiling

    4. The profile module gives similar results with similar commands. Typically, you switch to profile if cProfile isn’t available

      cProfile > profile. Use profile only when cProfile isn't available

    5. Another common component to profile is the memory usage. The purpose is to find memory leaks and optimize the memory usage in your Python programs

      Memory usage can be tracked with pympler or objgraph libraries
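
      A minimal sketch with pympler, assuming the third-party package is installed; asizeof reports the size of an object including everything it references:

      from pympler import asizeof
      
      data = [list(range(100)) for _ in range(100)]
      
      # Total size in bytes of `data` plus all the objects it refers to
      print(asizeof.asizeof(data))
      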

    6. The purpose of trace module is to “monitor which statements and functions are executed as a program runs to produce coverage and call-graph information

      Purpose of trace module
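
      A minimal sketch of the trace module producing line-execution counts (the traced expression is arbitrary here):

      import trace
      
      # count=True records how often each line runs; trace=False suppresses live line printing
      tracer = trace.Trace(count=True, trace=False)
      tracer.run('sum(i * i for i in range(1000))')
      
      # Write the coverage counts to .cover files in the current directory
      results = tracer.results()
      results.write_results(show_missing=True, coverdir='.')
      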

    7. method profiling tool like cProfile (which is available in the Python language), the timing metrics for methods can show you statistics, such as the number of calls (shown as ncalls), total time spent in the function (tottime), time per call (tottime/ncalls and shown as percall), cumulative time spent in a function (cumtime), and cumulative time per call (quotient of cumtime over the number of primitive calls and shown as percall after cumtime)

      cProfile is one of the Python tools for measuring method execution time. Specifically, it reports (see the minimal example after this list):

      • number of calls (shown as ncalls)
      • total time spent in the function (tottime)
      • time per call (tottime/ncalls and shown as percall)
      • cumulative time spent in a function (cumtime)
      • cumulative time per call (quotient of cumtime over the number of primitive calls and shown as percall after cumtime)
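
      A minimal, self-contained example (the profiled function is made up for illustration); cProfile.run prints exactly these columns:

      import cProfile
      
      def busy():
          return sum(i * i for i in range(100_000))
      
      # Prints ncalls, tottime, percall, cumtime, percall for each function involved
      cProfile.run('busy()')
      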
    8. tracing is a special use case of logging in order to record information about a program’s execution

      Tracing (more for software devs) is very similar to event logging (more for system administrators)

    9. If a method has an acceptable speed but is so frequently called that it becomes a huge time sink, you would want to know this from your profiler

      We also want to measure the frequency of method calls. cProfile can highlight the number of function calls and how many of those are native calls

    10. trace and faulthandler modules cover basic tracing

      Basic Python libraries for tracing

    1. # Creating a 5x5 matrix arr = [[i for i in range(5)] for j in range(5)] arr >>> [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

      Nested for loop using list comprehension to come up with 5x5 matrix:

      arr = [[i for i in range(5)] for j in range(5)]
      arr
      >>> [[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]]
      
    2. x = [2,45,21,45] y = {i:v for i,v in enumerate(x)} print(y) >>> {0: 2, 1: 45, 2: 21, 3: 45}

      List comprehension in Python to create a simple dictionary:

      x = [2,45,21,45]
      y = {i:v for i,v in enumerate(x)}
      print(y)
      >>> {0: 2, 1: 45, 2: 21, 3: 45}
      
    3. x = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]] arr = [i for j in x for i in j] print(arr) >>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

      Flattening a multi-dimensional matrix into a 1-D array using list comprehension:

      x = [[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]
      
      arr = [i for j in x for i in j]
      print(arr)
      >>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
      
    4. arr = [i for i in range(10) if i % 2 == 0] print(arr) >>> [0, 2, 4, 6, 8] arr = ["Even" if i % 2 == 0 else "Odd" for i in range(10)] print(arr) >>> ['Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd']

      2 examples of conditional statements in list comprehension:

      arr = [i for i in range(10) if i % 2 == 0]
      print(arr)
      >>> [0, 2, 4, 6, 8]
      

      and:

      arr = ["Even" if i % 2 == 0 else "Odd" for i in range(10)]
      print(arr)
      >>> ['Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd']
      
    1. use pyenv. With it, you will be able to have any version you want at your disposal, very easy.

      pyenv allows you to easily switch between Python versions

    2. the __ methods allow us to interact with core concepts of the python language. You can see them also as a mechanism of implementing behaviours, interface methods.

      __ (dunder) methods <--- hooks into core language behaviour; see the sketch below
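
      An illustrative sketch (class name and data invented here) of how dunder methods plug a class into built-in behaviour such as len() and indexing:

      class Deck:
          """A tiny container wired into core language behaviour via dunder methods."""
      
          def __init__(self, cards):
              self._cards = list(cards)
      
          def __len__(self):             # enables len(deck)
              return len(self._cards)
      
          def __getitem__(self, index):  # enables deck[0], slicing, and plain iteration
              return self._cards[index]
      
      deck = Deck(['A', 'K', 'Q', 'J'])
      print(len(deck))    # 4
      print(deck[0])      # A
      print(list(deck))   # ['A', 'K', 'Q', 'J']
      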