python.testing.unittestArgs: Looks for any Python (.py) file with "test" in the name in the top-level project folder.
Default behaviour of test discovery by unittest framework
Create a file named test_unittest.py that contains a test class with two test methods
Sample test file using unittest framework. inc_dec is the file that's being tested:
import inc_dec    # The code to test
import unittest   # The test framework

class Test_TestIncrementDecrement(unittest.TestCase):
    def test_increment(self):
        self.assertEqual(inc_dec.increment(3), 4)  # checks if the result is 4 when x = 3

    def test_decrement(self):
        self.assertEqual(inc_dec.decrement(3), 4)

if __name__ == '__main__':
    unittest.main()
Each test framework has its own conventions for naming test files and structuring the tests within, as described in the following sections. Each case includes two test methods, one of which is intentionally set to fail for the purposes of demonstration.
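For comparison, a minimal sketch of the same two tests written with pytest conventions (plain asserts in test_* functions, assuming the same inc_dec module; the second test is the intentionally failing one):
import inc_dec  # The code to test

def test_increment():
    assert inc_dec.increment(3) == 4

def test_decrement():
    assert inc_dec.decrement(3) == 4  # intentionally fails, as in the unittest version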
Nose2, the successor to Nose, is just unittest with plugins
Nose2 testing
Python tests are Python classes that reside in separate files from the code being tested.
For general background on unit testing, see Unit Testing on Wikipedia. For a variety of useful unit test examples, see https://github.com/gwtw/py-sorting
Running the unit test early and often means that you quickly catch regressions, which are unexpected changes in the behavior of code that previously passed all its unit tests.
Regressions
Developers typically run unit tests even before committing code to a repository; gated check-in systems can also run unit tests before merging a commit.
When to run unit tests:
in unit testing you avoid external dependencies and use mock data or otherwise simulated inputs
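A minimal sketch of that idea using the standard library's unittest.mock (the function and endpoint names are hypothetical):
from unittest import mock

def get_username(user_id, http_get):
    # Hypothetical function under test; http_get would normally hit the network
    return http_get(f"/users/{user_id}")["name"]

def test_get_username():
    fake_http_get = mock.Mock(return_value={"name": "Ada"})  # simulated input
    assert get_username(1, fake_http_get) == "Ada"
    fake_http_get.assert_called_once_with("/users/1")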
Unit tests are small, isolated pieces of code, making them quick and inexpensive to run
The practice of test-driven development is where you actually write the tests first, then write the code to pass more and more tests until all of them pass.
Essence of TDD
The combined results of all the tests form your test report
test report
each test is very simple: invoke the function with an argument and assert the expected return value.
e.g. test of an exact number entry:
def test_validator_valid_string():
    # The exact assertion call depends on the framework as well
    assert validate_account_number_format("1234567890") is True
Unit tests are concerned only with the unit's interface—its arguments and return values—not with its implementation
unit is a specific piece of code to be tested, such as a function or a class. Unit tests are then other pieces of code that specifically exercise the code unit with a full range of different inputs, including boundary and edge cases.
Essence of unit testing
Open the Command Menu and search for coverage
Checking CSS coverage inside Chrome Dev Tools:
The only site that displays all aircraft available for tracking is ADSBexchange.com, which we will feed with our data. It is the world's largest community of ADS-B / Mode S / MLAT receiver owners and, at the same time, the world's largest fully public and free source of unfiltered flight data. The worldwide flight data access this tool provides is used by hobbyists, researchers and, interestingly, journalists.
Unlike commercial sites such as FlightAware or Flightradar24, ADSBexchange.com doesn't hide any flights (no one can pay to hide them from public access).
Can G Suite Admin Read My Email?
tl;dr: YES (check the article for detailed steps)
AinD launches Android apps in Docker, by nesting Anbox containers inside Docker.
AinD - useful tool when we need to run an Android app 24/7 in the cloud.
Unlike the alternatives, AinD is not VM-based but IaaS-based
gh repo create hello-world -d "A react app for the web" --public
GitHub released a new CLI, gh, with which you can do many more operations.
For example, you can create a repo without going into your browser:
gh repo create hello-world -d "A react app for the web" --public
Generally, it will be great for CI/CD pipelines
An emerging way to bypass the need for passwords is to use magic links. A magic link is a temporary URL that expires after use, or after a specific interval of time. Magic links can be sent to your email address, an app, or a security device. Clicking the link authorizes you to sign in.
Magic Links to replace passwords?
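A rough server-side sketch of the idea in Python (token generation and expiry only; the storage, URL and email delivery are assumptions of this example):
import secrets, time

TOKENS = {}  # token -> (email, expiry timestamp); a real app would use a database

def create_magic_link(email, ttl_seconds=900):
    token = secrets.token_urlsafe(32)           # unguessable, single-use token
    TOKENS[token] = (email, time.time() + ttl_seconds)
    return f"https://example.com/login?token={token}"  # sent to the user's email

def redeem(token):
    email, expiry = TOKENS.pop(token, (None, 0))  # pop -> the link only works once
    return email if email and time.time() < expiry else None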
Hashing passwords won’t save you or your users. Once a database of passwords has been stolen, hackers aim immense distributed computing power at those password databases. They use parallel GPUs or giant botnets with hundreds of thousands of nodes to try hundreds of billions of password combinations per second in hopes of recovering plaintext username/password pairs.
What happens when the database is in hacker's hands
The solution is to go for serverless functions. This means that instead of occupying a server completely, you only use server capacity when the function needs to run.
Serverless as a cheap back end option
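For illustration, a minimal AWS-Lambda-style handler in Python (provider wiring, routing and deployment config are omitted; the names are generic, not tied to any particular platform):
import json

def handler(event, context):
    # Runs only when invoked, e.g. by an HTTP request through an API gateway
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }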
My favourites are Zeit and Netlify. They are quite similar in the features they provide: continuous deployment, around 100GB of bandwidth per month, and a built-in CDN. Another benefit is that they both provide the option of serverless functions, as we will see in the next section. This simplifies the number of services we need to integrate for our entire stack.
Good website hosting:
Zeit or Netlify
(CD + 100 GB of bandwidth / month + built-in CDN)
Guédelon Castle (Château de Guédelon) is a castle currently under construction near Treigny, France. The castle is the focus of an experimental archaeology project aimed at recreating a 13th-century castle and its environment using period technique, dress, and material.
Guédelon Castle (Château de Guédelon)
More info on HN
To take full advantage of tabular augmentation for time-series you would perform the techniques in the following order: (1) transforming, (2) interacting, (3) mapping, (4) extracting, and (5) synthesising
process of modular feature engineering and observation engineering while emphasising the order of augmentation to achieve the best predicted outcome from a given information set
Tabular Data Augmentation
1) Redash and Falcon focus on people that want to do visualizations on top of SQL
2) Superset, Tableau and PowerBI focus on people that want to do visualizations with a UI
3) Metabase and SeekTable focus on people that want to do quick analysis (they are the closest to an Excel replacement)
Comparison of data analysis tools:
1) Redash & Falcon - SQL focus
2) Superset, Tableau & PowerBI - UI workflow
3) Metabase & SeekTable - Excel like experience
Visual Studio Code supports working with Jupyter Notebooks natively, as well as through Python code files.
To run cells inside a Python script in VSCode, all you need to do is define Jupyter-like code cells within Python code using a # %% comment:
# %%
msg = "Hello World"
print(msg)
# %%
msg = "Hello again"
print(msg)

JupyterLab project, which enables a richer UI including a file browser, text editors, consoles, notebooks, and a rich layout system.
How JupyterLab differs from a traditional notebook
I could probably bootstrap my way up from this with the C compiler to write a terrible editor, then write a terrible TCP client, find my way out to ftp.gnu.org, get wget, and keep going from there. Assume that documentation is plentiful. You want a copy of the Stevens book so you can figure out how to do a DNS query by banging UDP over the network? Done.
What would the author do in a situation of being alone in a room with:
HD #1 is blank. HD #2 has a few scraps of a (Linux) OS on it: bootloader, kernel, C library and compiler, that sort of thing. There's a network connection of some sort, and that's about it. There are no editors and nothing more advanced than 'cat' to read files. You don't have jed, joe, emacs, pico, vi, or ed (eat flaming death). Don't even think about X. telnet, nc, ftp, ncftp, lftp, wget, curl, lynx, links? Luxury! Gone. Perl, Python and Ruby? Nope.
B&W is compelling because by removing color detail it draws attention to figure, curves, facial expressions, emotions, movement, time of day, depending on subject
What B&W photos expose
We find three important trends in the evolution of musical discourse: the restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels. The picture at right shows the timbral variety: Smaller values of β indicate less timbral variety: frequent codewords become more frequent, and infrequent ones become even less frequent. This evidences a growing homogenization of the global timbral palette. It also points towards a progressive tendency to follow more fashionable, mainstream sonorities.
Statistical evidence that modern pop music is boring or at least more homogeneous than in the past.
Dataset used: Million Song Dataset (which doesn't contain entries from the most recent years)
Unfortunately no - it cannot be done without Trusted security devices. There are several reasons for this. All of the below is working on the assumption you have no TPM or other trusted security device in place and are working in a "password only" environment.
Devices without a TPM module will always be asked for a password (e.g. by BitLocker) on every boot
Most people are surprised to learn that of Plotly’s 50 engineers, the vast majority are React developers. This is because Dash is primarily a frontend library — there are far more lines of JavaScript (Typescript) than Python, R, or Julia code. Plotly only has 3 full-time Python developers, 2 full-time R developers, and 0 full-time Julia developers.
Who works behind Plotly/Dash: 50 engineers:
Behind the scenes, when a Python, R or Julia engineer creates a Dash app, they are actually creating a React Single Page Application (“SPA”).
With Dash, any open-source React UI component can be pulled from npm or GitHub, stirred with water, transmogrified into a Dash component, then imported into your Dash app as a Python, R, or Julia library. C’est magnifique! 👨🍳 Dash makes the richness and innovation of the React frontend ecosystem available to Python, R, and Julia engineers for the first time.
Dash components are based on React
To use Gunicorn as your web server, it must be included in the requirements.txt file as an app dependency. It does not need to be installed in your virtual environment/host machine.
pip is a package manager. conda is both a package manager and an environment manager.
conda vs pip:

To help you get started quickly, we created a special Installer of Visual Studio Code for Java developers. Download the Visual Studio Code Java Pack Installer. Note: the installer is currently only available for Windows. For other OS, please install those components (JDK, VS Code and Java extensions) individually. We're working on the macOS version, please stay tuned. The package can be used as a clean install or as an update for an existing development environment to add Java or Visual Studio Code. Once downloaded and opened, it automatically detects if you have the fundamental components in your local development environment, including the JDK, Visual Studio Code, and essential Java extensions.
If you wish to use Java inside VSCode, try downloading the Installer of Visual Studio Code for Java developers
Note: When you create a new virtual environment, you should be prompted by VS Code to set it as the default for your workspace folder. If selected, the environment will automatically be activated when you open a new terminal.
After creating a new project-related environment, it should be set as the default for that specific project
Tip: Use Logpoints instead of print statements: Developers often litter source code with print statements to quickly inspect variables without necessarily stepping through each line of code in a debugger. In VS Code, you can instead use Logpoints. A Logpoint is like a breakpoint except that it logs a message to the console and doesn't stop the program. For more information, see Logpoints in the main VS Code debugging article.
Try to use logpoints instead of print statements.
More info: https://code.visualstudio.com/docs/editor/debugging#_logpoints
Open an Anaconda command prompt and run conda create -n myenv python=3.7 pandas jupyter seaborn scikit-learn keras tensorflow
Command to quickly create a new Anaconda environment:
conda create -n myenv python=3.7 pandas jupyter seaborn scikit-learn keras tensorflow
On the job, if you notice poor infrastructure, speak up to your manager early on. Clearly document the problem, and try to incorporate a data engineering, infrastructure, or devops team to help resolve the issue! I'd also encourage you to learn these skills too!
Recommendation to Data Scientists dealing with poor infrastructure
When I worked at Target HQ in 2012, employees would arrive to work early - often around 7am - in order to query the database at a time when few others were doing so. They hoped they’d get database results quicker. Yet, they’d still often wait several hours just to get results.
What poor data infrastructure leads to
The Data Scientist, in many cases, should be called the Data Janitor. ¯_(ツ)_/¯
:D

In regards to quality of data on the job, I’d often compare it to a garbage bag that ripped, had its content spewed all over the ground and your partner has asked you to find a beautiful earring that was accidentally inside.
From my experience, I can only agree
On the job, I’d recommend you document your work well and calculate the monetary value of your analyses based on factors like employee salary, capital investments, opportunity cost, etc. These analyses will come in handy for a promotion/review packet later too.
Factors to keep a track of as a Data Scientist
you as a Data Scientist should try to find a situation to be incredibly valuable on the job! It’s tough to find that from the outskirts of applying to jobs, but internally, you can make inroads supporting stakeholders with evidence for their decisions!
Try showcasing the evidence of why the employer needs you
As a Data Scientist in the org, are you essential to the business? Probably not. The business could go on for a while and survive without you. Sales will still be made, features will still get built, customer support will handle customer concerns, etc.
As a Data Scientist, you are more of a "support" to the overall team
As the resident Data Scientist, you may become easily inundated with requests from multiple teams at once. Be prepared to ask these teams to qualify and defend their requests, and be prepared to say “no” if their needs fall outside the scope of your actual priority queue. I’d recommend utilizing the RICE prioritization technique for projects.
Being the only data person, be sure to prioritise the requests
Common questions are: How many users click this button; what % of users that visit a screen click this button; how many users have signed up by region or account type? However, the data needed to answer those questions may not exist! If the data does exist, it’s likely “dirty” - undocumented, tough to find or could be factually inaccurate. It’ll be tough to work with! You could spend hours or days attempting to answer a single question only to discover that you can’t sufficiently answer it for a stakeholder. In machine learning, you may be asked to optimize some process or experience for consumers. However, there’s uncertainty with how much, if at all, the experience can be improved!
Common types of problems you might be working with in Data Science / Machine Learning industry
In one experience, a fellow researcher spent over a month researching a particular value among our customers through qualitative and quantitative data. She presented a well-written and evidence-backed report. Yet, a few days later, a key head of product outlined a vision for the team and supported it with a claim that was antithetical to the researcher’s findings! Even if a data science project you advocate for is greenlighted, you may be on your own as the rare knowledgeable person to plan and execute it. It’s unlikely leadership will be hands-on to help you research and plan out the project.
Data science leadership is sorely lacking
Because people don’t know what data science does, you may have to support yourself with work in devops, software engineering, data engineering, etc.
You might have multiple roles due to lack of clear "data science" terminology
In 50+ interviews for data related jobs, I’ve been asked about AB testing, SQL analytics questions, optimizing SQL queries, how to code a game in Python, Logistic Regression, Gradient Boosted Trees, data structures and/or algorithms programming problems!
Data science interviews lack a clarity of scope
seven most common (and at times flagrant) ways that data science has failed to meet expectations in industry
In the data science community, the performance of the model on the test dataset is one of the most important things people look at. Just look at the competitions on kaggle.com: they are extremely focused on the test dataset, and the performance of these models is really good.
In data science, the performance of the model on the test dataset is the most important metric for the majority.
It's not always the best measurement, since the most efficient model can completely underperform when given a different type of dataset
It's basically a look up table, interpolating between known data points. Except, unlike other interpolants like 'linear', 'nearest neighbour' or 'cubic', the underlying functional form is determined to best represent the kind of data you have.
You can describe AI/ML methods as a lookup table that adjusts to your data points, unlike fixed interpolants (linear, nearest neighbour or cubic)
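A tiny illustration of that analogy with NumPy (the sample points are made up):
import numpy as np

# Known data points - the "lookup table" (samples of y = x**2)
x_known = np.array([0.0, 1.0, 2.0, 3.0])
y_known = np.array([0.0, 1.0, 4.0, 9.0])

# A fixed-form interpolant: straight lines between neighbours
print(np.interp(1.5, x_known, y_known))      # 2.5

# A fitted model instead learns a functional form from the same points
coeffs = np.polyfit(x_known, y_known, deg=2)
print(np.polyval(coeffs, 1.5))               # ~2.25, matching the underlying x**2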
there must be a space before any punctuation sign that is made of two parts ( ; : ! ? ) and no space before any other punctuation sign
French rule
git reflog is a very useful command in order to show a log of all the actions that have been taken! This includes merges, resets, reverts: basically any alteration to your branch.
Reflog - shows the history of actions in the repo.
With this information, you can easily undo changes that have been made to a repository with git reset
git reflog
Say that we actually didn't want to merge the origin branch. When we execute the git reflog command, we see that the state of the repo before the merge is at HEAD@{1}. Let's perform a git reset to point HEAD back to where it was on HEAD@{1}!
git pull is actually two commands in one: a git fetch, and a git merge. When we're pulling changes from the origin, we're first fetching all the data like we did with a git fetch, after which the latest changes are automatically merged into the local branch.
Pulling - downloads content from a remote branch/repository like git fetch would do, and automatically merges the new changes
git pull origin master
It doesn't affect your local branch in any way: a fetch simply downloads new data.
Fetching - downloads content from a remote branch or repository without modifying the local state
git fetch origin master
When a certain branch contains a commit that introduced changes we need on our active branch, we can cherry-pick that commit! By cherry-picking a commit, we create a new commit on our active branch that contains the changes that were introduced by the cherry-picked commit.
Cherry-picking - creates a new commit with the changes that the cherry-picked commit introduced.
By default, Git will only apply the changes if the current branch does not have these changes in order to prevent an empty commit
git cherry-pick 76d12
Another way of undoing changes is by performing a git revert. By reverting a certain commit, we create a new commit that contains the reverted changes!
Reverting - reverts the changes that commits introduce. Creates a new commit with the reverted changes
git revert ec5be
Sometimes, we don't want to keep the changes that were introduced by certain commits. Unlike a soft reset, we shouldn't need to have access to them any more.
Hard reset - points HEAD to the specified commit.
Discards changes that have been made since the new commit that HEAD points to, and deletes changes in working directory
git reset --hard HEAD~2
git status
git rebase copies the commits from the current branch, and puts these copied commits on top of the specified branch.
Rebasing - copies commits on top of another branch without creating a merge commit, which keeps a linear history.
Changes the history as new hashes are created for the copied commits.
git rebase master
A big difference compared to merging, is that Git won't try to find out which files to keep and not keep. The branch that we're rebasing always has the latest changes that we want to keep! You won't run into any merging conflicts this way, and keep a nice linear Git history.
Rebasing is great whenever you're working on a feature branch, and the master branch has been updated.
This can happen when the two branches we're trying to merge have changes on the same line in the same file, or if one branch deleted a file that another branch modified, and so on.
Merge conflict - you have to decide from which branch to keep the change.
After:
git merge dev
Git will notify you about the merge conflict so you can manually remove the changes you don't want to keep, save them, and then:
git add updated_file.md
git commit -m "Merge..."
If we committed changes on the current branch that the branch we want to merge doesn't have, git will perform a no-fast-forward merge.
No-fast-forward merge - default behavior when current branch contains commits that the merging branch doesn't have.
Create a new commit which merges two branches together without modifying existing branches.
git merge dev
fast-forward merge can happen when the current branch has no extra commits compared to the branch we’re merging.
Fast-forward merge - default behavior when the branch has all of the current branch's commits.
Doesn't create a new commit, thus doesn't modify existing branches.
git merge dev
soft reset moves HEAD to the specified commit (or the index of the commit compared to HEAD), without getting rid of the changes that were introduced on the commits afterward!
Soft reset - points HEAD to the specified commit.
Keeps changes that have been made since the new commit the HEAD points to, and keeps the modifications in the working directory
git reset --soft HEAD~2
git status
git reset gets rid of all the current staged files and gives us control over where HEAD should point to.
Reset - way to get rid of unwanted commits. We have soft and hard reset
There are 6 actions we can perform on the commits we're rebasing
Interactive rebase - makes it possible to edit commits before rebasing.
Creates new commits for the edited commits which history has been changed.
6 actions (options) of interactive rebase:
reword: Change the commit message
edit: Amend this commit
squash: Meld commit into the previous commit
fixup: Meld commit into the previous commit, without keeping the commit's log message
exec: Run a command on each commit we want to rebase
drop: Remove the commit

git rebase -i HEAD~3
Pharma, which is one of the biggest, richest, most rewarding and promising industries in the world. Especially now, when the pharmaceutical industry, including the FDA, allows R to be used in a domain occupied 110% by SAS.
Pharma industry is one of the most rewarding industries, especially now
CR is one of the most controlled industries in this world. It's insanely conservative in both the statistical methods used and the programming. Once a program is written and validated, it may be used for decades. There are SAS macros written in 1980 still working today without any change. That's because of the brilliant backward compatibility of the SAS macro language. New features DO NOT cause the old mechanisms to be REMOVED. It's here FOREVER+1 day.
Clinical Research is highly conservative, making SAS macros applicable for decades. Unfortunately, that's not the same case with R
wearing simple face masks which exert a barrier function that blocks those big projectile droplets that land in the nose or throat may substantially reduce the reproduction rate R, to an extent that may be comparable to social distancing and washing hands.
Most important message of the article
Make sure to discard or launder after use without touching the outward surface
What to do with a mask after usage
CDC suggests use of scarf by health care providers as last resort when no face masks are available
Use of scarf
avoiding large droplets, which cannot enter the lung anyway but land in the upper respiratory tract, could be the most effective means to prevent infection. Therefore, surgical masks, perhaps even your ski mask, bandana or scarf, may help
Wear a mask!
The molecular analysis also shows that the SARS-Cov-2 virus is active and replicates already in the nasopharynx, unlike other respiratory viruses that dwell in deeper regions of the lung.
Surprisingly, ACE2 expression in the lung is very low: it is limited to a few molecules per cell in the alveolar cells (AT2 cells) deep in the lung. But a just published paper by the Human Cell Atlas (HCA) consortium reports that ACE2 is highly expressed in some type of (secretory) cells of the inner nose!
Major route of viral entry is likely via large droplets that land in the nose — where expression of the viral entry receptor, ACE2 is highest. This is the transmission route that could be effectively blocked already by simple masks that provide a physical barrier.

SARS-Cov-2 virus, like any virus, must dock onto human cells using a key-lock principle, in which the virus presents the key and the cell the lock that is complementary to the key to enter the cell and replicate. For the SARS-Cov-2 virus, the viral surface protein “Spike protein S” is the “key” and it must fit snugly into the “lock” protein that is expressed (=molecularly presented) on the surface of the host cells. The cellular lock protein that the SARS-Cov-2 virus uses is the ACE2 protein
SARS-Cov-2 enters the host cell by docking with its Spike protein to the ACE2 (blue) protein in cell surfaces:

Filtering effect for small droplets (aerosols) by various masks; home-made of tea cloth, surgical mask (3M “Tie-on”) and a FFP2 (N95) respirator mask. The numbers are scaled to the reference of 100 (source of droplets) for illustrative purposes, calculated from the PF (protection factor) values

The tacit notion at the CDC that the alveolae are the destination site for droplets to deliver the virus load (the alveolae are after all the anatomical site of life-threatening pneumonia), has elevated the apparent importance of N95 masks and led to the dismissal of surgical masks.
Why N95 masks came to be seen as much better than surgical masks
In the case of the SARS-Cov-2 virus it is not known what the minimal infectious load is (number of viral particles needed to start the pathogenesis cascade that causes a clinical disease).
Minimal infectious load
Of course many aerosol droplets in the exhalation or cough spray may not contain the virus, but some will do.
droplets of a typical cough expulsion have a size distribution such that approximately half of the droplets are in the category of aerosols, albeit they collectively represent less than 1/100,000 of the expelled volume
Droplets of a typical cough
For airborne particles to be inspired and reach deep into the lung, through all the air ducts down to the alveolar cells where gas-exchange takes place, it has to be small
Only droplets < 10 um can reach the alveolae (deep in the lung). Larger droplets get stuck in the nose, throat, upper air ducts of the lung, trachea and large bronchia.

Droplets can (for this discussion) be crudely divided in two large categories based on size
2 categories of droplets:
a) Droplets < 10 um: upper size limit of aerosols. They can float in the air of rooms, carried by ventilation or winds, and can be filtered (to 95%) by N95 facial masks (down to droplets of 0.3 um). Here surgical masks cannot help.
b) Droplets > 10 um (reaching 100+ um): called spray droplets. The larger ones (0.1+ mm) can even be visible to the human eye when someone coughs or sneezes.
Droplets larger than aerosols, when exhaled (at a velocity of < 1 m/s), evaporate or fall to the ground less than 1.5 m away. When expelled at high velocity through coughing or sneezing, especially the larger droplets (> 0.1 mm) can be carried by the jet more than 2 m or 6 m away, respectively.

The official recommendation by CDC, FDA and others that masks worn by the non-health-care professionals are ineffective is incorrect at three levels: In the logic, in the mechanics of transmission, and in the biology of viral entry.
”Flattening the curve”. Effect of mitigating interventions that would decrease the initial reproduction rate R0 by 50% when implemented at day 25. Red curve is the course of numbers of infected individuals (”cases”) without intervention. Green curve reflects the changed (”flattened”) curve after intervention. Day 0 (March 3, 2020) is the time at which 100 cases of infections were confirmed (d100 = 0).
If people would start wearing a mask:

Ancestor of all animals identified in Australian fossils
Summary:
The combined network of computers running as part of the Folding@Home initiative has surpassed the computing power of the seven most powerful supercomputers in the world. You read that right: the devices connected through F@H provide around 470 PetaFLOPS of computing power - more than the seven most powerful supercomputers in the world put together! That is three times the performance of the currently most powerful supercomputer, SUMMIT (149 PFLOPS).
Internet users have built a distributed supercomputer network more powerful than the world's 7 most powerful supercomputers combined. The goal is to fight COVID-19.
You can also join them by using Folding@home software
This denotes the factorial of a number. It is the product of numbers starting from 1 to that number.
Exclamation in Python: $$x!$$ is written as:
x = 5
fact = 1
for i in range(x, 0, -1):
    fact = fact * i
print(fact)
it can be shortened as:
import math
math.factorial(x)
and the output is:
# 5*4*3*2*1
120
The hat gives the unit vector. This means dividing each component in a vector by its length (norm).
Hat in Python: $$\hat{x}$$ is written as:
import math

x = [1, 2, 3]
length = math.sqrt(sum([e**2 for e in x]))
x_hat = [e/length for e in x]
This makes the magnitude of the vector 1 and only keeps the direction:
math.sqrt(sum([e**2 for e in x_hat]))
# 1.0
It gives the sum of the products of the corresponding entries of the two sequences of numbers.
Dot Product in Python: $$X.Y$$ is written as:
X = [1, 2, 3]
Y = [4, 5, 6]
dot = sum([i*j for i, j in zip(X, Y)])
# 1*4 + 2*5 + 3*6
# 32
It means multiplying the corresponding elements in two tensors. In Python, this would be equivalent to multiplying the corresponding elements in two lists.
Element wise multiplication in Python: $$z=x\odot y$$ is written as:
import numpy as np

x = [[1, 2],
     [3, 4]]
y = [[2, 2],
     [2, 2]]
z = np.multiply(x, y)
and results in:
[[2, 4],
 [6, 8]]
This is basically exchanging the rows and columns.
Transpose in Python: $$X^T$$ is written as:
import numpy as np

X = [[1, 2, 3],
     [4, 5, 6]]
np.transpose(X)
and results in:
[[1, 4],
 [2, 5],
 [3, 6]]
This denotes a function which takes a domain X and maps it to range Y. In Python, it’s equivalent to taking a pool of values X, doing some operation on it to calculate pool of values Y.
Function in Python: $$f:X \rightarrow Y$$ is written as:
def f(X):
    Y = ...
    return Y
Using R instead of X or Y means we're dealing with real numbers:
$$f:R \rightarrow R$$
then, R^d means we're dealing with a d-dimensional vector of real numbers (in this case, an example with d=2 is X = [1, 2])
This symbol checks if an element is part of a set.
Epsilon (element of) in Python: $$3 \in X$$ is written as:
X = {1,2,3}
3 in X
The norm is used to calculate the magnitude of a vector. In Python, this means squaring each element of an array, summing them and then taking the square root.
Norm of vector in Python (it's like Pythagorean theorem): $$| x|$$ is written as:
x = [1, 2, 3]
math.sqrt(x[0]**2 + x[1]**2 + x[2]**2)
This symbol denotes the absolute value of a number i.e. without a sign.
Absolute value in Python: $$|x|$$ is written as:
x = 10
y = -20
abs(x) #10
abs(y) #20
In Python, it is equivalent to looping over a vector from index 0 to index N-1 and multiplying them.
Pi in Python is the same as sigma, but you multiply (*) the numbers inside the for loop. $$\prod_{i=1}^N x_i$$
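A minimal sketch, reusing the same x as in the sigma example but multiplying instead of adding:
x = [1, 2, 3, 4, 5]
result = 1
N = len(x)
for i in range(N):
    result = result * x[i]
print(result)  # 1*2*3*4*5 = 120
or it can be shortened (Python 3.8+):
import math
result = math.prod(x)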
we reuse the sigma notation and divide by the number of elements to get an average.
Average in Python: $$\frac{1}{N}\sum_{i=1}^Nx_i$$ is written as:
x = [1, 2, 3, 4, 5]
result = 0
N = len(x)
for i in range(N):
    result = result + x[i]
average = result / N
print(average)
or it can be shortened:
x = [1, 2, 3, 4, 5]
result = sum(x) / len(x)
In Python, it is equivalent to looping over a vector from index 0 to index N-1
Sigma in Python: $$\sum_{i=1}^Nx_i$$ is written as:
x = [1, 2, 3, 4, 5]
result = 0
N = len(x)
for i in range(N):
    result = result + x[i]
print(result)
or it can be shortened:
x = [1, 2, 3, 4, 5]
result = sum(x)
2D vectors
2D vectors in Python: $$x_{ij}$$ are written as:
x = [ [10, 20, 30], [40, 50, 60] ]
i = 0
j = 1
print(x[i][j]) # 20
1–9–90 rule (sometimes 90–9–1 principle or the 89:10:1 ratio),[1] which states that in a collaborative website such as a wiki, 90% of the participants of a community only consume content, 9% of the participants change or update content, and 1% of the participants add content.
1% rule = 1% of the users create and 99% watch the content.
1-9-90 rule = 1% create, 9% modify and 90% watch
Another nice SQL script paired with CRON jobs was the one that reminded people of carts that were left for more than 48 hours. Select from cart where state is not empty and last date is more than or equal to 48hrs.... Set this as a CRON job that fires at 2AM every day, a period with less activity and traffic. People wake up to emails reminding them about their abandoned carts. Then sit back and watch the magic happen. No AI/ML needed here. Just good 'ol SQL + Bash.
Another example of using SQL + a CRON job + Bash to remind customers of carts that were left behind (again, no ML needed here)
I will write a query like select from order table where last shop date is 3 or more months ago. When we get this information, we will send a nice "we miss you, come back and here's X Naira voucher" email. The conversion rate for this one was always greater than 50%.
Sometimes SQL is much more than enough (you don't need ML)
This volume of paper should be the same as the coaxial plug of paper on the roll.
Calculating the volume of the paper roll: $$\mathbf{Lwt = \pi w(R^2 - r^2)}$$ where L = length of the paper, w = width of the paper, t = thickness, R = outer radius, r = inner radius. And that simplifies into a formula for R: $$\mathbf{R = \sqrt{\frac{Lt}{\pi}+r^2}}$$
This shows the nonlinear relationship and how the consumption accelerates. The first 10% used makes just a 5% change in the diameter of the roll. The last 10% makes an 18.5% change.
Consumption of a toilet paper roll has a nonlinear relationship between the fraction of paper used and the change in the roll's diameter:
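A quick Python sketch of that formula, using the approximate roll dimensions quoted below (inches; the exact 5% / 18.5% figures depend on which measurements you plug in):
import math

R_full = 5.25 / 2   # outer radius of a full roll
r = 1.6 / 2         # radius of the cardboard tube
L_full = 2000       # total paper length: ~500 double-ply sheets of 4"

# Thickness t follows from L*w*t = pi*w*(R^2 - r^2); w cancels out
t = math.pi * (R_full**2 - r**2) / L_full

def outer_radius(remaining_length):
    # R = sqrt(L*t/pi + r^2) for the paper still on the roll
    return math.sqrt(remaining_length * t / math.pi + r**2)

# Diameter shrinks slowly at first, then faster and faster towards the end
for used in range(0, 101, 10):
    remaining = L_full * (1 - used / 100)
    print(f"{used:3d}% used -> diameter {2 * outer_radius(remaining):.2f} in")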

Toilet paper is typically supplied in rolls of perforated material wrapped around a central cardboard tube. There’s a little variance between manufacturers, but a typical roll is approximately 4.5” wide with a 5.25” external diameter, and a central tube of diameter 1.6”. Toilet paper is big business (see what I did there?). Worldwide, approximately 83 million rolls are produced per day; that’s a staggering 30 billion rolls per year. In the USA, about 7 billion rolls a year are sold, so the average American citizen consumes two dozen rolls a year (two per month). Americans use 24 rolls of toilet paper per capita a year. Again, it depends on the thickness and luxuriousness of the product, but the perforations typically divide the roll into approximately 1,000 sheets (for single-ply), or around 500 sheets (for double-ply). Each sheet is typically 4” long, so the length of a (double-ply) toilet roll is approximately 2,000” or 167 feet (or less, if your cat gets to it).
Statistics on the type and use of toilet paper in the USA.
1" (inch) = 2.54 cm
In the interval scale, there is no true zero point or fixed beginning. Interval scales do not have a true zero even if one of the values carries the name “zero.” For example, with temperature there is no point where the temperature can be zero: zero degrees F does not mean the complete absence of temperature. Since the interval scale has no true zero point, you cannot calculate ratios. For example, it makes no sense to say that the ratio of 90 to 30 degrees F is the same as the ratio of 60 to 20 degrees: a temperature of 20 degrees is not twice as warm as one of 10 degrees.
Interval data:
Like interval scales, ratio scales show us the order and the exact value between the units. However, in contrast with interval scales, ratio scales have an absolute zero that allows us to perform a huge range of descriptive and inferential statistics. Ratio scales possess a clear definition of zero. Any type of value that can be measured from absolute zero can be measured with a ratio scale. The most popular examples of ratio variables are height and weight. In addition, one individual can be twice as tall as another individual.
Ratio data is like interval data, but with a true (absolute) zero point
Javascript, APIs and Markup — this stack is all about finding middleground from the chaos of SSR+SPA. It is about stepping back and asking yourself, what parts of my page change and what parts don’t change?
JavaScript, APIs and Markup (JAM Stack) - middleground between SSR + SPA.
Advantages:
Somewhere on this path to render pages on the fly (SSR) and render pages on the client (SPA) we forgot about the performance of our webpages. We were trying to build apps. But the web is about presenting content first and foremost!
Website performance suffers with Server-Side Rendering (SSR) and Single-Page Apps (SPA)
We were not satisfied with the basic capabilities like bold and italics so we built CSS. Now, we wanted to modify some parts of the HTML/CSS in response to things like clicking things, so we implemented a scripting language to quickly specify such relations and have then run within the browser itself instead of a round trip to the server.
Birth of CSS - advanced styling
(history of websites)
And so was born PHP; it feels like a natural extension to HTML itself. You write your code inside your HTML file itself and are then able to run those parts on the server, which further generates HTML, and the final HTML gets sent to the browser. This was extremely powerful. We could serve completely different pages to different users even though all of them access the same URL, like Facebook. We could use a database on a server and store some data there, then based on some conditions use this data to modify the generated HTML and technically have an infinite number of pages available to serve (e-commerce).
Birth of PHP - way to serve different content under the same URL
TL;DR;
Don't use:
Use:
Don’t use iterators, prefer js higher-order functions instead of for / for..in
// bad
const increasedByOne = [];
for (let i = 0; i < numbers.length; i++) {
  increasedByOne.push(numbers[i] + 1);
}
// good
const increasedByOne = [];
numbers.forEach((num) => {
  increasedByOne.push(num + 1);
});
Ternaries should not be nested and generally be single line expressions.
e.g.
const foo = maybe1 > maybe2 ? 'bar' : maybeNull;
Also:
// bad
const foo = a ? a : b;
const bar = c ? true : false;
const baz = c ? false : true;
// good
const foo = a || b;
const bar = !!c;
const baz = !c;
Use === and !== over == and !=.
Based on the Airbnb guide
Use JSDOC https://jsdoc.app/about-getting-started.html format.
Standardise your JavaScript comments:
/** This is a description of the foo function. */
function foo() {
}
When JavaScript encounters a line break without a semicolon, it uses a set of rules called Automatic Semicolon Insertion to determine whether or not it should regard that line break as the end of a statement, and (as the name implies) place a semicolon into your code before the line break if it thinks so. ASI contains a few eccentric behaviors, though, and your code will break if JavaScript misinterprets your line break.
Better place a semicolon (;) in the right place and do not rely on the Automatic Semicolon Insertion
e.g.
const luke = {};
const leia = {};
[luke, leia].forEach((jedi) => {
  jedi.father = 'vader';
});
They showed that in rats on a low-fat diet, unfavourable structural and genetic changes within individual tissues develop more slowly. The rodents that ate less stayed young for longer.
Low-calorie diet reduces inflammation, delays the onset of old age diseases, and generally prolongs life.
If sentences contain eight words or less, readers understand 100 percent of the information. If sentences contain 43 words or longer, the reader’s comprehension drops to less than 10 percent.
<= 8 words <--- 100% understanding
>= 43 words <--- <10% understanding
from Docker Compose on a single machine, to Heroku and similar systems, to something like Snakemake for computational pipelines.
Other alternatives to Kubernetes:
if what you care about is downtime, your first thought shouldn’t be “how do I reduce deployment downtime from 1 second to 1ms”, it should be “how can I ensure database schema changes don’t prevent rollback if I screw something up.”
Caring about downtime
The features Kubernetes provides for reliability (health checks, rolling deploys), can be implemented much more simply, or already built-in in many cases. For example, nginx can do health checks on worker processes, and you can use docker-autoheal or something similar to automatically restart those processes.
Kubernetes' health checks can be replaced with nginx on worker processes + docker-autoheal to automatically restart those processes
Scaling for many web applications is typically bottlenecked by the database, not the web workers.
Kubernetes might be useful if you need to scale a lot. But let’s consider some alternatives
Kubernetes alternatives:
Distributed applications are really hard to write correctly. Really. The more moving parts, the more these problems come in to play. Distributed applications are hard to debug. You need whole new categories of instrumentation and logging to getting understanding that isn’t quite as good as what you’d get from the logs of a monolithic application.
Microservices remain a hard nut to crack.
They are fine for an organisational scaling technique: when you have 500 developers working on one live website (so they can work independently). For example, each team of 5 developers can be given one microservice
you need to spin up a complete K8s system just to test anything, via a VM or nested Docker containers.
You need a complete K8s to run your code, or you can use Telepresence to code locally against a remote Kubernetes cluster
“Kubernetes is a large system with significant operational complexity. The assessment team found configuration and deployment of Kubernetes to be non-trivial, with certain components having confusing default settings, missing operational controls, and implicitly defined security controls.”
Deployment of Kubernetes is non-trivial
Before you can run a single application, you need the following highly-simplified architecture
Before running the simplest Kubernetes app, you need at least this architecture:

the Kubernetes codebase has significant room for improvement. The codebase is large and complex, with large sections of code containing minimal documentation and numerous dependencies, including systems external to Kubernetes.
As of March 2020, the Kubernetes code base has more than 580 000 lines of Go code
Kubernetes has plenty of moving parts—concepts, subsystems, processes, machines, code—and that means plenty of problems.
Kubernetes might be not the best solution in a smaller team
That makes sense, the new file gets created in the upper directory.
If you add a new file, such as with:
$ echo "new file" > merged/new_file
It will be created in the upper directory
Combining the upper and lower directories is pretty easy: we can just do it with mount!
Combining lower and upper directories using mount:
$ sudo mount -t overlay overlay \
    -o lowerdir=/home/bork/test/lower,upperdir=/home/bork/test/upper,workdir=/home/bork/test/work \
    /home/bork/test/merged
Overlay filesystems, also known as “union filesystems” or “union mounts” let you mount a filesystem using 2 directories: a “lower” directory, and an “upper” directory.
Docker doesn't make copies of images, but instead uses an overlay.
Overlay filesystems, let you mount a system using 2 directories:
When a process:
Using Facebook ads, the researchers recruited 2,743 users who were willing to leave Facebook for one month in exchange for a cash reward. They then randomly divided these users into a Treatment group, that followed through with the deactivation, and a Control group, that was asked to keep using the platform.
The effects of not using Facebook for a month:
In addition to the results on creativity, caffeine did not significantly affect working memory, but test subjects who took it did report feeling less sad.
Coffee also:
+ happier
While the cognitive benefits of caffeine — increased alertness, improved vigilance, enhanced focus and improved motor performance — are well established, she said, the stimulant’s effect on creativity is less known.
Coffee:
+ alertness
+ vigilance
+ focus
+ motor performance
? creativity
Caffeine increases the ability to focus and problem solve, but a new study by a University of Arkansas researcher indicates it doesn’t stimulate creativity.
Python is, however, used in researching new drug molecules and in simulations.
Python has a slight foothold in pharma, but it cannot beat SAS and R
SAS is also industry standard in other areas, like: military, energy, banking, econometrics.
SAS is not only an industry standard in pharma
The new virus is genetically 96% identical to a known coronavirus in bats and 86-92% identical to a coronavirus in pangolin. Therefore, the transmission of a mutated virus from animals to humans is the most likely cause of the appearance of the new virus.
Source of COVID-19
When asked why Wuhan was so much higher than the national level, the Chinese official replied that it was for lack of resources, citing as an example that there were only 110 critical care beds in the three designated hospitals where most of the cases were sent.
Wuhan's rate then = 4.9%
National rate = 2.1%
80% of food in the UK is imported. How about your country? Your town?
Food for thought
The South Korean government is delivering food parcels to those in quarantine. Our national and local governments need to quickly organise the capacity and resources required to do this. Japanese schools are scheduled to be closed for March.
Food delivery in South Korea and closing schools in Japan
The limited availability of beds in Wuhan raised their mortality rate from 0.16% to 4.9%. This is why the Chinese government built a hospital in a week. Are our governments capable of doing the same?
Case of Wuhan
The UK population is 67 million people; that’s 5.4 million infected. Current predictions are that 80% of the cases will be mild. What if 20% of those people require hospitalization for 3–6 weeks? That’s 1,086,176 people. Do you know how many beds the NHS has? 140,000.
There will be a great lack of beds
Evolving to be observant of direct dangers to ourselves seems to have left us terrible at predicting second and third-order effects of events. When worrying about earthquakes we think first of how many people will die from collapsing buildings and falling rubble. Do we think of how many will die due to destroyed hospitals?
Thinking of second and third-order effects of events
Can you guess the number of people that have contracted the flu this year that needed hospitalisation in the US? 0.9%
0.9% of flu cases that required hospitalisation vs 20% of COVID-19
The UK has 2.8 million people over the age of 85. The US has 12.6 million people over the age of 80. Trump told people not to worry because 60,000 people a year die of the flu. If just 25% of the US over-80s cohort get infected, given current mortality rates that’s 466,200 deaths in that age group alone, with the assumption that the healthcare system has the capacity to handle all of the infected.
Interesting calculation of probable deaths among people > 80. Basically, if at least 25% of Americans over 80 get infected, we might see almost 8x more deaths than from the flu
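A quick sanity check of that arithmetic in Python (the rates are the article's assumptions, not independent estimates; the ~14.8% fatality rate for 80+ is implied by 466,200 / 3,150,000):
us_over_80 = 12_600_000
infected_share = 0.25      # "if just 25% of the US over-80s cohort get infected"
cfr_over_80 = 0.148        # implied case fatality rate for the 80+ age group

deaths = us_over_80 * infected_share * cfr_over_80
print(f"{deaths:,.0f} deaths")                         # ~466,200
print(f"{deaths / 60_000:.1f}x annual US flu deaths")  # ~7.8x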
in XML the ordering of elements is significant. Whereas in JSON the ordering of the key-value pairs inside objects is meaningless and undefined
XML vs JSON (ordering elements)
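A quick Python illustration of that point: two orderings of the same JSON object parse to equal values.
import json

a = json.loads('{"Name": "John", "City": "London"}')
b = json.loads('{"City": "London", "Name": "John"}')
print(a == b)  # True: key order carries no meaning for a JSON object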
XML is a document markup language; JSON is a structured data format, and to compare the two is to compare apples and oranges
XML vs JSON
dictionary (a piece of structured data) can be converted into n different possible documents (XML, PDF, paper or otherwise), where n is the number of possible permutations of the elements in the dictionary
Dictionary
The correct way to express a dictionary in XML is something like this
Correct dictionary in XML:
<root>
  <item>
    <key>Name</key>
    <value>John</value>
  </item>
  <item>
    <key>City</key>
    <value>London</value>
  </item>
</root>
A particular strength of JSON is its support for nested data structures
JSON can facilitate arrays, such as:
"favorite_movies": [ "Diehard", "Shrek" ]
JSON’s origins as a subset of JavaScript can be seen with how easily it represents key/value object data. XML, on the other hand, optimizes for document tree structures, by cleanly separating node data (attributes) from child data (elements)
JSON for key/value object data
XML for document tree structures (clearly separating node data (attributes) from child data (elements))
The advantages of XML over JSON for trees becomes more pronounced when we introduce different node types. Assume we wanted to introduce departments into the org chart above. In XML, we can just use an element with a new tag name
JSON is well-suited for representing lists of objects with complex properties. JSON’s key/value object syntax makes it easy. By contrast, XML’s attribute syntax only works for simple data types. Using child elements to represent complex properties can lead to inconsistencies or unnecessary verbosity.
JSON works well for list of objects with complex properties. XML not so much
UI layouts are represented as component trees. And XML is ideal for representing tree structures. It’s a match made in heaven! In fact, the most popular UI frameworks in the world (HTML and Android) use XML syntax to define layouts.
XML works great for displaying UI layouts
XML may not be ideal to represent generic data structures, but it excels at representing one particular structure: the tree. By separating node data (attributes) from parent/child relationships, the tree structure of the data shines through, and the code to process the data can be quite elegant.
XML is good for representing tree structured data
Here is a high level comparison of the tools we reviewed above:
Comparison of Delta Lake, Apache Iceberg and Apache Hive:

To address Hadoop’s complications and scaling challenges, the industry is now moving towards a disaggregated architecture, with the storage and analytics layers very loosely coupled using REST APIs.
Approaches used to address Hadoop's shortcomings
Hive is now trying to address consistency and usability. It facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage.
Apache Hive offers:
Delta Lake is an open-source platform that brings ACID transactions to Apache Spark™. Delta Lake is developed by Spark experts, Databricks. It runs on top of your existing storage platform (S3, HDFS, Azure) and is fully compatible with Apache Spark APIs.
Delta Lake offers:
Apache Iceberg is an open table format for huge analytic data sets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Iceberg is focussed towards avoiding unpleasant surprises, helping evolve schema and avoid inadvertent data deletion.
Apache Iceberg offers:
in this disaggregated model, users can choose to use Spark for batch analytics workloads and Presto for SQL-heavy workloads, with both Spark and Presto using the same backend storage platform.
Disaggregated model allows more flexible choice of tools
These projects sit between the storage and analytical platforms and offer strong ACID guarantees to the end user while dealing with the object storage platforms in a native manner.
Solutions to the disaggregated models:
rise of Hadoop as the defacto Big Data platform and its subsequent downfall. Initially, HDFS served as the storage layer, and Hive as the analytics layer. When pushed really hard, Hadoop was able to go up to few 100s of TBs, allowed SQL like querying on semi-structured data and was fast enough for its time.
Hadoop's HDFS and Hive turned out to be unprepared for even larger sets of data
dplyr in R also lets you use a different syntax for querying SQL databases like Postgres, MySQL and SQLite, which is also in a more logical order
We save all of this code, the ui object, the server function, and the call to the shinyApp function, in an R script called app.R
The same basic structure for all Shiny apps:
ui object
server function
shinyApp function
---> examples <---
ui
UI example of a Shiny app (check the code below)
server
Server example of a Shiny app (check the code below):
renderPlot. I want to get the selected number of bins from the slider and pass that number into a Python method, do some calculation/manipulation inside of it (return: “You have selected 30 bins and I came from a Python Function”), then return some value back to my R Shiny dashboard and view that result in a text field.
Using Python scripts inside R Shiny (in 6 steps):
textOutput("textOutput") in the UI (after plotOutput())
output$textOutput <- renderText({ ... }) in the server
library(reticulate); the source_python() function will make Python available in R:

With the reticulate package added to the R environment and the script sourced inside your R code, hit run.
Currently Shiny is far more mature than Dash. Dash doesn’t have a proper layout tool yet, and also no built-in themes, so if you are not familiar with HTML and CSS, your application will not look good (you must have some level of web development knowledge). Also, developing new components will need ReactJS knowledge, which has a steep learning curve.
Shiny > Dash:
You can host standalone apps on a webpage or embed them in R Markdown documents or build dashboards. You can also extend your Shiny apps with CSS themes, Html widgets, and JavaScript actions.
Typical tools used for working with Shiny
You can either create one R file named app.R containing two separate components (ui and server) inside that file, or create two R files named ui.R and server.R

Vaex supports Just-In-Time compilation via Numba (using LLVM) or Pythran (acceleration via C++), giving better performance. If you happen to have a NVIDIA graphics card, you can use CUDA via the jit_cuda method to get even faster performance.
Tools supported by Vaex
virtual columns. These columns just house the mathematical expressions, and are evaluated only when required
Virtual columns
displaying a Vaex DataFrame or column requires only the first and last 5 rows to be read from disk
Vaex tries to go over the entire dataset with as few passes as possible
Why is it so fast? When you open a memory mapped file with Vaex, there is actually no data reading going on. Vaex only reads the file metadata
Vaex only reads the file metadata:
When filtering a Vaex DataFrame no copies of the data are made. Instead only a reference to the original object is created, on which a binary mask is applied
Filtering Vaex DataFrame works on reference to the original data, saving lots of RAM
If you are interested in exploring the dataset used in this article, it can be used straight from S3 with Vaex. See the full Jupyter notebook to find out how to do this.
Example of EDA in Vaex ---> Jupyter Notebook
Vaex is an open-source DataFrame library which enables the visualisation, exploration, analysis and even machine learning on tabular datasets that are as large as your hard-drive. To do this, Vaex employs concepts such as memory mapping, efficient out-of-core algorithms and lazy evaluations.
Vaex - library to manage as large datasets as your HDD, thanks to:
All wrapped in a Pandas-like API
The first step is to convert the data into a memory mappable file format, such as Apache Arrow, Apache Parquet, or HDF5
Before opening data with Vaex, we need to convert it into a memory-mappable file format (e.g. Apache Arrow, Apache Parquet or HDF5). This way, 100 GB of data can be loaded in Vaex in 0.052 seconds!
Example of converting CSV ---> HDF5.
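A minimal sketch of that workflow (the file and column names are hypothetical; vaex.from_csv with convert=True writes an HDF5 file next to the CSV):
import vaex

# One-off conversion: chunked CSV read, written out as a memory-mappable HDF5 file
df = vaex.from_csv('big_file.csv', convert=True, chunk_size=5_000_000)

# Later opens are near-instant: only metadata is read, the data is memory-mapped
df = vaex.open('big_file.csv.hdf5')

# Filtering makes no copy - just a reference to the data plus a binary mask
df_filtered = df[df.some_column > 0]

# Virtual column: stores the expression, evaluated lazily only when needed
df['ratio'] = df.some_column / df.another_column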
The describe method nicely illustrates the power and efficiency of Vaex: all of these statistics were computed in under 3 minutes on my MacBook Pro (15", 2018, 2.6GHz Intel Core i7, 32GB RAM). Other libraries or methods would require either distributed computing or a cloud instance with over 100GB to perform the same computations.
Possibilities of Vaex
git config --global alias.s status
Replace git status with git s:
git config --global alias.s status
It will modify the config in the .gitconfig file.
Other set of useful aliases:
[alias]
s = status
d = diff
co = checkout
br = branch
last = log -1 HEAD
cane = commit --amend --no-edit
lo = log --oneline -n 10
pr = pull --rebase
You can apply them (^) with:
git config --global alias.s status
git config --global alias.d diff
git config --global alias.co checkout
git config --global alias.br branch
git config --global alias.last "log -1 HEAD"
git config --global alias.cane "commit --amend --no-edit"
git config --global alias.pr "pull --rebase"
git config --global alias.lo "log --oneline -n 10"
alias g=git
alias g=git
This command will let you type g s in your shell to check git status
The best commit messages I’ve seen don’t just explain what they’ve changed: they explain why
Proper commits:
If you use practices like pair or mob programming, don't forget to add your coworkers' names in your commit messages
It's good to give a shout-out to developers who collaborated on the commit. For example:
$ git commit -m "Refactor usability tests.
>
>
Co-authored-by: name <name@example.com>
Co-authored-by: another-name <another-name@example.com>"
I'm fond of the gitmoji commit convention. It relies on categorizing commits using emojis. I'm a visual person so it fits me well, but I understand this convention is not made for everyone.
You can add gitmojis (emojis) in your commits, such as:
:recycle: Make core independent from the git client (#171)
:whale: Upgrade Docker image version (#167)
which will transfer on GitHub/GitLab to:
♻️ Make core independent from the git client (#171)
🐳 Upgrade Docker image version (#167)
1) Separate subject from body with a blank line
2) Limit the subject line to 50 characters
3) Capitalize the subject line
4) Do not end the subject line with a period
5) Use the imperative mood in the subject line
6) Wrap the body at 72 characters
7) Use the body to explain what and why vs. how
7 rules of good commit messages.
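A made-up example that follows those rules (imperative, capitalized, under-50-character subject, blank line, body wrapped at 72 characters and explaining what and why):
Fix crash when config file is missing

Previously the app assumed ~/.apprc always existed and crashed with a
FileNotFoundError on first run. Fall back to built-in defaults instead,
and log a warning so users know which configuration is in effect.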
Don’t commit directly to the master or development branches.
Don’t hold up work by not committing local branch changes to remote branches.
Never commit application secrets in public repositories.
Don’t commit large files in the repository. This will increase the size of the repository. Use Git LFS for large files. Learn more about what Git LFS is and how to utilize it in this advanced Learning Git with GitKraken tutorial.
Don’t create one pull request addressing multiple issues.
Don’t work on multiple issues in the same branch. If a feature is dropped, it will be difficult to revert changes.
Don’t reset a branch without committing/stashing your changes. If you do so, your changes will be lost.
Don’t do a force push until you’re extremely comfortable performing this action.
Don’t modify or delete public history.
Git Don'ts
Create a Git repository for every new project. Learn more about what a Git repo is in this beginner Learning Git with GitKraken tutorial.
Always create a new branch for every new feature and bug.
Regularly commit and push changes to the remote branch to avoid loss of work.
Include a gitignore file in your project to avoid unwanted files being committed.
Always commit changes with a concise and useful commit message.
Utilize git-submodule for large projects.
Keep your branch up to date with development branches.
Follow a workflow like Gitflow. There are many workflows available, so choose the one that best suits your needs.
Always create a pull request for merging changes from one branch to another. Learn more about what a pull request is and how to create them in this intermediate Learning Git with GitKraken tutorial.
Always create one pull request addressing one issue.
Always review your code once by yourself before creating a pull request.
Have more than one person review a pull request. It’s not necessary, but is a best practice.
Enforce standards by using pull request templates and adding continuous integrations. Learn more about enhancing the pull request process with templates.
Merge changes from the release branch to master after each release.
Tag the master sources after every release.
Delete branches if a feature or bug fix is merged to its intended branches and the branch is no longer required.
Automate general workflow checks using Git hooks. Learn more about how to trigger Git hooks in this intermediate Learning Git with GitKraken tutorial.
Include read/write permission access control to repositories to prevent unauthorized access.
Add protection for special branches like master and development to safeguard against accidental deletion.
Git Dos
To add the .gitattributes to the repo, first you need to create a file called .gitattributes in the root folder of the repo.
With such a content of .gitattributes:
*.js eol=lf
*.jsx eol=lf
*.json eol=lf
the end of line will be the same for everyone
On a Windows machine the default line ending is a Carriage Return Line Feed (CRLF), whereas on Linux/macOS it's a Line Feed (LF).
That is why you might want to use .gitattributes to prevent such differences.
On a Windows machine, if the endOfLine property is set to lf:
{
"endOfLine": "lf"
}
On the Windows machine the developer will encounter linting issues from prettier:

These commands will update the files in the repo using the newly defined line ending, as specified in the .gitattributes.
Use these lines to update the current repo files:
git rm --cached -r .
git reset --hard