This might be super basic, but... assume positive intent. Your parent is not your enemy. Your teacher is not your enemy. Your boss is not your enemy. The other team at work is not your enemy.
Idea that changed your life: assuming positive intent
We present results from technical experiments which reveal that WeChat communications conducted entirely among non-China-registered accounts are subject to pervasive content surveillance that was previously thought to be exclusively reserved for China-registered accounts.
WeChat tracks not only Chinese accounts
“So write with a combination of short, medium, and long sentences. Create a sound that pleases the reader’s ear. Don’t just write words. Write music.”
Try using fewer commas and more periods. Make your sentences simpler and vary their lengths, and your writing will be like music
You can create estimation plots here at estimationstats.com, or with the DABEST packages which are available in R, Python, and Matlab.
You can create estimation plots with:
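For example, a minimal sketch using the Python DABEST package (the column names and random data here are made up; the API follows the dabest-python docs):

import numpy as np
import pandas as pd
import dabest

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Control": rng.normal(0.0, 1.0, 50),  # made-up observations
    "Test": rng.normal(0.5, 1.0, 50),
})

# Declare which two groups to compare...
two_groups = dabest.load(df, idx=("Control", "Test"))

# ...and draw a Gardner-Altman estimation plot of the mean difference
two_groups.mean_diff.plot()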
Relative to conventional plots, estimation plots offer five key benefits:
Estimation plots > bars-and-stars or boxplot & P.
They:
For comparisons between 3 or more groups that typically employ analysis of variance (ANOVA) methods, one can use the Cumming estimation plot, which can be considered a variant of the Gardner-Altman plot.
Cumming estimation plot
Efron developed the bias-corrected and accelerated bootstrap (BCa bootstrap) to account for the skew whilst obtaining the central 95% of the distribution.
Bias-corrected and accelerated bootstrap (BCa bootstrap) deals with skewed sample distributions. However, note that it "may not give very accurate coverage in a small-sample non-parametric situation" (simply put: take caution with small datasets)
We can calculate the 95% CI of the mean difference by performing bootstrap resampling.
Bootstrap - simple but powerful technique that creates multiple resamples (with replacement) from a single set of observations, and computes the effect size of interest on each of these resamples. It can be used to determine the 95% CI (Confidence Interval).
We can use bootstrap resampling to obtain a measure of precision and confidence about our estimate. It gives us 2 important benefits:
Bootstrap resampling can be used for such an example:
Computers can easily perform 5000 resamples:
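A minimal sketch of the procedure in plain NumPy (the group names, sizes, and values are made up for illustration):

import numpy as np

rng = np.random.default_rng(42)
control = rng.normal(0.0, 1.0, 40)    # made-up observations
treatment = rng.normal(0.6, 1.0, 40)

# 5000 resamples with replacement, computing the mean difference each time
boot_diffs = np.array([
    rng.choice(treatment, treatment.size, replace=True).mean()
    - rng.choice(control, control.size, replace=True).mean()
    for _ in range(5000)
])

# The central 95% of the bootstrap distribution gives a simple percentile
# 95% CI (the BCa bootstrap would additionally correct for skew)
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])
print(f"mean difference: {treatment.mean() - control.mean():.2f}")
print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")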
Shown above is a Gardner-Altman estimation plot.
Gardner-Altman estimation plot shows all the relevant information:
Jitter plots avoid overlapping datapoints (i.e. datapoints with the same y-value) by adding a random offset to each point along the orthogonal x-axis.
Jitter plots display all datapoints but might not accurately depict the underlying distribution of the data:
Unfortunately, the boxplot still doesn't show all our data.
Boxplots may be better than barplots (they introduce medians, quartiles, minima and maxima), but they still don't show all the information:
The barplot has several shortcomings, even though its use in academic journals is endemic.
Barplots are not the best choice for data visualisation:
Continuous Delivery or Deployment is about running checks as thorough as you can to catch issues in your code. Completeness of the checks is the most important factor. It is usually measured in terms of code coverage or functional coverage of your tests. Catching errors early prevents broken code from being deployed to any environment and saves the precious time of your test team.
Continuous Delivery or Deployment (quick summary)
Continuous Integration is a trade-off between the speed of the feedback loop to developers and the relevance of the checks you perform (build and test). No code that would impede the team's progress should make it to the main branch.
Continuous Integration (quick summary)
A good CD build:
- Ensures that as many features as possible are working properly
- The faster the better, but it is not a matter of speed; a 30-60 minute build is OK
Good CD build
A good CI build:
- Ensures that no code that breaks basic stuff and prevents other team members from working is introduced to the main branch
- Is fast enough to provide feedback to developers within minutes, to prevent context switching between tasks
Good CI build
The idea of Continuous Delivery is to prepare artefacts as close as possible to what you want to run in your environment. These can be jar or war files if you are working with Java, or executables if you are working with .NET. These can also be folders of transpiled JS code or even Docker containers - whatever makes deploys shorter (i.e. you have pre-built in advance as much as you can).
Idea of Continuous Delivery
Continuous Delivery is about being able to deploy any version of your code at all times. In practice it means the latest or next-to-latest version of your code.
Continuous Delivery
Continuous Integration is not about tools. It is about working in small chunks and integrating your new code to the main branch and pulling frequently.
Continuous Integration is not about tools
- The app should build and start
- Most critical features should be functional at all times (user signup/login journey and key business features)
- Common layers of the application that all the developers rely on should be stable; this means unit tests on those parts
Things to be checked by Continuous Integration
Continuous Integration is all about preventing the main branch from being broken so your team is not stuck. That’s it. It is not about having all your tests green all the time and the main branch deployable to production at every commit.
Continuous Integration prevents other team members from wasting time by pulling faulty code
If knowledge of one item makes it harder to remember another item, we have a case of memory interference.
Memory interference - once it happens you need to detect and eliminate it
Here again are the twenty rules of formulating knowledge.
If you see a word and immediately check its translation, you'll hardly memorize it at all. If you try your best to recall what the word means before checking the translation, the chances of memorization are much better. It is much easier to memorize a word's meaning when you know how to pronounce it correctly, so get the pronunciation right.
2 important tips for learning languages:
Somewhere between too hard and too easy, there’s a sweet-spot where reviews are challenging enough to hold your interest, but not so hard that it feels like torture. When the challenge of reviews is just right, you’ll actually get a sense of accomplishment and a little jolt of dopamine as you do them. Our brains actually enjoy challenges as long as they aren’t too hard or too easy. As I see it, this level of challenge is where you want to be.
The sweet spot is between 80 - 90% of right answers
Researchers have found that reviews are more effective when they’re difficult. That is, if you have to work at remembering a card, it’ll have a stronger effect on your memory. The harder a review is, the more it boosts your memory. This is called “desirable difficulty” in the literature.
Desirable difficulty
Another variable - "Recency", i.e., information about how long ago the customer last shopped at the store.
To calculate RFM we need the recency value. First, we treat the most recent transaction in the data as "today", and then find the latest transaction of each client:
today = df.InvoiceDate.max()  # treat the newest transaction date as "today"
df['Recency'] = (today - df.InvoiceDate)/np.timedelta64(1,'D')  # age of each purchase in days
Calculating frequency and aggregating data of each client may be done with the groupby method:
abt = df.groupby(['CustomerID']).agg({'Recency':'min', 'MonetaryValue':'sum', 'InvoiceNo':'count'})
Lastly, we can update the column names and display the RFM data:
abt = df.groupby(['CustomerID']).agg({'Recency':'min', 'MonetaryValue':'sum', 'InvoiceNo':'count'})
abt.rename(columns = {'InvoiceNo':'Frequency'}, inplace = True)
abt = abt[['Recency', 'Frequency', 'MonetaryValue']]
abt.head()
I will score customers using the criteria assumed by the RFM method.
RFM is a method used for analyzing customer value:
In the RFM method we usually analyse only the last 12 months, since this is the most relevant data for our products.
We also have other RFM variants:
I remove the missing values in the "CustomerID" variable.
Deleting rows where CustomerID is null:
df = df[~df.CustomerID.isnull()]
Assigning different data types to columns:
df['CustomerID'] = df.CustomerID.astype(int)
Deleting irrelevant columns:
df.drop(['Description', 'StockCode', 'Country'], axis = 1, inplace = True)
I check for missing data.
Checking the % of missing data:
print(str(round(df.isnull().any(axis=1).sum()/df.shape[0]*100,2))+'% of observations contain missing data.')
Sample output:
24.93% of observations contain missing data.
I build a simple data frame with basic information about the dataset.
Building a simple dataframe with a summary of our columns (data types, sum and % of nulls):
summary = pd.DataFrame(df.dtypes, columns=['Dtype'])
summary['Nulls'] = pd.DataFrame(df.isnull().any())
summary['Sum_of_nulls'] = pd.DataFrame(df.isnull().sum())
summary['Per_of_nulls'] = round((df.apply(pd.isnull).mean()*100),2)
summary.Dtype = summary.Dtype.astype(str)
print(summary)
the output:
Dtype Nulls Sum_of_nulls Per_of_nulls
InvoiceNo object False 0 0.000
StockCode object False 0 0.000
Description object True 1454 0.270
Quantity int64 False 0 0.000
InvoiceDate datetime64[ns] False 0 0.000
UnitPrice float64 False 0 0.000
CustomerID float64 True 135080 24.930
Country object False 0 0.000
An animated step-by-step guide to improving your pie charts.
How to improve 3D pie charts (turn them back into horizontal bar graphs) ;)
If you, on the other hand, want to go the student-like route (living in Wohngemeinschaft, not eating out too much) and try to save, you can easily live on 1,500-2,000 CHF per month and save the majority of your salary.
If you live cheaply, you can spend around 1,500-2,000 CHF a month and save the majority of your salary
We are now cooperating with Credit Agricole Bank and Revolut - if you have already moved to Switzerland you can open a free bank account and get 100 CHF bonus - email us to get the bonus code.
100 CHF bonus for opening a bank account in Switzerland
120,000 CHF annually according to this calculator gets you 7,746.20 CHF net per month.
120 000 CHF gets you around 7 746 CHF net per month
2) Rent only a room - it might be a good option if you come without family (in Switzerland it’s called living in a Wohngemeinschaft).
Renting a room in Switzerland = Living in a Wohngemeinschaft :o
Choose health insurance (Krankenkasse) - in Switzerland you have to pay your health insurance separately (it’s not deducted from your salary). You can use the Comparis website to compare the options. You have 3 months to choose both the company and your franchise.
Choosing health insurance in Switzerland
Other important things - if you plan to use public transport, we recommend you to buy the Half Fare card. It gives you a 50% discount on most public transport in Switzerland (it costs 185 CHF per year).
Recommendation to buy a Half Fare Card for a public transport discount
There are also some general expat groups like Zurich Together
Zurich Together <--- expat group for Zurich
The WHERE clause introduces a condition on individual rows; the HAVING clause introduces a condition on aggregations, i.e. results of a selection where a single result, such as count, average, min, max, or sum, has been produced from multiple rows. Your query calls for the second kind of condition (i.e. a condition on an aggregation), hence HAVING works correctly. As a rule of thumb, use WHERE before GROUP BY and HAVING after GROUP BY. It is a rather primitive rule, but it is useful in more than 90% of the cases.
Difference between WHERE and HAVING in SQL: WHERE is used for individual rows; HAVING is used for aggregations (results of a selection), such as COUNT, AVERAGE, MIN, MAX or SUM. Use WHERE before GROUP BY and HAVING after GROUP BY (works in 90% of the cases).
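A self-contained illustration of the rule, using Python's built-in sqlite3 (the table and data are made up for the example):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 50), ("alice", 200), ("bob", 20), ("bob", 30), ("carol", 500)],
)

# WHERE filters individual rows (before grouping);
# HAVING filters the aggregated groups (after grouping)
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 25          -- drops single small orders
    GROUP BY customer
    HAVING SUM(amount) > 100   -- keeps only big-spending customers
""").fetchall()
print(rows)  # e.g. [('alice', 250.0), ('carol', 500.0)]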
Scrum means that “you have to get certain things done with those two weeks.” Kanban means “do what you can do in two weeks.”
If you get a choice, push for Kanban over Scrum
What people will say is that estimates are for planning – that their purpose is to figure out how long some piece of work is going to take, so that everybody can plan accordingly. In all my five years shipping stuff, I can only recall one project where things really worked that way.
Project estimations are just energy drainers and stress producers
Be explicit about the difference between hard deadlines
Different types of deadlines:
If you delegate all your IT security to InfoSec, they will come up with draconian rules
Try to do some of your own security before delegating everything to InfoSec, which will come with draconian restrictions
you should always advocate for having a dedicated SRE if there’s any real risk of after-hours pages that are out of your control.
Dedicated Site Reliability Engineers (ideally in different time zones) should be in place whenever we might expect after-hours errors
I try to write a unit test any time the expected value of a defect is non-trivial.
Write unit tests at least for the most important parts of code, but every chunk of code should have a trivial unit test around it – this verifies that the code is written in a testable way, which indeed is extremely important
I’m defining an integration test as a test where you’re calling code that you don’t own
When to write integration tests:
Which database technology to choose
Which database to choose (advice from an Amazon employee):
I would use a serverless function when I have a relatively small and simple chunk of code that needs to run every once in a while.
When to make a serverless function (advice from an Amazon employee)
Programming languages. These will probably expose my ignorance pretty nicely.
When to use different programming languages (advice from an Amazon employee):
Now, a couple of years later, my guidelines for JS are:
Advice on using JavaScript from a long-time programmer:
Consuming media (books, blogs, whatever) is not inherently a compounding thing. Only if you have some kind of method to reflect, to digest, to incorporate your knowledge into your thoughts. If there is anything valuable in this post, you, reader, will probably not benefit from it unless you do something active to “process” it immediately.
Consuming books/blogs is not as compounding as we think
if you have an amazing manager at a shit company you’ll still have a shit time. In some ways, it’ll actually be worse. If they’re good at their job (including retaining you), they’ll keep you at a bad company for too long. And then they’ll leave, because they’re smart and competent. Maybe they’ll take you with them.
Danger of working with a great manager at a shit company
Some of the people in the company are your friends in the current context. It’s like your dorm in college.
"Company is like a college dorm"... interesting comparison
It’s also okay to take risks. Staying at a company that’s slowly dying has its costs too. Stick around too long and you’ll lose your belief that you can build, that change is possible. Try not to learn the wrong habits.
Cons of staying too long in the same company
I do think when a lot of managers realized they’ve hit their peak or comfort level, they then start to focus on playing politics instead of delivering results to hold onto their position. These are also the kind of managers who would only hire people less capable than them, for fear of being replaced.
The way corporate world works
“In practice people gravitate to, hire and promote individuals they like to be around, not people who demand accountability.”
Everybody likes having an agreeable and flattering person around them
Dr. Peter advises creative incompetence — pretending to be incompetent but doing it in an area or manner where it doesn’t actually impair your work.
Creative incompetence
Dr. Peter also surmised that “super competent” people tend to “disrupt the hierarchy.” I suppose that’s a nice way of saying you’ve made your boss look bad by being more capable. In such situations, you’ll probably find yourself deliberately suppressed or edged out sooner or later - for some stupid reason or blame pushing.
Being overly competent may get you fired
So if you’re a highly competent and aggressive individual, it’s best you find yourself a job in a startup, be an entrepreneur, or work in a company that needs turning around.
Advice to competitive workers
Dr. Peter also had another interesting theory about getting promoted. He considered working hard and improving your skill sets not as effective as something called pull promotion. That’s when you get promoted - faster than usual - when a mentor or patron pulls you up. No wonder there’s so much butt kissing in the corporate world. They must have read Dr. Peter’s research from the ‘60s.
Pull promotion
competency doesn’t factor as much as likability in most corporate promotions, especially when the ship is smooth sailing.
Another truth of the corporate world
Find a results-oriented job if you’re fiercely independent and opinionated. Climb the ladder in a big corporation if you’re highly diplomatic or a crowd-pleaser.
Advice for two different working profiles
[The reason] managers fail to see and address this problem is that they are used to looking at communication and assuming it's a good thing, because they see activity
Managers in general perceive meetings as a good thing
A study conducted by Gloria Mark, a Professor of Informatics at the University of California, revealed that it takes us an average of 23 minutes and 15 seconds to refocus on a task after an interruption, and even when we do, we experience a decrease in productivity
23 minutes and 15 seconds - average time to refocus on task after an interruption
It doesn't mean that we ignore all messages and only look up from our work when something is on fire – but the general expectation is that it's okay to not be immediately available to your teammates when you are focusing on your work
One of the rules of "Office time"
Working in an open office renders us even more vulnerable
Like the standup meeting, the open office doesn't improve the productivity of makers
Office hours are chunks of time that makers set aside for meetings, while the rest of the time they are free to go into a Do Not Disturb mode
"Office hours" - technique to improve makers schedule
People think it’s efficient to distribute information all at the same time to a bunch of people around a room. But it’s actually a lot less efficient than distributing it asynchronously by writing it up and sending it out and letting people absorb it when they’re ready to so it doesn’t break their days into smaller bits.”
Async > meetings
it's a matter of culture. None of these rules would work if the management fails to see that makers need to follow a different schedule
Change in the work environment needs the acknowledgement of managers
context switching between communication and creative work only kills the quality of both
Context switching lowers the quality
since most powerful people operate on the manager schedule, they're in a position to force everyone to adapt to their schedule
Managers highly affect makers' schedules
The most straightforward way to address this is to build a team knowledge base. Not only does that minimize the number of repetitive questions bounced around the office, it allows new team members to basically onboard themselves.
Building a team knowledge base
almost no organizations today support maker schedules
Unfortunate truth
For managers, interruptions in the form of meetings, phone calls, and Slack notifications are normal. For someone on the maker schedule, however, even the slightest distraction can have a disruptive effect
How the ideal schedule should look:
Immediate response becomes the implicit expectation, with barely any barriers or restrictions in place
Why Slack is a great distraction:
in the absence of barriers convenience always wins
In our experience, the best way to prevent a useless meeting is to write up our goals and thoughts first. Despite working in the same office, our team at Nuclino has converted nearly all of our meetings into asynchronously written reports.
Weekly status report (example):
For many data scientists, the finished product of a work session is a business analysis. They need to show team members—who oftentimes aren’t technical—how their data became a specific recommendation or insight.
The usual final product in Data Science is a business analysis, which is perfectly explained with notebooks
Work never ends. No matter how much you get done there will always be more. I see a lot of colleagues burn out because they think their extra effort will be noticed. Most managers appreciate it but do not promote their employees.
Common reality of overworking
Don't focus too much on the salary. It's just one tiny part of the whole package. Your dev job pays your rent, food and savings. I assume that most dev jobs do this quite well. Beyond this, the main goal of a job is to increase your future market value, your professional network and to have fun. So basically it's about how much you are worth in your next job and that you enjoy your time. A high salary doesn't help you if you do stuff which doesn't matter in a few years.
Don't focus on the salary in your dev job.
COVID-19 has spurred a shift to analyze things like supply chain disruptions, speech analytics, and filtering out pandemic-related behavior, such as binge shopping, Burtch Works says. Data teams are being asked to create new simulations for the coming recession, to create scorecards to track pandemic-related behavior, and add COVID-19 control variables to marketing attribution models, the company says.
How COVID-19 impacts activities of data positions
Data scientists and data engineers have built-in job security relative to other positions as businesses transition their operations to rely more heavily on data, data science, and AI. That’s a long-term trend that is not likely to change due to COVID-19, although momentum had started to slow in 2019 as venture capital investments ebbed.
According to a Dice Tech Jobs report released in February, demand for data engineers was up 50% and demand for data scientists was up 32% in 2019 compared to the prior year.
Need for Data Scientist / Engineers in 2019 vs 2018
- 70% async using Twist, Github, Paper
- 25% sync using something like Zoom, Appear.in, or Google Meet
- 5% physical meetings, e.g., annual company or team retreats
Currently applied work structure at Doist
According to the Harvard Business Review article “Collaborative Overload”, the time employees spend on collaboration has increased by 50% over the past two decades. Researchers found it was not uncommon for workers to spend a full 80% of their workdays communicating with colleagues in the form of email (on which workers’ spend an average of six hours a day); meetings (which fill up 15 percent of a company’s time, on average); and more recently instant messaging apps (the average Slack user sends an average of 200 messages a day, though 1,000-message power users are “not the exception”)
Time spent in the office
we think the async culture is one of the core reasons why most of the people we’ve hired at Doist the past 5 years have stayed with us. Our employee retention is 90%+ — much higher than the overall tech industry. For example, even a company like Google — with its legendary campuses full of perks from free meals to free haircuts — has a median tenure of just 1.1 years. Freedom to work from anywhere at any time beats fun vanity perks any day, and it costs our company $0 to provide
Employee retention rate at Doist vs Google
I also recently took about 10 months off of work, specifically to focus on learning. It was incredible, and I don’t regret it financially. I would often get up at 6 in the morning or even earlier (which I never do) just from excitement about what I was going to learn about and accomplish in the day. Spending my time focused Only on what I was most interested in was incredibly rewarding.
Approach of taking 10 months off from work just to learn something new
I'm working for myself right now, but if one day I needed to go get a full-time job again, I would almost certainly not go to big tech again. I'd rather get paid a fifth of what I was doing, but do something that leaves me with some energy after I put in a day's work
Reflections after working for FAANG
more money comes at the cost of very high expectations and brutal deadlines
Second, in my experience working with ex-FAANG - these engineers, while they all tend to be very smart, tend to be borderline junior engineers in the real world. They simply don't know how to build or operate something without the luxury of the mature tooling that exists at FAANG. You may be in for a reality shock when you leave the FAANG bubble
Working with engineers out of FAANG can be surprising
Things to consider when crafting Job Ads and descriptions
Truth be told, we found that most companies we worked with preferred to own the analytical backend.
From the experience of Plotly Team
Talented people flock to employers that promise to invest in their development whether they will stay at the company or not.
Cannot agree more on that
We want to learn, but we worry that we might not like what we learn. Or that learning will cost us too much. Or that we will have to give up cherished ideas.
I believe it is normal to worry about taking in knowledge from a new domain
The two things I really like about working for smaller places or starting a company is you get very direct access to users and customers and their problems, which means you can actually have empathy for what's actually going on with them, and then you can directly solve it. That cycle is so powerful, the sooner you learn how to make that cycle happen in your career, the better off you'll be. If you can make software and make software for other people, the outcome truly is hundreds of millions of dollars worth of value if you get it right. That's where I'm here to try and encourage you to do. I'm not really saying that you shouldn't go work at a big tech company. I am saying you should probably leave before it makes you soft.
What are the benefits of working at the smaller companies/startups over the tech giants
Afternoons are spent reading/researching/taking online classes. This has really helped me avoid burnout. I go into the weekend less exhausted and more motivated to return on Monday and implement new stuff. It has also helped generate some inspiration for weekend/personal projects.
Learning at work as a solution to burnout and an inspiration for personal projects
When People Work Together
How to lay off your lovely co-workers
Working at Facebook - excellent knowledge of JS and React, managing an OSS project on GitHub, running a community, writing documentation and blog posts.
Trainings - good knowledge of JS and React, creating trainings (structure, exercises, etc.), teaching and passing on knowledge with ease, marketing, sales.
Startups - good knowledge of JS and React, working in a team, talking to clients, business analysis, delivering MVPs fast, working under stress and in odd time zones.
Examples of restructuring tasks into more precise actions:
Defining what “time well spent” means to you and making space for these moments is one of the greatest gifts you can make to your future self.
Think really well what "time well spent" means to you
Research shows that humans tend to do whatever it takes to keep busy, even if the activity feels meaningless to them. Dr Brené Brown from the University of Houston describes being “crazy busy” as a numbing strategy we use to avoid facing the truth of our lives.
People simply prefer to be busy
A few takeaways
Summarising the article:
Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true.
There is increasing concern that most current published research findings are false.
The probability that the research is true may depend on:
Research finding is less likely to be true when:
golden rule: If someone calls saying they’re from your bank, just hang up and call them back — ideally using a phone number that came from the bank’s Web site or from the back of your payment card.
Golden rule of talking to your bank
“When the representative finally answered my call, I asked them to confirm that I was on the phone with them on the other line in the call they initiated toward me, and so the rep somehow checked and saw that there was another active call with Mitch,” he said. “But as it turned out, that other call was the attackers also talking to my bank pretending to be me.”
Phishing situation scenario:
It is difficult to choose a typical reading speed; research has been conducted on various groups of people to get typical rates. What you regularly see quoted is: 100 to 200 words per minute (wpm) for learning, 200 to 400 wpm for comprehension.
On average people read:
DevOps tools enable DevOps in organizations
Common DevOps tools:
While talking about DevOps, three things are important: continuous integration, continuous deployment, and continuous delivery.
DevOps process
Basic prerequisites to learn DevOps
Basic prerequisites to learn DevOps:
DevOps benefits
DevOps benefits:
Operations in the software industry include administrative processes and support for both hardware and software for clients as well as internal to the company. Infrastructure management, quality assurance, and monitoring are the basic roles for operations.
Operations (1/2 of DevOps):
I set it with a few clicks at Travis CI, and by creating a .travis.yml file in the repo
You can set CI with a few clicks using Travis CI and creating a .travis.yml file in your repo:
language: node_js
node_js: node
before_script:
- npm install -g typescript
- npm install codecov -g
script:
- yarn lint
- yarn build
- yarn test
- yarn build-docs
after_success:
- codecov
Continuous integration makes it easy to check against cases when the code:
- does not work (but someone didn't test it and pushed haphazardly),
- works only locally, as it is based on local installations,
- works only locally, as not all files were committed.
CI - Continuous Integration helps to check the code when it :
In Python, when trying to do a dubious operation, you get an error pretty soon. In JavaScript… an undefined can fly through a few layers of abstraction, causing an error in a seemingly unrelated piece of code.
Undefined nature of JavaScript can hide an error for a long time. For example,
function add(a, b) { return + (a + b) }
add(2, 2)   // returns 4
add('2', 2) // returns 22: '2' + 2 concatenates to '22', then unary + coerces it to a number
will result in a number, but is it the same one?
With Codecov it is easy to make jest & Travis CI generate one more thing:
Codecov lets you generate a coverage score for your tests:
I would use ESLint in full strength, tests for some (especially end-to-end, to make sure a commit does not make project crash), and add continuous integration.
Advantage of tests
It is fine to start adding tests gradually, by adding a few tests to things that are the most difficult (ones you need to keep fingers crossed so they work) or most critical (simple but with many other dependent components).
Start small by adding tests to the most crucial parts
I found that the overhead to use types in TypeScript is minimal (if any).
In TypeScript, unlike in JS, we need to specify the types:
I need to specify types of input and output. But then I get speedup due to autocompletion, hints, and linting if for any reason I make a mistake.
In TypeScript, you spend a bit more time in the variable definition, but then autocompletion, hints, and linting will reward you. It also boosts code readability
TSDoc is a way of writing TypeScript comments where they’re linked to a particular function, class or method (like Python docstrings).
TSDoc <--- TypeScript comment syntax. You can create documentation with TypeDoc
ESLint does automatic code linting
ESLint <--- pluggable JS linter:
if (x = 5) { ... } // assignment instead of comparison - a typo-bug a linter will flag
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
According to Kernighan's Law, writing code is not as hard as debugging it
Write a new test and see the result. If you want to make it REPL-like, instead of writing console.log(x.toString()) use expect(x.toString()).toBe('') and you will directly get the result.
jest <--- interactive JavaScript (TypeScript and others too) testing framework. You can use it as a VS Code extension.
Basically, instead of console.log(x.toString()), you can use expect(x.toString()).toBe(''). Check this gif to understand it further
interactive notebooks fall short when you want to write bigger, maintainable code
Survey regarding programming notebooks:
Continuous Deployment is the next step. You deploy the most up-to-date and production-ready version of your code to some environment. Ideally production, if you trust your CD test suite enough.
Continuous Deployment
the limitations of the PPS
Limitations of the PPS:
Although the PPS has many advantages over the correlation, there is some drawback: it takes longer to calculate.
PPS is slower to calculate than correlation.
How to use the PPS in your own (Python) project
Using PPS with Python
pip install ppscore
import ppscore as pps
pps.score(df, "feature_column", "target_column")
pps.matrix(df)
The PPS clearly has some advantages over correlation for finding predictive patterns in the data. However, once the patterns are found, the correlation is still a great way of communicating found linear relationships.
PPS:
Let’s compare the correlation matrix to the PPS matrix on the Titanic dataset.
Comparing correlation matrix and the PPS matrix of the Titanic dataset:
findings about the correlation matrix: there is a relationship between TicketPrice and Class. For PPS, TicketPrice is a strong predictor of Class (0.9 PPS), but not the other way, Class to TicketPrice (a ticket of 5000-10000$ is most likely the highest class, but the highest class itself cannot determine the price)
findings about the PPS matrix: the best predictor of Survived is the column Sex (Sex was dropped for correlation); TicketID uncovers a hidden pattern, as well as its connection with the TicketPrice
Let’s use a typical quadratic relationship: the feature x is a uniform variable ranging from -2 to 2 and the target y is the square of x plus some error.
In this scenario:
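A sketch of this scenario in Python, reconstructed from the description above (not necessarily the author's exact code):

import numpy as np
import pandas as pd
import ppscore as pps

df = pd.DataFrame()
df["x"] = np.random.uniform(-2, 2, 10_000)
df["error"] = np.random.uniform(-0.5, 0.5, 10_000)
df["y"] = df["x"] * df["x"] + df["error"]

# Pearson correlation is close to 0: the relationship is not linear
print(df["x"].corr(df["y"]))

# PPS detects that x predicts y quite well, but not the reverse
print(pps.score(df, "x", "y")["ppscore"])
print(pps.score(df, "y", "x")["ppscore"])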
how do you normalize a score? You define a lower and an upper limit and put the score into perspective.
Normalising a score:
For a classification problem, always predicting the most common class is pretty naive. For a regression problem, always predicting the median value is pretty naive.
What is a naive model:
Let’s say we have two columns and want to calculate the predictive power score of A predicting B. In this case, we treat B as our target variable and A as our (only) feature. We can now calculate a cross-validated Decision Tree and calculate a suitable evaluation metric.
If the target (B) variable is:
- numeric, we treat it as a regression task (scored against the naive median predictor mentioned above)
- categorical, we treat it as a classification task (scored against always predicting the most common class)
More often, relationships are asymmetric
a column with 3 unique values will never be able to perfectly predict another column with 100 unique values. But the opposite might be true
there are many non-linear relationships that the score simply won’t detect. For example, a sine wave, a quadratic curve or a mysterious step function. The score will just be 0, saying: “Nothing interesting here”. Also, correlation is only defined for numeric columns.
Correlation:
Examples:
There are many types of CRDTs
CRDTs have different types, such as Grow-only set and Last-writer-wins register. Check more of them here
Some of our main takeaways:
- CRDT literature can be relevant even if you're not creating a decentralized system
- Multiplayer for a visual editor like ours wasn't as intimidating as we thought
- Taking time to research and prototype in the beginning really paid off
Key takeaways of developing a live editing tool
traditional approaches that informed ours — OTs and CRDTs
Traditional approaches of the multiplayer technology
CRDTs refer to a collection of different data structures commonly used in distributed systems. All CRDTs satisfy certain mathematical properties which guarantee eventual consistency. If no more updates are made, eventually everyone accessing the data structure will see the same thing. This constraint is required for correctness; we cannot allow two clients editing the same Figma document to diverge and never converge again
CRDTs (Conflict-free Replicated Data Types)
They’re a great way of editing long text documents with low memory and performance overhead, but they are very complicated and hard to implement correctly
Characteristics of OTs
Even if you have a client-server setup, CRDTs are still worth researching because they provide a well-studied, solid foundation to start with
CRDTs are worth studying for a good foundation
Figma’s multiplayer servers keep track of the latest value that any client has sent for a given property on a given object
✅ No conflict:
❎ Conflict:
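A toy sketch of this per-property last-writer-wins rule (a simplification for illustration, not Figma's actual code):

latest = {}  # (object_id, property) -> last value any client sent

def apply_update(object_id, prop, value):
    latest[(object_id, prop)] = value

# Two clients touch DIFFERENT properties of the same object:
# no conflict, both updates survive
apply_update("rect1", "fill", "red")   # client A
apply_update("rect1", "width", 100)    # client B

# Two clients touch the SAME property: conflict, and the last write wins
apply_update("rect1", "fill", "blue")  # client B's update arrives later
assert latest[("rect1", "fill")] == "blue"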
Figma doesn’t store any properties of deleted objects on the server. That data is instead stored in the undo buffer of the client that performed the delete. If that client wants to undo the delete, then it’s also responsible for restoring all properties of the deleted objects. This helps keep long-lived documents from continuing to grow in size as they are edited
Undo option
it's important to be able to iterate quickly and experiment before committing to an approach. That's why we first created a prototype environment to test our ideas instead of working in the real codebase
First work with a prototype, then the real codebase
Designers worried that live collaborative editing would result in “hovering art directors” and “design by committee” catastrophes.
Worries of using a live collaborative editing
We had a lot of trouble until we settled on a principle to help guide us: if you undo a lot, copy something, and redo back to the present (a common operation), the document should not change. This may seem obvious but the single-player implementation of redo means “put back what I did” which may end up overwriting what other people did next if you’re not careful. This is why in Figma an undo operation modifies redo history at the time of the undo, and likewise a redo operation modifies undo history at the time of the redo
How Undo/Redo works
operational transforms (a.k.a. OTs), the standard multiplayer algorithm popularized by apps like Google Docs. As a startup we value the ability to ship features quickly, and OTs were unnecessarily complex for our problem space
Operational Transforms (OT) are unnecessarily complex for problems unlike Google Docs
Every Figma document is a tree of objects, similar to the HTML DOM. There is a single root object that represents the entire document. Underneath the root object are page objects, and underneath each page object is a hierarchy of objects representing the contents of the page. This tree is presented in the layers panel on the left-hand side of the Figma editor.
Structure of Figma documents
When a document is opened, the client starts by downloading a copy of the file. From that point on, updates to that document in both directions are synced over the WebSocket connection. Figma lets you go offline for an arbitrary amount of time and continue editing. When you come back online, the client downloads a fresh copy of the document, reapplies any offline edits on top of this latest state, and then continues syncing updates over a new WebSocket connection
Offline editing isn't a problem, unlike online editing
An important consequence of this is that changes are atomic at the property value boundary. The eventually consistent value for a given property is always a value sent by one of the clients. This is why simultaneous editing of the same text value doesn’t work in Figma. If the text value is B and someone changes it to AB at the same time as someone else changes it to BC, the end result will be either AB or BC but never ABC
Consequence of approaches like last-writer-wins
We use a client/server architecture where Figma clients are web pages that talk with a cluster of servers over WebSockets. Our servers currently spin up a separate process for each multiplayer document which everyone editing that document connects to
How Figma approaches its client/server architecture
Sometimes it's interesting to explain some code (how much time do you spend trying to figure out a regex pattern when you see one?), but, 99% of the time, comments could be avoided.
Generally try to avoid (avoid != forbid) comments.
Comments:
When we talk about abstraction levels, we can classify code into 3 levels:
- high: getAdress
- medium: inactiveUsers = Users.findInactives
- low: .split(" ")
3 abstraction levels:
- high, like getAdress or searchForsomething()
- medium, like inactiveUsers = Users.findInactives or account.unverifyAccount
- low, like .split(" "), map, to_downcase and so on
The ideal is not to mix the abstraction levels in only one function.
Try not mixing abstraction levels inside a single function
There is also another maxim that says: you should write the same code a maximum of 3 times. The third time, you should consider refactoring and reducing duplication
Avoid repeating the same code over and over
Should be nouns, and not verbs, because classes represent concrete objects
Class names = nouns
Uncle Bob, in Clean Code, argues that the best order to write code is: write unit tests, create code that works, refactor to clean the code.
Best order to write code (according to Uncle Bob):
int d could be int days
When naming things, focus on giving meaningful names, that you can pronounce and are searchable. Also, avoid prefixes
Naming things, writing better functions and a little about comments. Next, I intend to talk about formatting, objects and data structures, how to handle errors, boundaries (how to deal with other people's code), unit testing and how to organize your classes better. I know that it'll be missing an important topic about code smells
Ideas to consider while developing clean code:
Should be verbs, and not nouns, because methods represent actions that objects must do
Method names = verbs
A way to decrease the switch/if/else count is to use polymorphism (see the sketch below)
It's better to avoid excessive switch/if/else statements
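A small sketch of the idea, using a hypothetical shapes example (Python):

# Before: every new shape forces another branch
def area_switch(shape):
    if shape["kind"] == "circle":
        return 3.14159 * shape["r"] ** 2
    elif shape["kind"] == "square":
        return shape["side"] ** 2
    # ...one more branch per new shape

# After: each class knows its own area, so no branching is needed
class Circle:
    def __init__(self, r): self.r = r
    def area(self): return 3.14159 * self.r ** 2

class Square:
    def __init__(self, side): self.side = side
    def area(self): return self.side ** 2

shapes = [Circle(2), Square(3)]
print([s.area() for s in shapes])  # [12.56636, 9]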
In the ideal world, functions should have 1 or 2 levels of indentation
Functions in the ideal world shouldn't be long
"The Big Picture" is one of those things that people say a whole lot but can mean so many different things. Going through all of these articles, they tend to mean any (or all) of these things
Thinking about The Big Picture:
Considering that there are still a ton of COBOL jobs out there, there is no particular technology that you need to know
Right, there is no specific need to learn that one technology
read Knuth, or Pragmatic Programming, or Clean Code, or some other popular book
Classic programming related books
Senior developers are more cautious, thoughtful, pragmatic, practical and simple in their approaches to solving problems.
Interesting definition of senior devs
In recent years we’ve also begun to see increasing interest in exploratory testing as an important part of the agile toolbox
Waterfall software development ---> agile ---> exploratory testing
When I began coding, around 30 years ago, waterfall software development was used nearly exclusively.
Mathematica didn’t really help me build anything useful, because I couldn’t distribute my code or applications to colleagues (unless they spent thousands of dollars for a Mathematica license to use it), and I couldn’t easily create web applications for people to access from the browser. In addition, I found my Mathematica code would often end up much slower and more memory hungry than code I wrote in other languages.
Disadvantages of Mathematica:
In the 1990s, however, things started to change. Agile development became popular. People started to understand the reality that most software development is an iterative process
a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably more fun to write than programs that are written only in a high-level language. The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer.
Literate programming described by Donald Knuth
Developing in the cloud
Well paid cloud platforms:
Finding a database management system that works for you
Well paid database technologies:
Here are a few very prominent technologies that you can look into and what impact each one might have on your salary
Other well paid frameworks, libraries and tools:
What programming language should I learn next?
Highest-paying programming languages:
Android and iOS
Payment for mobile OS:
First, you’ve spread the logic across a variety of different systems, so it becomes more difficult to reason about the application as a whole. Second, more importantly, the logic has been implemented as configuration as opposed to code. The logic is constrained by the ability of the applications which have been wired together, but it’s still there.
Why "no code" trend is dangerous in some way (on the example of Zapier):
Summary
In doing a code review, you should make sure that:
"Continuous Delivery is the ability to get changes of all types — including new features, configuration changes, bug fixes, and experiments — into production, or into the hands of users, safely and quickly in a sustainable way". -- Jez Humble and Dave Farley
Continuous Delivery
Another approach is to use a tool like H2O to export the model as a POJO in a JAR Java library, which you can then add as a dependency in your application. The benefit of this approach is that you can train the models in a language familiar to Data Scientists, such as Python or R, and export the model as a compiled binary that runs in a different target environment (JVM), which can be faster at inference time
H2O - export models trained in Python/R as a POJO in JAR
Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.
Continuous Delivery for Machine Learning (CD4ML) (long definition)
Basic principles:
In order to formalise the model training process in code, we used an open source tool called DVC (Data Science Version Control). It provides similar semantics to Git, but also solves a few ML-specific problems:
DVC - formalises the model training process in code.
Advantages:
Sometimes, the best way to learn is to mimic others. Here are some great examples of projects that use documentation well:
Examples of projects that use documentation well
(check the list below)
“Code is more often read than written.” — Guido van Rossum
Documenting code is describing its use and functionality to your users. While it may be helpful in the development process, the main intended audience is the users.
Documenting code:
Class method docstrings should contain the following:
- A brief description of what the method is and what it’s used for
- Any arguments (both required and optional) that are passed, including keyword arguments
- Labels for any arguments that are considered optional or have a default value
- Any side effects that occur when executing the method
- Any exceptions that are raised
- Any restrictions on when the method can be called
Class method should contain:
(check example below)
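A sketch of what such a docstring could look like, on a hypothetical Account class (Python):

class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount, currency="USD"):
        """Withdraw money from the account.

        Args:
            amount: Amount to withdraw; must be positive.
            currency: Optional ISO currency code. Defaults to "USD".

        Side effects:
            Decreases the account balance.

        Raises:
            ValueError: If amount exceeds the current balance.

        Note:
            Can only be called on an open account.
        """
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount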
Comments to your code should be kept brief and focused. Avoid using long comments when possible. Additionally, you should use the following four essential rules as suggested by Jeff Atwood:
Comments should be as concise as possible. Moreover, you should follow 4 rules of Jeff Atwood:
From examining the type hinting, you can immediately tell that the function expects the input name to be of a type str, or string. You can also tell that the expected output of the function will be of a type str, or string, as well.
Type hinting introduced in Python 3.5 extends 4 rules of Jeff Atwood and comments the code itself, such as this example:
def hello_name(name: str) -> str:
return(f"Hello {name}")
Docstrings can be further broken up into three major categories:
- Class Docstrings: class and class methods
- Package and Module Docstrings: packages, modules, and functions
- Script Docstrings: scripts and functions
3 main categories of docstrings
Each format makes tradeoffs in encoding, flexibility, and expressiveness to best suit a specific use case.
Each data format brings different tradeoffs:
Computers can only natively store integers, so they need some way of representing decimal numbers. This representation comes with some degree of inaccuracy. That's why, more often than not, .1 + .2 != .3
Computers have to make up their own way to store decimal numbers
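You can see it directly in a Python REPL:

>>> 0.1 + 0.2
0.30000000000000004
>>> 0.1 + 0.2 == 0.3
False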
Cross-platform development is now a standard because of the wide variety of architectures: mobile devices, cloud servers, embedded IoT systems. 20 years ago it was almost exclusively PCs.
A package management ecosystem is essential for programming languages now. People simply don’t want to go through the hassle of finding, downloading and installing libraries anymore. 20 years ago we used to visit web sites, downloaded zip files, copied them to correct locations, added them to the paths in the build configuration and prayed that they worked.
How library management changed in 20 years
IDEs and the programming languages are getting more and more distant from each other. 20 years ago an IDE was specifically developed for a single language, like Eclipse for Java, Visual Basic, Delphi for Pascal etc. Now, we have text editors like VS Code that can support any programming language with IDE like features.
How IDEs "unified" in comparison to the last 20 years
Your project has no business value today unless it includes blockchain and AI, although a centralized and rule-based version would be much faster and more efficient.
Comparing current project needs to those 20 years ago
Being a software development team now involves all team members performing a mysterious ritual of standing up together for 15 minutes in the morning and drawing occult symbols with post-its.
In comparison to 20 years ago ;)
I am not sure how but one kind soul somehow found the project, forked it, refactored it, "modernized" it, added linting, code sniffing, added CI and opened the pull request.
It's worth sharing your code, since someone can always find it and improve it, so that you can learn from it
Let's reason through our memoizer before we write any code.
Operations performed by a memoizer:
Which is written as:
// Takes a reference to a function
const memoize = func => {
// Creates a cache of results
const results = {};
// Returns a function
return (...args) => {
// Create a key for results cache
const argsKey = JSON.stringify(args);
// Only execute func if no cached value
if (!results[argsKey]) {
// Store function call result in cache
results[argsKey] = func(...args);
}
// Return cached value
return results[argsKey];
};
};
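A quick usage sketch for the memoizer above (the wrapped function is made up):

// Wrap an expensive function once
const slowSquare = n => {
  for (let i = 0; i < 1e7; i += 1) {} // simulate expensive work
  return n * n;
};
const fastSquare = memoize(slowSquare);

fastSquare(4); // computes and caches the result under the key "[4]"
fastSquare(4); // returns the cached value without calling slowSquare again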