  1. May 2020
    1. This might be super basic, but... assume positive intent. Your parent is not your enemy. Your teacher is not your enemy. Your boss is not your enemy. The other team at work is not your enemy.

      Idea that changed your life: assuming positive intent

    1. We present results from technical experiments which reveal that WeChat communications conducted entirely among non-China-registered accounts are subject to pervasive content surveillance that was previously thought to be exclusively reserved for China-registered accounts.

      WeChat not only tracks Chinese accounts

    1. So write with a combination of short, medium, and long sentences. Create a sound that pleases the reader’s ear. Don’t just write words. Write music.”

      Try using fewer commas and more periods. Keep your sentences simple, vary their lengths, and your writing will read like music

    1. Relative to conventional plots, estimation plots offer five key benefits:

      Estimation plots > bars-and-stars or boxplot & P.

      They:

      • avoid the false dichotomy
      • display all observed values
      • focus on intervention effect size
      • visualise estimate precision
      • show mean difference distribution
    2. Efron developed the bias-corrected and accelerated bootstrap (BCa bootstrap) to account for the skew whilst obtaining the central 95% of the distribution.

      Bias-corrected and accelerated bootstrap (BCa bootstrap) deals with skewed sample distributions. However, note that it "may not give very accurate coverage in a small-sample non-parametric situation" (simply said, take caution with small datasets)

    3. We can calculate the 95% CI of the mean difference by performing bootstrap resampling.

      Bootstrap - simple but powerful technique that creates multiple resamples (with replacement) from a single set of observations, and computes the effect size of interest on each of these resamples. It can be used to determine the 95% CI (Confidence Interval).

      We can use bootstrap resampling to obtain a measure of precision and confidence about our estimate. It gives us 2 important benefits:

      1. Non-parametric statistical analysis - no need to assume normal distribution of our observations. Thanks to the Central Limit Theorem, the resampling distribution of the effect size will approach normality
      2. Easy construction of the 95% CI from the resampling distribution. For 1000 bootstrap resamples of the mean difference, 25th value and 975th value can be used as boundaries of the 95% CI.

      Computers can easily perform 5000 such resamples, as in the sketch below.
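
      A minimal sketch of the percentile method in Python (the sample here is made up for illustration):

      import numpy as np

      rng = np.random.default_rng(0)
      observations = rng.normal(loc=10, scale=2, size=50)  # hypothetical sample

      # Draw 5000 bootstrap resamples (with replacement) and compute each mean
      boot_means = [
          rng.choice(observations, size=len(observations), replace=True).mean()
          for _ in range(5000)
      ]

      # The central 95% of the resampling distribution forms the percentile 95% CI
      ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
      print(f"mean = {observations.mean():.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")

      For skewed samples, SciPy's scipy.stats.bootstrap(..., method='BCa') implements the bias-corrected and accelerated variant mentioned above.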

    1. Continuous Delivery/Deployment is about running checks as thorough as you can to catch issues in your code. Completeness of the checks is the most important factor. It is usually measured in terms of code coverage or functional coverage of your tests. Catching errors early prevents broken code from being deployed to any environment and saves the precious time of your test team.

      Continuous Delivery/Deployment (quick summary)

    2. Continuous Integration is a trade-off between the speed of the feedback loop to developers and the relevance of the checks you perform (build and test). No code that would impede the team's progress should make it to the main branch.

      Continuous Integration (quick summary)

    3. A good CI build:

      • Ensures no code that breaks basic stuff and prevents other team members from working is introduced to the main branch
      • Is fast enough to provide feedback to developers within minutes, to prevent context switching between tasks

      Good CI build

    4. The idea of Continuous Delivery is to prepare artefacts as close as possible to what you want to run in your environment. These can be jar or war files if you are working with Java, or executables if you are working with .NET. These can also be folders of transpiled JS code or even Docker containers, whatever makes the deploy shorter (i.e. you have pre-built in advance as much as you can).

      Idea of Continuous Delivery

    5. The app should build and start. Most critical features should be functional at all times (user signup/login journey and key business features). Common layers of the application that all the developers rely on should be stable. This means unit tests on those parts.

      Things to be checked by Continuous Integration

    6. Continuous Integration is all about preventing the main branch from being broken so your team is not stuck. That’s it. It is not about having all your tests green all the time and the main branch deployable to production at every commit.

      Continuous Integration prevents other team members from wasting time by pulling faulty code

    1. Here again are the twenty rules of formulating knowledge.
      1. Do not learn if you do not understand
      2. Learn before you memorize - build the picture of the whole before you dismember it into simple items in SuperMemo. If the whole shows holes, review it again!
      3. Build upon the basics - never jump both feet into a complex manual because you may never see the end. Well remembered basics will help the remaining knowledge easily fit in
      4. Stick to the minimum information principle - if you continue forgetting an item, try to make it as simple as possible. If it does not help, see the remaining rules (cloze deletion, graphics, mnemonic techniques, converting sets into enumerations, etc.)
      5. Cloze deletion is easy and effective - completing a deleted word or phrase is not only an effective way of learning. Most of all, it greatly speeds up formulating knowledge and is highly recommended for beginners
      6. Use imagery - a picture is worth a thousand words
      7. Use mnemonic techniques - read about peg lists and mind maps. Study the books by Tony Buzan. Learn how to convert memories into funny pictures. You won't have problems with phone numbers and complex figures
      8. Graphic deletion is as good as cloze deletion - obstructing parts of a picture is great for learning anatomy, geography and more
      9. Avoid sets - larger sets are virtually un-memorizable unless you convert them into enumerations!
      10. Avoid enumerations - enumerations are also hard to remember but can be dealt with using cloze deletion
      11. Combat interference - even the simplest items can be completely intractable if they are similar to other items. Use examples, context cues, vivid illustrations, refer to emotions, and to your personal life
      12. Optimize wording - like you reduce mathematical equations, you can reduce complex sentences into smart, compact and enjoyable maxims
      13. Refer to other memories - building memories on other memories generates a coherent and hermetic structure that forgetting is less likely to affect. Build upon the basics and use planned redundancy to fill in the gaps
      14. Personalize and provide examples - personalization might be the most effective way of building upon other memories. Your personal life is a gold mine of facts and events to refer to. As long as you build a collection for yourself, use personalization richly to build upon well established memories
      15. Rely on emotional states - emotions are related to memories. If you learn a fact in the state of sadness, you are more likely to recall it when you are sad. Some memories can induce emotions and help you employ this property of the brain in remembering
      16. Context cues simplify wording - providing context is a way of simplifying memories, building upon earlier knowledge and avoiding interference
      17. Redundancy does not contradict minimum information principle - some forms of redundancy are welcome. There is little harm in memorizing the same fact as viewed from different angles. Passive and active approach is particularly practicable in learning word-pairs. Memorizing derivation steps in problem solving is a way towards boosting your intellectual powers!
      18. Provide sources - sources help you manage the learning process, updating your knowledge, judging its reliability, or importance
      19. Provide date stamping - time stamping is useful for volatile knowledge that changes in time
      20. Prioritize - effective learning is all about prioritizing. In incremental reading you can start from badly formulated knowledge and improve its shape as you proceed with learning (in proportion to the cost of inappropriate formulation). If need be, you can review pieces of knowledge again, split it into parts, reformulate, reprioritize, or delete.
    1. If you see a word and immediately check its translation, you'll hardly memorize it at all. If you try your best to recall what the word means before checking the translation, the chances of memorization are much better. It is much easier to memorize a word's meaning when you know how to pronounce it correctly, so get it right.

      2 important tips for learning languages:

      • Before checking the translation, at least try to recall what the word means
      • Know how to pronounce the word while trying to memorise it
    1. Somewhere between too hard and too easy, there’s a sweet-spot where reviews are challenging enough to hold your interest, but not so hard that it feels like torture. When the challenge of reviews is just right, you’ll actually get a sense of accomplishment and a little jolt of dopamine as you do them. Our brains actually enjoy challenges as long as they aren’t too hard or too easy. As I see it, this level of challenge is where you want to be.

      The sweet spot is between 80% and 90% of right answers

    2. Researchers have found that reviews are more effective when they’re difficult. That is, if you have to work at remembering a card, it’ll have a stronger effect on your memory. The harder a review is, the more it boosts your memory. This is called “desirable difficulty” in the literature.

      Desirable difficulty

    1. Another variable, "Recency", tells how long ago the customer last shopped at the store.

      To calculate RFM we need the recency value. First, we treat the most recent transaction in the dataset as "today", and then find the latest transaction of each client:

      today = df.InvoiceDate.max()  # the most recent transaction acts as "today"
      df['Recency'] = (today - df.InvoiceDate)/np.timedelta64(1,'D')
      

      Calculating frequency and aggregating data of each client may be done with the groupby method:

      abt = df.groupby(['CustomerID']).agg({'Recency':'min', 'MonetaryValue':'sum', 'InvoiceNo':'count'})
      

      Lastly, we can update the column names and display the RFM data:

      abt = df.groupby(['CustomerID']).agg({'Recency':'min', 'MonetaryValue':'sum', 'InvoiceNo':'count'})
      abt.rename(columns = {'InvoiceNo':'Frequency'}, inplace = True)
      abt = abt[['Recency', 'Frequency', 'MonetaryValue']]
      abt.head()
      
    2. I will score customers using the criteria assumed by the RFM method.

      RFM is a method used for analyzing customer value:

      • Recency – How recently did the customer purchase?
      • Frequency – How often do they purchase?
      • Monetary Value – How much do they spend?

      In the RFM method we usually analyse only the last 12 months, since this is the most relevant data for our products.

      We also have other RFM variants:

      • RFD – Recency, Frequency, Duration
      • RFE – Recency, Frequency, Engagement
      • RFM-I – Recency, Frequency, Monetary Value – Interactions
      • RFMTC – Recency, Frequency, Monetary Value, Time, Churn rate
    3. I remove the missing values in the "CustomerID" variable.

      Deleting rows where value is null:

      df = df[~df.CustomerID.isnull()]
      

      Assigning different data types to columns:

      df['CustomerID'] = df.CustomerID.astype(int)
      

      Deleting irrelevant columns:

      df.drop(['Description', 'StockCode', 'Country'], axis = 1, inplace = True)
      
    4. I build a simple data frame with basic information about the dataset.

      Building a simple dataframe with a summary of our columns (data types, sum and % of nulls):

      summary = pd.DataFrame(df.dtypes, columns=['Dtype'])
      summary['Nulls'] = pd.DataFrame(df.isnull().any())
      summary['Sum_of_nulls'] = pd.DataFrame(df.isnull().sum())
      summary['Per_of_nulls'] = round((df.apply(pd.isnull).mean()*100),2)
      summary.Dtype = summary.Dtype.astype(str)
      print(summary)
      

      the output:

                            Dtype  Nulls  Sum_of_nulls  Per_of_nulls
      InvoiceNo            object  False             0         0.000
      StockCode            object  False             0         0.000
      Description          object   True          1454         0.270
      Quantity              int64  False             0         0.000
      InvoiceDate  datetime64[ns]  False             0         0.000
      UnitPrice           float64  False             0         0.000
      CustomerID          float64   True        135080        24.930
      Country              object  False             0         0.000
      
    1. If you, on the other hand, want to go the student-like route (living in Wohngemeinschaft, not eating out too much) and try to save, you can easily live on 1,500-2,000 CHF per month and save the majority of your salary.

      If you live cheaply, you can spend around 1,500-2,000 CHF a month and save the majority of your salary

    2. We are now cooperating with Credit Agricole Bank and Revolut - if you have already moved to Switzerland you can open a free bank account and get 100 CHF bonus - email us to get the bonus code.

      100 CHF bonus for opening a bank account in Switzerland

    3. Choose health insurance (Krankenkasse) - in Switzerland you have to pay your health insurance separately (it’s not deducted from your salary). You can use the Comparis website to compare the options. You have 3 months to choose both the company and your franchise.

      Choosing health insurance in Switzerland

    1. WHERE clause introduces a condition on individual rows; HAVING clause introduces a condition on aggregations, i.e. results of selection where a single result, such as count, average, min, max, or sum, has been produced from multiple rows. Your query calls for a second kind of condition (i.e. a condition on an aggregation) hence HAVING works correctly. As a rule of thumb, use WHERE before GROUP BY and HAVING after GROUP BY. It is a rather primitive rule, but it is useful in more than 90% of the cases.

      Difference between WHERE and HAVING in SQL:

      • WHERE is used for individual rows
      • HAVING is used for aggregations (results of a selection), such as: COUNT, AVG, MIN, MAX or SUM
      • Use WHERE before GROUP BY and HAVING after GROUP BY (works in 90% of the cases)
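
      A sketch of the rule in action, using Python's built-in sqlite3 and a made-up orders table:

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
      con.executemany("INSERT INTO orders VALUES (?, ?)",
                      [("ann", 10), ("ann", 250), ("bob", 40), ("bob", 5)])

      # WHERE filters individual rows (before grouping)
      print(con.execute("SELECT customer, SUM(amount) FROM orders "
                        "WHERE amount > 20 GROUP BY customer").fetchall())
      # [('ann', 250.0), ('bob', 40.0)]

      # HAVING filters aggregated groups (after grouping)
      print(con.execute("SELECT customer, SUM(amount) FROM orders "
                        "GROUP BY customer HAVING SUM(amount) > 100").fetchall())
      # [('ann', 260.0)]
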
    1. What people will say is that estimates are for planning – that their purpose is to figure out how long some piece of work is going to take, so that everybody can plan accordingly. In all my five years shipping stuff, I can only recall one project where things really worked that way.

      Project estimations are just energy drainers and stress producers

    2. Be explicit about the difference between hard deadlines

      Different types of deadlines:

      • Hard deadline - something seriously bad to the business will happen if the deadline isn’t met
      • Soft deadline - somebody will look bad if the deadline isn’t met
      • Internal deadline - target internal to the team that will not affect anybody outside of the team
      • Expected completion date - the date when the team currently predicts the work will be completed
    3. you should always advocate for having a dedicated SRE if there’s any real risk of after-hours pages that are out of your control.

      Site Reliability Engineers (ideally in different time zones) should be in place whenever after-hours errors can be expected

    4. I try to write a unit test any time the expected value of a defect is non-trivial.

      Write unit tests at least for the most important parts of code, but every chunk of code should have a trivial unit test around it – this verifies that the code is written in a testable way, which indeed is extremely important

    5. Programming languages These will probably expose my ignorance pretty nicely.

      When to use different programming languages (advice from an Amazon employee):

      • Java - enterprise applications
      • C# - Microsoft's spin on Java (useful in the Microsoft's ecosystem)
      • Ruby - when speed is more important than legibility or debugging
      • Python - same as Ruby but also for ML/AI (don't forget to use type hinting to make life a little saner)
      • Go/Rust - fresh web service where latency and performance were more important than community/library support
      • Haskell/Erlang - for very elegant/mathematical functional approach without a lot of business logic
      • Clojure - in situation when you love Lisp (?)
      • Kotlin/Scala - languages compiling to JVM bytecode (preferable over Clojure). Kotlin works with Java and has great IntelliJ support
      • C - classes of applications (operating systems, language design, low-level programming and hardware)
      • C++ - robotics, video games and high frequency trading where the performance gains from no garbage collection make it preferable to Java
      • PHP/Hack - testing server changes without rebuilding. PHP is banned at Amazon due to security reasons, but its successor, Hack, runs a lot of Facebook and Slack's backends
    6. Now, a couple of years later, my guidelines for JS are:

      Advice on using JavaScript from a long-time programmer:

      1. Use TypeScript instead.
      2. Push as much logic to the server as possible
      3. Use a framework like Vue or React if you need front interactivity
      4. Don't skip unit tests
    7. Consuming media (books, blogs, whatever) is not inherently a compounding thing. Only if you have some kind of method to reflect, to digest, to incorporate your knowledge into your thoughts. If there is anything valuable in this post, you, reader, will probably not benefit from it unless you do something active to “process” it immediately.

      Consuming books/blogs is not as compounding as we think

    1. if you have an amazing manager at a shit company you’ll still have a shit time. In some ways, it’ll actually be worse. If they’re good at their job (including retaining you), they’ll keep you at a bad company for too long. And then they’ll leave, because they’re smart and competent. Maybe they’ll take you with them.

      Danger of working with a great manager at a shit company

    2. It’s also okay to take risks. Staying at a company that’s slowly dying has its costs too. Stick around too long and you’ll lose your belief that you can build, that change is possible. Try not to learn the wrong habits.

      Cons of staying too long in the same company

    1. I do think when a lot of managers realized they’ve hit their peak or comfort level, they then start to focus on playing politics instead of delivering results to hold onto their position. These are also the kind of managers who would only hire people less capable than them, for fear of being replaced.

      The way corporate world works

    2. “In practice people gravitate to, hire and promote individuals they like to be around, not people who demand accountability.”

      Everybody likes having an agreeable and flattering person around them

    3. Dr. Peter advises creative incompetence — pretending to be incompetent but doing it in an area or manner where it doesn’t actually impair your work.

      Creative incompetence

    4. Dr. Peter also surmised that “super competent” people tend to “disrupt the hierarchy.” I suppose that’s a nice way of saying you’ve made your boss look bad by being more capable.In such situations, you’ll probably find yourself deliberately suppressed or edged out sooner or later — for some stupid reason or blame pushing.

      Being overly competent may get you fired

    5. So if you’re a highly competent and aggressive individual, it’s best you find yourself a job in a startup, be an entrepreneur, or work in a company that needs turning around.

      Advice to competitive workers

    6. Dr. Peter also had another interesting theory about getting promoted. He considered working hard and improving your skill sets not as effective as something called pull promotion. That’s when you get promoted — faster than usual — when a mentor or patron pulls you up.No wonder there’s so much butt kissing in the corporate world. They must have read Dr. Peter’s research from the ‘60s.

      Pull promotion

    7. Find a results-oriented job if you’re fiercely independent and opinionated. Climb the ladder in a big corporation if you’re highly diplomatic or a crowd-pleaser.

      Advice for two different working profiles

    1. A study conducted by Gloria Marks, a Professor of Informatics at the University of California, revealed that it takes us an average of 23 minutes and 15 seconds to refocus on a task after an interruption, and even when we do, we experience a decrease in productivity

      23 minutes and 15 seconds - the average time to refocus on a task after an interruption

    2. It doesn't mean that we ignore all messages and only look up from our work when something is on fire – but the general expectation is that it's okay to not be immediately available to your teammates when you are focusing on your work

      One of the rules of "Office time"

    3. People think it’s efficient to distribute information all at the same time to a bunch of people around a room. But it’s actually a lot less efficient than distributing it asynchronously by writing it up and sending it out and letting people absorb it when they’re ready to so it doesn’t break their days into smaller bits.”

      Async > meetings

    4. it's a matter of culture. None of these rules would work if the management fails to see that makers need to follow a different schedule

      Changes in the work environment require the managers' acknowledgement

    5. The most straightforward way to address this is to build a team knowledge base. Not only does that minimize the number of repetitive questions bounced around the office, it allows new team members to basically onboard themselves.

      Building a team knowledge base

    6. For managers, interruptions in the form of meetings, phone calls, and Slack notifications are normal. For someone on the maker schedule, however, even the slightest distraction can have a disruptive effect

      How a maker's ideal schedule differs from a manager's

    7. In our experience, the best way to prevent a useless meeting is to write up our goals and thoughts first. Despite working in the same office, our team at Nuclino has converted nearly all of our meetings into asynchronously written reports.

      Weekly status report (example)

    1. For many data scientists, the finished product of a work session is a business analysis. They need to show team members—who oftentimes aren’t technical—how their data became a specific recommendation or insight.

      The usual final product in Data Science is a business analysis, which notebooks explain perfectly

    1. Work never ends. No matter how much you get done there will always be more. I see a lot of colleagues burn out because they think their extra effort will be noticed. Most managers appreciate it but do not promote their employees.

      Common reality of overworking

    1. Don't focus too much on the salary. It's just one tiny part of the whole package. Your dev job pays your rent, food and savings. I assume that most dev jobs do this quite well. Beyond this, the main goal of a job is to increase your future market value, your professional network and to have fun. So basically it's about how much you are worth in your next job and that you enjoy your time. A high salary doesn't help you if you do stuff which doesn't matter in a few years.

      Don't focus on the salary in your dev job.

    1. COVID-19 has spurred a shift to analyze things like supply chain disruptions, speech analytics, and filtering out pandemic-related behavior, such as binge shopping, Burtch Works says. Data teams are being asked to create new simulations for the coming recession, to create scorecards to track pandemic-related behavior, and add COVID-19 control variables to marketing attribution models, the company says.

      How COVID-19 impacts activities of data positions

    2. Data scientists and data engineers have built-in job security relative to other positions as businesses transition their operations to rely more heavily on data, data science, and AI. That’s a long-term trend that is not likely to change due to COVID-19, although momentum had started to slow in 2019 as venture capital investments ebbed.
    3. According to a Dice Tech Jobs report released in February, demand for data engineers was up 50% and demand for data scientists was up 32% in 2019 compared to the prior year.

      Demand for Data Scientists/Engineers in 2019 vs 2018

    1. According to the Harvard Business Review article “Collaborative Overload”, the time employees spend on collaboration has increased by 50% over the past two decades. Researchers found it was not uncommon for workers to spend a full 80% of their workdays communicating with colleagues in the form of email (on which workers spend an average of six hours a day); meetings (which fill up 15 percent of a company’s time, on average); and more recently instant messaging apps (the average Slack user sends an average of 200 messages a day, though 1,000-message power users are “not the exception”)

      Time spent in the office

    2. we think the async culture is one of the core reasons why most of the people we’ve hired at Doist the past 5 years have stayed with us. Our employee retention is 90%+ — much higher than the overall tech industry. For example, even a company like Google — with its legendary campuses full of perks from free meals to free haircuts — has a median tenure of just 1.1 years. Freedom to work from anywhere at any time beats fun vanity perks any day, and it costs our company $0 to provide

      Employee retention rate at Doist vs Google

    1. I also recently took about 10 months off of work, specifically to focus on learning. It was incredible, and I don’t regret it financially. I would often get up at 6 in the morning or even earlier (which I never do) just from excitement about what I was going to learn about and accomplish in the day. Spending my time focused Only on what I was most interested in was incredibly rewarding.

      Approach of taking 10 months off from work just to learn something new

    1. I'm working for myself right now, but if one day I needed to go get a full-time job again, I would almost certainly not go to big tech again. I'd rather get paid a fifth of what I was doing, but do something that leaves me with some energy after I put in a day's work

      Reflections after working for FAANG

    2. Second, in my experience working with ex-FAANG - these engineers, while they all tend to be very smart, tend to be borderline junior engineers in the real world. They simply don't know how to build or operate something without the luxury of the mature tooling that exists at FAANG. You may be in for a reality shock when you leave the FAANG bubble

      Working with engineers out of FAANG can be surprising

    1. We want to learn, but we worry that we might not like what we learn. Or that learning will cost us too much. Or that we will have to give up cherished ideas.

      It is normal to worry about the costs of acquiring new domain knowledge

    1. The two things I really like about working for smaller places or starting a company is you get very direct access to users and customers and their problems, which means you can actually have empathy for what's actually going on with them, and then you can directly solve it. That cycle is so powerful, the sooner you learn how to make that cycle happen in your career, the better off you'll be. If you can make software and make software for other people, the outcome truly is hundreds of millions of dollars worth of value if you get it right. That's where I'm here to try and encourage you to do. I'm not really saying that you shouldn't go work at a big tech company. I am saying you should probably leave before it makes you soft. 

      The benefits of working at smaller companies/startups over the tech giants

    1. afternoons are spent reading/researching/online classes.This has really helped me avoid burn out. I go into the weekend less exhausted and more motivated to return on Monday and implement new stuff. It has also helped generate some inspiration for weekend/personal projects.

      Learning at work as solution to burn out and inspiration for personal projects

    1. Working at Facebook - excellent knowledge of JS and React, managing an OSS project on GitHub, running a community, writing documentation and blog posts. Trainings - good knowledge of JS and React, creating trainings (structure, exercises, etc.), teaching and passing on knowledge with ease, marketing, sales. Startups - good knowledge of JS and React, working in a team, talking to clients, business analysis, shipping MVPs quickly, working under stress and in odd time zones.

      Examples of restructuring tasks into more precise actions:

      • Working at Facebook - great JS, React, managing an OSS project on GitHub, running a community, writing documentation and blog posts
      • Workshops - good JS, React, delivering workshops (structure, tasks), learning and teaching, marketing, sales
      • Startups - good JS, React, work in a team, talking to clients, business analysis, quick MVP delivery, work under stress and in strange timezones
    1. Research shows that humans tend to do whatever it takes to keep busy, even if the activity feels meaningless to them. Dr Brené Brown from the University of Houston describes being “crazy busy” as a numbing strategy we use to avoid facing the truth of our lives.

      People simply prefer to be busy

    1. A few takeaways

      Summarising the article:

      • Types and tests save you from stupid mistakes; they're gifts for your future self!
      • Use ESLint and configure it to be your strict, but fair, friend.
      • Think of tests as a permanent console.
      • Types: It is not only about checks. It is also about code readability.
      • Testing with each commit means fewer surprises when merging Pull Requests.
    1. There is increasing concern that most current published research findings are false.

      The probability that the research is true may depend on:

      • study power
      • bias
      • the number of other studies on the same question
      • the ratio of true to no relationships among the relationships probed in each scientific field.

      Research finding is less likely to be true when:

      • the studies are conducted in a smaller field
      • effect sizes are smaller
      • there is a greater number and lesser preselection of tested relationships
      • greater flexibility in designs, definition, outcomes and analytical modes
      • greater financial and other interest and prejudice
      • more teams are involved in a scientific field in chase of statistical significance
    1. golden rule: If someone calls saying they’re from your bank, just hang up and call them back — ideally using a phone number that came from the bank’s Web site or from the back of your payment card.

      Golden rule of talking to your bank

    2. “When the representative finally answered my call, I asked them to confirm that I was on the phone with them on the other line in the call they initiated toward me, and so the rep somehow checked and saw that there was another active call with Mitch,” he said. “But as it turned out, that other call was the attackers also talking to my bank pretending to be me.”

      Phishing situation scenario:

      • a person is called by attackers who identify themselves as his bank
      • the victim tells them to hold the line
      • in the meantime, the victim calls his bank representative, who confirms after a while that he is with them on another line
      • in reality, that other line is the attackers pretending to be him
  2. Apr 2020
    1. It is difficult to choose a typical reading speed, research has been conducted on various groups of people to get typical rates, what you regularly see quoted is: 100 to 200 words per minute (wpm) for learning, 200 to 400 wpm for comprehension.

      On average people read:

      • 100-200 words/minute - learning
      • 200-400 words/minute - comprehension
    1. DevOps tools enable DevOps in organizations

      Common DevOps tools:

      • Plan: JIRA
      • Build: Maven, Gradle, Docker, GitHub, GitLab
      • Continuous integration: Jenkins, CircleCI, Travis CI
      • Release: Jenkins, Bamboo
      • Deploy: Ansible, Kubernetes, Heroku, Amazon Web Services, Google Cloud Platform
      • Operate: Botmetric, Docker, Ansible, Puppet, Chef
      • Monitor: Nagios, Splunk
      • Continuous Feedback: Slack
    2. While talking about DevOps, three things are important continuous integration, continuous deployment, and continuous delivery.

      DevOps process

      • Continuous Integration - code gets integrated several times a day (checked by automated pipeline(server))
      • Continuous Delivery - introducing changes with every commit, making code ready for production
      • Continuous Deployment - deployment in production is automatic, without explicit approval from a developer


    3. Basic prerequisites to learn DevOps

      Basic prerequisites to learn DevOps:

      • Basic understanding of Linux/Unix system concepts and administration
      • Familiarity with command-line interface
      • Knowing how the build and deployment process works
      • Familiarity with text editor
      • Setting up a home lab environment with VirtualBox
      • Networking in VirtualBox
      • Setting up multiple VMs in VirtualBox
      • Basics of Vagrant
      • Linux networking basics
      • Good to know basic scripting
      • Basics of applications - Java, NodeJS, Python
      • Web servers - Apache HTTPD, Gunicorn, PM2
      • Databases - MySQL, MongoDB
    4. DevOps benefits

      DevOps benefits:

      • Improves deployment frequency
      • Helps with faster time to market
      • Lowers the failure rate of new releases
      • Increases code quality
      • Improves collaboration between the teams and departments
      • Shortens lead times between fixes
      • Improves the mean time to recovery
    5. Operations in the software industry include administrative processes and support for both hardware and software for clients as well as internal to the company. Infrastructure management, quality assurance, and monitoring are the basic roles for operations.

      Operations (1/2 of DevOps):

      • administrative processes
      • support for both hardware and software for clients, as well as internal to the company
      • infrastructure management
      • quality assurance
      • monitoring
    1. I set it with a few clicks at Travis CI, and by creating a .travis.yml file in the repo

      You can set CI with a few clicks using Travis CI and creating a .travis.yml file in your repo:

      language: node_js
      node_js: node
      
      before_script:
        - npm install -g typescript
        - npm install codecov -g
      
      script:
        - yarn lint
        - yarn build
        - yarn test
        - yarn build-docs
      
      after_success:
        - codecov
      
    2. Continuous integration makes it easy to check against cases when the code: does not work (but someone didn’t test it and pushed haphazardly), does work only locally, as it is based on local installations, does work only locally, as not all files were committed.

      CI - Continuous Integration helps to check the code when it:

      • does not work (but someone didn’t test it and pushed haphazardly),
      • does work only locally, as it is based on local installations,
      • does work only locally, as not all files were committed.
    3. In Python, when trying to do a dubious operation, you get an error pretty soon. In JavaScript… an undefined can fly through a few layers of abstraction, causing an error in a seemingly unrelated piece of code.

      The undefined nature of JavaScript can hide an error for a long time. For example,

      function add(a, b) { return +(a + b); }
      add(2, 2);   // 4
      add('2', 2); // 22 - '2' + 2 concatenates to '22', unary + turns it into a number
      

      will result in a number, but is it the same one?

    4. It is fine to start adding tests gradually: add a few tests to the things that are the most difficult (ones where you keep your fingers crossed hoping they work) or most critical (simple but with many other dependent components).

      Start small by adding tests to the most crucial parts

    5. I need to specify types of input and output. But then I get a speedup due to autocompletion, hints, and linting if for any reason I make a mistake.

      In TypeScript, you spend a bit more time in the variable definition, but then autocompletion, hints, and linting will reward you. It also boosts code readability

    6. Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

      According to Kernighan's Law, writing code is not as hard as debugging it

    1. Continuous Deployment is the next step. You deploy the most up to date and production ready version of your code to some environment. Ideally production if you trust your CD test suite enough.

      Continuous Deployment

    1. the limitations of the PPS

      Limitations of the PPS:

      1. Slower than correlation
      2. Score cannot be interpreted as easily as the correlation (it doesn't tell you anything about the type of relationship). PPS is better for finding patterns and correlation is better for communicating found linear relationships
      3. You cannot compare the scores for different target variables in a strict math way because they're calculated using different evaluation metrics
      4. There are some limitations of the components used under the hood
      5. You have to perform forward and backward selection in addition to feature selection
    2. Although the PPS has many advantages over the correlation, there is some drawback: it takes longer to calculate.

      PPS is slower to calculate than correlation:

      • single PPS = 10-500 ms
      • whole PPS matrix for 40 columns = 40*40 = 1600 individual calculations = 1-10 minutes
    3. The PPS clearly has some advantages over correlation for finding predictive patterns in the data. However, once the patterns are found, the correlation is still a great way of communicating found linear relationships.

      PPS:

      • good for finding predictive patterns
      • can be used for feature selection
      • can be used to detect information leakage between variables
      • interpret PPS matrix as a directed graph to find entity structures

      Correlation:

      • good for communicating found linear relationships
    4. Let’s compare the correlation matrix to the PPS matrix on the Titanic dataset.

      Comparing correlation matrix and the PPS matrix of the Titanic dataset:

      findings about the correlation matrix:

      1. Correlation matrix is smaller because it doesn't work for categorical data
      2. Correlation matrix shows a negative correlation between TicketPrice and Class. For PPS, TicketPrice is a strong predictor of Class (0.9 PPS), but not the other way around, Class to TicketPrice (a ticket of 5000-10000$ is most likely the highest class, but the highest class itself cannot determine the price)

      findings about the PPS matrix:

      1. First row of the matrix tells you that the best univariate predictor of the column Survived is the column Sex (Sex was dropped for correlation)
      2. TicketID uncovers a hidden pattern, as well as its connection with TicketPrice
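
      To reproduce such a comparison, the article's companion ppscore package exposes score and matrix helpers; a sketch (the CSV path is hypothetical):

      import pandas as pd
      import ppscore as pps

      df = pd.read_csv("titanic.csv")  # hypothetical path to the Titanic data

      # Single asymmetric score: how well does Sex predict Survived?
      print(pps.score(df, "Sex", "Survived"))

      # Full PPS matrix, to compare with df.corr()
      print(pps.matrix(df))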

    5. Let’s use a typical quadratic relationship: the feature x is a uniform variable ranging from -2 to 2 and the target y is the square of x plus some error.

      In this scenario:

      • we can predict y using x
      • we cannot predict x using y, as x might be negative or positive (for y=4, x=2 or -2)
      • the correlation is 0, both from x to y and from y to x, because correlation is symmetric (while relationships are more often asymmetric!). However, the PPS from x to y is 0.88 (not 1, because of the existing error)
      • PPS from y to x is 0 because there's no relationship that y can predict if it only knows its own value

    6. how do you normalize a score? You define a lower and an upper limit and put the score into perspective.

      Normalising a score:

      • you need to put a lower and upper limit
      • upper limit can be F1 = 1, and a perfect MAE = 0
      • lower limit depends on the evaluation metric and your data set. It's the value that a naive predictor achieves
    7. For a classification problem, always predicting the most common class is pretty naive. For a regression problem, always predicting the median value is pretty naive.

      What is a naive model:

      • predicting the most common class for a classification problem
      • predicting the median value for a regression problem
    8. Let’s say we have two columns and want to calculate the predictive power score of A predicting B. In this case, we treat B as our target variable and A as our (only) feature. We can now calculate a cross-validated Decision Tree and calculate a suitable evaluation metric.

      If the target (B) variable is:

      • numeric - we can use a Decision Tree Regressor and calculate the Mean Absolute Error (MAE)
      • categorical - we can use a Decision Tree Classifier and calculate the weighted F1 (or ROC); see the sketch below
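
      A rough sketch of that recipe for a numeric target, using scikit-learn on synthetic data (the real library adds more safeguards):

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.default_rng(1)
      A = rng.uniform(-2, 2, 1000)            # feature
      B = A ** 2 + rng.normal(0, 0.1, 1000)   # numeric target

      # Cross-validated Decision Tree, scored with MAE
      mae_model = -cross_val_score(
          DecisionTreeRegressor(), A.reshape(-1, 1), B,
          scoring="neg_mean_absolute_error", cv=4).mean()

      # Naive baseline for regression: always predict the median
      mae_naive = np.abs(B - np.median(B)).mean()

      # Normalize: 1 = perfect prediction, 0 = no better than the naive model
      pps_a_to_b = max(0, 1 - mae_model / mae_naive)
      print(round(pps_a_to_b, 2))
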
    9. there are many non-linear relationships that the score simply won’t detect. For example, a sine wave, a quadratic curve or a mysterious step function. The score will just be 0, saying: “Nothing interesting here”. Also, correlation is only defined for numeric columns.

      Correlation:

      • doesn't work with non-linear data
      • doesn't work for categorical values

      Examples:
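
      A tiny NumPy check of the quadratic case:

      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.uniform(-2, 2, 10_000)
      y = x ** 2  # perfectly predictable from x, yet...

      print(round(np.corrcoef(x, y)[0, 1], 2))  # ~0.0 - correlation sees nothing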

    1. Some of our main takeaways: CRDT literature can be relevant even if you're not creating a decentralized system. Multiplayer for a visual editor like ours wasn't as intimidating as we thought. Taking time to research and prototype in the beginning really paid off.

      Key takeaways of developing a live editing tool

    2. CRDTs refer to a collection of different data structures commonly used in distributed systems. All CRDTs satisfy certain mathematical properties which guarantee eventual consistency. If no more updates are made, eventually everyone accessing the data structure will see the same thing. This constraint is required for correctness; we cannot allow two clients editing the same Figma document to diverge and never converge again

      CRDTs (Conflict-free Replicated Data Types)

    3. Figma’s multiplayer servers keep track of the latest value that any client has sent for a given property on a given object

      No conflict:

      • two clients changing unrelated properties on the same object
      • two clients changing the same property on unrelated objects.

      Conflict:

      • two clients changing the same property on the same object (document will end up with the last value sent)
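
      A toy sketch of this per-property last-writer-wins rule (the names are invented; Figma's real server does much more):

      # The server keeps the latest value any client sent per (object, property)
      class MultiplayerDoc:
          def __init__(self):
              self.state = {}

          def apply(self, object_id, prop, value):
              self.state[(object_id, prop)] = value

      doc = MultiplayerDoc()
      doc.apply("rect1", "fill", "red")    # client A
      doc.apply("rect1", "x", 10)          # client B: different property, no conflict
      doc.apply("rect1", "fill", "blue")   # client B: same property, last value wins
      print(doc.state[("rect1", "fill")])  # blue
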
    4. Figma doesn’t store any properties of deleted objects on the server. That data is instead stored in the undo buffer of the client that performed the delete. If that client wants to undo the delete, then it’s also responsible for restoring all properties of the deleted objects. This helps keep long-lived documents from continuing to grow in size as they are edited

      Undo option

    5. it's important to be able to iterate quickly and experiment before committing to an approach. That's why we first created a prototype environment to test our ideas instead of working in the real codebase

      First work with a prototype, then the real codebase

    6. We had a lot of trouble until we settled on a principle to help guide us: if you undo a lot, copy something, and redo back to the present (a common operation), the document should not change. This may seem obvious but the single-player implementation of redo means “put back what I did” which may end up overwriting what other people did next if you’re not careful. This is why in Figma an undo operation modifies redo history at the time of the undo, and likewise a redo operation modifies undo history at the time of the redo

      Undo/Redo working

    7. operational transforms (a.k.a. OTs), the standard multiplayer algorithm popularized by apps like Google Docs. As a startup we value the ability to ship features quickly, and OTs were unnecessarily complex for our problem space

      Operational Transforms (OT) are unnecessarily complex for problems unlike Google Docs

    8. Every Figma document is a tree of objects, similar to the HTML DOM. There is a single root object that represents the entire document. Underneath the root object are page objects, and underneath each page object is a hierarchy of objects representing the contents of the page. This tree is presented in the layers panel on the left-hand side of the Figma editor.

      Structure of Figma documents

    9. When a document is opened, the client starts by downloading a copy of the file. From that point on, updates to that document in both directions are synced over the WebSocket connection. Figma lets you go offline for an arbitrary amount of time and continue editing. When you come back online, the client downloads a fresh copy of the document, reapplies any offline edits on top of this latest state, and then continues syncing updates over a new WebSocket connection

      Offline editing is handled by reapplying local edits to a fresh copy of the document once back online

    10. An important consequence of this is that changes are atomic at the property value boundary. The eventually consistent value for a given property is always a value sent by one of the clients. This is why simultaneous editing of the same text value doesn’t work in Figma. If the text value is B and someone changes it to AB at the same time as someone else changes it to BC, the end result will be either AB or BC but never ABC

      Consequence of approaches like last-writer-wins

    11. We use a client/server architecture where Figma clients are web pages that talk with a cluster of servers over WebSockets. Our servers currently spin up a separate process for each multiplayer document which everyone editing that document connects to

      The way Figma approaches client/server architecture

    1. Sometimes it's interesting to explain some code (how much time do you spend trying to figure out a regex pattern when you see one?), but, 99% of the time, comments could be avoided.

      Generally try to avoid (avoid != forbid) comments.

      Comments:

      1. Become outdated, confusing your future self (Yesterday I lost some precious time because of an outdated comment)
      2. Could be replaced by a better-named variable/function/class.
      3. They pollute the code unnecessarily.
    2. When we talk about abstraction levels, we can classify the code in 3 levels: high: getAdress medium: inactiveUsers = Users.findInactives low: .split(" ")

      3 abstraction levels:

      1. high: getAdress
      2. medium: inactiveUsers = Users.findInactives
      3. low: .split(" ")

      Explanation:

      • The high level abstraction are functions that you create, like searchForsomething()
      • The medium level are methods in your object, like account.unverifyAccount
      • The low level are methods that the language provides, like map, downcase and so on
    3. There is another maxim also that says: you must write the same code a maximum of 3 times. The third time you should consider refactoring and reducing duplication

      Avoid repeating the same code over and over

    4. Uncle Bob, in clean code, defends that the best order to write code is: Write unit tests. Create code that works. Refactor to clean the code.

      Best order to write code (according to Uncle Bob):

      1. Write unit tests.
      2. Create code that works.
      3. Refactor to clean the code.
    5. naming things, write better functions and a little about comments. Next, I intend to talk about formatting, objects and data structures, how to handle with errors, about boundaries (how to deal with another's one code), unit testing and how to organize your class better. I know that it'll be missing an important topic about code smells

      Ideas to consider while developing clean code:

      • naming things
      • better functions
      • comments
      • formatting
      • objects and data structures
      • handling error
      • boundaries (handling another's one code)
      • unit testing
      • organising classes
      • code smells
    1. "The Big Picture" is one of those things that people say a whole lot but can mean so many different things. Going through all of these articles, they tend to mean any (or all) of these things

      Thinking about The Big Picture:

      1. The Business Stuff - how to meet KPIs or the current big deadline or whatever.
      2. The User Stuff - how to actually provide value to the people who use what you make.
      3. The Technology Stuff - how to build something that will last a long time.
    1. In recent years we’ve also begun to see increasing interest in exploratory testing as an important part of the agile toolbox

      Waterfall software development ---> agile ---> exploratory testing

    2. Mathematica didn’t really help me build anything useful, because I couldn’t distribute my code or applications to colleagues (unless they spent thousands of dollars for a Mathematica license to use it), and I couldn’t easily create web applications for people to access from the browser. In addition, I found my Mathematica code would often end up much slower and more memory hungry than code I wrote in other languages.

      Disadvantages of Mathematica:

      • memory-hungry, slow code
      • expensive licenses
      • no way to distribute code or web applications to colleagues
    3. a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably more fun to write than programs that are written only in a high-level language. The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer.

      Literate programming described by Donald Knuth

    1. First, you’ve spread the logic across a variety of different systems, so it becomes more difficult to reason about the application as a whole. Second, more importantly, the logic has been implemented as configuration as opposed to code. The logic is constrained by the ability of the applications which have been wired together, but it’s still there.

      Why "no code" trend is dangerous in some way (on the example of Zapier):

      1. You spread the logic across multiple systems.
      2. Logic is maintained in configuration rather than code.
    1. Summary

      In doing a code review, you should make sure that:

      • The code is well-designed.
      • The functionality is good for the users of the code.
      • Any UI changes are sensible and look good.
      • Any parallel programming is done safely.
      • The code isn’t more complex than it needs to be.
      • The developer isn’t implementing things they might need in the future but don’t know they need now.
      • Code has appropriate unit tests.
      • Tests are well-designed.
      • The developer used clear names for everything.
      • Comments are clear and useful, and mostly explain why instead of what.
      • Code is appropriately documented (generally in g3doc).
      • The code conforms to our style guides.
    1. "Continuous Delivery is the ability to get changes of all types — including new features, configuration changes, bug fixes, and experiments — into production, or into the hands of users, safely and quickly in a sustainable way". -- Jez Humble and Dave Farley

      Continuous Delivery

    2. Another approach is to use a tool like H2O to export the model as a POJO in a JAR Java library, which you can then add as a dependency in your application. The benefit of this approach is that you can train the models in a language familiar to Data Scientists, such as Python or R, and export the model as a compiled binary that runs in a different target environment (JVM), which can be faster at inference time

      H2O - exports models trained in Python/R as a POJO in a JAR

    3. Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.

      Continuous Delivery for Machine Learning (CD4ML) (long definition)

      Basic principles:

      • software engineering approach
      • cross-functional team
      • producing software based on code, data, and ml models
      • small and safe increments
      • reproducible and reliable software release
      • short adaptation cycles
    4. In order to formalise the model training process in code, we used an open source tool called DVC (Data Science Version Control). It provides similar semantics to Git, but also solves a few ML-specific problems:

      DVC - formalises the model training process in code.

      Advantages:

      • it has multiple backend plugins to fetch and store large files on an external storage outside of the source control repository;
      • it can keep track of those files' versions, allowing us to retrain our models when the data changes;
      • it keeps track of the dependency graph and commands used to execute the ML pipeline, allowing the process to be reproduced in other environments;
      • it can integrate with Git branches to allow multiple experiments to co-exist
    1. Class method docstrings should contain the following: A brief description of what the method is and what it’s used for Any arguments (both required and optional) that are passed including keyword arguments Label any arguments that are considered optional or have a default value Any side effects that occur when executing the method Any exceptions that are raised Any restrictions on when the method can be called

      Class method docstrings should contain:

      • brief description
      • arguments
      • label on default/optional arguments
      • side effects description
      • raised exceptions
      • restrictions on when the method can be called

      (check example below)
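
      A hypothetical method docstring covering those points (the Account class and its details are invented):

      class Account:
          def withdraw(self, amount, currency="USD"):
              """Withdraw money from the account.

              Args:
                  amount (float): Sum to withdraw. Must be positive.
                  currency (str, optional): ISO currency code. Defaults to "USD".

              Side effects:
                  Decreases the balance and appends to the transaction log.

              Raises:
                  ValueError: If amount exceeds the current balance.

              Note:
                  Can only be called on an unlocked account.
              """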

    2. Comments to your code should be kept brief and focused. Avoid using long comments when possible. Additionally, you should use the following four essential rules as suggested by Jeff Atwood:

      Comments should be as concise as possible. Moreover, you should follow the 4 essential rules suggested by Jeff Atwood:

      1. Keep comments close to the code being described.
      2. Don't use complex formatting (such as tables).
      3. Don't comment obvious things.
      4. Design code in a way it comments itself.
    3. From examining the type hinting, you can immediately tell that the function expects the input name to be of a type str, or string. You can also tell that the expected output of the function will be of a type str, or string, as well.

      Type hinting, introduced in Python 3.5, extends Jeff Atwood's rules by letting the code comment itself, as in this example:

      def hello_name(name: str) -> str:
          return(f"Hello {name}")
      
      • the caller knows that the code expects input of type str
      • and that the output will be of type str as well
    1. Each format makes tradeoffs in encoding, flexibility, and expressiveness to best suit a specific use case.

      Each data format brings different tradeoffs:

      • A format optimized for size will use a binary encoding that won’t be human-readable.
      • A format optimized for extensibility will take longer to decode than a format designed for a narrow use case.
      • A format designed for flat data (like CSV) will struggle to represent nested data.
    1. Computers can only natively store integers, so they need some way of representing decimal numbers. This representation comes with some degree of inaccuracy. That's why, more often than not, .1 + .2 != .3

      Computers only approximate decimal numbers when storing them
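
      The classic demonstration in a Python REPL, with math.isclose as the usual workaround:

      >>> 0.1 + 0.2
      0.30000000000000004
      >>> 0.1 + 0.2 == 0.3
      False
      >>> import math
      >>> math.isclose(0.1 + 0.2, 0.3)  # compare with a tolerance instead
      True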

    1. A package management ecosystem is essential for programming languages now. People simply don’t want to go through the hassle of finding, downloading and installing libraries anymore. 20 years ago we used to visit web sites, downloaded zip files, copied them to correct locations, added them to the paths in the build configuration and prayed that they worked.

      How library management changed in 20 years

    2. IDEs and the programming languages are getting more and more distant from each other. 20 years ago an IDE was specifically developed for a single language, like Eclipse for Java, Visual Basic, Delphi for Pascal etc. Now, we have text editors like VS Code that can support any programming language with IDE like features.

      How IDEs became "unified" compared to 20 years ago

    3. Your project has no business value today unless it includes blockchain and AI, although a centralized and rule-based version would be much faster and more efficient.

      Comparing current project needs to those 20 years ago

    4. Being a software development team now involves all team members performing a mysterious ritual of standing up together for 15 minutes in the morning and drawing occult symbols with post-its.

      In comparison to 20 years ago ;)

    1. I am not sure how but one kind soul somehow found the project, forked it, refactored it, "modernized" it, added linting, code sniffing, added CI and opened the pull request.

      It's worth sharing your code, since someone can always find it and improve it, so that you can learn from it

    1. Let's reason through our memoizer before we write any code.

      Operations performed by a memoizer:

      • Takes a reference to a function as an input
      • Returns a function (so it can be used as it normally would be)
      • Creates a cache of some sort to hold the results of previous function calls
      • On any future call, returns the cached result if it exists
      • If the cached value doesn't exist, calls the function and stores that result in the cache

      Which is written as:

      // Takes a reference to a function
      const memoize = func => {
        // Creates a cache of results
        const results = {};
        // Returns a function
        return (...args) => {
          // Create a key for results cache
          const argsKey = JSON.stringify(args);
          // Only execute func if no cached value
          if (!results[argsKey]) {
            // Store function call result in cache
            results[argsKey] = func(...args);
          }
          // Return cached value
          return results[argsKey];
        };
      };