Variables. Multiple lines. No more escaped quotation marks. Comments.
What are Makefile's advantages over package.json scripts
People might think I’m not experiencing new things, but I think the secret to a good life is to enjoy your work. I could never stay indoors and watch TV. I hear London is a place best avoided. I think living in a city would be terrible – people living on top of one another in great tower blocks. I could never do it. Walking around the farm fills me with wonder. What makes my life is working outside, only going in if the weather is very bad.
How farmers perceive happiness in life
pronunciation cues in an individual’s speech communicate their social status more accurately than the content of their speech.
reciting seven random words is sufficient to allow people to discern the speaker’s social class with above-chance accuracy.
Cookiecutter takes a source directory tree and copies it into your new project. It replaces all the names that it finds surrounded by templating tags {{ and }} with names that it finds in the file cookiecutter.json. That’s basically it. [1]
The main idea behind cookiecutter
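As a rough sketch of how that looks in practice, Cookiecutter can also be driven from Python; the template path and the keys in extra_context below are made-up examples, not from the original article:

```python
# Minimal sketch of rendering a Cookiecutter template from Python.
# "my-template/" and the context keys are hypothetical placeholders.
from cookiecutter.main import cookiecutter

# Copies the template tree and replaces {{ cookiecutter.project_name }} etc.
# with the values below (falling back to defaults from cookiecutter.json).
cookiecutter(
    "my-template/",
    no_input=True,
    extra_context={"project_name": "demo_project", "author": "Jane Doe"},
)
```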
In short, MLflow makes it far easier to promote models to API endpoints on various cloud vendors compared to Kubeflow, which can do this but only with more development effort.
MLflow seems to be much easier
Bon Appétit?
Quick comparison of MLflow and Kubeflow (check below the annotation)
MLflow is a single python package that covers some key steps in model management. Kubeflow is a combination of open-source libraries that depends on a Kubernetes cluster to provide a computing environment for ML model development and production tools.
Brief comparison of MLflow and Kubeflow
I decided to embrace imperfection and use it as the unifying theme of the album.
You might find a lot of imperfection on the way, but instead of fighting with it, maybe try to embrace it?
I start by sampling loops from songs I like, and then use the concatenative synthesis and source separation tools to extract interesting musical ideas (e.g., melodies, percussion, ambiance). These results can be used directly, with a little FX (Replica XT, Serum FX), or translated into MIDI and resynthesized (Serum, VPS Avenger). This gives me the building blocks of the song (the tracks), each a few bars long, which I carefully combine using Ableton Live's session view into something that sounds "good". Later, each track is expanded by sampling more loops from the original songs until I have plenty of content to create something with enough structure to be called a "song".
Author's music production process:
Another extremely interesting and underrated task in the audio domain is translating audio into MIDI notes. Melodyne is state of the art in the area, but Ableton's Convert Audio to Midi works really well too.
For converting audio to midi, try Melodyne or Ableton's Convert Audio to Midi
imperfect separations are sometimes the most interesting, especially the results from models trained to separate something that isn't in the source sound (e.g., extract vocals from an instrumental track)
Right, it might produce some unique sounds
Sound Source Separation
It seems like Spleeter and Open Unmix perform equally well in Sound Source Separation
concatenative sound synthesis. The main idea is simple: given a database of sound snippets, concatenate or mix these sounds in order to create new and original ones.
Concatenative sound synthesis
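A toy sketch of the idea with NumPy, assuming the "database" is just a list of short audio arrays (here random noise standing in for real snippets) and the snippet lengths are arbitrary:

```python
# Sketch of concatenative synthesis: stitch random snippets into a new sound.
import numpy as np

rng = np.random.default_rng(0)
sr = 44100  # sample rate
database = [rng.uniform(-1, 1, sr // 4) for _ in range(8)]  # 8 snippets, 0.25 s each

def concatenate_snippets(snippets, n=16, fade=256):
    out = np.zeros(0)
    for idx in rng.choice(len(snippets), size=n):
        piece = snippets[idx].copy()
        ramp = np.linspace(0, 1, fade)
        piece[:fade] *= ramp          # fade in
        piece[-fade:] *= ramp[::-1]   # fade out to avoid clicks
        out = np.concatenate([out, piece])
    return out

new_sound = concatenate_snippets(database)
```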
Calibri has been the default font for all things Microsoft since 2007, when it stepped in to replace Times New Roman across Microsoft Office. It has served us all well, but we believe it’s time to evolve. To help us set a new direction, we’ve commissioned five original, custom fonts to eventually replace Calibri as the default.
Microsoft will be moving away from Calibri as a default font
To summarize, implementing ML in a production environment doesn't only mean deploying your model as an API for prediction. Rather, it means deploying an ML pipeline that can automate the retraining and deployment of new models. Setting up a CI/CD system enables you to automatically test and deploy new pipeline implementations. This system lets you cope with rapid changes in your data and business environment. You don't have to immediately move all of your processes from one level to another. You can gradually implement these practices to help improve the automation of your ML system development and production.
The ideal state of MLOps in a project (2nd level)
🐛 (home, components): Resolve issue with modal collapses
🚚 (home): Move icons folder
✨ (newsletter): Add Newsletter component
With gitmojis, we can replace the <type> part of a git commit
feat(home, components): Add login modal
fix(home, components): Resolve issue with modal collapses
chore(home): Move icons folder
Examples of readable commits in the format:
<type> [scope]: "Message"
When audio quality is high (vs low), people judge the content as better and more important. They also judge the speaker as more intelligent, competent, and likable. In an experiment, people rated a physicist’s talk at a scientific conference as 19.3% better when they listened to it in high quality audio vs slightly distorted, echo-prone audio.
High quality audio makes you sound smarter:

Just write your markdown as text/plain; email clients are broken.
E-mail clients still do not render Markdown in a convenient format
Bring up the board that shows the status of every item the team is working on.
Starting in the right-most column, get an update for each ticket in the column.
Be sure to ask: "What's needed to move this ticket to the next stage of progress?"
Highlight blockers that people bring up and define what's needed to unblock work.
Move to the next column to the left and get updates for each ticket in that column.
Continue until you get to the left-most column.
Format of "walk the board" daily standup
More people bring more status updates.
More updates mean more information that others won't care about.
The team may be working on multiple projects at once.
If the team has customers, ad-hoc work will regularly come up.
Problems of dailies when more people show up
I start thinking about my update to prove I should keep my job.
People zone out when a teammate starts talking about how they worked on something that doesn't affect me.
It's finally my turn to give an update. No one is listening except for the facilitator.
We go over our time and end when the next team starts lurking outside the room.
That's what a typical daily usually looks like
What did I complete yesterday that contributed to the team?
What do I plan to complete today to contribute to the team?
What impediments do I see that prevent me or the team from achieving its goals?
3 questions to be answered during a daily meeting
In the median case, Colab is going to assign users a K80, and the GTX 1080 is around double the speed, which does not stack up particularly well for Colab. However, on occasion, when a P100 is assigned, the P100 is an absolute killer GPU (again, for FREE).
Some of the GPUs from Google Colab are outstanding.

With Spark 3.1, the Spark-on-Kubernetes project is now considered Generally Available and Production-Ready.
With Spark 3.1 k8s becomes the right option to replace YARN
The key libraries of TFX are as follows
TensorFlow Extended (TFX) = TFDV + TFT + TF Estimators and Keras + TFMA + TF Serving
The console is a killer SQLite feature for data analysis: more powerful than Excel and simpler than pandas. One can import CSV data with a single command, and the table is created automatically
SQLite makes it fairly easy to import and analyse data. For example:
.import --csv city.csv city
select count(*) from city;

This is not a problem if your DBMS supports SQL recursion: lots of data can be generated with a single query. The WITH RECURSIVE clause comes to the rescue.
WITH RECURSIVE can help you quickly generate a series of random data.
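For example, a small sketch of generating rows with a recursive CTE through Python's built-in sqlite3 module (the table and column names are arbitrary):

```python
# Generate 1,000 rows of test data with a single recursive query in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    WITH RECURSIVE seq(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM seq WHERE n < 1000
    )
    SELECT n, abs(random()) % 100 AS value FROM seq
    """
).fetchall()
print(len(rows), rows[:3])
```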
Interrupt if: T3F > RTW × (IO + T3E)
Formula for interruption, where T3F = A1, RTW = A4 + A5, IO = A3, T3E = A2:
A1 > (A4 + A5) × (A3 + A2)

When you seek advice, first write down everything you’ve tried.
If you’re stuck for over an hour, seek help.
Rule of thumb for when to seek help
Simple … a single Linode VPS.
You might not need all the Kubernetes clusters and might run just fine on a single Linode VPS.
Twitter thread: https://twitter.com/levelsio/status/1101581928489078784
When peeking at your brain may help with mental illness
yyyy-mm-dd hh:mm:ss
ISO 8601 date format. The only right date format
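In Python, for instance, producing that format is a one-liner (the strftime pattern below is just an illustration):

```python
# Format the current time as yyyy-mm-dd hh:mm:ss.
from datetime import datetime

print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))  # e.g. 2021-03-01 14:05:09
# datetime.now().isoformat(sep=" ", timespec="seconds") gives the same shape.
```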
We use Prometheus to collect time-series metrics and Grafana for graphs, dashboards, and alerts.
How Prometheus and Grafana can be used to collect information from running ML on K8s
large machine learning job spans many nodes and runs most efficiently when it has access to all of the hardware resources on each node. This allows GPUs to cross-communicate directly using NVLink, or GPUs to directly communicate with the NIC using GPUDirect. So for many of our workloads, a single pod occupies the entire node.
The way OpenAI runs large ML jobs on K8s
We use Kubernetes mainly as a batch scheduling system and rely on our autoscaler to dynamically scale up and down our cluster — this lets us significantly reduce costs for idle nodes, while still providing low latency while iterating rapidly.
For high availability, we always have at least 2 masters, and set the --apiserver-count flag to the number of apiservers we’re running (otherwise Prometheus monitoring can get confused between instances).
Tip for high availability:
Set the --apiserver-count flag to the number of running apiservers.

We've increased the max etcd size with the --quota-backend-bytes flag, and the autoscaler now has a sanity check not to take action if it would terminate more than 50% of the cluster.
With more than 1k nodes, etcd might hit its hard storage limit and stop accepting writes
Another helpful tweak was storing Kubernetes Events in a separate etcd cluster, so that spikes in Event creation wouldn’t affect performance of the main etcd instances.
Another trick apart from tweaking default settings of Fluentd & Datadog
The root cause: the default setting for Fluentd’s and Datadog’s monitoring processes was to query the apiservers from every node in the cluster (for example, this issue which is now fixed). We simply changed these processes to be less aggressive with their polling, and load on the apiservers became stable again:
Default settings of Fluentd and Datadog might not be suited for running many nodes
We then moved the etcd directory for each node to the local temp disk, which is an SSD connected directly to the instance rather than a network-attached one. Switching to the local disk brought write latency to 200us, and etcd became healthy!
One of the solutions for etcd using only about 10% of the available IOPS. The previous setup worked until about 1k nodes
When should you end a conversation? Probably sooner than you think
Why we’re so bad at daydreaming, and how to fix it
The newest generation of image codecs—in particular AVIF and JPEG XL—are a major improvement of the old JPEG and PNG codecs. To be sure, JPEG 2000 and WebP also compress more effectively and offer more features, yet the overall gain is not significant and consistent enough to warrant fast and widespread adoption. AVIF and JPEG XL will do much better—at least that’s what I hope.
Comparison of image codecs (JPEG, PNG, JPEG 2000, WebP, HEIC, AVIF, JPEG XL)

Scientists break through the wall of sleep to the untapped world of dreams
'Night owls' may be twice as likely as morning 'larks' to underperform at work
Consider the amount of data and the speed of the data: if low latency is your priority, use Akka Streams; if you have huge amounts of data, use Spark, Flink or GCP DataFlow.
For low latency = Akka Streams
For huge amounts of data = Spark, Flink or GCP DataFlow
As we mentioned before, the majority of machine learning implementations are based on running model serving as a REST service, which might not be appropriate for high-volume data processing or for usage within a streaming system (for example, TensorFlow or Flink), because it requires recoding/restarting systems for a model update. Model as Data is a great fit for big data pipelines. For online inference, it is quite easy to implement: you can store the model anywhere (S3, HDFS…), read it into memory and call it.
Model as Data <--- more appropriate approach than REST service for serving big data pipelines
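A rough sketch of the "Model as Data" idea, assuming a scikit-learn model serialized with joblib and stored in S3 (the bucket, key and file paths are placeholders):

```python
# Model as Data: fetch the serialized model from storage, load it, call it.
import boto3
import joblib

s3 = boto3.client("s3")
s3.download_file("my-models-bucket", "fraud/model-v3.joblib", "/tmp/model.joblib")

model = joblib.load("/tmp/model.joblib")        # read it into memory
prediction = model.predict([[0.1, 4.2, 7.0]])   # call it on incoming features
```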
The most common way to deploy a trained model is to save it into the binary format of the tool of your choice, wrap it in a microservice (for example a Python Flask application) and use it for inference.
Model as Code <--- the most common way of deploying ML models
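A minimal sketch of that pattern with Flask; the model path, route and feature shape are illustrative assumptions, not a prescribed layout:

```python
# Model as Code: wrap a pickled model in a tiny Flask microservice.
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)
model = joblib.load("model.joblib")  # binary artifact saved at training time

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # e.g. [[5.1, 3.5, 1.4, 0.2]]
    return jsonify(prediction=model.predict(features).tolist())

if __name__ == "__main__":
    app.run(port=5000)
```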
When we are providing our API endpoint to the frontend team, we need to ensure that we don't overwhelm them with preprocessing technicalities.
We might not always have a Python backend server (e.g. a Node.js server), so using numpy and keras libraries for preprocessing might be a pain.
If we are planning to serve multiple models, then we will have to create multiple TensorFlow Serving servers and will have to add new URLs to our frontend code. But our Flask server would keep the domain URL the same, and we only need to add a new route (a function).
Providing subscription-based access, exception handling and other tasks can be carried out in the Flask app.
4 reasons why we might need Flask apart from TensorFlow serving

Next, imagine you have more models to deploy. You have three options:
1. Load the models into the existing cluster, so one cluster serves all models.
2. Spin up a new cluster to serve each model, so there are multiple clusters and each cluster serves one model.
3. A combination of 1 and 2: multiple clusters, each serving a few models.
The first option would not scale, because it's just not possible to load all models into one cluster as the cluster has limited resources.
The second option will definitely work, but it doesn't sound like an effective process, as you need to create a set of resources every time you have a new model to deploy. Additionally, how do you optimize the usage of resources? There might be unutilized resources in your clusters that could potentially be shared by the rest.
The third option looks promising: you can manually choose the cluster to deploy each of your new models into so that all the clusters' resource utilization is optimal. The problem is you have to manually manage it. Managing 100 models using 25 clusters can be a challenging task. Furthermore, running multiple models in a cluster can also cause problems, as different models usually have different resource utilization patterns and can interfere with each other. For example, one model might use up all the CPU and the other model won't be able to serve anymore.
Wouldn't it be better if we had a system that automatically orchestrates model deployments based on resource utilization patterns and prevents them from interfering with each other? Fortunately, that is exactly what Kubernetes is meant to do!
Solution for deploying lots of ML models
If you’re running lots of deployments of models then it becomes important to record which versions were deployed and when. This is needed to be able to go back to specific versions. Model registries help with this problem by providing ways to store and version models.
Model Registries <--- way to handle multiple ML models in production
Here is a quick recap table of every technology we discussed in this blog post.
Quick comparison of Python web scraping tools (socket, urllib3, requests, scrapy, selenium) [below this highlight]
Can the Brain Resist the Group Opinion?
The benefits of applying GitOps best practices are far reaching and provide:
The 6 provided benefits also explain GitOps in simple terms
GitOps is a way to do Kubernetes cluster management and application delivery. It works by using Git as a single source of truth for declarative infrastructure and applications. With GitOps, software agents can alert on any divergence between Git and what's running in a cluster, and if there's a difference, Kubernetes reconcilers automatically update or roll back the cluster depending on the case. With Git at the center of your delivery pipelines, developers use familiar tools to make pull requests to accelerate and simplify both application deployments and operations tasks to Kubernetes.
Other definition of GitOps (source):
GitOps is a way of implementing Continuous Deployment for cloud native applications. It focuses on a developer-centric experience when operating infrastructure, by using tools developers are already familiar with, including Git and Continuous Deployment tools.
It was actually very common to have one Dockerfile to use for development (which contained everything needed to build your application), and a slimmed-down one to use for production, which only contained your application and exactly what was needed to run it. This has been referred to as the “builder pattern”.
Builder pattern - maintaining two Dockerfiles: 1st for development, 2nd for production. It's not an ideal solution and we shall aim for multi-stage builds.
Multi-stage build - uses multiple FROM commands in the same Dockerfile. The end result is the same tiny production image as before, with a significant reduction in complexity. You don’t need to create any intermediate images and you don’t need to extract any artifacts to your local system at all
volumes are often a better choice than persisting data in a container’s writable layer, because a volume does not increase the size of the containers using it, and the volume’s contents exist outside the lifecycle of a given container.
Aim for using volumes instead of bind mounts in Docker. Also, if your container generates non-persistent data, consider using a tmpfs mount to avoid storing the data permanently

One case where it is appropriate to use bind mounts is during development, when you may want to mount your source directory or a binary you just built into your container. For production, use a volume instead, mounting it into the same location as you mounted a bind mount during development.
after 4 decades, we can observe that not much has changed besides learning how to measure the "figuring out" time.
Comparing 1979 to 2018 results, we spend nearly the same amount of time on maintenance/comprehension of the code:

In 1979, a book by Zelkowitz, Shaw, and Gannon entitled Principles of Software Engineering and Design said that most of the development time was spent on maintenance (67%).
Where software developers spent most of their time in 1979:

I can't recommend the Data Engineer career enough for junior developers. It's how I started and what I pursued for 6 years (and I would love doing it again), and I feel like it gave me such an incredible foundation for future roles:
- Actually big data (so, not something you could grep...) will trigger your code in every possible way. You quickly learn that with trillions of inputs, the probability of reaching a bug is either 0% or 100%. In turn, you quickly learn to write good tests.
- You will learn distributed processing at a macro level, which in turn enlightens your thinking at a micro level. For example, even though the orders of magnitude are different, hitting data over the network versus on disk is very much like hitting data on disk versus in cache. Except that when the difference ends up being in hours or days, you become much more sensitive to that, so it's good training for your thoughts.
- Data engineering is full of product decisions. What's often called data "cleaning" is in fact one of the important product decisions made in a company, and a data engineer will be consistently exposed to their company's product, which I think makes for great personal development.
- Data engineering is fascinating. In adtech for example, logs of where ads are displayed are an unfiltered window on the rest of humanity, for better or worse. But it definitely expands your views on what the "average" person actually does on their computer (spoiler: it's mainly watching porn...), and challenges quite a bit what you might think is "normal".
- You'll be plumbing technologies from all over the web, which might or might not be good news for you.
So yeah, data engineering is great! It's not harder than other specialties for developers, but imo, it's one of the fun ones!
Many reasons why Data Engineer is a great starting position for junior developers
We recommend the Alpine image as it is tightly controlled and small in size (currently under 5 MB), while still being a full Linux distribution. This is fine advice for Go, but bad advice for Python, leading to slower builds, larger images, and obscure bugs.
Alpine Linux isn't the most convenient OS for Python, but it's fine for Go
If a service can run without privileges, use USER to change to a non-root user. This is excellent advice. Running as root exposes you to much larger security risks, e.g. a CVE in February 2019 that allowed escalating to root on the host was preventable by running as a non-root user. Insecure: However, the official documentation also says: … you should use the common, traditional port for your application. For example, an image containing the Apache web server would use EXPOSE 80. In order to listen on port 80 you need to run as root. You don’t want to run as root, and given pretty much every Docker runtime can map ports, binding to ports <1024 is completely unnecessary: you can always map port 80 in your external port. So don’t bind to ports <1024, and do run as a non-privileged user.
For security reasons, if you don't need root privileges, bind to ports >= 1024
Multi-stage builds allow you to drastically reduce the size of your final image, without struggling to reduce the number of intermediate layers and files. This is true, and for all but the simplest of builds you will benefit from using them. Bad: However, in the most likely image build scenario, naively using multi-stage builds also breaks caching. That means you’ll have much slower builds.
Multi-stage builds claim to reduce image size, but they can also break caching
layer caching is great: it allows for faster builds and in some cases for smaller images. Insecure: However, it also means you won’t get security updates for system packages. So you also need to regularly rebuild the image from scratch.
Layer caching is great for speeding up the processes, but it can bring some security issues
Different data sources are better suited for different types of data transformations and provide access to different data quantities at different freshnesses
Comparison of data sources

MLOps platforms like Sagemaker and Kubeflow are heading in the right direction of helping companies productionize ML. They require a fairly significant upfront investment to set up, but once properly integrated, can empower data scientists to train, manage, and deploy ML models.
Two popular MLOps platforms: Sagemaker and Kubeflow
…Well, deploying ML is still slow and painful
What a typical ML production pipeline may look like:

Unfortunately, it ties the hands of Data Scientists, and it takes a lot of time to experiment and eventually ship the results to production
Music gives the brain a crucial connective advantage
Downloading a pretrained model off the Tensorflow website on the Iris dataset probably is no longer enough to get that data science job. It’s clear, however, with the large number of ML engineer openings that companies often want a hybrid data practitioner: someone that can build and deploy models. Or said more succinctly, someone that can use Tensorflow but can also build it from source.
Who the industry really needs
When machine learning became hot 🔥 5-8 years ago, companies decided they needed people that could make classifiers on data. But then frameworks like Tensorflow and PyTorch became really good, democratizing the ability to get started with deep learning and machine learning. This commoditized the data modelling skillset. Today, the bottleneck in helping companies get machine learning and modelling insights to production centers on data problems.
Why Data Engineering became more important
Overall the consolidation made the differences even more pronounced! There are ~70% more open data engineer than data scientist positions. In addition, there are ~40% more open ML engineer than data scientist positions. There are also only ~30% as many ML scientist as data scientist positions.
Takeaway from the analysis:
Data scientist: Use various techniques in statistics and machine learning to process and analyse data. Often responsible for building models to probe what can be learned from some data source, though often at a prototype rather than production level.
Data engineer: Develops a robust and scalable set of data processing tools/platforms. Must be comfortable with SQL/NoSQL database wrangling and building/maintaining ETL pipelines.
Machine Learning (ML) Engineer: Often responsible for both training models and productionizing them. Requires familiarity with some high-level ML framework and also must be comfortable building scalable training, inference, and deployment pipelines for models.
Machine Learning (ML) Scientist: Works on cutting-edge research. Typically responsible for exploring new ideas that can be published at academic conferences. Often only needs to prototype new state-of-the-art models before handing off to ML engineers for productionization.
4 different data profiles (and more):
consolidated:

I scraped the homepage URLs of every YC company since 2012, producing an initial pool of ~1400 companies. Why stop at 2012? Well, 2012 was the year that AlexNet won the ImageNet competition, effectively kickstarting the machine learning and data-modelling wave we are now living through. It’s fair to say that this birthed some of the earliest generations of data-first companies. From this initial pool, I performed keyword filtering to reduce the number of relevant companies I would have to look through. In particular, I only considered companies whose websites included at least one of the following terms: AI, CV, NLP, natural language processing, computer vision, artificial intelligence, machine, ML, data. I also disregarded companies whose website links were broken.
How data was collected
There are 70% more open roles at companies in data engineering as compared to data science. As we train the next generation of data and machine learning practitioners, let’s place more emphasis on engineering skills.
The resulting 70% is based on a real analysis
When you think about it, a data scientist can be responsible for any subset of the following: machine learning modelling, visualization, data cleaning and processing (i.e. SQL wrangling), engineering, and production deployment.
What tasks can Data Scientist be responsible for
A big reason for GPUs' popularity in Machine Learning is that they are essentially fast matrix multiplication machines. Deep Learning is essentially matrix multiplication.
Deep Learning is mostly about matrix multiplication
Unity ML agents is a way for you to turn a video game into a Reinforcement Learning environment.
Unity ML agents is a great way to practice RL
Haskell is the best functional programming language in the world, and Neural Networks are functions. This is the main motivation behind Hasktorch, which lets you discover new kinds of Neural Network architectures by combining functional operators
Haskell is a great solution for neural networks
I can’t think of a single large company where the NLP team hasn’t experimented with HuggingFace. They add new Transformer models within days of the papers being published, they maintain tokenizers, datasets, data loaders, NLP apps. HuggingFace has created multiple layers of platforms that each could be a compelling company in its own right.
HuggingFace company
Keras is a user-centric library, whereas Tensorflow, especially Tensorflow 1.0, is a machine-centric library. ML researchers think in terms of layers; automatic differentiation engines think in terms of computational graphs. As far as I'm concerned, my time is more valuable than the cycles of a machine, so I'd rather use something like Keras.
Why simplicity of Keras is important
A matrix is a linear map but linear maps are far more intuitive to think about than matrices
Graduate Student Descent
:)
BERT engineer is now a full time job. Qualifications include:
- Some bash scripting
- Deep knowledge of pip (starting a new environment is the suckier version of practicing scales)
- Waiting for new HuggingFace models to be released
- Watching Yannic Kilcher's new Transformer paper the day it comes out
- Repeating what Yannic said at your team reading group
Structure of a BERT engineer job
“Useful” Machine Learning research on all datasets has essentially reduced to making Transformers faster, smaller and scale to longer sequence lengths.
Typical type of advancement we see in ML
The best people in empirical fields are typically those who have accumulated the biggest set of experiences, and there are essentially two ways to do this:
1. Spend lots of time doing it
2. Get really good at running many concurrent experiments
How to come up with the best research
I often get asked by young students new to Machine Learning what math they need to know for Deep Learning, and my answer is Matrix Multiplication and Derivatives of square functions.
Deep Neural Networks are a composition of matrix multiplications with the occasional non-linearity in between
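A toy NumPy sketch of that claim: a two-layer network really is just two matrix multiplications with a non-linearity in between (shapes and weights below are random placeholders):

```python
# A "deep" network reduced to its essence: matmul, nonlinearity, matmul.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 64))     # batch of 32 inputs, 64 features
W1 = rng.normal(size=(64, 128))   # first layer weights
W2 = rng.normal(size=(128, 10))   # second layer weights

hidden = np.maximum(x @ W1, 0)    # matrix multiplication + ReLU
logits = hidden @ W2              # matrix multiplication
print(logits.shape)               # (32, 10)
```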
Slack is the best digital watercooler in the world but it’s a terrible place to collaborate - long winded disagreements should happen over Zoom and collaborations should happen in a document instead.
What Slack is good for and what it's not
Even if you really love meetings, you can only attend about 10h of them per day, but a single well-written document will continue being read even while you're sleeping.
It's unlikely that Christianity would have garnered millions of followers if Jesus Christ had to get on a "quick call" with each new potential prospect.
Oral cultures don't scale.
Oral cultures don't scale
Friendship doesn’t scale easily to large groups and especially not forcibly. Friendships happen spontaneously at smaller scales.
That's why modern 20+ remote group lunches aren't making it easier to build friendships
Remote work forces a writing culture. In person work encourages an oral culture
He attributes his longevity to being an avid reader, his spirituality, diet, frequent exercise, steadfast persistence and discipline.
The 104-year-old PhD student's advice:
- Business, experts and citizens are the true creators of the Polish AI ecosystem. The state should above all support them. In the near future we are planning a series of open meetings with each of these groups, during which we will work together on refining the details - announced Antoni Rytel, deputy director of GovTech Polska. - In addition, dedicated teams will provide continuous support to all of these entities. We will also launch a channel for the ongoing submission of technical and organisational ideas supporting the development of AI in our country - he added.
The first steps of developing AI in Poland
In the short term, the decisive factor for the success of the artificial intelligence policy will be protecting talents with the skills to model knowledge and analyse data in AI systems, and supporting the development of intellectual property created in our country - adds Robert Kroplewski, plenipotentiary of the Minister of Digital Affairs for the information society.
AI talents will be even more demanded in Poland
The document defines actions and goals for Poland in the short term (until 2023), the medium term (until 2027) and the long term (after 2027). We have divided them into six areas:
AI and society – actions meant to make Poland one of the bigger beneficiaries of the data-driven economy, and Poles a society aware of the need to continuously improve digital competences.
AI and innovative companies – support for Polish AI businesses, including creating mechanisms for financing their growth and for cooperation between start-ups and the government.
AI and science – support for the Polish scientific and research community in designing interdisciplinary challenges or solutions in the AI area, including actions aimed at preparing a cadre of AI experts.
AI and education – actions from primary education all the way up to the university level: course programmes for people at risk of losing their jobs due to the development of new technologies, educational grants.
AI and international cooperation – actions supporting Polish business in the area of AI and the development of the technology in the international arena.
AI and the public sector – support for the public sector in carrying out AI-related procurement, better coordination of activities and further development of programmes such as GovTech Polska.
AI priorities in Poland
The development of AI in Poland will increase GDP growth by as much as 2.65 percentage points each year. By 2030 it will make it possible to automate about 49% of working time in Poland, while at the same time generating better-paid jobs in key sectors.
Prediction of developing AI in Poland
Elon Musk declared that we are marching towards a near future without language, in which we’ll be able to communicate directly mind-to-mind through neural links. ‘Our brain spends a lot of effort compressing a complex concept into words,’ he said in a recent interview, ‘and there’s a lot of loss of information that occurs when compressing a complex concept into words.’
Elon Musk on the future of talking
Mute inner speech can appear as an inner dialogue as well, but its truncated form encourages us to create a ‘secret’ abbreviated language and deploy mental shortcuts. By forcing us to articulate ourselves more fully, self-talk summons up the image of an imagined listener or interrogator more vividly. In this way, it allows us to question ourselves more critically by adopting an external perspective on our ideas, and so to consider shortcomings in our arguments – all while using our own speech.
The essential difference between mute speech and self-talk
Contemporary theories in cognition and the science of learning reaffirm Kleist’s speculations, and show how self-talk contributes not only to motivation and emotional regulation, but also to some higher cognitive functions such as developing metacognition and reasoning.
Self-talk brings lots of benefits
It’s no coincidence that we walk when we need to think: evidence shows that movement enhances thinking and learning, and both are activated in the same centre of motor control in the brain. In the influential subfield of cognitive science concerned with ‘embodied’ cognition, one prominent claim is that actions themselves are constitutive of cognitive processes. That is, activities such as playing a musical instrument, writing, speaking or dancing don’t start in the brain and then emanate out to the body as actions; rather, they entail the mind and body working in concert as a creative, integrated whole, unfolding and influencing each other in turn. It’s therefore a significant problem that many of us are trapped in work and study environments that don’t allow us to activate these intuitive cognitive muscles, and indeed often even encourage us to avoid them.
Symbiosis of movement and thinking
Did you know that everything you can do in VBA can also be done in Python? The Excel Object Model is what you use when writing VBA, but the same API is available in Python as well. See Python as a VBA Replacement from the PyXLL documentation for details of how this is possible.
We can replace VBA with Python
You can write Excel worksheet functions in your Jupyter notebook too. This is a really great way of trying out ideas without leaving Excel to go to a Python IDE.
We can define functions in Python to later use in Excel
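A minimal sketch of such a worksheet function, assuming the PyXLL add-in is installed and configured (the function name and logic are made up):

```python
# Expose a Python function to Excel as a worksheet function via PyXLL.
# Only runs inside Excel with the PyXLL add-in loaded.
from pyxll import xl_func

@xl_func
def py_discount(price, rate):
    """Callable from a cell as =py_discount(A1, B1)."""
    return price * (1 - rate)
```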
Use the magic function "%xl_get" to get the current Excel selection in Python. Have a table of data in Excel? Select the top left corner (or the whole range) and type "%xl_get" in your Jupyter notebook and voilà! The Excel table is now a pandas DataFrame.
%xl_get lets us get the current Excel selection in Python
to run Python code in Excel you need the PyXLL add-in. The PyXLL add-in is what lets us integrate Python into Excel and use Python instead of VBA
PyXLL lets us use Python/Jupyter Notebooks in Excel
New Quantum Algorithms Finally Crack Nonlinear Equations
2020 was a strong year for NLP. Companies like Hugging Face 🤗, spaCy and Rasa became stronger and also more educational, which ultimately drove a huge NLP revolution (even at the industry level, which is usually quite hard).
Up till 2020, CV had the most attention. Nowadays, it's NLP.
Other hot topics of 2020:
Teen dislike of physical appearance strong predictor of depression in early adulthood
Summary of the article:
To the brain, reading computer code is not the same as reading language. Neuroscientists find that interpreting code activates a general-purpose brain network, but not language-processing centers.
Summary of the article:
Blue light has been found to have the strongest impact on your natural production of melatonin, so filtering it out when you’re using devices at night can make it easier to fall asleep, according to a recent study published in the Journal of Applied Psychology. But even here, experts see a more mundane connection between devices and comfort: “Most of the negative impact of electronics on sleep and on your circadian clock is not due to the light. It’s due to the fact that these things are engineered to keep you awake and entertained and engaged in what you’re doing for an extended period of time,”
Finding it hard to sleep might be either because of the blue light or the psychological effect of screens on us
Reynolds encourages following the 20-20-20 rule: “Take a 20-second break to view something 20 feet away every 20 minutes.”
20-20-20 rule (screen-free break technique)
The American Academy of Ophthalmology asserts that dry eyes are more likely to be the result of decreased blinking. The AAO asserts that people blink 15 times per minute in normal circumstances (“normal” being relative, here) but only five to seven times per minute while staring at a screen. Solutions for preventing dry eyes can be as simple as reminding yourself to blink, or using artificial tear drops throughout your day as a preventive measure. It’s also possible that you’re blinking even less if you’re super close to your screen.
The true reason for dry eyes: infrequent blinking
Once one has learned how to concentrate, the next thing to learn is to be able to concentrate on the right thing.
Importance of concentration on the right thing
Review your own code first
Write a clear changelist description
Automate the easy stuff
Answer questions with the code itself
Narrowly scope changes
Separate functional and non-functional changes
Break up large changelists
Respond graciously to critiques
Be patient when your reviewer is wrong
Communicate your responses explicitly
Artfully solicit missing information
Award all ties to your reviewer
Minimize lag between rounds of review
13 best practices for participating in a code review when you're the author
TFX and Tensorflow run anywhere Python runs, and that’s a lot of places
You can run your Tensorflow models on:
since TFX and Tensorflow were built by Google, they have first-class support in the Google Cloud Platform.
TFX and Tensorflow work well with GCP
After consideration, you decide to use Python as your programming language, Tensorflow for model building because you will be working with a large dataset that includes images, and Tensorflow Extended (TFX), an open-source tool released and used internally at Google, for building your pipelines.
Sample tech stack of a ML project:
These components have built-in support for ML modeling, training, serving, and even managing deployments to different targets.
Components of TFX:
Most data scientists feel that model deployment is a software engineering task and should be handled by software engineers because the required skills are more closely aligned with their day-to-day work. While this is somewhat true, data scientists who learn these skills will have an advantage, especially in lean organizations. Tools like TFX, Mlflow, Kubeflow can simplify the whole process of model deployment, and data scientists can (and should) quickly learn and use them.
As a Data Scientist, you shall think of practicing TFX, Mlflow or Kubeflow
TFX Component called TensorFlow Model Analysis (TFMA) allows you to easily evaluate new models against current ones before deployment.
TFMA component of TFX seems to be its core functionality
In general, for smaller businesses like startups, it is usually cheaper and better to use managed cloud services for your projects.
Advice for startups working with ML in production
This has its pros and cons and may depend on your use case as well. Some of the pros to consider when using managed cloud services are:
Pros of using cloud services:
The data is already in the cloud, so it may be better to build your ML system in the cloud. You’ll get better latency for I/O, easy scaling as data becomes larger (hundreds of gigabytes), and quick setup and configuration for any additional GPUs and TPUs.
If the data for your project is already in the cloud, try to stick to cloud solutions
ML projects are never static. This is part of engineering and design that must be considered from the start. Here you should answer questions like:
We need to prepare for these 2 things in ML projects:
The choice of framework is very important, as it can decide the continuity, maintenance, and use of a model. In this step, you must answer the following questions:
Questions to ask before choosing a particular tool/framework:
Try to compare the tools based on:
These questions are important as they will guide you on what frameworks or tools to use, how to approach your problem, and how to design your ML model.
Critical questions for ML projects:
you should not invest in an ML project if you have no plan to put it in production, except of course when doing pure research.
Tip #1 when applying ML in business
The difficulties in model deployment and management have given rise to a new, specialized role: the machine learning engineer. Machine learning engineers are closer to software engineers than typical data scientists, and as such, they are the ideal candidate to put models into production.
Why Machine Learning Engineer role exists
The goal of building a machine learning model is to solve a problem, and a machine learning model can only do so when it is in production and actively in use by consumers. As such, model deployment is as important as model building.
Model deployment is as important as model building
Venturebeat reports that 87% of data science projects never make it to production, while redapt claims it is 90%.
It was great for three things:
motivation (people used what I built!)
feedback (they sent me the bugs they encountered)
accountability (from the moment I tweeted about this publicly, I really felt that I'd have to finish this project… otherwise, it would be embarrassing)
Three positive things that might come after publishing your beta software: motivation, feedback and accountability
Turning my Jupyter-compatible Python code into Flask-compatible Python code took dozens of hours. I ran into small bugs and errors
That's how I always expected Python notebook development to be
I had a proof-of-concept. It was great, because when you do a project, there’s always that small uncertainty that your idea is stupid and it can’t be done at all. But when you have a working prototype, all that insecurity is gone and it becomes much easier to go on.
So true
The point is: the initial idea should be simple—because even that will turn out to be pretty complex.
Main point to consider when starting a new project
The problem with hobby projects is that they don’t pay very well. At least, compared to real-life projects. But hobby projects have a lot of cool advantages that we tend to forget.
With hobby projects:
The user has a starting stack: 10,000 units of play money.
In each round, she gets a deal. Something like "You have an x% probability to win y amount of money."
She can decide how much of her play money she wants to risk.
And the goal is to win as much as she can in 50 rounds.
Great idea for a 1st game dev project
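A rough Python sketch of that game loop; the probability ranges, payouts and the fixed 10% betting strategy below are arbitrary choices for illustration, not part of the original idea:

```python
# Toy simulation of the betting game: 50 rounds, random deals, fixed 10% bets.
import random

random.seed(42)
stack = 10_000
for round_no in range(1, 51):
    win_prob = random.uniform(0.2, 0.8)   # "you have an x% probability..."
    payout = random.uniform(0.5, 2.0)     # "...to win y amount of money"
    bet = 0.10 * stack                    # the player's decision each round
    if random.random() < win_prob:
        stack += bet * payout
    else:
        stack -= bet
print(f"Final stack after 50 rounds: {stack:,.0f}")
```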
$ git log --grep "commit message search"
git blame: searching commit messages themselves with grep
$ git log -S "deleted code" -p
git blame: way to find deleted piece of code
$ git log -S "some code" -p app/models/user.rb
git blame: limiting search to a specific folder
$ git log -G "REGEX HERE"
git blame: searching with a regular expression
Find the entire history of a snippet of code with git log -S
Include -p to see the diff as well as the commit messages
Include --reverse to see the commit that introduced the code listed first
Scope search to specific folders or files by including a path
Search with a regular expression using git log -G
Search commit messages using git log --grep
Summary of git blame searching tips
If you include the -p option (short for --patch) you get the full diff alongside the commit messages
git log -S "method_name" -p <--- get the full diff with commit messages to have more context
Say you want to find the first commit that introduced a particular class, method or snippet of code. You can use the pickaxe combined with the --reverse option to get the commits in chronological order (oldest first), so the commit where the code first appears is listed at the top
git log -S "method_name" -p --reverse <--- get commits in chronological order, oldest first
git log -S (also known as “the pickaxe”) is how you take things to the next level. It lets you search for all commits that contain a given string:
git log -S <--- search for commits with a given string
git blame is too coarse: it reports against the whole line. If the most recent change isn't related to the part of the line you're interested in, you're out of luck.
git blame is too shallow: it only reports a single change, the most recent one. The story of the particular piece of code you're interested in may have evolved over several commits.
git blame is too narrow: it only considers the file you are running blame against. The code you are interested in may also appear in other files, but to get the relevant commits on those you'll need to run blame several times.
Disadvantages of the git blame command (it's like limiting ourselves to a 2D view instead of 3D)
BlackArch is a Linux distribution designed for penetration testing and security research. You can think of it like Kali Linux, with the exception of being based on Arch Linux. Its official repositories contain more than 2,500 penetration testing tools, and hence it can be considered a very good option for anyone wishing to explore this field and try to hack their own phones/routers/devices during the quarantine time.
BlackArch <--- kind of Kali Linux based on Arch Linux
Redstar OS is a Fedora-based Linux distribution that was developed to empower all computers running in North Korea. Researchers from various independent teams verified that the distribution contains huge spying mechanisms and logging capability, beside some modified kernel modules to record everything done on the OS.
Redstar OS <--- Linux distribution for North Korea
You can have more experiences in life by simply paying more attention. What you pay attention to is going to be your life. It's so basic that it hits me.
The point of attention
Everyone hates the feeling of having wasted their time doing nothing. But, even when I am doing many things in a day, when I am on autopilot, which is to say doing something for the sake of just completing them, I don't feel fulfilled. This makes total sense because the day wasn't filled with any intentionality. There was no mind.
Try to act with more intentions
Attention is like energy in that without it no work can be done, and in doing work it is dissipated. We create ourselves by how we invest this energy. Memories, thoughts, and feelings are all shaped by how we use it. And it is an energy under our control, to do with as we please; hence, attention is our most important tool in the task of improving the quality of experience.
Attention is really important
Automation suggests that a sysadmin has invented a system to cause a computer to do something that would normally have to be done manually. In automation, the sysadmin has already made most of the decisions on what needs to be done, and all the computer must do is execute a "recipe" of tasks. Orchestration suggests that a sysadmin has set up a system to do something on its own based on a set of rules, parameters, and observations. In orchestration, the sysadmin knows the desired end result but leaves it up to the computer to decide what to do.
Most intuitive difference between automation and orchestration
For instance, automation usually involves scripting, often in Bash or Python or similar, and it often suggests scheduling something to happen at either a precise time or upon a specific event. However, orchestration often begins with an application that's purpose-built for a set of tasks that may happen irregularly, on demand, or as a result of any number of trigger events, and the exact results may even depend on a variety of conditions.
Automation is like a subset of orchestration.
Orchestration suggests moving many parts, and automation usually refers to a singular task or a small number of strongly related tasks.
Start to parse the HTML
Fetch external resources
Parse the CSS and build the CSSOM
Execute the JavaScript
Merge DOM and CSSOM to construct the render tree
Calculate layout and paint
6 steps of how browser renders a web page
If we’re honest, we’ll admit that we forget 95% of what we learn in a class. What sticks? A scattered analogy or diagram.
Analogy – ?
Diagram – Google Images
Example – Khan Academy for practice problems
Plain-English – Forums like /r/math or Math Overflow
Technical – Wikipedia or MathWorld
ADEPT learning method (Analogy, Diagram, Example, Plain-English, Technical)
We say i (lowercase) is 1.0 in the imaginary dimension. Multiplying by i is a 90-degree counter-clockwise turn, to face “up” (here’s why). Multiplying by -i points us South. It’s true that starting at 1.0 and taking 4 turns puts us at our starting point: \(1 \cdot i^4 = 1\). And two turns points us negative: \(1 \cdot i^2 = -1\), which simplifies to \(i^2 = -1\), so \(i = \sqrt{-1}\).
Great explanation of why \(i=\sqrt{-1}\)
Imaginary numbers seem to point North, and we can get to them with a single counter-clockwise turn. Oh! I guess they can point South too, by turning the other way. 4 turns gets us pointing in the positive direction again. It seems like two turns points us backwards.
Imaginary numbers explained in plain-english
Imaginary numbers let us rotate around the number line, not just move side-to-side.
Imaginary numbers
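You can see the rotation directly with Python's built-in complex numbers (a tiny illustrative check, not from the original article):

```python
# Multiplying by i (1j in Python) rotates 90 degrees counter-clockwise.
z = 1 + 0j            # pointing East on the number line
print(z * 1j)         # 1j      -> pointing North
print(z * 1j * 1j)    # (-1+0j) -> two turns point us backwards
print(z * 1j ** 4)    # (1+0j)  -> four turns return to the start
```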
Another graph:

Study of patients attending a hospital-based obesity service shows no difference in weight loss between those under 60 years old and those from 60 to 78 years old
The University of Warwick-led study conducted at University Hospitals Coventry and Warwickshire (UHCW) concludes that lifestyle changes to manage weight loss are effective in reducing obesity regardless of age
Aims to dispel myths about effectiveness of weight loss in older people
Human ageing reversed in ‘Holy Grail’ study, scientists say
Comparison between pytest and unittest test frameworks
Detailed comparison table of pytest vs unittest modules (check below)
For example, suppose you want to search for the string "font-size: 52 px;" in your repository:

$ git rev-list --all | xargs git grep -F 'font-size: 52 px;'
Searching Git repo with grep
$ git show main:README.md

Once you execute this command, you can view the content of the file in your terminal.
You can view the README.md file while being in any branch
Every repository has a .git folder. It is a special hidden folder.

$ ls -a
.  …  .git
Inside the hidden .git folder, you may find:
$ cat .git/HEAD
ref: refs/heads/master
$ cat .git/description
To optimize your repository further, you can use Git garbage collection.

$ git gc --prune=now --aggressive

This command helps when you or your team heavily uses pull or push commands.
You can optimise your repo not only with a .gitignore file, but also with Git garbage collection.
This command is an internal utility that cleans up unreachable or "orphaned" Git objects in your repository
Git, along with some Bash command piping, makes it easy to create a zip archive for your untracked files.

$ git ls-files --others --exclude-standard -z | xargs -0 tar rvf ~/backup-untracked.zip
Taking backup of untracked files.
This command makes an archive (and excludes files listed in .gitignore) with the name backup-untracked.zip
$ git rev-list --count master
32
Example of counting the number of commits with
git rev-list --count
Many developers count the number of commits to judge when to increment the build number, for instance, or just to get a feel for how the project is progressing.
Why to count the number of commits
To avoid similar scenarios, enable Git autocorrection in your Git configuration:

$ git config --global help.autocorrect 1
Applying autocorrection in Git.
When you type git stats, instead of just suggesting the correct command, Git will run git status automatically
SpaceX charges a one-off fee of 500 dollars for the modem and dish, plus 100 dollars a month for access to the network. Musk announced that by the end of this year nearly 1,000 Starlink satellites will be moving in Earth's orbit. Thanks to that, the performance of the network is expected to increase further, and users will be able to count on downloads at 200 Mb/s, uploads at 50 Mb/s and a ping of around 20 ms. Starlink is set to arrive in Europe as early as March next year.
Cost of Starlink internet:
Speed of Starlink internet:

Today, we’re excited to open up a beta of a third approach to keeping web browsing safe with Cloudflare Browser Isolation. Browser sessions run in sandboxed environments in Cloudflare data centers in 200 cities around the world, bringing the remote browser milliseconds away from the user so it feels like local web browsing.
Cloudflare introduces sandboxed web browsing. It's like a browser inside a browser, so we can rest assured that we won't be infected by the websites we visit
Solving all those complex puzzles requires a huge amount of energy. So much energy that the two biggest blockchains in the world – bitcoin and Ethereum – are now using up the same amount of electricity as the whole of Austria. Carrying out a payment with Visa requires about 0.002 kilowatt-hours; the same payment with bitcoin uses up 906 kilowatt-hours
cryptocurrencies require A LOT of energy
The fact that no one is in charge and nothing can be modified also means that mistakes cannot be corrected. A bank can reverse a payment request. This is impossible for bitcoin and other cryptocurrencies. So anything that has been stolen will stay stolen. There is a continuous stream of hackers targeting bitcoin exchanges and users, and fraudsters launching investment vehicles that are in fact pyramid schemes.
3rd reason for blockchain not being widely implemented: it's not safe for a general public
Also, in a blockchain you aren’t anonymous, but “pseudonymous”: your identity is linked to a number, and if someone can link your name to that number, you’re screwed. Everything you got up to on that blockchain is visible to everyone.
2nd reason for blockchain not being widely implemented: users aren't fully anonymous
Firstly: the technology is at loggerheads with European privacy legislation, specifically the right to be forgotten. Once something is in the blockchain, it cannot be removed. For instance, hundreds of links to child abuse material and revenge porn were placed in the bitcoin blockchain by malicious users. It’s impossible to remove those.
1st reason for blockchain not being widely implemented: once something is in blockchain, it can't be removed
Out of over 86,000 blockchain projects that had been launched, 92% had been abandoned by the end of 2017, according to consultancy firm Deloitte.
Some sad statistics of blockchain
It’s very slow (sometimes a transaction takes nine minutes, sometimes nine days!), a lot of hassle (try it for yourself – cutting open hard plastic packaging with scissors is more user friendly), and very unstable (its price rose to €17,000; dropped to €3,000; rose again to now €10,000).
Bitcoin transactions are slow & the currency is very unstable
Not only that, but the decentralised utopia that Nakamoto dreamed about, namely avoiding trusted third parties, is still far out of reach. Ironically, there are now three mining pools – a type of company that builds rooms full of servers in Alaska and other locations way up above the Arctic circle – which are responsible for more than half of all the new bitcoin (and also for checking payment requests).
Blockchain also isn't yet fully free from third parties that check the payments
This is pretty inefficient. And it would be a lot less complicated if you trusted someone to manage your data (a bank, for instance). But that’s not what Satoshi Nakamoto, as the inventor of bitcoin calls himself, wanted. He thought banks were bad news. They can make money disappear from your account. So he invented bitcoin. And bitcoin works, it exists, and according to the latest count, there are nearly 1,855 other bitcoin-like currencies out there.
Why bitcoin was created to replace banks
At its core, blockchain is a glorified spreadsheet (think: Excel with one table). In other words, a new way to store data. In traditional databases there’s usually one person who’s in charge, who decides who can access and input data, who can edit and remove it. That’s different in a blockchain. Nobody’s in charge, and you can’t change or delete anything, only view and input data.
Simple explanation of blockchain
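To make the "glorified spreadsheet" idea concrete, here is a minimal, hypothetical sketch in Python of an append-only, hash-linked log. It is not how bitcoin is implemented (no mining, no network, no consensus), but it shows why you can only view and append data, never edit or delete it:

```python
import hashlib
import json
import time

def block_hash(block):
    # Hash everything in the block except the stored hash itself.
    body = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_block(chain, data):
    block = {
        "timestamp": time.time(),
        "data": data,
        "prev_hash": chain[-1]["hash"] if chain else "0" * 64,
    }
    block["hash"] = block_hash(block)
    chain.append(block)

def is_valid(chain):
    # Any edit to a block changes its hash and breaks the prev_hash links after it.
    links_ok = all(cur["prev_hash"] == prev["hash"] for prev, cur in zip(chain, chain[1:]))
    hashes_ok = all(b["hash"] == block_hash(b) for b in chain)
    return links_ok and hashes_ok

chain = []
append_block(chain, "genesis")
append_block(chain, {"from": "alice", "to": "bob", "amount": 5})
print(is_valid(chain))          # True
chain[1]["data"] = "tampered"   # editing history...
print(is_valid(chain))          # ...is immediately detectable: False
```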
Conclusions
Draw your own. No? Fine.
Good Enough: JSON?, Protobuf, Cap’n Proto, Flatbuffers, CBOR, msgpack
Avoid: YAML, XML, Thrift?, BSON
Alternatives to JSON for data serialization. In short, JSON is still fine and it seems like it will be for a long time
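As a rough illustration of the trade-off, the sketch below serializes the same record with JSON and with msgpack; it assumes the third-party msgpack package is installed, and the record itself is made up for the example:

```python
import json
import msgpack  # pip install msgpack (assumed available)

record = {"id": 42, "user": "alice", "amounts": [19.99, 5.25], "active": True}

as_json = json.dumps(record).encode("utf-8")  # human-readable, universally supported
as_msgpack = msgpack.packb(record)            # binary, typically smaller and faster to parse

print(len(as_json), len(as_msgpack))          # msgpack is usually a bit more compact
assert msgpack.unpackb(as_msgpack, raw=False) == json.loads(as_json)
```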
There’s a story about an art teacher who split their class in half. They told one half of the students that they’d be graded on a single piece of work, and the other half that they’d be graded on the quantity of work produced. The half graded on quantity ended up producing higher-quality pieces: by iterating and learning from their mistakes, they actually produced better work than the students who only had to produce one piece. Quantity leads to quality. Share your work: sharing it helps you to think and develop, and the feedback you get feeds into the next iteration.
Share your work as often as you can as quantity leads to quality
Statistical techniques: average, quantiles, probability distribution, association rules
Supervised ML algorithms: logistic regression, neural net, time-series analysis
Unsupervised ML algorithms: cluster analysis, Bayesian network, peer group analysis, break point analysis, Benford’s law (law of anomalous numbers)
Typical techniques used in financial fraud classification
In machine learning parlance, fraud detection is generally treated as a supervised classification problem, where observations are classified as “fraud” or “non-fraud” based on the features in those observations. It is also an interesting problem in ML research due to imbalanced data — i.e. there are very few cases of fraud in an extremely large number of transactions.
Financial fraud detection is generally treated as a supervised classification problem, but with the added challenge of imbalanced data
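A minimal sketch of that framing, using scikit-learn on synthetic data with roughly 1% positive ("fraud") labels; class_weight="balanced" is one common way to compensate for the imbalance, not the only one:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced data: ~1% of samples are "fraud" (label 1).
X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Re-weight the rare fraud class so the model doesn't just predict "non-fraud" everywhere.
clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_train, y_train)

# Accuracy is misleading here; look at per-class precision/recall instead.
print(classification_report(y_test, clf.predict(X_test), digits=3))
```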
With ever-increasing online transactions and production of a large volume of customer data, machine learning has been increasingly seen as an effective tool to detect and counter frauds. However, there is no specific tool, the silver bullet, that works for all kinds of fraud detection problems in every single industry. The nature of the problem is different in every case and every industry. Therefore every solution is carefully tailored within the domain of each industry.
Machine learning in fraud detection
Facebook AI is introducing M2M-100, the first multilingual machine translation (MMT) model that can translate between any pair of 100 languages without relying on English data. It’s open sourced here. When translating, say, Chinese to French, most English-centric multilingual models train on Chinese to English and English to French, because English training data is the most widely available. Our model directly trains on Chinese to French data to better preserve meaning. It outperforms English-centric systems by 10 points on the widely used BLEU metric for evaluating machine translations. M2M-100 is trained on a total of 2,200 language directions — or 10x more than previous best, English-centric multilingual models. Deploying M2M-100 will improve the quality of translations for billions of people, especially those that speak low-resource languages. This milestone is a culmination of years of Facebook AI’s foundational work in machine translation. Today, we’re sharing details on how we built a more diverse MMT training data set and model for 100 languages. We’re also releasing the model, training, and evaluation setup to help other researchers reproduce and further advance multilingual models.
Summary of the 1st AI model from Facebook that translates directly between languages (not relying on English data)
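For reference, M2M-100 can be tried through the Hugging Face transformers port of the released model; the checkpoint name facebook/m2m100_418M and the API below come from that port rather than the announcement itself, so treat this as an assumed setup:

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# Translate Chinese directly to French, without pivoting through English.
tokenizer.src_lang = "zh"
encoded = tokenizer("生活就像一盒巧克力。", return_tensors="pt")
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```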
For Decision Scientists, the business problem comes first. Analysis follows and is dependent on the question or business decision that needs to be made.
Decision Scientists
1. The number of hidden neurons should be between the size of the input layer and the size of the output layer.
2. The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
3. The number of hidden neurons should be less than twice the size of the input layer.
3 rules of thumb for choosing the number of neurons in a hidden layer
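The three heuristics above are easy to turn into a tiny helper; this is just a sketch of the rules of thumb themselves (function and key names are made up), not a recommendation to rely on them blindly:

```python
def hidden_neuron_heuristics(n_inputs: int, n_outputs: int) -> dict:
    """Three common rules of thumb for sizing a single hidden layer."""
    return {
        # Rule 1: somewhere between the input and output layer sizes.
        "between_input_and_output": (min(n_inputs, n_outputs), max(n_inputs, n_outputs)),
        # Rule 2: 2/3 of the input size plus the output size.
        "two_thirds_input_plus_output": round(2 / 3 * n_inputs + n_outputs),
        # Rule 3: strictly less than twice the input size (an upper bound).
        "upper_bound": 2 * n_inputs - 1,
    }

print(hidden_neuron_heuristics(n_inputs=30, n_outputs=2))
# {'between_input_and_output': (2, 30), 'two_thirds_input_plus_output': 22, 'upper_bound': 59}
```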
Your weekly reminder that
1. You should not have your development process on a centralized platform, at least not if you're doing anything that smells of copyright issues.
2. If you do host on a centralized platform, have regular, decentralized backups of code and issue tracking.
3. Also, avoid US-centric hosting for this kind of thing. But really, refer to 1.
3 things to keep in mind while hosting your code
Junior school is great because you don’t learn how to find a job. You’re at school to grow as a citizen, not to prepare for your future life, and actually you’re better off not thinking about your future if you don’t want to question the meaning of your existence. It’s the same with side projects: see them as a way to discover a new topic and broaden your horizons, not as something you could apply in your current position or use to find a new job.
Comparing school to side projects
Even from a purely technical point of view, you will not really understand a library or a framework by just sticking to the tutorial or its basic features. You’ll have to dig in and face its most painful aspects to deeply understand it.
Way to deeply learn technologies
Today, I recommend not hiding your side projects, but also not displaying them as a primary part of your identity, so you don’t scare the recruiter or make them feel that you have a backup plan.
Why not to talk too much about your personal projects
You can generally assume that a developer coming from a cutting-edge company has a better chance of being a great developer than a developer coming from a Java 1.4 shop. He may not be smarter, but he has been hired by a company with a more demanding hiring process, and has been surrounded by some of the smartest developers.
First, most recruiters don’t care about your personal projects or how many meetups you went to during the year. What matters most is your current company - and by that I mean the name of your current company.
True in most cases
visx is a suite of over 30 separate packages of React visualization primitives that fall into several categories. It is un-opinionated on state management, animation, and styling, so it can integrate with any React codebase, and its emphasis on modularity (similar to D3) lets you keep your bundle sizes down by only using the packages you need to create your reusable chart library or a custom one-off chart.
Short definition of visx
In our research, we found that existing React visualization libraries are often high-level abstractions, optimized for ease of use (i.e., fewer lines of code) at the expense of expressivity. None offer the expressivity of D3 primitives, and many don’t allow for the optimization we want in production because computation, animations, state management, styles, and rendering are all encapsulated.
Comparison of data visualisation libraries:

Because D3 and React both want to own DOM manipulation, we’ve found that it’s best to use D3 only for the math and React for the DOM, since having two mental models for updating the DOM opens the door for bugs to sneak in. However, using D3 solely for math means a significant amount of its (DOM-based) functionality is not available for use: selection.join, zoom, drag, brush, and transitions. Additionally, as mentioned above, D3 has its own learning curve, and we would like developers to feel like they are writing native React code with standard APIs and familiar patterns.
You can use D3 inside a React app, but...
Leveraging React and its ecosystem would provide learnability and performance, and a low-level, modular API would deliver expressivity.
Thanks to React and its low-level, modular API, visx achieves all three: learnability, performance, and expressivity
After 3 years of development, 2.5 years of production use at Airbnb, and a rewrite in TypeScript we are excited to announce the official 1.0 release of visx (formerly vx). You can find the project on GitHub and browse documentation and examples on airbnb.io.
visx - collection of reusable low-level visualization components. visx combines the power of d3 to generate your visualization with the benefits of react for updating the DOM