- Jan 2020
Most VQA models would use some kind of Recurrent Neural Network (RNN) to process the question input
- Most VQA models use an RNN to process the question input
- For easier VQA datasets, a bag-of-words (BOW) vector fed into a standard (feedforward) NN may be enough
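A minimal sketch of the BOW idea in plain Python, assuming a small hand-picked vocabulary (`VOCAB` and `bag_of_words` are hypothetical names, not from any VQA codebase). The question becomes a fixed-length count vector that a feedforward network could take as input; because word order is discarded, this only works when questions are simple.

```python
# Hypothetical toy vocabulary; in practice it would be built from training questions.
VOCAB = ["what", "color", "is", "the", "triangle", "circle", "shape", "how", "many"]
INDEX = {word: i for i, word in enumerate(VOCAB)}


def bag_of_words(question: str) -> list[int]:
    """Return a fixed-length word-count vector; word order is thrown away."""
    vector = [0] * len(VOCAB)
    for token in question.lower().replace("?", " ").split():
        if token in INDEX:  # out-of-vocabulary words are simply dropped
            vector[INDEX[token]] += 1
    return vector


print(bag_of_words("What color is the triangle?"))
# [1, 1, 1, 1, 1, 0, 0, 0, 0]
```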
Here’s a very simple example of how a VQA system might answer the question “what color is the triangle?”
- Look for shapes and colours using a CNN.
- Understand the question type with NLP.
- Determine a strength (score) for each possible answer.
- Convert the answer strengths to probabilities (%).
The standard approach to performing VQA looks something like this: Process the image. Process the question. Combine features from steps 1/2. Assign probabilities to each possible answer.
Standard approach to handling VQA problems
Visual Question Answering (VQA): answering open-ended questions about images. VQA is interesting because it requires combining visual and language understanding.
Visual Question Answering (VQA) = visual + language understanding
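The four-step approach quoted above (process the image, process the question, combine the features, assign probabilities) can be sketched end to end in a few lines. This is an illustration under assumptions: the feature vectors stand in for CNN and question-encoder outputs, the element-wise product is just one common fusion choice, and the weights are made up rather than trained.

```python
import math

ANSWERS = ["red", "green", "blue"]


def fuse(image_features, question_features):
    """Combine the two modalities with an element-wise product (one common choice)."""
    return [i * q for i, q in zip(image_features, question_features)]


def score_answers(fused, weights):
    """A single linear layer: one raw 'strength' (logit) per candidate answer."""
    return [sum(w * x for w, x in zip(row, fused)) for row in weights]


def softmax(logits):
    """Turn answer strengths into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stand-ins for the image (CNN) and question features.
image_features = [0.9, 0.1, 0.4]
question_features = [1.0, 0.5, 0.2]
weights = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]  # illustrative, untrained

probs = softmax(score_answers(fuse(image_features, question_features), weights))
best = ANSWERS[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

In a real model the fusion and scoring weights are learned jointly, so the "strengths" in the triangle example come out of training rather than being hand-set.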
- Nov 2019
20th Century Fox has been using this tool since the release of The Greatest Showman in 2017, and continues to use it to inform their latest releases
The Merlin Video tool is currently used by 20th Century Fox
After a movie’s release, we are able to process the data on which movies were previously seen by that audience. The table below shows the top 20 actual moviegoer audiences (Comp ACT) compared to the top 20 predicted audiences (Comp PRED)
How the Merlin model is validated
The obvious choice was Cloud Machine Learning Engine (Cloud ML Engine), in conjunction with the TensorFlow deep learning framework
Merlin Video is powered by:
- Cloud Machine Learning Engine - automating infrastructure (resources, provisioning and monitoring)
- Cloud Dataflow and Data Studio - Dataflow generates reports in Data Studio
- BigQuery and BigQueryML - used in a final step to merge Merlin’s millions of customer predictions with other data sources to create useful reports and to quickly prototype media plans for marketing campaigns
model is trained end-to-end, and the loss of the logistic regression is back-propagated to all the trainable components (weights). Merlin’s data pipeline is refreshed weekly to account for new trailer releases
How the model is trained and where it sits in the pipeline
The trailer release for a new movie is a highly anticipated event that can help predict future success, so it behooves the business to ensure the trailer is hitting the right notes with moviegoers. To achieve this goal, the 20th Century Fox data science team partnered with Google’s Advanced Solutions Lab to create Merlin Video, a computer vision tool that learns dense representations of movie trailers to help predict a specific trailer’s future moviegoing audience
Merlin Video - computer vision tool to help predict a specific trailer's moviegoing audience
pipeline also includes a distance-based “collaborative filtering” (CF) model and a logistic regression layer that combines all the model outputs together to produce the movie attendance probability
Other elements of the pipeline:
- collaborative filtering (CF) model
- logistic regression layer
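To make the final step concrete, here is a sketch of how a logistic regression layer could combine a distance-based CF score with a video-model score into one attendance probability. Cosine similarity as the distance measure, the embeddings, the weights, and the bias are all assumptions for illustration; in Merlin the model is trained end-to-end, so these weights would be learned rather than chosen by hand.

```python
import math


def cosine_similarity(a, b):
    """Similarity between a customer's history embedding and a trailer embedding."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attendance_probability(model_outputs, weights, bias):
    """Logistic regression over the scores produced by the upstream models."""
    z = bias + sum(w * x for w, x in zip(weights, model_outputs))
    return sigmoid(z)


# Hypothetical embeddings: a customer's viewing history vs. a new trailer.
customer = [0.2, 0.8, 0.1]
trailer = [0.3, 0.7, 0.0]

cf_score = cosine_similarity(customer, trailer)  # distance-based CF signal
video_score = 0.6  # stand-in for the trailer model's output
p = attendance_probability([cf_score, video_score], weights=[1.5, 2.0], bias=-2.0)
print(round(p, 3))
```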
custom model learns the temporal sequencing of labels in the movie trailer
Temporal sequencing - the timing and order of the shots (e.g. long or short shots).
Temporal sequencing can convey information on:
- movie type
- movie plot
- roles of the main characters
- filmmakers' cinematographic choices.
When combined with historical customer data, sequencing analysis can be used to create predictions of customer behavior.
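A small sketch of why the temporal position of labels matters, assuming each shot has already been labelled (the shots and labels are hypothetical). A time × label matrix keeps *when* each label occurs, while an order-free "bag of labels" cannot tell a trailer from its reversed cut. It also hints at the dimensionality problem: with thousands of possible labels and many shots, this matrix gets large and sparse.

```python
# Hypothetical per-shot labels, in the order the shots appear in a trailer.
shots = [
    ["human", "face"],
    ["car", "explosion"],
    ["human", "beard"],
]


def label_sequence_matrix(shots, vocabulary):
    """One multi-hot row per shot; row order preserves when each label occurs."""
    return [[1 if label in shot else 0 for label in vocabulary] for shot in shots]


def bag_of_labels(shots, vocabulary):
    """Order-free summary: how often each label appears anywhere in the trailer."""
    return [sum(label in shot for shot in shots) for label in vocabulary]


vocabulary = sorted({label for shot in shots for label in shot})
matrix = label_sequence_matrix(shots, vocabulary)
reversed_matrix = label_sequence_matrix(list(reversed(shots)), vocabulary)

print(vocabulary)
print(matrix)
```

Reversing the shots leaves `bag_of_labels` unchanged but changes the matrix, which is the information a sequence model can exploit.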
The elasticity of Cloud ML Engine allowed the data science team to iterate and test quickly, without compromising the integrity of the deep learning model
Cloud ML Engine reduced the deployment time from months to days
The first challenge is the temporal position of the labels in the trailer: it matters when the labels occur in the trailer. The second challenge is the high dimensionality of this data
Two challenges in labelling video clips
Merlin returns the following labels: facial_hair, beard, screenshot, chin, human, film
Types of features Merlin Video can generate from a single trailer frame.
Final result of feature collecting and ordering:
team began its analysis on YouTube 8M, a publicly available dataset of YouTube videos
YouTube 8M - public dataset of YouTube videos. With it, we can analyse video features such as:
- colors
- many types of faces
- thousands of objects
- several landscapes
Architecture flow diagram for Merlin
When it comes to movies, analyzing text taken from a script is limiting because it only provides a skeleton of the story, without any of the additional dynamism that can entice an audience to see a movie
Analysing a movie script isn't enough to predict how attractive the movie will be to audiences
We’ve always lived in a world which we didn’t completely understand but now we’re living in a world designed by us – for Pedro, that’s actually an improvement
We never fully understood our surroundings, but now we have great power to modify them
But at the end of the day, what we know about neuroscience today is not enough to determine what we do in AI, it’s only enough to give us ideas. In fact it’s a two way street – AI can help us to learn how the brain works and this loop between the two disciplines is a very important one and is growing very rapidly
Neuroscience can give ideas to AI, and AI can help us understand how the brain works
There was this period of a couple of 100 years where we understood our technology. Now we just have to learn to live in a world where we don’t understand the machines that work for us, we just have to be confident they are working for us and doing their best
Should we just accept the fact that machines will rule the world with a mysterious intelligence?
if you look at the number of connections that the state of the art machine learning systems for some of these problems have, they’re more than many animals – they have many hundreds of millions or billions of connections
State-of-the-art ML systems have hundreds of millions or billions of connections (more than many animals)
ability to learn from data e.g. OpenAI and the Rubik’s Cube and DeepMind with AlphaGo required the equivalent of thousands of years of gameplay to achieve those milestones
Even with a great algorithm, we have to expect very long training (the equivalent of thousands of years of gameplay)
Pedro believes that success will come from unifying the different major types of learning and their master algorithms – not just combining, but unifying them such that “it feels like using one thing”
Interesting point of view on designing the master algorithm
Pedro’s book “The Master Algorithm” takes readers on a journey through the five dominant paradigms of machine learning research on a quest for the master algorithm. Along the way, Pedro wanted to abstract away from the mechanics so that a broad audience, from the CXO to the consumer, can understand how machine learning is shaping our lives
"The Master Algorithm" may feel too abstract for practitioners, since it deliberately skips the mechanics; however, it covers the following 5 paradigms:
- Symbolist / rule-based learning (Decision trees, Random Forests, etc)
- Connectionism (neural networks, etc)
- Bayesian (Naive Bayes, Bayesian Networks, Probabilistic Graphical Models)
- Analogy (KNN & SVMs)
- Evolutionary (genetic algorithms, genetic programming)
For the application of machine learning in finance, it’s still very early days. Some of the stuff people have been doing in finance for a long time is simple machine learning, and some people were using neural networks back in the 80s and 90s. But now we have a lot more data and a lot more computing power, so with our creativity in machine learning research, “We are so much in the beginning that we can’t even picture where we’re going to be 20 years from now”
It's still early days for applying modern ML techniques to the financial industry
- Oct 2019
When you run a PyTorch/TensorFlow model, most of the work isn’t actually being done in the framework itself, but rather by third party kernels. These kernels are often provided by the hardware vendor, and consist of operator libraries that higher-level frameworks can take advantage of. These are things like MKLDNN (for CPU) or cuDNN (for Nvidia GPUs). Higher-level frameworks break their computational graphs into chunks, which can then call these computational libraries. These libraries represent thousands of man hours of effort, and are often optimized for the architecture and application to yield the best performance
What happens behind the scenes when you run an ML framework
At their core, PyTorch and Tensorflow are auto-differentiation frameworks
auto-differentiation = computing the derivatives of a function automatically. It can be implemented in several ways; most ML frameworks use "reverse-mode auto-differentiation" (known as "backpropagation")
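A minimal sketch of reverse-mode auto-differentiation on scalars, using a toy `Var` class (hypothetical, not PyTorch's or TensorFlow's API). The forward pass records each operation together with its local derivative; a single backward pass then applies the chain rule from the output back to every input. One reverse pass gives the gradient with respect to all inputs, which is why this mode suits networks with many parameters and one scalar loss.

```python
import math


class Var:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def sin(self):
        return Var(math.sin(self.value), ((self, math.cos(self.value)),))

    def backward(self):
        """Visit nodes in reverse topological order, accumulating chain-rule terms."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


# f(x, y) = x * y + sin(x), so df/dx = y + cos(x) and df/dy = x
x, y = Var(2.0), Var(3.0)
f = x * y + x.sin()
f.backward()
print(x.grad, y.grad)
```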
Jax is built by the same people who built the original Autograd, and features both forward- and reverse-mode auto-differentiation. This allows computation of higher order derivatives orders of magnitude faster than what PyTorch/TensorFlow can offer
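For contrast with reverse mode, forward-mode auto-differentiation can be sketched with dual numbers: each value carries its derivative along with it, and every operation updates both. This toy `Dual` class is an illustration, not JAX's implementation. Forward mode needs one pass per input direction (seeded with `deriv=1.0`), so it is cheap for few inputs; composing the two modes is one way to obtain higher-order quantities such as Hessian-vector products.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """value + deriv * eps, where eps**2 == 0 (forward-mode AD)."""

    value: float
    deriv: float = 0.0

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)


def sin(x: Dual) -> Dual:
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def f(x, y):
    return x * y + sin(x)


# One forward pass per input: seed deriv=1.0 on the input of interest.
df_dx = f(Dual(2.0, 1.0), Dual(3.0, 0.0)).deriv
df_dy = f(Dual(2.0, 0.0), Dual(3.0, 1.0)).deriv
print(df_dx, df_dy)
```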
the transition from TensorFlow 1.0 to 2.0 will be difficult and provides a natural point for companies to evaluate PyTorch
An opportunity for a faster transition to PyTorch in industry
At the API level, TensorFlow eager mode is essentially identical to PyTorch’s eager mode, originally made popular by Chainer. This gives TensorFlow most of the advantages of PyTorch’s eager mode (ease of use, debuggability, and etc.) However, this also gives TensorFlow the same disadvantages. TensorFlow eager models can’t be exported to a non-Python environment, they can’t be optimized, they can’t run on mobile, etc. This puts TensorFlow in the same position as PyTorch, and they resolve it in essentially the same way - you can trace your code (tf.function) or reinterpret the Python code (Autograph).
Once your PyTorch model is in this IR, we gain all the benefits of graph mode. We can deploy PyTorch models in C++ without a Python dependency, or optimize it.
Tracing takes a function and an input, records the operations that were executed with that input, and constructs the IR. Although straightforward, tracing has its downsides. For example, it can’t capture control flow that didn’t execute. In particular, it can’t capture the false block of a conditional if it executed the true block
Tracing mode in PyTorch
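A toy tracer that illustrates the limitation described above; it only mimics the idea behind `torch.jit.trace` and is not its API (`Traced`, `trace`, and `replay` are hypothetical names). The function runs once on an example input, every executed operation is recorded, and the recording is replayed later. The Python `if` never shows up in the recording, so replaying on an input that should take the other branch silently gives the wrong answer.

```python
class Traced:
    """Wraps a value and logs each arithmetic operation applied to it."""

    def __init__(self, value, log):
        self.value = value
        self.log = log

    def __mul__(self, k):
        self.log.append(f"mul {k}")
        return Traced(self.value * k, self.log)

    def __add__(self, k):
        self.log.append(f"add {k}")
        return Traced(self.value + k, self.log)


def f(x):
    if x.value > 0:  # Python-level control flow: invisible to the tracer
        return x * 2
    return x + 100


def trace(fn, example):
    """Run fn once on an example input and return the recorded operations."""
    log = []
    fn(Traced(example, log))
    return log


def replay(graph, x):
    """Re-execute the recorded operations on a new input."""
    for op in graph:
        name, k = op.split()
        x = x * float(k) if name == "mul" else x + float(k)
    return x


graph = trace(f, 3.0)
print(graph)                              # only the branch that ran was recorded
print(replay(graph, -1.0), f(Traced(-1.0, [])).value)  # the two disagree
```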
Script mode takes a function/class, reinterprets the Python code and directly outputs the TorchScript IR. This allows it to support arbitrary code, however it essentially needs to reinterpret Python
Script mode in PyTorch
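To contrast with tracing: script mode reads the source instead of running it, so both branches of a conditional are visible. The sketch below uses Python's standard `ast` module only to show that idea; it is not TorchScript's compiler, which supports a restricted subset of Python and emits its own IR.

```python
import ast

SOURCE = """
def f(x):
    if x > 0:
        return x * 2
    else:
        return x + 100
"""


def branches_seen(source: str):
    """Parse the source without executing it and report every if/else found."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.If):
            found.append({
                "test": ast.unparse(node.test),  # requires Python 3.9+
                "body_statements": len(node.body),
                "else_statements": len(node.orelse),
            })
    return found


print(branches_seen(SOURCE))
```

Unlike the tracer, no example input is needed, and the false branch is captured even though nothing was executed; the cost is having to understand (reinterpret) the Python code itself.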
The PyTorch JIT is built around an intermediate representation (IR) for PyTorch called TorchScript. TorchScript is the “graph” representation of PyTorch. You can turn a regular PyTorch model into TorchScript by using either tracing or script mode.
On the other hand, industry has a litany of restrictions/requirements
- no Python <--- overhead of the Python runtime might be too much to take
- mobile <--- Python can't be embedded in the mobile binary
- serving <--- no-downtime updates of models, switching between models seamlessly, etc.
Researchers care about how fast they can iterate on their research, which is typically on relatively small datasets (datasets that can fit on one machine) and run on <8 GPUs. This is not typically gated heavily by performance considerations, but by their ability to quickly implement new ideas. On the other hand, industry considers performance to be of the utmost priority. While 10% faster runtime means nothing to a researcher, that could directly translate to millions of savings for a company
Researchers value how quickly they can implement and iterate on new ideas.
Industry values performance, as it translates directly into money saved.
TensorFlow came out years before PyTorch, and industry is slower to adopt new technologies than researchers
Reason why PyTorch wasn't previously more popular than TensorFlow
TensorFlow is still the dominant framework. For example, based on data from 2018 to 2019, TensorFlow had 1541 new job listings vs. 1437 job listings for PyTorch on public job boards, 3230 new TensorFlow Medium articles vs. 1200 PyTorch, 13.7k new GitHub stars for TensorFlow vs 7.2k for PyTorch, etc
For now, the industry numbers still favor TensorFlow over PyTorch
TensorFlow will always have a captive audience within Google/DeepMind, but I wonder whether Google will eventually relax this
PyTorch may become so favored that one day Google relaxes its commitment to TensorFlow internally
Why do researchers love PyTorch?
- simplicity <--- Pythonic, integrates easily with the Python ecosystem
- great API <--- TensorFlow changed its API many times
- performance <--- it's not so clear if it's faster than TensorFlow
In 2018, PyTorch was a minority. Now, it is an overwhelming majority, with 69% of CVPR using PyTorch, 75+% of both NAACL and ACL, and 50+% of ICLR and ICML
every major conference in 2019 has had a majority of papers implemented in PyTorch
- CVPR, ICCV, ECCV - computer vision conferences
- NAACL, ACL, EMNLP - NLP conferences
- ICML, ICLR, NeurIPS - general ML conferences
In 2019, the war for ML frameworks has two remaining main contenders: PyTorch and TensorFlow. My analysis suggests that researchers are abandoning TensorFlow and flocking to PyTorch in droves. Meanwhile in industry, Tensorflow is currently the platform of choice, but that may not be true for long
- in research: PyTorch > TensorFlow
- in industry: TensorFlow > PyTorch
From the early academic outputs Caffe and Theano to the massive industry-backed PyTorch and TensorFlow
It's not easy to track all the ML frameworks
Caffe, Theano ---> PyTorch, TensorFlow
- Sep 2019
Continuous Delivery for Machine Learning end-to-end process
We chose to use GoCD as our Continuous Delivery tool, as it was built with the concept of pipelines as a first-class concern
GoCD - open source Continuous Delivery tool
A deployment pipeline automates the process for getting software from version control into production, including all the stages, approvals, testing, and deployment to different environments
example of how to combine different test pyramids for data, model, and code in CD4ML
Combining tests for data (purple), model (green) and code (blue)
There are different types of testing that can be introduced in the ML workflow.
Automated tests for ML system:
- validating data
- validating component integration
- validating the model quality
- validating model bias and fairness
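A sketch of what three of these automated checks might look like as plain functions a CD pipeline could run before promoting a model. The field name `units_sold` (borrowed from the sales-forecasting example), the error metric, and the thresholds are hypothetical; component-integration tests are left out because they depend on the surrounding system.

```python
def validate_data(rows):
    """Data validation: required field present and within a plausible range."""
    errors = []
    for i, row in enumerate(rows):
        if row.get("units_sold") is None:
            errors.append(f"row {i}: missing units_sold")
        elif row["units_sold"] < 0:
            errors.append(f"row {i}: negative units_sold")
    return errors


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def check_model_quality(actual, predicted, max_mae):
    """Model quality: fail the pipeline if error exceeds an agreed threshold."""
    return mean_absolute_error(actual, predicted) <= max_mae


def check_group_gap(errors_by_group, max_gap):
    """Bias/fairness: the error should not differ too much between groups."""
    values = list(errors_by_group.values())
    return max(values) - min(values) <= max_gap


print(validate_data([{"units_sold": 3}, {"units_sold": -1}, {}]))
print(check_model_quality([10, 20], [12, 18], max_mae=2.0))
print(check_group_gap({"store_a": 1.0, "store_b": 2.0}, max_gap=0.5))
```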
Another approach is to use a tool like H2O to export the model as a POJO in a JAR Java library, which you can then add as a dependency in your application. The benefit of this approach is that you can train the models in a language familiar to Data Scientists, such as Python or R, and export the model as a compiled binary that runs in a different target environment (JVM), which can be faster at inference time
H2O - export models trained in Python/R as a POJO in JAR
In order to formalise the model training process in code, we used an open source tool called DVC (Data Version Control). It provides similar semantics to Git, but also solves a few ML-specific problems:
DVC - turns the model training process into versioned code.
- it has multiple backend plugins to fetch and store large files on an external storage outside of the source control repository;
- it can keep track of those files' versions, allowing us to retrain our models when the data changes;
- it keeps track of the dependency graph and commands used to execute the ML pipeline, allowing the process to be reproduced in other environments;
- it can integrate with Git branches to allow multiple experiments to co-exist
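The "retrain when the data changes" idea can be sketched as content-hash caching: record a hash of each stage's inputs, and re-run the stage only when a hash changes. DVC records dependency hashes in a lock file in a similar spirit, but this is not DVC's API or file format; `run_stage` and the lock-file layout are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_stage(name, deps, command, lock_path: Path) -> bool:
    """Run `command` only if some dependency's content changed since the last run."""
    lock = json.loads(lock_path.read_text()) if lock_path.exists() else {}
    current = {str(d): file_digest(d) for d in deps}
    if lock.get(name) == current:
        return False  # inputs unchanged: skip the stage
    command()
    lock[name] = current
    lock_path.write_text(json.dumps(lock, indent=2))
    return True


# Usage: train once, skip when nothing changed, retrain after the data changes.
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    data = workdir / "sales.csv"
    lock_file = workdir / "stages.lock.json"
    runs = []
    data.write_text("week,units\n1,10\n")
    first = run_stage("train", [data], lambda: runs.append("train"), lock_file)
    second = run_stage("train", [data], lambda: runs.append("train"), lock_file)
    data.write_text("week,units\n1,12\n")
    third = run_stage("train", [data], lambda: runs.append("train"), lock_file)

print(first, second, third, runs)
```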
Machine Learning pipeline for our Sales Forecasting problem, and the 3 steps to automate it with DVC
Sales Forecasting process
common functional silos in large organizations can create barriers, stifling the ability to automate the end-to-end process of deploying ML applications to production
Common ML process (leading to delays and friction)
Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.
Continuous Delivery for Machine Learning (CD4ML) (long definition)
- software engineering approach
- cross-functional team
- producing software based on code, data, and ML models
- small and safe increments
- reproducible and reliable software release
- short adaptation cycles
Continuous Delivery for Machine Learning (CD4ML) is the discipline of bringing Continuous Delivery principles and practices to Machine Learning applications.
Continuous Delivery for Machine Learning (CD4ML)
- Oct 2018
Can Education Keep Up with Technology?
Workforce-readiness focused compendium of how edtech innovations align with preparing students for jobs.
- Aug 2017
introduce topic modeling to those not yet fully converted to, or aware of, its potential.
Is resistance futile?
They’re powerful, widely applicable, easy to use, and difficult to understand — a dangerous combination.
In writing the description of our reverse engineering work below, we deliberately avoid terms that are commonly used in Machine Learning, where labels are "true," "correct," or "gold standard." This linguistic distinction highlights the fundamentally different perspective that humanists have on classification as a tool. Our goal is not to create a system that mimics the decisions of a human annotator, but rather to better represent the porous boundaries between labels and identify the piles on which a story could have been placed over a century ago late on a cold wintry night in a dimly lit schoolhouse in eastern Jutland. We note the contrast between our use of computers to problematize existing distinctions and the common concern in the Humanities that computers deal only with binaries and black-and-white distinctions.
Valuable insight and eloquently phrased.
Our goal is not to treat existing classifications as "ground truth" labels and build machine learning tools to mimic them, but rather to use computation to better quantify the variability and uncertainty of those classifications.
Our goal with this classification method can be seen as the inverse of usual machine learning classifiers.
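One way to "quantify the variability and uncertainty" of a classification, rather than reporting a single label as ground truth, is to measure how spread out the probability is across labels. The sketch uses Shannon entropy as that measure; this is an assumption for illustration, not necessarily the authors' method, and the stories, genre labels, and probabilities are made up.

```python
import math


def entropy(distribution):
    """Shannon entropy in bits: 0 for one certain label, higher when labels compete."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


# Hypothetical classifier output for two stories over three genre labels.
stories = {
    "story_a": {"legend": 0.95, "folktale": 0.05, "joke": 0.0},
    "story_b": {"legend": 0.40, "folktale": 0.35, "joke": 0.25},
}

uncertainty = {name: entropy(dist) for name, dist in stories.items()}
for name, bits in uncertainty.items():
    print(name, round(bits, 3))
```

A high score flags a story sitting on the porous boundary between piles, which is exactly the kind of case the humanist reading wants to surface rather than resolve.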
- May 2014