68 Matching Annotations
  1. Last 7 days
    1. Amazon Machine Learning Deprecated. Use SageMaker instead.

      Instead of Amazon Machine Learning, use Amazon SageMaker

  2. Apr 2020
    1. Another approach is to use a tool like H2O to export the model as a POJO in a JAR Java library, which you can then add as a dependency in your application. The benefit of this approach is that you can train the models in a language familiar to Data Scientists, such as Python or R, and export the model as a compiled binary that runs in a different target environment (JVM), which can be faster at inference time

      H2O - export models trained in Python/R as a POJO inside a JAR (Java library)
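
      A minimal sketch of how that export could look with H2O's Python API (the data file, the target column "churn" and the paths here are hypothetical):

        import h2o
        from h2o.estimators import H2OGradientBoostingEstimator

        h2o.init()

        # hypothetical CSV with a binary target column "churn" -- adapt to your data
        train = h2o.import_file("data/train.csv")
        train["churn"] = train["churn"].asfactor()

        model = H2OGradientBoostingEstimator(ntrees=50)
        model.train(x=[c for c in train.columns if c != "churn"],
                    y="churn", training_frame=train)

        # export the trained model as a POJO, plus the h2o-genmodel.jar it depends on
        h2o.download_pojo(model, path="export/", get_jar=True)

      The generated .java file can then be compiled against h2o-genmodel.jar and called from the JVM application at inference time.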

    2. Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.

      Continuous Delivery for Machine Learning (CD4ML) (long definition)

      Basic principles:

      • software engineering approach
      • cross-functional team
      • producing software based on code, data, and ML models
      • small and safe increments
      • reproducible and reliable software release
      • short adaptation cycles
    3. In order to formalise the model training process in code, we used an open source tool called DVC (Data Science Version Control). It provides similar semantics to Git, but also solves a few ML-specific problems:

      DVC - formalises the model training process in code.

      Advantages:

      • it has multiple backend plugins to fetch and store large files on an external storage outside of the source control repository;
      • it can keep track of those files' versions, allowing us to retrain our models when the data changes;
      • it keeps track of the dependency graph and commands used to execute the ML pipeline, allowing the process to be reproduced in other environments;
      • it can integrate with Git branches to allow multiple experiments to co-exist
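
      A minimal sketch of the versioning point above, using DVC's Python API (the file path, repo URL and revision name are placeholders):

        import dvc.api
        import pandas as pd

        # read the exact revision of the sales data a given experiment was trained on
        with dvc.api.open(
            "data/sales.csv",
            repo="https://github.com/example/sales-forecasting",
            rev="experiment-1",
        ) as f:
            sales = pd.read_csv(f)

      The pipeline steps themselves (fetch data, split, train) are declared as DVC stages, so the same dependency graph can be reproduced in another environment.
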
    4. Machine Learning pipeline for our Sales Forecasting problem, and the 3 steps to automate it with DVC

      Sales Forecasting process (diagram)

    5. Continuous Delivery for Machine Learning end-to-end process

      end-to-end process

    6. common functional silos in large organizations can create barriers, stifling the ability to automate the end-to-end process of deploying ML applications to production

      Common ML process (leading to delays and friction)

    7. There are different types of testing that can be introduced in the ML workflow.

      Automated tests for an ML system:

      • validating data
      • validating component integration
      • validating the model quality
      • validating model bias and fairness
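
      A minimal sketch of what the data-validation and model-quality layers of those tests could look like (pytest-style; the file paths, column names and threshold are hypothetical):

        import json
        import pandas as pd

        def test_sales_data_has_expected_schema():
            df = pd.read_csv("data/sales.csv")
            assert {"date", "store_id", "item_id", "unit_sales"}.issubset(df.columns)

        def test_no_negative_sales():
            df = pd.read_csv("data/sales.csv")
            assert (df["unit_sales"] >= 0).all()

        def test_model_error_below_threshold():
            # metrics.json is assumed to be written by the training step
            with open("metrics/metrics.json") as f:
                metrics = json.load(f)
            assert metrics["validation_rmse"] < 20.0
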
    8. example of how to combine different test pyramids for data, model, and code in CD4ML

      Combining tests for data (purple), model (green) and code (blue)

    9. A deployment pipeline automates the process for getting software from version control into production, including all the stages, approvals, testing, and deployment to different environments

      Deployment pipeline

    10. We chose to use GoCD as our Continuous Delivery tool, as it was built with the concept of pipelines as a first-class concern

      GoCD - open source Continuous Delivery tool

    11. Continuous Delivery for Machine Learning (CD4ML) is the discipline of bringing Continuous Delivery principles and practices to Machine Learning applications.

      Continuous Delivery for Machine Learning (CD4ML)

    1. Common questions are: How many users click this button; what % of users that visit a screen click this button; how many users have signed up by region or account type? However, the data needed to answer those questions may not exist! If the data does exist, it’s likely “dirty” - undocumented, tough to find or could be factually inaccurate. It’ll be tough to work with! You could spend hours or days attempting to answer a single question only to discover that you can’t sufficiently answer it for a stakeholder. In machine learning, you may be asked to optimize some process or experience for consumers. However, there’s uncertainty with how much, if at all, the experience can be improved!

      Common types of problems you might work on in the Data Science / Machine Learning industry

    1. In data science community the performance of the model on the test dataset is one of the most important things people look at. Just look at the competitions on kaggle.com. They are extremely focused on test dataset and the performance of these models is really good.

      In data science, the performance of the model on the test dataset is the most important metric for most practitioners.

      It's not always the best measure, since a model that excels on the test set can still misperform badly on a different kind of dataset.

    2. It's basically a look up table, interpolating between known data points. Except, unlike other interpolants like 'linear', 'nearest neighbour' or 'cubic', the underlying functional form is determined to best represent the kind of data you have.

      You can describe AI/ML methods as a look-up table that adjusts its underlying functional form to your data points, unlike fixed interpolants (linear, nearest neighbour or cubic)
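
      A rough illustration of that analogy with made-up data points: a fixed-form interpolant versus a model whose functional form is fitted to the data.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        # a handful of known data points (made up for illustration)
        x_known = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
        y_known = np.sin(x_known)
        x_new = np.array([0.5, 1.5, 2.5])

        # fixed-form interpolant: a piecewise-linear "look-up table"
        y_linear = np.interp(x_new, x_known, y_known)

        # learned functional form: the shape of the curve is fitted to the data
        model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
        model.fit(x_known.reshape(-1, 1), y_known)
        y_learned = model.predict(x_new.reshape(-1, 1))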

  3. Mar 2020
    1. Another nice SQL script paired with CRON jobs was the one that reminded people of carts that was left for more than 48 hours. Select from cart where state is not empty and last date is more than or equal to 48hrs.... Set this as a CRON that fires at 2AM everyday, period with less activity and traffic. People wake up to emails reminding them about their abandoned carts. Then sit watch magic happens. No AI/ML needed here. Just good 'ol SQL + Bash.

      Another example of using SQL + a CRON job + Bash to remind customers about abandoned carts (again, no ML needed here)
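
      A minimal sketch of that kind of job in plain Python + SQL (the table and column names are made up; in practice the script would be scheduled via crontab):

        import sqlite3

        # hypothetical schema: cart(id, customer_email, state, last_updated)
        QUERY = """
        SELECT id, customer_email
        FROM cart
        WHERE state != 'empty'
          AND last_updated <= datetime('now', '-48 hours');
        """

        conn = sqlite3.connect("shop.db")
        for cart_id, email in conn.execute(QUERY):
            # placeholder for whatever mailer the shop actually uses
            print(f"reminding {email} about abandoned cart {cart_id}")
        conn.close()

        # crontab entry to run it daily at 2 AM:
        # 0 2 * * * python remind_carts.py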

    2. I will write a query like select from order table where last shop date is 3 or greater months. When we get this information, we will send a nice "we miss you, come back and here's X Naira voucher" email. The conversion rate for this one was always greater than 50%.

      Sometimes SQL alone is more than enough (you don't need ML)

    1. Here’s a very simple example of how a VQA system might answer the question “what color is the triangle?”
      1. Look for shapes and colours using CNN.
      2. Understand the question type with NLP.
      3. Determine strength for each possible answer.
      4. Convert each answer strength to % probability
    2. Visual Question Answering (VQA): answering open-ended questions about images. VQA is interesting because it requires combining visual and language understanding.

      Visual Question Answering (VQA) = visual + language understanding

    3. Most VQA models would use some kind of Recurrent Neural Network (RNN) to process the question input
      • Most VQA models use an RNN to process the question input
      • For easier VQA datasets, a bag-of-words (BOW) vector fed into a standard (feedforward) NN is good enough
    4. The standard approach to performing VQA looks something like this: Process the image. Process the question. Combine features from steps 1/2. Assign probabilities to each possible answer.

      Approach to handling VQA problems (illustrated with an animation in the original post)
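
      A minimal PyTorch sketch of that four-step structure (image CNN + bag-of-words question encoder, merged and turned into answer probabilities; all sizes and the answer vocabulary are hypothetical):

        import torch
        import torch.nn as nn

        class TinyVQA(nn.Module):
            def __init__(self, vocab_size=1000, num_answers=13):
                super().__init__()
                # 1. process the image with a small CNN
                self.cnn = nn.Sequential(
                    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                )
                # 2. process the question as a bag of words (fine for easy VQA datasets)
                self.question_encoder = nn.Linear(vocab_size, 32)
                # 3./4. combine the features and score each possible answer
                self.classifier = nn.Sequential(
                    nn.Linear(16 + 32, 64), nn.ReLU(),
                    nn.Linear(64, num_answers),
                )

            def forward(self, image, question_bow):
                features = torch.cat(
                    [self.cnn(image), self.question_encoder(question_bow)], dim=1)
                return self.classifier(features).softmax(dim=1)  # answer probabilities

        # usage with random tensors, just to show the shapes
        model = TinyVQA()
        probs = model(torch.randn(2, 3, 64, 64), torch.rand(2, 1000))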

    1. Script mode takes a function/class, reinterprets the Python code and directly outputs the TorchScript IR. This allows it to support arbitrary code, however it essentially needs to reinterpret Python

      Script mode in PyTorch
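
      A minimal sketch: torch.jit.script reinterprets the Python source, so data-dependent control flow is preserved in the TorchScript IR.

        import torch

        @torch.jit.script
        def gate_on_sum(x: torch.Tensor) -> torch.Tensor:
            # both branches end up in the TorchScript IR
            if x.sum() > 0:
                return x
            else:
                return torch.zeros_like(x)

        print(gate_on_sum.code)  # shows the captured control flow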

    2. In 2019, the war for ML frameworks has two remaining main contenders: PyTorch and TensorFlow. My analysis suggests that researchers are abandoning TensorFlow and flocking to PyTorch in droves. Meanwhile in industry, Tensorflow is currently the platform of choice, but that may not be true for long
      • in research: PyTorch > TensorFlow
      • in industry: TensorFlow > PyTorch
    3. Why do researchers love PyTorch?
      • simplicity <--- pythonic, integrates easily with the Python ecosystem
      • great API <--- TensorFlow has changed its API many times
      • performance <--- though it's not clear it is actually faster than TensorFlow
    4. Researchers care about how fast they can iterate on their research, which is typically on relatively small datasets (datasets that can fit on one machine) and run on <8 GPUs. This is not typically gated heavily by performance considerations, but by their ability to quickly implement new ideas. On the other hand, industry considers performance to be of the utmost priority. While 10% faster runtime means nothing to a researcher, that could directly translate to millions of savings for a company

      Researchers value how quickly they can iterate on their research ideas.

      Industry values performance, because it translates directly into cost savings.

    5. From the early academic outputs Caffe and Theano to the massive industry-backed PyTorch and TensorFlow

      It's not easy to track all the ML frameworks

      Caffe, Theano ---> PyTorch, TensorFlow

    6. When you run a PyTorch/TensorFlow model, most of the work isn’t actually being done in the framework itself, but rather by third party kernels. These kernels are often provided by the hardware vendor, and consist of operator libraries that higher-level frameworks can take advantage of. These are things like MKLDNN (for CPU) or cuDNN (for Nvidia GPUs). Higher-level frameworks break their computational graphs into chunks, which can then call these computational libraries. These libraries represent thousands of man hours of effort, and are often optimized for the architecture and application to yield the best performance

      What happens behind the scenes when you run ML frameworks

    7. Jax is built by the same people who built the original Autograd, and features both forward- and reverse-mode auto-differentiation. This allows computation of higher order derivatives orders of magnitude faster than what PyTorch/TensorFlow can offer

      Jax
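
      A minimal sketch of that in practice: composing jax.grad gives higher-order derivatives directly.

        import jax
        import jax.numpy as jnp

        def f(x):
            return jnp.sin(x) * x ** 2

        df = jax.grad(f)                        # first derivative (reverse mode)
        d3f = jax.grad(jax.grad(jax.grad(f)))   # third derivative by composing grad

        print(df(1.0), d3f(1.0))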

    8. TensorFlow will always have a captive audience within Google/DeepMind, but I wonder whether Google will eventually relax this

      PyTorch may become so favoured that one day it could even replace TensorFlow at Google

    9. Once your PyTorch model is in this IR, we gain all the benefits of graph mode. We can deploy PyTorch models in C++ without a Python dependency, or optimize them.

    10. At their core, PyTorch and Tensorflow are auto-differentiation frameworks

      auto-differentiation = taking the derivative of some function. It can be implemented in many ways, so most ML frameworks choose “reverse-mode auto-differentiation” (known as “backpropagation”)
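
      A minimal PyTorch sketch of reverse-mode auto-differentiation in action:

        import torch

        x = torch.tensor(2.0, requires_grad=True)
        y = x ** 3 + 2 * x      # y = x^3 + 2x

        y.backward()            # reverse-mode pass ("backpropagation")
        print(x.grad)           # dy/dx = 3x^2 + 2 = 14 at x = 2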

    11. At the API level, TensorFlow eager mode is essentially identical to PyTorch’s eager mode, originally made popular by Chainer. This gives TensorFlow most of the advantages of PyTorch’s eager mode (ease of use, debuggability, and etc.) However, this also gives TensorFlow the same disadvantages. TensorFlow eager models can’t be exported to a non-Python environment, they can’t be optimized, they can’t run on mobile, etc. This puts TensorFlow in the same position as PyTorch, and they resolve it in essentially the same way - you can trace your code (tf.function) or reinterpret the Python code (Autograph).

      TensorFlow Eager
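
      A minimal sketch of the escape hatch mentioned above: wrapping eager code in tf.function builds a graph from it, with AutoGraph rewriting the Python control flow.

        import tensorflow as tf

        @tf.function
        def scaled_relu(x):
            # AutoGraph converts this Python if on a tensor into graph control flow
            if tf.reduce_sum(x) > 0:
                y = 2.0 * x
            else:
                y = tf.zeros_like(x)
            return y

        print(scaled_relu(tf.constant([1.0, -3.0])))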

    12. TensorFlow came out years before PyTorch, and industry is slower to adopt new technologies than researchers

      Reason why PyTorch wasn't previously more popular than TensorFlow

    13. The PyTorch JIT is an intermediate representation (IR) for PyTorch called TorchScript. TorchScript is the “graph” representation of PyTorch. You can turn a regular PyTorch model into TorchScript by using either tracing or script mode.

      PyTorch JIT

    14. In 2018, PyTorch was a minority. Now, it is an overwhelming majority, with 69% of CVPR using PyTorch, 75+% of both NAACL and ACL, and 50+% of ICLR and ICML

    15. Tracing takes a function and an input, records the operations that were executed with that input, and constructs the IR. Although straightforward, tracing has its downsides. For example, it can’t capture control flow that didn’t execute. For example, it can’t capture the false block of a conditional if it executed the true block

      Tracing mode in PyTorch
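
      A minimal sketch of that pitfall: the trace only records the branch taken for the example input.

        import torch

        def gate_on_sum(x):
            if x.sum() > 0:
                return x
            return torch.zeros_like(x)

        # tracing with an input that takes the "true" branch (PyTorch emits a
        # TracerWarning about the data-dependent control flow)
        traced = torch.jit.trace(gate_on_sum, torch.ones(3))

        print(traced(torch.full((3,), -1.0)))  # returns the input unchanged:
                                               # the "false" branch was never recorded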

    16. On the other hand, industry has a litany of restrictions/requirements

      Industry's requirements (which TensorFlow targets):

      • no Python <--- overhead of the Python runtime might be too much to take
      • mobile <--- Python can't be embedded in the mobile binary
      • serving <--- no-downtime updates of models, switching between models seamlessly, etc.
    17. TensorFlow is still the dominant framework. For example, based on data [2] [3] from 2018 to 2019, TensorFlow had 1541 new job listings vs. 1437 job listings for PyTorch on public job boards, 3230 new TensorFlow Medium articles vs. 1200 PyTorch, 13.7k new GitHub stars for TensorFlow vs 7.2k for PyTorch, etc

      Nowadays, the numbers still play against PyTorch

    18. every major conference in 2019 has had a majority of papers implemented in PyTorch

      Legend:

      • CVPR, ICCV, ECCV - computer vision conferences
      • NAACL, ACL, EMNLP - NLP conferences
      • ICML, ICLR, NeurIPS - general ML conferences

      PyTorch vs TensorFlow (chart; an interactive version is available in the original post)

    19. the transition from TensorFlow 1.0 to 2.0 will be difficult and provides a natural point for companies to evaluate PyTorch

      Chance of faster transition to PyTorch in industry

    1. For the application of machine learning in finance, it’s still very early days. Some of the stuff people have been doing in finance for a long time is simple machine learning, and some people were using neural networks back in the 80s and 90s.   But now we have a lot more data and a lot more computing power, so with our creativity in machine learning research, “We are so much in the beginning that we can’t even picture where we’re going to be 20 years from now”

      We are only at the beginning of applying modern ML techniques to the financial industry

    2. ability to learn from data e.g. OpenAI and the Rubik’s Cube and DeepMind with AlphaGo required the equivalent of thousands of years of gameplay to achieve those milestones

      Even with near-perfect algorithms, we should expect enormous amounts of training (the equivalent of thousands of years of gameplay)

    3. Pedro’s book “The Master Algorithm” takes readers on a journey through the five dominant paradigms of machine learning research on a quest for the master  algorithm. Along the way, Pedro wanted to abstract away from the mechanics so that a broad audience, from the CXO to the consumer, can understand how machine learning is shaping our lives

      "The Master Algorithm" deliberately abstracts away from the mechanics so a broad audience can follow it; it covers the following 5 paradigms:

      • Rule-based learning (Decision trees, Random Forests, etc)
      • Connectionism (neural networks, etc)
      • Bayesian (Naive Bayes, Bayesian Networks, Probabilistic Graphical Models)
      • Analogy (KNN & SVMs)
      • Unsupervised Learning (Clustering, dimensionality reduction, etc)
    4. We’ve always lived in a world which we didn’t completely understand but now we’re living in a world designed by us – for Pedro, that’s actually an improvement

      We never fully understood our surroundings, but now we largely design and shape the world around us ourselves

    5. But at the end of the day, what we know about neuroscience today is not enough to determine what we do in AI, it’s only enough to give us ideas.  In fact it’s a two way street – AI can help us to learn how the brain works and this loop between the two disciplines is a very important one and is growing very rapidly

      Neuroscience can give AI ideas, and AI can help us understand how the brain works (a two-way street)

    6. Pedro believes that success will come from unifying the different major types of learning and their master algorithms –not just combining, but unifying them such that “it feels like using one thing”

      Interesting point of view on designing the master algorithm

    7. if you look at the number  of connections that the state of the art machine learning systems for some of these problems have, they’re more than many animals – they have many hundreds of millions or billions of connections

      State-of-the-art ML systems have hundreds of millions or billions of connections (more than many animals)

    8. There was this period of a couple of 100 years where we understood our technology.  Now we just have to learn live in a world where we don’t understand the machines that work  for us, we just have to be confident they are working for us and doing their best

      Should we just accept the fact that machines will rule the world with a mysterious intelligence?

    1. team began its analysis on YouTube 8M, a publicly available dataset of YouTube videos

      YouTube 8M - public dataset of YouTube videos. With this, we can analyse video features like:

      • color
      • illumination
      • many types of faces
      • thousands of objects
      • several landscapes
    2. The trailer release for a new movie is a highly anticipated event that can help predict future success, so it behooves the business to ensure the trailer is hitting the right notes with moviegoers. To achieve this goal, the 20th Century Fox data science team partnered with Google’s Advanced Solutions Lab to create Merlin Video, a computer vision tool that learns dense representations of movie trailers to help predict a specific trailer’s future moviegoing audience

      Merlin Video - computer vision tool to help predict a specific trailer's moviegoing audience

    3. pipeline also includes a distance-based “collaborative filtering” (CF) model and a logistic regression layer that combines all the model outputs together to produce the movie attendance probability

      Other elements of the pipeline:

      • collaborative filtering (CF) model
      • logistic regression layer
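
      A hedged sketch (not 20th Century Fox's actual code) of how a logistic-regression layer could combine trailer features and collaborative-filtering scores into an attendance probability; the feature sizes are invented:

        import tensorflow as tf

        trailer_features = tf.keras.Input(shape=(128,), name="trailer_sequence_features")
        cf_scores = tf.keras.Input(shape=(16,), name="collaborative_filtering_scores")

        combined = tf.keras.layers.Concatenate()([trailer_features, cf_scores])
        # a single sigmoid unit = logistic regression over all upstream model outputs
        attendance_prob = tf.keras.layers.Dense(1, activation="sigmoid")(combined)

        model = tf.keras.Model([trailer_features, cf_scores], attendance_prob)
        # training end-to-end back-propagates the logistic-regression loss into any
        # trainable upstream components
        model.compile(optimizer="adam", loss="binary_crossentropy")
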
    4. Merlin returns the following labels: facial_hair, beard, screenshot, chin, human, film

      Types of features Merlin Video can generate from a single trailer frame.

      Final result of feature collection and ordering (shown as a table in the original post).

    5. The obvious choice was Cloud Machine Learning Engine (Cloud ML Engine), in conjunction with the TensorFlow deep learning framework

      Merlin Video is powered by:

      • Cloud Machine Learning Engine - automating infrastructure (resources, provisioning and monitoring)
      • TensorFlow
      • Cloud Dataflow and Data Studio - Dataflow generates reports in Data Studio
      • BigQuery and BigQueryML - used in a final step to merge Merlin’s millions of customer predictions with other data sources to create useful reports and to quickly prototype media plans for marketing campaigns
    6. custom model learns the temporal sequencing of labels in the movie trailer

      Temporal sequencing - times of different shots (e.g. long or short).

      Temporal sequencing can convey information on:

      • movie type
      • movie plot
      • roles of the main characters
      • filmmakers' cinematographic choices.

      When combined with historical customer data, sequencing analysis can be used to create predictions of customer behavior.

      arxiv paper on Merlin Video

    7. The elasticity of Cloud ML Engine allowed the data science team to iterate and test quickly, without compromising the integrity of the deep learning model

      Cloud ML Engine reduced the deployment time from months to days

    8. Architecture flow diagram for Merlin

    9. The first challenge is the temporal position of the labels in the trailer: it matters when the labels occur in the trailer. The second challenge is the high dimensionality of this data

      2 challenges in labelling video clips: when the labels occur in the trailer (temporal position) and the high dimensionality of the label data

    10. When it comes to movies, analyzing text taken from a script is limiting because it only provides a skeleton of the story, without any of the additional dynamism that can entice an audience to see a movie

      Analysing the movie script alone isn't enough to predict the movie's overall attractiveness to an audience

    11. 20th Century Fox has been using this tool since the release of The Greatest Showman in 2017, and continues to use it to inform their latest releases

      The Merlin Video tool is still in use at 20th Century Fox

    12. model is trained end-to-end, and the loss of the logistic regression is back-propagated to all the trainable components (weights). Merlin’s data pipeline is refreshed weekly to account for new trailer releases

      How the model is trained (end-to-end, with weekly data refreshes) and where it sits in the pipeline

    13. After a movie’s release, we are able to process the data on which movies were previously seen by that audience. The table below shows the top 20 actual moviegoer audiences (Comp ACT) compared to the top 20 predicted audiences (Comp PRED)

      How the Merlin model is validated (predicted vs. actual moviegoer audiences)

  4. Oct 2018
  5. Aug 2017
    1. introduce topic modeling to those not yet fully converted or aware of its potential.

      Is resistance futile?

    2. They’re powerful, widely applicable, easy to use, and difficult to understand — a dangerous combination.
    1. In writing the description of our reverse engineering work below, we deliberately avoid terms that are commonly used in Machine Learning, where labels are "true," "correct," or "gold standard." This linguistic distinction highlights the fundamentally different perspective that humanists have on classification as a tool. Our goal is not to create a system that mimics the decisions of a human annotator, but rather to better represent the porous boundaries between labels and identify the piles on which a story could have been placed over a century ago late on a cold wintry night in a dimly lit schoolhouse in eastern Jutland. We note the contrast between our use of computers to problematize existing distinctions and the common concern in the Humanities that computers deal only with binaries and black-and-white distinctions.

      Valuable insight and eloquently phrased.

    2. Our goal is not to treat existing classifications as "ground truth" labels and build machine learning tools to mimic them, but rather to use computation to better quantify the variability and uncertainty of those classifications.
    3. Our goal with this classification method can be seen as the inverse of usual machine learning classifiers.
  6. May 2014