37 Matching Annotations
  1. Last 7 days
    1. When you run a PyTorch/TensorFlow model, most of the work isn’t actually being done in the framework itself, but rather by third-party kernels. These kernels are often provided by the hardware vendor, and consist of operator libraries that higher-level frameworks can take advantage of. These are things like MKL-DNN (for CPU) or cuDNN (for Nvidia GPUs). Higher-level frameworks break their computational graphs into chunks, which can then call these computational libraries. These libraries represent thousands of man-hours of effort, and are often optimized for the architecture and application to yield the best performance.

      What happens behind the scenes when you run ML frameworks
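
      A quick way to see this from Python (a minimal sketch; which backends are available depends on how your PyTorch build was compiled):

      ```python
      import torch

      # PyTorch itself does little of the numeric heavy lifting: an op like
      # matmul is dispatched to a vendor kernel library (MKL-DNN/oneDNN on
      # CPU, cuDNN/cuBLAS on Nvidia GPUs).
      print(torch.backends.mkldnn.is_available())  # MKL-DNN kernels compiled in?
      print(torch.backends.cudnn.is_available())   # cuDNN present?

      x = torch.randn(256, 256)
      y = x @ x  # executed by a pre-compiled vendor kernel, not by Python
      ```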

    2. At their core, PyTorch and Tensorflow are auto-differentiation frameworks

      auto-differentiation = taking the derivative of some function. It can be implemented in many ways; most ML frameworks choose "reverse-mode auto-differentiation" (better known as "backpropagation")
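
      A minimal sketch of reverse-mode auto-differentiation with PyTorch's autograd:

      ```python
      import torch

      # f(x) = x^2 + 2x; reverse-mode autodiff records the forward ops,
      # then walks the recorded graph backwards to accumulate df/dx.
      x = torch.tensor(3.0, requires_grad=True)
      y = x ** 2 + 2 * x
      y.backward()      # the reverse pass ("backpropagation")
      print(x.grad)     # df/dx = 2x + 2 = 8.0
      ```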

    3. Jax is built by the same people who built the original Autograd, and features both forward- and reverse-mode auto-differentiation. This allows computation of higher order derivatives orders of magnitude faster than what PyTorch/TensorFlow can offer

      Jax
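
      A minimal sketch of how forward- and reverse-mode compose in JAX to give cheap higher-order derivatives:

      ```python
      import jax

      f = lambda x: x ** 3              # f(x) = x^3

      d1 = jax.grad(f)                  # reverse mode: 3x^2
      d2 = jax.grad(d1)                 # second derivative: 6x
      d3 = jax.jacfwd(d2)               # forward-over-reverse: 6

      print(d1(2.0), d2(2.0), d3(2.0))  # -> 12.0 12.0 6.0
      ```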

    4. the transition from TensorFlow 1.0 to 2.0 will be difficult and provides a natural point for companies to evaluate PyTorch

      Chance of faster transition to PyTorch in industry

    5. At the API level, TensorFlow eager mode is essentially identical to PyTorch’s eager mode, originally made popular by Chainer. This gives TensorFlow most of the advantages of PyTorch’s eager mode (ease of use, debuggability, etc.). However, this also gives TensorFlow the same disadvantages. TensorFlow eager models can’t be exported to a non-Python environment, they can’t be optimized, they can’t run on mobile, etc. This puts TensorFlow in the same position as PyTorch, and they resolve it in essentially the same way - you can trace your code (tf.function) or reinterpret the Python code (Autograph).

      Tensorflow Eager
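
      A minimal sketch of both modes (eager by default, then traced back into a graph with tf.function):

      ```python
      import tensorflow as tf

      x = tf.constant([1.0, 2.0])
      print(x * 2)               # eager: executes immediately, easy to debug

      @tf.function               # traces the function into a graph;
      def scale(x, factor):      # AutoGraph rewrites Python control flow
          if factor > 1.0:
              x = x * factor
          return x

      print(scale(x, tf.constant(3.0)))
      ```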

    6. Once your PyTorch model is in this IR, we gain all the benefits of graph mode. We can deploy PyTorch models in C++ without a Python dependency, or optimize them.
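
      For instance, a TorchScript module can be serialized from Python and later loaded from C++ via torch::jit::load, with no Python runtime. A minimal sketch (the filename is made up):

      ```python
      import torch

      class Doubler(torch.nn.Module):
          def forward(self, x):
              return x * 2

      scripted = torch.jit.script(Doubler())  # regular module -> TorchScript IR
      scripted.save("doubler.pt")             # hypothetical filename

      # In C++: torch::jit::load("doubler.pt") runs this model with no Python.
      ```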

    7. Tracing takes a function and an input, records the operations that were executed with that input, and constructs the IR. Although straightforward, tracing has its downsides: it can’t capture control flow that didn’t execute. For example, it can’t capture the false block of a conditional if it executed the true block.

      Tracing mode in PyTorch
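
      A minimal sketch of that failure mode with torch.jit.trace (only the branch taken during tracing is recorded):

      ```python
      import torch

      def f(x):
          if x.sum() > 0:
              return x * 2
          return x + 1

      traced = torch.jit.trace(f, torch.ones(3))  # records the True branch only

      print(traced(torch.ones(3)))   # tensor([2., 2., 2.]) - correct
      print(traced(-torch.ones(3)))  # still runs x * 2; should have run x + 1
      ```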

    8. Script mode takes a function/class, reinterprets the Python code and directly outputs the TorchScript IR. This allows it to support arbitrary code; however, it essentially needs to reinterpret Python.

      Script mode in PyTorch
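
      The same function under torch.jit.script keeps both branches, since the Python source is reinterpreted rather than traced:

      ```python
      import torch

      @torch.jit.script
      def f(x):
          if x.sum() > 0:   # control flow is preserved in the TorchScript IR
              return x * 2
          return x + 1

      print(f(torch.ones(3)))    # tensor([2., 2., 2.])
      print(f(-torch.ones(3)))   # tensor([0., 0., 0.]) - correct this time
      ```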

    9. The PyTorch JIT compiles PyTorch models to an intermediate representation (IR) called TorchScript. TorchScript is the “graph” representation of PyTorch. You can turn a regular PyTorch model into TorchScript by using either tracing or script mode.

      PyTorch JIT

    10. On the other hand, industry has a litany of restrictions/requirements

      Industry's requirements (which TensorFlow targets):

      • no Python <--- the overhead of the Python runtime may be too high
      • mobile <--- Python can't be embedded in a mobile binary
      • serving <--- no-downtime updates of models, switching between models seamlessly, etc.
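
      This is where graph export earns its keep. A minimal sketch: a Keras model saved in the SavedModel format, which TF Serving serves directly and which TF Lite converts for mobile (the path is made up; TF Serving expects a numeric version directory):

      ```python
      import tensorflow as tf

      inputs = tf.keras.Input(shape=(4,))
      outputs = tf.keras.layers.Dense(1)(inputs)
      model = tf.keras.Model(inputs, outputs)

      # The exported graph runs without the Python runtime.
      tf.saved_model.save(model, "/tmp/my_model/1")
      ```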
    11. Researchers care about how fast they can iterate on their research, which is typically on relatively small datasets (datasets that can fit on one machine) and run on <8 GPUs. This is not typically gated heavily by performance considerations, but by their ability to quickly implement new ideas. On the other hand, industry considers performance to be of the utmost priority. While 10% faster runtime means nothing to a researcher, it could directly translate to millions in savings for a company.

      Researchers value how quickly they can iterate on their research.

      Industry values performance, since faster runtimes translate directly into money saved.

    12. TensorFlow came out years before PyTorch, and industry is slower to adopt new technologies than researchers

      Reason why PyTorch wasn't previously more popular than TensorFlow

    13. TensorFlow is still the dominant framework. For example, based on data [2] [3] from 2018 to 2019, TensorFlow had 1541 new job listings vs. 1437 job listings for PyTorch on public job boards, 3230 new TensorFlow Medium articles vs. 1200 PyTorch, 13.7k new GitHub stars for TensorFlow vs 7.2k for PyTorch, etc

      For now, the raw numbers still favor TensorFlow over PyTorch

    14. TensorFlow will always have a captive audience within Google/DeepMind, but I wonder whether Google will eventually relax this

      PyTorch may become so favored that one day it could replace TensorFlow even at Google

    15. Why do researchers love PyTorch?
      • simplicity <--- Pythonic; integrates easily with the rest of the Python ecosystem
      • great API <--- unlike TensorFlow, which has switched APIs many times
      • performance <--- although it's not clear it's actually faster than TensorFlow
    16. In 2018, PyTorch was a minority. Now, it is an overwhelming majority, with 69% of CVPR using PyTorch, 75+% of both NAACL and ACL, and 50+% of ICLR and ICML

    17. every major conference in 2019 has had a majority of papers implemented in PyTorch

      Legend:

      • CVPR, ICCV, ECCV - computer vision conferences
      • NAACL, ACL, EMNLP - NLP conferences
      • ICML, ICLR, NeurIPS - general ML conferences

      PyTorch vs TensorFlow

    18. In 2019, the war for ML frameworks has two remaining main contenders: PyTorch and TensorFlow. My analysis suggests that researchers are abandoning TensorFlow and flocking to PyTorch in droves. Meanwhile in industry, TensorFlow is currently the platform of choice, but that may not be true for long
      • in research: PyTorch > TensorFlow
      • in industry: TensorFlow > PyTorch
    19. From the early academic outputs Caffe and Theano to the massive industry-backed PyTorch and TensorFlow

      It's not easy to track all the ML frameworks

      Caffe, Theano ---> PyTorch, TensorFlow

  2. Sep 2019
    1. Continuous Delivery for Machine Learning end-to-end process

      end-to-end process

    2. We chose to use GoCD as our Continuous Delivery tool, as it was built with the concept of pipelines as a first-class concern

      GoCD - open source Continuous Delivery tool

    3. A deployment pipeline automates the process for getting software from version control into production, including all the stages, approvals, testing, and deployment to different environments

      Deployment pipeline

    4. example of how to combine different test pyramids for data, model, and code in CD4ML

      Combining tests for data (purple), model (green), and code (blue)

    5. There are different types of testing that can be introduced in the ML workflow.

      Automated tests for ML system:

      • validating data
      • validating component integration
      • validating the model quality
      • validating model bias and fairness
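
      As a minimal sketch, a data validation check can be an ordinary pytest-style test (the dataset path and column names are hypothetical):

      ```python
      import pandas as pd

      def test_sales_data_is_valid():
          df = pd.read_csv("data/sales.csv")      # hypothetical path
          assert not df["week"].isnull().any()    # no missing weeks
          assert (df["units_sold"] >= 0).all()    # sales can't be negative
          assert df["store_id"].dtype == "int64"  # simple schema check
      ```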
    6. Another approach is to use a tool like H2O to export the model as a POJO in a JAR Java library, which you can then add as a dependency in your application. The benefit of this approach is that you can train the models in a language familiar to Data Scientists, such as Python or R, and export the model as a compiled binary that runs in a different target environment (JVM), which can be faster at inference time

      H2O - export models trained in Python/R as a POJO in JAR
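
      A minimal sketch of the POJO export from H2O's Python API (the dataset and column names are made up):

      ```python
      import h2o
      from h2o.estimators import H2OGradientBoostingEstimator

      h2o.init()
      train = h2o.import_file("data/sales.csv")   # hypothetical dataset
      model = H2OGradientBoostingEstimator()
      model.train(x=["store_id", "week"], y="units_sold", training_frame=train)

      # Writes a plain Java class that can be compiled into a JVM
      # application for fast, Python-free inference.
      model.download_pojo(path="./pojo")
      ```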

    7. In order to formalise the model training process in code, we used an open source tool called DVC (Data Science Version Control). It provides similar semantics to Git, but also solves a few ML-specific problems:

      DVC - captures the model training process as code (see the usage sketch after the list below).

      Advantages:

      • it has multiple backend plugins to fetch and store large files on an external storage outside of the source control repository;
      • it can keep track of those files' versions, allowing us to retrain our models when the data changes;
      • it keeps track of the dependency graph and commands used to execute the ML pipeline, allowing the process to be reproduced in other environments;
      • it can integrate with Git branches to allow multiple experiments to co-exist
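
      The pipeline itself is defined with DVC's CLI (dvc run / dvc repro); for consuming a versioned artifact from Python there is dvc.api. A minimal sketch (repo URL, file path, and tag are hypothetical):

      ```python
      import dvc.api

      # Fetch a specific version of a large file that DVC tracks outside Git.
      with dvc.api.open("data/sales.csv",
                        repo="https://github.com/org/forecasting",
                        rev="model-v1.0") as f:
          header = f.readline()
      ```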
    8. Machine Learning pipeline for our Sales Forecasting problem, and the 3 steps to automate it with DVC

      Sales Forecasting process

    9. common functional silos in large organizations can create barriers, stifling the ability to automate the end-to-end process of deploying ML applications to production

      Common ML process (leading to delays and friction)

    10. Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.

      Continuous Delivery for Machine Learning (CD4ML) (long definition)

      Basic principles:

      • software engineering approach
      • cross-functional team
      • producing software based on code, data, and ML models
      • small and safe increments
      • reproducible and reliable software release
      • short adaptation cycles
    11. Continuous Delivery for Machine Learning (CD4ML) is the discipline of bringing Continuous Delivery principles and practices to Machine Learning applications.

      Continuous Delivery for Machine Learning (CD4ML)

  3. Oct 2018
  4. Aug 2017
    1. introduce topic modeling to those not yet fully converted or aware of its potential.

      Is resistance futile?

    2. They’re powerful, widely applicable, easy to use, and difficult to understand — a dangerous combination.
    1. In writing the description of our reverse engineering work below, we deliberately avoid terms that are commonly used in Machine Learning, where labels are "true," "correct," or "gold standard." This linguistic distinction highlights the fundamentally different perspective that humanists have on classification as a tool. Our goal is not to create a system that mimics the decisions of a human annotator, but rather to better represent the porous boundaries between labels and identify the piles on which a story could have been placed over a century ago late on a cold wintry night in a dimly lit schoolhouse in eastern Jutland. We note the contrast between our use of computers to problematize existing distinctions and the common concern in the Humanities that computers deal only with binaries and black-and-white distinctions.

      Valuable insight and eloquently phrased.

    2. Our goal is not to treat existing classifications as "ground truth" labels and build machine learning tools to mimic them, but rather to use computation to better quantify the variability and uncertainty of those classifications.
    3. Our goal with this classification method can be seen as the inverse of usual machine learning classifiers.
  5. May 2014