9 Matching Annotations
  1. Mar 2020
    1. Vaex supports Just-In-Time compilation via Numba (using LLVM) or Pythran (acceleration via C++), giving better performance. If you happen to have a NVIDIA graphics card, you can use CUDA via the jit_cuda method to get even faster performance.

      Tools supported by Vaex

    2. displaying a Vaex DataFrame or column requires only the first and last 5 rows to be read from disk

      Vaex tries to go over the entire dataset with as few passes as possible

    3. Why is it so fast? When you open a memory mapped file with Vaex, there is actually no data reading going on. Vaex only reads the file metadata

      Vaex only reads the file metadata:

      • location of the data on disk
      • data structure (number of rows, columns...)
      • file description
      • and so on...
    4. When filtering a Vaex DataFrame no copies of the data are made. Instead only a reference to the original object is created, on which a binary mask is applied

      Filtering Vaex DataFrame works on reference to the original data, saving lots of RAM

    5. Vaex is an open-source DataFrame library which enables the visualisation, exploration, analysis and even machine learning on tabular datasets that are as large as your hard-drive. To do this, Vaex employs concepts such as memory mapping, efficient out-of-core algorithms and lazy evaluations.

      Vaex - library to manage as large datasets as your HDD, thanks to:

      • memory mapping
      • efficient out-of-core algorithms
      • lazy evaluations.

      All wrapped in a Pandas-like API

    6. The describe method nicely illustrates the power and efficiency of Vaex: all of these statistics were computed in under 3 minutes on my MacBook Pro (15", 2018, 2.6GHz Intel Core i7, 32GB RAM). Other libraries or methods would require either distributed computing or a cloud instance with over 100GB to preform the same computations.

      Possibilities of Vaex