11 Matching Annotations
  1. Mar 2019
  2. Dec 2018
  3. Nov 2018
  4. Oct 2018
  5. Jan 2018
  6. May 2017
  7. Apr 2017
  8. Apr 2014
    1. Mike Olson of Cloudera is on record as predicting that Spark will be the replacement for Hadoop MapReduce. Just about everybody seems to agree, except perhaps for Hortonworks folks betting on the more limited and less mature Tez. Spark’s biggest technical advantages as a general data processing engine are probably:

      • The Directed Acyclic Graph processing model. (Any serious MapReduce-replacement contender will probably echo that aspect.)
      • A rich set of programming primitives in connection with that model.
      • Support also for highly-iterative processing, of the kind found in machine learning.
      • Flexible in-memory data structures, namely the RDDs (Resilient Distributed Datasets).
      • A clever approach to fault-tolerance.

      Spark's advantages:

      • DAG processing model
      • programming primitives for DAG model
      • highly-iterative processing suited for ML
      • RDD in-memory data structures
      • clever approach to fault-tolerance
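
The points above can be made concrete with a toy model. The sketch below is plain Python, not Spark: `ToyRDD` is a hypothetical class invented for this note, and its method names only loosely mirror Spark's (`lineage` and `lose_cache` are made up for illustration). It assumes a single machine and a single partition. What it tries to show is that transformations only add nodes to a DAG, that a result can be kept in memory for repeated (iterative) use, and that a lost in-memory result can be rebuilt by replaying the recorded lineage, which is roughly the idea behind the fault-tolerance approach.

```python
from typing import Any, Callable, Iterable, List, Optional


class ToyRDD:
    """Hypothetical, single-machine stand-in for an RDD (not Spark's API).

    Each instance is one node in a DAG: it remembers its parent and the
    operation that derives its data from the parent's data.
    """

    def __init__(self,
                 source: Optional[Iterable[Any]] = None,
                 parent: Optional["ToyRDD"] = None,
                 op: Optional[Callable[[List[Any]], List[Any]]] = None,
                 name: str = "source") -> None:
        self._source = list(source) if source is not None else []
        self._parent = parent
        self._op = op
        self.name = name
        self._persist = False                    # keep result in memory?
        self._cache: Optional[List[Any]] = None  # materialised result, if any

    # Transformations are lazy: they only add a node to the DAG.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(parent=self, op=lambda xs: [f(x) for x in xs], name="map")

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(parent=self,
                      op=lambda xs: [x for x in xs if pred(x)],
                      name="filter")

    def cache(self) -> "ToyRDD":
        """Ask for the result to be kept in memory once it is computed."""
        self._persist = True
        return self

    def lineage(self) -> List[str]:
        """Names of the operations from the source down to this node."""
        names: List[str] = []
        node: Optional["ToyRDD"] = self
        while node is not None:
            names.append(node.name)
            node = node._parent
        return names[::-1]

    def lose_cache(self) -> None:
        """Simulate losing the in-memory copy (e.g. a failed worker)."""
        self._cache = None

    # Action: walks the DAG and actually computes the data.
    def collect(self) -> List[Any]:
        if self._cache is not None:
            return list(self._cache)
        if self._parent is None:
            data = list(self._source)
        else:
            data = self._op(self._parent.collect())
        if self._persist:
            self._cache = list(data)
        return data


# Usage: build a small DAG, compute it, lose the cache, recompute.
numbers = ToyRDD(source=range(1, 6))
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()

first = evens_squared.collect()    # computes and keeps the result in memory
evens_squared.lose_cache()         # pretend the in-memory copy was lost
second = evens_squared.collect()   # rebuilt by replaying the lineage
```

Nothing is computed until `collect()` is called, and the second `collect()` succeeds without any replicated copy of the data because the recipe (`source`, `filter`, `map`) is still known. A real engine would add partitioning, scheduling and distribution on top of this idea.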