19 Matching Annotations
  1. May 2021
  2. Apr 2021
    1. With Spark 3.1, the Spark-on-Kubernetes project is now considered Generally Available and Production-Ready.

      With Spark 3.1, Kubernetes (k8s) becomes a viable option to replace YARN as Spark's cluster manager
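      At the `spark-submit` level, the most visible difference between running on YARN and on Kubernetes is the `--master` URL (`yarn` versus `k8s://https://<api-server>:<port>`), plus a container image for the driver and executor pods. The following is a minimal sketch, assuming cluster deploy mode and the `spark.kubernetes.container.image` property from the Spark-on-Kubernetes documentation. The helper `spark_submit_args`, the API-server host, the image name and the jar path are hypothetical placeholders. A real deployment usually needs further settings, such as a service account and namespace.

```python
import shlex
from typing import List, Optional


def spark_submit_args(master: str, app_jar: str, main_class: str,
                      image: Optional[str] = None) -> List[str]:
    """Build a spark-submit argument list (hypothetical helper, not part of Spark)."""
    args = ["spark-submit",
            "--master", master,          # "yarn" or "k8s://https://<api-server>:<port>"
            "--deploy-mode", "cluster",
            "--class", main_class]
    if master.startswith("k8s://"):
        # On Kubernetes the driver and executors run as pods built from a
        # container image, so the image must be named explicitly.
        if image is None:
            raise ValueError("a container image is required for a k8s:// master")
        args += ["--conf", f"spark.kubernetes.container.image={image}"]
    args.append(app_jar)
    return args


# Placeholder host, image and jar path; substitute your own.
yarn_cmd = spark_submit_args("yarn", "app.jar", "com.example.Main")
k8s_cmd = spark_submit_args("k8s://https://k8s-apiserver.example.com:6443",
                            "local:///opt/spark/jars/app.jar",
                            "com.example.Main",
                            image="example/spark-app:3.1.1")
print(shlex.join(yarn_cmd))
print(shlex.join(k8s_cmd))
```

      The `local://` scheme tells Spark that the jar is already inside the container image rather than on the submitting machine.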

  3. Feb 2021
    1. Consider the amount of data and the speed of the data: if low latency is your priority, use Akka Streams; if you have huge amounts of data, use Spark, Flink or GCP Dataflow.

      For low latency: Akka Streams

      For huge amounts of data: Spark, Flink or GCP Dataflow
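      The latency half of this rule of thumb can be illustrated with a toy model. This is an assumption for illustration, not a benchmark of Akka Streams or any other engine. A record-at-a-time pipeline emits each record as it arrives, while a batch is emitted only when its last record has arrived. The function names and the one-record-per-time-unit arrival pattern are hypothetical.

```python
from typing import List


def per_record_latency(arrivals: List[float]) -> List[float]:
    """Record-at-a-time: each record is emitted as soon as it arrives."""
    return [0.0 for _ in arrivals]  # waiting time only; processing cost ignored


def batched_latency(arrivals: List[float], batch_size: int) -> List[float]:
    """Batching: a batch is emitted when its last record arrives."""
    latencies: List[float] = []
    for start in range(0, len(arrivals), batch_size):
        batch = arrivals[start:start + batch_size]
        emit_time = batch[-1]  # the whole batch waits for its last record
        latencies.extend(emit_time - t for t in batch)
    return latencies


arrivals = [float(t) for t in range(8)]  # one record per time unit
print(max(per_record_latency(arrivals)))   # 0.0
print(batched_latency(arrivals, 4))        # [3.0, 2.0, 1.0, 0.0, 3.0, 2.0, 1.0, 0.0]
```

      The model deliberately ignores throughput. Batching tends to pay off on large volumes because it amortizes per-record overhead, and that is the other side of the trade-off the note describes.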

  4. Jun 2020
  5. Sep 2019
  6. Mar 2019
  7. Dec 2018
  8. Nov 2018
  9. Oct 2018
  10. Jan 2018
  11. May 2017
  12. Apr 2017
  13. Apr 2014
    1. Mike Olson of Cloudera is on record as predicting that Spark will be the replacement for Hadoop MapReduce. Just about everybody seems to agree, except perhaps for Hortonworks folks betting on the more limited and less mature Tez. Spark’s biggest technical advantages as a general data processing engine are probably: The Directed Acyclic Graph processing model. (Any serious MapReduce-replacement contender will probably echo that aspect.) A rich set of programming primitives in connection with that model. Support also for highly-iterative processing, of the kind found in machine learning. Flexible in-memory data structures, namely the RDDs (Resilient Distributed Datasets). A clever approach to fault-tolerance.

      Spark's advantages:

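      • DAG (Directed Acyclic Graph) processing model
      • a rich set of programming primitives for the DAG model
      • support for highly iterative processing, as found in machine learning
      • RDDs (Resilient Distributed Datasets) as flexible in-memory data structures
      • a clever approach to fault tolerance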
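      Two of these points, the DAG model and fault tolerance, fit together in RDDs. Each RDD remembers the transformation and parent it was derived from (its lineage), so a lost in-memory partition can be recomputed rather than restored from a replica. The class below is a toy, single-process imitation of that idea. It builds only a linear chain, the simplest DAG. `ToyRDD` and all of its methods are hypothetical; they loosely echo Spark's `map`, `filter`, `cache` and `collect` but are not Spark's API.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy, single-process imitation of RDD lineage (not Spark's API)."""

    def __init__(self, compute: Callable[[int], List], num_partitions: int,
                 parent: Optional["ToyRDD"] = None, name: str = "source"):
        self._compute = compute            # how to (re)build partition i
        self.num_partitions = num_partitions
        self.parent = parent               # lineage edge back to the parent
        self.name = name
        self._persist = False
        self._cache: Dict[int, List] = {}  # in-memory partitions, if cached
        self.compute_count = 0             # partitions actually computed

    @staticmethod
    def parallelize(data: List, num_partitions: int) -> "ToyRDD":
        size = -(-len(data) // num_partitions)  # ceiling division
        chunks = [data[i * size:(i + 1) * size] for i in range(num_partitions)]
        return ToyRDD(lambda i: list(chunks[i]), num_partitions)

    # Transformations are lazy: they only record how to compute a partition.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda i: [f(x) for x in self.partition(i)],
                      self.num_partitions, parent=self, name="map")

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda i: [x for x in self.partition(i) if pred(x)],
                      self.num_partitions, parent=self, name="filter")

    def cache(self) -> "ToyRDD":
        self._persist = True
        return self

    def partition(self, i: int) -> List:
        if i in self._cache:               # served from memory
            return self._cache[i]
        self.compute_count += 1
        result = self._compute(i)          # walks the lineage back to the source
        if self._persist:
            self._cache[i] = result
        return result

    # An action triggers evaluation of the whole chain.
    def collect(self) -> List:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

    def lose_partition(self, i: int) -> None:
        """Simulate losing a cached partition, e.g. when an executor dies."""
        self._cache.pop(i, None)

    def lineage(self) -> str:
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " <- ".join(names)


squares = ToyRDD.parallelize(list(range(10)), 3).map(lambda x: x * x).cache()
evens = squares.filter(lambda x: x % 2 == 0)  # nothing computed yet
print(evens.collect())         # [0, 4, 16, 36, 64]
print(squares.compute_count)   # 3: each partition computed once, then cached
squares.lose_partition(1)
print(evens.collect())         # same result, partition 1 rebuilt from lineage
print(squares.compute_count)   # 4
print(evens.lineage())         # filter <- map <- source
```

      Only the lost partition is recomputed; the other two are still served from memory. That selective recomputation is the essence of the lineage-based approach, in contrast to replicating every intermediate dataset.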