Starting off as a muggle naïve to the Maths and Data Science world.

Day 9

Spark (Cont'd)

Execution steps

  1. Code is written and submitted for execution
  2. The driver program launches
  3. The cluster manager provisions resources (executors on the worker nodes)
  4. At the same time, the SparkContext is created. When an action is invoked:
    • Spark creates a job, using the DAG (built from the RDD lineage) as the pipeline
    • Each unit of work is called a Task (one task per partition) and is distributed to the worker nodes
  5. The program runs on the worker nodes, as sketched below
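Here is a minimal PySpark sketch of those steps, under some assumptions: "local[*]" stands in for a real cluster manager, and the app name and numbers are illustrative. The key point is that transformations only build the RDD lineage, and the action at the end is what triggers the job.

```python
# Minimal sketch of the execution steps (assumes pyspark is installed;
# "local[*]" plays the role of the cluster manager).
from pyspark import SparkContext

# Steps 2-4: launching the driver creates the SparkContext,
# which asks the cluster manager for resources.
sc = SparkContext(master="local[*]", appName="ExecutionStepsDemo")

# Transformations only build the RDD lineage (the DAG); nothing runs yet.
rdd = sc.parallelize(range(10), numSlices=4)   # 4 partitions -> 4 tasks per stage
squared = rdd.map(lambda x: x * x)

# The action triggers a job: the DAG is turned into stages, and one task
# per partition is shipped to the worker nodes (step 5).
total = squared.sum()
print(total)  # 285

sc.stop()
```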

Tutorial 9

Tutorial 10
