Thursday, August 1, 2024

𝗞𝗲𝘆 𝗦𝘁𝗲𝗽𝘀 𝗶𝗻 𝗦𝗽𝗮𝗿𝗸'𝘀 𝗘𝘅𝗲𝗰𝘂𝘁𝗶𝗼𝗻 𝗙𝗹𝗼𝘄:🚀



𝗜𝗻𝘁𝗲𝗿𝗻𝗮𝗹 𝗪𝗼𝗿𝗸𝗶𝗻𝗴:


1-Apache Spark utilizes a Master/Slave architecture, where a Driver program is launched upon command submission

2-The Driver program runs the main method of the submitted application class

3-The Driver program creates the SparkSession and builds the DAG from the transformations defined in the program (see the PySpark sketch after this list)

4-The Driver program itself runs as a separate JVM process

5-The DAG defines the steps of the program and records the RDD lineage, which allows lost RDDs to be recomputed on failure

6-When an action is called, a Job is created and submitted to the DAG Scheduler

7-The DAG Scheduler divides the Job into Stages, which are further divided into Tasks

8-Tasks are submitted to the Task Scheduler, which launches them via the cluster manager on executors running on different nodes

9-Each Executor is also a separate JVM process

10-Once a task completes, its result is sent back to the Driver program

11-Once execution of the code is complete, the Driver program exits and the SparkSession is shut down
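
To make the flow above concrete, here is a minimal PySpark sketch. It assumes a local Spark installation; the app name, the sample numbers, and the thresholds are illustrative only. It walks the same steps: the driver creates the SparkSession, lazy transformations extend the DAG and its lineage, and an action creates a Job that the schedulers break into stages and tasks for the executors.

```python
from pyspark.sql import SparkSession

# Step 3: the driver program creates the SparkSession (the entry point).
spark = (
    SparkSession.builder
    .appName("execution-flow-demo")  # illustrative app name
    .master("local[*]")              # local mode for the sketch
    .getOrCreate()
)

# Steps 3 & 5: transformations are lazy -- they only extend the DAG/lineage,
# nothing runs on the cluster yet.
rdd = spark.sparkContext.parallelize(range(1, 101))
doubled = rdd.map(lambda x: x * 2)            # transformation
filtered = doubled.filter(lambda x: x > 50)   # transformation

# Step 5: the recorded lineage can be inspected; Spark uses it to
# recompute lost partitions on failure.
print(filtered.toDebugString().decode())

# Steps 6-9: calling an action creates a Job, the DAG Scheduler splits it
# into Stages and Tasks, and executors (separate JVMs) run the tasks.
result = filtered.sum()

# Step 10: the result comes back to the driver.
print(f"sum = {result}")

# Step 11: stopping the session shuts the driver's Spark context down.
spark.stop()
```

While the job runs, the Spark UI on the driver (port 4040 by default) shows the Job, its Stages, and the Tasks described in steps 6 through 9.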
