𝗥𝗲𝗽𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻 𝘃𝘀 𝗖𝗼𝗮𝗹𝗲𝘀𝗰𝗲
🚩𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻 🚀 𝗥𝗲𝗽𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻 𝘃𝘀 𝗖𝗼𝗮𝗹𝗲𝘀𝗰𝗲
----------------------------------------------------------------------------
📌𝗥𝗲𝗽𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻:
👉🏻Default partitioning for an RDD/DataFrame:
👉🏻spark.sql.files.maxPartitionBytes: 128 MB by default (the maximum number of bytes packed into one partition when reading files)
👉🏻spark.default.parallelism: applies when data is created within Spark; the default depends on the available cores (for example, 8 partitions on an 8-core machine in local mode)
👉🏻repartition can increase or decrease the number of partitions in Spark
👉🏻repartition shuffles all the data and builds the new partitions from scratch. repartition(n) without columns distributes rows round-robin, so the partitions come out roughly equal in size; repartitioning by a column hashes the key, so skewed keys can still produce uneven partitions
👉🏻Because of the full shuffle it is expensive; whether the cost pays off depends on the use case
👉🏻Use repartition to redistribute data evenly before a join, so the workload is balanced
👉🏻Apply repartition before grouping to improve data distribution
📌𝗖𝗼𝗮𝗹𝗲𝘀𝗰𝗲:
👉🏻coalesce can only reduce the number of partitions
👉🏻coalesce does not require a full shuffle
👉🏻Unlike repartition, it tries to minimize data movement and avoids a full shuffle when reducing partitions.
👉...
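💡 To make the difference concrete, here is a toy model in plain Python, not Spark code. The helpers `toy_repartition` and `toy_coalesce` are hypothetical names made up for this sketch. It assumes repartition deals every record out round-robin (a full shuffle) and that coalesce merges neighbouring partitions whole. Real Spark picks which partitions to merge based on data locality, and its round-robin gives roughly (not exactly) equal sizes.

```python
from typing import List

Partition = List[int]


def toy_repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy full shuffle: every record is dealt round-robin into n new partitions."""
    records = [r for p in partitions for r in p]
    out: List[Partition] = [[] for _ in range(n)]
    for i, record in enumerate(records):
        out[i % n].append(record)
    return out


def toy_coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy coalesce: merge whole neighbouring partitions; never split one."""
    if n >= len(partitions):
        # Asking for more partitions than exist changes nothing.
        return [list(p) for p in partitions]
    out: List[Partition] = [[] for _ in range(n)]
    for i, p in enumerate(partitions):
        # Assumption: contiguous grouping stands in for Spark's locality-aware choice.
        out[i * n // len(partitions)].extend(p)
    return out


# Four skewed partitions holding 6, 1, 1 and 2 records.
skewed = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]

print([len(p) for p in toy_repartition(skewed, 2)])  # [5, 5]  balanced, but every record moved
print([len(p) for p in toy_coalesce(skewed, 2)])     # [7, 3]  cheaper, but still uneven
print(len(toy_coalesce(skewed, 8)))                  # 4       coalesce cannot increase the count
```

In PySpark the corresponding calls are `df.repartition(n)` and `df.coalesce(n)`, and `df.rdd.getNumPartitions()` shows the resulting partition count.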