Memory Overhead In pySpark
In Spark, memory overhead refers to the additional memory allocated beyond the user-defined executor memory. This overhead is crucial for managing various internal operations and ensuring smooth execution of tasks.
What is Memory Overhead?
Memory overhead in Spark includes memory used for:
1. Task Execution Management
- Tracks the status, context, and metadata of tasks being executed
- Allocates space for task-related information such as input splits, intermediate results, and shuffle output
2. Shuffle Operations
- During shuffle operations, intermediate data is exchanged between nodes; this requires additional memory for buffer management and data serialization
3. Broadcast Variables
- Memory overhead ensures that broadcast variables are efficiently stored and managed, reducing redundant data transfer
4. Internal Data Structures
- Spark's internal data structures, such as task metadata, storage bookkeeping, and job details, require additional memory
5. Network Buffers
- During data exchange between nodes, network buffers are used to temporarily hold data that is being sent or received

This overhead is controlled by the spark.executor.memoryOverhead setting.
How Much Memory is Allocated?
The amount of memory allocated for overhead is typically a fraction of the total executor memory. By default, Spark allocates 10% of the executor memory for overhead, with a minimum of 384 MB. This can be configured using the spark.executor.memoryOverhead parameter (older YARN deployments used spark.yarn.executor.memoryOverhead). For example:
- If an executor has 4 GB of memory, the default overhead would be roughly 400 MB (10% of 4 GB, which is above the 384 MB minimum).
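The default can be sketched in plain Python. The helper below is hypothetical (not a Spark API); it just mirrors the documented rule of 10% of executor memory with a 384 MB floor:

```python
def default_overhead_mb(executor_mem_mb: int) -> int:
    """Approximate Spark's default executor memory overhead:
    10% of executor memory, with a 384 MB minimum."""
    return max(int(executor_mem_mb * 0.10), 384)

# A 4 GB (4096 MB) executor gets about 409 MB of overhead by default,
# while a 1 GB executor is bumped up to the 384 MB minimum.
print(default_overhead_mb(4096))  # 409
print(default_overhead_mb(1024))  # 384
```

Note that small executors pay a proportionally larger overhead because of the floor, which is worth keeping in mind when sizing many small executors.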
from pyspark import SparkConf, SparkContext

# Set executor memory and explicitly raise the per-executor overhead
# from the ~400 MB default to 512 MB before creating the context.
conf = SparkConf().setAppName("MemoryOverheadExample") \
    .set("spark.executor.memory", "4g") \
    .set("spark.executor.memoryOverhead", "512m")
sc = SparkContext(conf=conf)
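The same settings can also be passed at submit time instead of in code; a sketch, where the script name is a placeholder:

```shell
spark-submit \
  --conf spark.executor.memory=4g \
  --conf spark.executor.memoryOverhead=512m \
  my_app.py
```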