
Commit

[SPARK-5836] [DOCS] [STREAMING] Clarify what may cause long-running Spark apps to preserve shuffle files

Clarify what may cause long-running Spark apps to preserve shuffle files

Author: Sean Owen

Closes #6901 from srowen/SPARK-5836 and squashes the following commits:

a9faef0 [Sean Owen] Clarify what may cause long-running Spark apps to preserve shuffle files

(cherry picked from commit 4be53d0)
Signed-off-by: Andrew Or
srowen authored and Andrew Or committed Jun 19, 2015
1 parent 1d44147 commit 0b8dce0
Showing 1 changed file with 5 additions and 3 deletions.
docs/programming-guide.md: 8 changes (5 additions, 3 deletions)
@@ -1079,9 +1079,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s
 to disk, incurring the additional overhead of disk I/O and increased garbage collection.
 
 Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files
-are not cleaned up from Spark's temporary storage until Spark is stopped, which means that
-long-running Spark jobs may consume available disk space. This is done so the shuffle doesn't need
-to be re-computed if the lineage is re-computed. The temporary storage directory is specified by the
+are preserved until the corresponding RDDs are no longer used and are garbage collected.
+This is done so the shuffle files don't need to be re-created if the lineage is re-computed.
+Garbage collection may happen only after a long period time, if the application retains references
+to these RDDs or if GC does not kick in frequently. This means that long-running Spark jobs may
+consume a large amount of disk space. The temporary storage directory is specified by the
 `spark.local.dir` configuration parameter when configuring the Spark context.
 
 Shuffle behavior can be tuned by adjusting a variety of configuration parameters. See the
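The new wording ties shuffle-file cleanup to the lifetime of the RDD objects rather than to stopping Spark. The following is a minimal Python model of that idea, not Spark code: `FakeRDD`, `ShuffleFileTracker`, and `run_demo` are hypothetical names, and `weakref.finalize` is only a stand-in for whatever reference-tracking mechanism Spark uses internally. It illustrates why an application that keeps references to RDDs, or a collector that rarely runs, leaves shuffle files on disk.

```python
import gc
import os
import shutil
import tempfile
import weakref


class FakeRDD:
    """Hypothetical stand-in for an RDD; only its object lifetime matters here."""


class ShuffleFileTracker:
    """Hypothetical model of reference-tracked shuffle cleanup (not a Spark API).

    Shuffle output written for an RDD stays on disk until that RDD object
    becomes unreachable and is garbage collected.
    """

    def __init__(self, local_dir):
        # Plays the role of the directory configured by spark.local.dir.
        self.local_dir = local_dir
        self._next_id = 0

    def write_shuffle(self, rdd):
        shuffle_dir = os.path.join(self.local_dir, "shuffle_%d" % self._next_id)
        self._next_id += 1
        os.makedirs(shuffle_dir)
        with open(os.path.join(shuffle_dir, "part-00000"), "w") as f:
            f.write("intermediate map output")
        # Delete the files once `rdd` is collected. The callback arguments must
        # not reference `rdd` itself, otherwise they would keep it alive forever.
        weakref.finalize(rdd, shutil.rmtree, shuffle_dir, ignore_errors=True)
        return shuffle_dir


def run_demo(retain_reference):
    """Return (exists_before_release, exists_after_release) for one shuffle."""
    local_dir = tempfile.mkdtemp()
    tracker = ShuffleFileTracker(local_dir)
    retained = []  # models an application that keeps RDDs reachable

    rdd = FakeRDD()
    path = tracker.write_shuffle(rdd)
    if retain_reference:
        retained.append(rdd)
    before = os.path.exists(path)

    del rdd        # the local variable no longer points at the RDD
    gc.collect()   # force a collection; a JVM collects only when it decides to
    after = os.path.exists(path)

    shutil.rmtree(local_dir, ignore_errors=True)
    return before, after


if __name__ == "__main__":
    print(run_demo(retain_reference=False))  # (True, False): files removed after GC
    print(run_demo(retain_reference=True))   # (True, True): a live reference keeps them
```

In CPython, reference counting frees the object as soon as its last reference disappears, so the explicit `gc.collect()` is only a safeguard here. On the JVM the timing of collection is left to the garbage collector, which is the delay the revised paragraph warns about.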
