[SPARK-30069][CORE][YARN] Clean up non-shuffle disk block manager files following executor exits on YARN #29378
Changes from all commits
```diff
@@ -17,7 +17,7 @@
 package org.apache.spark.storage

-import java.io.{File, IOException}
+import java.io.File
 import java.nio.file.Files
 import java.util.UUID
```
```diff
@@ -40,7 +40,7 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolean
   /* Create one local directory for each path mentioned in spark.local.dir; then, inside this
    * directory, create multiple subdirectories that we will hash files into, in order to avoid
    * having really large inodes at the top level. */
-  private[spark] val localDirs: Array[File] = createLocalDirs(conf)
+  private[spark] val localDirs: Array[File] = StorageUtils.createLocalDirs(conf)
   if (localDirs.isEmpty) {
     logError("Failed to create any local dir.")
     System.exit(ExecutorExitCode.DISK_STORE_FAILED_TO_CREATE_DIR)
```
```diff
@@ -50,14 +50,31 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolean

   // The content of subDirs is immutable but the content of subDirs(i) is mutable. And the content
   // of subDirs(i) is protected by the lock of subDirs(i)
-  private val subDirs = Array.fill(localDirs.length)(new Array[File](subDirsPerLocalDir))
+  private val subDirs = StorageUtils.createSubDirs(conf, parent = localDirs)

+  /* Directories that persist the temporary files (temp_local, temp_shuffle).
+   * We separate the storage directories of temp blocks from non-temp blocks since
+   * the cleaning process for temp blocks may differ between deploy modes.
+   * For example, these files have no opportunity to be cleaned before application end on YARN.
+   * This is a real issue, especially for long-lived Spark applications like the Spark thrift-server.
+   * So for YARN mode, we persist these files in YARN container directories which can be
+   * cleaned by YARN when the container exits. */
+  private val tempDirs: Array[File] = StorageUtils.createTempDirs(conf, replacement = localDirs)
+  if (tempDirs.isEmpty) {
+    logError("Failed to create any temp dir.")
+    System.exit(ExecutorExitCode.DISK_STORE_FAILED_TO_CREATE_DIR)
+  }

+  // Similar to subDirs, tempSubDirs are used only for temp blocks.
+  private val tempSubDirs =
+    StorageUtils.createTempSubDirs(conf, parent = tempDirs, replacement = subDirs)

   private val shutdownHook = addShutdownHook()

   /** Looks up a file by hashing it into one of our local subdirectories. */
   // This method should be kept in sync with
   // org.apache.spark.network.shuffle.ExecutorDiskUtils#getFile().
-  def getFile(filename: String): File = {
+  private def getFile(localDirs: Array[File], subDirs: Array[Array[File]],
+      subDirsPerLocalDir: Int, filename: String): File = {
     // Figure out which local directory it hashes to, and which subdirectory in that
     val hash = Utils.nonNegativeHash(filename)
     val dirId = hash % localDirs.length
```
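The hunk above cuts off inside the lookup body. For context, the hash-based selection picks a root directory and then a subdirectory from the filename's hash (the thread below stresses that this PR does not change the algorithm, only where the arrays come from). A standalone sketch, using `filename.hashCode` as a stand-in for `Utils.nonNegativeHash`:

```scala
import java.io.File

// Standalone sketch of the lookup, not the Spark source. `filename.hashCode`
// stands in for Utils.nonNegativeHash; "%02x" mirrors the two-hex-digit
// subdirectory names DiskBlockManager creates on disk.
def locate(localDirs: Array[File], subDirsPerLocalDir: Int, filename: String): File = {
  val hash = filename.hashCode & Integer.MAX_VALUE              // force non-negative
  val dirId = hash % localDirs.length                           // pick a root dir
  val subDirId = (hash / localDirs.length) % subDirsPerLocalDir // pick a subdir in it
  new File(new File(localDirs(dirId), "%02x".format(subDirId)), filename)
}
```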
```diff
@@ -81,17 +98,26 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolean
     new File(subDir, filename)
   }

-  def getFile(blockId: BlockId): File = getFile(blockId.name)
+  /** Looks up a file by hashing it into one of our local/temp subdirectories. */
+  def getFile(blockId: BlockId): File = {
+    if (blockId.isTemp) {
+      getFile(tempDirs, tempSubDirs, subDirsPerLocalDir, blockId.name)
+    } else {
+      getFile(localDirs, subDirs, subDirsPerLocalDir, blockId.name)
+    }
+  }

   /** Check if disk block manager has a block. */
   def containsBlock(blockId: BlockId): Boolean = {
-    getFile(blockId.name).exists()
+    getFile(blockId).exists()
   }
```

Review thread on the containsBlock change:

Is this a performance optimization? I guess we can leave the original style because

Looks like we can change:

```diff
- val targetFile = diskManager.getFile(targetBlockId.name)
+ val targetFile = diskManager.getFile(targetBlockId)
```

So

I updated the description for the interface changing part.

Now I refactor the test case and remove

Thank you for the explanation. It's a little confusing to me, but I'll take a look more in that way later.
The hunk continues below the thread:

```diff
   /** List all the files currently stored on disk by the disk manager. */
   def getAllFiles(): Seq[File] = {
     // Get all the files inside the array of array of directories
-    subDirs.flatMap { dir =>
+    // check whether their references are the same
+    val allSubDirs = if (subDirs eq tempSubDirs) subDirs else subDirs ++ tempSubDirs
+    allSubDirs.flatMap { dir =>
       dir.synchronized {
         // Copy the content of dir because it may be modified in other threads
         dir.clone()
```
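The `eq` here is reference equality. The `replacement` parameters passed to `StorageUtils.createTempDirs` and `createTempSubDirs` suggest the temp arrays can fall back to the very same array objects as the regular ones; in that case concatenating them would report every file twice. A small standalone illustration of the idiom:

```scala
// Standalone demo of the reference-equality guard, not the Spark source.
object EqGuardDemo extends App {
  val regular = Array("a", "b")
  val sameRef = regular          // fallback case: literally the same array object
  val copy    = Array("a", "b")  // same contents, different object

  // Concatenate only when the two names point at different arrays;
  // otherwise every entry would be counted twice.
  def all(dirs: Array[String], tempDirs: Array[String]): Array[String] =
    if (dirs eq tempDirs) dirs else dirs ++ tempDirs

  println(all(regular, sameRef).mkString(","))  // prints: a,b
  println(all(regular, copy).mkString(","))     // prints: a,b,a,b
}
```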
```diff
@@ -134,25 +160,6 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolean
     (blockId, getFile(blockId))
   }

-  /**
-   * Create local directories for storing block data. These directories are
-   * located inside configured local directories and won't
-   * be deleted on JVM exit when using the external shuffle service.
-   */
-  private def createLocalDirs(conf: SparkConf): Array[File] = {
-    Utils.getConfiguredLocalDirs(conf).flatMap { rootDir =>
-      try {
-        val localDir = Utils.createDirectory(rootDir, "blockmgr")
-        logInfo(s"Created local directory at $localDir")
-        Some(localDir)
-      } catch {
-        case e: IOException =>
-          logError(s"Failed to create local dir in $rootDir. Ignoring this directory.", e)
-          None
-      }
-    }
-  }

   private def addShutdownHook(): AnyRef = {
     logDebug("Adding shutdown hook") // force eager creation of logger
     ShutdownHookManager.addShutdownHook(ShutdownHookManager.TEMP_DIR_SHUTDOWN_PRIORITY + 1) { () =>
```
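The removed body corresponds to the new `StorageUtils.createLocalDirs` call in the earlier hunk. The diff does not show the `StorageUtils` side, so the following is only a hypothetical sketch of what a companion helper like `createTempDirs` might do, based on the tempDirs comment above (`isYarnMode` and `containerDirs` are illustrative names, not the PR's actual code):

```scala
import java.io.File
import java.util.UUID

// Hypothetical sketch, not the PR's StorageUtils implementation. The idea from
// the tempDirs comment: on YARN, create temp-block dirs under the container
// dirs so YARN deletes them when the container exits; in other deploy modes,
// fall back to the regular block-manager dirs (the "replacement" argument).
def createTempDirs(containerDirs: Seq[String], isYarnMode: Boolean,
    replacement: Array[File]): Array[File] = {
  if (isYarnMode) {
    containerDirs.flatMap { root =>
      val dir = new File(root, s"blockmgr-tmp-${UUID.randomUUID()}")
      if (dir.mkdirs()) Some(dir) else None
    }.toArray
  } else {
    replacement // same array object, which is what the `eq` checks detect
  }
}
```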
```diff
@@ -175,15 +182,17 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolean

   private def doStop(): Unit = {
     if (deleteFilesOnStop) {
-      localDirs.foreach { localDir =>
-        if (localDir.isDirectory() && localDir.exists()) {
+      // check whether their references are the same
+      val toDelete = if (localDirs eq tempDirs) localDirs else localDirs ++ tempDirs
+      toDelete.foreach { dir =>
+        if (dir.isDirectory() && dir.exists()) {
           try {
-            if (!ShutdownHookManager.hasRootAsShutdownDeleteDir(localDir)) {
-              Utils.deleteRecursively(localDir)
+            if (!ShutdownHookManager.hasRootAsShutdownDeleteDir(dir)) {
+              Utils.deleteRecursively(dir)
             }
           } catch {
             case e: Exception =>
-              logError(s"Exception while deleting local spark dir: $localDir", e)
+              logError(s"Exception while deleting local spark dir: $dir", e)
           }
         }
       }
```
Review thread on the ExecutorDiskUtils sync comment:

@LantaoJin. This comment reminds me that we need to handle ExecutorDiskUtils properly. Do we need to update ExecutorDiskUtils.getFile? Or do we need to remove this comment?

No. I didn't change any behaviour for non-temp files. This ExecutorDiskUtils.getFile handles the non-temp files, such as those starting with "shuffle_" or "rdd_". So it should still be kept in sync with DiskBlockManager.getFile.
You can see the method body is not changed; only the signature changed, from def getFile(filename: String): File to private def getFile(localDirs: Array[File], subDirs: Array[Array[File]], subDirsPerLocalDir: Int, filename: String): File.
Why does this function need to have unused parameters? Maybe I'm still confused by this frame change.
Now we just store temp files and non-temp files under different root paths. The algorithm for finding a file doesn't change.
Which parameters?
Ah, you must mean that getFile in DiskBlockManager has 4 parameters while getFile in ExecutorDiskUtils has 3. ExecutorDiskUtils is a utility, so 3 parameters are enough to define a file path structure. The subDirs parameter in DiskBlockManager is only a local variable used in other parts; it does not affect the result of the lookup. I could use 3 parameters in DiskBlockManager.getFile as well, but then I would need to pick a different array based on a condition, like the sketch below. But that someConditions is a little hard to determine in this method.
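The code block in this comment was lost in extraction. From the surrounding text it presumably showed the 3-parameter variant choosing an array internally; a hypothetical reconstruction (`someConditions` is the commenter's own placeholder, and the body is deliberately left unfinished as in the original):

```scala
// Hypothetical reconstruction of the elided snippet, not the PR's code.
// Assumes the DiskBlockManager fields (subDirs, tempSubDirs) are in scope.
private def getFile(localDirs: Array[File], subDirsPerLocalDir: Int,
    filename: String): File = {
  // someConditions is the placeholder the commenter says is hard to determine here
  val dirs = if (someConditions) tempSubDirs else subDirs
  // ...then continue the usual hash-based lookup using `dirs`.
  ???
}
```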
So the getFile in DiskBlockManager has 4 parameters:
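The code block after this colon was also lost in extraction; given the signature quoted verbatim earlier in the thread, it presumably read:

```scala
private def getFile(localDirs: Array[File], subDirs: Array[Array[File]],
    subDirsPerLocalDir: Int, filename: String): File
```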
I think we don't need to pay more attention to the parameters of getFile. Before this PR, there was only 1 parameter (filename: String) in DiskBlockManager and 3 parameters in ExecutorDiskUtils.