[SPARK-21822][SQL]When insert Hive Table is finished, it is better to clean out the tmpLocation dir #19035
Conversation
I think this change is very useful; auto-deleting the tmp location dir (like .hive-staging_xxxxxx) means everyone sees only the target table's partition dir.
Can one of the admins verify this patch?
@@ -435,6 +435,18 @@ case class InsertIntoHiveTable(
        logWarning(s"Unable to delete staging directory: $stagingDir.\n" + e)
      }

+     //delete the tmpLocation dir
lint-scala gives error for this line: Insert a space after the start of the comment.
      }
    } catch {
      case NonFatal(e) =>
        logWarning(s"Unable to delete tmpLocation directory:" + tmpLocation.toString + "\n" + e)
This can be compressed a bit:
logWarning(s"Unable to delete tmpLocation directory: $tmpLocation\n$e")
I think the problem is fixed now, right? Could you check the latest file version?
@figo77 Could you close it?
Closes apache#18916 Closes apache#19613 Closes apache#19739 Closes apache#19936 Closes apache#19919 Closes apache#19933 Closes apache#19917 Closes apache#20027 Closes apache#19035 Closes apache#20044 Closes apache#20104
What changes were proposed in this pull request?
When an insert into a Hive table is finished, it is better to clean up the tmpLocation dir (temporary directories like ".hive-staging_hive_2017-08-19_10-56-01_540_5448395226195533570-9/-ext-10000", or "/tmp/hive/..." for older Spark versions).
Otherwise, when many Spark jobs are executed, millions of temporary directories are left in HDFS, and they can only be deleted by a maintainer through a shell script.
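The cleanup this PR proposes can be sketched as a standalone snippet. This is a hypothetical illustration, not the actual patch: it uses `java.nio.file` in place of the Hadoop `FileSystem` API that `InsertIntoHiveTable` really calls, and `println` in place of Spark's `logWarning`, but it mirrors the same best-effort pattern — try to delete the temporary directory recursively, and only warn (never fail the job) if deletion is not possible.

```scala
import java.nio.file.{Files, Path}

import scala.util.control.NonFatal

object TmpLocationCleanup {
  // Hypothetical helper: best-effort recursive delete of the tmp location.
  // A failure to delete is logged as a warning and otherwise ignored,
  // matching the try/catch-NonFatal pattern in the diff above.
  def deleteTmpLocation(tmpLocation: Path): Unit = {
    try {
      if (Files.exists(tmpLocation)) {
        // Walk the tree and delete children before parents
        // (reverse order sorts deeper paths first).
        Files.walk(tmpLocation)
          .sorted(java.util.Comparator.reverseOrder[Path]())
          .forEach(p => Files.delete(p))
      }
    } catch {
      case NonFatal(e) =>
        // Stand-in for logWarning: never rethrow, the insert already succeeded.
        println(s"Unable to delete tmpLocation directory: $tmpLocation\n$e")
    }
  }
}
```

Calling it on a directory that is already gone is a no-op, so the cleanup is safe to run unconditionally at the end of the insert.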
How was this patch tested?