-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make hive temporary staging directory to be configurable #12199
Conversation
Looks good, but Travis is red.
|
@findepi Thanks. Updated. |
presto-hive/src/main/java/com/facebook/presto/hive/HiveWriteUtils.java
Outdated
Show resolved
Hide resolved
{ | ||
// skip using temporary directory for S3 | ||
return !isS3FileSystem(context, hdfsEnvironment, path); | ||
return isTemporaryStagingDirectoryEnabled(session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe want to update the comment around WriteMode
enum?
presto/presto-hive/src/main/java/com/facebook/presto/hive/LocationHandle.java
Lines 107 to 121 in 79f480b
public enum WriteMode | |
{ | |
/** | |
* common mode for new table or existing table (both new and existing partition) | |
*/ | |
STAGE_AND_MOVE_TO_TARGET_DIRECTORY(false), | |
/** | |
* for new table in S3 | |
*/ | |
DIRECT_TO_TARGET_NEW_DIRECTORY(true), | |
/** | |
* for existing table in S3 (both new and existing partition) | |
*/ | |
DIRECT_TO_TARGET_EXISTING_DIRECTORY(true), | |
/**/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make hive temporary staging directory to be disableable
- I feel like we can update the commit title to:
Add config flag to disable hive temp staging directory
orMake use of hive temp staging directory configurable
.
presto-hive/src/main/java/com/facebook/presto/hive/HiveClientConfig.java
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveClientConfig.java
Show resolved
Hide resolved
@@ -1190,4 +1192,17 @@ public HiveClientConfig setS3SelectPushdownMaxConnections(int s3SelectPushdownMa | |||
this.s3SelectPushdownMaxConnections = s3SelectPushdownMaxConnections; | |||
return this; | |||
} | |||
|
|||
@Config("hive.temporary-staging-directory-enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we rename this to hive.use-temporary-staging-directory-for-writes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we be using staging directory for reads? would there be use for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findepi : Unlikely. When the WriteMode
is STAGE_AND_MOVE_TO_TARGET_DIRECTORY
, the data is first written to the stage directory, and rename to the target directory when when commit the write.
But I don't feel we have to emphasize for-write
? What about hive.use-temporary-staging-directory
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I don't feel we have to emphasize
for-write
?
i thought the same
What about
hive.use-temporary-staging-directory
?
would do
however, i think the original hive.temporary-staging-directory-enabled
is slightly better. Consider the case when one uses s3 where staging cannot (currently) be used.
When users sets hive.use-temporary-staging-directory = true
, they could expect that staging is used (or error is reported). However, none of these will happen.
At the same time, I don't want to report an error when the option is used with s3 -- a single cluster may be talking to s3 and hdfs and user may want to set the option to change behavior on hdfs.
So... to me this should be either
hive.temporary-staging-directory-enabled
(original) orhive.hdfs.use-temporary-staging-directory
i prefer the original.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using hdfs
could make someone to think that this only affects actual HDFS, and not any HDFS-compatible file system.
I would keep the original. @nezihyigitbasi What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like "using" a staging directory sounds better than "enabling" it (hive.use-temporary-staging-directory
). But, if you all want to go with the original name I am fine with it.
@@ -76,6 +76,7 @@ | |||
private static final String COLLECT_COLUMN_STATISTICS_ON_WRITE = "collect_column_statistics_on_write"; | |||
private static final String OPTIMIZE_MISMATCHED_BUCKET_COUNT = "optimize_mismatched_bucket_count"; | |||
private static final String S3_SELECT_PUSHDOWN_ENABLED = "s3_select_pushdown_enabled"; | |||
private static final String TEMPORARY_STAGING_DIRECTORY_ENABLED = "temporary_staging_directory_enabled"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, should we rename this to use_temporary_staging_directory_for_writes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make hive temporary staging directory location to be configurable
Make hive temporary staging directory location configurable
presto-hive/src/main/java/com/facebook/presto/hive/HiveClientConfig.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveSessionProperties.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/com/facebook/presto/hive/HiveClientConfig.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nezihyigitbasi Comments addressed, would you mind to take a look?
@@ -1190,4 +1192,17 @@ public HiveClientConfig setS3SelectPushdownMaxConnections(int s3SelectPushdownMa | |||
this.s3SelectPushdownMaxConnections = s3SelectPushdownMaxConnections; | |||
return this; | |||
} | |||
|
|||
@Config("hive.temporary-staging-directory-enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using hdfs
could make someone to think that this only affects actual HDFS, and not any HDFS-compatible file system.
I would keep the original. @nezihyigitbasi What do you think?
presto-hive/src/main/java/com/facebook/presto/hive/HiveClientConfig.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. (Please merge after the release is completed.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
assertEquals(hiveInsertTableHandle.getLocationHandle().getWritePath(), hiveInsertTableHandle.getLocationHandle().getTargetPath()); | ||
|
||
session = Session.builder(getSession()) | ||
.setCatalogSessionProperty("hive", "temporary_staging_directory_path", "/tmp/custom/temporary-${USER}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe explicitly put "temporary_staging_directory_enabled": "true"?
I am asking this because I got confused when looking at it first time, then I realized temporary_staging_directory_enabled is by default true.
Presto for some file systems that wrap S3, should behave in the same way as it does for S3.
In some deployments, /tmp cannot be used to write temporary files.
No description provided.