-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support cos select #4978
support cos select #4978
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments. Overall looks good
* To process CSV objects that may contain \uFDD0 as first row character please disable S3SelectPushdown. | ||
* TODO: Remove this proxy logic when S3Select API supports disabling of row level comments. | ||
*/ | ||
private String s3CsvComments = "\uFDD0"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's create a new config class HiveS3SelectConfig
for this property. Name the config property hive.s3select-pushdown.csv-comments
to match the other S3Select comments.
Later, we can move the existing S3Select configs from HiveConfig
to the new HiveS3SelectConfig
.
{ | ||
HiveS3Config defaults = new HiveS3Config(); | ||
String s3CsvComments = configuration.get(S3_CSV_COMMENTS, defaults.getS3CsvComments()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this CSV comments config is only used by S3Select, we can limit it to the S3Select code -- no need to pass it through Hadoop config:
- Change
PrestoS3ClientFactory
to take the newHiveS3SelectConfig
as mentioned in the other comment. - Store
csvComments
from the config inPrestoS3ClientFactory
. - Add a method
getCsvComments()
toPrestoS3ClientFactory
and use that in theS3SelectCsvRecordReader
constructor to fetch the value.
@@ -172,6 +172,7 @@ | |||
public static final String S3_SKIP_GLACIER_OBJECTS = "presto.s3.skip-glacier-objects"; | |||
public static final String S3_REQUESTER_PAYS_ENABLED = "presto.s3.requester-pays.enabled"; | |||
public static final String S3_STORAGE_CLASS = "presto.s3.storage-class"; | |||
public static final String S3_CSV_COMMENTS = "presto.s3.csv-comments"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this after moving the config handling to PrestoS3ClientFactory
(see other comment).
@@ -462,4 +471,17 @@ public HiveS3Config setRequesterPaysEnabled(boolean requesterPaysEnabled) | |||
this.requesterPaysEnabled = requesterPaysEnabled; | |||
return this; | |||
} | |||
|
|||
@NotNull |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S3 ignores anything after the first character, so we also should enforce it is exactly one character with size validation (in addition to not null)
@Size(min = 1, max = 1)
@@ -95,6 +96,7 @@ private AmazonS3 createS3Client(Configuration config) | |||
Duration connectTimeout = Duration.valueOf(config.get(S3_CONNECT_TIMEOUT, defaults.getS3ConnectTimeout().toString())); | |||
Duration socketTimeout = Duration.valueOf(config.get(S3_SOCKET_TIMEOUT, defaults.getS3SocketTimeout().toString())); | |||
int maxConnections = config.getInt(S3_SELECT_PUSHDOWN_MAX_CONNECTIONS, defaultMaxConnections); | |||
boolean s3PathStyleAccess = config.getBoolean(S3_PATH_STYLE_ACCESS, defaults.isS3PathStyleAccess()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this a separate commit, since it's a bug fix, unrelated to the CSV comments change? Title it something like
Respect path-style-access config for S3Select
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be a separate commit
still interested in this @Coderlxl ? |
👋 @Coderlxl - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open. |
Fixes #4519