Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support cos select #4978

Closed
wants to merge 2 commits into from
Closed

support cos select #4978

wants to merge 2 commits into from

Conversation

Coderlxl
Copy link

@Coderlxl Coderlxl commented Aug 26, 2020

Fixes #4519

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments. Overall looks good

* To process CSV objects that may contain \uFDD0 as first row character please disable S3SelectPushdown.
* TODO: Remove this proxy logic when S3Select API supports disabling of row level comments.
*/
private String s3CsvComments = "\uFDD0";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a new config class HiveS3SelectConfig for this property. Name the config property hive.s3select-pushdown.csv-comments to match the other S3Select comments.

Later, we can move the existing S3Select configs from HiveConfig to the new HiveS3SelectConfig.

{
HiveS3Config defaults = new HiveS3Config();
String s3CsvComments = configuration.get(S3_CSV_COMMENTS, defaults.getS3CsvComments());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this CSV comments config is only used by S3Select, we can limit it to the S3Select code -- no need to pass it through Hadoop config:

  1. Change PrestoS3ClientFactory to take the new HiveS3SelectConfig as mentioned in the other comment.
  2. Store csvComments from the config in PrestoS3ClientFactory.
  3. Add a method getCsvComments() to PrestoS3ClientFactoryand use that in the S3SelectCsvRecordReader constructor to fetch the value.

@@ -172,6 +172,7 @@
public static final String S3_SKIP_GLACIER_OBJECTS = "presto.s3.skip-glacier-objects";
public static final String S3_REQUESTER_PAYS_ENABLED = "presto.s3.requester-pays.enabled";
public static final String S3_STORAGE_CLASS = "presto.s3.storage-class";
public static final String S3_CSV_COMMENTS = "presto.s3.csv-comments";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this after moving the config handling to PrestoS3ClientFactory (see other comment).

@@ -462,4 +471,17 @@ public HiveS3Config setRequesterPaysEnabled(boolean requesterPaysEnabled)
this.requesterPaysEnabled = requesterPaysEnabled;
return this;
}

@NotNull
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3 ignores anything after the first character, so we also should enforce it is exactly one character with size validation (in addition to not null)

@Size(min = 1, max = 1)

@@ -95,6 +96,7 @@ private AmazonS3 createS3Client(Configuration config)
Duration connectTimeout = Duration.valueOf(config.get(S3_CONNECT_TIMEOUT, defaults.getS3ConnectTimeout().toString()));
Duration socketTimeout = Duration.valueOf(config.get(S3_SOCKET_TIMEOUT, defaults.getS3SocketTimeout().toString()));
int maxConnections = config.getInt(S3_SELECT_PUSHDOWN_MAX_CONNECTIONS, defaultMaxConnections);
boolean s3PathStyleAccess = config.getBoolean(S3_PATH_STYLE_ACCESS, defaults.isS3PathStyleAccess());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this a separate commit, since it's a bug fix, unrelated to the CSV comments change? Title it something like

Respect path-style-access config for S3Select

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be a separate commit

@tooptoop4
Copy link
Contributor

still interested in this @Coderlxl ?

@colebow
Copy link
Member

colebow commented Oct 19, 2022

👋 @Coderlxl - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.

@colebow colebow closed this Oct 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Presto support COS select
5 participants