
IcebergInputSource: Add option to toggle case sensitivity while reading columns from Iceberg catalog #16496

Merged: 4 commits merged on May 31, 2024.

a2l007 (Contributor) commented May 23, 2024

Description

The column names defined in the Iceberg catalog schema may not have the same case as the column names defined in the data files themselves. Since the Iceberg table scan context enables case sensitivity by default, filters applied during the scan can fail when the column-name cases don't match.
This PR adds a caseSensitive config option to the Iceberg catalog spec of IcebergInputSource that toggles case sensitivity. Setting it to false fixes the scenario described above.
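To make the failure mode concrete, here is a minimal, self-contained Java sketch. It does not use Iceberg: the resolveColumn helper and the column names are hypothetical, and matching with equalsIgnoreCase is only a simplified model of a case-insensitive lookup, not Iceberg's actual filter-binding logic.

```java
import java.util.List;
import java.util.Optional;

class ColumnResolutionSketch {
  // Hypothetical helper (not part of Iceberg or Druid): resolves the column name
  // used in a filter against the column names declared in a schema.
  static Optional<String> resolveColumn(List<String> schemaColumns, String filterColumn, boolean caseSensitive) {
    for (String column : schemaColumns) {
      boolean matches = caseSensitive
          ? column.equals(filterColumn)            // exact spelling required
          : column.equalsIgnoreCase(filterColumn); // case differences tolerated
      if (matches) {
        // Return the schema's own spelling so later steps use the canonical name.
        return Optional.of(column);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<String> schemaColumns = List.of("event_time", "user_id");
    // Case-sensitive lookup finds nothing when only the case differs.
    System.out.println(resolveColumn(schemaColumns, "Event_Time", true));  // Optional.empty
    // Case-insensitive lookup resolves to the schema's spelling.
    System.out.println(resolveColumn(schemaColumns, "Event_Time", false)); // Optional[event_time]
  }
}
```

With case sensitivity on, a filter on Event_Time has no match in a schema that spells the column event_time; with it off, the lookup resolves to the schema's spelling.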


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@@ -92,6 +97,10 @@ public List<String> extractSnapshotDataFiles(
if (snapshotTime != null) {
tableScan = tableScan.asOfTime(snapshotTime.getMillis());
}
//Default case sensitivity is true for Iceberg TableScanContext
Review comment (Member):

Suggested change (add a space after //):
- //Default case sensitivity is true for Iceberg TableScanContext
+ // Default case sensitivity is true for Iceberg TableScanContext

Later in the same hunk:
if (!isCaseSensitive()) {
tableScan = tableScan.caseSensitive(isCaseSensitive());
Review comment (Contributor):

Instead of relying on the upstream default value of true, we can always set this unconditionally: tableScan = tableScan.caseSensitive(isCaseSensitive())
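The reviewer's point can be illustrated with a small, self-contained Java sketch. HypotheticalScan is an invented stand-in for a fluent, immutable scan builder (it is not Iceberg's TableScan), and the sketch assumes, purely for illustration, an upstream default of false to show what happens if the default ever differs from what the calling code expects.

```java
class ScanDefaultSketch {
  // Hypothetical stand-in for a fluent, immutable scan builder; NOT Iceberg's TableScan.
  static final class HypotheticalScan {
    final boolean caseSensitive;

    HypotheticalScan(boolean caseSensitive) {
      this.caseSensitive = caseSensitive;
    }

    // Fluent refinement: returns a new instance instead of mutating this one.
    HypotheticalScan caseSensitive(boolean value) {
      return new HypotheticalScan(value);
    }
  }

  // Original pattern: only override when the desired value differs from the assumed default (true).
  static HypotheticalScan applyConditionally(HypotheticalScan scan, boolean desired) {
    if (!desired) {
      scan = scan.caseSensitive(desired);
    }
    return scan;
  }

  // Suggested pattern: always set the flag explicitly.
  static HypotheticalScan applyUnconditionally(HypotheticalScan scan, boolean desired) {
    return scan.caseSensitive(desired);
  }

  public static void main(String[] args) {
    // Assume, for illustration only, that the upstream default were false.
    HypotheticalScan upstream = new HypotheticalScan(false);
    System.out.println(applyConditionally(upstream, true).caseSensitive);   // false: the configured value is ignored
    System.out.println(applyUnconditionally(upstream, true).caseSensitive); // true
  }
}
```

The conditional version is only correct while the upstream default really is true; setting the flag unconditionally makes the scan's behavior depend solely on the configured value.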

a2l007 (Contributor, Author) commented May 31, 2024

Thank you for the review @abhishekrb19 @asdf2014

a2l007 merged commit b53d757 into apache:master on May 31, 2024.
87 checks passed.
kfaraz added this to the 31.0.0 milestone on Oct 4, 2024.
4 participants