[SPARK-49666][SQL] Add feature flag for trim collation feature #48222

jovanpavl-db

What changes were proposed in this pull request?

This PR introduces a new specifier for trim collations (covering both leading and trailing trimming). These are initial changes: the trim specifier is recognized, but it sits behind a feature flag and all code paths remain blocked.
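
For readers unfamiliar with the pattern, here is a minimal sketch of "recognized but blocked behind a flag". It is not the code from this PR: the function name, the error type, the `trim_enabled` parameter, and the `_LTRIM`/`_RTRIM`/`_TRIM` suffix spellings are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical specifier spellings; the PR text does not list them.
TRIM_SPECIFIERS = ("LTRIM", "RTRIM", "TRIM")


class TrimCollationDisabledError(Exception):
    """Raised when a trim specifier is used while the flag is off."""


def parse_collation(name: str, trim_enabled: bool = False) -> Tuple[str, Optional[str]]:
    """Split a collation name into (base_name, trim_specifier).

    The parser always recognizes a trim specifier, but rejects it
    unless the (hypothetical) feature flag is switched on.
    """
    upper = name.upper()
    for spec in TRIM_SPECIFIERS:
        suffix = "_" + spec
        if upper.endswith(suffix):
            if not trim_enabled:
                raise TrimCollationDisabledError(
                    f"trim collation '{name}' is behind a feature flag"
                )
            return upper[: -len(suffix)], spec
    # No trim specifier present: behave exactly as before the change.
    return upper, None
```

With the flag off, names without a specifier parse as before, while any name carrying a specifier fails fast with a dedicated error, which is roughly what "all code paths blocked" implies.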

Why are the changes needed?

Support for trailing space trimming is one of the features requested by users.

Does this PR introduce any user-facing change?

This is guarded by a feature flag.

How was this patch tested?

Added tests to CollationSuite, SqlConfSuite and QueryCompilationErrorSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Sep 24, 2024

@stefankandic stefankandic left a comment

Nice work! Left some minor comments but looks good overall


@stefankandic stefankandic left a comment

LGTM pending scalastyle fixes!

@cloud-fan

thanks, merging to master!

@cloud-fan cloud-fan closed this in c54c017 Sep 30, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
Closes apache#48222 from jovanpavl-db/trim-collation-feature-initial-support.

Authored-by: Jovan Pavlovic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
MaxGekk pushed a commit that referenced this pull request Nov 14, 2024
…ationNameToId` outside of cases

### What changes were proposed in this pull request?
This PR addresses a UTF8_BINARY performance regression that was first identified in #48721. The regression was traced back to PR #48222, where it first appeared; however, that PR is not the actual source of the performance degradation.

### Why are the changes needed?
PR #48222 caused the regression because it changed the `collationNameToId` function and made it slightly slower by removing a short-circuit for fetching the UTF8_BINARY collation. However, this function should be called a fixed number of times per query, and at most once from the benchmark framework. That was not the case, and the repeated calls were the largest contributor to the performance regression.

This PR changes the benchmarking framework so that it calls this function once per test case rather than for every expression.
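
The idea of hoisting the lookup out of the per-expression path can be sketched as follows. This is a toy illustration, not the benchmark code touched by this PR: `CollationRegistry`, its ids, and both `bench_*` functions are hypothetical names.

```python
from typing import List, Tuple


class CollationRegistry:
    """Toy stand-in for a name-to-id lookup; not Spark's implementation."""

    def __init__(self) -> None:
        self.lookups = 0  # counts how often the lookup runs
        self._ids = {"UTF8_BINARY": 0, "UTF8_LCASE": 1}  # made-up ids

    def name_to_id(self, name: str) -> int:
        self.lookups += 1
        return self._ids[name.upper()]


def bench_per_expression(
    reg: CollationRegistry, name: str, rows: List[str]
) -> List[Tuple[int, str]]:
    # Anti-pattern: the name is resolved again for every row.
    return [(reg.name_to_id(name), row) for row in rows]


def bench_hoisted(
    reg: CollationRegistry, name: str, rows: List[str]
) -> List[Tuple[int, str]]:
    # Resolve the name once per test case, then reuse the id.
    collation_id = reg.name_to_id(name)
    return [(collation_id, row) for row in rows]
```

Both functions produce the same result, but the hoisted version performs one lookup regardless of the number of rows, so a small slowdown in the lookup itself no longer shows up in the measured time.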

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing testing surface, benchmarks.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48804 from stevomitric/stevomitric/fix-utf8_binary-regression.

Authored-by: Stevo Mitric <[email protected]>
Signed-off-by: Max Gekk <[email protected]>