-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-49666][SQL] Add feature flag for trim collation feature #48222
Closed
jovanpavl-db
wants to merge
19
commits into
apache:master
from
jovanpavl-db:trim-collation-feature-initial-support
Closed
[SPARK-49666][SQL] Add feature flag for trim collation feature #48222
jovanpavl-db
wants to merge
19
commits into
apache:master
from
jovanpavl-db:trim-collation-feature-initial-support
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
stefankandic
suggested changes
Sep 24, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Left some minor comments but looks good overall
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
Outdated
Show resolved
Hide resolved
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java
Outdated
Show resolved
Hide resolved
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
Outdated
Show resolved
Hide resolved
stefankandic
approved these changes
Sep 25, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending scalastyle fixes!
uros-db
reviewed
Sep 25, 2024
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java
Show resolved
Hide resolved
uros-db
reviewed
Sep 25, 2024
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java
Outdated
Show resolved
Hide resolved
uros-db
reviewed
Sep 25, 2024
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java
Outdated
Show resolved
Hide resolved
uros-db
reviewed
Sep 25, 2024
uros-db
reviewed
Sep 25, 2024
uros-db
reviewed
Sep 25, 2024
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
Show resolved
Hide resolved
uros-db
reviewed
Sep 25, 2024
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala
Outdated
Show resolved
Hide resolved
uros-db
reviewed
Sep 25, 2024
uros-db
reviewed
Sep 25, 2024
common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala
Show resolved
Hide resolved
HyukjinKwon
reviewed
Sep 25, 2024
cloud-fan
reviewed
Sep 26, 2024
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Show resolved
Hide resolved
cloud-fan
reviewed
Sep 26, 2024
cloud-fan
approved these changes
Sep 30, 2024
thanks, merging to master! |
attilapiros
pushed a commit
to attilapiros/spark
that referenced
this pull request
Oct 4, 2024
### What changes were proposed in this pull request? Introducing new specifier for trim collations (both leading and trailing trimming). These are initial changes so that trim specifier is recognized and put under feature flag (all code paths blocked). ### Why are the changes needed? Support for trailing space trimming is one of the requested feature by users. ### Does this PR introduce _any_ user-facing change? This is guarded by feature flag. ### How was this patch tested? Added tests to CollationSuite, SqlConfSuite and QueryCompilationErrorSuite. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48222 from jovanpavl-db/trim-collation-feature-initial-support. Authored-by: Jovan Pavlovic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
himadripal
pushed a commit
to himadripal/spark
that referenced
this pull request
Oct 19, 2024
### What changes were proposed in this pull request? Introducing new specifier for trim collations (both leading and trailing trimming). These are initial changes so that trim specifier is recognized and put under feature flag (all code paths blocked). ### Why are the changes needed? Support for trailing space trimming is one of the requested feature by users. ### Does this PR introduce _any_ user-facing change? This is guarded by feature flag. ### How was this patch tested? Added tests to CollationSuite, SqlConfSuite and QueryCompilationErrorSuite. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48222 from jovanpavl-db/trim-collation-feature-initial-support. Authored-by: Jovan Pavlovic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
MaxGekk
pushed a commit
that referenced
this pull request
Nov 14, 2024
…ationNameToId` outside of cases ### What changes were proposed in this pull request? In this PR, UTF8_BINARY performance regression is addressed, that was first identified here #48721. The regression is traced back to this PR #48222 when it first occurred, however this isn't the actual source of performance degradation. ### Why are the changes needed? The PR #48222 caused the regression because it changed the `collationNameToId` function and made it slightly slower by removing a short-circuit for fetching the UTF8_BINARY collation. However this function should be called fixed amount of times for each query and from the benchmark framework at most once - this was not the case and it was the largest contributor to performance regression. This PR addresses the benchmarking framework to not call this function at each expression, but once per the test case. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing testing surface, benchmarks. ### Was this patch authored or co-authored using generative AI tooling? No Closes #48804 from stevomitric/stevomitric/fix-utf8_binary-regression. Authored-by: Stevo Mitric <[email protected]> Signed-off-by: Max Gekk <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Introducing new specifier for trim collations (both leading and trailing trimming). These are initial changes so that trim specifier is recognized and put under feature flag (all code paths blocked).
Why are the changes needed?
Support for trailing space trimming is one of the requested feature by users.
Does this PR introduce any user-facing change?
This is guarded by feature flag.
How was this patch tested?
Added tests to CollationSuite, SqlConfSuite and QueryCompilationErrorSuite.
Was this patch authored or co-authored using generative AI tooling?
No.