[SPARK-50373] Prohibit Variant from set operations #48909
@gene-db @cloud-fan Can you please look at this? Thanks!
@harshmotw-db Thanks for catching the undefined behavior!
LGTM
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
LGTM, we should probably not allow these until we can think about it further. What about GROUP BY and SELECT DISTINCT, should we prohibit those too?
@dtenedor This PR prohibits |
…ariant_distinct_fix
…hub.com/harshmotw-db/spark into variant_distinct_fix
What changes were proposed in this pull request?
Prior to this PR, Variant columns could be used with set operations like DISTINCT, INTERSECT and EXCEPT. This PR prohibits this behavior since Variant is not orderable.
Why are the changes needed?
Variant equality is not defined, and therefore, these operations are also undefined.
Does this PR introduce any user-facing change?
Yes, users will no longer be able to perform set operations on variant columns.
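To illustrate the kind of analysis-time check described here, the following is a minimal, self-contained sketch in Python. It is not the code from this PR (the actual change lives in CheckAnalysis.scala), and every name in it (`Atomic`, `Array`, `Struct`, `contains_variant`, `check_set_operation`, `AnalysisError`) is hypothetical. It also assumes that a variant nested inside an array or struct should be rejected, which the description above does not state.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical, simplified model of a column type tree (not Spark's DataType API).
@dataclass(frozen=True)
class Atomic:
    name: str  # e.g. "int", "string", "variant"


@dataclass(frozen=True)
class Array:
    element: "DataType"


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, "DataType"], ...]


DataType = Union[Atomic, Array, Struct]


class AnalysisError(Exception):
    """Stand-in for an analysis-time failure raised before any query runs."""


def contains_variant(dt: DataType) -> bool:
    # Assumption: nested variants count too, so walk the whole type tree.
    if isinstance(dt, Atomic):
        return dt.name == "variant"
    if isinstance(dt, Array):
        return contains_variant(dt.element)
    if isinstance(dt, Struct):
        return any(contains_variant(field_type) for _, field_type in dt.fields)
    raise TypeError(f"unknown type node: {dt!r}")


def check_set_operation(op: str, columns) -> None:
    # Reject the set operation if any output column carries a variant,
    # because equality on variant values is not defined.
    for name, dt in columns:
        if contains_variant(dt):
            raise AnalysisError(
                f"{op} is not supported on column '{name}': its type contains variant"
            )


if __name__ == "__main__":
    columns = [("id", Atomic("int")), ("v", Atomic("variant"))]
    try:
        check_set_operation("INTERSECT", columns)
    except AnalysisError as err:
        print(err)
```

Failing at analysis time, rather than returning a result built on undefined equality, is the behavior change users would observe.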
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
No