[SPARK-50373] Prohibit Variant from set operations #48909

Open: wants to merge 9 commits into base: master

harshmotw-db
Contributor

What changes were proposed in this pull request?

Prior to this PR, Variant columns could be used with set operations like DISTINCT, INTERSECT and EXCEPT. This PR prohibits this behavior since Variant is not orderable.
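
To illustrate the intended behavior, here is a minimal, self-contained sketch of this kind of check. It is a toy model only, not the code in this PR: `Column`, `AnalysisError`, `SET_OPERATIONS`, and `check_set_operation` are hypothetical names invented for the example, and the error message is made up. It assumes the check simply rejects any of the listed operations when an output column has type Variant.

```python
from dataclasses import dataclass
from typing import Sequence


# Hypothetical stand-in for a plan's output column; not a Spark class.
@dataclass(frozen=True)
class Column:
    name: str
    data_type: str  # e.g. "int", "string", "variant"


# The operations named in this PR description.
SET_OPERATIONS = {"DISTINCT", "INTERSECT", "EXCEPT"}


class AnalysisError(Exception):
    """Hypothetical error raised when a query fails the check."""


def check_set_operation(operation: str, columns: Sequence[Column]) -> None:
    """Raise if a set operation would have to compare Variant values."""
    if operation.upper() not in SET_OPERATIONS:
        return  # other operations are outside the scope of this sketch
    for col in columns:
        if col.data_type.lower() == "variant":
            raise AnalysisError(
                f"{operation.upper()} is not supported on column "
                f"'{col.name}' of type VARIANT"
            )
```

In this model, `check_set_operation("INTERSECT", [Column("payload", "variant")])` raises `AnalysisError`, while the same call with only non-Variant columns returns without error.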

Why are the changes needed?

Variant equality is not defined, so set operations, which rely on comparing rows for equality, are also undefined on Variant columns.

Does this PR introduce any user-facing change?

Yes, users will no longer be able to perform set operations on Variant columns.

How was this patch tested?

Unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Nov 20, 2024
@harshmotw-db harshmotw-db marked this pull request as ready for review November 20, 2024 17:27
@harshmotw-db
Contributor Author

@gene-db @cloud-fan Can you please look at this? Thanks!

Contributor

@gene-db gene-db left a comment

@harshmotw-db Thanks for catching the undefined behavior!

LGTM

Contributor

@dtenedor dtenedor left a comment

LGTM, we should probably not allow these until we can think about it further. What about GROUP BY and SELECT DISTINCT, should we prohibit those too?

@harshmotw-db
Contributor Author

@dtenedor This PR prohibits SELECT DISTINCT, and GROUP BY was already disabled. It was believed that the GROUP BY code path would also disable other operations that require grouping, but apparently it does not.

3 participants