[SPARK-14112] [SQL] [WIP] Unique Constraints over a Set of AttributeReferences #11930
What changes were proposed in this pull request?
This PR introduces unique constraints over a set of `AttributeReference`s. Below are just two of the use cases:
- Eliminating `Distinct` from the plan, as shown in the PR: [SPARK-14032] [SQL] Eliminate Unnecessary Distinct/Aggregate #11854.
- Eliminating `Join` from the plan, as shown in the PR: [SPARK-11077] [SQL] Join elimination in Catalyst #9089.
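To illustrate the first use case, here is a minimal Python sketch of the underlying idea, not the logic of #11854 itself. It assumes a unique constraint is modeled as a set of attribute names whose combined values never repeat; the function name `distinct_is_redundant` and its parameters are hypothetical.

```python
from typing import FrozenSet, Iterable


def distinct_is_redundant(
    child_output: FrozenSet[str],
    child_unique: Iterable[FrozenSet[str]],
) -> bool:
    """Return True if a Distinct on top of the child cannot remove any row.

    Assumption: each element of child_unique is a set of attributes whose
    combined values are unique in the child's output. If one such set is
    fully contained in the output, no two output rows can be equal.
    """
    return any(unique_attrs <= child_output for unique_attrs in child_unique)


if __name__ == "__main__":
    # "id" is unique and still present in the output, so Distinct is a no-op.
    print(distinct_is_redundant(frozenset({"id", "name"}), [frozenset({"id"})]))
    # "id" was projected away, so duplicates in "name" are possible.
    print(distinct_is_redundant(frozenset({"name"}), [frozenset({"id"})]))
```

An optimizer rule built on this check would drop the `Distinct` node only when the function returns true for its child.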
We can infer the output uniqueness of the current operator from the uniqueness of its child node's output. The bottom-up propagation rules for unique constraints are:
- `Distinct`, `Intersect`, and `Except` always return distinct values.
- `Aggregate` has three cases:
  - `Distinct`: it can always return distinct values.
  - If the child's `outputSet` is a subset of its own `outputSet`, it keeps the unique constraints of the child.
- `Filter`, `BroadcastHint`, `Sort`, `Window`, `GlobalLimit`, `LocalLimit`, `Sample`, and `SubqueryAlias` keep the unique constraints of the child.
- `Left-semi Join` keeps the unique constraints of the left child.
- `Project` keeps the unique constraints of the child if and only if the child's `outputSet` is a subset of its own `outputSet`.
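The propagation rules above can be sketched as a small recursive function. This is a toy Python model, not the Catalyst code in this PR: the `Plan` class, the `declared` field, the `Relation` leaf, and the string operator names are all hypothetical, and it assumes that "always return distinct values" means the operator's whole output set forms one unique constraint. Only the subset case of `Aggregate` is modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

AttrSet = FrozenSet[str]

# Operators that keep the unique constraints of their (first) child.
PASS_THROUGH = {
    "Filter", "BroadcastHint", "Sort", "Window",
    "GlobalLimit", "LocalLimit", "Sample", "SubqueryAlias",
    "LeftSemiJoin",  # keeps the constraints of the left child
}
# Operators whose output rows are always distinct.
ALWAYS_DISTINCT = {"Distinct", "Intersect", "Except"}


@dataclass
class Plan:
    op: str                                   # operator name (hypothetical encoding)
    output: AttrSet                           # stands in for outputSet
    children: Tuple["Plan", ...] = ()
    declared: FrozenSet[AttrSet] = frozenset()  # leaf-level constraints (assumption)


def unique_constraints(plan: Plan) -> Set[AttrSet]:
    """Propagate unique constraints bottom-up through the plan."""
    if plan.op == "Relation":
        return set(plan.declared)
    if plan.op in ALWAYS_DISTINCT:
        # Assumption: the full output set is unique.
        return {plan.output}
    if plan.op in PASS_THROUGH:
        return unique_constraints(plan.children[0])
    if plan.op in ("Project", "Aggregate"):
        child = plan.children[0]
        # Keep the child's constraints only if no child attribute is dropped.
        if child.output <= plan.output:
            return unique_constraints(child)
        return set()
    # Unknown operators conservatively propagate nothing.
    return set()


if __name__ == "__main__":
    rel = Plan("Relation", frozenset({"a", "b"}),
               declared=frozenset({frozenset({"a"})}))
    filtered = Plan("Filter", rel.output, (rel,))
    projected = Plan("Project", frozenset({"b"}), (filtered,))
    print(unique_constraints(filtered))
    print(unique_constraints(projected))
    print(unique_constraints(Plan("Distinct", projected.output, (projected,))))
```

Returning an empty set for operators not covered by the rules is a conservative choice in this sketch: a missing constraint only costs an optimization, while a wrong one could change query results.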
How was this patch tested?
TODO: add a set of test cases for verifying the propagation rules