distsql: Set operations (UNION, INTERSECT, EXCEPT) #10432

Closed
irfansharif opened this issue Nov 3, 2016 · 20 comments

@irfansharif
Contributor

Implementation of a processor "core" type from #7587.
Set operations would include UNION, UNION ALL, INTERSECT, INTERSECT ALL, EXCEPT, EXCEPT ALL.

This processor, like the join processors, would have two input streams and one output stream. Some of the existing processor core type implementations under pkg/sql/distsql, and the spec definitions in pkg/sql/distsql/processors.proto, can be used for reference.

@irfansharif irfansharif added help wanted Help is requested / needed by the one who filed the issue to fix it. E-intermediate Intermediate complexity, needs a contributor with 3-6 months of past contribution experience. labels Nov 3, 2016
@a6802739
Contributor

a6802739 commented Nov 3, 2016

@irfansharif, I could take a look at it.

@irfansharif
Contributor Author

@a6802739: go for it! let me know if you need any pointers.

@a6802739 a6802739 self-assigned this Nov 3, 2016
@a6802739
Contributor

a6802739 commented Nov 3, 2016

@irfansharif, okay, thanks a lot. I'll take a look at it first.

@a6802739
Contributor

a6802739 commented Nov 9, 2016

@irfansharif, should I add a message UnionSpec in pkg/sql/distsql/processors.proto, and create a new struct Union under pkg/sql/distsql?

Looking at the actual implementation of the aggregator, I couldn't quite understand what buckets means.
For example, in distsql/processors.pb.go:

GroupCols []uint32 `protobuf:"varint,2,rep,name=group_cols,json=groupCols" json:"group_cols,omitempty"`

What does GroupCols mean?

And in distsql/aggregator.go, what does buckets mean?

buckets   map[string]struct{} // The set of bucket keys.

And should I just translate sql/union.go to distsql/union.go?

cc @irfansharif .

@irfansharif
Contributor Author

hey @a6802739,

For the spec definition, going with SetSpec would be best; it would internally specify which operation to apply (UNION, INTERSECT, EXCEPT). As for the struct itself, a simple type set struct { ... } should suffice (under pkg/sql/distsql/set.go).

And should I just translate sql/union.go to distsql/union.go?

I would expect the implementations to function similarly, but keep in mind that under distsql we're working with the abstraction that we simply operate on streams. There's also a lot of additional baggage in sql/union.go, given that our current structure constructs a planNode to evaluate queries, which we don't need to satisfy here.

As for your questions concerning aggregator.go, feel free to ping me on gitter (@irfansharif) and I'll be happy to walk you through it there, just to avoid polluting this thread. For reference, I would advise you to look at distinct.go rather than aggregator.go; it's more pertinent to set operations than the aggregator, which is responsible for grouping aggregations (SUM, AVG, etc.).

@irfansharif
Contributor Author

cc @arjunravinarayan, given you expressed interest in taking this on as well.

@RaduBerinde
Member

This issue has two parts:

  • physical processor implementation for set operations
  • physical planning for set operations

The first would be better as a one-off project for someone who is not familiar with distsql. The latter is not very difficult but it requires understanding a lot of the existing planning code.

@rjnn
Contributor

rjnn commented Feb 2, 2017

Hey @a6802739, what's your status on this? I'm thinking of taking this one on since it's been sitting here for a while, unless you've already done some work on this.

@rjnn rjnn assigned rjnn and unassigned a6802739 Feb 14, 2017
rjnn pushed a commit to rjnn/cockroach that referenced this issue Feb 16, 2017
This addresses cockroachdb#10432, but does not finish it as it does not include
additions to the DistSQL physical planner to use the SET processor.

This is a work in progress out for quick feedback, it only implements
the UNION ALL processor, but it involves a bunch of scaffolding for
which I would love some feedback.

[*] Implement basic scaffolding
[*] Split off StreamCacher into its own file
[*] Implement one single SET operation: Union All
[*] Implement testing for Union All

[ ] Implement Union
[ ] Implement Intersect
[ ] Implement Intersect All
[ ] Implement Except
[ ] Implement Except All
[ ] Implement all other tests
@vivekmenezes
Contributor

abhishek has been assigned this issue to complete the work on INTERSECT and EXCEPT

@knz knz added this to the 2.1 milestone Jan 29, 2018
abhimadan pushed a commit to abhimadan/cockroach that referenced this issue Jan 29, 2018
Part of cockroachdb#10432, cockroachdb#21661, and cockroachdb#21706.

Release note (performance improvement): Support distributed execution of
INTERSECT ALL and EXCEPT ALL queries.
abhimadan pushed a commit to abhimadan/cockroach that referenced this issue Jan 30, 2018
Part of cockroachdb#10432, cockroachdb#21661, and cockroachdb#21706.

Release note (performance improvement): Support distributed execution of
INTERSECT ALL and EXCEPT ALL queries.
abhimadan pushed a commit to abhimadan/cockroach that referenced this issue Jan 30, 2018
Part of cockroachdb#10432, cockroachdb#21661, and cockroachdb#21706.

Release note (performance improvement): Support distributed execution of
INTERSECT ALL and EXCEPT ALL queries.
abhimadan pushed a commit to abhimadan/cockroach that referenced this issue Feb 5, 2018
Part of cockroachdb#10432, cockroachdb#21661, and cockroachdb#21706.

Closes cockroachdb#16064.

Release note (performance improvement): Support distributed execution of
INTERSECT ALL and EXCEPT ALL queries.
abhimadan pushed a commit to abhimadan/cockroach that referenced this issue Feb 6, 2018
Part of cockroachdb#10432, cockroachdb#21661, and cockroachdb#21706.

Closes cockroachdb#16064.

Release note (performance improvement): Support distributed execution of
INTERSECT ALL and EXCEPT ALL queries.
abhimadan pushed a commit to abhimadan/cockroach that referenced this issue Feb 12, 2018
Fixes cockroachdb#10432.
Fixes cockroachdb#21661.
Fixes cockroachdb#21706.

Release note (performance improvement): Support distributed execution of
INTERSECT and EXCEPT queries.
abhimadan pushed a commit to abhimadan/cockroach that referenced this issue Feb 14, 2018
Fixes cockroachdb#10432.
Fixes cockroachdb#21661.
Fixes cockroachdb#21706.

Release note (performance improvement): Support distributed execution of
INTERSECT and EXCEPT queries.
abhimadan pushed a commit to abhimadan/cockroach that referenced this issue Feb 15, 2018
Fixes cockroachdb#10432.
Fixes cockroachdb#21661.
Fixes cockroachdb#21706.

Release note (performance improvement): Support distributed execution of
INTERSECT and EXCEPT queries.