Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distsql: implement distributed DISTINCT #10034

Merged
merged 1 commit into from
Oct 18, 2016

Conversation

irfansharif
Copy link
Contributor

@irfansharif irfansharif commented Oct 17, 2016

Implementation of the Distinct processor core type, pretty trivial. Can't think of a verb form for distinct, distinguisher seems off but no strong opinions here. Sent this PR out now for early review, now writing out the optimizations for pre-existing ordering in the input rows taking advantage of the fact that the seen set can be reset whenever we find consecutive rows differing on a prefix comprised of ordered columns.

Part of #7587.
cc @RaduBerinde, @andreimatei


This change is Reviewable

@RaduBerinde
Copy link
Member

:lgtm:


Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


pkg/sql/distsql/distinct.go, line 125 at r1 (raw file):

  }

  return sqlbase.EncodeDTuple(b, d.tuple)

EncDatum has an Encode function. Using that instead of going to a Tuple and EncodeDTuple can be more efficient - the desired encoding matches the one we already have, we can avoid decoding and reencoding.


Comments from Reviewable

@irfansharif irfansharif force-pushed the dist-distinct branch 3 times, most recently from 150001f to 36817a2 Compare October 18, 2016 17:17
Implementation of the `Distinct` processor core type.
@irfansharif
Copy link
Contributor Author

TFTR, pulling it in now.


Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/sql/distsql/distinct.go, line 125 at r1 (raw file):

Previously, RaduBerinde wrote…

EncDatum has an Encode function. Using that instead of going to a Tuple and EncodeDTuple can be more efficient - the desired encoding matches the one we already have, we can avoid decoding and reencoding.

Done.

Comments from Reviewable

@irfansharif irfansharif changed the title implement distributed DISTINCT distsql: implement distributed DISTINCT Oct 18, 2016
@irfansharif irfansharif merged commit b7d11b9 into cockroachdb:master Oct 18, 2016
@RaduBerinde
Copy link
Member

pkg/sql/distsql/distinct.go, line 118 at r2 (raw file):

      enc, ok := row[col].Encoding()
      if !ok {
          return nil, errors.Errorf("encoding not available for %s", row[col])

We need to fix an encoding for each column and always use that. Different rows may come with different encodings, e.g. if they come from different streams that were merged. Then the encodings won't match. Also, this no-encoding case is legitimate - e.g. a new result from an Evaluator will have a Datum but no encoding.

It's ok for now to just always choose DatumEncoding_VALUE with a TODO to maybe check the first row for what encodings are already available.


Comments from Reviewable

@irfansharif
Copy link
Contributor Author

Review status: 0 of 7 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


pkg/sql/distsql/distinct.go, line 118 at r2 (raw file):

Previously, RaduBerinde wrote…

We need to fix an encoding for each column and always use that. Different rows may come with different encodings, e.g. if they come from different streams that were merged. Then the encodings won't match. Also, this no-encoding case is legitimate - e.g. a new result from an Evaluator will have a Datum but no encoding.

It's ok for now to just always choose DatumEncoding_VALUE with a TODO to maybe check the first row for what encodings are already available.

will update this with the partial ordering optimization PR.

Comments from Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants