Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distsqlrun: lookup join efficiency improvements #39043

Merged
merged 4 commits into from
Jul 29, 2019

Conversation

jordanlewis
Copy link
Member

@jordanlewis jordanlewis commented Jul 23, 2019

  • distsqlrun: don't save lookup rows twice
  • distsqlrun: stream lookup join output
  • distsql: remove unused field from lookup join spec

Previously, lookup join buffered its rendered output rows before
emitting any of them, because this used to be a requirement when lookup
join was also responsible for doing a second layer of lookups against an
index. This was no longer necessary.

Now, after the result of a lookup batch is accumulated into memory, rows
are rendered and emitted in a row-at-a-time streaming fashion.

Also, remove the unused extra index filter expression field from the lookup join spec.

@jordanlewis jordanlewis requested review from a team July 23, 2019 02:40
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Previously, the joinReader saved each looked up row into an intermediate
buffer before rendering them for emission. This wasn't necessary - the
render step could be done on-demand, saving allocations and buffering.

Release note: None
@jordanlewis jordanlewis requested a review from a team July 25, 2019 19:34
@jordanlewis jordanlewis changed the title distsqlrun: don't save lookup rows twice distsqlrun: lookup join efficiency improvements Jul 25, 2019
Previously, lookup join buffered its rendered output rows before
emitting any of them, because this used to be a requirement when lookup
join was also responsible for doing a second layer of lookups against an
index. This was no longer necessary.

Now, after the result of a lookup batch is accumulated into memory, rows
are rendered and emitted in a row-at-a-time streaming fashion.
The lookup join spec and implementation used to have an extra filter
expression that was used when lookup join could also perform a second
index join style lookup. This was no longer used - delete it and bump
the min accepted distsql version (which has already been bumped for
19.2).

Release note: None
@jordanlewis
Copy link
Member Author

PTAL, thanks!

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, no nits from me :) LGTM.

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 7 of 7 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @solongordon)

Copy link
Contributor

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff. It's nice to see the lookup joiner becoming less of a beast.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/distsqlrun/joinreader.go, line 473 at r3 (raw file):

	*distsqlpb.ProducerMetadata,
) {
	for {

Minor: I think this would read better if rather than wrapping the whole thing in a for loop, you just returned jrEmittingRows, nil, nil whenever there is no output and let the state machine do its work.

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/distsqlrun/joinreader.go, line 473 at r3 (raw file):

Previously, solongordon (Solon) wrote…

Minor: I think this would read better if rather than wrapping the whole thing in a for loop, you just returned jrEmittingRows, nil, nil whenever there is no output and let the state machine do its work.

Done.

craig bot pushed a commit that referenced this pull request Jul 29, 2019
39023: pkg: Add support for user specified target cols in IMPORT INTO r=adityamaru27 a=adityamaru27

This change adds the ability for users to specify a subset of the columns of an existing table to import into, from a CSV data source. 
(Part of the larger roadmap #26834)

39043: distsqlrun: lookup join efficiency improvements r=jordanlewis a=jordanlewis

- distsqlrun: don't save lookup rows twice
- distsqlrun: stream lookup join output
- distsql: remove unused field from lookup join spec

Previously, lookup join buffered its rendered output rows before
emitting any of them, because this used to be a requirement when lookup
join was also responsible for doing a second layer of lookups against an
index. This was no longer necessary.

Now, after the result of a lookup batch is accumulated into memory, rows
are rendered and emitted in a row-at-a-time streaming fashion.

Also, remove the unused extra index filter expression field from the lookup join spec.

39118: opt: add IndexOrdinal alias r=RaduBerinde a=RaduBerinde

Adding a `cat.IndexOrdinal` alias to make it more explicit when a
value is an index ordinal. It is only an alias because a separate type
would make things like loops more annoying.

Release note: None

39146: sql/tree: store From as value in SelectClause AST node r=nvanbenschoten a=nvanbenschoten

Drive by cleanup. The value is assumed to be non-nil in a number of
places, so we should avoid the unnecessary indirection.

Release note: None

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Jordan Lewis <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jul 29, 2019

Build succeeded

@craig craig bot merged commit ba6d1a6 into cockroachdb:master Jul 29, 2019
@jordanlewis jordanlewis deleted the lookupjoin-efc branch July 30, 2019 03:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants