-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
distsqlrun: lookup join efficiency improvements #39043
Conversation
a1d5b84
to
800d2fe
Compare
Previously, the joinReader saved each looked up row into an intermediate buffer before rendering them for emission. This wasn't necessary - the render step could be done on-demand, saving allocations and buffering. Release note: None
800d2fe
to
acdb5b6
Compare
acdb5b6
to
1616242
Compare
Previously, lookup join buffered its rendered output rows before emitting any of them, because this used to be a requirement when lookup join was also responsible for doing a second layer of lookups against an index. This was no longer necessary. Now, after the result of a lookup batch is accumulated into memory, rows are rendered and emitted in a row-at-a-time streaming fashion.
1616242
to
c378d0c
Compare
The lookup join spec and implementation used to have an extra filter expression that was used when lookup join could also perform a second index join style lookup. This was no longer used - delete it and bump the min accepted distsql version (which has already been bumped for 19.2). Release note: None
c378d0c
to
aae2891
Compare
PTAL, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, no nits from me :) LGTM.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 7 of 7 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @solongordon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff. It's nice to see the lookup joiner becoming less of a beast.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)
pkg/sql/distsqlrun/joinreader.go, line 473 at r3 (raw file):
*distsqlpb.ProducerMetadata, ) { for {
Minor: I think this would read better if rather than wrapping the whole thing in a for loop, you just returned jrEmittingRows, nil, nil
whenever there is no output and let the state machine do its work.
Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/distsqlrun/joinreader.go, line 473 at r3 (raw file):
Previously, solongordon (Solon) wrote…
Minor: I think this would read better if rather than wrapping the whole thing in a for loop, you just returned
jrEmittingRows, nil, nil
whenever there is no output and let the state machine do its work.
Done.
39023: pkg: Add support for user specified target cols in IMPORT INTO r=adityamaru27 a=adityamaru27 This change adds the ability for users to specify a subset of the columns of an existing table to import into, from a CSV data source. (Part of the larger roadmap #26834) 39043: distsqlrun: lookup join efficiency improvements r=jordanlewis a=jordanlewis - distsqlrun: don't save lookup rows twice - distsqlrun: stream lookup join output - distsql: remove unused field from lookup join spec Previously, lookup join buffered its rendered output rows before emitting any of them, because this used to be a requirement when lookup join was also responsible for doing a second layer of lookups against an index. This was no longer necessary. Now, after the result of a lookup batch is accumulated into memory, rows are rendered and emitted in a row-at-a-time streaming fashion. Also, remove the unused extra index filter expression field from the lookup join spec. 39118: opt: add IndexOrdinal alias r=RaduBerinde a=RaduBerinde Adding a `cat.IndexOrdinal` alias to make it more explicit when a value is an index ordinal. It is only an alias because a separate type would make things like loops more annoying. Release note: None 39146: sql/tree: store From as value in SelectClause AST node r=nvanbenschoten a=nvanbenschoten Drive by cleanup. The value is assumed to be non-nil in a number of places, so we should avoid the unnecessary indirection. Release note: None Co-authored-by: Aditya Maru <[email protected]> Co-authored-by: Jordan Lewis <[email protected]> Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
Build succeeded |
Previously, lookup join buffered its rendered output rows before
emitting any of them, because this used to be a requirement when lookup
join was also responsible for doing a second layer of lookups against an
index. This was no longer necessary.
Now, after the result of a lookup batch is accumulated into memory, rows
are rendered and emitted in a row-at-a-time streaming fashion.
Also, remove the unused extra index filter expression field from the lookup join spec.