-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colexec: implement vectorized index join #67450
Conversation
There are a few things/questions I would like to draw attention to:
|
I ran the tpch workload for each query that has an index join vs master; here are the results:
|
c2a6e2a
to
54c86e5
Compare
Wow @DrewKimball! The results speak for themselves, the speedup far surpasses my expectation. Excellent work. |
@jordanlewis thanks! I should mention that most of the gains come from using larger batches of spans (default 4MB for the underlying key bytes). If someone more knowledgable about the KV fetcher code than I am points out a problem with that, we may have to decrease the limit. Though, I found pretty nice gains with just 1MB as well. |
6d58e0d
to
dae3f86
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work and great speedup!
nit: I think "Addresses" doesn't automatically close the linked issue.
Reviewed 11 of 16 files at r1, 3 of 8 files at r2, 4 of 6 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)
pkg/col/coldata/bytes.go, line 413 at r4 (raw file):
} // ResetForAppend is similar to Reset, but is also resets the offsets slice so
nit: s/is also/it also/
.
pkg/sql/colexec/colbuilder/execplan.go, line 803 at r4 (raw file):
} if core.JoinReader.LookupColumns != nil || !core.JoinReader.LookupExpr.Empty() { return r, errors.Newf("lookup join reader is unsupported in vectorized")
nit: this should be an assertion failure.
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 14 at r4 (raw file):
// +build execgen_template // // This file is the execgen template for span_encoder.eg.go. It's formatted in a
nit: s/span_encoder/span_assembler/
.
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 55 at r4 (raw file):
// Add span encoders to encode each primary key column as bytes. The // ColSpanAssembler will later append these together to form valid spans. var spanEncoders []spanEncoder
nit: we could allocate the slice of the correct capacity upfront.
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 63 at r4 (raw file):
b := spanAssemblerPool.Get().(*spanAssemblerBase) base := spanAssemblerBase{
nit: why not update b
in-place while keeping the reference to the span slice?
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 68 at r4 (raw file):
keyPrefix: rowenc.MakeIndexKeyPrefix(codec, table, index.GetID()), spanEncoders: spanEncoders, spanCols: make([]*coldata.Bytes, len(spanEncoders)),
This slice could also be kept when putting the object back to the pool (only need to nil out all elements first).
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 87 at r4 (raw file):
// ColSpanAssembler is a utility operator that generates a series of spans from // input batches which can be used to perform an index join or lookup join.
super nit: maybe not mention lookup join since only index joins are now supported and lookup joins seem hard?
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 175 at r4 (raw file):
op.spans = op.spans[:0] } op.shouldReset = false
super nit: this could be moved into the if
above.
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 189 at r4 (raw file):
for i := 0; i < n; i++ { // Every key has a prefix encoding the table, index, etc. op.scratchKey = append(op.scratchKey[:0], op.keyPrefix...)
Do we modify scratchKey
somewhere? If not, we could copy the key prefix only once, outside of the loop, and then slice the scratch up correctly here.
pkg/sql/colexec/execgen/cmd/execgen/span_encoder_gen.go, line 38 at r4 (raw file):
s = assignAddRe.ReplaceAllString(s, makeTemplateFunctionCall("AssignSpanEncoding", 2)) s = replaceManipulationFuncs(s)
nit: I think we don't need this.
pkg/sql/colfetcher/BUILD.bazel, line 25 at r4 (raw file):
"//pkg/sql/colconv", "//pkg/sql/colencoding", "//pkg/sql/colexec/colexecjoin",
I think the dependency of colfetcher
on colexecjoin
is a bit unfortunate since the join package contains lots of generated code. I think it might be worth creating a separate package for the columnar span stuff.
pkg/sql/colfetcher/index_join.go, line 54 at r4 (raw file):
mu struct { syncutil.Mutex // rowsRead contains the number of total rows this ColBatchScan has
nit: s/ColBatchScan/ColIndexJoin/
.
pkg/sql/colfetcher/index_join.go, line 74 at r4 (raw file):
var _ colexecop.KVReader = &ColIndexJoin{} var _ execinfra.Releasable = &ColIndexJoin{} var _ colexecop.Closer = &ColIndexJoin{}
super nit: these two lines could be squashed into one with ClosableOperator
.
pkg/sql/colfetcher/index_join.go, line 121 at r4 (raw file):
if !s.maintainOrdering { // Sort the spans for the following cases: // - For lookupJoinReaderType: this is so that we can rely upon the
nit: needs an adjustment.
pkg/sql/colfetcher/index_join.go, line 134 at r4 (raw file):
// Handle metadata for these spans. if s.nodeID != 0 {
In ColBatchScan we also check that the flow is not local before retrieving the misplanned ranges, should we do the same here?
pkg/sql/colfetcher/index_join.go, line 266 at r4 (raw file):
// Before we can safely use types from the table descriptor, we need to // make sure they are hydrated. In row execution engine it is done during // the processor initialization, but neither ColBatchScan nor cFetcher are
nit: s/ColBatchScan/ColIndexJoin/
.
pkg/sql/colfetcher/index_join.go, line 279 at r4 (raw file):
index := table.ActiveIndexes()[indexIdx] proc := &execinfra.ProcOutputHelper{}
Not sure if it's worth it, but in case post.OutputColumns
is set (which is the case when we don't need to perform any rendering on top of the index join), we can avoid the allocation of the helper since OutputColumns
will contain all needed columns.
pkg/sql/colfetcher/index_join.go, line 298 at r4 (raw file):
fetcher.estimatedRowCount = 0 if err := fetcher.Init( flowCtx.Codec(), allocator, execinfra.GetWorkMemLimit(flowCtx), false, /* false */
nit: suspicious comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- For the spans slice in the SpanAssembler operator, I am registering each increase in the slice's capacity with the allocator. I'm not sure whether this is the correct thing to do, considering the slice is retrieved from and returned to a pool.
- I am registering the memory allocated for the span keys (the underlying byte slices). However, the span generation operators do not own this memory, since the kv span fetching code keeps references to the underlying bytes. So, hypothetically a very large index join could cause the operator to hit the memory limit no matter how the input is batched. Maybe the SpanAssembler operator should release that memory once the spans slice is reset, even if it can't actually be garbage collected yet?
I guess these questions might be best answered by Becca given her recent work on the memory accounting in the join reader. As I mentioned earlier, IMO it's ok for this PR to not concern itself with the memory accounting since this is what we have on master right now.
- EncodeTableKey encodes Bytes and String types using EncodeStringAscending instead of EncodeBytesAscending. As far as I can tell, these do the same thing, except the former involves a few casts. Is it alright to just use EncodeBytesAscending and EncodeBytesDescending for all types that are stored in the vectorized engine as bytes?
Hm, my reading of the code is the same, so it seems reasonable to me.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/col/coldata/bytes.go, line 413 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/is also/it also/
.
Done.
pkg/sql/colexec/colbuilder/execplan.go, line 803 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: this should be an assertion failure.
Done.
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 14 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/span_encoder/span_assembler/
.
Done.
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 55 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: we could allocate the slice of the correct capacity upfront.
Done.
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 63 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: why not update
b
in-place while keeping the reference to the span slice?
Done.
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 68 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This slice could also be kept when putting the object back to the pool (only need to nil out all elements first).
Done. May as well do the same with the spanEncoders slice, I guess.
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 87 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
super nit: maybe not mention lookup join since only index joins are now supported and lookup joins seem hard?
Done.
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 175 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
super nit: this could be moved into the
if
above.
Done.
pkg/sql/colexec/colexecjoin/span_assembler_tmpl.go, line 189 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Do we modify
scratchKey
somewhere? If not, we could copy the key prefix only once, outside of the loop, and then slice the scratch up correctly here.
Done.
pkg/sql/colexec/execgen/cmd/execgen/span_encoder_gen.go, line 38 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I think we don't need this.
Forgot to remove that after getting execgen:inline
to work.
pkg/sql/colfetcher/BUILD.bazel, line 25 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think the dependency of
colfetcher
oncolexecjoin
is a bit unfortunate since the join package contains lots of generated code. I think it might be worth creating a separate package for the columnar span stuff.
Done.
pkg/sql/colfetcher/index_join.go, line 54 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/ColBatchScan/ColIndexJoin/
.
Done.
pkg/sql/colfetcher/index_join.go, line 74 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
super nit: these two lines could be squashed into one with
ClosableOperator
.
Done.
pkg/sql/colfetcher/index_join.go, line 121 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: needs an adjustment.
Done.
pkg/sql/colfetcher/index_join.go, line 134 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
In ColBatchScan we also check that the flow is not local before retrieving the misplanned ranges, should we do the same here?
Oh, yes we should. Done.
pkg/sql/colfetcher/index_join.go, line 266 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/ColBatchScan/ColIndexJoin/
.
Done.
pkg/sql/colfetcher/index_join.go, line 279 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Not sure if it's worth it, but in case
post.OutputColumns
is set (which is the case when we don't need to perform any rendering on top of the index join), we can avoid the allocation of the helper sinceOutputColumns
will contain all needed columns.
Its easy enough, so we may as well.
pkg/sql/colfetcher/index_join.go, line 298 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: suspicious comment.
Oops... fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 19 of 19 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)
pkg/sql/colexec/colbuilder/execplan.go, line 182 at r5 (raw file):
case spec.Core.JoinReader != nil: if spec.Core.JoinReader.LookupColumns != nil || !spec.Core.JoinReader.LookupExpr.Empty() { return errors.Newf("lookup join reader is unsupported in vectorized")
nit: I think it is worth creating global variables for these two errors (I have seen profiles where errors.Newf
calls were non-trivial in this file).
pkg/sql/colexec/colexecspan/dep_test.go, line 21 at r5 (raw file):
func TestNoLinkForbidden(t *testing.T) { buildutil.VerifyNoImports(t, "github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecwindow", true,
s/colexecwindow/colexecspan/
.
pkg/sql/colexec/colexecspan/main_test.go, line 33 at r5 (raw file):
) var (
nit: looks like you instantiate most of these objects here and in TestSpanAssembler
which seems redundant. Probably this file could be trimmed down (maybe only batch size randomization should be kept here).
pkg/sql/colfetcher/index_join.go, line 87 at r5 (raw file):
// cFetcher. Note that ProcessorSpan method itself will check whether // tracing is enabled. s.Ctx, s.tracingSpan = execinfra.ProcessorSpan(s.Ctx, "colindexjoin")
nit: I think the trace should be started before initializing the input.
pkg/sql/colfetcher/index_join.go, line 131 at r5 (raw file):
if !s.flowCtx.Local && s.nodeID != 0 { s.misplannedRanges = append(s.misplannedRanges, execinfra.MisplannedRanges(s.Ctx, spans, s.nodeID, s.flowCtx.Cfg.RangeCache)...)
Hm, I wonder whether MisplannedRanges
is a noop before we actually perform and finish the scan. I have a feeling that it is, and if that's true, this call is not useful. I think I would delete this altogether and leave a TODO (given that in the join reader we don't collect the misplanned info either).
Now that I've typed this out, I think that it actually makes sense that we should not try to collect misplanned ranges metadata in case of the index/lookup joins because their placement is determined by the physical placement of the table readers which are the inputs, so probably even a TODO is not needed.
pkg/sql/colfetcher/index_join.go, line 288 at r5 (raw file):
} tableArgs := row.FetcherTableArgs{
nit: do you think it's worth refactoring initCRowFetcher
a bit and reusing it here?
pkg/sql/colfetcher/index_join.go, line 299 at r5 (raw file):
fetcher := cFetcherPool.Get().(*cFetcher) fetcher.estimatedRowCount = 0
It might be worth plumbing through the estimated row count here since the index join is 1-to-1 from its input scan, and we might have a good estimate for that scan.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 56 at r5 (raw file):
base.colFamStartKeys, base.colFamEndKeys = getColFamilyEncodings(neededCols, table, index) base.spanEncoders = base.spanEncoders[:0] base.spanCols = base.spanCols
nit: this line seems redundant.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 136 at r5 (raw file):
// maxSpansLength tracks the largest length the spans field reaches, so that // all the Span objects can be deeply reset upon a call to close. maxSpansLength int
I wonder whether it'll be faster and maybe cleaner to not track this field, but in Release
method slice up spans
up to its capacity, nil out all elements, and then slice it up to 0 to put it back into the pool. Thoughts?
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 204 at r5 (raw file):
// in the latter, the kv fetcher logic maintains references for an unknown // amount of time. The row engine doesn't account for these either so it isn't // be a regression, but we really should find a good way to account for the
nit: s/be a/a/
.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 219 at r5 (raw file):
return nil } b.shouldReset = true
Similarly, I wonder whether we should get rid off shouldReset
with something like
func (b *spanAssemblerBase) GetSpans() roachpb.Spans {
spans := b.spans
b.spans = b.spans[:0]
return spans
}
It slightly modifies the contact of GetSpans
in that it now would return zero length slice and not nil
if ConsumeBatch
hasn't been called, but it seems ok to me.
pkg/sql/colexec/colexecspan/span_encoder_tmpl.go, line 88 at r5 (raw file):
type spanEncoder interface { // next generates the encoding for the current key column for each row int the
nit: s/int/in/
.
pkg/sql/colexec/colexecspan/span_encoder_tmpl.go, line 128 at r5 (raw file):
} if op.outputBytes == nil { op.outputBytes = coldata.NewBytes(n)
This allocation seems to be unaccounted for which is undesirable I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some discussion with @yuzefovich and @rytaft, I've decided on the following approach to memory accounting for the span slice and key bytes: while the SpanAssembler
maintains references to them, we register the memory usage in the allocator. Once the SpanAssembler
operator loses the references, we release the memory even though it may still be referenced elsewhere (the spans slice goes to a pool, while the key bytes are referenced by the kv fetcher code).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/colexec/colbuilder/execplan.go, line 182 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I think it is worth creating global variables for these two errors (I have seen profiles where
errors.Newf
calls were non-trivial in this file).
Done. Should this be done for the other errors, or do those seem uncommon enough to not matter much?
pkg/sql/colexec/colexecspan/dep_test.go, line 21 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
s/colexecwindow/colexecspan/
.
Done.
pkg/sql/colexec/colexecspan/main_test.go, line 33 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: looks like you instantiate most of these objects here and in
TestSpanAssembler
which seems redundant. Probably this file could be trimmed down (maybe only batch size randomization should be kept here).
Done.
pkg/sql/colfetcher/index_join.go, line 87 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I think the trace should be started before initializing the input.
Done.
pkg/sql/colfetcher/index_join.go, line 131 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Hm, I wonder whether
MisplannedRanges
is a noop before we actually perform and finish the scan. I have a feeling that it is, and if that's true, this call is not useful. I think I would delete this altogether and leave a TODO (given that in the join reader we don't collect the misplanned info either).Now that I've typed this out, I think that it actually makes sense that we should not try to collect misplanned ranges metadata in case of the index/lookup joins because their placement is determined by the physical placement of the table readers which are the inputs, so probably even a TODO is not needed.
Done.
pkg/sql/colfetcher/index_join.go, line 288 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: do you think it's worth refactoring
initCRowFetcher
a bit and reusing it here?
Sure, why not.
pkg/sql/colfetcher/index_join.go, line 299 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
It might be worth plumbing through the estimated row count here since the index join is 1-to-1 from its input scan, and we might have a good estimate for that scan.
Good idea. We can do even better - we can use the number of input rows for each span batch, and set it every time StartScan
is called.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 56 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: this line seems redundant.
Done.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 136 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I wonder whether it'll be faster and maybe cleaner to not track this field, but in
Release
method slice upspans
up to its capacity, nil out all elements, and then slice it up to 0 to put it back into the pool. Thoughts?
That sounds like a good idea.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 204 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/be a/a/
.
Done.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 219 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Similarly, I wonder whether we should get rid off
shouldReset
with something likefunc (b *spanAssemblerBase) GetSpans() roachpb.Spans { spans := b.spans b.spans = b.spans[:0] return spans }
It slightly modifies the contact of
GetSpans
in that it now would return zero length slice and notnil
ifConsumeBatch
hasn't been called, but it seems ok to me.
I like it. Done.
pkg/sql/colexec/colexecspan/span_encoder_tmpl.go, line 88 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit:
s/int/in/
.
Done.
pkg/sql/colexec/colexecspan/span_encoder_tmpl.go, line 128 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This allocation seems to be unaccounted for which is undesirable I think.
Definitely an oversight. Fixed it.
0cec42f
to
ef3b8e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! but might be worth for someone (maybe @jordanlewis) also taking a look at span_assembler_tmpl
, especially towards the bottom of the file, and at the unit test for it too.
Reviewed 10 of 15 files at r6, 7 of 8 files at r7, 1 of 1 files at r10.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)
pkg/sql/colexec/colbuilder/execplan.go, line 182 at r5 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Done. Should this be done for the other errors, or do those seem uncommon enough to not matter much?
Yeah, I think those aren't common enough and occur for the queries that are more heavy-weight (in which case an extra allocation isn't a big deal), but feel free to extract all errors from supportedNatively
.
pkg/sql/colexec/colbuilder/execplan.go, line 276 at r10 (raw file):
errExperimentalWrappingProhibited = errors.New("wrapping for non-JoinReader and non-LocalPlanNode cores is prohibited in vectorize=experimental_always") errWrappedCast = errors.New("mismatched types in NewColOperator and unsupported casts") errLookupJoinUnsupported = errors.Newf("lookup join reader is unsupported in vectorized")
super nit: could use just New
not Newf
.
pkg/sql/colfetcher/index_join.go, line 299 at r5 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Good idea. We can do even better - we can use the number of input rows for each span batch, and set it every time
StartScan
is called.
Nice!
pkg/sql/colfetcher/index_join.go, line 82 at r10 (raw file):
// tracing is enabled. s.Ctx, s.tracingSpan = execinfra.ProcessorSpan(s.Ctx, "colindexjoin") s.Input.Init(ctx)
Here we should use s.Ctx
because in case the tracing was started, a new context was derived (and ctx
points to the old one).
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 111 at r10 (raw file):
type spanAssemblerBase struct { allocator *colmem.Allocator keyBytes int
nit: quick comment for keyBytes
would be helpful.
I guess one idea on how to increase the test coverage is to introduce another "test against processor" test for the index join in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a stab at it in a bit and see if it seems worth the trouble to add some "against processor" testing.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @michae2, and @yuzefovich)
pkg/sql/colexec/colbuilder/execplan.go, line 276 at r10 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
super nit: could use just
New
notNewf
.
Done.
pkg/sql/colfetcher/index_join.go, line 82 at r10 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Here we should use
s.Ctx
because in case the tracing was started, a new context was derived (andctx
points to the old one).
Done.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 111 at r10 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: quick comment for
keyBytes
would be helpful.
Done.
All right, I've made the batch-size limiting more like the row engine. The number of rows from which to generate spans is now determined by an estimate of the memory footprint of those rows. The memory size of the lookup batch is calculated by adding up the memory used by the underlying data for the row (e.g. 8 bytes for int64, etc.) as well as some extra to account for overhead due to EncDatum structs. While the vectorized engine doesn't actually use EncDatums, we add the overhead anyway to achieve parity with the row engine. One difference is that we cannot include the EncDatum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jordanlewis is out for a week, and I think both him and myself would like for the vectorized index join input batches be as close as possible to the join reader, so introducing a multiplier sounds good to me. I agree that the encoded
field of an EncDatum shouldn't be larger that the Datum
field in size, so maybe use 1.25 multiplier as a somewhat arbitrary average?
Reviewed 16 of 16 files at r13.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @erikgrinaker, @jordanlewis, and @michae2)
pkg/sql/colexec/colbuilder/execplan.go, line 826 at r11 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
This setup code looks identical to that in the
TableReader
initialization section. Maybe we could introduce a helper function? I expect we'll also do this again if/when we support vectorized for general lookups joins, zigzag joins, etc.
+1
pkg/sql/colfetcher/cfetcher.go, line 1596 at r13 (raw file):
func (rf *cFetcher) Release() { rf.accountingHelper.Close() rf.table.Release()
nit: maybe rename AccountingHelper.Close
to Release
?
pkg/sql/colfetcher/index_join.go, line 220 at r13 (raw file):
return s.startIdx } for endIdx = s.startIdx; endIdx < n; endIdx++ {
I'm thinking that maybe we should have a quick check whether including the whole batch will put us over the limit or not, using GetBatchMemSize
(the fact that it includes the selection vector is probably negligible). Thoughts?
What's different here versus the cFetcher is that here we already have the full batch in hands, so we can know exactly the footprint of the batch whereas in the cFetcher we're still building out the batch, one row at a time.
pkg/sql/colfetcher/index_join.go, line 250 at r13 (raw file):
switch vec.CanonicalTypeFamily() { case types.DecimalFamily: rowSize += int64(tree.SizeOfDecimal(&vec.Decimal()[idx]))
I'd be curious to see whether these interface conversions (i.e. calling Decimal()
or Bytes()
) show up non-negligibly in the CPU profiles. I'm thinking that we might have to perform that conversion one time, once we receive the new input batch and only if we see that using the whole batch will put us over the limit.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 90 at r13 (raw file):
// ConsumeBatch generates lookup spans from input batches and stores them to // later be returned by GetSpans. Spans are generated only for rows in the // range [startIdx, endIdx). If given indices such that startIdx >= endIdx,
super nit: sounds a bit unusual, maybe "If given indices are such that ..."?
7de64d4
to
f12c46b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up using 2.0 as the per-value multiplier
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @erikgrinaker, @jordanlewis, @michae2, and @yuzefovich)
pkg/sql/colexec/colbuilder/execplan.go, line 826 at r11 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
+1
Done.
pkg/sql/colfetcher/cfetcher.go, line 1596 at r13 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: maybe rename
AccountingHelper.Close
toRelease
?
Done.
pkg/sql/colfetcher/index_join.go, line 220 at r13 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I'm thinking that maybe we should have a quick check whether including the whole batch will put us over the limit or not, using
GetBatchMemSize
(the fact that it includes the selection vector is probably negligible). Thoughts?What's different here versus the cFetcher is that here we already have the full batch in hands, so we can know exactly the footprint of the batch whereas in the cFetcher we're still building out the batch, one row at a time.
That's a nice idea, though GetBatchMemSize
won't work because it will produce a much smaller number than the row engine would on the same data. I'll write a helper method to handle it.
pkg/sql/colfetcher/index_join.go, line 250 at r13 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I'd be curious to see whether these interface conversions (i.e. calling
Decimal()
orBytes()
) show up non-negligibly in the CPU profiles. I'm thinking that we might have to perform that conversion one time, once we receive the new input batch and only if we see that using the whole batch will put us over the limit.
Added some fields to the mem
struct so the conversions can be performed up-front. I've been hard-pressed to find a query to bench with that doesn't spend most of its time in KV, but I think this is worth optimizing anyway because it's called per-row.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 182 at r11 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Things might get trickier for lookup joins, since we won't necessarily know up front whether rows can be split into column families (null values disallow it)
I'm going to stick with the current algorithm for now, because (a) I think it's a bit simpler and (b) it's more obvious how to extend it to more complicated span generation for lookup joins.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 90 at r13 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
super nit: sounds a bit unusual, maybe "If given indices are such that ..."?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r14, 12 of 13 files at r15.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @erikgrinaker, @jordanlewis, @michae2, and @yuzefovich)
pkg/sql/colfetcher/index_join.go, line 220 at r13 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
That's a nice idea, though
GetBatchMemSize
won't work because it will produce a much smaller number than the row engine would on the same data. I'll write a helper method to handle it.
I think we could add some stuff on top of GetBatchMemSize
to be essentially what we have in getBatchSize
right now.
If I'm reading the code correctly, getBatchSize = GetBatchMemSize() * memEstimateMultiplier + l * w * memEstimateAdditive
where l = batch.Length()
and w = batch.Width()
. Maybe also we need to add EncDatumRowOverhead * l
.
f12c46b
to
2158334
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @erikgrinaker, @jordanlewis, @michae2, and @yuzefovich)
pkg/sql/colfetcher/index_join.go, line 220 at r13 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think we could add some stuff on top of
GetBatchMemSize
to be essentially what we have ingetBatchSize
right now.If I'm reading the code correctly,
getBatchSize = GetBatchMemSize() * memEstimateMultiplier + l * w * memEstimateAdditive
wherel = batch.Length()
andw = batch.Width()
. Maybe also we need to addEncDatumRowOverhead * l
.
Yeah, I guess it doesn't need to be exact, just good enough. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be good to get the performance numbers on the relevant TPCH queries as well as a sanity check on the number of "input batches" from the perspective of the index join both in the row-by-row and vectorized engine (to make sure that our estimates are reasonable).
Reviewed 4 of 4 files at r16.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @erikgrinaker, @jordanlewis, and @michae2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Batch sizes look good: https://gist.github.com/DrewKimball/e8d523d32671e9fa4e16777b471c110a
TPCH perf:
Master:
drewkimball@drews-mbp cockroach % bin/workload run tpch --queries '4,5,6,10,12,14,15,20' --max-ops 8 --concurrency 1
I210811 04:11:22.719317 1 workload/cli/run.go:404 [-] 1 creating load generator...
I210811 04:11:22.719452 1 workload/cli/run.go:435 [-] 2 creating load generator... done (took 136µs)
I210811 04:11:25.526325 82 workload/tpch/tpch.go:480 [-] 3 [q4] returned 5 rows after 2.81 seconds
I210811 04:11:30.293969 82 workload/tpch/tpch.go:480 [-] 4 [q5] returned 5 rows after 4.77 seconds
I210811 04:11:40.634066 82 workload/tpch/tpch.go:480 [-] 5 [q6] returned 1 rows after 10.34 seconds
I210811 04:11:44.439470 82 workload/tpch/tpch.go:480 [-] 6 [q10] returned 20 rows after 3.81 seconds
I210811 04:11:56.159936 82 workload/tpch/tpch.go:480 [-] 7 [q12] returned 2 rows after 11.72 seconds
I210811 04:11:57.293476 82 workload/tpch/tpch.go:480 [-] 8 [q14] returned 1 rows after 1.13 seconds
I210811 04:12:03.696872 82 workload/tpch/tpch.go:480 [-] 9 [q15] returned 1 rows after 6.40 seconds
I210811 04:12:21.027038 82 workload/tpch/tpch.go:480 [-] 10 [q20] returned 186 rows after 17.33 seconds
Branch:
drewkimball@drews-mbp cockroach % bin/workload run tpch --queries '4,5,6,10,12,14,15,20' --max-ops 8 --concurrency 1
I210811 04:14:30.642055 1 workload/cli/run.go:404 [-] 1 creating load generator...
I210811 04:14:30.642258 1 workload/cli/run.go:435 [-] 2 creating load generator... done (took 206µs)
I210811 04:14:33.409946 36 workload/tpch/tpch.go:480 [-] 3 [q4] returned 5 rows after 2.77 seconds
I210811 04:14:37.602520 36 workload/tpch/tpch.go:480 [-] 4 [q5] returned 5 rows after 4.19 seconds
I210811 04:14:46.229121 36 workload/tpch/tpch.go:480 [-] 5 [q6] returned 1 rows after 8.63 seconds
I210811 04:14:50.058010 36 workload/tpch/tpch.go:480 [-] 6 [q10] returned 20 rows after 3.83 seconds
I210811 04:15:00.080270 36 workload/tpch/tpch.go:480 [-] 7 [q12] returned 2 rows after 10.02 seconds
I210811 04:15:01.063998 36 workload/tpch/tpch.go:480 [-] 8 [q14] returned 1 rows after 0.98 seconds
I210811 04:15:06.490356 36 workload/tpch/tpch.go:480 [-] 9 [q15] returned 1 rows after 5.43 seconds
I210811 04:15:22.838457 36 workload/tpch/tpch.go:480 [-] 10 [q20] returned 186 rows after 16.35 seconds
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @erikgrinaker, @jordanlewis, and @michae2)
2158334
to
710b947
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the commit message as well as the PR description need an update.
Reviewed 1 of 1 files at r17.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @erikgrinaker, @jordanlewis, and @michae2)
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 40 at r16 (raw file):
// NewColSpanAssembler returns a ColSpanAssembler operator that is able to // generate lookup spans from input batches. The given size limit determines
nit: we've removed the size limit.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 161 at r16 (raw file):
// Every key has a prefix encoding the table, index, etc. op.scratchKey = append(op.scratchKey[:0], op.keyPrefix...)
nit: we could do this once, maybe in Init
or in the constructor, to save up some work (but since this is roughly once per batch, feel free to disregard). We could also use len(op.keyPrefix)
for prefixLen
.
I also kicked off
|
This patch provides a vectorized implementation of the index join operator. Span generation is accomplished using two utility operators. `spanEncoder` operates on a single index key column and fills a `Bytes` column with the encoding of that column for each row. `spanAssembler` takes the output of each `spanEncoder` and generates spans, accounting for table/index prefixes and possibly splitting the spans over column families. Finally, the `ColIndexJoin` operator uses the generated spans to perform a lookup on the table's primary index, returns all batches resulting from the lookup, and repeats until the input is fully consumed. The `ColIndexJoin` operator queues up input rows until the memory footprint of the rows reaches a preset limit (default 4MB for parity with the row engine). This allows the cost of starting a scan to be amortized. Addresses cockroachdb#65905 Release note (sql change): The vectorized execution engine can now perform a scan over an index, and then join on the primary index to retrieve the required columns.
710b947
to
5a6f53b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @erikgrinaker, @jordanlewis, @michae2, and @yuzefovich)
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 40 at r16 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: we've removed the size limit.
Done.
pkg/sql/colexec/colexecspan/span_assembler_tmpl.go, line 161 at r16 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: we could do this once, maybe in
Init
or in the constructor, to save up some work (but since this is roughly once per batch, feel free to disregard). We could also uselen(op.keyPrefix)
forprefixLen
.
Nice catch. I guess we may as well just store the prefix directly in the scratch key. Done.
TFTRs! |
bors r+ |
👎 Rejected by label |
bors r+ |
Build succeeded: |
This patch provides a vectorized implementation of the index join
operator. Span generation is accomplished using two utility operators.
spanEncoder
operates on a single index key column and fills aBytes
column with the encoding of that column for each row.
spanAssembler
takes the output of each
spanEncoder
and generates spans, accountingfor table/index prefixes and possibly splitting the spans over column
families. Finally, the
ColIndexJoin
operator uses the generated spansto perform a lookup on the table's primary index, returns all batches
resulting from the lookup, and repeats until the input is fully consumed.
The
ColIndexJoin
operator queues up input rows until the memory footprintof the rows reaches a preset limit (default 4MB for parity with the row
engine). This allows the cost of starting a scan to be amortized.
Fixes #65905
Release note (sql change): The vectorized execution engine can now
perform a scan over an index, and then join on the primary index to
retrieve the required columns.