opt: speed up execbuilder phase #119095

Merged (6 commits) on Feb 27, 2024

mgartner
Collaborator

@mgartner mgartner commented Feb 12, 2024

opt/bench: add benchmark for execbuilder

This commit adds a benchmark that measures only the execbuilder phase of
optimization, and includes no other phases.

Release note: None
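
A rough sketch of how a single phase can be timed in isolation with Go's `testing` package. This is not the benchmark added in this commit: `prepare` and `buildPhase` are hypothetical stand-ins for the earlier phases and for the phase under test.

```go
package main

import (
	"fmt"
	"testing"
)

// prepare is a hypothetical stand-in for the phases that run before the one
// being measured (parsing, optimization, and so on).
func prepare(n int) []int {
	cols := make([]int, n)
	for i := range cols {
		cols[i] = i + 1
	}
	return cols
}

// buildPhase is a hypothetical stand-in for the single phase under test.
func buildPhase(cols []int) int {
	sum := 0
	for ord, col := range cols {
		sum += col * ord
	}
	return sum
}

// sink keeps the compiler from optimizing the measured call away.
var sink int

func benchBuildOnly(b *testing.B) {
	cols := prepare(1000) // setup runs once, outside the timed region
	b.ReportAllocs()
	b.ResetTimer() // discard the time and allocations spent in setup
	for i := 0; i < b.N; i++ {
		sink = buildPhase(cols)
	}
}

func main() {
	res := testing.Benchmark(benchBuildOnly)
	fmt.Println(res.N > 0, res.AllocsPerOp())
}
```

Calling `b.ResetTimer()` after setup is what keeps the other phases out of the reported time/op and alloc/op numbers.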

opt/execbuilder: remove column map from execPlan

As the execbuilder traverses a relational expression and recursively
builds an execPlan, it creates mappings from column IDs to their
ordinal position in the expression for each execPlan node. These
mappings are used when building parent nodes to correctly map column IDs
to indexed variables. In most cases the mappings are only used when
building a parent, and never again.

Prior to this commit, the column mappings were a field of execPlan,
tying the lifetime of execPlan nodes and column mappings together.
This commit decouples the lifetimes of both by removing the mapping
field from execPlan and propagating mappings up as return values of
recursive function calls. This will enable future optimizations that can
reuse memory allocated for mappings that are no longer needed.

Release note: None
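
A minimal sketch of the idea, assuming simplified types: `expr`, `plan`, and `build` are hypothetical and much simpler than the real execbuilder code. The point is only that the mapping is a return value, not a field of the plan node.

```go
package main

import "fmt"

// expr is a hypothetical stand-in for a relational expression: it outputs
// some column IDs and has at most one input.
type expr struct {
	name  string
	cols  []int
	input *expr
}

// plan is a hypothetical stand-in for execPlan. It carries no column map,
// so a map's lifetime is independent of the plan node's lifetime.
type plan struct {
	desc  string
	input *plan
}

// build returns the plan and, separately, the mapping from column ID to
// output ordinal. The child's mapping is consulted only while building the
// parent and is not retained afterwards.
func build(e *expr) (*plan, map[int]int) {
	var in *plan
	var inCols map[int]int
	if e.input != nil {
		in, inCols = build(e.input)
	}
	out := make(map[int]int, len(e.cols))
	desc := e.name
	for ord, col := range e.cols {
		out[col] = ord
		// Refer to the input column by its 1-based ordinal, like an
		// indexed variable. Reading from a nil map is safe in Go.
		if inOrd, ok := inCols[col]; ok {
			desc += fmt.Sprintf(" @%d", inOrd+1)
		}
	}
	return &plan{desc: desc, input: in}, out
}

func main() {
	scan := &expr{name: "scan", cols: []int{1, 2, 3}}
	proj := &expr{name: "project", cols: []int{3, 1}, input: scan}
	p, cols := build(proj)
	fmt.Println(p.desc, cols[3], cols[1])
}
```

Once `build` returns, nothing references the child's mapping, which is what makes it a candidate for reuse.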

opt/execbuilder: introduce colOrdMap

This commit introduces a new struct, colOrdMap, which maps column IDs
to ordinals. See the comment for colOrdMap for more details. This type
will be used in execbuilder in future commits to store output column
mappings.

Release note: None
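
For illustration, a slice-backed map from column ID to ordinal could look roughly like the following. This is a sketch under assumptions, not the struct from the commit: the name `ordMap`, the ordinal+1 encoding, and the method signatures are all hypothetical.

```go
package main

import "fmt"

// ordMap is a hypothetical, simplified map from column ID to ordinal. It is
// backed by a slice indexed by column ID, so lookups need no hashing.
type ordMap struct {
	// ords[col] holds ordinal+1 so that the zero value means "not set".
	ords []int32
}

// newOrdMap allocates a map able to hold column IDs in [0, maxCol].
func newOrdMap(maxCol int) ordMap {
	return ordMap{ords: make([]int32, maxCol+1)}
}

// Set maps col to ord. It panics if col is out of range.
func (m ordMap) Set(col, ord int) {
	m.ords[col] = int32(ord + 1)
}

// Get returns the ordinal for col, or ok=false if none was set.
func (m ordMap) Get(col int) (ord int, ok bool) {
	if col < 0 || col >= len(m.ords) || m.ords[col] == 0 {
		return 0, false
	}
	return int(m.ords[col] - 1), true
}

func main() {
	m := newOrdMap(10)
	m.Set(7, 0)
	m.Set(2, 1)
	ord, ok := m.Get(7)
	fmt.Println(ord, ok)
	_, ok = m.Get(5)
	fmt.Println(ok)
}
```

The trade-off in this sketch is that each map costs memory proportional to the largest column ID, whether or not most columns are present.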

opt/execbuilder: use colOrdMap to store output columns

Output columns of execution nodes are now stored in colOrdMaps instead
of opt.ColMaps. The colOrdMapAllocator struct, which is used to
allocate new colOrdMaps, has been added as a field of Builder. It is
currently a simple implementation. Future commits will extend it to
reuse allocated colOrdMaps when possible.

Release note: None

opt/execbuilder: reuse allocated colOrdMaps

This commit extends colOrdMapAllocator with a Free method. Freed
maps will be reused in future calls to Alloc instead of allocating a
new map. The build functions of the major relational expressions have
been updated to free maps when they are no longer needed. This reduces
the number of maps allocated, especially for complex queries with many
execution nodes.

Informs #117546

Release note: None
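
The Alloc/Free reuse described above can be sketched as a simple free list. This is an illustration only: `ordAllocator`, its fields, and the choice to zero a map when it is handed out again are assumptions, not the code in this commit.

```go
package main

import "fmt"

// ordAllocator is a hypothetical allocator that hands out fixed-size
// ordinal slices and reuses the ones that have been freed.
type ordAllocator struct {
	maxCol int
	free   [][]int32
	allocs int // number of slices actually allocated, for illustration
}

// Alloc returns a zeroed slice, reusing a freed one when available.
func (a *ordAllocator) Alloc() []int32 {
	if n := len(a.free); n > 0 {
		m := a.free[n-1]
		a.free = a.free[:n-1]
		// Zero the reused slice so no stale ordinals leak through.
		for i := range m {
			m[i] = 0
		}
		return m
	}
	a.allocs++
	return make([]int32, a.maxCol+1)
}

// Free returns m to the allocator. The caller must not use m afterwards and
// must not free it twice; nothing in this sketch detects either mistake.
func (a *ordAllocator) Free(m []int32) {
	a.free = append(a.free, m)
}

func main() {
	a := &ordAllocator{maxCol: 8}
	m1 := a.Alloc()
	m1[3] = 1
	a.Free(m1)
	m2 := a.Alloc() // reuses the memory behind m1
	fmt.Println(a.allocs, m2[3])
}
```

Because a freed slice is handed out again, any code still holding the old reference would silently read or corrupt another node's mapping.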

opt/execbuilder: faster maximum ordinal method for colOrdMap

This commit makes colOrdMap.MaxOrd() a constant-time operation in most
cases. See the newly added comments for more details.

Release note: None

@mgartner mgartner requested a review from a team February 12, 2024 19:34
@mgartner mgartner requested a review from a team as a code owner February 12, 2024 19:34
@mgartner mgartner requested review from michae2 and removed request for a team February 12, 2024 19:34

blathers-crl bot commented Feb 12, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@mgartner
Collaborator Author

I've broken the second and third commits out into #119094.

@mgartner
Collaborator Author

These changes significantly speed up the execbuilder phase for complex queries with many relational expressions and columns. There is a regression for very simple queries, which previously benefited from the zero-allocation, small-number mode of util.FastIntMap. I think these regressions are acceptable given that execbuilder is still very fast in these cases.

name                                      old time/op    new time/op    delta
ExecBuild/kv-read-10                         540ns ± 0%     563ns ± 2%   +4.29%  (p=0.008 n=5+5)
ExecBuild/kv-read-const-10                   538ns ± 0%     579ns ± 4%   +7.68%  (p=0.008 n=5+5)
ExecBuild/tpcc-new-order-10                  784ns ± 0%     871ns ± 1%  +11.09%  (p=0.008 n=5+5)
ExecBuild/tpcc-delivery-10                   690ns ± 1%     752ns ± 1%   +8.97%  (p=0.008 n=5+5)
ExecBuild/tpcc-stock-level-10               2.56µs ± 1%    2.15µs ± 1%  -16.25%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-a-10      912ns ± 0%    1035ns ± 2%  +13.57%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-b-10     1.39µs ± 1%    1.52µs ± 3%   +9.11%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-c-10      112µs ± 0%      21µs ± 1%  -81.15%  (p=0.008 n=5+5)
ExecBuild/single-col-histogram-range-10     1.21µs ± 1%    1.25µs ± 3%   +3.24%  (p=0.008 n=5+5)
ExecBuild/batch-insert-one-10                820ns ± 0%     815ns ± 1%     ~     (p=0.421 n=5+5)
ExecBuild/batch-insert-many-10              36.5µs ± 0%    37.0µs ± 6%     ~     (p=0.690 n=5+5)
ExecBuild/ored-preds-100-10                 50.9µs ± 1%    47.8µs ± 0%   -6.20%  (p=0.008 n=5+5)
ExecBuild/ored-preds-using-params-100-10    50.7µs ± 1%    47.8µs ± 0%   -5.71%  (p=0.008 n=5+5)
ExecBuild/slow-query-1-10                   24.1µs ± 0%    12.3µs ± 0%  -49.04%  (p=0.008 n=5+5)
ExecBuild/slow-query-2-10                   22.1µs ± 1%    12.2µs ± 1%  -44.99%  (p=0.008 n=5+5)
ExecBuild/slow-query-3-10                   50.4µs ± 0%    11.3µs ± 3%  -77.62%  (p=0.008 n=5+5)
ExecBuild/slow-query-4-10                    184µs ± 0%      24µs ± 3%  -87.13%  (p=0.008 n=5+5)
ExecBuild/slow-query-5-10                   6.45ms ± 0%    1.16ms ± 2%  -81.98%  (p=0.008 n=5+5)
ExecBuild/slow-query-6-10                    838µs ± 0%     203µs ± 2%  -75.80%  (p=0.008 n=5+5)
ExecBuild/slow-query-7-10                   1.09ms ± 0%    0.28ms ± 2%  -74.69%  (p=0.008 n=5+5)

name                                      old alloc/op   new alloc/op   delta
ExecBuild/kv-read-10                          624B ± 0%      752B ± 0%  +20.51%  (p=0.008 n=5+5)
ExecBuild/kv-read-const-10                    624B ± 0%      752B ± 0%  +20.51%  (p=0.008 n=5+5)
ExecBuild/tpcc-new-order-10                   656B ± 0%     1104B ± 0%  +68.29%  (p=0.008 n=5+5)
ExecBuild/tpcc-delivery-10                    604B ± 0%      764B ± 0%  +26.49%  (p=0.008 n=5+5)
ExecBuild/tpcc-stock-level-10               3.85kB ± 0%    2.39kB ± 0%  -37.84%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-a-10       780B ± 0%     1340B ± 0%  +71.79%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-b-10     1.07kB ± 0%    1.63kB ± 0%  +52.43%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-c-10      125kB ± 0%      42kB ± 0%  -65.92%  (p=0.008 n=5+5)
ExecBuild/single-col-histogram-range-10     1.52kB ± 0%    1.62kB ± 0%   +6.32%  (p=0.008 n=5+5)
ExecBuild/batch-insert-one-10               1.47kB ± 0%    1.52kB ± 0%   +3.26%  (p=0.008 n=5+5)
ExecBuild/batch-insert-many-10              71.1kB ± 0%    71.2kB ± 0%   +0.07%  (p=0.008 n=5+5)
ExecBuild/ored-preds-100-10                 29.3kB ± 0%    27.2kB ± 0%   -7.10%  (p=0.008 n=5+5)
ExecBuild/ored-preds-using-params-100-10    29.3kB ± 0%    27.2kB ± 0%   -7.10%  (p=0.008 n=5+5)
ExecBuild/slow-query-1-10                   26.9kB ± 0%    15.9kB ± 0%  -40.91%  (p=0.008 n=5+5)
ExecBuild/slow-query-2-10                   24.5kB ± 0%    13.1kB ± 0%  -46.57%  (p=0.008 n=5+5)
ExecBuild/slow-query-3-10                   68.7kB ± 0%    21.1kB ± 0%  -69.31%  (p=0.008 n=5+5)
ExecBuild/slow-query-4-10                    242kB ± 0%      95kB ± 0%  -60.98%  (p=0.008 n=5+5)
ExecBuild/slow-query-5-10                   9.20MB ± 0%    3.44MB ± 0%  -62.63%  (p=0.008 n=5+5)
ExecBuild/slow-query-6-10                   1.13MB ± 0%    0.78MB ± 0%  -31.16%  (p=0.008 n=5+5)
ExecBuild/slow-query-7-10                   1.47MB ± 0%    1.04MB ± 0%  -29.21%  (p=0.008 n=5+5)

name                                      old allocs/op  new allocs/op  delta
ExecBuild/kv-read-10                          7.00 ± 0%      9.00 ± 0%  +28.57%  (p=0.008 n=5+5)
ExecBuild/kv-read-const-10                    7.00 ± 0%      9.00 ± 0%  +28.57%  (p=0.008 n=5+5)
ExecBuild/tpcc-new-order-10                   8.00 ± 0%     11.00 ± 0%  +37.50%  (p=0.008 n=5+5)
ExecBuild/tpcc-delivery-10                    8.00 ± 0%     11.00 ± 0%  +37.50%  (p=0.008 n=5+5)
ExecBuild/tpcc-stock-level-10                 26.0 ± 0%      26.0 ± 0%     ~     (all equal)
ExecBuild/many-columns-and-indexes-a-10       11.0 ± 0%      14.0 ± 0%  +27.27%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-b-10       15.0 ± 0%      18.0 ± 0%  +20.00%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-c-10        252 ± 0%       120 ± 0%  -52.38%  (p=0.008 n=5+5)
ExecBuild/single-col-histogram-range-10       9.00 ± 0%     11.00 ± 0%  +22.22%  (p=0.008 n=5+5)
ExecBuild/batch-insert-one-10                 9.00 ± 0%      9.00 ± 0%     ~     (all equal)
ExecBuild/batch-insert-many-10                7.00 ± 0%      7.00 ± 0%     ~     (all equal)
ExecBuild/ored-preds-100-10                    414 ± 0%       410 ± 0%   -0.97%  (p=0.008 n=5+5)
ExecBuild/ored-preds-using-params-100-10       414 ± 0%       410 ± 0%   -0.97%  (p=0.008 n=5+5)
ExecBuild/slow-query-1-10                      162 ± 0%        90 ± 0%  -44.44%  (p=0.008 n=5+5)
ExecBuild/slow-query-2-10                      166 ± 0%       103 ± 0%  -37.95%  (p=0.008 n=5+5)
ExecBuild/slow-query-3-10                     79.0 ± 0%      42.0 ± 0%  -46.84%  (p=0.008 n=5+5)
ExecBuild/slow-query-4-10                      195 ± 0%        90 ± 0%  -53.85%  (p=0.008 n=5+5)
ExecBuild/slow-query-5-10                    2.13k ± 0%     0.75k ± 0%  -64.85%  (p=0.008 n=5+5)
ExecBuild/slow-query-6-10                      690 ± 0%       344 ± 0%  -50.14%  (p=0.008 n=5+5)
ExecBuild/slow-query-7-10                      790 ± 0%       409 ± 0%  -48.23%  (p=0.008 n=5+5)

@mgartner
Collaborator Author

One more note:

Manually freeing colOrdMaps introduces a risk of use-after-free and double-free bugs in this code. I'm currently exploring ways to mitigate this risk, which I'll include in a separate PR.

@mgartner
Collaborator Author

mgartner commented Feb 12, 2024

The diff is quite large, so here's some commentary that might be helpful:

  1. The first commit adds a benchmark.
  2. Many of the changed lines are from the second and third commits, which I've broken out into "opt: introduce memo.ExistsPrivate and add lazy projection column" (#119094).
  3. The fourth and sixth commits, "opt/execbuilder: remove column map from execPlan" and "opt/execbuilder: use colOrdMap to store output columns", are mostly mechanical changes.
  4. The heart of the PR is in the fifth and seventh commits.
  5. The final commit is a minor change that speeds things up a bit more.

I'm happy to walk through it all or break this up into more PRs if that'd be helpful.

@mgartner mgartner force-pushed the exec-buildler-col-map branch from e78a5e2 to 0e9d852 Compare February 12, 2024 22:01
@mgartner mgartner force-pushed the exec-buildler-col-map branch 2 times, most recently from f918665 to 38daa14 Compare February 13, 2024 20:28
Collaborator

@DrewKimball DrewKimball left a comment

:lgtm: This is nice! I just have some nits and suggestions.

Reviewed 1 of 1 files at r1, 50 of 50 files at r2, 2 of 2 files at r3, 5 of 5 files at r4, 3 of 3 files at r5, 8 of 8 files at r6, 4 of 4 files at r7, 5 of 5 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @michae2)


pkg/sql/opt/exec/execbuilder/builder.go line 303 at r6 (raw file):

	}
	md := b.mem.Metadata()
	b.colOrdsAlloc.Init(md.MaxColumn())

[nit] Shouldn't this already have been handled in New?


pkg/sql/opt/exec/execbuilder/col_ord_map.go line 46 at r5 (raw file):

	//
	// TODO(mgartner): It is probably unreasonable to have more than 2^31
	// ordinals in an execution node, so this could be []int32.

Isn't the number of ordinals capped by the max column ID? And column IDs are represented as int32.


pkg/sql/opt/exec/execbuilder/col_ord_map.go line 168 at r7 (raw file):

// Clear clears the map. The allocated memory is retained for future reuse.
func (m colOrdMap) Clear() {

[nit] Since this is used more often with this commit, it might be worth copying in a global zeros slice instead like here:

var zeroElements = make([]element, MaxBatchSize)

// Zero out all elements, up to the capacity, and then restore the length of
// the vector.
l := len(b.elements)
b.elements = b.elements[:cap(b.elements)]
for n := 0; n < len(b.elements); {
	n += copy(b.elements[n:], zeroElements)
}


pkg/sql/opt/exec/execbuilder/col_ord_map.go line 148 at r8 (raw file):

// ordinal, the method neither needs to scan through each key/value pair to find
// the current maximum ordinal, nor keep complex data structures to track it.
func (m colOrdMap) OrdUpperBound() int {

Under what circumstances does this end up being approximate? Are there places other than the methods in this file where the map is mutated? If it's only in Set, maybe it would be worth doing the scan only when the max ord gets replaced with a smaller ord? That seems like it might be a rare case.

We could also do something lazy, and instead scan the map in this method if the upper bound is unset, and just unset it in Set when the largest ord gets replaced by something smaller.


pkg/sql/opt/exec/execbuilder/relational.go line 2611 at r7 (raw file):

		outputCols.Set(join.ContinuationCol, maxOrd+1)

		// allExprCols is only needed for the lifetime of the function, so free

[nit] allExprCols seems stale.

@mgartner mgartner force-pushed the exec-buildler-col-map branch from 38daa14 to fc068cd Compare February 16, 2024 18:54
@mgartner mgartner requested a review from DrewKimball February 16, 2024 18:54
Collaborator Author

@mgartner mgartner left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)


pkg/sql/opt/exec/execbuilder/builder.go line 303 at r6 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Shouldn't this already have been handled in New?

Good catch. Done.


pkg/sql/opt/exec/execbuilder/col_ord_map.go line 46 at r5 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Isn't the number of ordinals capped by the max column ID? And column IDs are represented as int32.

I think you're right. And #119150 (comment) shows that this yields a nice performance improvement.

I think it would be nice to detect an overflow in the conversion from int->int32 and error, just to be safe. I originally thought we'd need Set to return an error rather than panic to do this safely, a la #119150, but I don't think that's necessarily true. I think I went overboard when I removed all execbuilder panics in #99981. These functions can panic; we just need to be careful that all the entry points have a panic-catcher, including closures we create in execbuilder, like in 5c0a632.


pkg/sql/opt/exec/execbuilder/col_ord_map.go line 168 at r7 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Since this is used more often with this commit, it might be worth copying in a global zeros slice instead like here:

var zeroElements = make([]element, MaxBatchSize)

// Zero out all elements, up to the capacity, and then restore the length of
// the vector.
l := len(b.elements)
b.elements = b.elements[:cap(b.elements)]
for n := 0; n < len(b.elements); {
	n += copy(b.elements[n:], zeroElements)
}

I tried this and didn't see much of a difference in performance. Interestingly, it seemed slightly slower on my GCE worker. Using the new clear builtin in Go 1.21 (https://pkg.go.dev/builtin#clear) did seem slightly faster. I'll leave as-is for now since it wasn't a major difference.
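
For reference, a sketch of two of the zeroing strategies being compared: copying from a shared slice of zeros, and a plain range loop. It assumes an `[]int32` backing slice and a 64-element zeros slice, and the function names are hypothetical. The Go 1.21 `clear` builtin mentioned above would be a third option.

```go
package main

import "fmt"

// zeros is a shared, never-modified slice of zero values to copy from.
var zeros = make([]int32, 64)

// clearByCopy zeroes s by repeatedly copying from the shared zeros slice.
func clearByCopy(s []int32) {
	for n := 0; n < len(s); {
		n += copy(s[n:], zeros)
	}
}

// clearByLoop zeroes s with a plain range loop.
func clearByLoop(s []int32) {
	for i := range s {
		s[i] = 0
	}
}

func fill(n int) []int32 {
	s := make([]int32, n)
	for i := range s {
		s[i] = int32(i + 1)
	}
	return s
}

func allZero(s []int32) bool {
	for _, v := range s {
		if v != 0 {
			return false
		}
	}
	return true
}

func main() {
	a, b := fill(150), fill(150)
	clearByCopy(a)
	clearByLoop(b)
	fmt.Println(allZero(a), allZero(b))
}
```

Both functions leave the slice length and capacity untouched, so the memory stays available for reuse.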


pkg/sql/opt/exec/execbuilder/col_ord_map.go line 148 at r8 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Under what circumstances does this end up being approximate? Are there places other than the methods in this file where the map is mutated? If it's only in Set, maybe it would be worth doing the scan only when the max ord gets replaced with a smaller ord? That seems like it might be a rare case.

We could also do something lazy, and instead scan the map in this method if the upper bound is unset, and just unset it in Set when the largest ord gets replaced by something smaller.

Good idea. I've gone with your latter suggestion.


pkg/sql/opt/exec/execbuilder/relational.go line 2611 at r7 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] allExprCols seems stale.

Done.

@mgartner
Collaborator Author

mgartner commented Feb 16, 2024

I reran the benchmarks on a GCE instance after applying the changes suggested by Drew and fixing a few bugs. It looks like the regression in very simple plans is less pronounced on the cloud hardware, which is great!

name                                      old time/op    new time/op    delta
ExecBuild/kv-read-24                        1.12µs ± 0%    1.19µs ± 0%   +6.66%  (p=0.008 n=5+5)
ExecBuild/kv-read-const-24                  1.13µs ± 0%    1.19µs ± 0%   +5.33%  (p=0.008 n=5+5)
ExecBuild/tpcc-new-order-24                 1.64µs ± 0%    1.77µs ± 0%   +8.27%  (p=0.008 n=5+5)
ExecBuild/tpcc-delivery-24                  1.43µs ± 0%    1.54µs ± 1%   +7.59%  (p=0.008 n=5+5)
ExecBuild/tpcc-stock-level-24               5.49µs ± 1%    4.82µs ± 0%  -12.23%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-a-24     2.02µs ± 0%    2.17µs ± 1%   +7.39%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-b-24     2.98µs ± 0%    3.14µs ± 1%   +5.40%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-c-24      199µs ± 2%      35µs ± 0%  -82.34%  (p=0.008 n=5+5)
ExecBuild/single-col-histogram-range-24     2.28µs ± 0%    2.36µs ± 0%   +3.67%  (p=0.008 n=5+5)
ExecBuild/batch-insert-one-24               1.63µs ± 0%    1.66µs ± 1%   +2.10%  (p=0.008 n=5+5)
ExecBuild/batch-insert-many-24              75.7µs ± 1%    73.5µs ± 1%   -2.86%  (p=0.008 n=5+5)
ExecBuild/ored-preds-100-24                  101µs ± 0%      96µs ± 0%   -4.85%  (p=0.008 n=5+5)
ExecBuild/ored-preds-using-params-100-24     101µs ± 0%      96µs ± 0%   -4.92%  (p=0.008 n=5+5)
ExecBuild/slow-query-1-24                   44.9µs ± 0%    20.8µs ± 1%  -53.62%  (p=0.008 n=5+5)
ExecBuild/slow-query-2-24                   41.5µs ± 1%    21.4µs ± 0%  -48.35%  (p=0.008 n=5+5)
ExecBuild/slow-query-3-24                   93.6µs ± 1%    21.9µs ± 1%  -76.63%  (p=0.008 n=5+5)
ExecBuild/slow-query-4-24                    319µs ± 0%      43µs ± 1%  -86.57%  (p=0.008 n=5+5)
ExecBuild/slow-query-5-24                   12.1ms ± 2%     2.3ms ± 1%  -80.60%  (p=0.008 n=5+5)
ExecBuild/slow-query-6-24                   1.49ms ± 1%    0.32ms ± 1%  -78.22%  (p=0.008 n=5+5)
ExecBuild/slow-query-7-24                   1.99ms ± 0%    0.45ms ± 1%  -77.41%  (p=0.008 n=5+5)

name                                      old alloc/op   new alloc/op   delta
ExecBuild/kv-read-24                          624B ± 0%      704B ± 0%  +12.82%  (p=0.008 n=5+5)
ExecBuild/kv-read-const-24                    624B ± 0%      704B ± 0%  +12.82%  (p=0.008 n=5+5)
ExecBuild/tpcc-new-order-24                   656B ± 0%      912B ± 0%  +39.02%  (p=0.008 n=5+5)
ExecBuild/tpcc-delivery-24                    604B ± 0%      716B ± 0%  +18.54%  (p=0.008 n=5+5)
ExecBuild/tpcc-stock-level-24               3.85kB ± 0%    2.02kB ± 0%  -47.40%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-a-24       780B ± 0%     1116B ± 0%  +43.08%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-b-24     1.07kB ± 0%    1.40kB ± 0%  +31.46%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-c-24      126kB ± 0%      23kB ± 0%  -81.37%  (p=0.008 n=5+5)
ExecBuild/single-col-histogram-range-24     1.52kB ± 0%    1.58kB ± 0%   +4.21%  (p=0.008 n=5+5)
ExecBuild/batch-insert-one-24               1.47kB ± 0%    1.52kB ± 0%   +3.26%  (p=0.008 n=5+5)
ExecBuild/batch-insert-many-24              71.1kB ± 0%    71.2kB ± 0%   +0.07%  (p=0.008 n=5+5)
ExecBuild/ored-preds-100-24                 29.3kB ± 0%    27.1kB ± 0%     ~     (p=0.079 n=4+5)
ExecBuild/ored-preds-using-params-100-24    29.3kB ± 0%    27.1kB ± 0%   -7.64%  (p=0.008 n=5+5)
ExecBuild/slow-query-1-24                   26.9kB ± 0%    11.0kB ± 0%  -59.30%  (p=0.008 n=5+5)
ExecBuild/slow-query-2-24                   24.6kB ± 0%     9.0kB ± 0%  -63.31%  (p=0.008 n=5+5)
ExecBuild/slow-query-3-24                   69.4kB ± 0%    17.3kB ± 0%  -75.09%  (p=0.008 n=5+5)
ExecBuild/slow-query-4-24                    245kB ± 0%      60kB ± 0%  -75.59%  (p=0.008 n=5+5)
ExecBuild/slow-query-5-24                   9.36MB ± 0%    2.78MB ± 0%  -70.28%  (p=0.008 n=5+5)
ExecBuild/slow-query-6-24                   1.15MB ± 0%    0.36MB ± 0%  -68.90%  (p=0.008 n=5+5)
ExecBuild/slow-query-7-24                   1.49MB ± 0%    0.52MB ± 0%  -65.31%  (p=0.008 n=5+5)

name                                      old allocs/op  new allocs/op  delta
ExecBuild/kv-read-24                          7.00 ± 0%      9.00 ± 0%  +28.57%  (p=0.008 n=5+5)
ExecBuild/kv-read-const-24                    7.00 ± 0%      9.00 ± 0%  +28.57%  (p=0.008 n=5+5)
ExecBuild/tpcc-new-order-24                   8.00 ± 0%     11.00 ± 0%  +37.50%  (p=0.008 n=5+5)
ExecBuild/tpcc-delivery-24                    8.00 ± 0%     11.00 ± 0%  +37.50%  (p=0.008 n=5+5)
ExecBuild/tpcc-stock-level-24                 26.0 ± 0%      27.0 ± 0%   +3.85%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-a-24       11.0 ± 0%      14.0 ± 0%  +27.27%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-b-24       15.0 ± 0%      18.0 ± 0%  +20.00%  (p=0.008 n=5+5)
ExecBuild/many-columns-and-indexes-c-24        276 ± 0%       115 ± 0%  -58.33%  (p=0.008 n=5+5)
ExecBuild/single-col-histogram-range-24       9.00 ± 0%     11.00 ± 0%  +22.22%  (p=0.008 n=5+5)
ExecBuild/batch-insert-one-24                 9.00 ± 0%      9.00 ± 0%     ~     (all equal)
ExecBuild/batch-insert-many-24                7.00 ± 0%      7.00 ± 0%     ~     (all equal)
ExecBuild/ored-preds-100-24                    414 ± 0%       410 ± 0%   -0.97%  (p=0.008 n=5+5)
ExecBuild/ored-preds-using-params-100-24       414 ± 0%       410 ± 0%   -0.97%  (p=0.008 n=5+5)
ExecBuild/slow-query-1-24                      163 ± 0%        90 ± 0%  -44.79%  (p=0.008 n=5+5)
ExecBuild/slow-query-2-24                      166 ± 0%       103 ± 0%  -37.95%  (p=0.008 n=5+5)
ExecBuild/slow-query-3-24                     99.0 ± 0%      43.0 ± 0%  -56.57%  (p=0.008 n=5+5)
ExecBuild/slow-query-4-24                      247 ± 0%        84 ± 0%  -65.99%  (p=0.008 n=5+5)
ExecBuild/slow-query-5-24                    3.91k ± 0%     0.74k ± 0%  -81.03%  (p=0.008 n=5+5)
ExecBuild/slow-query-6-24                      960 ± 0%       331 ± 0%     ~     (p=0.079 n=4+5)
ExecBuild/slow-query-7-24                    1.16k ± 0%     0.40k ± 0%  -65.81%  (p=0.008 n=5+5)

@mgartner mgartner requested a review from a team February 20, 2024 16:01
Collaborator

@DrewKimball DrewKimball left a comment

:lgtm:

Reviewed 10 of 10 files at r9, 5 of 5 files at r10, 3 of 3 files at r11, 8 of 8 files at r12, 4 of 4 files at r13, 3 of 3 files at r14, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @michae2)


pkg/sql/opt/exec/execbuilder/col_ord_map.go line 168 at r7 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I tried this and didn't see much of a difference in performance. Interestingly, it seemed slightly slower on my GCE worker. Using the new clear builtin in Go 1.21 (https://pkg.go.dev/builtin#clear) did seem slightly faster. I'll leave as-is for now since it wasn't a major difference.

Huh, good to know.


pkg/sql/opt/exec/execbuilder/col_ord_map.go line 122 at r14 (raw file):

	case m.maxIsUnknown():
		// If the maximum ordinal is currently unknown, then leave it as-is.
	case m.ords[col] > 0 && m.ords[col] == int32(m.maxOrd) && ord < m.maxOrd:

Do you know under what circumstances we're changing a column's ordinal from anything other than "no ordinal"?

@mgartner mgartner force-pushed the exec-buildler-col-map branch 2 times, most recently from fdf4841 to a451975 Compare February 26, 2024 22:44
@mgartner mgartner requested a review from DrewKimball February 26, 2024 22:45
Collaborator Author

@mgartner mgartner left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @michae2)


pkg/sql/opt/exec/execbuilder/col_ord_map.go line 122 at r14 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Do you know under what circumstances we're changing a column's ordinal from anything other than "no ordinal"?

I haven't confirmed that it ever happens, but there's nothing preventing it. We could try to catch places where this happens by panicking in test builds if an ordinal is overwritten. I also explored changing the signature of Set so that it returned an error if it was overwriting a previously set ordinal. It made usage of the map a bit more messy, so I opted to not bother with it, but I can explore that again if you think it'd be worthwhile.

Also, I just pushed a minor change to MaxOrd such that it re-memoizes the max ordinal so that subsequent calls will be constant time, until the max ordinal is overwritten again.
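
A sketch of the lazy, memoized maximum discussed in this thread. The type `maxOrdMap`, the sentinel values, and the ordinal+1 encoding are assumptions made for illustration; the real colOrdMap may differ.

```go
package main

import "fmt"

const (
	noMax      = -1 // the map is empty
	unknownMax = -2 // the memoized maximum was invalidated; rescan needed
)

// maxOrdMap is a hypothetical map from column ID to ordinal that memoizes
// its maximum ordinal.
type maxOrdMap struct {
	ords   []int32 // ords[col] holds ordinal+1; 0 means "not set"
	maxOrd int
}

func newMaxOrdMap(maxCol int) *maxOrdMap {
	return &maxOrdMap{ords: make([]int32, maxCol+1), maxOrd: noMax}
}

// Set maps col to ord and keeps the memoized maximum up to date when that
// is cheap. If the current maximum is overwritten with something smaller,
// the maximum is marked unknown instead of being recomputed eagerly.
func (m *maxOrdMap) Set(col, ord int) {
	prev := int(m.ords[col]) - 1 // -1 if col had no ordinal
	m.ords[col] = int32(ord + 1)
	switch {
	case m.maxOrd == unknownMax:
		// Leave it unknown; MaxOrd will rescan on demand.
	case prev >= 0 && prev == m.maxOrd && ord < m.maxOrd:
		m.maxOrd = unknownMax
	case ord > m.maxOrd:
		m.maxOrd = ord
	}
}

// MaxOrd returns the maximum ordinal, or -1 if the map is empty. It is
// constant time unless the maximum was invalidated, in which case it scans
// once and re-memoizes the result.
func (m *maxOrdMap) MaxOrd() int {
	if m.maxOrd != unknownMax {
		return m.maxOrd
	}
	best := noMax
	for _, v := range m.ords {
		if o := int(v) - 1; o > best {
			best = o
		}
	}
	m.maxOrd = best
	return best
}

func main() {
	m := newMaxOrdMap(10)
	m.Set(1, 5)
	m.Set(2, 3)
	fmt.Println(m.MaxOrd())
	m.Set(1, 0) // overwrites the maximum with a smaller ordinal
	fmt.Println(m.MaxOrd())
}
```

If ordinals are never overwritten in practice, the scan in `MaxOrd` never runs and every call is constant time.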

Collaborator

@DrewKimball DrewKimball left a comment

Reviewed 60 of 60 files at r15, 55 of 55 files at r16, 53 of 53 files at r17, 58 of 58 files at r18, 54 of 54 files at r19, 1 of 53 files at r20, 2 of 2 files at r21, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @michae2)


pkg/sql/opt/exec/execbuilder/col_ord_map.go line 122 at r14 (raw file):

I also explored changing the signature of Set so that it returned an error if it was overwriting a previously set ordinal. It made usage of the map a bit more messy, so I opted to not bother with it, but I can explore that again if you think it'd be worthwhile.

That's fine. I thought it might simplify things to assume maxOrd is never invalidated, but if it makes things messier the current form is fine.

@mgartner mgartner force-pushed the exec-buildler-col-map branch from a451975 to 3f4d099 Compare February 27, 2024 01:11
@mgartner
Collaborator Author

TFTRs!

bors r+

@craig
Contributor

craig bot commented Feb 27, 2024

Build succeeded:

@craig craig bot merged commit 3b2c1cc into cockroachdb:master Feb 27, 2024
15 checks passed
@mgartner mgartner deleted the exec-buildler-col-map branch February 27, 2024 12:23