Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colbuilder: clean up type schema handling #107324

Merged
merged 2 commits into from
Jul 21, 2023

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Jul 21, 2023

This commit refactors how we're keeping track of the current type schema of the operators in NewColOperator. Previously, we would create a new type slice for each operator due to "type schema corruption" bugs we observed (#47889). We fixed that bug by being extremely conservative, and this commit applies a different more reasonable fix.

In particular, it is safe to append to the current type slice we have in scope, and we only need to be careful when we're trying to create a "projection" (i.e. when we need to change the order of types or modify one type in-place). Thus, this commit switches to making a copy only in those scenarios which should happen at most once per processor spec (previously, it could happen thousands of times for elaborate render expressions).

Furthermore, this commit reuses the same type slice from InputSyncSpec since creation of the operators occurs after the spec has been communicated across the wire (or locally), so we're free to use it as we please.

name                               old time/op    new time/op    delta
NestedAndPlanning/renders=16-24       627µs ± 1%     624µs ± 2%     ~     (p=0.143 n=10+10)
NestedAndPlanning/renders=256-24     3.54ms ± 0%    3.04ms ± 1%  -14.14%  (p=0.000 n=9+10)
NestedAndPlanning/renders=4096-24     211ms ± 4%      68ms ± 1%  -67.61%  (p=0.000 n=10+10)

name                               old alloc/op   new alloc/op   delta
NestedAndPlanning/renders=16-24      74.0kB ±20%    68.9kB ±10%     ~     (p=0.053 n=10+9)
NestedAndPlanning/renders=256-24     1.71MB ± 0%    0.60MB ± 0%  -65.07%  (p=0.000 n=8+8)
NestedAndPlanning/renders=4096-24     303MB ± 0%      13MB ± 1%  -95.58%  (p=0.000 n=8+8)

name                               old allocs/op  new allocs/op  delta
NestedAndPlanning/renders=16-24         754 ±18%       733 ±18%     ~     (p=0.105 n=9+9)
NestedAndPlanning/renders=256-24      6.44k ± 0%     5.93k ± 0%   -7.88%  (p=0.000 n=8+8)
NestedAndPlanning/renders=4096-24      146k ± 6%      136k ± 0%   -7.02%  (p=0.000 n=8+8)

Fixes: #104996.

Release note (bug fix): Previously, CockroachDB when planning expressions containing many sub-expressions (e.g. deeply-nested AND / OR structures) would use memory quadratical in the number of sub-expressions, and in the worst cases (thousands of sub-expressions) this could lead to OOMs. The bug has been present since at least 22.1 and has now been fixed.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the append-one-type branch 3 times, most recently from e4f9b87 to d25e123 Compare July 21, 2023 02:15
@yuzefovich yuzefovich force-pushed the append-one-type branch 2 times, most recently from 786a973 to a4da044 Compare July 21, 2023 02:31
@yuzefovich yuzefovich added backport-22.2.x backport-23.1.x Flags PRs that need to be backported to 23.1 labels Jul 21, 2023
@yuzefovich yuzefovich marked this pull request as ready for review July 21, 2023 02:32
@yuzefovich yuzefovich requested a review from a team as a code owner July 21, 2023 02:32
@yuzefovich yuzefovich requested review from msirek, mgartner and DrewKimball and removed request for msirek July 21, 2023 02:32
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this one on! It seems sound. If one of the places where the types are passed (like MakeHashJoinerSpec) modified the slice, we'd see corruption, but I think we consider the types slice immutable anyway.
:lgtm:

Reviewed 1 of 1 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)


-- commits line 45 at r2:
Do you think we should add a release note? Customers have observed this behavior once or twice.

This commit refactors how we're keeping track of the current type schema
of the operators in `NewColOperator`. Previously, we would create a new
type slice for each operator due to "type schema corruption" bugs we
observed (cockroachdb#47889). We fixed that bug by being extremely conservative,
and this commit applies a different more reasonable fix.

In particular, it is safe to append to the current type slice we have in
scope, and we only need to be careful when we're trying to create
a "projection" (i.e. when we need to change the order of types or modify
one type in-place). Thus, this commit switches to making a copy only in
those scenarios which should happen at most once per processor spec
(previously, it could happen thousands of times for elaborate render
expressions).

Furthermore, this commit reuses the same type slice from `InputSyncSpec`
since creation of the operators occurs _after_ the spec has been
communicated across the wire (or locally), so we're free to use it as we
please.

```
name                               old time/op    new time/op    delta
NestedAndPlanning/renders=16-24       627µs ± 1%     624µs ± 2%     ~     (p=0.143 n=10+10)
NestedAndPlanning/renders=256-24     3.54ms ± 0%    3.04ms ± 1%  -14.14%  (p=0.000 n=9+10)
NestedAndPlanning/renders=4096-24     211ms ± 4%      68ms ± 1%  -67.61%  (p=0.000 n=10+10)

name                               old alloc/op   new alloc/op   delta
NestedAndPlanning/renders=16-24      74.0kB ±20%    68.9kB ±10%     ~     (p=0.053 n=10+9)
NestedAndPlanning/renders=256-24     1.71MB ± 0%    0.60MB ± 0%  -65.07%  (p=0.000 n=8+8)
NestedAndPlanning/renders=4096-24     303MB ± 0%      13MB ± 1%  -95.58%  (p=0.000 n=8+8)

name                               old allocs/op  new allocs/op  delta
NestedAndPlanning/renders=16-24         754 ±18%       733 ±18%     ~     (p=0.105 n=9+9)
NestedAndPlanning/renders=256-24      6.44k ± 0%     5.93k ± 0%   -7.88%  (p=0.000 n=8+8)
NestedAndPlanning/renders=4096-24      146k ± 6%      136k ± 0%   -7.02%  (p=0.000 n=8+8)
```

Release note (bug fix): Previously, CockroachDB when planning
expressions containing many sub-expressions (e.g. deeply-nested AND / OR
structures) would use memory quadratical in the number of
sub-expressions, and in the worst cases (thousands of sub-expressions)
this could lead to OOMs. The bug has been present since at least 22.1
and has now been fixed.
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I too believe this should be safe since the callers have no reason to modify the passed-in slices. The type management seems to be centralized in execplan.go, and we only have a handful of operators that reorder / recreate the output type schema completely, and in those cases we do make a copy. I think it should be safe to backport too, after some baking period on master.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mgartner)


-- commits line 45 at r2:

Previously, DrewKimball (Drew Kimball) wrote…

Do you think we should add a release note? Customers have observed this behavior once or twice.

Done.

@craig
Copy link
Contributor

craig bot commented Jul 21, 2023

Build succeeded:

@craig craig bot merged commit 4a6fb8c into cockroachdb:master Jul 21, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jul 21, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 85fd4fb to blathers/backport-release-22.2-107324: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@mgartner
Copy link
Collaborator

Nice improvement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

colexec: appendOneType allocations scale poorly
4 participants