
roachtest: tpcdsvec failed #47889

Closed
cockroach-teamcity opened this issue Apr 22, 2020 · 7 comments · Fixed by #47938
Assignees: yuzefovich
Labels:
- branch-master: Failures and bugs on the master branch.
- C-test-failure: Broken test (automatically or manually discovered).
- O-roachtest
- O-robot: Originated from a bot.
- release-blocker: Indicates a release-blocker. Use with a branch-release-2x.x label to denote which branch is blocked.
Milestone: 20.1

Comments

@cockroach-teamcity

(roachtest).tpcdsvec failed on master@056e32e84831f13b286fceb7681dd0cd2b00b4b4:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpcdsvec/run_1
	test.go:264,tpcdsvec.go:202,tpcdsvec.go:212,test_runner.go:753: 


Artifacts: /tpcdsvec

See this test on roachdash
powered by pkg/cmd/internal/issues

cockroach-teamcity added the branch-master, C-test-failure, O-roachtest, O-robot, and release-blocker labels Apr 22, 2020
cockroach-teamcity added this to the 20.1 milestone Apr 22, 2020
yuzefovich assigned yuzefovich and unassigned andreimatei Apr 22, 2020
@yuzefovich

Query 49 got an internal error:

08:31:36 test.go:190: test status: encountered an error: ERROR: internal error: unexpected error from the vectorized engine: interface conversion: coldata.column is *coldata.Bytes, not []bool (SQLSTATE XX000)
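This error message is the shape Go produces when a type assertion fails at runtime. A minimal, standalone sketch of the mechanism (illustrative types only, not the actual coldata code; the real engine presumably catches such a panic and wraps it into the internal error above):

```go
package main

// Bytes stands in for a vectorized column representation such as
// coldata.Bytes; the name and layout here are purely illustrative.
type Bytes struct{ data []byte }

func main() {
	// A column stored behind an interface, as a columnar engine might do
	// for its per-type column representations.
	var col interface{} = &Bytes{}

	// A type assertion without the ", ok" form panics when the dynamic
	// type does not match, with a message of the form
	// "interface conversion: interface {} is *main.Bytes, not []bool".
	_ = col.([]bool)
}
```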

@cockroach-teamcity

(roachtest).tpcdsvec failed on master@c73fb589e223d21b7f5cf51e6fc8620b47b95de4:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpcdsvec/run_1
	test.go:264,tpcdsvec.go:202,tpcdsvec.go:212,test_runner.go:753: 


Artifacts: /tpcdsvec

See this test on roachdash
powered by pkg/cmd/internal/issues


yuzefovich commented Apr 23, 2020

For posterity, here is the comment that describes the "type schema corruption" scenario this query ran into (from colexec/execplan.go):

// NOTE: throughout this file we do not append an output type of a projecting
// operator to the passed-in type schema - we, instead, always allocate a new
// type slice and copy over the old schema and set the output column of a
// projecting operator in the next slot. We attempt to enforce this by a linter
// rule, and such behavior prevents the type schema corruption scenario as
// described below.
//
// Without explicit new allocations, it is possible that planSelectionOperators
// (and other planning functions) reuse the same array for filterColumnTypes as
// result.ColumnTypes is using because there was enough capacity to do so.
// As an example, consider the following scenario in the context of
// planFilterExpr method:
// 1. r.ColumnTypes={*types.Bool} with len=1 and cap=4
// 2. planSelectionOperators adds another types.Int column, so
//    filterColumnTypes={*types.Bool, *types.Int} with len=2 and cap=4
//    Crucially, it uses exact same underlying array as r.ColumnTypes
//    uses.
// 3. we project out second column, so r.ColumnTypes={*types.Bool}
// 4. later, we add another *types.Float column, so
//    r.ColumnTypes={*types.Bool, *types.Float}, but there is enough
//    capacity in the array, so we simply overwrite the second slot
//    with the new type which corrupts filterColumnTypes to become
//    {*types.Bool, *types.Float}, and we can get into a runtime type
//    mismatch situation.
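To make that scenario concrete, here is a minimal, standalone Go sketch (illustrative names and string "types", not CockroachDB code) of how two slices sharing a backing array can corrupt each other when there is spare capacity:

```go
package main

import "fmt"

func main() {
	// 1. The result schema has len=1 but cap=4, so later appends can
	//    reuse the same backing array.
	resultTypes := make([]string, 1, 4)
	resultTypes[0] = "bool"

	// 2. A planning helper appends an int column; filterTypes now aliases
	//    resultTypes' backing array because there was spare capacity.
	filterTypes := append(resultTypes, "int")

	// 3. Project out the second column, so the result schema is back to
	//    just the bool column.
	resultTypes = resultTypes[:1]

	// 4. Append a float column; this overwrites the second slot of the
	//    shared backing array and silently corrupts filterTypes.
	resultTypes = append(resultTypes, "float")

	fmt.Println(filterTypes) // [bool float] -- not the expected [bool int]
}
```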

@cockroach-teamcity

(roachtest).tpcdsvec failed on master@0e16cc15f139b816b8e46fe6571691a8ec0e6937:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpcdsvec/run_1
	test.go:264,tpcdsvec.go:202,tpcdsvec.go:212,test_runner.go:753: 


Artifacts: /tpcdsvec

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity

(roachtest).tpcdsvec failed on master@d620f6242ad43481e61a6af19416733cf05233a4:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpcdsvec/run_1
	test.go:264,tpcdsvec.go:202,tpcdsvec.go:212,test_runner.go:753: 


Artifacts: /tpcdsvec

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity

(roachtest).tpcdsvec failed on master@60c9e055e970bd7f150ebcfad266929b2638d635:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpcdsvec/run_1
	test.go:264,tpcdsvec.go:202,tpcdsvec.go:212,test_runner.go:753: 


Artifacts: /tpcdsvec

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity

(roachtest).tpcdsvec failed on master@a20a8811ee6abfe3754220e893ed383afbab21c9:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpcdsvec/run_1
	test.go:264,tpcdsvec.go:202,tpcdsvec.go:212,test_runner.go:753: 


Artifacts: /tpcdsvec

See this test on roachdash
powered by pkg/cmd/internal/issues

craig bot closed this as completed in 0b797a6 Apr 27, 2020
craig bot pushed a commit that referenced this issue Apr 28, 2020
47942: colexec: some optimizations r=yuzefovich a=yuzefovich

**colexec: remove one of the Go maps from hash aggregator**

This commit switches usage of `map` to iteration over `[]uint64` when
building selection vectors in the hash aggregator. This is a lot more
efficient when group sizes are relatively large, with a moderate hit
when group sizes are small. That hit is reduced in a follow-up commit.

Release note: None
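Roughly, the idea is to scan the per-tuple hash values directly when collecting the tuples of each group, instead of first bucketing them through a Go map. A hypothetical sketch (illustrative names; the real hash aggregator code is templated and more involved):

```go
package hashagg

// selectBucket appends to sel the indices of the tuples whose hash equals
// target. Scanning the []uint64 of per-tuple hashes directly avoids the
// allocations and lookups of a map[uint64][]int based approach.
func selectBucket(hashes []uint64, target uint64, sel []int) []int {
	for i, h := range hashes {
		if h == target {
			sel = append(sel, i)
		}
	}
	return sel
}
```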

**colexec: more improvements to hash aggregator**

This commit removes the buffering stage of the hash aggregator as well
as the "append only" scratch batch that we're currently using. Removing
the buffering stage allows us to have smaller buffers without
sacrificing performance. Removing the scratch batch lets us avoid
copying over the data from the input batch and instead use that input
batch directly. We will be destructively modifying the selection vector
on that batch, but such behavior is acceptable because the hash
aggregator owns the output batch, and the input batch will not be
propagated further.

This commit also bumps `hashAggFuncsAllocSize` from 16 to 64 which
gives us minor performance improvement in case of small group sizes.

Release note: None

**colexec: remove some allocations**

In a recent PR (for logical types plumbing) I introduced some
unnecessary allocations for unhandled type case - by taking a pointer
from a value in `[]types.T` slice. This commit fixes that.

Release note: None

47953: colexec, coldata: fix compiler warnings in template files r=yuzefovich a=yuzefovich

This commit fixes all compiler warnings that I see in Goland. To get
there it does the following:
1. renames `Vec._TemplateType` to `Vec.TemplateType` so that the method
is considered exported
2. pulls out declaration of local variables outside of templated `if`
blocks
3. breaks up the chained function calls that parse flags in
`pkg/workload` and a few other places so that the struct is bound to a
variable first and we can call a method with a pointer receiver on it
(see the sketch below). It shouldn't matter for performance, though.

Release note: None
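The Go rule behind point 3 is that a pointer-receiver method needs an addressable value, and the result of a function call is not addressable; binding the struct to a variable first makes it addressable. An illustrative sketch (hypothetical names, not the actual pkg/workload code):

```go
package main

import "fmt"

// flagSet and its pointer-receiver Parse are illustrative stand-ins.
type flagSet struct{ parsed bool }

func (f *flagSet) Parse(args []string) { f.parsed = true }

func newFlagSet() flagSet { return flagSet{} }

func main() {
	// newFlagSet().Parse(nil) // does not compile: cannot call a pointer
	// method on the unaddressable result of a function call.

	// Breaking up the chain binds the struct to a variable, which is
	// addressable, so the pointer-receiver call is legal.
	fs := newFlagSet()
	fs.Parse(nil)
	fmt.Println(fs.parsed) // true
}
```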

47974: roachtest: fail tpcdsvec test with an error r=yuzefovich a=yuzefovich

In the `tpcdsvec` test we run all the queries even if we hit an error.
Previously, if any error occurred, we would just fail the test; now we
fail with an error that is a "combination" of all the errors that
occurred (see the sketch below).

Addresses: #47889.

Release note: None
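A hedged sketch of the "keep running, fail once with a combined error" pattern using the standard library's errors.Join (Go 1.20+); the test itself predates errors.Join and presumably uses a CockroachDB errors helper, so this is illustration only:

```go
package main

import (
	"errors"
	"fmt"
)

// runAll runs every query even if some fail and returns a single error
// that combines everything that went wrong (nil if all succeeded).
func runAll(queries map[string]func() error) error {
	var combined error
	for name, q := range queries {
		if err := q(); err != nil {
			combined = errors.Join(combined, fmt.Errorf("query %s: %w", name, err))
		}
	}
	return combined
}
```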

Co-authored-by: Yahor Yuzefovich <[email protected]>
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Jul 21, 2023
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Jul 21, 2023
craig bot pushed a commit that referenced this issue Jul 21, 2023
107324: colbuilder: clean up type schema handling r=yuzefovich a=yuzefovich

This commit refactors how we're keeping track of the current type schema of the operators in `NewColOperator`. Previously, we would create a new type slice for each operator due to "type schema corruption" bugs we observed (#47889). We fixed that bug by being extremely conservative, and this commit applies a different, more reasonable fix.

In particular, it is safe to append to the current type slice we have in scope; we only need to be careful when we're trying to create a "projection" (i.e. when we need to change the order of types or modify one type in-place). Thus, this commit switches to making a copy only in those scenarios, which should happen at most once per processor spec (previously, it could happen thousands of times for elaborate render expressions).

Furthermore, this commit reuses the same type slice from `InputSyncSpec` since creation of the operators occurs _after_ the spec has been communicated across the wire (or locally), so we're free to use it as we please.

```
name                               old time/op    new time/op    delta
NestedAndPlanning/renders=16-24       627µs ± 1%     624µs ± 2%     ~     (p=0.143 n=10+10)
NestedAndPlanning/renders=256-24     3.54ms ± 0%    3.04ms ± 1%  -14.14%  (p=0.000 n=9+10)
NestedAndPlanning/renders=4096-24     211ms ± 4%      68ms ± 1%  -67.61%  (p=0.000 n=10+10)

name                               old alloc/op   new alloc/op   delta
NestedAndPlanning/renders=16-24      74.0kB ±20%    68.9kB ±10%     ~     (p=0.053 n=10+9)
NestedAndPlanning/renders=256-24     1.71MB ± 0%    0.60MB ± 0%  -65.07%  (p=0.000 n=8+8)
NestedAndPlanning/renders=4096-24     303MB ± 0%      13MB ± 1%  -95.58%  (p=0.000 n=8+8)

name                               old allocs/op  new allocs/op  delta
NestedAndPlanning/renders=16-24         754 ±18%       733 ±18%     ~     (p=0.105 n=9+9)
NestedAndPlanning/renders=256-24      6.44k ± 0%     5.93k ± 0%   -7.88%  (p=0.000 n=8+8)
NestedAndPlanning/renders=4096-24      146k ± 6%      136k ± 0%   -7.02%  (p=0.000 n=8+8)
```

Fixes: #104996.

Release note (bug fix): Previously, when planning expressions containing many sub-expressions (e.g. deeply-nested AND / OR structures), CockroachDB would use memory quadratic in the number of sub-expressions, and in the worst cases (thousands of sub-expressions) this could lead to OOMs. The bug has been present since at least 22.1 and has now been fixed.

Co-authored-by: Yahor Yuzefovich <[email protected]>
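The rule described in that commit message, append freely to the schema in scope and copy only when projecting, can be sketched roughly as follows (illustrative names and string "types"; the real code works with `[]*types.T` inside `NewColOperator`):

```go
package colschema

// appendColumn extends the type schema in place; per the commit above,
// appending to the slice in scope is considered safe because no earlier
// slot is rewritten.
func appendColumn(schema []string, typ string) []string {
	return append(schema, typ)
}

// projectColumns reorders or drops columns, so it allocates a fresh slice
// instead of overwriting slots that another slice may still be aliasing.
func projectColumns(schema []string, cols []int) []string {
	out := make([]string, len(cols))
	for i, c := range cols {
		out[i] = schema[c]
	}
	return out
}
```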
blathers-crl bot pushed a commit that referenced this issue Jul 21, 2023
yuzefovich added a commit that referenced this issue Jul 27, 2023
yuzefovich added a commit to yuzefovich/cockroach that referenced this issue Jul 27, 2023