Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: race in AssignPlaceholders #34904

Closed
cockroach-teamcity opened this issue Feb 14, 2019 · 7 comments · Fixed by #35027
Closed

opt: race in AssignPlaceholders #34904

cockroach-teamcity opened this issue Feb 14, 2019 · 7 comments · Fixed by #35027
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

SHA: https://github.com/cockroachdb/cockroach/commits/acba091f04f3d8ecabf51009bf394951fbd3643c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/jobcoordinator/nodes=6 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1137872&tab=buildLog

The test failed on master:
	cluster.go:1585,scaledata.go:126,scaledata.go:53,test.go:1212: unexpected node event: 2: dead

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Feb 14, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Feb 14, 2019
@nvanbenschoten
Copy link
Member

fatal error: concurrent map read and map write

goroutine 400 [running]:
runtime.throw(0x317b02d, 0x21)
	/usr/local/go/src/runtime/panic.go:608 +0x72 fp=0xc0064b5518 sp=0xc0064b54e8 pc=0x709352
runtime.mapaccess2_fast64(0x2cfd160, 0xc00733e540, 0x300000000, 0x3, 0x1)
	/usr/local/go/src/runtime/map_fast64.go:61 +0x1a8 fp=0xc0064b5540 sp=0xc0064b5518 pc=0x6edc68
github.com/cockroachdb/cockroach/pkg/sql/opt/props.(*ColStatsMap).Lookup(0xc007229cc8, 0x8, 0x0, 0x0, 0x3922de0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/props/col_stats_map.go:140 +0xae fp=0xc0064b5598 sp=0xc0064b5540 pc=0x1d5b12e
github.com/cockroachdb/cockroach/pkg/sql/opt/props.(*ColStatsMap).Add(0xc007229cc8, 0x8, 0x0, 0x3904000, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/props/col_stats_map.go:154 +0x43 fp=0xc0064b5600 sp=0xc0064b5598 pc=0x1d5b2e3
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatLeaf(0xc0072583a8, 0x8, 0x0, 0xc007229cc0, 0xc00733e210, 0x16, 0x0, 0x1d6e464)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:357 +0x60 fp=0xc0064b56c0 sp=0xc0064b5600 pc=0x1de6f90
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatTable(0xc0072583a8, 0x100000001, 0x8, 0x0, 0x306d200)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:462 +0xcf fp=0xc0064b5720 sp=0xc0064b56c0 pc=0x1de793f
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatScan(0xc0072583a8, 0x8, 0x0, 0xc0070e7b70, 0x38f7700)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:523 +0x6a fp=0xc0064b5778 sp=0xc0064b5720 pc=0x1de7e8a
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStat(0xc0072583a8, 0x8, 0x0, 0x3921620, 0xc0070e7b70, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:290 +0x6ee fp=0xc0064b57e8 sp=0xc0064b5778 pc=0x1de6a3e
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatFromChild(0xc0072583a8, 0x8, 0x0, 0x39216c0, 0xc0070c5610, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:197 +0xfe fp=0xc0064b5850 sp=0xc0064b57e8 pc=0x1de58ee
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatFromInput(0xc0072583a8, 0x8, 0x0, 0x39216c0, 0xc0070c5610, 0x8)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:217 +0x6bb fp=0xc0064b5900 sp=0xc0064b5850 pc=0x1de616b
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).ensureColStat(0xc0072583a8, 0x8, 0x0, 0x4008000000000000, 0x39216c0, 0xc0070c5610, 0xc0070c54a8, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:1964 +0xc6 fp=0xc0064b5948 sp=0xc0064b5900 pc=0x1dee9b6
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).updateDistinctCountsFromConstraint(0xc0072583a8, 0xc0062ae0c0, 0x39216c0, 0xc0070c5610, 0xc0070c54a8, 0x1d705c4)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:2374 +0x7ad fp=0xc0064b5b88 sp=0xc0064b5948 pc=0x1defc9d
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).applyConstraintSet(0xc0072583a8, 0xc0062ae0c0, 0x39216c0, 0xc0070c5610, 0xc0070c54a8, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:2224 +0xee fp=0xc0064b5bd8 sp=0xc0064b5b88 pc=0x1def33e
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).applyFilter.func1(0xc0072b8900)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:2171 +0x1a0 fp=0xc0064b5c50 sp=0xc0064b5bd8 pc=0x1df4bf0
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).applyFilter(0xc0072583a8, 0xc0072b8900, 0x2, 0x2, 0x39216c0, 0xc0070c5610, 0xc0070c54a8, 0x0, 0x8, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:2183 +0xd0 fp=0xc0064b5ca8 sp=0xc0064b5c50 pc=0x1def020
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).buildSelect(0xc0072583a8, 0xc0070c5610, 0xc0070c54a8)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:590 +0x1bf fp=0xc0064b5d98 sp=0xc0064b5ca8 pc=0x1de839f
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*logicalPropsBuilder).buildSelectProps(0xc007258398, 0xc0070c5610, 0xc0070c54a8)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/logical_props_builder.go:236 +0x3c1 fp=0xc0064b5e20 sp=0xc0064b5d98 pc=0x1ddd381
github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*Memo).MemoizeSelect(0xc0072582c0, 0x3921620, 0xc0070e7b70, 0xc0072b8900, 0x2, 0x2, 0xc0072b3b30, 0xc0064b6020)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/expr.og.go:12016 +0x14b fp=0xc0064b5e60 sp=0xc0064b5e20 pc=0x1d9d02b
github.com/cockroachdb/cockroach/pkg/sql/opt/norm.(*Factory).ConstructSelect(0xc0062f5b90, 0x3921620, 0xc0070e7b70, 0xc0072b8900, 0x2, 0x2, 0x2, 0x2030ace)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/norm/factory.og.go:735 +0x50a1 fp=0xc0064b6490 sp=0xc0064b5e60 pc=0x1e31841
github.com/cockroachdb/cockroach/pkg/sql/opt/norm.(*Factory).assignPlaceholders(0xc0062f5b90, 0x38f7800, 0xc006f2bf70, 0x38f7800, 0xc006f2bf70)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/norm/factory.og.go:14612 +0x4cfe fp=0xc0064b6888 sp=0xc0064b6490 pc=0x1e9dd0e
github.com/cockroachdb/cockroach/pkg/sql/opt/norm.(*Factory).AssignPlaceholders(0xc0062f5b90, 0xc006a2ba20, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/norm/factory.go:192 +0x4b8 fp=0xc0064b6940 sp=0xc0064b6888 pc=0x1e2b478
github.com/cockroachdb/cockroach/pkg/sql.(*optPlanningCtx).reuseMemo(0xc0062aa3f0, 0xc006a2ba20, 0xc0072b3830, 0x3153678, 0x13)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:291 +0x74 fp=0xc0064b6980 sp=0xc0064b6940 pc=0x201cde4
github.com/cockroachdb/cockroach/pkg/sql.(*optPlanningCtx).buildExecMemo(0xc0062aa3f0, 0x38e9280, 0xc0072b3830, 0xc0072cc6c0, 0x38e9280, 0xc0072b3830, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:327 +0x814 fp=0xc0064b6a88 sp=0xc0064b6980 pc=0x201d694
github.com/cockroachdb/cockroach/pkg/sql.(*planner).makeOptimizerPlan(0xc0062f5458, 0x38e9280, 0xc0072b3830, 0x316290d, 0x19, 0xc63de47d7, 0x14904cd0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:145 +0xc8 fp=0xc0064b6b08 sp=0xc0064b6a88 pc=0x201c198
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).makeExecPlan(0xc0062f5000, 0x38e9280, 0xc0072b3830, 0xc0062f5458, 0x6, 0x6e8b1f)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:930 +0x11b fp=0xc0064b6bc8 sp=0xc0064b6b08 pc=0x1f41e1b
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine(0xc0062f5000, 0x38e9280, 0xc0072b3830, 0xc0062f5458, 0x7f1777fa0808, 0xc0072b0840, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:825 +0x148 fp=0xc0064b6cc8 sp=0xc0064b6bc8 pc=0x1f41148
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState(0xc0062f5000, 0x38e9280, 0xc0072b3830, 0x38eecc0, 0xc0072cc6c0, 0xc006357522, 0x65, 0x4, 0xc0062c37c0, 0x8, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:438 +0xe49 fp=0xc0064b74a0 sp=0xc0064b6cc8 pc=0x1f3d929
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt(0xc0062f5000, 0x38e9280, 0xc0072b3830, 0x38eecc0, 0xc0072cc6c0, 0xc006357522, 0x65, 0x4, 0xc0062c37c0, 0x8, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:93 +0x36f fp=0xc0064b7660 sp=0xc0064b74a0 pc=0x1f3c63f
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run(0xc0062f5000, 0x38e91c0, 0xc00619b440, 0xc00058ce78, 0x5400, 0x15000, 0xc00058cf10, 0xc006098e90, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1241 +0x1439 fp=0xc0064b7ec8 sp=0xc0064b7660 pc=0x1f32969
github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn(0xc000c26340, 0x38e91c0, 0xc00619b440, 0xc0062f5000, 0x5400, 0x15000, 0xc00058cf10, 0xc006098e90, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:429 +0xce fp=0xc0064b7f28 sp=0xc0064b7ec8 pc=0x1f2e04e
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl.func4(0xc000c26340, 0xc005e2f600, 0xc006117534, 0xc006098e90, 0x38e91c0, 0xc00619b440, 0xc0062f5000, 0x5400, 0x15000, 0xc00058cf10, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:331 +0xcf fp=0xc0064b7f88 sp=0xc0064b7f28 pc=0x23d152f
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc0064b7f90 sp=0xc0064b7f88 pc=0x739701
created by github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:314 +0x1040

@nvanbenschoten nvanbenschoten added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Feb 14, 2019
@nvanbenschoten
Copy link
Member

@andy-kimball looks like access to the ColStatsMap isn't properly synchronized.

@andy-kimball
Copy link
Contributor

It shouldn't need to be synchronized. I suspect this is related to query caching. +@RaduBerinde

@RaduBerinde
Copy link
Member

@nvanbenschoten how did you get that log? I'd like to look at the other goroutines

@RaduBerinde RaduBerinde self-assigned this Feb 14, 2019
@RaduBerinde
Copy link
Member

I think I found the problem - the TableAnnotations in the Metadata need to be cleared (or perhaps copied) when we copy the metadata. Otherwise the two copies might have the same annotation, and they could independently try to compute new stats on the same *props.Statistics object.

@RaduBerinde
Copy link
Member

I will work on this. The fix should be easy, modulo writing a test.

@RaduBerinde RaduBerinde changed the title roachtest: scaledata/jobcoordinator/nodes=6 failed opt: race in AssignPlaceholders Feb 16, 2019
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 16, 2019
Test that reproduces cockroachdb#34904. Writing the test exposed a problem in the
`CopyAndReplace` API: there was no way to recursively copy a subtree
when creating an internal node. The API was reverted to the first
iteration Andy had in his PR.

Release note: None
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 16, 2019
When we assign placeholders on a detached memo, we copy the metadata.
However, this does a shallow copy of table annotations; this is
problematic for the statistics annotation which is a mutable object.

The fix is to register a copy function for each type of table
annotation as part of `NewTableAnnID`.

Fixes cockroachdb#34904.

Release note (bug fix): Fixed a crash related to cached plans.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 19, 2019
Test that reproduces cockroachdb#34904. Writing the test exposed a problem in the
`CopyAndReplace` API: there was no way to recursively copy a subtree
when creating an internal node. The API was reverted to the first
iteration Andy had in his PR.

Release note: None
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 19, 2019
When we assign placeholders on a detached memo, we copy the metadata.
However, this does a shallow copy of table annotations; this is
problematic for the statistics annotation which is a mutable object.

The fix is to register a copy function for each type of table
annotation as part of `NewTableAnnID`.

Fixes cockroachdb#34904.

Release note (bug fix): Fixed a crash related to cached plans.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 19, 2019
When we assign placeholders on a detached memo, we copy the metadata.
However, this does a shallow copy of table annotations; this is
problematic for the statistics annotation which is a mutable object.

The fix is to register a copy function for each type of table
annotation as part of `NewTableAnnID`.

Fixes cockroachdb#34904.

Release note (bug fix): Fixed a crash related to cached plans.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 20, 2019
Test that reproduces cockroachdb#34904. Writing the test exposed a problem in the
`CopyAndReplace` API: there was no way to recursively copy a subtree
when creating an internal node. The API was reverted to the first
iteration Andy had in his PR.

Release note: None
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 20, 2019
When we assign placeholders on a detached memo, we copy the metadata.
However, this does a shallow copy of table annotations; this is
problematic for the statistics annotation which is a mutable object.

The fix is to register a copy function for each type of table
annotation as part of `NewTableAnnID`. For statistics, we simply clear
out the annotation because it can be recalculated as needed (and it
turns out to be faster than copying it).

Fixes cockroachdb#34904.

Release note (bug fix): Fixed a crash related to cached plans.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 20, 2019
When we assign placeholders on a detached memo, we copy the metadata.
However, this does a shallow copy of table annotations; this is
problematic for the statistics annotation which is a mutable object.

The fix is to register a copy function for each type of table
annotation as part of `NewTableAnnID`. For statistics, we simply clear
out the annotation because it can be recalculated as needed (and it
turns out to be faster than copying it).

Fixes cockroachdb#34904.

Release note (bug fix): Fixed a crash related to cached plans.
craig bot pushed a commit that referenced this issue Feb 21, 2019
35027: opt: fix race caused by shared table annotations r=RaduBerinde a=RaduBerinde

#### opt: test reproducing detached memo race

Test that reproduces #34904. Writing the test exposed a problem in the
`CopyAndReplace` API: there was no way to recursively copy a subtree
when creating an internal node. The API was reverted to the first
iteration Andy had in his PR.

Release note: None

#### opt: replace ColumnMeta.TableMeta with a TableID

The TableMeta field is problematic: it needs to be fixed up when
copying Metadata; and since it points into a slice that we append to,
it may or may not be aliased with the corresponding entry in `tables`.
Avoiding these complication by just storing the TabeID. The `md`
backpointer is also removed and QualifiedAlias is moved to Metadata.

Release note: None

#### opt: fix race caused by shared table annotations

When we assign placeholders on a detached memo, we copy the metadata.
However, this does a shallow copy of table annotations; this is
problematic for the statistics annotation which is a mutable object.

The fix is to register a copy function for each type of table
annotation as part of `NewTableAnnID`.

Fixes #34904.

Release note (bug fix): Fixed a crash related to cached plans.


Co-authored-by: Radu Berinde <[email protected]>
@craig craig bot closed this as completed in #35027 Feb 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants