roachtest: cdc/cloud-sink-gcs/rangefeed=true failed #87939

Closed
cockroach-teamcity opened this issue Sep 14, 2022 · 3 comments · Fixed by #88130
Labels
branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). GA-blocker O-roachtest O-robot Originated from a bot. T-cdc

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Sep 14, 2022

roachtest.cdc/cloud-sink-gcs/rangefeed=true failed with artifacts on release-22.2 @ ba1686e00bb140b67ba9ecc154d949795c2555b6:

		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cdc.go:1602
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.cdcBasicTest.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cdc.go:176
		  | main.(*monitorImpl).Go.func1
		  | 	main/pkg/cmd/roachtest/monitor.go:105
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:74
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1594
		Wraps: (2) output in run_064311.926715310_n4_workload_run_tpcc
		Wraps: (3) ./workload run tpcc --warehouses=50 --duration=30m  {pgurl:1-3}  returned
		  | stderr:
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ``````
		  |   | ./workload run tpcc --warehouses=50 --duration=30m  {pgurl:1-3}
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:127,cdc.go:300,cdc.go:773,test_runner.go:908: monitor failure: monitor task failed: dial tcp 34.138.55.232:26257: connect: connection refused
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.cdcBasicTest
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cdc.go:300
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerCDC.func7
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cdc.go:773
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1594
		Wraps: (4) monitor task failed
		Wraps: (5) dial tcp 34.138.55.232:26257
		Wraps: (6) connect
		Wraps: (7) connection refused
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *net.OpError (6) *os.SyscallError (7) syscall.Errno

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-19602

Epic CRDB-11732

@cockroach-teamcity cockroach-teamcity added branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 14, 2022
@cockroach-teamcity cockroach-teamcity added this to the 22.2 milestone Sep 14, 2022
@blathers-crl blathers-crl bot added the T-cdc label Sep 14, 2022
@miretskiy
Contributor

fatal error: concurrent map iteration and map write

goroutine 197 [running]:
runtime.fatal({0x51acc23?, 0x4776840?})
        GOROOT/src/runtime/panic.go:1066 +0x5d fp=0xc0055b5150 sp=0xc0055b5120 pc=0x48dc9d
runtime.mapiternext(0xc0055b52c8)
        GOROOT/src/runtime/map.go:873 +0x45 fp=0xc0055b51c0 sp=0xc0055b5150 pc=0x463b65
runtime.mapiterinit(0x0?, 0x4?, 0x0?)
        GOROOT/src/runtime/map.go:863 +0x236 fp=0xc0055b51e0 sp=0xc0055b51c0 pc=0x463ad6
github.com/cockroachdb/cockroach/pkg/jobs/jobspb.(*ChangefeedDetails).Size(0x0?)
        github.com/cockroachdb/cockroach/pkg/jobs/jobspb/bazel-out/k8-opt/bin/pkg/jobs/jobspb/jobspb_go_proto_/github.com/cockroachdb/cockroach/pkg/jobs/jobspb/jobs.pb.go:10608 +0xcc fp=0xc0055b5338 sp=0xc0055b51e0 pc=0x188c88c
github.com/cockroachdb/cockroach/pkg/sql/execinfrapb.(*ChangeAggregatorSpec).Size(0xc007932870)
        github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/bazel-out/k8-opt/bin/pkg/sql/execinfrapb/execinfrapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/processors_changefeeds.pb.go:493 +0x65 fp=0xc0055b53c0 sp=0xc0055b5338 pc=0x1ed5ce5
github.com/cockroachdb/cockroach/pkg/sql/execinfrapb.(*ProcessorCoreUnion).Size(0x46159f?)
        github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/bazel-out/k8-opt/bin/pkg/sql/execinfrapb/execinfrapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/processors.pb.go:1059 +0x705 fp=0xc0055b54c8 sp=0xc0055b53c0 pc=0x1ea4d45
github.com/cockroachdb/cockroach/pkg/sql/execinfrapb.(*ProcessorSpec).Size(0xc0055b5610)
        github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/bazel-out/k8-opt/bin/pkg/sql/execinfrapb/execinfrapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/processors.pb.go:962 +0x7a fp=0xc0055b55e0 sp=0xc0055b54c8 pc=0x1ea42da
github.com/cockroachdb/cockroach/pkg/sql/execinfrapb.(*FlowSpec).Size(0x7f388b1800a8?)
        github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/bazel-out/k8-opt/bin/pkg/sql/execinfrapb/execinfrapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/api.pb.go:1092 +0x11d fp=0xc0055b57c0 sp=0xc0055b55e0 pc=0x1e727fd
github.com/cockroachdb/cockroach/pkg/sql/execinfrapb.(*SetupFlowRequest).Size(0xc00791a780)
        github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/bazel-out/k8-opt/bin/pkg/sql/execinfrapb/execinfrapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/api.pb.go:1062 +0x3a fp=0xc0055b57f0 sp=0xc0055b57c0 pc=0x1e7251a
github.com/cockroachdb/cockroach/pkg/sql/execinfrapb.(*SetupFlowRequest).Marshal(0x0?)
        github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/bazel-out/k8-opt/bin/pkg/sql/execinfrapb/execinfrapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/execinfrapb/api.pb.go:677 +0x25 fp=0xc0055b5838 sp=0xc0055b57f0 pc=0x1e703e5
github.com/cockroachdb/cockroach/pkg/rpc.codec.Marshal({}, {0x4f1fa60?, 0xc00791a780})
        github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/codec.go:34 +0x93 fp=0xc0055b5880 sp=0xc0055b5838 pc=0x16d7453
github.com/cockroachdb/cockroach/pkg/rpc.(*codec).Marshal(0xc0000b7000?, {0x4f1fa60?, 0xc00791a780?})
        <autogenerated>:1 +0x37 fp=0xc0055b58a0 sp=0xc0055b5880 pc=0x16ea5d7
github.com/cockroachdb/cockroach/pkg/rpc.(*growStackCodec).Marshal(0x0?, {0x4f1fa60?, 0xc00791a780?})
        <autogenerated>:1 +0x34 fp=0xc0055b58c8 sp=0xc0055b58a0 pc=0x16ea774
google.golang.org/grpc.encode({0x7f38949ab970?, 0xc0002a9fc0?}, {0x4f1fa60?, 0xc00791a780?})
        google.golang.org/grpc/external/org_golang_google_grpc/rpc_util.go:594 +0x44 fp=0xc0055b5918 sp=0xc0055b58c8 pc=0xc0ff64
google.golang.org/grpc.prepareMsg({0x4f1fa60?, 0xc00791a780?}, {0x7f38949ab970?, 0xc0002a9fc0?}, {0x0, 0x0}, {0x0, 0x0})
        google.golang.org/grpc/external/org_golang_google_grpc/stream.go:1628 +0xd2 fp=0xc0055b5990 sp=0xc0055b5918 pc=0xc279f2
google.golang.org/grpc.(*clientStream).SendMsg(0xc00778a360, {0x4f1fa60, 0xc00791a780})
        google.golang.org/grpc/external/org_golang_google_grpc/stream.go:797 +0x176 fp=0xc0055b5af0 sp=0xc0055b5990 pc=0xc21ed6

@miretskiy
Contributor

I suspect it might be the new(ish) opts handling code. It seems that the crash happens early on during startup, as part of flow setup. This runs on one goroutine.
To set up the flow, one of the things we do is create a sink; to create the sink we call:
https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/changefeedccl/sink.go#L131-L131
And the above method does:

for key, value := range opts {
	if _, ok := CaseInsensitiveOpts[key]; ok {
		opts[key] = strings.ToLower(value)
	}
}

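Which is not okay: the loop writes to opts in place, and if that map is the one held by ChangefeedDetails, the write can race with the map iteration in Size() shown in the stack trace above.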
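
One way to avoid this kind of in-place mutation is to normalize into a fresh map and leave the caller's map untouched. The following is a minimal sketch of that idea, not necessarily how #88130 addressed it. The names `normalizedCopy` and `caseInsensitiveOpts`, and the option keys used, are hypothetical stand-ins for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// caseInsensitiveOpts is a hypothetical stand-in for the CaseInsensitiveOpts
// set referenced in the snippet above; the keys here are made up.
var caseInsensitiveOpts = map[string]struct{}{
	"format":   {},
	"envelope": {},
}

// normalizedCopy returns a new map in which the values of case-insensitive
// options are lowercased. The input map is only read, never written, so it
// stays safe for other goroutines that are iterating it concurrently.
func normalizedCopy(opts map[string]string) map[string]string {
	out := make(map[string]string, len(opts))
	for key, value := range opts {
		if _, ok := caseInsensitiveOpts[key]; ok {
			value = strings.ToLower(value)
		}
		out[key] = value
	}
	return out
}

func main() {
	shared := map[string]string{"format": "JSON", "topic_prefix": "Foo"}
	local := normalizedCopy(shared)
	// The copy is normalized; the shared map keeps its original value.
	fmt.Println(local["format"], local["topic_prefix"], shared["format"])
}
```

Concurrent reads of a Go map are safe as long as no goroutine writes to it, so a read-only pass over the shared map does not conflict with a marshaler iterating the same map.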

@miretskiy
Contributor

Removing the release-blocker label, but this needs to be fixed before GA.

@miretskiy miretskiy added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 19, 2022
craig bot pushed a commit that referenced this issue Sep 19, 2022
88129: opt: copy ColSet in CreateLocalityOptimizedLookupJoinPrivateIncludingCols r=mgartner a=mgartner

`CreateLocalityOptimizedLookupJoinPrivateIncludingCols` was mutating a
`opt.ColSet` field of another `LookupJoinPrivate` because it was calling
`ColSet.UnionWith` without copying the `ColSet` first. This commit fixes
the bug.

Fixes #88126

Release note: None


88130: changefeedccl: avoid concurrent map access r=[miretskiy] a=HonoreDB

go 1.18 introduced more stringent checks for unsafe concurrent map use, surfacing some new and exciting panics in changefeed code.

When backported, fixes #87939
When backported, fixes #88089
When backported, fixes #87899

Release note (bug fix): Fixed crashes in changefeed code when running on recent go versions.

88134: kvserver: tweak a comment about raft snaps r=nvanbenschoten a=tbg

Suggested by Nathan[^1].

[^1]: #87702 (comment)

Release note: None


Co-authored-by: Marcus Gartner
Co-authored-by: Aaron Zinger
Co-authored-by: Tobias Grieger
@craig craig bot closed this as completed in 03926b3 Sep 19, 2022
HonoreDB added a commit to HonoreDB/cockroach that referenced this issue Sep 20, 2022
go 1.18 introduced more stringent checks for unsafe concurrent map use, surfacing
some new and exciting panics in changefeed code.

When backported, fixes cockroachdb#87939
When backported, fixes cockroachdb#88089
When backported, fixes cockroachdb#87899

Release note (bug fix): Fixed crashes in changefeed code when running on recent go versions.