Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: import/mixed-versions failed #87837

Closed
cockroach-teamcity opened this issue Sep 12, 2022 · 11 comments
Closed

roachtest: import/mixed-versions failed #87837

cockroach-teamcity opened this issue Sep 12, 2022 · 11 comments
Labels
A-kv Anything in KV that doesn't belong in a more specific category. branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). no-test-failure-activity O-roachtest O-robot Originated from a bot. T-kv KV Team X-stale
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Sep 12, 2022

roachtest.import/mixed-versions failed with artifacts on master @ 773568fbda06ba9be9fb1bc34a331f21c8891ffa:

		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:917
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1594
		Wraps: (2) output in run_052436.680610357_n1_cockroach_workload_fixtures_import_tpcc
		Wraps: (3) ././cockroach workload fixtures import tpcc --warehouses=100  returned
		  | stderr:
		  | <... some data truncated by circular buffer; go to artifacts for details ...>
		  | [-] 10 +	github.com/lib/pq/external/com_github_lib_pq/conn.go:1856 +0x46
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +github.com/lib/pq.(*stmt).exec(0xc001a76dc0, {0xc000b33180, 0x4, 0x4})
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	github.com/lib/pq/external/com_github_lib_pq/conn.go:1448 +0x793
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +github.com/lib/pq.(*conn).query(0xc000360900, {0xc00062ea80, 0x3f}, {0xc000b33180, 0x4, 0x4})
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	github.com/lib/pq/external/com_github_lib_pq/conn.go:903 +0x312
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +github.com/lib/pq.(*conn).QueryContext(0xc?, {0x657f838, 0xc000076098}, {0xc00062ea80, 0x3f}, {0xc001a76d20, 0x4, 0xc001a76c80?})
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	github.com/lib/pq/external/com_github_lib_pq/conn_go18.go:24 +0xd7
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +database/sql.ctxDriverQuery({0x657f838?, 0xc000076098?}, {0x7f6d7b5c5660?, 0xc000360900?}, {0x0?, 0x0?}, {0xc00062ea80?, 0x37ee525?}, {0xc001a76d20, 0x4, ...})
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	GOROOT/src/database/sql/ctxutil.go:48 +0xe5
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +database/sql.(*DB).queryDC.func1()
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	GOROOT/src/database/sql/sql.go:1757 +0x175
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +database/sql.withLock({0x6556568, 0xc001296b40}, 0xc0013b5a40)
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	GOROOT/src/database/sql/sql.go:3439 +0x8c
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +database/sql.(*DB).queryDC(0x461501?, {0x657f838, 0xc000076098}, {0x0, 0x0}, 0xc001296b40, 0xc000b10fd0, {0xc00062ea80, 0x3f}, {0xc0005ed840, ...})
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	GOROOT/src/database/sql/sql.go:1752 +0x211
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +database/sql.(*DB).query(0x0?, {0x657f838, 0xc000076098}, {0xc00062ea80, 0x3f}, {0xc0005ed840, 0x4, 0x4}, 0x29?)
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	GOROOT/src/database/sql/sql.go:1735 +0xfd
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +database/sql.(*DB).QueryContext(0x653c0c0?, {0x657f838, 0xc000076098}, {0xc00062ea80, 0x3f}, {0xc0005ed840, 0x4, 0x4})
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	GOROOT/src/database/sql/sql.go:1708 +0xda
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +database/sql.(*DB).Query(...)
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	GOROOT/src/database/sql/sql.go:1726
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +github.com/cockroachdb/cockroach/pkg/ccl/workloadccl.importFixtureTable({0x657f800, 0xc000b5e1c0}, 0x0?, {0x0, 0x0}, {{0x510e0a6, 0x7}, {0xc000618780, 0x131}, {0x2dc6c0, ...}, ...}, ...)
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	github.com/cockroachdb/cockroach/pkg/ccl/workloadccl/fixture.go:452 +0x3fb
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +github.com/cockroachdb/cockroach/pkg/ccl/workloadccl.ImportFixture.func1({0x657f800?, 0xc000b5e1c0?})
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	github.com/cockroachdb/cockroach/pkg/ccl/workloadccl/fixture.go:400 +0xab
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1()
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:168 +0x25
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +golang.org/x/sync/errgroup.(*Group).Go.func1()
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:74 +0x64
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +created by golang.org/x/sync/errgroup.(*Group).Go
		  | I220912 15:23:33.202774 54 util/log/clog.go:409  [-] 10 +	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:71 +0xa5
		  |
		  | stdout:
		Wraps: (4) SSH_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ././cockroach workload fixtures import tpcc --warehouses=100
		  | ``````
		Wraps: (6) exit status 255
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.SSH (5) *hintdetail.withDetail (6) *exec.ExitError

	test_runner.go:1048,test_runner.go:947: test timed out (0s)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

/cc @cockroachdb/bulk-io

This test on roachdash | Improve this report!

Jira issue: CRDB-19560

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 12, 2022
@cockroach-teamcity cockroach-teamcity added this to the 22.2 milestone Sep 12, 2022
@stevendanna
Copy link
Collaborator

Node 1 has an AdminSplit that has been taking its time:

I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹goroutine 2416 [select, 597 minutes]:›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc0077eb560)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   google.golang.org/grpc/internal/transport/external/org_golang_google_grpc/internal/transport/transport.go:324 +0x7c›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   google.golang.org/grpc/internal/transport/external/org_golang_google_grpc/internal/transport/transport.go:339›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹google.golang.org/grpc.(*csAttempt).recvMsg(0xc0043771e0, {0x4f98840?, 0xc006e59e80}, 0x0?)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   google.golang.org/grpc/external/org_golang_google_grpc/stream.go:969 +0xbb›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹google.golang.org/grpc.(*clientStream).RecvMsg.func1(0x93?)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   google.golang.org/grpc/external/org_golang_google_grpc/stream.go:834 +0x25›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹google.golang.org/grpc.(*clientStream).withRetry(0xc0077eb320, 0xc0019b6580, 0xc0019b6550)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   google.golang.org/grpc/external/org_golang_google_grpc/stream.go:692 +0xd3›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹google.golang.org/grpc.(*clientStream).RecvMsg(0xc0077eb320, {0x4f98840?, 0xc006e59e80?})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   google.golang.org/grpc/external/org_golang_google_grpc/stream.go:833 +0x11f›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹google.golang.org/grpc.invoke({0x657f8a8?, 0xc007027800?}, {0x5199a9c?, 0x2?}, {0x50c4180, 0xc00bc9f0e0}, {0x4f98840, 0xc006e59e80}, 0x0?, {0xc007e683a0, ...})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   google.golang.org/grpc/external/org_golang_google_grpc/call.go:73 +0xd7›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor.ClientInterceptor.func2({0x657f8a8, 0xc007027800}, {0x5199a9c, 0x21}, {0x50c4180, 0xc00bc9f0e0}, {0x4f98840, 0xc006e59e80}, 0x55ef038a194f25a7?, 0x5442298, ...)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:247 +0x38a›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹google.golang.org/grpc.(*ClientConn).Invoke(0x2fa31c93974157ea?, {0x657f8a8?, 0xc007027800?}, {0x5199a9c?, 0x0?}, {0x50c4180?, 0xc00bc9f0e0?}, {0x4f98840?, 0xc006e59e80?}, {0x0, ...})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   google.golang.org/grpc/external/org_golang_google_grpc/call.go:35 +0x223›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/roachpb.(*internalClient).Batch(0xc005c50da0, {0x657f8a8, 0xc007027800}, 0x0?, {0x0, 0x0, 0x0})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/roachpb/bazel-out/k8-opt/bin/pkg/roachpb/roachpb_go_proto_/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:9295 +0xc9›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/rpc/nodedialer.TracingInternalClient.Batch({{0x65d0ab0?, 0xc005c50da0?}}, {0x657f8a8, 0xc007027800}, 0xc00bc9f0e0, {0x0, 0x0, 0x0})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/rpc/nodedialer/nodedialer.go:328 +0x171›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*grpcTransport).sendBatch(0xc0044f87e0, {_, _}, _, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, ...}, ...})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport.go:211 +0x15e›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*grpcTransport).SendNext(0xc0044f87e0, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x171405a3c3754c45, 0x0, 0x0}, ...}, ...})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport.go:190 +0x158›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendToReplicas(0xc0016b2500, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x171405a3c3754c45, 0x0, 0x0}, ...}, ...}, ...)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:2069 +0x110f›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendPartialBatch(_, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x171405a3c3754c45, 0x0, 0x0}, ...}, ...}, ...)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1613 +0x805›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).divideAndSendBatchToRanges(_, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x171405a3c3754c45, 0x0, 0x0}, ...}, ...}, ...)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1216 +0x433›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).Send(0xc0016b2500, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x171405a3c3754c45, 0x0, 0x0}, ...}, ...})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:840 +0x629›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv.(*CrossRangeTxnWrapperSender).Send(0xc004107028, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x0, 0x0, 0x0}, ...}, ...})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/db.go:223 +0xda›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv.(*DB).sendUsingSender(0xc004106fc0, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x0, 0x0, 0x0}, ...}, ...}, ...)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/db.go:1000 +0x156›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv.(*DB).send(0x4ee805?, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x0, 0x0, 0x0}, ...}, ...})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/db.go:983 +0x85›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv.sendAndFill({0x657f800, 0xc0069f9e80}, 0x1714062f763e9b78?, 0xc006715180)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/db.go:844 +0x119›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv.(*DB).Run(0xc004106fc0, {0x657f800, 0xc0069f9e80}, 0xc004d73400?)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/db.go:867 +0x74›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv.(*DB).AdminSplit(0x8bb2c97000?, {0x657f800, 0xc0069f9e80}, {0x4ea5b00, 0xc041db6cf0}, {0x5295500?, 0x41?, 0x0?}, {0x0, 0x0, ...})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/db.go:597 +0xb2›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv/bulk.(*SSTBatcher).doFlush(0xc007ba2900, {0x657f800, 0xc0069f9e80}, 0x1)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/bulk/sst_batcher.go:516 +0xfd4›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv/bulk.(*SSTBatcher).flushIfNeeded(0xc007ba2900, {0x657f800, 0xc0069f9e80}, {0xc088c212ef, 0x1d, 0x3a02d11})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/bulk/sst_batcher.go:404 +0x4ae›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv/bulk.(*SSTBatcher).AddMVCCKey(0xc007ba2900, {0x657f800, 0xc0069f9e80}, {{0xc088c212ef, 0x1d, 0x3a02d11}, {0x1714058fe54b286e, 0x0, 0x0}}, {0xc088c2130c, ...})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/bulk/sst_batcher.go:285 +0x5a5›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv/bulk.(*BufferingAdder).doFlush(0xc007ba2900, {0x657f800, 0xc0069f9e80}, 0x1)›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/bulk/buffering_adder.go:281 +0x745›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/kv/bulk.(*BufferingAdder).Add(0xc007ba2900, {0x657f800, 0xc0069f9e80}, {0xc014218990, 0x1d, 0x30}, {0xc0142189c0, 0x2e, 0x2e})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/kv/bulk/buffering_adder.go:178 +0x385›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/sql/importer.ingestKvs.func5({0x657f800, 0xc0069f9e80})›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/sql/importer/import_processor.go:525 +0x44a›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1()›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:168 +0x25›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹golang.org/x/sync/errgroup.(*Group).Go.func1()›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:74 +0x64›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹created by golang.org/x/sync/errgroup.(*Group).Go›
I220912 15:23:33.223121 51 util/log/clog.go:409 ⋮ [-] 2127 +‹   golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:71 +0xa5›

@adityamaru
Copy link
Contributor

adityamaru commented Sep 19, 2022

I'm going to tag KV in giving the above symptom while continuing to look for what AdminSplit is waiting on. Unfortunately, the sendBatch doesn't seem to have the usual nodeID parameter to indicate what we're stuck on.

7/8 imports succeeded, one of the jobs was stuck on an admin split for 10+hours.

@adityamaru adityamaru added A-kv Anything in KV that doesn't belong in a more specific category. T-kv KV Team labels Sep 19, 2022
@stevendanna
Copy link
Collaborator

stevendanna commented Sep 19, 2022

For hours on node 3, I see what looks like 1 split a second on what may be the same key:

39662:I220912 15:23:12.738495 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38007  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36387] (‹manual›)‹›
39663:I220912 15:23:13.363881 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38008  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36388] (‹manual›)‹›
39664:I220912 15:23:14.177744 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38009  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36389] (‹manual›)‹›
39665:I220912 15:23:15.132366 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38010  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36390] (‹manual›)‹›
39666:I220912 15:23:16.179605 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38011  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36391] (‹manual›)‹›
39667:I220912 15:23:16.938684 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38012  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36392] (‹manual›)‹›
39668:I220912 15:23:17.979879 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38013  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36393] (‹manual›)‹›
39669:I220912 15:23:19.445198 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38014  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36394] (‹manual›)‹›
39670:I220912 15:23:20.944152 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38015  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36395] (‹manual›)‹›
39671:I220912 15:23:21.687844 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38016  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36396] (‹manual›)‹›
39672:I220912 15:23:22.962580 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38017  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36397] (‹manual›)‹›
39673:I220912 15:23:23.627525 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38018  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36398] (‹manual›)‹›
39674:I220912 15:23:24.764090 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38019  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36399] (‹manual›)‹›
39675:I220912 15:23:26.201897 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38020  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36400] (‹manual›)‹›
39676:I220912 15:23:26.714246 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38021  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36401] (‹manual›)‹›
39677:I220912 15:23:27.920447 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38022  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36402] (‹manual›)‹›
39678:I220912 15:23:28.626029 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38023  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36403] (‹manual›)‹›
39679:I220912 15:23:29.792226 12136 kv/kvserver/pkg/kv/kvserver/replica_command.go:420 ⋮ [n3,s3,r367/3:‹/Table/17{3/1/44/"…-4}›] 38024  initiating a split of this range at key ‹/Table/173/1/63/"\xa1\xee@+\xb0\xcfH\x00\x80\x00\x...› [r36404] (‹manual›)‹›

@adityamaru
Copy link
Contributor

Related to what Steven mentions above, node 3 is on the older binary version so 22.1 in this case. This is an interesting stack though it has not been spinning for long:

cockroach.stderr.log-* goroutine 12136 [select]:
cockroach.stderr.log-* github.com/cockroachdb/cockroach/pkg/util/retry.(*Retry).Next(0xc0023b5640)
cockroach.stderr.log-*  github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:127 +0x14a
cockroach.stderr.log-* github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeAdminCommandWithDescriptor(0x0, {0x6397238, 0xc001fd7620}, 0xc0023b5738)
cockroach.stderr.log-*  github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_command.go:536 +0x199
cockroach.stderr.log:* github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).AdminSplit(0x1, {0x6397238, 0xc001fd7620}, {{{0xc003f90f80, 0x1c, 0x20}, {0x0, 0x0, 0x0}, 0x0}, ...}, ...)
cockroach.stderr.log-*  github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_command.go:72 +0x175
cockroach.stderr.log-* github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeAdminBatch(0xc001eacc00, {0x6397238, 0xc001fd75f0}, 0xc001687400)
cockroach.stderr.log-*  github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:915 +0xda8
cockroach.stderr.log-* github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sendWithoutRangeID(0xc001eacc00, {0x6397238, 0xc001fd75c0}, 0xc001687400)
cockroach.stderr.log-*  github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:177 +0x44a
cockroach.stderr.log-* github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).Send(...)
cockroach.stderr.log-*  github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:100
cockroach.stderr.log-* github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).Send(_, {_, _}, {{{0x171405a3c41aad61, 0x0, 0x0}, 0xc0007eff20, {0x3, 0x3, 0x3, ...}, ...}, ...})
cockroach.stderr.log-*  github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_send.go:197 +0x745
cockroach.stderr.log-* github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).Send(_, {_, _}, {{{0x0, 0x0, 0x0}, 0x0, {0x3, 0x3, 0x3, ...}, ...}, ...})
cockroach.stderr.log-*  github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:191 +0x1c8
cockroach.stderr.log-* github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1({0x6397238, 0xc001fd7560})

It corresponds to this retry loop https://github.com/cockroachdb/cockroach/blob/v22.1.7/pkg/kv/kvserver/replica_command.go#L536

@stevendanna
Copy link
Collaborator

stevendanna commented Sep 19, 2022

Is it possible we are just constantly racing something else taking the descriptor lease? Seems unlikely that it would keep happening as we update are lease just before attempting the split.

https://github.com/cockroachdb/cockroach/blob/v22.1.7/pkg/kv/kvserver/replica_command.go#L559

I wonder if our request is such that we are now always going to get a ConditionFailed back for some reason...

@aayushshah15
Copy link
Contributor

@stevendanna mind taking a look at ^. Sounds reasonable to have that generally even after this issue is resolved.

@nvanbenschoten
Copy link
Member

Will we need to backport that for it to work in this test? It sounds like the retry loop was failing to terminate on a v22.1 node.

@aayushshah15
Copy link
Contributor

I think we should backport to both 22.1 and 22.2, if that patch looks reasonable to you.

aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Sep 20, 2022
craig bot pushed a commit that referenced this issue Sep 21, 2022
88288: kvserver: log retriable errors received during splits r=aayushshah15 a=aayushshah15

Informs #87837

Release note: None

Co-authored-by: Aayush Shah <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Sep 21, 2022
blathers-crl bot pushed a commit that referenced this issue Sep 21, 2022
@exalate-issue-sync exalate-issue-sync bot added blocks-22.2.0-beta.1 release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team and removed T-disaster-recovery GA-blocker T-kv KV Team labels Sep 27, 2022
craig bot pushed a commit that referenced this issue Sep 27, 2022
88752: kvserver: log details about `ConditionFailedError` encountered by splits r=aayushshah15 a=aayushshah15

Informs #87837

Release justification: logging only change

Release note: None

88790: logictest: remove a duplicate query r=yuzefovich a=yuzefovich

This commit removes a single test query since it is an exact duplicate of another one about 100 lines up in the file. There is also no change in the session variables between the two spots.

Release note: None

88814: changefeedccl: Fix array encoding avro bug. r=miretskiy a=miretskiy

Fix latent array avro encoding bug where previously allocated memo array might contain 'nil' element, while the code assumed that the element must always be a map.

Release note: none.

Co-authored-by: Aayush Shah <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Sep 27, 2022
…lits

Informs #87837

Release justification: logging only change

Release note: None
blathers-crl bot pushed a commit that referenced this issue Sep 27, 2022
…lits

Informs #87837

Release justification: logging only change

Release note: None
@nvanbenschoten
Copy link
Member

The fact that we reached these initiating a split of this range at key log lines indicates that this is a different issue from #88667. @aayushshah15 have you tried to stress this yet to get a feel for the failure rate?

@nvanbenschoten nvanbenschoten added GA-blocker branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. blocks-22.2.0-beta.1 labels Oct 3, 2022
@aayushshah15
Copy link
Contributor

I've tried stressing this a lot more over the last week (~700 runs) and have not been able to repro aside from the one time (before #88835 had landed). That one instance of the stall had the split queue spamming the log message from here on a 22.1 node:

split saw concurrent descriptor modification; maybe retrying

Given that the AdminSplit stall, when it occurs, occurs on the 22.1 node, I'm inclined to suggest that we remove the ga-blocker label from this and let this issue stay open until we see it again in the wild.

@github-actions
Copy link

We have marked this test failure issue as stale because it has been
inactive for 1 month. If this failure is still relevant, removing the
stale label or adding a comment will keep it active. Otherwise,
we'll close it in 5 days to keep the test failure queue tidy.

@lunevalex lunevalex added the X-unactionable This was closed because it was unactionable. label Dec 9, 2022
@exalate-issue-sync exalate-issue-sync bot removed X-unactionable This was closed because it was unactionable. X-stale labels Dec 15, 2022
@pav-kv pav-kv added X-unactionable This was closed because it was unactionable. X-stale labels Dec 20, 2022
@exalate-issue-sync exalate-issue-sync bot removed the X-unactionable This was closed because it was unactionable. label Dec 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). no-test-failure-activity O-roachtest O-robot Originated from a bot. T-kv KV Team X-stale
Projects
None yet
Development

No branches or pull requests

8 participants