Dead node during running tpc-c (post import) #34226

Closed
awoods187 opened this issue Jan 24, 2019 · 6 comments
Labels
S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors

awoods187 commented Jan 24, 2019

Describe the problem

A node died while running TPC-C on a six-node cluster. In the shell I saw:

Error: error in newOrder: EOF
Error:  exit status 1

To Reproduce

  1. export CLUSTER=andy-base
  2. roachprod create $CLUSTER -n 7 --clouds=aws --aws-machine-type-ssd=c5d.4xlarge
  3. roachprod run $CLUSTER:1-6 -- 'sudo umount /mnt/data1; sudo mount -o discard,defaults,nobarrier /dev/nvme1n1 /mnt/data1/; mount | grep /mnt/data1'
  4. roachprod stage $CLUSTER:1-6 cockroach
  5. roachprod stage $CLUSTER:7 workload
  6. roachprod start $CLUSTER:1-6
  7. roachprod adminurl --open $CLUSTER:1
  8. roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=5000 --db=tpcc"
  9. roachprod run $CLUSTER:7 "./workload run tpcc --ramp=5m --warehouses=4000 --duration=15m --split --scatter {pgurl:1-3}"

Expected behavior
The TPC-C run completes without any node dying.

Additional data / screenshots
[screenshot not preserved]

Environment:
v2.2.0-alpha.20181217-820-g645c0c9

awoods187 commented Jan 24, 2019

ERROR: [n2,client=172.31.32.118:45212,user=root] a panic has occurred!
*
panic while executing 1 statements: SELECT _, _, _ FROM _ WHERE _ IN (_, _, __more5__) ORDER BY _; caused by runtime error: index out of range

jordanlewis commented

This one is pretty wacky. It's an out-of-bounds error inside the generated protobuf code, hit while gRPC was marshalling a ScanRequest to send over the network. Not sure what to make of this...

goroutine 447661 [running]:
runtime/debug.Stack(0x38b4f00, 0xc00e7a84c0, 0xc000000003)
	/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
github.com/cockroachdb/cockroach/pkg/util/log.ReportPanic(0x38b4f00, 0xc00e7a84c0, 0xc0005e5300, 0x2f5da40, 0xc04a9aae40, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/crash_reporting.go:212 +0xa6
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).closeWrapper(0xc0167b8000, 0x38b4f00, 0xc00e7a84c0, 0x2d24ec0, 0x5486fa0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:704 +0x2dd
github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1(0xc0167b8000, 0x38b4f00, 0xc00e7a84c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:423 +0x61
panic(0x2d24ec0, 0x5486fa0)
	/usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).divideAndSendBatchToRanges.func1(0xc002359730, 0xc002359a20, 0xc002359960, 0xc002359a18, 0xc002359697, 0xc002359718, 0xc00235969c)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:807 +0x50e
panic(0x2d24ec0, 0x5486fa0)
	/usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/cockroachdb/cockroach/pkg/roachpb.(*ScanRequest).MarshalTo(0xc0219b1f40, 0xc032607cc2, 0x15, 0x15, 0x17, 0x2, 0x13)
	/go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:10150 +0x17d
github.com/cockroachdb/cockroach/pkg/roachpb.(*RequestUnion_Scan).MarshalTo(0xc03e105368, 0xc032607cc0, 0x17, 0x17, 0xf2b11e, 0xc03e105368, 0x19)
	/go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:13237 +0xdf
github.com/cockroachdb/cockroach/pkg/roachpb.(*RequestUnion).MarshalTo(0xc002358060, 0xc032607cc0, 0x17, 0x17, 0x19, 0xc0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:13138 +0x73
github.com/cockroachdb/cockroach/pkg/roachpb.(*BatchRequest).MarshalTo(0xc07e542380, 0xc032607c00, 0xd7, 0xd7, 0xd7, 0xd7, 0x30c6000)
	/go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:14550 +0x246
github.com/cockroachdb/cockroach/pkg/roachpb.(*BatchRequest).Marshal(0xc07e542380, 0x30c6000, 0xc07e542380, 0x7f0fe2fa2bb8, 0xc07e542380, 0xc0abd16001)
	/go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:14525 +0x7f
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/encoding/proto.codec.Marshal(0x30c6000, 0xc07e542380, 0xc039f56100, 0x3, 0xc00007ca70, 0xc00007ca00, 0x7f0fe3062ac8)
	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/encoding/proto/proto.go:70 +0x19c
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.encode(0x7f0fe2ecd0f8, 0x5912dd8, 0x30c6000, 0xc07e542380, 0xc049683260, 0x38e00c0, 0x38b5d00, 0xc002358288, 0x0)
	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/rpc_util.go:487 +0x5e
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*csAttempt).sendMsg(0xc007f529c0, 0x30c6000, 0xc07e542380, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/stream.go:482 +0xca
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*clientStream).SendMsg(0xc07e542400, 0x30c6000, 0xc07e542380, 0xc005730300, 0x313d4c0)
	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/stream.go:403 +0x43
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.invoke(0x38b4fc0, 0xc0c7b809f0, 0x313d4c0, 0x21, 0x30c6000, 0xc07e542380, 0x301af40, 0xc04b324460, 0xc005730300, 0xc0003fd8c0, ...)
	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/call.go:75 +0xfe
github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc.OpenTracingClientInterceptor.func1(0x38b4fc0, 0xc0c7b809f0, 0x313d4c0, 0x21, 0x30c6000, 0xc07e542380, 0x301af40, 0xc04b324460, 0xc005730300, 0x32614c8, ...)
	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc/client.go:47 +0xb49
github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc005730300, 0x38b4fc0, 0xc0c7b809f0, 0x313d4c0, 0x21, 0x30c6000, 0xc07e542380, 0x301af40, 0xc04b324460, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/call.go:35 +0x109
github.com/cockroachdb/cockroach/pkg/roachpb.(*internalClient).Batch(0xc096d57ad0, 0x38b4fc0, 0xc0c7b809f0, 0xc07e542380, 0x0, 0x0, 0x0, 0xc0c7b809f0, 0x6, 0x38b4fc0)
	/go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:9299 +0xd2
github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).sendBatch(0xc049683140, 0x38b4fc0, 0xc0c7b809f0, 0x6, 0x3889000, 0xc096d57ad0, 0x0, 0x0, 0x600000006, 0x3, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:199 +0x126
github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).SendNext(0xc049683140, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x600000006, 0x3, 0x4139, 0x0, 0xc081ce0d00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:168 +0x130
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendToReplicas(0xc000668400, 0x38b4fc0, 0xc0c7b809f0, 0xc000668450, 0x4139, 0xc0a7f1f7c0, 0x3, 0x3, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:1345 +0x2d3
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendRPC(0xc000668400, 0x38b4fc0, 0xc0c7b809f0, 0x4139, 0xc0a7f1f7c0, 0x3, 0x3, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:400 +0x244
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendSingleRange(0xc000668400, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc081ce0d00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:478 +0x228
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendPartialBatch(0xc000668400, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc081ce0d00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:1121 +0x322
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).divideAndSendBatchToRanges(0xc000668400, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc081ce0d00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:944 +0x8b3
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).Send(0xc000668400, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc095829000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:692 +0x48b
github.com/cockroachdb/cockroach/pkg/kv.(*txnLockGatekeeper).SendLocked(0xc064986fd0, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc095829000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_coord_sender.go:232 +0xe8
github.com/cockroachdb/cockroach/pkg/kv.(*txnMetrics).SendLocked(0xc064986f98, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc095829000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_metrics.go:56 +0xa2
github.com/cockroachdb/cockroach/pkg/kv.(*txnSpanRefresher).sendLockedWithRefreshAttempts(0xc064986f00, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc095829000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_span_refresher.go:160 +0x83
github.com/cockroachdb/cockroach/pkg/kv.(*txnSpanRefresher).SendLocked(0xc064986f00, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc095829000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_span_refresher.go:101 +0xf9
github.com/cockroachdb/cockroach/pkg/kv.(*txnPipeliner).SendLocked(0xc064986e80, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc095829000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_pipeliner.go:165 +0xf9
github.com/cockroachdb/cockroach/pkg/kv.(*txnIntentCollector).SendLocked(0xc064986e40, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc095829000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_intent_collector.go:105 +0x474
github.com/cockroachdb/cockroach/pkg/kv.(*txnSeqNumAllocator).SendLocked(0xc064986f80, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc095829000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_sequence_nums.go:66 +0x23b
github.com/cockroachdb/cockroach/pkg/kv.(*txnHeartbeat).SendLocked(0xc064986d88, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc095829000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_heartbeat.go:248 +0x533
github.com/cockroachdb/cockroach/pkg/kv.(*TxnCoordSender).Send(0xc064986c00, 0x38b4fc0, 0xc0c7b809f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc095829000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_coord_sender.go:650 +0x53b
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).sendUsingSender(0xc0004d0300, 0x38b4fc0, 0xc0c7b809c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:622 +0x119
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).Send(0xc0917ba1b0, 0x38b4fc0, 0xc0c7b809c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:804 +0x13c
github.com/cockroachdb/cockroach/pkg/sql/row.(*txnKVFetcher).fetch(0xc045b8b1e0, 0x38b4fc0, 0xc0c7b809c0, 0xc0003274a0, 0xfe15c186)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/kv_batch_fetcher.go:242 +0x626
github.com/cockroachdb/cockroach/pkg/sql/row.(*txnKVFetcher).nextBatch(0xc045b8b1e0, 0x38b4fc0, 0xc0c7b809c0, 0x0, 0x203031, 0xc000acddd0, 0xc0891cd0e0, 0xc00235ba80, 0x6e8b1f, 0xc0cd1a6af0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/kv_batch_fetcher.go:326 +0x1dd
github.com/cockroachdb/cockroach/pkg/sql/row.(*kvFetcher).nextKV(0xc066efe4d8, 0x38b4fc0, 0xc0c7b809c0, 0x30f53a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/kv_fetcher.go:71 +0x2ef
github.com/cockroachdb/cockroach/pkg/sql/row.(*Fetcher).NextKey(0xc066efe4a0, 0x38b4fc0, 0xc0c7b809c0, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:489 +0x82
github.com/cockroachdb/cockroach/pkg/sql/row.(*Fetcher).StartScanFrom(0xc066efe4a0, 0x38b4fc0, 0xc0c7b809c0, 0x388fac0, 0xc045b8b1e0, 0x0, 0xe14801)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:479 +0x97
github.com/cockroachdb/cockroach/pkg/sql/row.(*Fetcher).StartScan(0xc066efe4a0, 0x38b4fc0, 0xc0c7b809c0, 0xc0917ba1b0, 0xc028871520, 0x7, 0x7, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:470 +0x1e7
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*tableReader).Start(0xc066efe000, 0x38b4fc0, 0xc0c7b808d0, 0x59150c0, 0x2c20f20)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/tablereader.go:253 +0x271
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*ProcessorBase).Run(0xc066efe000, 0x38b4fc0, 0xc0c7b808d0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:801 +0x52
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*Flow).Run(0xc0891cd0e0, 0x38b4fc0, 0xc0c7b808d0, 0x325cc58, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/flow.go:657 +0x209
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run(0xc0004d5b80, 0xc0bc2dd800, 0xc0917ba1b0, 0xc00235c968, 0xc0a2601340, 0xc0167b8518, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:251 +0x8a0
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRun(0xc0004d5b80, 0x38b4fc0, 0xc086d71ec0, 0xc0167b8518, 0xc0bc2dd800, 0xc0917ba1b0, 0x38b6a80, 0xc063190580, 0xc0a2601340)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:793 +0x200
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execWithDistSQLEngine(0xc0167b8000, 0x38b4fc0, 0xc086d71ec0, 0xc0167b8450, 0x3, 0x7f0fcb99a710, 0xc087321440, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1012 +0x26e
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine(0xc0167b8000, 0x38b4fc0, 0xc086d71ec0, 0x38ba980, 0xc0ba051480, 0xc0c7bc23c5, 0x83, 0x0, 0xc085b82b80, 0x3, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:856 +0x6bf
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState(0xc0167b8000, 0x38b4fc0, 0xc086d71ec0, 0x38ba980, 0xc0ba051480, 0xc0c7bc23c5, 0x83, 0x0, 0xc085b82b80, 0x3, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:437 +0xe85
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt(0xc0167b8000, 0x38b4fc0, 0xc086d71ec0, 0x38ba980, 0xc0ba051480, 0xc0c7bc23c5, 0x83, 0x0, 0xc085b82b80, 0x3, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:93 +0x36f
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run(0xc0167b8000, 0x38b4f00, 0xc00e7a84c0, 0xc0008b30b8, 0x5400, 0x15000, 0xc0008b3150, 0xc01b05ffe0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1231 +0x1439
github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn(0xc0006251e0, 0x38b4f00, 0xc00e7a84c0, 0xc0167b8000, 0x5400, 0x15000, 0xc0008b3150, 0xc01b05ffe0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:425 +0xce
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl.func4(0xc0006251e0, 0x38b4f00, 0xc00e7a84c0, 0xc0167b8000, 0x5400, 0x15000, 0xc0008b3150, 0xc01b05ffe0, 0xc01b05fff0, 0xc051bed384)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:319 +0x81
created by github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:318 +0x1033
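
For readers unfamiliar with how a marshalling call can index out of range: in the trace above, `Marshal` calls `MarshalTo` on a buffer it has just allocated. The sketch below is a minimal, deterministic illustration of one way that pattern can fail, assuming the buffer is sized first and filled afterwards, and that the message changes in between. The `scanRequest` type, its one-byte length-prefix encoding, and the `marshal` helper are hypothetical stand-ins, not the generated code in `api.pb.go`, and the thread does not confirm that this is what happened here.

```go
package main

import "fmt"

// scanRequest is a hypothetical stand-in for a generated message type.
type scanRequest struct {
	Key    []byte
	EndKey []byte
}

// Size reports how many bytes MarshalTo will write: a one-byte length
// prefix followed by the bytes of each field (toy encoding, keys < 256 bytes).
func (r *scanRequest) Size() int {
	return 1 + len(r.Key) + 1 + len(r.EndKey)
}

// MarshalTo writes the message into buf and trusts that buf was allocated
// with exactly Size() bytes for the message's current contents.
func (r *scanRequest) MarshalTo(buf []byte) int {
	i := 0
	buf[i] = byte(len(r.Key))
	i++
	i += copy(buf[i:], r.Key)
	buf[i] = byte(len(r.EndKey)) // panics if Key grew after the buffer was sized
	i++
	i += copy(buf[i:], r.EndKey)
	return i
}

// marshal sizes the buffer, runs the optional between hook (standing in for
// a concurrent writer), then fills the buffer. A panic is turned into an error.
func marshal(r *scanRequest, between func()) (out []byte, err error) {
	defer func() {
		if p := recover(); p != nil {
			err = fmt.Errorf("marshal panicked: %v", p)
		}
	}()
	buf := make([]byte, r.Size())
	if between != nil {
		between()
	}
	n := r.MarshalTo(buf)
	return buf[:n], nil
}

func main() {
	r := &scanRequest{Key: []byte("a"), EndKey: []byte("b")}

	_, err := marshal(r, nil)
	fmt.Println("unmodified message:", err)

	// Grow Key after the buffer has been sized but before it is filled.
	_, err = marshal(r, func() { r.Key = []byte("a-much-longer-key") })
	fmt.Println("mutated message:", err)
}
```

The `between` hook makes the interleaving explicit so the failure is reproducible; in a real data race the mutation would come from another goroutine and would only occasionally land in that window.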

tbg commented Jan 24, 2019

The best I can think of is that this is a data race in action. Are any of these objects pooled? If so, that's something to look at.
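
To make the pooling question concrete, here is a small sketch of how a pooled object that is released too early ends up shared between two owners. Everything in it is hypothetical: `request`, `freeList`, and `demo` are illustration-only names, and the hand-rolled free list is used instead of `sync.Pool` only so that reuse is deterministic. Whether any object in the trace is actually pooled is exactly the open question in this comment.

```go
package main

import "fmt"

// request is a hypothetical pooled object.
type request struct {
	Key []byte
}

// freeList is a deliberately simple, single-goroutine pool so that the
// reuse below happens every time (sync.Pool gives no such guarantee).
type freeList struct {
	items []*request
}

// get returns a previously released object if one exists, else a new one.
func (f *freeList) get() *request {
	if n := len(f.items); n > 0 {
		r := f.items[n-1]
		f.items = f.items[:n-1]
		return r
	}
	return &request{}
}

// put resets the object and makes it available to the next caller of get.
func (f *freeList) put(r *request) {
	r.Key = r.Key[:0]
	f.items = append(f.items, r)
}

// demo releases an object while a reference to it is still held, then
// reports whether the next owner received the same object and what the
// first owner now observes in it.
func demo() (same bool, seen string) {
	var pool freeList

	a := pool.get()
	a.Key = append(a.Key, "first"...)
	pool.put(a) // bug: a is still referenced below

	b := pool.get()
	b.Key = append(b.Key, "second-owner"...)

	return a == b, string(a.Key)
}

func main() {
	same, seen := demo()
	fmt.Println("same object:", same)
	fmt.Println("first owner now sees:", seen)
}
```

If the two owners ran on different goroutines, the first could be reading the object (for example, marshalling it) while the second rewrites its fields, which is the kind of race suggested above. Running a test build with Go's `-race` flag is one way to look for that.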

jordanlewis commented

Duplicate of #34241.

awoods187 added the S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors label on Mar 8, 2019