server: TestBootstrapNewStore failed #47057

Closed
cockroach-teamcity opened this issue Apr 5, 2020 · 0 comments · Fixed by #47063

@cockroach-teamcity

(server).TestBootstrapNewStore failed on master@beac4a53e0e2e2236eb5957f67abc1bf476ad1b6:

Fatal error:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x2c9eda2]

Stack:

goroutine 66226 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0075eb4d0, 0x4372360, 0xc006cde3c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:183 +0x11f
panic(0x36c10e0, 0x62b47e0)
	/usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/cockroachdb/cockroach/pkg/sql.(*TemporaryObjectCleaner).doTemporaryObjectCleanup(0xc007a9ca80, 0x4372360, 0xc006cde3c0, 0xc00742b860, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/temporary_schema.go:507 +0x7e2
github.com/cockroachdb/cockroach/pkg/sql.(*TemporaryObjectCleaner).Start.func1(0x4372360, 0xc006cde3c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/temporary_schema.go:557 +0x36a
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc006ff1fa0, 0xc0075eb4d0, 0xc007769160)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:198 +0x13e
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:191 +0xa8

Log preceding fatal error

I200405 21:35:05.910505 64987 base/addr_validation.go:342  [n?] web UI certificate addresses: IP=127.0.0.1,::1; DNS=localhost,*.local; CN=node
W200405 21:35:05.940210 64987 server/status/runtime.go:308  [n?] Could not parse build timestamp: parsing time "" as "2006/01/02 15:04:05": cannot parse "" as "2006"
I200405 21:35:05.948407 64987 server/server.go:1083  [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
I200405 21:35:05.948766 64987 storage/rocksdb.go:606  opening rocksdb instance at "/tmp/TestBootstrapNewStore904100588"
I200405 21:35:06.071599 64987 server/config.go:573  [n?] 3 storage engines initialized
I200405 21:35:06.071631 64987 server/config.go:576  [n?] RocksDB cache size: 128 MiB
I200405 21:35:06.071640 64987 server/config.go:576  [n?] store 0: RocksDB, max size 0 B, max open file limit 1043576
I200405 21:35:06.071649 64987 server/config.go:576  [n?] store 1: in-memory, size 0 B
I200405 21:35:06.071657 64987 server/config.go:576  [n?] store 2: in-memory, size 0 B
I200405 21:35:06.072957 64987 server/server.go:1129  [n?] Sleeping till wall time 1586122506072923167 to catches up to 1586122506448379356 to ensure monotonicity. Delta: 375.456189ms
I200405 21:35:06.448957 64987 gossip/gossip.go:395  [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"127.0.0.1:32937" > attrs:<> locality:<> ServerVersion:<major_val:19 minor_val:2 patch:0 unstable:16 > build_tag:"v20.1.0-beta.4-417-gbeac4a5" started_at:1586122506448860460 cluster_name:"" sql_address:<network_field:"tcp" address_field:"127.0.0.1:32971" > 
W200405 21:35:06.455404 66013 kv/kvserver/replica_range_lease.go:554  can't determine lease status due to node liveness error: node not in the liveness table
github.com/cockroachdb/cockroach/pkg/kv/kvserver.init
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/node_liveness.go:44
runtime.doInit
	/usr/local/go/src/runtime/proc.go:5222
runtime.doInit
	/usr/local/go/src/runtime/proc.go:5217
runtime.doInit
	/usr/local/go/src/runtime/proc.go:5217
runtime.main
	/usr/local/go/src/runtime/proc.go:190
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1357
W200405 21:35:06.455460 66013 kv/kvserver/store.go:1541  [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown
I200405 21:35:06.466679 64987 server/node.go:425  [n1] initialized store [n1,s1]: disk (capacity=48 GiB, available=30 GiB, used=74 KiB, logicalBytes=79 KiB), ranges=32, leases=1, queries=0.00, writes=0.00, bytesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=146.00 p90=1808.00 pMax=33135.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}
I200405 21:35:06.467322 64987 kv/kvserver/stores.go:247  [n1] read 0 node addresses from persistent storage
I200405 21:35:06.468332 64987 server/node.go:642  [n1] connecting to gossip network to verify cluster ID...
I200405 21:35:06.468640 64987 server/node.go:662  [n1] node connected via gossip and verified as part of cluster "5436dde6-e0be-4022-b679-66efa22d6fbd"
I200405 21:35:06.477289 64987 server/node.go:617  [n1] bootstrapped store [n1,s2]
I200405 21:35:06.478461 64987 server/node.go:617  [n1] bootstrapped store [n1,s3]
I200405 21:35:06.478953 64987 server/node.go:509  [n1] node=1: started with [<no-attributes>=/tmp/TestBootstrapNewStore904100588 <no-attributes>=<in-mem> <no-attributes>=<in-mem>] engine(s) and attributes []
I200405 21:35:06.479365 64987 server/server.go:1660  [n1] starting https server at 127.0.0.1:41053 (use: 127.0.0.1:41053)
I200405 21:35:06.479643 64987 server/server.go:1665  [n1] starting postgres server at 127.0.0.1:32971 (use: 127.0.0.1:32971)
I200405 21:35:06.479927 64987 server/server.go:1667  [n1] starting grpc server at 127.0.0.1:32937
I200405 21:35:06.480211 64987 server/server.go:1668  [n1] advertising CockroachDB node at 127.0.0.1:32937
I200405 21:35:06.483154 66226 sql/temporary_schema.go:458  [n1] running temporary object cleanup background job
I200405 21:35:06.490393 64987 server/server.go:1801  [n1] done ensuring all necessary migrations have run
I200405 21:35:06.490418 64987 server/server.go:2037  [n1] serving sql connections
I200405 21:35:06.491009 64987 util/stop/stopper.go:539  quiescing
I200405 21:35:06.491188 66244 sqlmigrations/migrations.go:653  [n1] starting wait for upgrade finalization before schema change job migration
W200405 21:35:06.492068 66245 server/node.go:879  [n1] node=1: unable to log node_restart event: log-event: node unavailable; try another peer
I200405 21:35:06.494188 65855 kv/kvserver/queue.go:578  [n1,s1,r6/1:/Table/{SystemCon…-11}] rate limited in MaybeAdd (replicate): node unavailable; try another peer
I200405 21:35:06.494297 65855 kv/kvserver/queue.go:578  [n1,s1,r6/1:/Table/{SystemCon…-11}] rate limited in MaybeAdd (merge): node unavailable; try another peer
W200405 21:35:06.494407 65855 kv/kvserver/store.go:1730  [n1,s1,r6/1:/Table/{SystemCon…-11}] unable to gossip on capacity change: node unavailable; try another peer
W200405 21:35:06.495263 66235 kv/txn.go:603  [n1,liveness-hb] failure aborting transaction: node unavailable; try another peer; abort caused by: result is ambiguous (server shutdown)
I200405 21:35:06.495309 66235 kv/kvserver/node_liveness.go:804  [n1,liveness-hb] retrying liveness update after kvserver.errRetryLiveness: result is ambiguous (server shutdown)
W200405 21:35:06.495363 66235 kv/txn.go:603  [n1,liveness-hb] failure aborting transaction: node unavailable; try another peer; abort caused by: node unavailable; try another peer
W200405 21:35:06.495387 66235 kv/kvserver/node_liveness.go:471  [n1,liveness-hb] failed node liveness heartbeat: node unavailable; try another peer
I200405 21:35:06.495553 66226 sql/temporary_schema.go:492  [n1] found 0 temporary schemas


Parameters:

  • GOFLAGS=-json
make stressrace TESTS=TestBootstrapNewStore PKG=./pkg/server TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Apr 5, 2020
@cockroach-teamcity cockroach-teamcity added this to the 20.1 milestone Apr 5, 2020
@rohany rohany assigned otan and unassigned spencerkimball Apr 6, 2020
otan added a commit to otan-cockroach/cockroach that referenced this issue Apr 6, 2020
In `beac4a53e0e2e2236eb5957f67abc1bf476ad1b6`, we introduced
stopper.ShouldQuiesce() to the retry.Closer so that server shutdowns
also shut down in-process retries to the temp schema cleaner.

However, when stopper.ShouldQuiesce() fires, the error that gets
wrapped in `errors.Wrap` is nil (as ctx.Err() is nil), and as such we
return with no error set. This can cause bugs afterwards, as callers
of the function expect an error when this happens rather than
continuing silently.

This PR bridges that gap by always returning a wrapped error in cases
where WithMaxAttempt is aborted by the context or by the closer.

Resolves cockroachdb#47057.

Release note: None.
craig bot pushed a commit that referenced this issue Apr 6, 2020
47053: sql: add telemetry for statement diagnostics r=RaduBerinde a=RaduBerinde

Add two telemetry counters for statement diagnostics - one when triggered via
the UI, one for EXPLAIN ANALYZE (DEBUG).

Release note: None

47056: workload/schemachange: create new table 90% of the time r=spaskob a=spaskob

Release note (bug fix): we were using an existing table name 100% of
the time when creating a new table which resulted in no tables created.

47063: retry: fix retry.WithMaxAttempt to deal with opt.Closer properly r=knz a=otan

In `beac4a53e0e2e2236eb5957f67abc1bf476ad1b6`, we introduced
stopper.ShouldQuiesce() to the retry.Closer so that server shutdowns
also shut down in-process retries to the temp schema cleaner.

However, when stopper.ShouldQuiesce() fires, the error that gets
wrapped in `errors.Wrap` is nil (as ctx.Err() is nil), and as such we
return with no error set. This can cause bugs afterwards, as callers
of the function expect an error when this happens rather than
continuing silently.

This PR bridges that gap by always returning a wrapped error in cases
where WithMaxAttempt is aborted by the context or by the closer.

Resolves #47057.

Release note: None.

Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Spas Bojanov <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
@craig craig bot closed this as completed in c1df412 Apr 6, 2020