testutils/testcluster: TestRestart failed #111674

Closed
cockroach-teamcity opened this issue Oct 3, 2023 · 3 comments · Fixed by #111776
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-testeng TestEng Team
Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Oct 3, 2023

testutils/testcluster.TestRestart failed with artifacts on master @ 765bda989b6b438b1d552b7d93526ffca5a31923:

        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1670
        	            	  | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.TestRestart
        	            	  | 	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster_test.go:369
        	            	  | testing.tRunner
        	            	  | 	GOROOT/src/testing/testing.go:1576
        	            	  | runtime.goexit
        	            	  | 	src/runtime/asm_amd64.s:1598
        	            	Wraps: (4) secondary error attachment
        	            	  | file 000544 (type 2) unknown to the objstorage provider: file does not exist
        	            	  | (1) attached stack trace
        	            	  |   -- stack trace:
        	            	  |   | github.com/cockroachdb/pebble/objstorage/objstorageprovider.(*provider).Lookup
        	            	  |   | 	github.com/cockroachdb/pebble/objstorage/objstorageprovider/external/com_github_cockroachdb_pebble/objstorage/objstorageprovider/provider.go:404
        	            	  |   | github.com/cockroachdb/pebble.checkConsistency
        	            	  |   | 	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/open.go:1142
        	            	  |   | github.com/cockroachdb/pebble.Open
        	            	  |   | 	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/open.go:327
        	            	  |   | github.com/cockroachdb/cockroach/pkg/storage.NewPebble
        	            	  |   | 	github.com/cockroachdb/cockroach/pkg/storage/pebble.go:1279
        	            	  |   | github.com/cockroachdb/cockroach/pkg/storage.Open
        	            	  |   | 	github.com/cockroachdb/cockroach/pkg/storage/open.go:306
        	            	  |   | github.com/cockroachdb/cockroach/pkg/server.(*Config).CreateEngines
        	            	  |   | 	github.com/cockroachdb/cockroach/pkg/server/config.go:850
        	            	  |   | github.com/cockroachdb/cockroach/pkg/server.NewServer
        	            	  |   | 	github.com/cockroachdb/cockroach/pkg/server/server.go:294
        	            	  |   | github.com/cockroachdb/cockroach/pkg/server.testServerFactoryImpl.New
        	            	  |   | 	github.com/cockroachdb/cockroach/pkg/server/testserver.go:2352
        	            	  |   | github.com/cockroachdb/cockroach/pkg/testutils/serverutils.NewServer
        	            	  |   | 	github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:250
        	            	  |   | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServerWithInspect
        	            	  |   | 	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1736
        	            	  |   | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServer
        	            	  |   | 	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1680
        	            	  |   | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).Restart
        	            	  |   | 	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1670
        	            	  |   | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.TestRestart
        	            	  |   | 	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster_test.go:369
        	            	  |   | testing.tRunner
        	            	  |   | 	GOROOT/src/testing/testing.go:1576
        	            	  |   | runtime.goexit
        	            	  |   | 	src/runtime/asm_amd64.s:1598
        	            	  | Wraps: (2) file 000544 (type 2) unknown to the objstorage provider
        	            	  | Wraps: (3) file does not exist
        	            	  | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString
        	            	Wraps: (5) L6: 000544: file 000544 (type 2) unknown to the objstorage provider: file does not exist
        	            	Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *secondary.withSecondaryError (5) *errutil.leafError
        	Test:       	TestRestart
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/20b387a8d637ed156700095e4968eb33/logTestRestart982583199
--- FAIL: TestRestart (29.26s)

Parameters: TAGS=bazel,gss,deadlock , stress=true

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-32012

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-testeng TestEng Team labels Oct 3, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.2 milestone Oct 3, 2023
@renatolabs
Contributor

        	Error Trace:	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster_test.go:369
        	Error:      	Received unexpected error:
        	            	failed to create engines: L6: 000544: file 000544 (type 2) unknown to the objstorage provider: file does not exist

Unfortunately, I was not able to reproduce this locally under stress. Since the error message and stack trace are coming from Pebble, I'm pinging @cockroachdb/storage on this one -- can you take a look, please?

@itsbilal
Contributor

itsbilal commented Oct 4, 2023

Looking at the Pebble logs for that run, this is interesting: 000544 (assuming the crash is on n3,s3) was created by an excise:

I231003 19:29:09.157654 13072731 3@pebble/event.go:739 ⋮ [n3,s3,pebble] 2825 [JOB 386] ingested L0:000536 (1.3KB), L0:000541 (1.0KB), L0:000537 (1.5KB), L0:000538 (1.1KB), L0:000539 (1.0KB), L6:000544 (7.2KB), L6:000545 (7.2KB)

Then there’s this line when we start up again (unclear what node/store this is though):

I231003 19:29:12.698713 13399173 3@pebble/event.go:735 ⋮ [n?,s?,pebble] 3527 [JOB 3] sstable deleted 000544

Maybe the code that checks for obsolete ssts on Open isn't accounting for virtual sstables? But it's entirely possible it's something else, as I'm stringing together the above theory based on multiple assumptions about the crashing node/store.
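The theory above can be illustrated with a small Go sketch. Everything in it (`tableMeta`, `obsoleteFiles`, `obsoleteNaive`, and the file numbers) is hypothetical and is not Pebble's API or Pebble's actual obsolete-file logic. It assumes only that a virtual sstable has its own table number but reads its data from a shared backing file. Under that assumption, an obsolete-file scan has to decide liveness by backing file number: a scan keyed on table numbers alone would delete a backing file that only virtual sstables still reference, which would produce exactly this kind of "unknown to the objstorage provider" failure on the next Open. The sketch does not establish that this was the actual mechanism here.

```go
package main

import "fmt"

// tableMeta is a hypothetical, simplified table entry. A physical sstable
// is its own backing file; a virtual sstable has its own table number but
// reads from a (possibly shared) backing file.
type tableMeta struct {
	tableNum   uint64
	backingNum uint64
}

// unreferenced returns the on-disk file numbers not present in referenced,
// in the order they appear on disk.
func unreferenced(onDisk []uint64, referenced map[uint64]bool) []uint64 {
	var out []uint64
	for _, n := range onDisk {
		if !referenced[n] {
			out = append(out, n)
		}
	}
	return out
}

// obsoleteNaive keys liveness on each table's own number. Shown only for
// contrast: it treats a backing file used solely by virtual tables as dead.
func obsoleteNaive(onDisk []uint64, live []tableMeta) []uint64 {
	referenced := make(map[uint64]bool, len(live))
	for _, t := range live {
		referenced[t.tableNum] = true
	}
	return unreferenced(onDisk, referenced)
}

// obsoleteFiles keys liveness on backing file numbers, so a shared backing
// file stays live as long as any table (virtual or not) still uses it.
func obsoleteFiles(onDisk []uint64, live []tableMeta) []uint64 {
	referenced := make(map[uint64]bool, len(live))
	for _, t := range live {
		referenced[t.backingNum] = true
	}
	return unreferenced(onDisk, referenced)
}

func main() {
	onDisk := []uint64{100, 101}
	live := []tableMeta{
		{tableNum: 101, backingNum: 101}, // physical sstable
		{tableNum: 102, backingNum: 100}, // virtual sstable backed by 100
		{tableNum: 103, backingNum: 100}, // virtual sstable backed by 100
	}
	fmt.Println(obsoleteNaive(onDisk, live)) // [100]: would wrongly delete 100
	fmt.Println(obsoleteFiles(onDisk, live)) // []: nothing is obsolete
}
```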

@itsbilal
Contributor

itsbilal commented Oct 4, 2023

This is actually cockroachdb/pebble#2947. A vendor bump of Pebble will fix this.

craig bot pushed a commit that referenced this issue Oct 5, 2023
110930: gcjob_test: add more logging to TestGCJobRetry r=rafiss a=andyyang890

This patch adds more logging to `TestGCJobRetry` to help debug
occasional flakes.

Informs: #110447

Release note: None

111510: api: increase timeout to request execution details r=maryliag a=adityamaru

In large clusters requesting execution details can definitely take longer than 5 seconds. This is because it involves collecting cluster wide traces, goroutines and contacting the coordinator node of the job to dump its execution details. Since this is a debug only tool we bump the timeout to 300s to give it all the time it needs.

Epic: none
Release note: None

111701: sql: add tests for privileges for statements in udfs r=rharding6373 a=rharding6373

This PR adds test coverage for privileges in UDFs, e.g., SELECT and INSERT privileges.

Epic: CRDB-25388
Informs: #87289

Release note: None

111704: Roachtest azure nightly r=healthy-pod,herkolategan a=smg260

These are a series of commits to enable roachtests to run on Azure in TeamCity. 

1. Add the relevant teamcity invoke script
2. Update authentication to look in CLI or environment for dev and TC respectively
3. Look for a default subscription in the environment, with fallback to existing "pick first" implementation
4. Add a security rule to allow roachtest host machine to connect to a vm via kafka admin
5. Update azure default location to one with more quota and `apt-get update` before installing go for a cdc test (failed on azure)

A follow up PR will enable an initial set of roachtests to run.

Epic: CC-25185
Release note: none



111776: go.mod: bump Pebble to b013ca78e9dc r=RaduBerinde a=RaduBerinde

b013ca78 db: keep track of virtual sstable size sum
62251e69 db: make checkpoint test even more deterministic
c7c47d6b db: turn testingAlwaysWaitForCleanup into an option
a05b0192 db: keep track of virtual sstable count in metrics
3c778710 db: add test for virtual sstable checkpointing
cb4dab66 db: add metrics for num backing sstables and size
8317cf38 db: incrementally keep tracking of backing table size
0f80e184 Update index.html
aa077af6 go.mod: specify Go 1.20
ccb9a7dc manifest: add invariant check for duplicate file backings
699fc0e8 db: only create one CreatedBackingTables entry per sstable
b2da10c6 db: remove trailing whitespace from compacting log line
1d696c79 db: cleanup btree obsoletion logic

Fixes: #111674
Release note: none

Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: Miral Gadani <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
@craig craig bot closed this as completed in d17831d Oct 5, 2023