Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: TestPebbleMetricEventListener failed #97843

Closed
cockroach-teamcity opened this issue Mar 1, 2023 · 6 comments
Closed

storage: TestPebbleMetricEventListener failed #97843

cockroach-teamcity opened this issue Mar 1, 2023 · 6 comments
Assignees
Labels
branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-storage Storage Team
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Mar 1, 2023

storage.TestPebbleMetricEventListener failed with artifacts on release-22.1 @ 3eda994b45adc98d9c6e972138ad5abb6fdeddea:

=== RUN   TestPebbleMetricEventListener
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/6daa2f8e166e40f81ba9993d3b469ef7/logTestPebbleMetricEventListener359695949
    test_log_scope.go:80: use -show-logs to present logs inline
    pebble_test.go:490: -- test log scope end --
    test_log_scope.go:406: unlinkat /artifacts/tmp/_tmp/6daa2f8e166e40f81ba9993d3b469ef7/logTestPebbleMetricEventListener359695949: directory not empty
--- FAIL: TestPebbleMetricEventListener (0.01s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/storage

This test on roachdash | Improve this report!

Jira issue: CRDB-24915

@cockroach-teamcity cockroach-teamcity added branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Mar 1, 2023
@cockroach-teamcity cockroach-teamcity added this to the 22.1 milestone Mar 1, 2023
@blathers-crl blathers-crl bot added the T-storage Storage Team label Mar 1, 2023
@nicktrav
Copy link
Collaborator

nicktrav commented Mar 1, 2023

This repros pretty quickly on the 22.1 release branch:

$ ./dev test ./pkg/storage --filter TestPebbleMetricEventListener --stress
...
--- FAIL: TestPebbleMetricEventListener (0.02s)
    test_log_scope.go:79: test logs captured to: /tmp/cockroach/_tmp/c5eb8fc8b8e683c19f3c3e4238f64094/logTestPebbleMetricEventListener898986826
    test_log_scope.go:80: use -show-logs to present logs inline
    pebble_test.go:490: -- test log scope end --
    test_log_scope.go:406: unlinkat /tmp/cockroach/_tmp/c5eb8fc8b8e683c19f3c3e4238f64094/logTestPebbleMetricEventListener898986826: directory not empty
FAIL
I230301 15:05:36.861379 1 (gostd) testmain.go:536  [-] 1  Test //pkg/storage:storage_test exited with error code 1


ERROR: <nil>

71 runs completed, 1 failures, over 0s
context canceled
FAIL

@cockroach-teamcity
Copy link
Member Author

storage.TestPebbleMetricEventListener failed with artifacts on release-22.1 @ 3f5627115601dcb7d5ecc125973bb0de5a715030:

=== RUN   TestPebbleMetricEventListener
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/6daa2f8e166e40f81ba9993d3b469ef7/logTestPebbleMetricEventListener523144325
    test_log_scope.go:80: use -show-logs to present logs inline
    pebble_test.go:490: -- test log scope end --

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

storage.TestPebbleMetricEventListener failed with artifacts on release-22.1 @ 14204b0e190f146ca46f2cc16f15d0949fc97bbd:

=== RUN   TestPebbleMetricEventListener
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/6daa2f8e166e40f81ba9993d3b469ef7/logTestPebbleMetricEventListener2829042837
    test_log_scope.go:80: use -show-logs to present logs inline
    pebble_test.go:490: -- test log scope end --

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

storage.TestPebbleMetricEventListener failed with artifacts on release-22.1 @ d21a4a188562e73172c66fa9d57eb1baa6ab149c:

=== RUN   TestPebbleMetricEventListener
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/6daa2f8e166e40f81ba9993d3b469ef7/logTestPebbleMetricEventListener3257151422
    test_log_scope.go:80: use -show-logs to present logs inline
    pebble_test.go:490: -- test log scope end --
    test_log_scope.go:406: unlinkat /artifacts/tmp/_tmp/6daa2f8e166e40f81ba9993d3b469ef7/logTestPebbleMetricEventListener3257151422: directory not empty
--- FAIL: TestPebbleMetricEventListener (0.01s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

storage.TestPebbleMetricEventListener failed with artifacts on release-22.1 @ d078a812ae8a61a7f63ac03fcf43189e3acacaa5:

=== RUN   TestPebbleMetricEventListener
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/6daa2f8e166e40f81ba9993d3b469ef7/logTestPebbleMetricEventListener3618786650
    test_log_scope.go:80: use -show-logs to present logs inline
    pebble_test.go:490: -- test log scope end --
    test_log_scope.go:406: unlinkat /artifacts/tmp/_tmp/6daa2f8e166e40f81ba9993d3b469ef7/logTestPebbleMetricEventListener3618786650: directory not empty
--- FAIL: TestPebbleMetricEventListener (0.01s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

jbowens added a commit to jbowens/cockroach that referenced this issue Mar 27, 2023
Informs cockroachdb#97843.
Release note: None.
Release justification: Non-production code changes.
jbowens added a commit to jbowens/cockroach that referenced this issue Mar 28, 2023
Informs cockroachdb#97843.
Release note: None.
Release justification: Non-production code changes.
@jbowens jbowens self-assigned this May 22, 2023
@jbowens
Copy link
Collaborator

jbowens commented May 31, 2023

This is a benign Engine.Close race. This patch fixes it:

diff --git a/pkg/storage/pebble.go b/pkg/storage/pebble.go
index 5a7203b59da..3b16ede5da9 100644
--- a/pkg/storage/pebble.go
+++ b/pkg/storage/pebble.go
@@ -656,6 +656,8 @@ type Pebble struct {
 	// with the filesyetem
 	closer io.Closer
 
+	closeWait sync.WaitGroup
+
 	wrappedIntentWriter intentDemuxWriter
 
 	storeIDPebbleLog *base.StoreIDContainer
@@ -896,7 +898,11 @@ func NewPebble(ctx context.Context, cfg PebbleConfig) (p *Pebble, err error) {
 	oldDiskSlow := lel.DiskSlow
 	lel.DiskSlow = func(info pebble.DiskSlowInfo) {
 		// Run oldDiskSlow asynchronously.
-		go oldDiskSlow(info)
+		p.closeWait.Add(1)
+		go func() {
+			defer p.closeWait.Done()
+			oldDiskSlow(info)
+		}()
 	}
 	cfg.Opts.EventListener = pebble.TeeEventListener(
 		p.makeMetricEtcEventListener(ctx),
@@ -976,7 +982,11 @@ func (p *Pebble) makeMetricEtcEventListener(ctx context.Context) pebble.EventLis
 					log.Fatalf(ctx, "disk stall detected: pebble unable to write to %s in %.2f seconds",
 						info.Path, redact.Safe(info.Duration.Seconds()))
 				} else {
-					go log.Errorf(ctx, "file write stall detected: %s", info)
+					p.closeWait.Add(1)
+					go func() {
+						defer p.closeWait.Done()
+						log.Errorf(ctx, "file write stall detected: %s", info)
+					}()
 				}
 				return
 			}
@@ -1015,6 +1025,7 @@ func (p *Pebble) Close() {
 		return
 	}
 	p.closed = true
+	p.closeWait.Wait()
 	_ = p.db.Close()
 	if p.fileRegistry != nil {
 		_ = p.fileRegistry.Close()

but seeing as this branch doesn't even get CI anymore, there's no point.

I checked master, and we do something similar to above:

// async launches the provided function in a new goroutine. It uses a wait group
// to synchronize with (*Pebble).Close to ensure all launched goroutines have
// exited before Close returns.
func (p *Pebble) async(fn func()) {
	p.asyncDone.Add(1)
	go func() {
		defer p.asyncDone.Done()
		fn()
	}()
}

Seems like the backport just missed that? Closing out.

@jbowens jbowens closed this as not planned Won't fix, can't repro, duplicate, stale May 31, 2023
@jbowens jbowens moved this to Done in [Deprecated] Storage Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-storage Storage Team
Projects
Archived in project
Development

No branches or pull requests

3 participants