ccl/serverccl: TestTenantInstanceIDReclaimLoop failed #96414

Closed
cockroach-teamcity opened this issue Feb 2, 2023 · 2 comments · Fixed by #96446
Labels: branch-master (Failures and bugs on the master branch), C-test-failure (Broken test, automatically or manually discovered), O-robot (Originated from a bot)

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Feb 2, 2023

ccl/serverccl.TestTenantInstanceIDReclaimLoop failed with artifacts on master @ 22244a780dcfaca48162dde8e0f90b5ba9b6bb9c:

Fatal error:

panic: concurrent write operations detected on file [recovered]
	panic: concurrent write operations detected on file

Stack:

goroutine 100409 [running]:
github.com/cockroachdb/pebble.(*DB).runCompaction.func1()
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2258 +0x23d
panic({0x7a364c0, 0xbebea90})
	GOROOT/src/runtime/panic.go:890 +0x262
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).timeDiskOp(0xc007b49040, 0x2, 0xc00db3e558)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:262 +0x1a5
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).Sync(0xc007b49040)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:219 +0x69
github.com/cockroachdb/pebble/vfs.(*enospcFile).Sync(0xc00b2669f0)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_full.go:391 +0x6d
github.com/cockroachdb/pebble.(*DB).runCompaction(0xc006d36000, 0x11c, 0xc0028a2d00)
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2836 +0x312f
github.com/cockroachdb/pebble.(*DB).flush1(0xc006d36000)
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:1685 +0x61d
github.com/cockroachdb/pebble.(*DB).flush.func1({0xbf0b230, 0xc00a7293e0})
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:1624 +0x137
runtime/pprof.Do({0xbf0b1c0, 0xc0001b4008}, {{0xc00038dbe0?, 0x1a26f20?, 0xc006bca800?}}, 0xc005e3efa0)
	GOROOT/src/runtime/pprof/runtime.go:40 +0x123
github.com/cockroachdb/pebble.(*DB).flush(0xc006d36000)
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:1617 +0x92
created by github.com/cockroachdb/pebble.(*DB).maybeScheduleFlush
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:1533 +0x186
Log preceding fatal error

=== RUN   TestTenantInstanceIDReclaimLoop
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/c101b7a464a1afc1f5af0cd85792187e/logTestTenantInstanceIDReclaimLoop2273756182
    test_log_scope.go:79: use -show-logs to present logs inline

Parameters: TAGS=bazel,gss,race

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/multi-tenant @cockroachdb/server

This test on roachdash | Improve this report!

Jira issue: CRDB-24111

cockroach-teamcity added the branch-master, C-test-failure, and O-robot labels Feb 2, 2023
cockroach-teamcity added this to the 23.1 milestone Feb 2, 2023
@knz
Contributor

knz commented Feb 2, 2023

@jbowens this is a pebble race. Do you want to triage it?

jbowens self-assigned this Feb 2, 2023
jbowens added a commit to jbowens/pebble that referenced this issue Feb 2, 2023
Add a facility for easily layering in the default VFS middleware—currently the
disk-health checking FS. Options.EnsureDefaults by default uses the disk-health
checking FS, but most of our tests explicitly set a VFS, and in particular a
*vfs.MemFS. These tests have always run without the disk-health checking
filesystem layer. Use the new WithFSDefaults method across many Pebble unit
tests and the Pebble metamorphic tests.

This is sufficient to surface the concurrent `Sync` operations observed in
cockroachdb/cockroach#96422 and cockroachdb/cockroach#96414. See cockroachdb#2282 for
context on where this panic is originating.
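The difference between the two defaulting paths described in this commit message can be sketched as follows. This is only an illustration under stated assumptions: `FS`, `memFS`, `defaultFS`, `healthCheckFS`, `options`, `ensureDefaults`, and `withFSDefaults` are hypothetical stand-ins written for this sketch, not Pebble's actual types or signatures. The point is that a test which sets its own filesystem skips the wrapper under the first path but gets it under the second.

```go
package main

import "fmt"

// FS is a hypothetical, minimal stand-in for a virtual filesystem interface.
type FS interface {
	Name() string
}

// memFS stands in for an in-memory filesystem that a test sets explicitly.
type memFS struct{}

func (memFS) Name() string { return "mem" }

// defaultFS stands in for the real on-disk filesystem.
type defaultFS struct{}

func (defaultFS) Name() string { return "default" }

// healthCheckFS is a hypothetical middleware layer wrapping another FS.
type healthCheckFS struct{ inner FS }

func (h healthCheckFS) Name() string { return "healthcheck(" + h.inner.Name() + ")" }

// options is a hypothetical options struct holding the configured FS.
type options struct{ fs FS }

// ensureDefaults adds the middleware only when no FS was configured, so an
// explicitly set FS is left unwrapped.
func (o *options) ensureDefaults() *options {
	if o.fs == nil {
		o.fs = healthCheckFS{inner: defaultFS{}}
	}
	return o
}

// withFSDefaults layers the middleware on top of whatever FS is configured,
// falling back to defaultFS when none is set.
func (o *options) withFSDefaults() *options {
	if o.fs == nil {
		o.fs = defaultFS{}
	}
	o.fs = healthCheckFS{inner: o.fs}
	return o
}

func main() {
	fmt.Println((&options{fs: memFS{}}).ensureDefaults().fs.Name()) // mem
	fmt.Println((&options{fs: memFS{}}).withFSDefaults().fs.Name()) // healthcheck(mem)
	fmt.Println((&options{}).ensureDefaults().fs.Name())            // healthcheck(default)
}
```

In this sketch, only the second call exercises the health-checking layer on top of the in-memory filesystem, which is the kind of test coverage the commit message says was previously missing.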
jbowens added a commit to jbowens/pebble that referenced this issue Feb 2, 2023
The file-level disk-health checker requires that a file not be used
concurrently, because it only supports timing a single in-flight operation at a
time. Pebble did not adhere to this contract for the data directory, which it
synced concurrently. This had the potential to leave a data directory Sync
untimed if an in-flight Sync's timestamp was overwritten by the completion of
another Sync.

In cockroachdb#2282 we began checking for serialized writes in `invariants` builds. This
revealed these concurrent syncs in CockroachDB test failures under `-race`:
cockroachdb/cockroach#96414 and cockroachdb/cockroach#96422.
@jbowens
Collaborator

jbowens commented Feb 2, 2023

Thanks @knz—fix is up in Pebble cockroachdb/pebble#2298. May take a bit to merge it and bump the cockroach Pebble version.

jbowens added a commit to cockroachdb/pebble that referenced this issue Feb 2, 2023
Add a facility for easily layering in the default VFS middleware—currently the
disk-health checking FS. Options.EnsureDefaults by default uses the disk-health
checking FS, but most of our tests explicitly set a VFS, and in particular a
*vfs.MemFS. These tests have always run without the disk-health checking
filesystem layer. Use the new WithFSDefaults method across many Pebble unit
tests and the Pebble metamorphic tests.

This is sufficient to surface the concurrent `Sync` operations observed in
cockroachdb/cockroach#96422 and cockroachdb/cockroach#96414. See #2282 for
context on where this panic is originating.
jbowens added a commit to cockroachdb/pebble that referenced this issue Feb 2, 2023
The file-level disk-health checker requires that a file not be used
concurrently, because it only supports timing a single in-flight operation at a
time. Pebble did not adhere to this contract for the data directory, which it
synced concurrently. This had the potential to leave a data directory Sync
untimed if an in-flight Sync's timestamp was overwritten by the completion of
another Sync.

In #2282 we began checking for serialized writes in `invariants` builds. This
revealed these concurrent syncs in CockroachDB test failures under `-race`:
cockroachdb/cockroach#96414 and cockroachdb/cockroach#96422.
craig bot pushed a commit that referenced this issue Feb 3, 2023
95868: ui: set overview page node list data with stale tag if node is dead r=iAaronBuck a=iAaronBuck

71618: ui: set overview page node list data with stale tag if node is dead

![Screen Shot 2023-01-25 at 4 25 17 PM](https://user-images.githubusercontent.com/73749490/214694475-b946ec5b-0834-4cf0-86a2-cbe28ab529a7.png)
    
Issue:  [#71618](#71618)
Epic: None
    
Release note (ui change): Currently, the stale node metrics displayed in the Cluster Overview Nodes Table may mislead users into thinking that they are current values when in fact they are stale. This change adds a stale tag to those metrics, so that users are informed about the staleness of the data displayed for dead nodes.

96446: go.mod: bump Pebble to e9d3bb388ad6 r=RaduBerinde a=jbowens

```
e9d3bb38 vfs: handle concurrent directory Syncs in disk-health checking
917d3f3e db: add Options.WithFSDefaults
9fc4a208 db: flushable ingested sstable implementation
4a453f64 Revert "db: unflake TestArchiveCleaner"
31c33365 db: unflake TestArchiveCleaner
6f3bed0d pebble: minor cleanup around obsolete tables
35c90436 objstorage: add link-or-copy functionality
d443ab31 objstorage: use provider in table cache
0ff0f5d4 pebble: add a test for fatal message when table cache hits "no such file"
2367e8d7 sstable: introduce objstorage interface
59603de1 vfs: move Prefetch to vfs.File
654253a6 sstable: sort user-added range keys by suffix descending
```

Epic: None
Release note: None

Close #96414.
Close #96422.
Informs #96420.
Informs #96421.

Co-authored-by: Aaron Buck <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
craig bot closed this as completed in f89ff0c Feb 3, 2023