
storage: TestMVCCHistories failed #96422

Closed
cockroach-teamcity opened this issue Feb 2, 2023 · 0 comments · Fixed by #96446
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-storage Storage Team
Milestone

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Feb 2, 2023

storage.TestMVCCHistories failed with artifacts on master @ c4257c934858dcdb54cec514c1d5642d4992d5c2:

Fatal error:

panic: concurrent write operations detected on file [recovered]
	panic: concurrent write operations detected on file

Stack:

goroutine 9791 [running]:
github.com/cockroachdb/pebble.(*DB).runCompaction.func1()
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2258 +0x23d
panic({0x27ac820, 0x31d3a00})
	GOROOT/src/runtime/panic.go:890 +0x262
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).timeDiskOp(0xc000964050, 0x2, 0xc000b7a508)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:262 +0x1a5
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).Sync(0xc000964050)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:219 +0x69
github.com/cockroachdb/pebble/vfs.(*enospcFile).Sync(0xc00037a810)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_full.go:391 +0x6d
github.com/cockroachdb/pebble.(*DB).runCompaction(0xc0007f8a00, 0xa9, 0xc000804400)
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2836 +0x312f
github.com/cockroachdb/pebble.(*DB).compact1(0xc0007f8a00, 0xc000804400, 0x0)
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2199 +0x31d
github.com/cockroachdb/pebble.(*DB).compact.func1({0x31f14d8, 0xc000ad0d20})
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2170 +0xd6
runtime/pprof.Do({0x31f1468, 0xc0000620a8}, {{0xc000301080?, 0xc0004aebd0?, 0x494565?}}, 0xc000675778)
	GOROOT/src/runtime/pprof/runtime.go:40 +0x123
github.com/cockroachdb/pebble.(*DB).compact(0xc0007f8a00, 0xc000804400, 0x0)
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2167 +0xb6
created by github.com/cockroachdb/pebble.(*DB).maybeScheduleCompactionPicker
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:1888 +0x1105
Log preceding fatal error

        data: "k/20"/11.000000000,0 -> /BYTES/20
        meta: "k/30"/0,0 -> txn={id=00000000 key=/Min pri=0.00000000 epo=0 ts=11.000000000,0 min=0,0 seq=30} ts=11.000000000,0 del=false klen=12 vlen=7 mergeTs=<nil> txnDidNotUpdateMeta=true
        data: "k/30"/11.000000000,0 -> /BYTES/30
        data: "m"/30.000000000,0 -> /BYTES/a
        meta: "n"/0,0 -> txn={id=00000000 key=/Min pri=0.00000000 epo=0 ts=50.000000000,0 min=0,0 seq=30} ts=50.000000000,0 del=false klen=12 vlen=6 ih={{10 /BYTES/a}{20 /BYTES/b}} mergeTs=<nil> txnDidNotUpdateMeta=false
        data: "n"/50.000000000,0 -> /BYTES/c
        data: "n"/45.000000000,0 -> {localTs=40.000000000,0}/BYTES/c
        meta: "o"/0,0 -> txn={id=00000000 key=/Min pri=0.00000000 epo=0 ts=50.000000000,0 min=0,0 seq=30} ts=50.000000000,0 del=false klen=12 vlen=6 ih={{10 /BYTES/a}{20 /BYTES/b}} mergeTs=<nil> txnDidNotUpdateMeta=false
        data: "o"/50.000000000,0 -> /BYTES/c
    mvcc_history_test.go:353: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/2210/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/storage/storage_test_/storage_test.runfiles/com_github_cockroachdb_cockroach/pkg/storage/testdata/mvcc_histories/ignored_seq_nums:476:
        run [1 args]
        with t=E
          txn_ignore_seqs seqs=(5-35)
          get             k=n
          get             k=o
          resolve_intent  k=n status=PENDING
          resolve_intent  k=o status=PENDING
        ----
        get: "n" -> /BYTES/c @45.000000000,0
        get: "o" -> <no data>
        >> at end:
        txn: "E" meta={id=00000000 key=/Min pri=0.00000000 epo=0 ts=50.000000000,0 min=0,0 seq=30} lock=true stat=PENDING rts=50.000000000,0 wto=false gul=0,0 isn=1
        data: "k"/14.000000000,0 -> {localTs=11.000000000,0}/BYTES/b
        meta: "k/10"/0,0 -> txn={id=00000000 key=/Min pri=0.00000000 epo=0 ts=11.000000000,0 min=0,0 seq=10} ts=11.000000000,0 del=false klen=12 vlen=7 mergeTs=<nil> txnDidNotUpdateMeta=true
        data: "k/10"/11.000000000,0 -> /BYTES/10
        meta: "k/20"/0,0 -> txn={id=00000000 key=/Min pri=0.00000000 epo=0 ts=11.000000000,0 min=0,0 seq=20} ts=11.000000000,0 del=false klen=12 vlen=7 mergeTs=<nil> txnDidNotUpdateMeta=true
        data: "k/20"/11.000000000,0 -> /BYTES/20
        meta: "k/30"/0,0 -> txn={id=00000000 key=/Min pri=0.00000000 epo=0 ts=11.000000000,0 min=0,0 seq=30} ts=11.000000000,0 del=false klen=12 vlen=7 mergeTs=<nil> txnDidNotUpdateMeta=true
        data: "k/30"/11.000000000,0 -> /BYTES/30
        data: "m"/30.000000000,0 -> /BYTES/a
        data: "n"/45.000000000,0 -> {localTs=40.000000000,0}/BYTES/c
    mvcc_history_test.go:353: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/2210/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/storage/storage_test_/storage_test.runfiles/com_github_cockroachdb_cockroach/pkg/storage/testdata/mvcc_histories/ignored_seq_nums:498:
        run [1 args]
        with t=E
          get k=n
          get k=o
        ----
        get: "n" -> /BYTES/c @45.000000000,0
        get: "o" -> <no data>
    mvcc_history_test.go:353: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/2210/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/storage/storage_test_/storage_test.runfiles/com_github_cockroachdb_cockroach/pkg/storage/testdata/mvcc_histories/ignored_seq_nums:508:
        run [1 args]
        with t=E
          check_intent k=n
        ----
        error: (*withstack.withStack:) meta: "n" -> expected intent, found none
    mvcc_history_test.go:353: 
        /home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/2210/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/storage/storage_test_/storage_test.runfiles/com_github_cockroachdb_cockroach/pkg/storage/testdata/mvcc_histories/ignored_seq_nums:514:
        run [1 args]
        with t=E
          check_intent k=o
        ----
        error: (*withstack.withStack:) meta: "o" -> expected intent, found none

Parameters: TAGS=bazel,gss,race

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/storage

This test on roachdash | Improve this report!

Jira issue: CRDB-24117

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Feb 2, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Feb 2, 2023
@blathers-crl blathers-crl bot added the T-storage Storage Team label Feb 2, 2023
@jbowens jbowens self-assigned this Feb 2, 2023
jbowens added a commit to jbowens/pebble that referenced this issue Feb 2, 2023
Add a facility for easily layering in the default VFS middleware—currently the
disk-health checking FS. Options.EnsureDefaults by default uses the disk-health
checking FS, but most of our tests explicitly set a VFS, and in particular a
*vfs.MemFS. These tests have always run without the disk-health checking
filesystem layer. Use the new WithFSDefaults method across many Pebble unit
tests and the Pebble metamorphic tests.

This is sufficient to surface the concurrent `Sync` operations observed in
cockroachdb/cockroach#96422 and cockroachdb/cockroach#96414. See cockroachdb#2282 for
context on where this panic is originating.
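The commit above describes layering default VFS middleware (the disk-health checking FS) over whatever base filesystem a caller configures, so that tests using an explicit `*vfs.MemFS` still exercise that layer. The following is a minimal, self-contained Go sketch of that wrapper pattern; the names (`FS`, `healthCheckingFS`, `withFSDefaults`) are illustrative stand-ins, not Pebble's actual API.

```go
package main

import "fmt"

// FS is a minimal stand-in for a VFS interface.
type FS interface {
	Create(name string) string
}

// memFS is a stand-in for an in-memory filesystem like *vfs.MemFS.
type memFS struct{}

func (memFS) Create(name string) string { return "mem:" + name }

// healthCheckingFS is middleware: it wraps another FS and decorates its
// operations. A real implementation would time each op and report stalls;
// here we only tag the result to make the layering visible.
type healthCheckingFS struct{ inner FS }

func (h healthCheckingFS) Create(name string) string {
	return "health(" + h.inner.Create(name) + ")"
}

// withFSDefaults layers the default middleware over the caller's FS,
// analogous in spirit to the WithFSDefaults method the commit adds.
func withFSDefaults(fs FS) FS { return healthCheckingFS{inner: fs} }

func main() {
	// Even a test that explicitly configures an in-memory FS now runs
	// through the health-checking layer.
	fs := withFSDefaults(memFS{})
	fmt.Println(fs.Create("MANIFEST")) // prints "health(mem:MANIFEST)"
}
```

The point of the change is exactly this: tests that set their own FS previously bypassed the middleware, so invariants enforced by the disk-health layer were never exercised.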
jbowens added a commit to jbowens/pebble that referenced this issue Feb 2, 2023
The file-level disk-health checker requires that a file not be used
concurrently, because it only supports timing a single in-flight operation at a
time. Pebble did not adhere to this contract for the data directory, which it
synced concurrently. This had the potential to leave a data directory Sync
untimed if an in-flight Syncs' timestamp was overwritten by the completion of
another Sync.

In cockroachdb#2282 we began checking for serialized writes in `invariants` builds. This
revealed these concurrent syncs in CockroachDB test failures under `-race`:
cockroachdb/cockroach#96414 and cockroachdb/cockroach#96422.
jbowens added a commit to cockroachdb/pebble that referenced this issue Feb 2, 2023
jbowens added a commit to cockroachdb/pebble that referenced this issue Feb 2, 2023
craig bot pushed a commit that referenced this issue Feb 3, 2023
95868: ui: set overview page node list data with stale tag if node is dead r=iAaronBuck a=iAaronBuck

71618: ui: set overview page node list data with stale tag if node is dead

![Screen Shot 2023-01-25 at 4 25 17 PM](https://user-images.githubusercontent.com/73749490/214694475-b946ec5b-0834-4cf0-86a2-cbe28ab529a7.png)
    
Issue:  [#71618](#71618)
Epic: None
    
Release note (ui change): Currently, the stale node metrics displayed in the Cluster Overview Nodes Table may mislead users into thinking they are current values when in fact they are stale. This change adds a stale tag to the metrics displayed for dead nodes, informing users about the staleness of that data.

96446: go.mod: bump Pebble to e9d3bb388ad6 r=RaduBerinde a=jbowens

```
e9d3bb38 vfs: handle concurrent directory Syncs in disk-health checking
917d3f3e db: add Options.WithFSDefaults
9fc4a208 db: flushable ingested sstable implementation
4a453f64 Revert "db: unflake TestArchiveCleaner"
31c33365 db: unflake TestArchiveCleaner
6f3bed0d pebble: minor cleanup around obsolete tables
35c90436 objstorage: add link-or-copy functionality
d443ab31 objstorage: use provider in table cache
0ff0f5d4 pebble: add a test for fatal message when table cache hits "no such file"
2367e8d7 sstable: introduce objstorage interface
59603de1 vfs: move Prefetch to vfs.File
654253a6 sstable: sort user-added range keys by suffix descending
```

Epic: None
Release note: None

Close #96414.
Close #96422.
Informs #96420.
Informs #96421.

Co-authored-by: Aaron Buck <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
@craig craig bot closed this as completed in f89ff0c Feb 3, 2023
jbowens added a commit to jbowens/pebble that referenced this issue Feb 6, 2023
jbowens added a commit to jbowens/pebble that referenced this issue Feb 6, 2023
jbowens added a commit to jbowens/pebble that referenced this issue Feb 6, 2023
jbowens added a commit to jbowens/pebble that referenced this issue Feb 6, 2023
jbowens added a commit to cockroachdb/pebble that referenced this issue Feb 6, 2023
jbowens added a commit to cockroachdb/pebble that referenced this issue Feb 6, 2023
jbowens added a commit to cockroachdb/pebble that referenced this issue Feb 6, 2023
jbowens added a commit to cockroachdb/pebble that referenced this issue Feb 6, 2023
@jbowens jbowens moved this to Done in [Deprecated] Storage Jun 4, 2024

2 participants