on-disk and in-memory state diverged #40950

Closed
darinpp opened this issue Sep 20, 2019 · 0 comments · Fixed by #41018
Labels
A-kv Anything in KV that doesn't belong in a more specific category. B-os-windows Issues specific to the Windows OS. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors

darinpp commented Sep 20, 2019

Describe the problem

Running the script from #37315 against a node running on Windows, with its storage on a FAT32 drive, leads to:

F190920 20:05:01.019384 247 storage/replica.go:982  [n1,s1,r1/1:/{Min-System/NodeL…}] on-disk and in-memory state diverged: [TruncatedState.Index: 0 != 860 TruncatedState.Term: 0 != 6]
goroutine 247 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc0004c2300, 0xc0004c23c0, 0x0, 0x62)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1016 +0xb8
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x6c22820, 0xc000000004, 0x6a4f274, 0x12, 0x3d6, 0xc0095be5a0, 0x86)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:872 +0x962
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x4984200, 0xc0076f90c0, 0xc000000004, 0x2, 0x0, 0x0, 0xc006f86e50, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2d3
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x4984200, 0xc0076f90c0, 0x1, 0xc000000004, 0x0, 0x0, 0xc006f86e50, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x93
github.com/cockroachdb/cockroach/pkg/util/log.Fatal(...)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:189
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).assertStateLocked(0xc000130800, 0x4984200, 0xc0076f90c0, 0x49c31a0, 0xc00046b680)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:982 +0x753
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).ApplySideEffects(0xc0001308c0, 0x49c3080, 0xc005c2f008, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:868 +0x8c1
github.com/cockroachdb/cockroach/pkg/storage/apply.mapCheckedCmdIter(0x13cd5098, 0xc000130ad8, 0xc006f874d8, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/cmd.go:182 +0x122
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).applyOneBatch(0xc006f87900, 0x49842c0, 0xc006d14240, 0x49c3140, 0xc000130a78, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:276 +0x22f
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).ApplyCommittedEntries(0xc006f87900, 0x49842c0, 0xc006d14240, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:242 +0xd6
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc000130800, 0x49842c0, 0xc006d14240, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:759 +0xdb7
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReady(0xc000130800, 0x49842c0, 0xc006d14240, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:432 +0x156
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processReady(0xc000973500, 0x49842c0, 0xc000ab4240, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3695 +0x13d
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc0003ecd00, 0x49842c0, 0xc000ab4240)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:227 +0x259
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x49842c0, 0xc000ab4240)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:161 +0x45
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc0002dbbc0, 0xc000646e10, 0xc0002dbbb0)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:196 +0x102
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:189 +0xaf

I wasn't able to reproduce this on an NTFS drive. It may be related to #40918, as I was seeing both issues while running on FAT32. Neither showed up when using NTFS.
To Reproduce

  1. Run cockroach.exe start-single-node --insecure from a location on a FAT32 drive.
  2. Run the script from "Crash: failed to update store after merging range: IO error: Failed to remove dir" (#37315).
  3. With an 8GB FAT32 drive, the crash should happen after around 2 million records have been inserted. With a FAT32 drive of a different size, it seems to happen at a different point.

Expected behavior
The node should not crash.

Environment:

  • CockroachDB version [CCL v19.2.0-alpha.20190606-2335-g876c5e7-dirty @ 2019/09/19 21:33:24 (go1.12.5)]
  • Server OS: [Microsoft Windows [Version 10.0.18362.356]]
  • Client app [JDBC postgres driver 42.2.5]
@darinpp darinpp added A-kv Anything in KV that doesn't belong in a more specific category. B-os-windows Issues specific to the Windows OS. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors labels Sep 20, 2019
craig bot pushed a commit that referenced this issue Sep 24, 2019
41018: c-deps: bump rocksdb for unique cache IDs on Windows r=ajkr a=ajkr

Picks up cockroachdb/rocksdb#58.

We found a corruption caused by multiple FAT32 files assigned the same
block cache key prefix. We don't know the extent to which this problem
affects other filesystems or other Windows file ID generation mechanisms.
We decided to turn off the reliance on filesystem for generating cache
keys on Windows. Instead we use randomization per table reader. This
would cause a performance penalty for use cases that open multiple table
readers per file, but I believe cockroach is not such a use case.

Fixes #40918, fixes #40950.

Release justification: Prevents corruption on some Windows filesystems

Release note: None
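
The collision described in the commit message for #41018 can be illustrated with a toy model in Go. This is a sketch under stated assumptions, not RocksDB's implementation: `blockCache`, `tableReader`, `cacheKey`, and `randomPrefix` are hypothetical names, and the cache is reduced to a map keyed by a per-file prefix plus a block offset. It shows how two files that share a prefix can read each other's cached blocks, and how drawing a random prefix per table reader avoids relying on filesystem-provided IDs.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// blockCache is a toy shared cache keyed by (prefix, block offset).
// All names in this sketch are hypothetical.
type blockCache map[string][]byte

// cacheKey concatenates the per-file prefix and the block offset.
func cacheKey(prefix, offset uint64) string {
	var buf [16]byte
	binary.BigEndian.PutUint64(buf[:8], prefix)
	binary.BigEndian.PutUint64(buf[8:], offset)
	return string(buf[:])
}

// tableReader reads the blocks of one file through the shared cache.
type tableReader struct {
	prefix uint64
	blocks map[uint64][]byte // offset -> contents stored in the file
	cache  blockCache
}

// readBlock returns the cached block if present, else loads and caches it.
func (r *tableReader) readBlock(offset uint64) []byte {
	k := cacheKey(r.prefix, offset)
	if b, ok := r.cache[k]; ok {
		return b
	}
	b := r.blocks[offset]
	r.cache[k] = b
	return b
}

// randomPrefix draws a fresh prefix for each table reader instead of
// deriving it from a filesystem file ID.
func randomPrefix() uint64 {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		panic(err)
	}
	return binary.BigEndian.Uint64(buf[:])
}

func main() {
	cache := blockCache{}
	a := &tableReader{prefix: 7, blocks: map[uint64][]byte{0: []byte("file A")}, cache: cache}
	b := &tableReader{prefix: 7, blocks: map[uint64][]byte{0: []byte("file B")}, cache: cache}

	a.readBlock(0) // populates the cache under the shared prefix
	fmt.Printf("shared prefix: B reads %q\n", b.readBlock(0)) // prints "file A"

	// Same files, but each reader now gets its own random prefix.
	cache = blockCache{}
	a.prefix, a.cache = randomPrefix(), cache
	b.prefix, b.cache = randomPrefix(), cache
	a.readBlock(0)
	fmt.Printf("random prefixes: B reads %q\n", b.readBlock(0))
}
```

In this model, two readers opened on the same file would no longer share cached blocks once prefixes are random per reader, which corresponds to the performance penalty the commit message mentions for use cases that open multiple table readers per file.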

41020: util/log: fix GC of secondary loggers r=petermattis a=knz

Fixes #40974.

This is a subset of #40993 suitable for 19.2 and backport to 19.1.

Release justification: bug fix

Release note (bug fix): CockroachDB will now properly remove excess
secondary log files (SQL audit logging, statement execution logging,
and RocksDB events).

Co-authored-by: Andrew Kryczka
Co-authored-by: Raphael 'kena' Poss
@craig craig bot closed this as completed in ece7b8b Sep 24, 2019