storage: fatal on corruption encountered in background
Previously, on-disk corruption would fatal the node only if an iterator
observed it. Corruption encountered by a background job such as a compaction
did not fatal the node. This could leave the node churning through compactions
that repeatedly fail, hurting cluster stability and user query latencies.

Now, on-disk corruption causes the node to exit immediately.

Epic: none
Fixes: #101101
Release note (ops change): When local corruption of data is encountered by a
background job, a node will now exit immediately.
jbowens committed Apr 25, 2023
1 parent b89085a commit 4c5be04
Showing 1 changed file with 5 additions and 0 deletions.
pkg/storage/pebble.go (5 additions, 0 deletions)
@@ -1193,6 +1193,11 @@ func (p *Pebble) async(fn func()) {
 
 func (p *Pebble) makeMetricEtcEventListener(ctx context.Context) pebble.EventListener {
 	return pebble.EventListener{
+		BackgroundError: func(err error) {
+			if errors.Is(err, pebble.ErrCorruption) {
+				log.Fatalf(ctx, "local corruption detected: %v", err)
+			}
+		},
 		WriteStallBegin: func(info pebble.WriteStallBeginInfo) {
 			atomic.AddInt64(&p.writeStallCount, 1)
 			startNanos := timeutil.Now().UnixNano()
