
storage/pebble: panic pebble: flush-to key #54284

Closed
ajwerner opened this issue Sep 11, 2020 · 4 comments · Fixed by #54422

Labels
A-storage Relating to our storage engine (Pebble) on-disk storage.

Comments

@ajwerner (Contributor)
Describe the problem

I was running a cluster trying to debug an issue in the allocator when I saw that a node had died with the following panic:

panic: pebble: flush-to key (/Table/61/1/992/"\u007f\x05\x9bJs\xc1H\x00\x80\x00\x00\x00\x01\xc6D\x06"/0/1599852788.409022053,0) < flushed key (/Table/61/1/1821/"\xe9\"\x18\x11'(H\x00\x80\x00\x00\x00\x03A\xbf\xcd"/0,0)

goroutine 268922 [running]:
panic(0x492b400, 0xc0371efa80)
        /usr/local/go/src/runtime/panic.go:722 +0x2c2 fp=0xc002bc50d8 sp=0xc002bc5048 pc=0x7b15c2
github.com/cockroachdb/pebble/internal/rangedel.(*Fragmenter).FlushTo(0xc010f08570, 0xc047b94a80, 0x27, 0x30)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/internal/rangedel/fragmenter.go:286 +0x828 fp=0xc002bc5220 sp=0xc002bc50d8 pc=0x16b4188
github.com/cockroachdb/pebble.(*compactionIter).Tombstones(0xc001c5c780, 0xc047b94a80, 0x27, 0x30, 0x0, 0xc047b94a80, 0x0, 0x30)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/compaction_iter.go:603 +0xff fp=0xc002bc5270 sp=0xc002bc5220 pc=0x16fb7af
github.com/cockroachdb/pebble.(*DB).runCompaction.func4(0xc05cbee3f0, 0x27, 0x30, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/compaction.go:1898 +0x11c fp=0xc002bc5480 sp=0xc002bc5270 pc=0x173e8cc
github.com/cockroachdb/pebble.(*DB).runCompaction(0xc0008d6a80, 0x19a2, 0xc010f08480, 0x60c5200, 0x94b38e0, 0xc03c2aca00, 0xc016200c80, 0x6, 0x8, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/compaction.go:2233 +0x15ee fp=0xc002bc5bc8 sp=0xc002bc5480 pc=0x16f6dae
github.com/cockroachdb/pebble.(*DB).compact1(0xc0008d6a80, 0xc010f08480, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/compaction.go:1669 +0x167 fp=0xc002bc5e50 sp=0xc002bc5bc8 pc=0x16f5007
github.com/cockroachdb/pebble.(*DB).compact.func1(0x6164bc0, 0xc0119cc990)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/compaction.go:1630 +0xbe fp=0xc002bc5ee0 sp=0xc002bc5e50 pc=0x173db4e
runtime/pprof.Do(0x6164bc0, 0xc0119cc990, 0xc000384ce0, 0x1, 0x1, 0xc001f24798)
        /usr/local/go/src/runtime/pprof/runtime.go:37 +0xf8 fp=0xc002bc5f68 sp=0xc002bc5ee0 pc=0x1019738
github.com/cockroachdb/pebble.(*DB).compact(0xc0008d6a80, 0xc010f08480, 0x0)
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/compaction.go:1627 +0xa2 fp=0xc002bc5fc8 sp=0xc002bc5f68 pc=0x16f4e82
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc002bc5fd0 sp=0xc002bc5fc8 pc=0x7e1d91
created by github.com/cockroachdb/pebble.(*DB).maybeScheduleCompactionPicker
        /go/src/github.com/cockroachdb/cockroach/vendor/github.com/cockroachdb/pebble/compaction.go:1424 +0x3f4

Environment:

  • v20.2.0-alpha.1-3327-g8412b1efe5 (x86_64-unknown-linux-gnu, built 2020/09/11 19:12:57, go1.13.14)
  • 9-node GCE cluster with n1-standard-4 nodes
ajwerner added the A-storage label Sep 11, 2020
@petermattis (Collaborator)

@itsbilal Please take a look. The SHA includes your compactionOutputSplitter refactor.

@itsbilal (Contributor)

Yep, already investigated. Details are on Slack, but the core issue is a bad interaction between nonZeroSeqNumSplitter withholding a compaction split suggestion and the end of the compaction. I'll fix this on Monday.

@petermattis (Collaborator)

> Details are on Slack

For posterity, it is useful to copy those details to this issue.

@itsbilal (Contributor)

Sure. Here's the relevant compaction input:

L4 input:

/Table/58/1/1999/...#0,SET
/Table/61/1/1180/.../0,0-/Table/61/1/1202/"...#13211169,RANGEDEL
/Table/61/1/1999/...#91952,SET

L3 input:

/Table/58/1/1333/...-/Table/58/1/1407/...#13759205,RANGEDEL
/Table/61/1/1821/...-/Table/61/1/1895/...#13504457,RANGEDEL

Compaction output into L4:

/Table/58/1/1999/...#0,SET 
/Table/61/1/1999/...#91952,SET

And then the compaction should finish, as the range del fragmenter is empty and no more keys remain.
The issue is that limit = /Table/61/1/992/, a grandparent limit that falls right before the last key in the compaction input. But we can't split the compaction output right before that last key (/Table/61/1/1999/...#91952,SET) because, from the splitter's view, the fragmenter isn't empty (/Table/61/1/1821/...-/Table/61/1/1895/...#13504457,RANGEDEL is still sitting in it) and the last point key's seqnum was 0 (/Table/58/1/1999/...#0,SET).

So we “hold off” the split suggestion, but it turns out that was the last key in the compaction, so we never get around to calling shouldSplitBefore again and resetting limit. We end up calling finishOutput(limit) with the now-out-of-date limit, which is earlier than the latest range tombstone, and the panic is triggered.
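To make the failure mode concrete, here's a minimal, self-contained Go sketch of the invariant the panic reports. This is not Pebble's actual fragmenter code: the toyFragmenter type, its field names, and the zero-padded keys are all illustrative (real CockroachDB keys use an order-preserving encoding, so /Table/61/1/992 genuinely sorts before /Table/61/1/1821). A stale limit pointing before an already-flushed tombstone boundary trips exactly this check:

```go
package main

import (
	"bytes"
	"fmt"
)

// toyFragmenter mimics only the invariant check from the panic message:
// FlushTo must never be asked to flush to a key that sorts before a key
// tombstones have already been flushed up to.
type toyFragmenter struct {
	flushedKey []byte // largest key tombstones have been flushed up to
}

func (f *toyFragmenter) FlushTo(key []byte) {
	if f.flushedKey != nil && bytes.Compare(key, f.flushedKey) < 0 {
		panic(fmt.Sprintf("pebble: flush-to key (%q) < flushed key (%q)", key, f.flushedKey))
	}
	f.flushedKey = key
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // same shape of panic as in this issue
		}
	}()

	f := &toyFragmenter{}
	// Tombstones were already flushed up to a key around /Table/61/1/1821/...
	f.FlushTo([]byte("/Table/61/1/1821"))
	// The compaction then finishes its final output with a stale limit
	// pointing at the earlier grandparent boundary /Table/61/1/0992/...
	// (zero-padded here so byte order matches numeric order in this toy).
	f.FlushTo([]byte("/Table/61/1/0992")) // panics: flush-to key < flushed key
}
```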

Possible solutions:

  1. Detect all end-of-compaction cases and set limit = nil
  2. Instead of just checking for c.rangeDelFrag.Empty(), add a c.rangeDelFrag.End() method, and if the fragmenter ends before the current key, treat it as empty as far as the nonzero-sequence-number check is concerned (sketched below).
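For illustration, here's a rough Go sketch of what option 2 could look like. It is not the fix that ultimately landed (see the commit below), and frag.End() is the hypothetical method proposed in that option, not part of Pebble's real Fragmenter API:

```go
package splitsketch

import "bytes"

// rangeDelFragmenter is a stand-in for the small slice of the fragmenter
// this sketch needs. Empty() exists on Pebble's Fragmenter; End() is the
// hypothetical addition from option 2: the exclusive end key of the last
// pending range tombstone fragment.
type rangeDelFragmenter interface {
	Empty() bool
	End() []byte
}

// fragmenterIgnorableAt reports whether the nonzero-seqnum splitter may
// treat the fragmenter as empty when considering a split before key:
// either no tombstones are pending, or they all end before key and so
// cannot straddle the proposed split point.
func fragmenterIgnorableAt(frag rangeDelFragmenter, key []byte) bool {
	if frag.Empty() {
		return true
	}
	return bytes.Compare(frag.End(), key) <= 0
}
```

In the scenario above, the pending tombstone ends at /Table/61/1/1895/..., before the current key /Table/61/1/1999/..., so a check along these lines would presumably have let the splitter go ahead rather than withhold the suggestion.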

itsbilal added a commit to itsbilal/pebble that referenced this issue Sep 14, 2020
Currently, it's possible for the nonZeroSeqNumSplitter to "withhold"
a compaction split suggestion in such a way that the `limit` variable
in the compaction loop is exceeded, only to advise a compaction split
at a later point.

This is usually not a concern, as we just reset `limit`
when that compaction split is actually advised. But if the compaction
runs out of keys in this narrow window, we leave the limit at a
non-nil, already-passed key, which violates an invariant in the
rangedel fragmenter: it can't truncate range tombstones to a key it
has already passed.

Similar logic already existed in an older iteration of this code, which
is in use in the crl-release-20.2 branch. A refactor here introduced
this bug.

This change allows for a 3-way return value from shouldSplitBefore;
in the case where limit has been exceeded, we resort to resetting the
limit like before.

Will address cockroachdb/cockroach#54284 when this lands in cockroach
master.
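Below is a compressed sketch of the shape of the fix the commit message describes, not the actual Pebble code: the three-valued result and its names (noSplit, splitDeferred, splitNow), the toy key stream, and the loop skeleton are all illustrative. The point is only that when the splitter defers a split past limit, the loop must clear limit rather than let a stale value reach finishOutput:

```go
package main

import "fmt"

// splitDecision is the three-valued result described in the commit
// message; the names here are made up for the sketch.
type splitDecision int

const (
	noSplit       splitDecision = iota
	splitDeferred // wanted to split at limit, but is holding off
	splitNow
)

func main() {
	// Toy key stream standing in for the compaction iterator, and a
	// grandparent boundary we would like to split the output at.
	keys := []string{"a", "b", "c"}
	limit := "b"

	// Pretend splitter: defers the split at "b" (say, because a range
	// tombstone is still pending) and never asks to split again.
	shouldSplitBefore := func(key string) splitDecision {
		if key == "b" {
			return splitDeferred
		}
		return noSplit
	}

	for _, k := range keys {
		switch shouldSplitBefore(k) {
		case splitNow:
			fmt.Println("finish current output before", k)
			limit = ""
		case splitDeferred:
			// The crux of the fix: the split point has moved past limit,
			// so the saved limit is no longer a valid truncation point.
			// Clear it instead of leaving it at an already-passed key.
			limit = ""
		}
		fmt.Println("write key", k)
	}

	// End of compaction: with the fix, limit is empty here rather than a
	// stale key earlier than tombstones that were already flushed.
	fmt.Println("finish final output, limit =", limit)
}
```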
itsbilal added a commit to itsbilal/pebble that referenced this issue Sep 15, 2020 (same commit message as above)
itsbilal added a commit to itsbilal/pebble that referenced this issue Sep 15, 2020 (same commit message as above)
itsbilal added a commit to cockroachdb/pebble that referenced this issue Sep 15, 2020 (same commit message as above)
craig bot pushed a commit that referenced this issue Sep 15, 2020
54273: changefeedccl/schemafeed: only sort the unsorted tail of descriptors r=ajwerner a=ajwerner

This led to a race detector warning firing (theorized).

I'd love to validate that this is the bug, but I feel pretty good about it.

Fixes #48459.

Release note: None

54358: sql/catalog/descs: don't hydrate dropped tables r=ajwerner a=ajwerner

The invariant that types referenced by tables exist only holds for non-dropped
tables. We were not checking the state of the table when choosing to hydrate.
This led to pretty catastrophic failures when the invariant was violated.

Fixes #54343.

Release note (bug fix): Fixed a bug from earlier alphas where dropping a database
which contained tables using user-defined types could result in panics.

54417: kvserver: improve a comment around node liveness r=irfansharif a=irfansharif

Release note: None

54422: vendor: Bump pebble to 08b545a1f5403e31a76b48f46a780c8d59432f57 r=petermattis a=itsbilal

Changes pulled in:

```
08b545a1f5403e31a76b48f46a780c8d59432f57 compaction: Invalidate limit when a splitter defers a split suggestion
6e5e695d8b1c33c0c4687bd7e804e9aaac66d9dd db: remove unused compaction.maxExpandedBytes
```

Fixes #54284.

Release note: None.

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
craig bot closed this as completed in acb8c95 Sep 15, 2020