Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestIteratorErrors fails with SeekLT violates upper bound panic #3592

Closed
RaduBerinde opened this issue May 6, 2024 · 8 comments · Fixed by #3593
Closed

TestIteratorErrors fails with SeekLT violates upper bound panic #3592

RaduBerinde opened this issue May 6, 2024 · 8 comments · Fixed by #3593

Comments

@RaduBerinde
Copy link
Member

RaduBerinde commented May 6, 2024

TestIteratorErrors runs into

panic: levelIter SeekLT to key "yaxhijzhbl" violates upper bound "cgavvrurmtfg" [recovered]

Can reproduce at 0bc137a with

go test -tags invariants -run TestIteratorErrors -v -exec stress -p 4

https://github.com/cockroachdb/pebble/actions/runs/8973148304/job/24642661778?pr=3590

@RaduBerinde
Copy link
Member Author

RaduBerinde commented May 6, 2024

At f03e7ef the same command produces panic: NextPrefix called while positioned on a span boundary

Stack:

        /opt/homebrew/opt/go/libexec/src/runtime/panic.go:914 +0x218
github.com/cockroachdb/pebble/internal/keyspan.(*InterleavingIter).NextPrefix(0x140006461c0?, {0x140006461c0?, 0x1400017b728?, 0x140004a88b0?})
        /Users/radu/go/src/github.com/cockroachdb/pebble/internal/keyspan/interleaving_iter.go:504 +0x15c
github.com/cockroachdb/pebble.(*Iterator).internalNextPrefix(0x140005d4f00, 0xc)
        /Users/radu/go/src/github.com/cockroachdb/pebble/iterator.go:1853 +0x16c
github.com/cockroachdb/pebble.(*Iterator).nextPrefix(0x140005d4f00)
        /Users/radu/go/src/github.com/cockroachdb/pebble/iterator.go:1816 +0x368
github.com/cockroachdb/pebble.(*Iterator).NextPrefix(0x10102d08b?)
        /Users/radu/go/src/github.com/cockroachdb/pebble/iterator.go:1724 +0x40
github.com/cockroachdb/pebble/metamorphic.(*retryableIter).NextPrefix.func1()
        /Users/radu/go/src/github.com/cockroachdb/pebble/metamorphic/retryable.go:124 +0x2c
github.com/cockroachdb/pebble/metamorphic.(*retryableIter).withRetry(0x14000310450, 0x1400017b880)
        /Users/radu/go/src/github.com/cockroachdb/pebble/metamorphic/retryable.go:47 +0x34
github.com/cockroachdb/pebble/metamorphic.(*retryableIter).NextPrefix(0x14000147410?)
        /Users/radu/go/src/github.com/cockroachdb/pebble/metamorphic/retryable.go:123 +0x44
github.com/cockroachdb/pebble/metamorphic.(*iterNextPrefixOp).run(0x1400031d1a0, 0x10080f98c?, {0x14000147410?, 0x10?, 0x140009073d0?})

jbowens added a commit to jbowens/pebble that referenced this issue May 6, 2024
This commit fixes a few issues surrounding NextPrefix and iterator errors.
Previously, under some circumstances, an error propagated by the underlying
iterator would not immediately be propagated up. Instead the Iterator could in
some circumstances call Next or NextPrefix on an already exhausted, errored
iterator, resulting in undefined behavior.

Similarly, the results of calling NextPrefix on an exhausted top-level
pebble.Iterator are not defined. This commit adapts the implementation to early
exit in this case. The metamorphic test may generate such operations.

Fix cockroachdb#3592.
@RaduBerinde
Copy link
Member Author

#3593 fixes the NextPrefix failure above, the original 'violates upper bound' panic still happens.

@jbowens
Copy link
Collaborator

jbowens commented May 6, 2024

Looks like we violate the monotonicity during the "cascading seeks" optimization:

level 0: seekLT("bznjnemb")
level 1: seekLT("bznjnemb")
level 2: seekLT("bznjnemb")
level 3: seekLT("awudbpw")
level 4: seekLT("awudbpw")
level 5: seekLT("awudbpw")
level 6: seekLT("ecjlwuz@5")
2024/05/06 15:48:01 mergingIter: upper bound violation: level 6: ecjlwuz@5 > bznjnemb
goroutine 35 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x64
github.com/cockroachdb/pebble.(*mergingIter).seekLT(0x14000185028, {0x14006870d08?, 0x140004d1728?, 0x1007b66c0?}, 0x140004d16d8?, 0x0)
	/Users/jackson/go/src/github.com/cockroachdb/pebble/merging_iter.go:1049 +0x134
github.com/cockroachdb/pebble.(*mergingIter).SeekLT(0x14000185028, {0x14006870d08?, 0x30?, 0x14008818180?}, 0x58?)
	/Users/jackson/go/src/github.com/cockroachdb/pebble/merging_iter.go:1103 +0x4c
github.com/cockroachdb/pebble/internal/keyspan.(*InterleavingIter).SeekLT(0x14007195330, {0x14006870d08, 0x8, 0x18}, 0x0?)
	/Users/jackson/go/src/github.com/cockroachdb/pebble/internal/keyspan/interleaving_iter.go:370 +0xd8
github.com/cockroachdb/pebble.(*Iterator).iterLastWithinBounds(0x14000184a08)
	/Users/jackson/go/src/github.com/cockroachdb/pebble/iterator.go:2099 +0x50
github.com/cockroachdb/pebble.(*Iterator).PrevWithLimit(0x14000184a08, {0x0, 0x0, 0x0})
	/Users/jackson/go/src/github.com/cockroachdb/pebble/iterator.go:2060 +0x1f8
github.com/cockroachdb/pebble.(*Iterator).Prev(...)
	/Users/jackson/go/src/github.com/cockroachdb/pebble/iterator.go:1978
github.com/cockroachdb/pebble/metamorphic.(*retryableIter).Prev.func1()
	/Users/jackson/go/src/github.com/cockroachdb/pebble/metamorphic/retryable.go:132 +0x38
github.com/cockroachdb/pebble/metamorphic.(*retryableIter).withRetry(0x1400a292870, 0x140004d18a0)
	/Users/jackson/go/src/github.com/cockroachdb/pebble/metamorphic/retryable.go:47 +0x34
github.com/cockroachdb/pebble/metamorphic.(*retryableIter).Prev(0x10?)
	/Users/jackson/go/src/github.com/cockroachdb/pebble/metamorphic/retryable.go:131 +0x44
github.com/cockroachdb/pebble/metamorphic.(*iterPrevOp).run(0x14009b1ce70, 0x0?, {0x14009a18540?, 0x14006174830?, 0x14000b66cc0?})
	/Users/jackson/go/src/github.com/cockroachdb/pebble/metamorphic/ops.go:1738 +0xa8
github.com/cockroachdb/pebble/metamorphic.(*Test).runOp(0x140004d1a38?, 0x100ed01d8?, {0x14009a18540?, 0x1011b6280?, 0x14000b66cc0?})
	/Users/jackson/go/src/github.com/cockroachdb/pebble/metamorphic/test.go:415 +0x104
github.com/cockroachdb/pebble/metamorphic.(*Test).step(0x14008c8f8c0, 0x100edef64?, 0xe?)
	/Users/jackson/go/src/github.com/cockroachdb/pebble/metamorphic/test.go:401 +0x4c
github.com/cockroachdb/pebble/metamorphic.(*Test).Step(0x14008c8f8c0)
	/Users/jackson/go/src/github.com/cockroachdb/pebble/metamorphic/test.go:390 +0x7c
github.com/cockroachdb/pebble_test.TestIteratorErrors(0x140002869c0)
	/Users/jackson/go/src/github.com/cockroachdb/pebble/external_test.go:93 +0x750
testing.tRunner(0x140002869c0, 0x101289068)
	/usr/local/go/src/testing/testing.go:1689 +0xec
created by testing.(*T).Run in goroutine 1
	/usr/local/go/src/testing/testing.go:1742 +0x318

@jbowens
Copy link
Collaborator

jbowens commented May 6, 2024

maybe an unsurfaced iterator error still?

@RaduBerinde
Copy link
Member Author

It looks like it's been a problem for a while, I can repro failures on 2 week old commits.

@jbowens
Copy link
Collaborator

jbowens commented May 7, 2024

I'm still debugging, but it looks like a levelIter's rangeDelIterPtr is mysteriously nil while the corresponding mergingIterLevel's rangeDelIter is populated with a real (stale?) range deletion iterator. I'm a bit baffled.

@jbowens
Copy link
Collaborator

jbowens commented May 7, 2024

Ah, I think it's an ingested flushable so the rangeDelIter is actually an independent keyspanimpl.LevelIter. That makes more sense.

@jbowens
Copy link
Collaborator

jbowens commented May 7, 2024

Somethings up with the keyspan.LevelIter.

The sequence is something like:

  1. We call keyspan.SeekLE(k) to seek the ingested flushable's range del iter to a span covering a key ≤ k.
  2. This first performs a FragmentIterator.SeekGE(k). This exhausts the iterator, indicating that there's no span covering a key ≥ k.
  3. Then keyspan.SeekLE calls FragmentIterator.Prev, expecting to land on the largest span covering a key < k. This actually returns a span covering a key > k that the original SeekGE(k) should have surfaced.

jbowens added a commit to jbowens/pebble that referenced this issue May 7, 2024
When an ingested flushable creates a range deletion iterator, it uses the
keyspanimpl.LevelIter to expose a unified view of all the range deletions
across the files of the atomically ingested sstables. When loading a sstable's
range deletion iterator failed, it returned a non-nil iterator to work around
idiosyncroncies of the keyspanimpl.LevelIter. This could cause the LevelIter to
mistakenly think it had already loaded a file's range deletions. This commit
fixes ingestedFlushable.constructRangeDelIter to return a nil FragmentIterator
if there's an error, and refactors (keyspanimpl.LevelIter).loadFile.

Fix cockroachdb#3592.
@jbowens jbowens self-assigned this May 7, 2024
jbowens added a commit to jbowens/pebble that referenced this issue May 7, 2024
When an ingested flushable creates a range deletion iterator, it uses the
keyspanimpl.LevelIter to expose a unified view of all the range deletions
across the files of the atomically ingested sstables. When loading a sstable's
range deletion iterator failed, it returned a non-nil iterator to work around
idiosyncroncies of the keyspanimpl.LevelIter. This could cause the LevelIter to
mistakenly think it had already loaded a file's range deletions. This commit
fixes ingestedFlushable.constructRangeDelIter to return a nil FragmentIterator
if there's an error, and refactors (keyspanimpl.LevelIter).loadFile.

Fix cockroachdb#3592.
jbowens added a commit that referenced this issue May 7, 2024
This commit fixes a few issues surrounding NextPrefix and iterator errors.
Previously, under some circumstances, an error propagated by the underlying
iterator would not immediately be propagated up. Instead the Iterator could in
some circumstances call Next or NextPrefix on an already exhausted, errored
iterator, resulting in undefined behavior.

Similarly, the results of calling NextPrefix on an exhausted top-level
pebble.Iterator are not defined. This commit adapts the implementation to early
exit in this case. The metamorphic test may generate such operations.

Fix #3592.
jbowens added a commit that referenced this issue May 7, 2024
When an ingested flushable creates a range deletion iterator, it uses the
keyspanimpl.LevelIter to expose a unified view of all the range deletions
across the files of the atomically ingested sstables. When loading a sstable's
range deletion iterator failed, it returned a non-nil iterator to work around
idiosyncroncies of the keyspanimpl.LevelIter. This could cause the LevelIter to
mistakenly think it had already loaded a file's range deletions. This commit
fixes ingestedFlushable.constructRangeDelIter to return a nil FragmentIterator
if there's an error, and refactors (keyspanimpl.LevelIter).loadFile.

Fix #3592.
@jbowens jbowens moved this to Done in [Deprecated] Storage Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants