-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: correctly update the stats for rolled back deletes #113306
storage: correctly update the stats for rolled back deletes #113306
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice find!
I'll add the backport labels for the releases that we'll want to backport to. I suspect that not all of them will be clean backports, unfortunately.
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi and @miraradeva)
pkg/storage/mvcc.go
line 630 at r1 (raw file):
// resolved value (key & bytes) are subtracted from the intents // counters if commit=true. func updateStatsOnResolve(
Can we add some coverage of this change in mvcc_stats_test.go
? In particular, it would be nice to add:
- two unit tests for it (tombstone rollback to value + value rollback to tombstone), unless you think this is entirely redundant with
pkg/storage/testdata/mvcc_histories/ignored_seq_nums_delete
- coverage of txn rollbacks in
TestMVCCStatsRandomized
pkg/storage/mvcc.go
line 713 at r1 (raw file):
// If the value is deleted or a delete was rolled back, adjust LiveCount. if !meta.Deleted && orig.Deleted {
The way we usually accomplish this in these stats functions is logic that looks closer to:
// in the first half of the function
if orig.Deleted {
ms.LiveCount++
}
// in the second half of the function
if meta.Deleted {
ms.LiveCount--
}
That decouples the relationship between orig
and meta
slightly. Conveniently, these branches already exist.
pkg/storage/mvcc.go
line 5090 at r1 (raw file):
// here. Instead, this should all be handled down below. var removeIntent bool removeIntent, rolledBackVal, err = mvccMaybeRewriteIntentHistory(ctx, writer, intent.IgnoredSeqNums, newMeta, latestKey)
👍 it wasn't good that we were modifying meta
. Thanks for fixing this. To confirm though, this wasn't needed to address the bug, right? (just double-checking my understanding).
4175f47
to
4d482a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi and @nvanbenschoten)
pkg/storage/mvcc.go
line 630 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we add some coverage of this change in
mvcc_stats_test.go
? In particular, it would be nice to add:
- two unit tests for it (tombstone rollback to value + value rollback to tombstone), unless you think this is entirely redundant with
pkg/storage/testdata/mvcc_histories/ignored_seq_nums_delete
- coverage of txn rollbacks in
TestMVCCStatsRandomized
1 is all set.
2 is a bit tricky because these randomized tests have no easy way of tweaking the probabilities of different operations. With the code I added I could not repro this bug because it needs a specific set of circumstances: a non-nil txn that commits and has an rolled back delete in its history. Out of 200 operations (the test default), for example, maybe 12 even get some interesting history, and the rest doesn't line up. Any suggestions?
pkg/storage/mvcc.go
line 713 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The way we usually accomplish this in these stats functions is logic that looks closer to:
// in the first half of the function if orig.Deleted { ms.LiveCount++ } // in the second half of the function if meta.Deleted { ms.LiveCount-- }That decouples the relationship between
orig
andmeta
slightly. Conveniently, these branches already exist.
Nice, I like that.
pkg/storage/mvcc.go
line 5090 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
👍 it wasn't good that we were modifying
meta
. Thanks for fixing this. To confirm though, this wasn't needed to address the bug, right? (just double-checking my understanding).
It was actually needed; I added a comment.
If we let mvccMaybeRewriteIntentHistory
change meta
, then the logic for updating the stats in updateStatsOnResolve
doesn't quite work because meta
doesn't correspond to the original stats (ms
) anymore.
For example, suppose we have an intent and a rolled back deletion tombstone in the mvcc history, such that the tombstone is the most recent value. If we pass meta
to mvccMaybeRewriteIntentHistory
, it will remove the tombstone from the history but will not update ms
to account for that. Then, we pass orig = meta
to updateStatsOnResolve
and orig.Deleted
is false
(referring to the intent), while we want it to be true
(referring to the tombstone) to be able to adjust LiveBytes
and LiveCount
correctly.
e7b22cc
to
6c5489f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi and @miraradeva)
pkg/storage/mvcc.go
line 630 at r1 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
1 is all set.
2 is a bit tricky because these randomized tests have no easy way of tweaking the probabilities of different operations. With the code I added I could not repro this bug because it needs a specific set of circumstances: a non-nil txn that commits and has an rolled back delete in its history. Out of 200 operations (the test default), for example, maybe 12 even get some interesting history, and the rest doesn't line up. Any suggestions?
I don't think it needs to be able to reproduce the bug with a high probability, but making that probability non-zero would give us more confidence that we have coverage. That would ensure that stress over enough time would catch bugs in these cases.
pkg/storage/mvcc.go
line 5090 at r1 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
It was actually needed; I added a comment.
If we let
mvccMaybeRewriteIntentHistory
changemeta
, then the logic for updating the stats inupdateStatsOnResolve
doesn't quite work becausemeta
doesn't correspond to the original stats (ms
) anymore.For example, suppose we have an intent and a rolled back deletion tombstone in the mvcc history, such that the tombstone is the most recent value. If we pass
meta
tomvccMaybeRewriteIntentHistory
, it will remove the tombstone from the history but will not updatems
to account for that. Then, we passorig = meta
toupdateStatsOnResolve
andorig.Deleted
isfalse
(referring to the intent), while we want it to betrue
(referring to the tombstone) to be able to adjustLiveBytes
andLiveCount
correctly.
Ah, that makes sense! Thanks for explaining.
d5814d6
to
ffd0bac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi and @nvanbenschoten)
pkg/storage/mvcc.go
line 630 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't think it needs to be able to reproduce the bug with a high probability, but making that probability non-zero would give us more confidence that we have coverage. That would ensure that stress over enough time would catch bugs in these cases.
There was a bug in the test. It was using the state's ts (s.TS
) even when an operation was running transactionally. Those were failing silently because the op ts didn't match the txn's read ts. Now, the test repros the bug nicely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi)
Previously, a delete on a key rolled back by a savepoint did not correctly update the MVCCStats when the intent for that key was resolved. In particular, if the rollback revealed a non-deleted value for the key, the stats considered that key/value non-live and included other inaccuracies. This is an example from the output of kvnemesis, which caught the bug: ``` engine stats: {ContainsEstimates:0 LastUpdateNanos:1698322873881949000 LockAge:0 GCBytesAge:0 LiveBytes:771 LiveCount:13 KeyBytes:1350 KeyCount:34 ValBytes:714 ValCount:53 IntentBytes:8 IntentCount:0 LockBytes:0 LockCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:254 SysCount:6 AbortSpanBytes:0} computed stats: {ContainsEstimates:0 LastUpdateNanos:1698322873881949000 LockAge:0 GCBytesAge:0 LiveBytes:161 LiveCount:1 KeyBytes:0 KeyCount:0 ValBytes:8 ValCount:0 IntentBytes:8 IntentCount:0 LockBytes:0 LockCount:0 RangeKeyCount:0 RangeKeyBytes:0 RangeValCount:0 RangeValBytes:0 SysBytes:0 SysCount:0 AbortSpanBytes:0} ``` This patch fixes the logic responsible for computing stats upon intent resolution and adds testing for rolled back deletes. Informs: cockroachdb#97444 Release note (bug fix): Rolled back deletes no longer cause a discrepancy between computed stats and the actual stored values.
ffd0bac
to
2ce4ebf
Compare
bors r=nvanbenschoten |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 2ce4ebf to blathers/backport-release-22.2-113306: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. error creating merge commit from 2ce4ebf to blathers/backport-release-23.1-113306: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. error creating merge commit from 2ce4ebf to blathers/backport-release-23.2-113306: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
@miraradeva it looks like none of these backports were clean, so we'll need to manually backport them to each release branch. The instructions for that are here: https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/900005932/Backporting+a+change+to+a+release+branch |
Yup, I'm on it. |
Recently, cockroachdb#113306 attempted to fix an issue in TestMVCCStatsRandomized where a transactional request was silently failing because its timestamp did not match the transaction's read timestamp. That patch handled many of the types of requests in TestMVCCStatsRandomized but not all (e.g. transactional GC requests). This patch addresses the issue by refactoring the timestamp manipulation to a single place in the test. Fixes: cockroachdb#119409 Release note: None
122976: storage: fix TestMVCCStatsRandomized's handling of timestamps r=miraradeva a=miraradeva Recently, #113306 attempted to fix an issue in TestMVCCStatsRandomized where a transactional request was silently failing because its timestamp did not match the transaction's read timestamp. That patch handled many of the types of requests in TestMVCCStatsRandomized but not all (e.g. transactional GC requests). This patch addresses the issue by refactoring the timestamp manipulation to a single place in the test. Fixes: #119409 Release note: None Co-authored-by: Mira Radeva <[email protected]>
Recently, #113306 attempted to fix an issue in TestMVCCStatsRandomized where a transactional request was silently failing because its timestamp did not match the transaction's read timestamp. That patch handled many of the types of requests in TestMVCCStatsRandomized but not all (e.g. transactional GC requests). This patch addresses the issue by refactoring the timestamp manipulation to a single place in the test. Fixes: #119409 Release note: None
Recently, cockroachdb#113306 attempted to fix an issue in TestMVCCStatsRandomized where a transactional request was silently failing because its timestamp did not match the transaction's read timestamp. That patch handled many of the types of requests in TestMVCCStatsRandomized but not all (e.g. transactional GC requests). This patch addresses the issue by refactoring the timestamp manipulation to a single place in the test. Fixes: cockroachdb#119409 Release note: None
Recently, cockroachdb#113306 attempted to fix an issue in TestMVCCStatsRandomized where a transactional request was silently failing because its timestamp did not match the transaction's read timestamp. That patch handled many of the types of requests in TestMVCCStatsRandomized but not all (e.g. transactional GC requests). This patch addresses the issue by refactoring the timestamp manipulation to a single place in the test. Fixes: cockroachdb#119409 Release note: None
Previously, a delete on a key rolled back by a savepoint did not correctly update the MVCCStats when the intent for that key was resolved. In particular, if the rollback revealed a non-deleted value for the key, the stats considered that key/value non-live and included other inaccuracies. This is an example from the output of kvnemesis, which caught the bug:
This patch fixes the logic responsible for computing stats upon intent resolution and adds testing for rolled back deletes.
Informs: #97444
Release note (bug fix): Rolled back deletes no longer cause a discrepancy between computed stats and the actual stored values.