-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: have sequenced reads use the intent history #33244
Conversation
f782b0c
to
c185bf0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 5 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained
c-deps/libroach/mvcc.h, line 169 at r1 (raw file):
} bool getFromIntentHistory(cockroach::storage::engine::enginepb::MVCCMetadata meta, int32_t seq) {
It's strange that we pass two fields into this method but then set another field - so it's not a fully static method. I'd recommend passing no fields in here.
@petermattis might also have other comments about the C++ style.
c-deps/libroach/mvcc.h, line 170 at r1 (raw file):
bool getFromIntentHistory(cockroach::storage::engine::enginepb::MVCCMetadata meta, int32_t seq) { int32_t largestSeq = -1;
We should use a binary_search
to accomplish this lookup.
c-deps/libroach/mvcc.h, line 290 at r1 (raw file):
if (txn_epoch_ == meta_.txn().epoch()) { if (txn_sequence_ >= meta_.txn().sequence()) {
We're going to need to make this logic conditional on a cluster version. To do this, we'll need to go through a few steps, outlined in cockroach_versions.go
.
We'll then want to check whether this version is "active" somewhere in or above the mvcc layer. This is going to be a little tricky because we'll need the right dependencies in a few places. Here's my suggestion:
- Add a new
IgnoreSeq
field toMVCCGetOptions
andMVCCScanOptions
- Comment with a
TODO(nvanbenschoten)
that the field can go away in 2.3. - Set this field when the new cluster version is NOT active in
cmd_get.go
,cmd_scan.go
, andcmd_reverse_scan.go
(I just counted and I think we'll need to set this in 5 places, but please double check). Here's an example of how to check whether a cluster setting is active. - Pass this option down through C++
- Make this condition
(ignore_seq_ || (txn_sequence_ >= meta_.txn().sequence()))
c-deps/libroach/mvcc.h, line 291 at r1 (raw file):
if (txn_epoch_ == meta_.txn().epoch()) { if (txn_sequence_ >= meta_.txn().sequence()) { // 8. We're reading our own txn's intent at a higher sequence.
"an equal or higher sequence"
c-deps/libroach/mvcc.h, line 299 at r1 (raw file):
// 9. We're reading our own txn's intent at a lower sequence. // This means the intent we're seeing was written after the read. // So we try and read a value from the intent history that is
s/So//
c-deps/libroach/mvcc.h, line 300 at r1 (raw file):
// This means the intent we're seeing was written after the read. // So we try and read a value from the intent history that is // appropriate instead. If no value is found still, read the value
s/still//
c-deps/libroach/mvcc.h, line 303 at r1 (raw file):
// as though the intent was not present. bool found = getFromIntentHistory(meta_, txn_sequence_); if (!found) {
nit: invert this condition so that we can separate this into a separate "steps" (e.g. step 10.)
pkg/kv/txn_interceptor_sequence_nums.go, line 69 at r1 (raw file):
// populateMetaLocked is part of the txnInterceptor interface. Since reads now // require the sequence number, we must pass it through as well.
No need for this comment.
pkg/kv/txn_interceptor_sequence_nums.go, line 77 at r1 (raw file):
// augmentMetaLocked is part of the txnInterceptor interface. Since reads now // require the sequence number, we must ensure we don't ever underestimate // the sequence.
No need for this comment.
c185bf0
to
a96a543
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
c-deps/libroach/mvcc.h, line 169 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It's strange that we pass two fields into this method but then set another field - so it's not a fully static method. I'd recommend passing no fields in here.
@petermattis might also have other comments about the C++ style.
Done.
c-deps/libroach/mvcc.h, line 170 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We should use a
binary_search
to accomplish this lookup.
Done. Using std:lower_bound
that does almost exactly what we want (internally uses binary search).
c-deps/libroach/mvcc.h, line 290 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We're going to need to make this logic conditional on a cluster version. To do this, we'll need to go through a few steps, outlined in
cockroach_versions.go
.We'll then want to check whether this version is "active" somewhere in or above the mvcc layer. This is going to be a little tricky because we'll need the right dependencies in a few places. Here's my suggestion:
- Add a new
IgnoreSeq
field toMVCCGetOptions
andMVCCScanOptions
- Comment with a
TODO(nvanbenschoten)
that the field can go away in 2.3.- Set this field when the new cluster version is NOT active in
cmd_get.go
,cmd_scan.go
, andcmd_reverse_scan.go
(I just counted and I think we'll need to set this in 5 places, but please double check). Here's an example of how to check whether a cluster setting is active.- Pass this option down through C++
- Make this condition
(ignore_seq_ || (txn_sequence_ >= meta_.txn().sequence()))
Adding a Commit to the PR that'll add a cluster setting.
c-deps/libroach/mvcc.h, line 291 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"an equal or higher sequence"
Done.
c-deps/libroach/mvcc.h, line 299 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/So//
Done.
c-deps/libroach/mvcc.h, line 300 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/still//
Done.
pkg/kv/txn_interceptor_sequence_nums.go, line 69 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
No need for this comment.
Done.
pkg/kv/txn_interceptor_sequence_nums.go, line 77 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
No need for this comment.
Done.
a96a543
to
776e08f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r2, 11 of 11 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained
c-deps/libroach/mvcc.h, line 29 at r2 (raw file):
// compareIntents compares two sequenced intents to check if the first one // has a sequence equal or lower than the second. bool compareIntents(cockroach::storage::engine::enginepb::MVCCMetadata_SequencedIntent intent1,
make the args const &
.
c-deps/libroach/mvcc.h, line 31 at r2 (raw file):
bool compareIntents(cockroach::storage::engine::enginepb::MVCCMetadata_SequencedIntent intent1, cockroach::storage::engine::enginepb::MVCCMetadata_SequencedIntent intent2) { if (intent1.sequence() <= intent2.sequence()) {
This can be return intent1.sequence() <= intent2.sequence();
, right?
I think it should also be strictly less, based on https://en.cppreference.com/w/cpp/algorithm/lower_bound.
c-deps/libroach/mvcc.h, line 195 at r2 (raw file):
return false; } const auto intent = *(low - 1);
std::lower_bound
returns " the first element in the range [first, last) that is not less than value". So does this logic work if we query at the same sequence as an intent? I think we might want std::upper_bound
.
c-deps/libroach/mvcc.h, line 312 at r2 (raw file):
// This means the intent we're seeing was written after the read. // If there exists a value in the intent history that has a sequence // number lower than the read sequence, read that value.
equal to or lower than!
c-deps/libroach/mvcc.h, line 317 at r2 (raw file):
return true; } // 10. If no value in the intent history has a sequence number lower
ditto
81f41f3
to
1bac244
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! Your last review was too chill @nvanbenschoten. You gave me some hope that this gets through before I leave cause there was nothing to iterate on. Give it another 👀 to see if you find anything else, pretty excited now!
Reviewable status: complete! 0 of 0 LGTMs obtained
c-deps/libroach/mvcc.h, line 29 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
make the args
const &
.
Done.
c-deps/libroach/mvcc.h, line 31 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This can be
return intent1.sequence() <= intent2.sequence();
, right?I think it should also be strictly less, based on https://en.cppreference.com/w/cpp/algorithm/lower_bound.
Well that isn't really necessary because making it <=
would allow for intents written at the same seq to be read as well. But I changed it anyway as I'm now using upper_bound
.
c-deps/libroach/mvcc.h, line 195 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
std::lower_bound
returns " the first element in the range [first, last) that is not less than value". So does this logic work if we query at the same sequence as an intent? I think we might wantstd::upper_bound
.
Actually lower_bound
used to work correctly even in that case as the compare predicate function would consider <=
instead of just <
. Since, upper_bound
achieves the same thing, and is the convention we should stick with it. Done.
c-deps/libroach/mvcc.h, line 312 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
equal to or lower than!
Done.
c-deps/libroach/mvcc.h, line 317 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
ditto
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ridwanmsharif Have you run any of the MVCC
benchmarks (in storage/engine) to see the impact of this change? This is performance sensitive code and extra care needs to be taken. I haven't had a chance to look at this change yet, so perhaps there is nothing of concern.
Reviewable status: complete! 0 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS The relevant benchmarks for this change are the various MVCCScan
benchmarks.
Reviewable status: complete! 0 of 0 LGTMs obtained
1bac244
to
e7d2599
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran a few benchmarks and there was virtually no hit in the benchmarks you mention. I believe that makes sense as the extra logic for reading the right value from history would apply on when there are a few writes on the same key by the same transaction. We've not been writing history for a while and the history is expected to be pretty small as well. I defer judgement to @nvanbenschoten now. Doesn't seem like I will be able to iterate on this anymore. I trust you will review this and push this through. I'll check back on this from my flight. Great working with you all. Happy Halloween! And Happy Holidays!
Reviewable status: complete! 0 of 0 LGTMs obtained
@ridwanmsharif sorry for letting this sit. I rebased this on master, fixed a small bug, and cleaned it up a bit. We're close to being ready to merge! The only issue is that I don't have permission to push to the pull request. Do you mind following these steps: https://help.github.com/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork/? |
That's great to hear! I do see that this PR already checks the "Allow edits from maintainers" option. I don't quite understand why you can't push to it. Also sent you an invitation to be a collaborator on the fork, maybe that fixes this? @nvanbenschoten |
e7d2599
to
49b37d8
Compare
I think the issue was actually that I was pushing to the wrong branch name. Pushing to the correct one fixed it! I left the first two commits alone (other than a rebase) and added two more. @ridwanmsharif you might be interested in looking them over. Otherwise, @petermattis, mind reviewing those two changes when you get a chance? I'm gone for the next two weeks, so @petermattis feel free to merge this if everything checks out on your end. I'm also happy to merge when I return in case there's any fallout. |
The third commit addressed the rest of my feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM. Thanks for fixing the bug too! I'll defer to @petermattis for the final ✔️
Reads can now use the intent history to serve reads from values written at a lower sequence. Reads no longer read all writes in a transaction but instead read all writes in the transaction with a higher sequence than it. If a key is written to multiple times within a transaction, the read can now get the appropriate value using the intent history. Release note: None
We need a cluster version for this change. We make sure to set this newly created `IgnoreSequence` to true each time we see a cluster version is too low. Release note: None
This also fixes one bug, where MVCCScan would get stuck reading the same key and never advancing. The added test cases caught this. Release note: None
To do this, it needed to read at the previous sequence number before writing at the current sequece number. Release note: None
49b37d8
to
20bf018
Compare
Ok, now that #34025 has calmed down, I'm going to go ahead and merge this. Thanks again for all the hard work on this @ridwanmsharif! bors r+ |
33244: storage: have sequenced reads use the intent history r=nvanbenschoten a=ridwanmsharif Reads can now use the intent history to serve reads from values written at a lower sequence. Reads no longer read all writes in a transaction but instead read all writes in the transaction with a higher sequence than it. If a key is written to multiple times within a transaction, the read can now get the appropriate value using the intent history. Release note: None Co-authored-by: Ridwan Sharif <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
Build succeeded |
Reads can now use the intent history to serve reads from
values written at a lower sequence. Reads no longer read
all writes in a transaction but instead read all writes
in the transaction with a higher sequence than it. If a key
is written to multiple times within a transaction, the read
can now get the appropriate value using the intent history.
Release note: None