libroach,engine: support pagination of ExportToSst #44440
Conversation
@itsbilal lmk if there's somebody better to review this. One improvement I'm considering is going through and naming the return param on the …
I did a quick first pass and left a few comments on the C++. Not a full review.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @itsbilal)
c-deps/libroach/db.cc, line 1100 at r1 (raw file):

> // versions of a key to the writer.
> bool paginated = target_size > 0;
> std::string cur_key ("");

The explicit constructor call is unnecessary: s/ ("")//g.
c-deps/libroach/db.cc, line 1104 at r1 (raw file):
> // Seek to the MVCC metadata key for the provided start key and let the
> // incremental iterator find the appropriate version.
> DBKey seek_key = (DBKey){ .key = start.key };

Huh, this is some weird new C++ syntax. Are field initializers like that part of C++ now? I know for a while that was a gcc extension. Doing some googling, it isn't clear to me if the (DBKey) cast is necessary. I think that can be removed. Maybe.
Force-pushed from 2c7373f to 0186f41
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @petermattis)
c-deps/libroach/db.cc, line 1100 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
> The explicit constructor call is unnecessary: s/ ("")//g
Done.
c-deps/libroach/db.cc, line 1104 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
> Huh, this is some weird new C++ syntax. Are field initializers like that part of C++ now? I know for a while that was a gcc extension. Doing some googling, it isn't clear to me if the (DBKey) cast is necessary. I think that can be removed. Maybe.

It's kosher as of C99 so I assumed it would just work in C++. You're right about the cast. Removed.

If there's a more idiomatic way of constructing the DBKey with a zero timestamp, let me know. In Go we call MakeMVCCMetadataKey.
Force-pushed from 0186f41 to b95fb60
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @itsbilal)
c-deps/libroach/db.cc, line 1104 at r1 (raw file):
Previously, ajwerner wrote…
> It's kosher as of C99 so I assumed it would just work in C++. You're right about the cast. Removed. If there's a more idiomatic way of constructing the DBKey with a zero timestamp, let me know. In Go we call MakeMVCCMetadataKey.
This idiom looks good to me.
c-deps/libroach/db.cc, line 1099 at r2 (raw file):

> // SST then we need to keep track of when we've finished adding all of the
> // versions of a key to the writer.
> bool paginated = target_size > 0;

Might want to sprinkle some const declarations around. Looks like this variable can be const, as can some of the ones in the loop below.
Reviewed 8 of 10 files at r1, 2 of 2 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
c-deps/libroach/db.cc, line 1141 at r2 (raw file):

> int64_t new_size = cur_size + decoded_key.size() + iter.value().size();
> bool is_over_target = paginated && cur_size > 0 && new_size > target_size;
> if (is_new_key && is_over_target) {

Nit: might be a good idea to add paginated && to this conditional and remove it from is_over_target. Makes it clearer that this only runs in the paginated case, just like the conditional above. The boolean logic here (and the interactions between export_all_revisions, paginated, etc.) is confusing enough.
c-deps/libroach/db.cc, line 1170 at r2 (raw file):

> // If we're not returning an error, check to see if we need to return the resume key.
> if (res.data == NULL && resume_key.length() > 0) {

Maybe s/res.data == NULL/res.ok()/? That's how we usually check for non-error DBStatuses.
c-deps/libroach/db.cc, line 1171 at r2 (raw file):

> // If we're not returning an error, check to see if we need to return the resume key.
> if (res.data == NULL && resume_key.length() > 0) {
>   *resume = ToDBString(rocksdb::Slice(resume_key));

You should also be able to skip the resume_key conversion from string to rocksdb::Slice; ToDBString should still work. RocksDB passes the two types around interchangeably a lot.
pkg/storage/engine/pebble.go, line 1153 at r2 (raw file):

> }
> unsafeValue := iter.UnsafeValue()
> isNewKey := !exportAllRevisions || unsafeKey.Key.Compare(curKey) != 0

bytes.Equal(unsafeKey.Key, curKey) is a bit more performant and easier to read.
pkg/storage/engine/pebble.go, line 1169 at r2 (raw file):

> isOverTarget := paginated && curSize > 0 && uint64(newSize) > targetSize
> if isNewKey && isOverTarget {
>   resumeKey = append(roachpb.Key{}, unsafeKey.Key...) // allocate the right size

Unsure what the comment is referring to - do you want to allocate exactly len(unsafeKey.Key) bytes? In that case, why not make([]byte, len(unsafeKey.Key)) and then copy(...)?
This commit extends the engine interface to take a targetSize parameter in the ExportToSst method. The iteration will stop if the first version of a key to be added to the SST would lead to targetSize being exceeded. If exportAllRevisions is false, the targetSize will not be exceeded unless the first kv pair exceeds it.

This commit additionally fixes a bug in the rocksdb implementation of DBExportToSst whereby the first key in the export request would be skipped. This case likely never occurred because the key passed to Export was rarely exactly the first key to be included (see the change related to seek_key in db.cc).

The exportccl.TestRandomKeyAndTimestampExport test was extended to exercise various targetSize limits. That test, run under stress with the tee engine, inspires some confidence and did catch the above-mentioned bug. More testing would likely be good.

This commit leaves the task of adopting the targetSize parameter for later.

Fixes cockroachdb#39717.

Release note: None
Force-pushed from b95fb60 to 05f91e5
Thanks for the prompt reviews!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @itsbilal, and @petermattis)
c-deps/libroach/db.cc, line 1099 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
> Might want to sprinkle some const declarations around. Looks like this variable can be const, and some of the ones in the loop below.
Done.
c-deps/libroach/db.cc, line 1121 at r2 (raw file):

> }
> bool is_new_key = !export_all_revisions || decoded_key.compare(rocksdb::Slice(cur_key)) != 0;

Turns out I didn't need this rocksdb::Slice either.
c-deps/libroach/db.cc, line 1141 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
> Nit: Might be a good idea adding paginated && to this conditional and removing it from is_over_target. Makes it clearer that this only runs in the paginated cases, just like the conditional above. The boolean logic here (and the interactions between export_all_revisions, paginated, etc) are confusing enough.
Done.
c-deps/libroach/db.cc, line 1170 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
> Maybe s/res.data == NULL/res.ok()/ ? That's how we usually check for non-error DBStatuses.
Erm, that method seems to be on a rocksdb::Status; this is just a DBStatus. The comment there says "If DBStatus.data == NULL the operation succeeded." I could use status here but it's a little bit less. We also already do the same check above on status.
c-deps/libroach/db.cc, line 1171 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
> You should also be able to skip the resume_key conversion from string to rocksdb::Slice. ToDBString should still work. RocksDB passes the two types around interchangeably a lot.
Done.
pkg/storage/engine/pebble.go, line 1153 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
> bytes.Equal(unsafeKey.Key, curKey) is a bit more performant and easier to read.

Ha, forgot about Equal after the C++.
pkg/storage/engine/pebble.go, line 1169 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
> Unsure of what the comment is referring to - do you want to allocate exactly len(unsafeKey.Key) bytes? In that case why not make([]byte, len(unsafeKey.Key)) and then copy(...)?
See how this makes you feel.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @itsbilal, and @petermattis)
c-deps/libroach/db.cc, line 1170 at r2 (raw file):
Previously, ajwerner wrote…
> Erm, that method seems to be on a rocksdb::Status; this is just a DBStatus. The comment there says "If DBStatus.data == NULL the operation succeeded." I could use status here but it's a little bit less. We also already do the same check above on status.

My bad, forgot this was DBStatus. Sounds good then.
pkg/storage/engine/pebble.go, line 1169 at r2 (raw file):
Previously, ajwerner wrote…
> See how this makes you feel.

Looks better - letting append grow the slice from 0 usually means it'll end up with some power-of-two capacity. This is better if you want to allocate exactly that length.
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @itsbilal, and @petermattis)
44440: libroach,engine: support pagination of ExportToSst r=ajwerner a=ajwerner This commit extends the engine interface to take a targetSize parameter in the ExportToSst method. The iteration will stop if the first version of a key to be added to the SST would lead to targetSize being exceeded. If exportAllRevisions is false, the targetSize will not be exceeded unless the first kv pair exceeds it. This commit additionally fixes a bug in the rocksdb implementation of DBExportToSst whereby the first key in the export request would be skipped. This case likely never occurred because the key passed to Export was rarely exactly the first key to be included (see the change related to seek_key in db.cc). The exportccl.TestRandomKeyAndTimestampExport test was extended to exercise various targetSize limits. That test, run under stress with the tee engine, inspires some confidence and did catch the above-mentioned bug. More testing would likely be good. This commit leaves the task of adopting the targetSize parameter for later. Fixes #39717. Release note: None Co-authored-by: Andrew Werner <[email protected]>
Build succeeded
In cockroachdb#44440 we added a `targetSize` parameter to enable pagination of export requests. In that PR we defined the targetSize to return just before the key that would lead to the `targetSize` being exceeded. This definition is unfortunate when thinking about a total size limit for pagination in the DistSender (which we'll add when cockroachdb#44341 comes in). Imagine a case where we set a total byte limit of 1MB and a file byte limit of 1MB. That setting should lead to at most a single file being emitted (assuming one range holds enough data). If we used the previous definition we'd create a file which is just below 1MB and then the DistSender would need to send another request which would contain a tiny amount of data. This brings the behavior in line with the semantics introduced in cockroachdb#44341 for ScanRequests and is just easier to reason about. Release note: None
44489: sql: bugfixes around writing the old primary key in pk changes r=rohany a=rohany This PR fixes two bugs: * The logic for when to rewrite the old primary key was broken resulting in the old primary key not being rewritten in many cases. * The old primary key being created was not properly dealing with its dangling interleave information. This PR makes the design decision to not re-interleave the copy of the old primary index if it was interleaved. Release note: None 44551: jobs: always set start time of a job r=spaskob a=spaskob When starting a job via CreateAndStartJob, if making the job started fails, the job will still be in system.jobs and can be adopted by another node later, but the started time will be 0 in this case. We add a check and set it if necessary. Release note: none. 44553: engine: redefine targetSize and add maxSize to ExportToSst r=itsbilal a=ajwerner This PR is a follow-up of work from #44440 motivated by problems unearthed while typing #44482. The PR comes in two commits: 1) Re-define `targetSize` from being a target below which most requests would remain to being the size above which the export stops. 2) Add a `maxSize` parameter above which the `ExportToSst` call will fail. See the individual commits for more details. Co-authored-by: Rohan Yadav <[email protected]> Co-authored-by: Spas Bojanov <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
This commit adopts the API change in cockroachdb#44440 and the previous commit. It adds a hidden cluster setting to control the target size. There's not a ton of testing but there's some. Further work includes: * Adding a mechanism to limit the number of files returned from an ExportRequest for use in CDC backfills. This is currently blocked on cockroachdb#44341. I'm omitting a release note because the setting is hidden. Release note: None.
44482: storageccl: add setting to control export file size and paginate export r=pbardea a=ajwerner This commit adopts the API change in #44440. It adds a hidden cluster setting to control the target size. The testing is minimal. This PR comes in two commits: 1) Add a parameter to the ExportSST request to control the target size 2) Add two cluster settings * Control the target size * Control how much over the target size before an error is generated Closes #43356 Release note: None. Co-authored-by: Andrew Werner <[email protected]>