-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: use size-carrying point tombstones #104539
Conversation
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
glad to be a part of the community |
d27cb4c
to
607b6b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 13 of 15 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jbowens, @RaduBerinde, @renatolabs, and @srosenberg)
pkg/storage/engine.go
line 667 at r3 (raw file):
// // It is safe to modify the contents of the arguments after it returns. ClearEngineKey(key EngineKey) error
what about this case which can be used for intents?
I suppose we are only doing MVCC keys (and not SINGLEDELSIZED too, which is the common case for intents) since this estimation problem mainly manifests when doing large (bursty) GCs and not the relatively evenly spaced intent resolution?
pkg/storage/mvcc.go
line 2034 at r3 (raw file):
if err := writer.ClearMVCC(oldVersionKey, ClearOptions{ ValueSizeKnown: curProvValRaw != nil, ValueSize: uint32(len(curProvValRaw)),
How will we know if we got the size wrong: the code in mvcc.go
is quite complex -- I did go through these changes carefully, but staring at the complex conditional paths in mvcc.go always leaves me with a feeling that things are correct because they gets exercised by all the tests.
Should we be adding a counter inside Pebble of wrong sizes it encountered, and after each MVCC test do a full manual compaction and check that the counter is 0?
pkg/storage/batch_test.go
line 864 at r3 (raw file):
defer log.Scope(t).Close(t) e, err := Open(context.Background(), InMemory(),
Is it easy to have a simple unit test to check that the plumbing is working? It's great that we have the roachtest, but it would be good to detect any future plumbing bugs with unit tests.
This commit performs a small refactor of the batch reading interface exposed by the storage package. It removes 'Pebble' from a few names, removes the duplicated key kinds (Pebble exports these now), and adds assertions to ensure new key kinds trigger compilation failures. This motivated by cockroachdb#104539, in which I never even considered the batch reader or how it might need to change. Epic: none Release note: None
104753: storage: remove duplicated key kind constants r=jbowens a=jbowens This commit performs a small refactor of the batch reading interface exposed by the storage package. It removes 'Pebble' from a few names, removes the duplicated key kinds (Pebble exports these now), and adds assertions to ensure new key kinds trigger compilation failures. This motivated by #104539, in which I never even considered the batch reader or how it might need to change. Epic: none Release note: None Co-authored-by: Jackson Owens <[email protected]>
1a29767
to
50528d8
Compare
cf7c148
to
5012c4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @renatolabs, @srosenberg, and @sumeerbhola)
pkg/storage/engine.go
line 667 at r3 (raw file):
Previously, sumeerbhola wrote…
what about this case which can be used for intents?
I suppose we are only doing MVCC keys (and not SINGLEDELSIZED too, which is the common case for intents) since this estimation problem mainly manifests when doing large (bursty) GCs and not the relatively evenly spaced intent resolution?
hrm, I'm not sure. to be honest, this wasn't a principled exclusion; I was just focused on MVCC because I knew we always should know the key/val size due to the requirement of updating MVCC stats. I think it would be reasonable to extend it further, even to SINGLEDELSIZED, but I don't have a great sense of how frequent the estimation problem manifests there.
I've expanded the interface here so that it can be used at least in all the non-SINGLEDEL
instances.
pkg/storage/mvcc.go
line 2034 at r3 (raw file):
Previously, sumeerbhola wrote…
How will we know if we got the size wrong: the code in
mvcc.go
is quite complex -- I did go through these changes carefully, but staring at the complex conditional paths in mvcc.go always leaves me with a feeling that things are correct because they gets exercised by all the tests.Should we be adding a counter inside Pebble of wrong sizes it encountered, and after each MVCC test do a full manual compaction and check that the counter is 0?
I did this, manually sprinkling the manual compact throughout MVCC tests. I put the assert in the ForTesting
ConfigOption
.
pkg/storage/batch_test.go
line 864 at r3 (raw file):
Previously, sumeerbhola wrote…
Is it easy to have a simple unit test to check that the plumbing is working? It's great that we have the roachtest, but it would be good to detect any future plumbing bugs with unit tests.
Done.
2c8f3ac
to
08dea42
Compare
Adapt calls to ClearMVCC, ClearUnversioned, ClearEngineKey and ClearIntent to write a new kind of point tombstone exposed by Pebble that carries the size of the deleted kv pair. These sizes are used by Pebble to produce more accurate estimations of space amplification within the database and pick more productive compactions. In point-tombstone/hetereogeneous-value-sizes roachtest runs, the kv database's approximate disk size is reduced to 2.3 GiB by the end of the run. Epic: CRDB-25405 Release note (performance improvement): Improve the time to disk space reclamation when deleting rows. Previously, in scenarios where rows had large variations in row size, it was possible for disk space to not be reclaimed after MVCC garbage collection deleted the rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 11 files at r4, 15 of 18 files at r5, 5 of 5 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde, @renatolabs, @srosenberg, and @sumeerbhola)
TFTRs! bors r+ |
Build succeeded: |
Adapt calls to ClearMVCC, ClearUnversioned, ClearEngineKey and ClearIntent to
write a new kind of point tombstone exposed by Pebble that carries the size of
the deleted kv pair. These sizes are used by Pebble to produce more accurate
estimations of space amplification within the database and pick more productive
compactions.
In point-tombstone/hetereogeneous-value-sizes roachtest runs, the kv database's
approximate disk size is reduced to 2.3 GiB by the end of the run.
Epic: CRDB-25405
Release note (performance improvement): Improve the time to disk space
reclamation when deleting rows. Previously, in scenarios where rows had large
variations in row size, it was possible for disk space to not be reclaimed
after MVCC garbage collection deleted the rows.