blockstore: atomize slot clearing, relax parent slot meta check #35124
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #35124    +/-   ##
========================================
  Coverage    81.7%    81.8%
========================================
  Files         834      834
  Lines      224299   224828    +529
========================================
+ Hits       183361   183923    +562
+ Misses      40938    40905     -33
e088ed3 to 6290f67
6290f67 to ba189b1
Reworked as per #35124 (comment).
Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc., that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule.
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc., that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case-by-case basis.
    if let Some(mut slot_meta) = slot_meta {
        let child_slot = slot_meta.slot;
        if child_slot != from_slot || child_slot != to_slot {
            error!("Slot meta parent cleanup was requested for {}, but a range was specified {} {}", child_slot, from_slot, to_slot);
            return Err(BlockstoreError::InvalidRangeForSlotMetaCleanup);
        }
We don't have to do this sanity checking if we fetch the SlotMeta in this function ourselves instead of relying on the caller.
Technically, this would add some overhead, since we'd be fetching the SlotMeta twice: once here and once in Blockstore::clear_unconfirmed_slot(). I think that's OK, but if we're concerned about that, I think we could remove the SlotMeta check in Blockstore::clear_unconfirmed_slot() anyway. The RocksDB range deletes are cheap, and this function should only be called if ReplayStage knows it has something that it needs to delete.
That being said, fetching a single SlotMeta is cheap too, and if we're calling this function, stuff is kind of messed up already (i.e., the node is likely going to have to repair the slot).
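A minimal sketch of the trade-off the reviewer describes, using hypothetical, simplified types (`SlotMeta`, `MockBlockstore`, `clear_slot_with_meta` and `clear_slot` are illustrative and are not the real solana-ledger API). When the caller supplies the meta together with a range, the callee has to cross-check them; when the callee fetches the meta itself, the only input is the slot, so there is nothing that can disagree.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; not the real solana-ledger types.
type Slot = u64;

#[derive(Clone, Debug, PartialEq)]
struct SlotMeta {
    slot: Slot,
}

#[derive(Default)]
struct MockBlockstore {
    meta: HashMap<Slot, SlotMeta>,
}

impl MockBlockstore {
    /// Caller-supplied variant (the pattern under review): the passed meta
    /// must be validated against the requested range before anything is removed.
    fn clear_slot_with_meta(
        &mut self,
        slot_meta: &SlotMeta,
        from_slot: Slot,
        to_slot: Slot,
    ) -> Result<(), String> {
        let child_slot = slot_meta.slot;
        if child_slot != from_slot || child_slot != to_slot {
            return Err(format!(
                "meta is for slot {child_slot}, but range {from_slot}..={to_slot} was requested"
            ));
        }
        self.meta.remove(&child_slot);
        Ok(())
    }

    /// Callee-fetched variant (the suggestion): one extra point lookup, but
    /// no range sanity check is needed. Returns None if there was nothing to clear.
    fn clear_slot(&mut self, slot: Slot) -> Option<SlotMeta> {
        self.meta.remove(&slot)
    }
}
```

The extra lookup in `clear_slot` is the "fetching the SlotMeta twice" overhead mentioned above; in this sketch it replaces an entire error path.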
601f68c to 45e292d
In the interest of keeping the backport surface small, I've added a new In a future master PR I can unify the remaining
1b73953 to 6b496a5
@@ -273,6 +264,72 @@ impl Blockstore {
        Ok(columns_purged)
    }

    fn purge_range(&self, write_batch: &mut WriteBatch, from_slot: Slot, to_slot: Slot) -> bool {
Does it make sense to include purge_special_columns_exact() in here? For example:
    fn purge_range() {}
    fn purge_range_special_columns() {}
    fn do_purge_range(&self, write_batch: &mut WriteBatch, from_slot: Slot, to_slot: Slot, should_purge_special_columns: bool)
Good idea; I just passed through PurgeType instead of adding separate functions.
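A minimal sketch of "passing through PurgeType" into a single range-purge helper, under stated assumptions: the two-variant `PurgeType`, the recording `WriteBatch`, and the column names are hypothetical simplifications, not the real solana-ledger definitions, and this `purge_range` returns nothing rather than the `bool` seen in the diff.

```rust
type Slot = u64;

// Simplified stand-in for the real enum; the variant set is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PurgeType {
    Exact,
    CompactionFilter,
}

// Hypothetical write batch that only records the range deletes staged in it.
#[derive(Default, Debug)]
struct WriteBatch {
    deletes: Vec<(&'static str, Slot, Slot)>,
}

impl WriteBatch {
    fn delete_range(&mut self, column: &'static str, from: Slot, to: Slot) {
        self.deletes.push((column, from, to));
    }
}

// Illustrative column names only.
const SLOT_COLUMNS: [&str; 3] = ["meta", "data_shred", "code_shred"];
const SPECIAL_COLUMNS: [&str; 2] = ["transaction_status", "address_signatures"];

// One helper instead of purge_range / purge_range_special_columns:
// the enum decides whether the special columns are staged too.
fn purge_range(write_batch: &mut WriteBatch, from_slot: Slot, to_slot: Slot, purge_type: PurgeType) {
    for column in SLOT_COLUMNS {
        write_batch.delete_range(column, from_slot, to_slot);
    }
    // In this sketch only an Exact purge stages the special columns.
    if purge_type == PurgeType::Exact {
        for column in SPECIAL_COLUMNS {
            write_batch.delete_range(column, from_slot, to_slot);
        }
    }
}
```

Compared with the three-function outline suggested above, this keeps a single entry point, at the cost of the helper depending on `PurgeType`; that coupling is what the later review thread debates.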
6b496a5 to d8ac92d
d8ac92d to 348b918
Mostly nits; liking the way this is coming together. Will take a final pass shortly.
Just a few more nits on comments + simplification. I think we're good for a ship-it after these changes; I can give that quickly after you make them, but I think it'd be nice to get a final pass & approval from Carl too.
clear_unconfirmed_slot can leave the blockstore in an irrecoverable state if it panics midway. Perform this function's writes through a single write batch so that any errors can be recovered after restart. Additionally, relax the constraint that the parent slot meta must exist, as it could have been cleaned up if outdated.
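A minimal in-memory sketch of the two ideas in this description, assuming a simplified model in which `Store`, `Op` and `write_batch` are hypothetical stand-ins for the Blockstore and a RocksDB write batch (RocksDB applies a `WriteBatch` atomically). All deletions and the parent update are staged first and applied in one step, so a panic while staging leaves the store untouched; a missing parent meta is skipped rather than treated as an error. The real function's exact bookkeeping for the cleared slot's own meta is not modeled here.

```rust
use std::collections::HashMap;

// Hypothetical, simplified model; not the real Blockstore or RocksDB API.
type Slot = u64;

#[derive(Clone, Debug, Default, PartialEq)]
struct SlotMeta {
    parent_slot: Option<Slot>,
    next_slots: Vec<Slot>,
}

// Operations staged in a batch instead of being applied one by one.
enum Op {
    DeleteShreds(Slot),
    PutMeta(Slot, SlotMeta),
}

#[derive(Default)]
struct Store {
    meta: HashMap<Slot, SlotMeta>,
    shreds: HashMap<Slot, Vec<u8>>,
}

impl Store {
    // Stands in for a single atomic batch write: staged ops only take
    // effect here, all together.
    fn write_batch(&mut self, batch: Vec<Op>) {
        for op in batch {
            match op {
                Op::DeleteShreds(slot) => {
                    self.shreds.remove(&slot);
                }
                Op::PutMeta(slot, meta) => {
                    self.meta.insert(slot, meta);
                }
            }
        }
    }

    fn clear_unconfirmed_slot(&mut self, slot: Slot) {
        let meta = match self.meta.get(&slot) {
            Some(meta) => meta.clone(),
            None => return, // nothing to clear
        };
        let mut batch = vec![Op::DeleteShreds(slot)];
        if let Some(parent) = meta.parent_slot {
            // Relaxed check: the parent's meta may already have been purged,
            // so a missing parent is skipped instead of returning an error.
            if let Some(parent_meta) = self.meta.get(&parent) {
                let mut updated = parent_meta.clone();
                updated.next_slots.retain(|&s| s != slot);
                batch.push(Op::PutMeta(parent, updated));
            }
        }
        // Nothing has been modified before this single commit point.
        self.write_batch(batch);
    }
}
```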
891b514 to 4a95caa
Co-authored-by: steviez <[email protected]>
LGTM, thanks for pushing through after we collectively changed our minds several times. Just to be sure, let's see if we can get @carllin to give one more pass too.
        write_batch: &mut WriteBatch,
        from_slot: Slot,
        to_slot: Slot,
        purge_type: PurgeType,
nit: I think it's cleaner encapsulation to pass a should_purge_special_columns boolean here, and then add a PurgeType::should_purge_special_columns() -> bool method on PurgeType. This way purge_slot_cleanup_chaining() doesn't have to know about PurgeType at all.
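A minimal sketch of the reviewer's suggestion, assuming a simplified two-variant `PurgeType` and a hypothetical `purge_slot_cleanup_chaining` that only records which column groups it would purge (the real function's signature and body are not shown in this thread). The enum-to-bool mapping lives in one method on `PurgeType`, and the low-level helper depends only on a `bool`.

```rust
type Slot = u64;

// Simplified stand-in; the variant set is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PurgeType {
    Exact,
    CompactionFilter,
}

impl PurgeType {
    // The suggested helper: only an Exact purge removes special columns now.
    fn should_purge_special_columns(self) -> bool {
        matches!(self, PurgeType::Exact)
    }
}

// Hypothetical helper: it sees a bool and has no dependency on PurgeType.
// It records the column groups it would purge instead of touching a database.
fn purge_slot_cleanup_chaining(purged: &mut Vec<String>, slot: Slot, should_purge_special_columns: bool) {
    purged.push(format!("slot_columns[{slot}]"));
    if should_purge_special_columns {
        purged.push(format!("special_columns[{slot}]"));
    }
}
```

The caller then writes `purge_slot_cleanup_chaining(&mut purged, slot, purge_type.should_purge_special_columns())`. The counterargument in the reply that follows is that `PurgeType` is already public API, so the indirection buys little.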
"This way purge_slot_cleanup_chaining() doesn't have to know about PurgeType at all"
I see the merit of adding the new function purge_slot_cleanup_chaining(), as it has the unique functionality from run_purge_with_stats. However, I'm not sure I see the benefit of the boolean over the enum; can you elaborate?
The enum is part of the public API, so I think it is reasonable to expect someone to know about it. And the Exact value of the enum means "go purge the special columns for these slots right now", which is what the boolean would convey.
- It seems like PurgeType was designed specifically for the run_purge_with_stats workflow, so it seemed better to keep it isolated there rather than mixing it into this new clean function, purge_slot_cleanup_chaining: a utility function like that seems like it should just decide which columns to clean. Cramming PurgeType in there seemed too high-level.
* blockstore: atomize slot clearing, relax parent slot meta check

clear_unconfirmed_slot can leave blockstore in an irrecoverable state if it panics in the middle. write batch this function, so that any errors can be recovered after restart. additionally relax the constraint that the parent slot meta must exist, as it could have been cleaned up if outdated.

* pr feedback: use PurgeType, don't pass slot_meta
* pr feedback: add unit test
* pr feedback: refactor into separate function
* pr feedback: add special columns to helper, err msg, comments
* pr feedback: reword comments and write batch error message
* pr feedback: bubble write_batch error to caller
* pr feedback: reword comments

Co-authored-by: steviez <[email protected]>

---------

Co-authored-by: steviez <[email protected]>
(cherry picked from commit cc4072b)

# Conflicts:
# ledger/src/blockstore.rs
# ledger/src/blockstore/blockstore_purge.rs
* blockstore: atomize slot clearing, relax parent slot meta check

clear_unconfirmed_slot can leave blockstore in an irrecoverable state if it panics in the middle. write batch this function, so that any errors can be recovered after restart. additionally relax the constraint that the parent slot meta must exist, as it could have been cleaned up if outdated.

* pr feedback: use PurgeType, don't pass slot_meta
* pr feedback: add unit test
* pr feedback: refactor into separate function
* pr feedback: add special columns to helper, err msg, comments
* pr feedback: reword comments and write batch error message
* pr feedback: bubble write_batch error to caller
* pr feedback: reword comments

Co-authored-by: steviez <[email protected]>

---------

Co-authored-by: steviez <[email protected]>
(cherry picked from commit cc4072b)
…na-labs#35124)

* blockstore: atomize slot clearing, relax parent slot meta check

clear_unconfirmed_slot can leave blockstore in an irrecoverable state if it panics in the middle. write batch this function, so that any errors can be recovered after restart. additionally relax the constraint that the parent slot meta must exist, as it could have been cleaned up if outdated.

* pr feedback: use PurgeType, don't pass slot_meta
* pr feedback: add unit test
* pr feedback: refactor into separate function
* pr feedback: add special columns to helper, err msg, comments
* pr feedback: reword comments and write batch error message
* pr feedback: bubble write_batch error to caller
* pr feedback: reword comments

Co-authored-by: steviez <[email protected]>

---------

Co-authored-by: steviez <[email protected]>