Add cobra style surgery clear-page-elements command #417
Added a couple of test cases, ready for review. Steps to fix the corrupted db file in #402 are as below.
@ptabor could you please let me finish this PR? This should be the last command in the surgery series. At least, there is no plan to add more in the near future.
@ahrtr Does it make more sense for this command to take keys instead of integer indices? A couple of questions come to mind when I see integers:
If it takes a list of keys, it'll be easier to integrate with other CLI tools.
Sorry if I look obsessed with the naming but from the |
Resolved all comments, thx. PTAL. cc @ptabor and @tjungblu to take a look as well.
I tend to keep using indices. The keys are not always visible / printable strings. Performing the surgery commands also requires a deep understanding of both boltDB and the business data, so users must do very deep analysis before performing any surgery operation. The indices start from 0. I updated the flag usages.
The naming is discussed & agreed between me and @ptabor . We have two commands: |
lgtm (non-binding). Also happy to help to move everything over to Cobra. Shall we add some "warranty" warning to the help text? I don't expect average users to go and fix their bbolt files and know what they do with the surgery commands. For higher level commands like proposed earlier:
it kinda makes sense, but then we could directly offer a "bbolt repair" command?
Makes sense to me. We can enhance it later & separately.
We can discuss this separately. Please also see my comment #417 (comment).
@ahrtr No |
internal/surgeon/surgeon.go
Outdated
p.SetCount(uint16(start))
p.SetOverflow(0)
if preOverflow != 0 || p.IsBranchPage() {
if err := clearFreelist(path); err != nil {
It's an interesting philosophical tradeoff.
I'm on the side that surgery should be really minimal by default.
- I assume that 'leaking' pages (i.e. leaving them unreferenced) is not critical for correctness.
- The cost of running etcd after freelist truncation might be significant.
So I would:
- Add a flag (to the method) controlling whether the freelist should be truncated.
- If the flag is false, the method should log a warning that the following number of pages were potentially 'abandoned'.
- Have a separate command, 'surgery abandon-freelist'.
- [optional] Have a flag, surgery clear-page-elements ... --abandon-freelist, that might trigger both together. But as usage of these tools should be infrequent, I wouldn't optimize that aspect of the user experience.
We have to do something about the freelist; otherwise some pages are neither reachable nor in the freelist, and the db eventually can't pass bbolt check.
- Either clear the freelist, so bbolt will scan the whole db to rebuild the freelist. The downside is reduced performance on the next startup. The upside is that the implementation is simple.
- Or update its parent page, all the way to the root if needed. Obviously it complicates the implementation.
$ ./bbolt page ./db 36
Page ID: 36
Page Type: branch
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 30
"0000": <pgid=4>
"0067": <pgid=2>
"0134": <pgid=3>
"10001": <pgid=9>
"10066": <pgid=10>
"10131": <pgid=5>
"10196": <pgid=6>
"20061": <pgid=7>
"20126": <pgid=11>
"20191": <pgid=12>
"30056": <pgid=13>
"30121": <pgid=8>
"30186": <pgid=15>
"40051": <pgid=16>
"40116": <pgid=14>
"40181": <pgid=18>
"50046": <pgid=19>
"50111": <pgid=17>
"50176": <pgid=21>
"60041": <pgid=22>
"60106": <pgid=20>
"60171": <pgid=24>
"70036": <pgid=25>
"70101": <pgid=23>
"70166": <pgid=27>
"80031": <pgid=28>
"80096": <pgid=26>
"80161": <pgid=30>
"90026": <pgid=31>
"90091": <pgid=32>
$ ./bbolt surgery clear-page-elements ./db --output ./new.db --pageId 36 --from 2 --to 3
All elements in [2, 3) in page 36 were cleared
$ ./bbolt check ./new.db
page 3: unreachable unfreed
1 errors found
invalid value
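To illustrate why the check fails, here is a minimal sketch of the invariant it is enforcing: every page must be either reachable from the root or listed in the freelist. The `toyDB` type, its fields, and `unreachableUnfreed` are hypothetical names for an in-memory toy model, not bbolt's actual data structures or check implementation.

```go
package main

import "fmt"

// pgid is a toy page identifier (an assumption; it only mirrors the concept).
type pgid uint64

// toyDB is a hypothetical in-memory model of a db file.
type toyDB struct {
	root     pgid            // page where traversal starts
	pages    []pgid          // every allocated page, in ascending order
	children map[pgid][]pgid // branch page -> pages its elements point at
	free     map[pgid]bool   // pages recorded in the freelist
}

// unreachableUnfreed returns the pages that are neither reachable from
// the root nor present in the freelist.
func (db *toyDB) unreachableUnfreed() []pgid {
	reachable := map[pgid]bool{}
	var walk func(id pgid)
	walk = func(id pgid) {
		if reachable[id] {
			return
		}
		reachable[id] = true
		for _, c := range db.children[id] {
			walk(c)
		}
	}
	walk(db.root)

	var leaked []pgid
	for _, id := range db.pages {
		if !reachable[id] && !db.free[id] {
			leaked = append(leaked, id)
		}
	}
	return leaked
}

func main() {
	// Page 36 is a branch whose first three elements point at pages 4, 2, 3.
	db := &toyDB{
		root:     36,
		pages:    []pgid{2, 3, 4, 36},
		children: map[pgid][]pgid{36: {4, 2, 3}},
		free:     map[pgid]bool{},
	}
	// Clear the element at index 2 (range [2, 3)) without touching the freelist.
	db.children[36] = db.children[36][:2]
	for _, id := range db.unreachableUnfreed() {
		fmt.Printf("page %d: unreachable unfreed\n", id)
	}
}
```

In this toy model, either adding page 3 to `free` or discarding the freelist and rebuilding it from a full scan would restore the invariant, which corresponds to the two options listed above.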
Or update its parent page, all the way to the root if needed. Obviously it complicates the implementation.
I didn't get how it would help. I assume we are losing OVERFLOW
$ ./bbolt surgery clear-page-elements ./db --output ./new.db --pageId 36 --from 2 --to 3
WARNING: The clearing has abandoned 1 page that is not yet referenced from free list.
Consider `./bbolt surgery abandon-freelist ...`.
All elements with indexes in [2, 3) in page 36 were cleared
$ ./bbolt surgery abandon-freelist ./new.db --output ./new.db2
$ ./bbolt check ./new.db2
check OK
Or update its parent page, all the way to the root if needed. Obviously it complicates the implementation.
I didn't get how it would help. I assume we are losing OVERFLOW
Sorry for the confusion. I wanted to say: If it's clearing branch page items, then we need to traverse all sub pages referenced by the cleared items, so as to figure out the pages which should be added into the freelist.
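The traversal described here could look roughly like the sketch below. It assumes a hypothetical in-memory `toyPage` model (not bbolt's real page layout) and a hypothetical helper `collectSubtree`; the only bbolt property relied on is that a page with N overflow pages occupies N+1 consecutive page ids.

```go
package main

import "fmt"

// pgid is a toy page identifier (an assumption for this sketch).
type pgid uint64

// toyPage is a hypothetical, simplified page.
type toyPage struct {
	overflow int    // number of extra contiguous pages following this one
	children []pgid // non-empty only for branch pages
}

// collectSubtree returns every page id owned by the subtrees rooted at
// the given page ids, i.e. the candidates to add to the freelist after
// the branch elements pointing at those roots have been cleared.
func collectSubtree(pages map[pgid]*toyPage, roots []pgid) []pgid {
	var out []pgid
	var walk func(id pgid)
	walk = func(id pgid) {
		p, ok := pages[id]
		if !ok {
			return // unknown page: nothing to collect in this toy model
		}
		// The page itself plus its overflow pages.
		for i := 0; i <= p.overflow; i++ {
			out = append(out, id+pgid(i))
		}
		for _, c := range p.children {
			walk(c)
		}
	}
	for _, r := range roots {
		walk(r)
	}
	return out
}

func main() {
	pages := map[pgid]*toyPage{
		10: {children: []pgid{11, 12}}, // branch
		11: {},                         // leaf
		12: {overflow: 1},              // leaf spanning pages 12 and 13
	}
	// Suppose the cleared branch element pointed at page 10.
	fmt.Println(collectSubtree(pages, []pgid{10}))
}
```

This is the more complicated of the two options: the surgery code has to understand the tree well enough to walk it, whereas abandoning the freelist only needs to touch the freelist itself.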
$ ./bbolt surgery clear-page-elements ./db --output ./new.db --pageId 36 --from 2 --to 3
WARNING: The clearing has abandoned 1 page that is not yet referenced from free list.
Consider `./bbolt surgery abandon-freelist ...`.
All elements with indexes in [2, 3) in page 36 were cleared
If I understand it correctly, you are suggesting two things:
- Add one more command, bbolt surgery abandon-freelist;
- Users must manually & explicitly execute surgery abandon-freelist; we shouldn't do it automatically for users?
Generally I agree with this. Just with two minor comments:
- The number of pages may not be 1, so the warning message will be something like "WARNING: The clearing has abandoned some page(s) that are not yet referenced from free list.";
- If no need to abandon the freelist, then do not print the warning.
Done. Let me finish the bbolt surgery abandon-freelist command in a separate follow-up PR.
Clear a branch element,
$ ./bbolt surgery clear-page-elements ./local-kv.db --output new.db --pageId 34 --from-index 1 --to-index 2
WARNING: The clearing has abandoned some pages that are not yet referenced from free list.
Please consider executing `./bbolt surgery abandon-freelist ...`
All elements in [1, 2) in page 34 were cleared
Clear a leaf element,
$ ./bbolt surgery clear-page-elements ./local-kv.db --output new1.db --pageId 37 --from-index 1 --to-index 2
All elements in [1, 2) in page 37 were cleared
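The behaviour in the two runs above can be sketched as follows: elements in the half-open range [from, to) are removed, and a warning is warranted only when pages may have been abandoned. The `toyPage` type and `clearElements` function are hypothetical; the abandonment condition (branch page, or a page that had overflow pages) mirrors the outdated diff excerpt quoted earlier in this review and may differ from the merged implementation.

```go
package main

import "fmt"

// toyPage is a hypothetical, simplified page for illustration only.
type toyPage struct {
	branch   bool     // true for branch pages, false for leaf pages
	overflow uint32   // number of overflow pages
	elements []string // element keys, indexed from 0
}

// clearElements removes the elements with indexes in [from, to) and
// reports whether some pages may now be referenced neither from the
// tree nor from the freelist.
func clearElements(p *toyPage, from, to int) (abandoned bool, err error) {
	if from < 0 || to > len(p.elements) || from >= to {
		return false, fmt.Errorf("invalid range [%d, %d) for %d elements", from, to, len(p.elements))
	}
	preOverflow := p.overflow
	p.elements = append(p.elements[:from], p.elements[to:]...)
	// Resetting overflow follows the quoted excerpt; treat it as an assumption.
	p.overflow = 0
	return preOverflow != 0 || p.branch, nil
}

func main() {
	branch := &toyPage{branch: true, elements: []string{"a", "b", "c"}}
	abandoned, err := clearElements(branch, 1, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if abandoned {
		fmt.Println("WARNING: The clearing has abandoned some pages that are not yet referenced from free list.")
	}
	fmt.Printf("All elements in [%d, %d) were cleared\n", 1, 2)
}
```

Keeping the warning separate from any freelist change matches the agreement reached in this thread: the surgery stays minimal, and the user decides whether to run the freelist command afterwards.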
@ptabor PTAL, thx
I rebased this PR, but please only take a look at the last commit, in which I addressed your latest comments.
@ptabor Resolved all your comments (thanks). Could you take another look?
ping @ptabor @serathius
Signed-off-by: Benjamin Wang <[email protected]>
…ould have the same content Signed-off-by: Benjamin Wang <[email protected]>
…page Signed-off-by: Benjamin Wang <[email protected]>
Also resolved a bug related to overflow page. Signed-off-by: Benjamin Wang <[email protected]>
…ndon freelist Signed-off-by: Benjamin Wang <[email protected]>
Thank you @ptabor |
Linked to #370. Please read #370 (comment)
It can be used to fix the corrupted db file in #402.