-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gc: paginate GC of local key ranges #45444
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One approach for 2 is to scale the timeout by the range size. We could conservatively estimate that we can scan and gc 10 MB/s or something like that.
Have any ideas for 1?
Reviewed 1 of 1 files at r1, 3 of 3 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained
re:2, I like your suggestion but I was also thinking that we should process the abort span first, at least if it's large (but then why not always). re 1), at least heuristically we could just say that "if sysbytes > 64mb then run GC". This backfires if we can get >64mb of sysbytes for any other reason than a large gc'able abort span (or tons of old aborted txns). Tracking the abort span and txn record count in the stats should be doable, though, and given how much trouble we've had with them it's probably overdue: #27055 |
bors r=ajwerner |
Thanks for picking this up so quickly!
SGTM. I could see doing both.
It's probably worth giving the more complete solution a bit of time and only reverting to this heuristic if doing something better proves difficult. |
Build failed (retrying...) |
bors r=ajwerner |
bors r=ajwerner |
Partial (manual) backport of cockroachdb#45444. Release note: None
Partial (manual) backport of cockroachdb#45444. Release note: None
bors r=ajwerner one of these days |
This will come in handy in a future commit. Release note: None
Touches cockroachdb#40815. I'm not calling it fixed yet because there are two remaining issues: 1. GC will not trigger automatically based on a large abort span. This means that manual intervention is needed in such a case. 2. The local key ranges are processed after the user key ranges. Even if no user keys are GC'able, all of it has to be scanned regardless (to arrive at that conclusion). With a large enough range size, this scan could theoretically consume the entirety of the time budget. However, in practice we expect operators to be able to "fix" any range via a manual GC once this PR is merged. Follow-up work needs to be done to address 1) and 2) above. Release note (bug fix): Improved the ability of garbage collection process to make process through ranges exhibiting abnormally large numbers of transaction records and/or abort span entries.
b09485e
to
4c86748
Compare
bors r=ajwerner |
Build succeeded |
Partial (manual) backport of cockroachdb#45444. Release note: None
Partial (manual) backport of cockroachdb#45444. Release note: None
Touches #40815.
I'm not calling it fixed yet because there are two remaining issues:
means that manual intervention is needed in such a case.
Even if no user keys are GC'able, all of it has to be scanned
regardless (to arrive at that conclusion). With a large enough
range size, this scan could theoretically consume the entirety
of the time budget.
However, in practice we expect operators to be able to "fix" any
range via a manual GC once this PR is merged. Follow-up work needs
to be done to address 1) and 2) above.
Release note (bug fix): Improved the ability of garbage collection
process to make process through ranges exhibiting abnormally large
numbers of transaction records and/or abort span entries.