-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: dynamically increase compaction concurrency during downloads #116633
Conversation
These still only get executed at the compaction concurrency limit but having more of them in the queue will allow us to utilize higher compaction concurrency if it is configured. Release note: none. Epic: none.
This adds a new method to the Engine API to increase or decrease the compaction concurrency by some amount. This is added along-side the existing API for setting the compaction concurrency to a specific amount, as it allows callers to increase or decrease concurrency without needing to know the current value. Release note: none. Epic: none.
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API doesn't work well with SetCompactionConcurrency
, it can error out and/or restore the incorrect value.
Should we maybe make this specific to download compactions? Like the compaction limit would be X concurrent non-download compactions + Y concurrent download compactions, and this API would just adjust Y.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @jbowens, and @msbutler)
I thought about this a bit which is why I also added the CAS loop checking that it isn't setting it <1. I think I'm OK with the sorta meh interaction though, at least until pebble exposes a separate download compaction limit and then we switch this to control that. It can't reduce to below 1 -- it'll fail and log if it were to attempt that, e.g. if you'd manually reduced compaction concurrency by more than it had upped it. And if you manually upped it to some bigger number, well this will reduce that big number by the amount it had automatically upped it before. Presumably when you upped it you picked a number including the fact it was temporarily doing extra (download) compactions?🤷 Also, I think there's already an implicit assumption that |
My inclination is to file something to separate the limits as you suggest so we come back to this later, but not let potential interactions with an internal undocumented built-in hold up iterating on online restore, especially since the interactions aren't that worrying? |
Sure. One thing I would do here is to cap the number to 1 instead of causing an error to the download (e.g. is something else sets the compaction limit lower) |
b74d996
to
66ebeba
Compare
done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we could unit test the loop that adjusts compaction concurrency, but since this is a prototype, I'm ok delaying the unit test until we develop a more principled method for adjusting concurrency.
I'm good with this as a stopgap for now. However I think one better longer-term solution would be to have a separate download compactions concurrency in Pebble (it could be defined as "additional download compaction concurrency"), which can then be used by downloads only and not by other compactions. Pebble already has a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please file an issue to keep track of this, ideally we would have a bucket for Online Restore issues related to making the feature as robust as possible.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @dt, and @jbowens)
pkg/storage/pebble.go
line 933 at r4 (raw file):
} // AdjustCompactionConcurrency adjusts the compaction concurrency up or down by
[nit] move/copy this comment to the interface (the comment there is not very useful)
Early in an online restore, a workload may not be able to utilize all CPU due to the high latency of accessing still-remote data meaning that many or most queries spend most of their time waiting rather than executing. During this phase, increasing the compaciton concurrency can expedite getting data downloaded, to reduce that latency and improve the workload performance. However later in the download phase, when more of the most accessed data has been downloaded, the workload itself may be able to execute fast enough that it requires the majority of available CPU. At this point, excessive CPU usage by download compactions will actually negatively impact the workload performance. Thus it is desirable to maximize compaction concurrency when CPU is available but reduce it when becomes scarce. This change introduces an additional goroutine to the download call that monitors CPU usage and adjusts compaction concurrency up and down based on the CPU usage being below 70% or above 80% respectively. Release note: none. Epic: none.
66ebeba
to
acff1f2
Compare
TFTRs! bors r+ |
Build succeeded: |
Early in an online restore, a workload may not be able to utilize
all CPU due to the high latency of accessing still-remote data meaning
that many or most queries spend most of their time waiting rather than
executing. During this phase, increasing the compaciton concurrency can
expedite getting data downloaded, to reduce that latency and improve the
workload performance.
However later in the download phase, when more of the most accessed data
has been downloaded, the workload itself may be able to execute fast enough
that it requires the majority of available CPU. At this point, excessive
CPU usage by download compactions will actually negatively impact the workload
performance.
Thus it is desirable to maximize compaction concurrency when CPU is available
but reduce it when becomes scarce.
This change introduces an additional goroutine to the download call that
monitors CPU usage and adjusts compaction concurrency up and down based on
the CPU usage being below 70% or above 80% respectively.
Release note: none.
Epic: none.