-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
admission: async compaction cause disk bandwidth spikes #132332
Labels
A-admission-control
branch-master
Failures and bugs on the master branch.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
P-1
Issues/test failures with a fix SLA of 1 month
T-admission-control
Admission Control
Comments
aadityasondhi
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
A-admission-control
T-admission-control
Admission Control
labels
Oct 10, 2024
Hi @aadityasondhi, please add branch-* labels to identify which branch(es) this C-bug affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
aadityasondhi
added
branch-master
Failures and bugs on the master branch.
P-1
Issues/test failures with a fix SLA of 1 month
labels
Oct 10, 2024
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 7, 2024
Previously, we would ignore errors in disk token accouting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwdith for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occured. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 7, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 7, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 8, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 8, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 8, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 8, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 13, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 18, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 18, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 18, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 19, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 21, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Nov 21, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Dec 12, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Dec 12, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Dec 12, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Dec 12, 2024
Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes cockroachdb#132332. Release note: None
craig bot
pushed a commit
that referenced
this issue
Dec 16, 2024
133310: admission: error accounting for disk reads and writes r=sumeerbhola a=aadityasondhi Previously, we would ignore errors in disk token accounting hoping that the next adjustment interval would capture them. This caused issues when async compactions ran within an adjustment interval. These could spike the bandwidth for a section of the 15 second interval without moving the average too much (write-amp estimation). This patch introduces a higher frequency error accounting mechanism. It runs every 1s to account for any additional reads and writes that could have occurred. Errors are only deducted if the read and/or write bandwidth in the interval is greater than the number of tokens already deducted in that interval. Fixes #132332. Release note: None Co-authored-by: Aaditya Sondhi <[email protected]>
aadityasondhi
added a commit
to aadityasondhi/cockroach
that referenced
this issue
Dec 17, 2024
With cockroachdb#132332 resolved, we can turn this test back on. Informs: cockroachdb#132332. Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-admission-control
branch-master
Failures and bugs on the master branch.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
P-1
Issues/test failures with a fix SLA of 1 month
T-admission-control
Admission Control
We noticed in #131484 (comment) that async compactions in pebble cause spikes in disk bandwidth, ultimately leading to the assertion failure since the disk bandwidth limiter cannot account for these reads and writes until after the 15s adjustment interval.
We can attribute these writes as error term in our write amp linear model and should account for them at a higher frequency. We discussed the implementation of such accounting internally.
Jira issue: CRDB-42982
Epic CRDB-37479
The text was updated successfully, but these errors were encountered: