-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: hotspots splits test is flaky #25036
Comments
@nvanbenschoten any idea how this happens? Is this perhaps because the blobs we're writing are so large that we can overshoot by 4mb? |
Yeah, that may be part of it, but I think the bigger issue is that the backpressure mechanism is only a soft upper bound at this point. We won't currently delay/reject a query if a split is not in-flight, and it's possible that if the system is so overloaded that the splitQueue can't keep up with the load then splits might not be in-flight on all ranges at the same time. I was hoping that this test was tuned to avoid entering this performance regime, but I guess it isn't. Part of the proposed fix for #24215 is to turn this into a hard upper bound at some larger range size. Once we hit that size, we'll start rejecting all clients instead of just slowing them down. That should help, at which point we can be more confident in this test and will only need to worry about the small overshoot you mentioned. How flaky is this? We might want to skip it until the hard upper bound is introduced. |
It looks pretty flaky, I see two failures for today. |
See cockroachdb#25036. Also introduce test skipping. Release note: None
See cockroachdb#25036. Also introduce test skipping. Release note: None
25059: roachtest: skip hotspot workload r=benesch,nvanbenschoten a=tschottdorf See #25036. Also introduce test skipping. Release note: None Co-authored-by: Tobias Schottdorf <[email protected]>
Fixes cockroachdb#25036. This fixes the flakes in the test. They have passed 5 straight times for me. It's also easy to prove that backpressure is necessary for the test to pass, just run the following and it quickly fails: ``` set cluster setting kv.range.backpressure_range_size_multiplier=0; ``` Release note: None
Fixes cockroachdb#25036. This fixes the flakes in the test. They have passed 5 straight times for me. It's also easy to prove that backpressure is necessary for the test to pass, just run the following and it quickly fails: ``` set cluster setting kv.range.backpressure_range_size_multiplier=0; ``` Release note: None
25261: storage: harden replica write backpressure mechanism r=nvanbenschoten a=nvanbenschoten Fixes #24215. Fixes #25036. Fixes #24974. Fixes #25141. Fixes #25142. This PR includes a change to harden the replica's write backpressure mechanism. We now reject backpressured write when a range is unsplittable. This is important for placing an upper bound on the size of a range when it consists of just a single row (#24215). The change also fixes a few flaky tests that resulted from backpressure not being restrictive enough. 25539: storage: put an Aborted txn inside a TxnAbortedError r=andreimatei a=andreimatei TxnAbortedErrors have an updated transaction it them. It makes sense for that transaction to be marked as Aborted, instead of Pending. Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Andrei Matei <[email protected]>
https://teamcity.cockroachdb.com/viewLog.html?buildId=620631&buildTypeId=Cockroach_Nightlies_WorkloadNightly
The text was updated successfully, but these errors were encountered: