intentresolver: fix testrace flake by extending timeouts #35085
Force-pushed from 4bcf67e to aa1c66f.
  // intentResolutionBatchWait is used to configure the RequestBatcher which
  // batches intent resolution requests across transactions. Intent resolution
  // needs to occur in a relatively short period of time after the completion
  // of a transaction in order to minimize the contention footprint of the write
  // for other contending reads or writes. The chosen value was selected based
  // on some light experimentation to ensure that performance does not degrade
- // in the face of highly contended workloads.
+ // in the face of highly contended workloads. This setting is a var rather
I would greatly prefer if you added that to the intentResolver itself (unexported) instead so that your test can manipulate it directly. The pattern here is kind of dirty, and while sometimes it's the right thing to do just to "get things done", it seems that here you have everything in place to avoid it. I think your before would take the *intentResolver and after would go away.
See cockroachdb#35085. Release note: None
35089: intentresolver: skip TestCleanupIntents r=knz a=tbg See #35085. (the PR is in flight, but this seems pretty flaky) Release note: None Co-authored-by: Tobias Schottdorf <[email protected]>
Force-pushed from 0dde36f to 3807721.
TFTR! I shouldn't push PRs just before going to bed. Firstly, I forgot to git add the change to fix the panic, and secondly, yes, the injection was sloppy. Given that the setting is passed along to the requestbatcher, which I wasn't eager to fully re-create inside of the before func, I instead exposed the parameters for batching timeout, which also adds symmetry with the gc batcher. Hopefully this feels a little bit better.
Force-pushed from 3807721 to 9585216.
> I shouldn't push PRs just before going to bed
It also never works out for me when I do it.
I also agree with your comment in the commit message that if this becomes flaky again, the test should be weakened as opposed to adding hacks. But I hope this just works for eternity.
Reviewed 2 of 2 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
bors r+
Build failed (retrying...)
Merge conflict (retrying...)
The unit tests which failed assumed that all intents in a single call to ResolveIntents could be queued within the batch timeout and idle window, which are rather short by default (10 and 5 ms respectively). The testing validation logic is rather strict and assumes that the batch will fire due to size rather than time constraints. The fix is to allow the test to increase these values to make the test more robust to load. Before this change the flake was reproducible running testrace within several hundred iterations. Running testrace now does not seem to provoke any failures. All that being said, I can see an argument that the testing logic should be made less rigid and should accept that the batches may be split up.
Fixes cockroachdb#35064.
Release note: None
Force-pushed from 9585216 to 7fdc815.
Canceled
bors r+
35085: intentresolver: fix testrace flake by extending timeouts r=ajwerner a=ajwerner The unit tests which failed assumed that all intents in a single call to ResolveIntents could be queued within the batch timeout and idle window, which are rather short by default (10 and 5 ms respectively). The testing validation logic is rather strict and assumes that the batch will fire due to size rather than time constraints. The fix is to allow the test to increase these values to make the test more robust to load. Before this change the flake was reproducible running testrace within several hundred iterations. After the change it has not failed in over 20k testrace iterations from two concurrently running executions. Fixes #35064. Release note: None Co-authored-by: Andrew Werner <[email protected]>
Build succeeded
The unit tests which failed assumed that all intents in a single call to
ResolveIntents could be queued within the batch timeout and idle window, which
are rather short by default (10 and 5 ms respectively). The testing validation
logic is rather strict and assumes that the batch will fire due to size rather
than time constraints. The fix is to allow the test to increase these values to
make the test more robust to load. Before this change the flake was reproducible
running testrace within several hundred iterations. After the change it has not
failed in over 20k testrace iterations from two concurrently running
executions.
Fixes #35064.
Release note: None