-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multiregionccl: deflake the multi-region cold start test #103666
multiregionccl: deflake the multi-region cold start test #103666
Conversation
01a99cf
to
34a125b
Compare
62c1a72
to
9139b57
Compare
9139b57
to
04ab2fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job piecing together the all the history here and deflaking this!
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jeffswenson and @rafiss)
-- commits
line 25 at r1:
nit: "and are not required to achieve conformance as we proactively nudge replicas through the replicate queue after ... "
655c6b8
to
859b6dd
Compare
I ran the test 2500 times with no flakes. The core source of the flakes is the test depends on every range having an available replica in every region. Any amount of replica movement causes the test to fail because many regions only have a single replica and temporary unavailability causes the sql servers to pick a replica that is available, but unreachable due to a networking hook injected by the test. There are a few independent changes in this PR: 1. It includes the settings from cockroachdb#100320. The lease transfer setting improves test performance and disabling load based splitting helps avoid replica migration. 2. Based on the insight from cockroachdb#96334#issuecomment-1492610753 the test checks the span conformance report for all servers instead of only one of the servers. 3. The test no longer triggers scatters while waiting for the span config bounds to conform. The scatters increased test flakiness and are not required to achieve conformance as we proactively nudge replicas through the replicate queue after cockroachdb#100349. 4. WaitForFullReplication is called to ensure replication queues are empty and there are no pending range moves. Fixes cockroachdb#96334 Release Note: None
859b6dd
to
2cbd35e
Compare
bors r+ |
Thanks for the review! |
Build succeeded: |
I ran the test 2500 times with no flakes. The core source of the flakes
is the test depends on every range having an available replica in every
region. Any amount of replica movement causes the test to fail because
many regions only have a single replica and temporary unavailability
causes the sql servers to pick a replica that is available, but
unreachable due to a networking hook injected by the test.
There are a few independent changes in this PR:
improves test performance and disabling load based splitting helps
avoid replica migration.
checks the span conformance report for all servers instead of only
one of the servers.
config bounds to conform. The scatters increased test flakiness
and are not required to achieve conformance as we proactively nudge
replicas through the replicate queue after kvserver: nudge replicate queue on span config update #100349.
empty and there are no pending range moves.
Fixes #96334
Release Note: None