-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VReplication: retry in WaitForPos when read of pos is killed off by deadlock detector #10621
Conversation
…eadlock detector Signed-off-by: Matt Lord <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
|
30cb252
to
f27c0ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you for investigating this!
One thing, since calls to dbClient.ExecuteFetch
can go into loop, is that we must now check ctx.Done()
or ctx.Err()
before calling qr, err := dbClient.ExecuteFetch(binlogplayer.ReadVReplicationStatus(uint32(id)), 10)
. Which requires some delicate refactoring because we then cannot populate the error message with data from qr
.
Good point! If we continually hit the deadlock error then it's possible that we won't get to the |
Signed-off-by: Matt Lord <[email protected]>
@shlomi-noach done here: cb01ba8 Thanks again! |
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Well, the DCO check flakiness has forced me to run the CI tests quite a few times. Good news is that it has demonstrated that the tests using MySQL 8.0 are now very stable! 🙂They would often need to be run 2 or 3 times to pass, now they're passing on the first try. 🥳 |
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Description
For whatever reason(s), we've seen the deadlock detector fire more frequently when using MySQL 8.0 in our CI tests. This has, in particular, hit our VReplication related tests where there is very high contention on a small number of
_vt.vreplication
records. The record(s) in the table are getting read by different connections (byid
, which is the Primary Key) while also getting constantly updated in other connections to update the progress:pos
,rows_copied
,heartbeat
, etc.This has led to the VReplication related tests that use MySQL 8.0 to be flaky, in particular:
In this PR we retry the read of the
_vt.vreplication.pos
field if we get a deadlock detected error when waiting for a tablet to reach a replication position, until we hit the context timeout.Related Issue(s)
Fixes: #10590
Related-to: #10620
Checklist