In the cut-over phase, when waiting for events up to the lock times out, the _del table lock is not released, leaving the DB unwritable. Introduced in #755 #992
Comments
link to #755 |
I suspect the sync.Once solution is incorrect and that this issue is true. |
Thanks for your reply. Is there a way to avoid this problem? One idea: modify sync.Once so that it returns the status value before executing the function. // if atomic.CompareAndSwapUint32(&o.done, 0, 1) { |
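To make the idea in this comment concrete, here is a minimal sketch of a non-blocking "once" built on `atomic.CompareAndSwapUint32`. The `tryOnce` type and its `Do` method are hypothetical names for illustration; they are not part of gh-ost or of the standard library, and this is not necessarily what the commenter had in mind.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tryOnce is a hypothetical, non-blocking alternative to sync.Once.
// It is an illustration only and does not exist in gh-ost.
type tryOnce struct {
	done uint32
}

// Do runs f only for the first caller and reports whether this call ran it.
// Later callers return false immediately, without waiting for f to finish.
func (o *tryOnce) Do(f func()) bool {
	if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
		f()
		return true
	}
	return false
}

func main() {
	var once tryOnce
	calls := 0
	first := once.Do(func() { calls++ })
	second := once.Do(func() { calls++ })
	fmt.Println(first, second, calls) // prints: true false 1
}
```

The trade-off is that a caller receiving `false` cannot assume `f` has already completed, only that some other caller has claimed it. Whether that weaker guarantee is acceptable for the drop of the _del table is an open question here.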
I'll be honest: I last looked at this quite a while back, and it will take me a while to familiarize myself with the logic again. Meanwhile consider |
I had the same lock conflict issue when I got
I set |
I met the same issue.
|
Can we use defer to drop oldTable in exceptional circumstances? Like this:
In my opinion, we should avoid managing a resource from different threads, so exceptional circumstances should be handled inside the function AtomicCutOverMagicLock. |
@shlomi-noach Hello. In my production environment, this problem happens from time to time. Because of the service's constraints, a service interruption is not acceptable, so we can't use -cut-over=two-step, and we are eager to solve this problem. Do you have time to review this fix idea? |
…to lock This PR contains a fix for the deadlock issue described in github#992 (which is also discussed in github#1171). The fix was introduced in the original repo here: https://github.com/github/gh-ost/pull/1180/files. This PR contains a cherry-pick of that fix.
While gh-ost is applying the binlog, it pauses for a period of time and then resumes applying the binlog and the row copy.
At this point, the row copy completes and the cut-over phase starts.
The binlog apply delay causes a timeout while waiting for events up to the lock. The lock is then not released on rollback, so the DB stays locked and cannot be written to.
gh-ost execution log:
In the sync.Once implementation above, the caller that arrives first holds the m.Lock() lock, and any other thread waits until that caller has finished executing. Once execution completes, o.done changes from 0 to 1, and the subsequent thread can exit without executing the function.
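The blocking behavior described here can be observed with a small standalone program. This is a sketch using only the standard library; `secondCallerBlocked` is a name made up for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// secondCallerBlocked reports whether a second once.Do call is still
// waiting while the first caller's function is running.
func secondCallerBlocked() bool {
	var once sync.Once
	started := make(chan struct{})
	release := make(chan struct{})
	secondDone := make(chan struct{})

	// First caller: enters once.Do and stays inside until released.
	go once.Do(func() {
		close(started)
		<-release
	})
	<-started

	// Second caller: its function is never run, but it still has to wait.
	go func() {
		once.Do(func() {})
		close(secondDone)
	}()

	blocked := false
	select {
	case <-secondDone:
	case <-time.After(100 * time.Millisecond):
		blocked = true
	}

	close(release) // let the first caller finish
	<-secondDone   // the second caller can now return
	return blocked
}

func main() {
	fmt.Println(secondCallerBlocked()) // prints: true
}
```

So a second `Do` call is not a cheap no-op while the first is in flight: it returns only after the first caller's function has returned.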
In gh-ost, during a rollback, if atomicCutOver() --> defer func() --> DropAtomicCutOverSentryTableIfExists() acquires the sync.Once lock first, its drop of the _del table waits for the _del table lock, and the drop of the _del table in AtomicCutOverMagicLock() is blocked. This leaves the DB unwritable for more than 300s.
Thank you!