shardddl/optimistic: warn add not fully dropped columns #1510
…nto warnDropAddColumn
there're two failed test cases, seems one is tidb again changed the |
(review later)
dm/master/shardddl/optimist.go
// construct locks based on the shard DDL info.
for task, ifTask := range ifm {
o.lk.SetColumnMap(colm)
defer o.lk.SetColumnMap(nil)
(I guess this does not affect correctness? OK, it may release some memory.)
No, this will affect correctness. We only use this column map when we recover lock tables.
@@ -291,6 +305,11 @@ func (o *Optimist) recoverLocks(
}
if op.Done {
lock.TryMarkDone(op.Source, op.UpSchema, op.UpTable)
err := lock.DeleteColumnsByDDLs(op.DDLs)
I hope we can leave more comments here; I think I might forget the logic some days later.
If DM-master sent a DROP COLUMN DDL, all shard tables have dropped that column and are synced, so we delete it from the partially dropped columns.
added under this function's definition.
@GMHDBJD PTAL again, thanks!
/lgtm
maybe we should add
in master/optimism_test.go
@GMHDBJD Did this unit test fail because of "etcd race"?
I think so.
Okay. I will take a look and fix this.
/lgtm
/merge
This pull request has been accepted and is ready to merge. Commit hash: 213e07f
Signed-off-by: ti-srebot <[email protected]>
cherry pick to release-2.0 in PR #1537
What problem does this PR solve?
When a user adds a column that isn't fully dropped in the downstream database, it causes a data inconsistency problem.
dm-master will fail to synchronize optimistic shard DDLs after a restart or leader transfer.
What is changed and how it works?
When we drop a column, we save the column's name in etcd. Once it is fully dropped in the downstream database and we receive a Done DDL group from dm-worker, we delete the column's name from the map and from etcd.
Add source and upstreamSchema/upstreamTable to the key so that the columns of one table do not affect other tables.
Time series: (+a/-a means add/drop column a)
If we add the not fully dropped column c for tb1, tb2 may report an error unless we distinguish tb1 from tb2.
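The bookkeeping described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code from this PR: the names `tableKey`, `droppedColumns`, `markDropped`, `checkAdd` and `markFullyDropped` are hypothetical, and the etcd persistence is left out entirely.

```go
package main

import (
	"errors"
	"fmt"
)

// tableKey identifies one upstream shard table (hypothetical type).
type tableKey struct {
	Source, UpSchema, UpTable string
}

// droppedColumns maps a column name to the set of shard tables that have
// dropped it while the column still exists in the downstream table.
type droppedColumns map[string]map[tableKey]struct{}

var errNotFullyDropped = errors.New("add column that is not fully dropped")

// markDropped records that shard table k has dropped column col.
func (d droppedColumns) markDropped(col string, k tableKey) {
	if d[col] == nil {
		d[col] = make(map[tableKey]struct{})
	}
	d[col][k] = struct{}{}
}

// checkAdd rejects re-adding col on shard table k if k dropped it and the
// drop has not yet completed downstream. Other tables are not affected.
func (d droppedColumns) checkAdd(col string, k tableKey) error {
	if _, ok := d[col][k]; ok {
		return fmt.Errorf("%w: column %s on %s.%s (%s)",
			errNotFullyDropped, col, k.UpSchema, k.UpTable, k.Source)
	}
	return nil
}

// markFullyDropped forgets col once the Done DROP COLUMN group arrives,
// i.e. the column no longer exists downstream.
func (d droppedColumns) markFullyDropped(col string) {
	delete(d, col)
}

func main() {
	d := droppedColumns{}
	tb1 := tableKey{Source: "mysql-01", UpSchema: "db", UpTable: "tb1"}
	tb2 := tableKey{Source: "mysql-01", UpSchema: "db", UpTable: "tb2"}

	d.markDropped("c", tb1)
	fmt.Println(d.checkAdd("c", tb1)) // rejected: tb1 dropped c, downstream still has it
	fmt.Println(d.checkAdd("c", tb2)) // allowed: tb2 is tracked separately
	d.markFullyDropped("c")
	fmt.Println(d.checkAdd("c", tb1)) // allowed again after the drop completes
}
```

Keying the inner set by source and upstream schema/table is what keeps tb2 from being flagged for a column that only tb1 dropped.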
Fix the bug that dm-master doesn't have complete lock information if we meet an error while recovering locks.
https://github.com/pingcap/dm/pull/1510/files#diff-265ec2690b9027cfdc9cb7f36dda664d0337e6a1b0149e2e058720aee099336eL273-L275
If we stop recovering locks as soon as we meet an error, the dm-master leader may not have the full information for the other normal locks, which causes an error in dm-worker.
After this PR, we never stop recovering locks even if we meet an error while handling a lock.
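The "keep going on error" behaviour can be illustrated with a small sketch. It is not the actual `recoverLocks` implementation: `info`, `recoverOne` and `recoverAll` are hypothetical stand-ins, and remembering only the first error is an assumption made to keep the example short.

```go
package main

import "fmt"

// info is a simplified stand-in for one shard DDL info (hypothetical).
type info struct {
	Task string
	Bad  bool // simulates an info whose lock cannot be rebuilt
}

// recoverOne rebuilds the lock for a single info (hypothetical helper).
func recoverOne(i info) error {
	if i.Bad {
		return fmt.Errorf("recover lock for task %s failed", i.Task)
	}
	return nil
}

// recoverAll tries every info instead of returning at the first failure.
// It reports how many locks were recovered and the first error seen.
func recoverAll(infos []info) (int, error) {
	var firstErr error
	recovered := 0
	for _, i := range infos {
		if err := recoverOne(i); err != nil {
			if firstErr == nil {
				firstErr = err
			}
			continue // keep recovering the remaining, normal locks
		}
		recovered++
	}
	return recovered, firstErr
}

func main() {
	n, err := recoverAll([]info{{"t1", false}, {"t2", true}, {"t3", false}})
	fmt.Println(n, err)
}
```

With this shape, one bad lock no longer hides the healthy locks that come after it in the iteration order.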
Fix the bug that dm-worker may get blocked if dm-master restarted before dm-worker received the optimistic shard ddl operation (after dm-worker put the optimistic shard ddl info).
Time series:
a. dm-worker marked an operation as done earlier (with the help of dm-master)
b. dm-worker received a DDL
c. dm-worker put the DDL info successfully
d. dm-master restarted before giving the DDL operation to dm-worker
e. dm-master restarted successfully but didn't put the operation because it was marked as done
f. dm-worker didn't receive the operation and gets blocked forever
CASE:
tb1: a info: (DDLs: [add column c], tiBefore: [create table tb1 a int])
tb2: a b c info: (DDLs: [drop column c], tiBefore: [create table tb1 a int, b int, c int]) operation: done
Now dm-master will use tb2's schema as the joined schema, and use the joined schema for a newly added table (tb1). If we recover the info for tb2 first, we will set tb1's table info to tb2's, which causes dm-master to make the wrong decision (do nothing for add column c). Then we will fail in the downstream.
Check List
Tests
Code changes
Side effects
Related changes