Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

shardddl/optimistic: warn add not fully dropped columns #1510

Merged
merged 34 commits into from
Mar 26, 2021

Conversation

lichunzhu
Copy link
Contributor

@lichunzhu lichunzhu commented Mar 15, 2021

What problem does this PR solve?

When users adds a column that isn't fully dropped in the downstream database, it will cause a data in-consistent problem.
dm-master will fill to synchronize optimistic shard DDLs after a restart or leader transfer.

What is changed and how it works?

  1. Report an error to the user when they want to add a not fully dropped column.
    When we drop a column, we save this column's name in etcd. When it's fully dropped in downstream database and we receive a Done DDL group from dm-worker, we can delete this column's name from the map and the etcd.
    Add source, upstreamSchema/upstreamTable to avoid columns affect others.

Time series: (+a/-a means add/drop column a)

              older ----------------> newer
  tb1: +a +b +c                  -c
  tb2:                                       +a +b +c
  tb3:                 +a +b +c

If we added not fully dropped column c for [tb1], tb2 may report an error if we don't identify tb1 and tb2.

  1. Fix the bug that dm-master doesn't have complete lock infomation if we meet an error when we recovering locks.
    https://github.com/pingcap/dm/pull/1510/files#diff-265ec2690b9027cfdc9cb7f36dda664d0337e6a1b0149e2e058720aee099336eL273-L275
    If we stopped recovering locks once we meet an error, dm-master leader may not have the full information for the other normal locks, which will cause an error in dm-worker.
    After this PR, we never stop recovering locks even if we met an error in handling locks.

  2. Fix the bug that dm-worker may get blocked if dm-master restarted before dm-worker received the optimistic shard ddl operation (after dm-worker put the optimistic shard ddl info).

Time series:
a. dm-worker marked done a operation before (with the help of dm-master)
b. dm-worker received ddl
c. dm-worker putted ddl info successfully
d. dm-master restarted before giving the ddl operation to dm-worker
e. dm-master restarted successfully but didn't put operation because this operation is marked as done.
f. dm-worker didn't receive the operation, will get blocked forever.

  1. (Not fixed yet) dm-master can't return correct DDLs when dm-master rebuild locks after a leader transfer or a restart.

CASE:

tb1: a info: (DDLs: [add column c], tiBefore: [create table tb1 a int])
tb2: a b c info: (DDLs: [drop column c], tiBefore: [create table tb1 a int, b int, c int]) operation: done

Now dm-master will use tb2's schema as joined, and use joined for newly added table (tb1). If we recover info for tb2 at first, we will set tb1 table info as tb2's, which will cause dm-master to make the wrong decision (do nothing for add column c). Then we will fail in the downstream.

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has persistent data change

Side effects

  • Possible performance regression
  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation
  • Need to be included in the release note

@lance6716
Copy link
Collaborator

lance6716 commented Mar 17, 2021

there're two failed test cases, seems one is tidb again changed the SHOW CREATE TABLE behaviour for CLUSTERED INDEX. we could change our matching string

Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(review later)

tests/shardddl3/run.sh Outdated Show resolved Hide resolved
// construct locks based on the shard DDL info.
for task, ifTask := range ifm {
o.lk.SetColumnMap(colm)
defer o.lk.SetColumnMap(nil)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I guess this does not effect correctness? OK it may release some memory)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this will affect correctness. We only use this column map when we recover lock tables.

@lance6716 lance6716 added needs-cherry-pick-release-2.0 This PR should be cherry-picked to release-2.0. Remove this label after cherry-picked to release-2.0 needs-update-docs Should update docs after this PR is merged. Remove this label once the docs are updated needs-update-release-note This PR should be added into release notes. Remove this label once the release notes are updated labels Mar 18, 2021
@lance6716 lance6716 added this to the v2.0.2 milestone Mar 18, 2021
pkg/shardddl/optimism/lock.go Outdated Show resolved Hide resolved
pkg/shardddl/optimism/lock.go Show resolved Hide resolved
pkg/shardddl/optimism/lock_test.go Outdated Show resolved Hide resolved
pkg/shardddl/optimism/lock_test.go Outdated Show resolved Hide resolved
@@ -291,6 +305,11 @@ func (o *Optimist) recoverLocks(
}
if op.Done {
lock.TryMarkDone(op.Source, op.UpSchema, op.UpTable)
err := lock.DeleteColumnsByDDLs(op.DDLs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hope we could left more comment, I think I might forget the logic somedays later

if DM-master sent a DROP COLUMN DDL, all shard tables had dropped that column and got synced. So we delete it from paritially dropped columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added under this function's definition.

pkg/shardddl/optimism/lock.go Outdated Show resolved Hide resolved
pkg/shardddl/optimism/column.go Outdated Show resolved Hide resolved
pkg/shardddl/optimism/column.go Outdated Show resolved Hide resolved
pkg/shardddl/optimism/column.go Outdated Show resolved Hide resolved
pkg/shardddl/optimism/column.go Show resolved Hide resolved
pkg/shardddl/optimism/lock_test.go Show resolved Hide resolved
@ti-chi-bot
Copy link
Member

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • GMHDBJD
  • lance6716

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by writing /lgtm in a comment.
Reviewer can cancel approval by writing /lgtm cancel in a comment.

@ti-chi-bot ti-chi-bot added status/LGT2 Two reviewers already commented LGTM, ready for merge and removed status/LGT1 One reviewer already commented LGTM labels Mar 26, 2021
@lichunzhu
Copy link
Contributor Author

/hold

@lichunzhu
Copy link
Contributor Author

/unhold

@lichunzhu
Copy link
Contributor Author

@GMHDBJD PTAL again, thanks!

Copy link
Collaborator

@GMHDBJD GMHDBJD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@GMHDBJD
Copy link
Collaborator

GMHDBJD commented Mar 26, 2021

maybe we should add

	mockCluster := integration.NewClusterV3(t, &integration.ClusterConfig{Size: 1})
	defer mockCluster.Terminate(t)

	etcdTestCli = mockCluster.RandClient()

in master/optimism_test.go

@lichunzhu
Copy link
Contributor Author

lichunzhu commented Mar 26, 2021

	etcdTestCli = mockCluster.RandClient()

@GMHDBJD Did this unit test fail because of "etcd race"?

@GMHDBJD
Copy link
Collaborator

GMHDBJD commented Mar 26, 2021

I think so.

@lichunzhu
Copy link
Contributor Author

Okay. I will take a look and fix this.

@GMHDBJD
Copy link
Collaborator

GMHDBJD commented Mar 26, 2021

/lgtm

@lance6716
Copy link
Collaborator

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 213e07f

@ti-chi-bot ti-chi-bot merged commit 4a6b662 into pingcap:master Mar 26, 2021
ti-srebot pushed a commit to ti-srebot/dm that referenced this pull request Mar 26, 2021
@ti-srebot
Copy link

cherry pick to release-2.0 in PR #1537

@ti-srebot ti-srebot added already-cherry-pick-2.0 The related PR is already cherry-picked to release-2.0. Add this label once the PR is cherry-picked and removed needs-cherry-pick-release-2.0 This PR should be cherry-picked to release-2.0. Remove this label after cherry-picked to release-2.0 labels Mar 26, 2021
@lichunzhu lichunzhu deleted the warnDropAddColumn branch March 26, 2021 08:38
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
already-cherry-pick-2.0 The related PR is already cherry-picked to release-2.0. Add this label once the PR is cherry-picked needs-update-docs Should update docs after this PR is merged. Remove this label once the docs are updated needs-update-release-note This PR should be added into release notes. Remove this label once the release notes are updated size/XXL status/can-merge status/LGT2 Two reviewers already commented LGTM, ready for merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants