Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dm-operator/: support scaling a dm cluster with dm-masters and dm-workers #3186

Merged
merged 60 commits into from
Sep 2, 2020

Conversation

lichunzhu
Copy link
Contributor

@lichunzhu lichunzhu commented Aug 31, 2020

What problem does this PR solve?

#2868
Support scale in/out dmcluster.

What is changed and how does it work?

Check pvc and delete defer deleting pvcs to scale-out dm-master and dm-worker.

Check List

Tests

  • Unit test
  • E2E test
  • Stability test
  • Manual test (add detailed scripts or steps below)
    Test scale in and then scale out dm-master. dm-master can start correctly and pvc will be flushed while dm-worker will not flush pvc.
dm-master-basic-dm-master-0   Bound    pvc-9e42db35-927b-4476-9e1e-691f3e286293   1Gi        RWO            standard       20m
dm-master-basic-dm-master-1   Bound    pvc-d44bd6c4-cc94-4dad-a38a-85b6d3641096   1Gi        RWO            standard       16m
dm-master-basic-dm-master-2   Bound    pvc-a587a4d7-31f1-4075-82b5-8cd9e74f88ba   1Gi        RWO            standard       15m
dm-worker-basic-dm-worker-0   Bound    pvc-d4516dcc-a604-486d-823d-452a78f4482a   1Gi        RWO            standard       19m
dm-worker-basic-dm-worker-1   Bound    pvc-9c2b06d5-4cb9-4bc9-b510-0676eeb183ea   1Gi        RWO            standard       19m
dm-worker-basic-dm-worker-2   Bound    pvc-06215f47-18ad-47f7-9953-d35a45b091ce   1Gi        RWO            standard       19m

Does this PR introduce a user-facing change?:

NONE

@lichunzhu lichunzhu added the status/PTAL PR needs to be reviewed label Aug 31, 2020
@codecov-commenter
Copy link

codecov-commenter commented Aug 31, 2020

Codecov Report

Merging #3186 into master will decrease coverage by 0.47%.
The diff coverage is 22.30%.

@@            Coverage Diff             @@
##           master    #3186      +/-   ##
==========================================
- Coverage   40.90%   40.43%   -0.48%     
==========================================
  Files         168      170       +2     
  Lines       18507    18818     +311     
==========================================
+ Hits         7571     7609      +38     
- Misses      10283    10553     +270     
- Partials      653      656       +3     
Flag Coverage Δ
#unittest 40.43% <22.30%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

pkg/dmapi/master_control.go Outdated Show resolved Hide resolved
@@ -31,5 +31,5 @@ type TiKVGroupManager interface {

type DMManager interface {
// Sync implements the logic for syncing dmcluster.
Sync(*v1alpha1.DMCluster) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add DM here? I think it's not necessary as it's a method of DMManager which already includes the DM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because reclaimPolicyManager uses SyncDM as interface and I want to keep the same with that. I think using Sync is also okay.

pkg/manager/member/dm_master_scaler.go Outdated Show resolved Hide resolved
pkg/manager/member/dm_master_scaler.go Outdated Show resolved Hide resolved
return nil
}

// We need remove member from cluster before reducing statefulset replicas
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the member removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be removed in syncing worker status. For dm-worker we can't remove its register info from dm-master when it's still alive. So we delete it later after it's keepalive lease is outdated or revoked. We can defer deleting dm-worker register info because dm-master will patch replication worker through keepalive info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the comment here does not match the code logic.

ordinal, err := util.GetOrdinalFromPodName(worker.Name)
if err != nil {
klog.Errorf("invalid worker name %s, can't offline this worker automatically, err: %s", worker.Name, err)
} else if ordinal >= dc.WorkerStsDesiredReplicas() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to https://github.com/pingcap/tidb-operator/blob/master/pkg/manager/member/tikv_failover.go#L38-L45 for how to check if an ordinal is not desired or not in case of advanced statefulset is enabled.

return nil
}

// We need remove member from cluster before reducing statefulset replicas
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the comment here does not match the code logic.

@@ -549,3 +546,13 @@ func getWorkerConfigMap(dc *v1alpha1.DMCluster) (*corev1.ConfigMap, error) {
}
return cm, nil
}

func isWorkerPodDesired(dc *v1alpha1.DMCluster, podName string) bool {
ordinals := dc.WorkerStsDesiredOrdinals(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be false here?

Copy link
Contributor

@DanielZhangQD DanielZhangQD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lichunzhu lichunzhu merged commit 662b037 into pingcap:master Sep 2, 2020
@lichunzhu lichunzhu deleted the supportScaleDMCluster branch September 2, 2020 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/LGT2 status/PTAL PR needs to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants