Support DM 2.0 in TiDB Operator #2868

Closed
28 of 29 tasks
Ishiihara opened this issue Jul 5, 2020 · 0 comments

Ishiihara commented Jul 5, 2020

Description

Integrate DM with TiDB Operator. Ideally, we would like TiDB Operator to manage DM clusters as well.

Integration Method

Add a dm controller inside the logic of the existing TiDB Operator.
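
For illustration only, here is a minimal sketch of how a dmcluster control loop could be hooked into the existing operator process, in the workqueue style that client-go controllers commonly use; `DMController` and the sync callback are placeholder names, not the operator's actual types.

```go
// Sketch: a dmcluster control loop registered inside the existing operator
// process. DMController and the sync callback are placeholders.
package controller

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/util/workqueue"
)

type DMController struct {
	queue workqueue.RateLimitingInterface
	sync  func(key string) error // real dmcluster reconcile logic goes here
}

func NewDMController(sync func(string) error) *DMController {
	return &DMController{
		queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "dmcluster"),
		sync:  sync,
	}
}

// Run starts workers that pop dmcluster keys off the queue and reconcile them
// until stopCh is closed.
func (c *DMController) Run(workers int, stopCh <-chan struct{}) {
	defer c.queue.ShutDown()
	for i := 0; i < workers; i++ {
		go wait.Until(c.worker, time.Second, stopCh)
	}
	<-stopCh
}

func (c *DMController) worker() {
	for {
		key, quit := c.queue.Get()
		if quit {
			return
		}
		if err := c.sync(key.(string)); err != nil {
			c.queue.AddRateLimited(key) // retry with backoff on sync failure
		} else {
			c.queue.Forget(key)
		}
		c.queue.Done(key)
	}
}
```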

Deployment

  1. Do not build a standalone dm-operator; instead, add a dmclusters resource to tidb-operator (a sketch of the resource types follows this list).
  2. Deploy dm-master and dm-worker through StatefulSets.
  3. The dm controller is only responsible for managing the cluster. DM task management is done by the user through the HTTP/gRPC API of the dm-master service, either with the dm-ctl binary packaged in the image or by running dm-ctl locally through kubectl port-forward.
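
As referenced in the list above, here is a rough sketch of what the dmclusters CRD Go types might contain, assuming a spec with dm-master and dm-worker sections and a status modeled on tidbcluster; all type and field names are illustrative, not the final API.

```go
// Sketch only: possible Go types behind the dmclusters CRD. Names and fields
// are assumptions, not the final API.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// DMCluster is the top-level dmclusters resource.
type DMCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   DMClusterSpec   `json:"spec"`
	Status DMClusterStatus `json:"status,omitempty"`
}

// DMClusterSpec describes the desired dm-master and dm-worker deployments.
type DMClusterSpec struct {
	Version string     `json:"version"`
	Master  MasterSpec `json:"master"`
	Worker  WorkerSpec `json:"worker"`
}

type MasterSpec struct {
	Replicas    int32  `json:"replicas"`
	ServiceType string `json:"serviceType,omitempty"` // e.g. NodePort
	Config      string `json:"config,omitempty"`      // raw dm-master.toml
}

type WorkerSpec struct {
	Replicas int32  `json:"replicas"`
	Config   string `json:"config,omitempty"` // raw dm-worker.toml
}

// DMClusterStatus is modeled on the tidbcluster status layout.
type DMClusterStatus struct {
	Master MasterStatus `json:"master,omitempty"`
	Worker WorkerStatus `json:"worker,omitempty"`
}

type MasterStatus struct {
	StatefulSet string `json:"statefulSet,omitempty"`
	Leader      string `json:"leader,omitempty"`
}

type WorkerStatus struct {
	StatefulSet string `json:"statefulSet,omitempty"`
}
```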

Configuration

Deployment configurations should be stored in Kubernetes ConfigMaps and rolled out through the StatefulSet update mechanism.
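
A minimal sketch of that mechanism, assuming the config lives in a single ConfigMap key and a hash of it is stamped onto the StatefulSet pod template so a config change triggers a rolling update; the helper names and the annotation key are made up for illustration.

```go
// Sketch: store the dm-master config in a ConfigMap and stamp its hash on the
// StatefulSet pod template, so a config change rolls the pods.
package member

import (
	"crypto/sha256"
	"encoding/hex"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func dmMasterConfigMap(cluster, config string) *corev1.ConfigMap {
	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: cluster + "-dm-master"},
		Data:       map[string]string{"dm-master.toml": config},
	}
}

// annotateConfigHash writes the config hash into the pod template; when the
// hash changes, the StatefulSet controller performs a rolling update.
func annotateConfigHash(sts *appsv1.StatefulSet, cm *corev1.ConfigMap) {
	sum := sha256.Sum256([]byte(cm.Data["dm-master.toml"]))
	if sts.Spec.Template.Annotations == nil {
		sts.Spec.Template.Annotations = map[string]string{}
	}
	// Annotation key is illustrative, not the operator's actual key.
	sts.Spec.Template.Annotations["dm.example.com/config-hash"] = hex.EncodeToString(sum[:])
}
```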

Service

dm-master should be exposed through a Service of type NodePort.
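
A sketch of such a NodePort Service, assuming the standard dm-master client port 8261 and illustrative names and labels.

```go
// Sketch: dm-master Service exposed via NodePort (names/labels are assumptions).
package member

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func dmMasterService(cluster, namespace string) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      cluster + "-dm-master",
			Namespace: namespace,
			Labels:    map[string]string{"app.kubernetes.io/component": "dm-master"},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeNodePort,
			Selector: map[string]string{"app.kubernetes.io/component": "dm-master"},
			Ports: []corev1.ServicePort{{
				Name:       "dm-master",
				Port:       8261,
				TargetPort: intstr.FromInt(8261),
			}},
		},
	}
}
```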

Rolling Update

dm-master

With the help of the StatefulSet partition, roll the pods in order from the highest ordinal to the lowest: check whether the pod is already on the latest revision, and then check whether it is the leader. If it is the leader, migrate leadership away first, and then set the partition to the current pod's ordinal so that the pod gets updated.
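
A sketch of this partition-driven upgrade loop; the masterClient interface (Leader/EvictLeader) stands in for whatever wrapper the operator would use around the dm-master API and is purely hypothetical.

```go
// Sketch: partition-driven dm-master rolling update, highest ordinal first.
package member

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

// masterClient is a hypothetical wrapper over the dm-master API.
type masterClient interface {
	Leader() (string, error)      // name of the current leader pod
	EvictLeader(pod string) error // ask the leader to step down
}

func upgradeDMMaster(sts *appsv1.StatefulSet, podRevision func(ordinal int32) string, cli masterClient) error {
	for i := *sts.Spec.Replicas - 1; i >= 0; i-- {
		pod := fmt.Sprintf("%s-%d", sts.Name, i)

		if podRevision(i) == sts.Status.UpdateRevision {
			continue // already on the new revision, move to the next ordinal
		}

		// The pod is about to be restarted: move leadership off it first.
		if leader, err := cli.Leader(); err != nil {
			return err
		} else if leader == pod {
			return cli.EvictLeader(pod) // requeue; the partition is set on a later pass
		}

		// Lower the partition to this ordinal so the StatefulSet controller updates it.
		partition := i
		sts.Spec.UpdateStrategy.RollingUpdate = &appsv1.RollingUpdateStatefulSetStrategy{Partition: &partition}
		return nil
	}
	return nil
}
```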

dm-worker

Update directly.

Scale

For a scale-in operation, delete the member info from the DM cluster first, and then delete the pod.
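
A sketch of that ordering for dm-worker scale-in, with a hypothetical OfflineWorker call standing in for the real dm-master member-removal API.

```go
// Sketch: deregister the highest-ordinal dm-worker first, then shrink the
// StatefulSet so the pod itself is removed.
package member

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

// dmClient is a hypothetical wrapper over the dm-master API.
type dmClient interface {
	OfflineWorker(name string) error // remove the worker's member info from dm-master
}

func scaleInDMWorker(sts *appsv1.StatefulSet, desired int32, cli dmClient) error {
	current := *sts.Spec.Replicas
	if desired >= current {
		return nil // not a scale-in
	}
	// Step 1: delete the member info of the pod that will disappear.
	worker := fmt.Sprintf("%s-%d", sts.Name, current-1)
	if err := cli.OfflineWorker(worker); err != nil {
		return err
	}
	// Step 2: only after the member is gone, reduce replicas by one; requeue
	// until the desired count is reached.
	next := current - 1
	sts.Spec.Replicas = &next
	return nil
}
```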

High Availability

Achieved by DM's architecture.

Failover

dm-master

Assuming the cluster has 3 master pods, if a dm-master pod goes down for more than 5 minutes (configurable), the operator will add a new dm-master pod, so 4 pods will exist at the same time. After the failed dm-master node is restored, the operator will delete the newly started node, and the cluster is back to 3 master pods.
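
A sketch of the detection side of this failover flow, assuming a configurable period and a record of members that have already been failed over; the names and fields are illustrative, and removing the extra pod after recovery goes through the normal scale-in path.

```go
// Sketch: add one extra dm-master replica per member that stays unhealthy
// longer than the failover period; drop the record once it recovers.
package member

import "time"

type masterMember struct {
	Name           string
	Health         bool
	LastTransition time.Time
}

type failoverState struct {
	Period         time.Duration   // e.g. 5 * time.Minute, configurable
	FailureMembers map[string]bool // members that have already been failed over
}

// desiredMasterReplicas returns the base replica count plus one per outstanding failure.
func desiredMasterReplicas(base int32, members []masterMember, st *failoverState, now time.Time) int32 {
	for _, m := range members {
		if !m.Health && now.Sub(m.LastTransition) > st.Period {
			st.FailureMembers[m.Name] = true // trigger a new pod
		}
		if m.Health {
			delete(st.FailureMembers, m.Name) // recovered: the extra pod can be scaled back in
		}
	}
	return base + int32(len(st.FailureMembers))
}
```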

dm-worker

Almost the same as the dm-master failover above. The difference is that the newly started dm-worker may already have a DM task assigned to it; if that worker were taken offline, the task would be re-assigned, interrupting it for a little while. Therefore, the operator will not delete the newly started node, but keep the cluster at 4 nodes. If the user enables advanced-statefulsets, the intermediate node can be deleted (see the sketch after this paragraph).
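
A sketch of deleting such an intermediate pod when advanced-statefulsets is enabled; the delete-slots annotation key and JSON value format follow the advanced-statefulset convention as I understand it and should be treated as an assumption.

```go
// Sketch: mark a specific dm-worker ordinal for deletion and shrink replicas,
// so the advanced StatefulSet removes that pod instead of the highest ordinal.
package member

import (
	"encoding/json"

	appsv1 "k8s.io/api/apps/v1"
)

func deleteWorkerSlot(sts *appsv1.StatefulSet, ordinal int32) error {
	raw, err := json.Marshal([]int32{ordinal})
	if err != nil {
		return err
	}
	if sts.Annotations == nil {
		sts.Annotations = map[string]string{}
	}
	// Annotation key/format are assumptions about the advanced-statefulset convention.
	sts.Annotations["delete-slots"] = string(raw)

	next := *sts.Spec.Replicas - 1
	sts.Spec.Replicas = &next
	return nil
}
```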

Monitor

Add a container carrying the new DM monitoring files to the existing TidbMonitor pod. When deploying monitoring for the DM cluster, copy the Prometheus and Grafana configs to the target volumes.
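
A sketch of wiring the DM monitoring assets into the existing TidbMonitor pod via an extra init container that copies the Prometheus and Grafana files into shared volumes; the image name, paths, and volume names are placeholders.

```go
// Sketch: init container that copies DM monitoring files into the monitor's
// shared volumes (image, paths, and volume names are placeholders).
package monitor

import corev1 "k8s.io/api/core/v1"

func dmMonitorInitContainer() corev1.Container {
	return corev1.Container{
		Name:    "dm-monitor-initializer",
		Image:   "example/dm-monitor-initializer:v2.0.0", // placeholder image/tag
		Command: []string{"sh", "-c", "cp -r /dm-monitor/rules/* /prometheus-rules/ && cp -r /dm-monitor/dashboards/* /grafana-dashboards/"},
		VolumeMounts: []corev1.VolumeMount{
			{Name: "prometheus-rules", MountPath: "/prometheus-rules"},
			{Name: "grafana-dashboards", MountPath: "/grafana-dashboards"},
		},
	}
}
```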

Log

Reuse the current EFK system.

Category

Feature

Value

Support a dm controller in TiDB Operator so that users can easily deploy, scale, and upgrade DM clusters through TiDB Operator.

TODO lists

  • dmclusters CRD definition
    • Define the dmclusters spec, including the dm-master Service, ConfigMap, StatefulSets, and other Kubernetes resource definitions
    • Define the dmclusters status, mainly referring to the tidbcluster status
    • Generate dmclusters resources API
  • Support dm-master service discovery in the current discovery service
  • Implement dmclusters manager
    • deploy and start dmcluster
      • deploy and start dm-master, dm-master discovery
      • deploy and start dm-worker, dm-master service check
    • rolling update dmcluster
    • scale in/out dmcluster
    • auto failover dmcluster
  • support enable TLS in dmcluster on k8s
  • support monitor dmclusters in tidbMonitor
    • Package the dmcluster monitoring file and add container
    • Add cluster version detection, dm cluster parameters and monitoring file movement logic
  • Add unit tests for each module
  • Add e2e tests for dmcluster
    • dmcluster can be correctly deployed through resource yamls
      • After filling in the complete configuration, the dm cluster can start normally
      • After the dm cluster is started, synchronization tasks can be created/viewed/modified normally through the dm-ctl embedded in the image
      • TLS can be configured normally
      • After deleting the CR of dmcluster, the dmcluster can be stopped normally
    • dm cluster can be correctly scaled in/out
    • auto failover of dm cluster can be achieved
    • the monitoring of the dm cluster is deployed correctly
    • the rolling upgrade of the dm cluster can be performed correctly, and tasks still work normally after the upgrade
    • high availability check: randomly kill the master/worker nodes in the dm cluster and check whether the dm cluster can still work normally
  • tidb-scheduler special scheduling strategy for dm-master

Workload Estimation

45

Time

GanttProgress: 90%
GanttStart: 2020-08-03
GanttDue: 2020-10-30

@DanielZhangQD DanielZhangQD added the status/help-wanted label Jul 6, 2020
@DanielZhangQD DanielZhangQD added this to the v1.1.4 milestone Jul 6, 2020
@IANTHEREAL IANTHEREAL changed the title Support DM in TiDB Operator Support DM 2.0 in TiDB Operator Jul 28, 2020
@DanielZhangQD DanielZhangQD modified the milestones: v1.1.4, v1.2.0 Jul 29, 2020
@DanielZhangQD DanielZhangQD modified the milestones: v1.2.0, v1.2.0-alpha.1 Sep 10, 2020
@DanielZhangQD DanielZhangQD modified the milestones: v1.2.0-alpha.1, v1.2.0 Nov 18, 2020
@DanielZhangQD DanielZhangQD modified the milestones: v1.2.0, v1.2.0-beta.1 Jan 5, 2021
@DanielZhangQD DanielZhangQD assigned csuzhangxc and unassigned lichunzhu Jan 5, 2021