Support DM 2.0 in TiDB Operator #2868

Closed
28 of 29 tasks
Ishiihara opened this issue Jul 5, 2020 · 0 comments

Ishiihara commented Jul 5, 2020

Description

Integrate DM with TiDB Operator. Ideally, we would like TiDB Operator to manage DM clusters as well.

Integration Method

Add a dm controller inside the logic of the existing TiDB Operator.
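
For illustration only, here is a minimal sketch of how a dmcluster control loop could be hooked into the existing operator process, in the workqueue style that client-go controllers commonly use; `DMController` and the sync callback are placeholder names, not the operator's actual types.

```go
// Sketch: a dmcluster control loop registered inside the existing operator
// process. DMController and the sync callback are placeholders.
package controller

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/util/workqueue"
)

type DMController struct {
	queue workqueue.RateLimitingInterface
	sync  func(key string) error // real dmcluster reconcile logic goes here
}

func NewDMController(sync func(string) error) *DMController {
	return &DMController{
		queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "dmcluster"),
		sync:  sync,
	}
}

// Run starts workers that pop dmcluster keys off the queue and reconcile them
// until stopCh is closed.
func (c *DMController) Run(workers int, stopCh <-chan struct{}) {
	defer c.queue.ShutDown()
	for i := 0; i < workers; i++ {
		go wait.Until(c.worker, time.Second, stopCh)
	}
	<-stopCh
}

func (c *DMController) worker() {
	for {
		key, quit := c.queue.Get()
		if quit {
			return
		}
		if err := c.sync(key.(string)); err != nil {
			c.queue.AddRateLimited(key) // retry with backoff on sync failure
		} else {
			c.queue.Forget(key)
		}
		c.queue.Done(key)
	}
}
```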

Deployment

  1. Do not build a standalone dm-operator; instead, add a dmclusters resource to tidb-operator (a sketch of the resource types follows this list).
  2. Deploy dm-master and dm-worker through StatefulSets.
  3. The dm controller is only responsible for managing the cluster. DM task management is done by the user through the HTTP/gRPC API of the dm-master service, either with the dm-ctl binary packaged in the image or by running dm-ctl locally through kubectl port-forward.
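
As referenced in the list above, here is a rough sketch of what the dmclusters CRD Go types might contain, assuming a spec with dm-master and dm-worker sections and a status modeled on tidbcluster; all type and field names are illustrative, not the final API.

```go
// Sketch only: possible Go types behind the dmclusters CRD. Names and fields
// are assumptions, not the final API.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// DMCluster is the top-level dmclusters resource.
type DMCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   DMClusterSpec   `json:"spec"`
	Status DMClusterStatus `json:"status,omitempty"`
}

// DMClusterSpec describes the desired dm-master and dm-worker deployments.
type DMClusterSpec struct {
	Version string     `json:"version"`
	Master  MasterSpec `json:"master"`
	Worker  WorkerSpec `json:"worker"`
}

type MasterSpec struct {
	Replicas    int32  `json:"replicas"`
	ServiceType string `json:"serviceType,omitempty"` // e.g. NodePort
	Config      string `json:"config,omitempty"`      // raw dm-master.toml
}

type WorkerSpec struct {
	Replicas int32  `json:"replicas"`
	Config   string `json:"config,omitempty"` // raw dm-worker.toml
}

// DMClusterStatus is modeled on the tidbcluster status layout.
type DMClusterStatus struct {
	Master MasterStatus `json:"master,omitempty"`
	Worker WorkerStatus `json:"worker,omitempty"`
}

type MasterStatus struct {
	StatefulSet string `json:"statefulSet,omitempty"`
	Leader      string `json:"leader,omitempty"`
}

type WorkerStatus struct {
	StatefulSet string `json:"statefulSet,omitempty"`
}
```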

Configuration

Deployment configurations should be stored in Kubernetes ConfigMaps and rolled out through the StatefulSet update mechanism.
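
A minimal sketch of that mechanism, assuming the config lives in a single ConfigMap key and a hash of it is stamped onto the StatefulSet pod template so a config change triggers a rolling update; the helper names and the annotation key are made up for illustration.

```go
// Sketch: store the dm-master config in a ConfigMap and stamp its hash on the
// StatefulSet pod template, so a config change rolls the pods.
package member

import (
	"crypto/sha256"
	"encoding/hex"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func dmMasterConfigMap(cluster, config string) *corev1.ConfigMap {
	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: cluster + "-dm-master"},
		Data:       map[string]string{"dm-master.toml": config},
	}
}

// annotateConfigHash writes the config hash into the pod template; when the
// hash changes, the StatefulSet controller performs a rolling update.
func annotateConfigHash(sts *appsv1.StatefulSet, cm *corev1.ConfigMap) {
	sum := sha256.Sum256([]byte(cm.Data["dm-master.toml"]))
	if sts.Spec.Template.Annotations == nil {
		sts.Spec.Template.Annotations = map[string]string{}
	}
	// Annotation key is illustrative, not the operator's actual key.
	sts.Spec.Template.Annotations["dm.example.com/config-hash"] = hex.EncodeToString(sum[:])
}
```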

Service

dm-master should be exposed through a Service of type NodePort.
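
A sketch of such a NodePort Service, assuming the standard dm-master client port 8261 and illustrative names and labels.

```go
// Sketch: dm-master Service exposed via NodePort (names/labels are assumptions).
package member

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func dmMasterService(cluster, namespace string) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      cluster + "-dm-master",
			Namespace: namespace,
			Labels:    map[string]string{"app.kubernetes.io/component": "dm-master"},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeNodePort,
			Selector: map[string]string{"app.kubernetes.io/component": "dm-master"},
			Ports: []corev1.ServicePort{{
				Name:       "dm-master",
				Port:       8261,
				TargetPort: intstr.FromInt(8261),
			}},
		},
	}
}
```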

Rolling Update

dm-master

With the help of the StatefulSet partition, roll the pods in order from the highest ordinal to the lowest: check whether the pod is already on the latest revision, and then check whether it is the leader. If it is the leader, migrate leadership away first, and then set the partition to the current pod's ordinal so that the pod gets updated.
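
A sketch of this partition-driven upgrade loop; the masterClient interface (Leader/EvictLeader) stands in for whatever wrapper the operator would use around the dm-master API and is purely hypothetical.

```go
// Sketch: partition-driven dm-master rolling update, highest ordinal first.
package member

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

// masterClient is a hypothetical wrapper over the dm-master API.
type masterClient interface {
	Leader() (string, error)      // name of the current leader pod
	EvictLeader(pod string) error // ask the leader to step down
}

func upgradeDMMaster(sts *appsv1.StatefulSet, podRevision func(ordinal int32) string, cli masterClient) error {
	for i := *sts.Spec.Replicas - 1; i >= 0; i-- {
		pod := fmt.Sprintf("%s-%d", sts.Name, i)

		if podRevision(i) == sts.Status.UpdateRevision {
			continue // already on the new revision, move to the next ordinal
		}

		// The pod is about to be restarted: move leadership off it first.
		if leader, err := cli.Leader(); err != nil {
			return err
		} else if leader == pod {
			return cli.EvictLeader(pod) // requeue; the partition is set on a later pass
		}

		// Lower the partition to this ordinal so the StatefulSet controller updates it.
		partition := i
		sts.Spec.UpdateStrategy.RollingUpdate = &appsv1.RollingUpdateStatefulSetStrategy{Partition: &partition}
		return nil
	}
	return nil
}
```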

dm-worker

Update directly.

Scale

For a scale-in operation, delete the member info from the DM cluster first, and then delete the pod.
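
A sketch of that ordering for dm-worker scale-in, with a hypothetical OfflineWorker call standing in for the real dm-master member-removal API.

```go
// Sketch: deregister the highest-ordinal dm-worker first, then shrink the
// StatefulSet so the pod itself is removed.
package member

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

// dmClient is a hypothetical wrapper over the dm-master API.
type dmClient interface {
	OfflineWorker(name string) error // remove the worker's member info from dm-master
}

func scaleInDMWorker(sts *appsv1.StatefulSet, desired int32, cli dmClient) error {
	current := *sts.Spec.Replicas
	if desired >= current {
		return nil // not a scale-in
	}
	// Step 1: delete the member info of the pod that will disappear.
	worker := fmt.Sprintf("%s-%d", sts.Name, current-1)
	if err := cli.OfflineWorker(worker); err != nil {
		return err
	}
	// Step 2: only after the member is gone, reduce replicas by one; requeue
	// until the desired count is reached.
	next := current - 1
	sts.Spec.Replicas = &next
	return nil
}
```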

High Availability

Achieved by DM's architecture.

Failover

dm-master

Assuming the cluster has 3 master pods, if a dm-master pod goes down for more than 5 minutes (configurable), the operator will add a new dm-master pod, so 4 pods will exist at the same time. After the failed dm-master node is restored, the operator will delete the newly started node, and the cluster is back to 3 master pods.
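
A sketch of the detection side of this failover flow, assuming a configurable period and a record of members that have already been failed over; the names and fields are illustrative, and removing the extra pod after recovery goes through the normal scale-in path.

```go
// Sketch: add one extra dm-master replica per member that stays unhealthy
// longer than the failover period; drop the record once it recovers.
package member

import "time"

type masterMember struct {
	Name           string
	Health         bool
	LastTransition time.Time
}

type failoverState struct {
	Period         time.Duration   // e.g. 5 * time.Minute, configurable
	FailureMembers map[string]bool // members that have already been failed over
}

// desiredMasterReplicas returns the base replica count plus one per outstanding failure.
func desiredMasterReplicas(base int32, members []masterMember, st *failoverState, now time.Time) int32 {
	for _, m := range members {
		if !m.Health && now.Sub(m.LastTransition) > st.Period {
			st.FailureMembers[m.Name] = true // trigger a new pod
		}
		if m.Health {
			delete(st.FailureMembers, m.Name) // recovered: the extra pod can be scaled back in
		}
	}
	return base + int32(len(st.FailureMembers))
}
```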

dm-worker

Almost the same as the dm-master failover above. The difference is that the newly started dm-worker may already have a DM task assigned to it; if that worker were taken offline, the task would be re-assigned, interrupting it for a little while. Therefore, the operator will not delete the newly started node, but keep the cluster at 4 nodes. If the user enables advanced-statefulsets, the intermediate node can be deleted (see the sketch after this paragraph).
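
A sketch of deleting such an intermediate pod when advanced-statefulsets is enabled; the delete-slots annotation key and JSON value format follow the advanced-statefulset convention as I understand it and should be treated as an assumption.

```go
// Sketch: mark a specific dm-worker ordinal for deletion and shrink replicas,
// so the advanced StatefulSet removes that pod instead of the highest ordinal.
package member

import (
	"encoding/json"

	appsv1 "k8s.io/api/apps/v1"
)

func deleteWorkerSlot(sts *appsv1.StatefulSet, ordinal int32) error {
	raw, err := json.Marshal([]int32{ordinal})
	if err != nil {
		return err
	}
	if sts.Annotations == nil {
		sts.Annotations = map[string]string{}
	}
	// Annotation key/format are assumptions about the advanced-statefulset convention.
	sts.Annotations["delete-slots"] = string(raw)

	next := *sts.Spec.Replicas - 1
	sts.Spec.Replicas = &next
	return nil
}
```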

Monitor

Add a container carrying the new DM monitoring files to the existing TidbMonitor pod. When deploying monitoring for the DM cluster, copy the Prometheus and Grafana configs to the target volumes.
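
A sketch of wiring the DM monitoring assets into the existing TidbMonitor pod via an extra init container that copies the Prometheus and Grafana files into shared volumes; the image name, paths, and volume names are placeholders.

```go
// Sketch: init container that copies DM monitoring files into the monitor's
// shared volumes (image, paths, and volume names are placeholders).
package monitor

import corev1 "k8s.io/api/core/v1"

func dmMonitorInitContainer() corev1.Container {
	return corev1.Container{
		Name:    "dm-monitor-initializer",
		Image:   "example/dm-monitor-initializer:v2.0.0", // placeholder image/tag
		Command: []string{"sh", "-c", "cp -r /dm-monitor/rules/* /prometheus-rules/ && cp -r /dm-monitor/dashboards/* /grafana-dashboards/"},
		VolumeMounts: []corev1.VolumeMount{
			{Name: "prometheus-rules", MountPath: "/prometheus-rules"},
			{Name: "grafana-dashboards", MountPath: "/grafana-dashboards"},
		},
	}
}
```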

Log

Reuse the current EFK system.

Category

Feature

Value

Support a dm controller in TiDB Operator so that users can easily deploy, scale, and upgrade DM clusters through TiDB Operator.

TODO lists

  • dmclusters CRD definition
    • Define the dmclusters spec, including the dm-master Service, ConfigMap, StatefulSets, and other Kubernetes resource definitions
    • Define the dmclusters status, mainly referring to the tidbcluster status
    • Generate dmclusters resources API
  • Support dm-master service discovery in the current discovery service
  • Implement dmclusters manager
    • deploy and start dmcluster
      • deploy and start dm-master, dm-master discovery
      • deploy and start dm-worker, dm-master service check
    • rolling update dmcluster
    • scale in/out dmcluster
    • auto failover dmcluster
  • support enable TLS in dmcluster on k8s
  • support monitor dmclusters in tidbMonitor
    • Package the dmcluster monitoring file and add container
    • Add cluster version detection, dm cluster parameters and monitoring file movement logic
  • Add unit tests for each module
  • Add e2e tests for dmcluster
    • dmcluster can be correctly deployed through resource yamls
      • After filling in the complete configuration, the dm cluster can start normally
      • After the dm cluster is started, synchronization tasks can be created/viewed/modified normally through the dm-ctl embedded in the image
      • TLS can be configured normally
      • After deleting the CR of dmcluster, the dmcluster can be stopped normally
    • dm cluster can be correctly scaled in/out
    • auto failover of dm cluster can be achieved
    • the monitoring of the dm cluster is deployed correctly
    • the rolling upgrade of the dm cluster can be performed correctly, and tasks still work normally after the upgrade
    • high availability check: randomly kill the master/worker nodes in the dm cluster and check whether the dm cluster can still work normally
  • tidb-scheduler special scheduling strategy for dm-master

Workload Estimation

45

Time

GanttProgress: 90%
GanttStart: 2020-08-03
GanttDue: 2020-10-30

@DanielZhangQD DanielZhangQD added the status/help-wanted label Jul 6, 2020
@DanielZhangQD DanielZhangQD added this to the v1.1.4 milestone Jul 6, 2020
@IANTHEREAL IANTHEREAL changed the title Support DM in TiDB Operator Support DM 2.0 in TiDB Operator Jul 28, 2020
@DanielZhangQD DanielZhangQD modified the milestones: v1.1.4, v1.2.0 Jul 29, 2020
@DanielZhangQD DanielZhangQD modified the milestones: v1.2.0, v1.2.0-alpha.1 Sep 10, 2020
@DanielZhangQD DanielZhangQD modified the milestones: v1.2.0-alpha.1, v1.2.0 Nov 18, 2020
@DanielZhangQD DanielZhangQD modified the milestones: v1.2.0, v1.2.0-beta.1 Jan 5, 2021
@DanielZhangQD DanielZhangQD assigned csuzhangxc and unassigned lichunzhu Jan 5, 2021