The etcd client to the registry of Pump/Drainer info does not use auto-sync and fails when the PD cluster address changed #42643

Closed
kennytm opened this issue Mar 28, 2023 · 4 comments · Fixed by #43529 or #44081
Assignees
Labels
affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.0 affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.2 affects-6.3 affects-6.4 affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-6.6 affects-7.0 affects-7.1 This bug affects the 7.1.x(LTS) versions. component/binlog fixes-7.1.1 This bug is fixed in 7.1.1 severity/major type/bug The issue is confirmed as a bug.

Comments

@kennytm
Contributor

kennytm commented Mar 28, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. Start a TiDB cluster with 3 PDs ① ② ③ and a Pump connected
  2. Scale-out 3 more PDs ④ ⑤ ⑥
  3. Wait 31 seconds
  4. Scale-in the original PDs ① ② ③
  5. Wait 31 seconds
  6. Run SHOW PUMP STATUS;

2. What did you expect to see? (Required)

The status of the Pump is shown at step 6.

3. What did you see instead (Required)

A "context deadline exceeded" error from etcd.(*Client).List.

4. What is your TiDB version? (Required)

v4.0.14

@kennytm kennytm added type/bug The issue is confirmed as a bug. component/binlog affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.0 affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.2 affects-6.3 affects-6.4 affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-6.6 affects-7.0 labels Mar 28, 2023
@kennytm
Contributor Author

kennytm commented Mar 28, 2023

In #9575 we added AutoSyncInterval so that the etcd client automatically refreshes the PD member list. But this change was never carried over to the tidb-tools/tidb-binlog package (later merged into this repository):

tidb/util/etcd/etcd.go

Lines 81 to 85 in 10f0093

cli, err := clientv3.New(clientv3.Config{
	Endpoints:   endpoints,
	DialTimeout: dialTimeout,
	TLS:         security,
})

(This might also manifest in the "MySQL-compatible AUTO_INCREMENT" feature #38809)

tidb/meta/autoid/autoid.go

Lines 588 to 591 in 00d48f9

etcdCli, err := clientv3.New(clientv3.Config{
	Endpoints: addrs,
	TLS:       ebd.TLSConfig(),
})

While it is easy to add back the option AutoSyncInterval: 30 * time.Second, I think the bigger question is why we need 4 different etcd clients rather than sharing a single one with the correct config 🤷


For now, the lack of AutoSyncInterval means that any change in PD membership requires every tidb-server to be restarted.

@jackysp
Member

jackysp commented Apr 19, 2023

Although Pump/Drainer has been maintained, it seems that other tool components (DM, CDC) may have similar issues as well. Should we fix them all at once? @nongfushanquan @overvenus @lance6716

@hawkingrei
Member

I think we can create a linter to find this problem.

ti-chi-bot bot pushed a commit that referenced this issue Jun 27, 2023
@lance6716 lance6716 assigned tiancaiamao and lichunzhu and unassigned lance6716 Jul 6, 2023
@tiancaiamao tiancaiamao added the fixes-7.1.1 This bug is fixed in 7.1.1 label Jul 10, 2023