After performing an online recovery, "halt-scheduling" has been set to true when reloading pd #8095

mayjiang0203 · 2024-04-18T08:47:48Z

Bug Report

What did you do?

What did you expect to see?

After PD is reloaded, "halt-scheduling" should be set to false.

What did you see instead?

[2024/04/18 16:16:08.515 +08:00] [INFO] [cluster.go:1093] ["will run cmd"] [cmd:="tiup ctl:v8.1.0-pre pd -u http://pd3-peer.dr-auto-sync-8c12tikv-tps-7567843-1-466:2379 unsafe remove-failed-stores show"]
  {
    "info": "Unsafe recovery Finished",
    "time": "2024-04-18 16:15:42.491",
[2024/04/18 16:16:22.872 +08:00] [INFO] [cmd.go:197] ["Remote command finished"] [cmd="tiup cluster reload tidbcluster -R pd -y"] [exitcode=0] []
[2024/04/18 16:16:24.293 +08:00] [INFO] [pdutil.go:512] ["run pd ctl command"] [pdCmd="tiup ctl:v8.1.0-pre pd -u http://pd3-peer.dr-auto-sync-8c12tikv-tps-7567843-1-466:2379 config show all"]

What version of PD are you using (`pd-server -V`)?

v8.1.0

[2024/04/18 15:15:22.453 +08:00] [INFO] [workloadnode.run] [util.go:255] ["/tiup/deploy/pd-/bin/pd-server -V"] [workload=pd2]
[2024/04/18 15:15:22.455 +08:00] [INFO] [cmd.go:150] ["Start remote command"] [cmd="/tiup/deploy/pd-/bin/pd-server -V"] [nodename=pd2]
2024-04-18T15:15:22.455+0800 INFO k8s/client.go:223 it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
Release Version: v8.1.0
Edition: Community
Git Commit Hash: 3ec92bd
Git Branch: HEAD
UTC Build Time: 2024-04-15 03:59:49


mayjiang0203 · 2024-04-18T08:48:19Z

/severity major
/label affects-8.1
/label affects-7.1
/label affects-7.5
/remove-label may-affects-7.5
/remove-label may-affects-7.1
/remove-label may-affects-6.5
/remove-label may-affects-6.1
/remove-label may-affects-5.4

ti-chi-bot · 2024-04-19T01:55:52Z

@mayjiang0203: These labels are not set on the issue: affects-7.5, affects-7.1, affects-6.5, affects-6.1, affects-5.4.

In response to this:

/severity major
/label affects-8.1
/remove-label affects-7.5
/remove-label affects-7.1
/remove-label affects-6.5
/remove-label affects-6.1
/remove-label affects-5.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2024-04-23T03:16:17Z

@mayjiang0203: These labels are not set on the issue: may-affects-7.5, may-affects-7.1, may-affects-6.5, may-affects-6.1, may-affects-5.4.

In response to this:

/severity major
/label affects-8.1
/label affects-7.1
/label affects-7.5
/remove-label may-affects-7.5
/remove-label may-affects-7.1
/remove-label may-affects-6.5
/remove-label may-affects-6.1
/remove-label may-affects-5.4


mayjiang0203 · 2024-04-28T02:31:46Z

Impact of this bug: reloading the cluster becomes very slow because leader eviction no longer works, so restarting TiKV has to wait for a 10-minute timeout.
Workaround: reload PD first, then run `config set halt-scheduling false`; after that, the cluster can be reloaded.
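The workaround above can be sketched as an ordered command list. This is only an illustration: the helper name `workaround_commands` and its parameters are hypothetical, the `config show all` verification step is an extra added here, and the final full reload without `-R` is assumed to be what "reload the cluster" means. The cluster name and `ctl` version in the usage line are taken from the logs in this report; the PD address is a placeholder.

```python
import shlex


def workaround_commands(cluster, pd_url, ctl_version):
    """Return the workaround steps as argv lists, in the order described above.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    pd_ctl = ["tiup", f"ctl:{ctl_version}", "pd", "-u", pd_url]
    return [
        # 1. Reload only the PD role first.
        ["tiup", "cluster", "reload", cluster, "-R", "pd", "-y"],
        # 2. Reset the option that was left as true after the online recovery.
        pd_ctl + ["config", "set", "halt-scheduling", "false"],
        # 3. (Added here) show the config so the value can be checked by hand.
        pd_ctl + ["config", "show", "all"],
        # 4. Reload the rest of the cluster (assumed: full reload, no -R).
        ["tiup", "cluster", "reload", cluster, "-y"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in workaround_commands(
        "tidbcluster", "http://<pd-address>:2379", "v8.1.0-pre"
    ):
        print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run anywhere; piping the commands into a shell is left to the operator.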

…8147) ref #6493, close #8095 Individually check the scheduling halt for online unsafe recovery to avoid unexpectedly persisting the halt option in the intermediate process. Signed-off-by: JmPotato <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

…8147) (#8155) ref #6493, close #8095 Individually check the scheduling halt for online unsafe recovery to avoid unexpectedly persisting the halt option in the intermediate process. Signed-off-by: JmPotato <[email protected]> Co-authored-by: JmPotato <[email protected]>

ref tikv#6493, close tikv#8095 Signed-off-by: ti-chi-bot <[email protected]>

…8147) (#8194) ref #6493, close #8095 Individually check the scheduling halt for online unsafe recovery to avoid unexpectedly persisting the halt option in the intermediate process. Signed-off-by: JmPotato <[email protected]> Co-authored-by: JmPotato <[email protected]> Co-authored-by: lhy1024 <[email protected]>
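As a reading aid for the commit message above, here is a minimal model of what "individually check the scheduling halt" could mean. It is not PD's code: the class and function names are hypothetical, and the sketch only assumes that the halt can be derived from two separate sources (the persisted option and the in-memory recovery state) instead of having the recovery write into the persisted option.

```python
from dataclasses import dataclass


@dataclass
class PersistedOptions:
    """Hypothetical stand-in for options that are saved and survive a PD reload."""
    halt_scheduling: bool = False


@dataclass
class UnsafeRecovery:
    """Hypothetical stand-in for the in-memory state of online unsafe recovery."""
    running: bool = False


def is_scheduling_halted(opts, recovery):
    # The recovery state is consulted directly, so nothing is written into
    # the persisted options while recovery is in progress.
    return opts.halt_scheduling or recovery.running


if __name__ == "__main__":
    opts, recovery = PersistedOptions(), UnsafeRecovery(running=True)
    print(is_scheduling_halted(opts, recovery))  # halted while recovery runs
    recovery.running = False
    print(is_scheduling_halted(opts, recovery))  # no leftover flag afterwards
```

In this model the persisted flag stays false throughout, so a reload after recovery has nothing stale to pick up.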

seiya-annie · 2024-06-11T10:46:31Z

/found customer

mayjiang0203 added the type/bug The issue is confirmed as a bug. label Apr 18, 2024

ti-chi-bot bot added severity/major may-affects-5.4 may-affects-6.1 may-affects-6.5 may-affects-7.1 may-affects-7.5 may-affects-8.1 affects-8.1 This bug affects the 8.1.x(LTS) versions. and removed may-affects-8.1 labels Apr 18, 2024

ti-chi-bot bot added affects-7.1 This bug affects the 7.1.x(LTS) versions. and removed may-affects-7.5 may-affects-7.1 may-affects-6.5 may-affects-6.1 may-affects-5.4 labels Apr 19, 2024

ti-chi-bot bot added the affects-7.5 This bug affects the 7.5.x(LTS) versions. label Apr 23, 2024

JmPotato mentioned this issue May 7, 2024

*: individually check the scheduling halt for online unsafe recovery #8147

Merged

ti-chi-bot bot closed this as completed in #8147 May 8, 2024

ti-chi-bot mentioned this issue May 8, 2024

*: individually check the scheduling halt for online unsafe recovery (#8147) #8155

Merged

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue May 20, 2024

This is an automated cherry-pick of tikv#8147

c86fcfb

ref tikv#6493, close tikv#8095 Signed-off-by: ti-chi-bot <[email protected]>

ti-chi-bot mentioned this issue May 20, 2024

*: individually check the scheduling halt for online unsafe recovery (#8147) #8193

Closed

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue May 20, 2024

This is an automated cherry-pick of tikv#8147

9d2a577

ref tikv#6493, close tikv#8095 Signed-off-by: ti-chi-bot <[email protected]>

ti-chi-bot mentioned this issue May 20, 2024

*: individually check the scheduling halt for online unsafe recovery (#8147) #8194

Merged

ti-chi-bot bot added the report/customer Customers have encountered this bug. label Jun 11, 2024

github-project-automation bot added this to Questions and Bug Reports Aug 29, 2024

github-project-automation bot moved this to Closed in Questions and Bug Reports Aug 29, 2024

rleungx removed the affects-7.1 This bug affects the 7.1.x(LTS) versions. label Oct 30, 2024
