Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More safe way to cleanup tombstone nodes #858

Merged
merged 9 commits into from
Oct 23, 2020
Merged

Conversation

lucklove
Copy link
Member

@lucklove lucklove commented Oct 22, 2020

What problem does this PR solve?

With TiUP, if we want to offline a TiKV server, we need do this first:

tiup cluster scale-in <cluster-name> -N <tikv-node-id>

This will send a request to PD servers to tell them that we want to offline this TiKV server, then PD servers will transfer all regions in this TiKV to some other place. After transferring data, the state of this TiKV server will be tombstone. At this point, the data of this server will be deleted (see the yellow line in the picture below).

屏幕快照 2020-10-22 下午12 20 46

But oncel the user execute the display command:

tiup cluster display <cluster-name>

The TiKV server in tombstone state will be destroyed and cleanup at once, this may confuse users and it's unsafe.

What is changed and how it works?

DO NOT cleanup tombstone TiKV servers in display command, but add a prune command to do this, and give the user a choice to cancel the action:

屏幕快照 2020-10-22 下午12 23 34

Check List

Tests

  • Integration test
  • Manual test (add detailed scripts or steps below)

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release notes:

NONE

@lucklove lucklove added type/enhancement Categorizes issue or PR as related to an enhancement. status/PTAL labels Oct 22, 2020
@lucklove lucklove self-assigned this Oct 22, 2020
@lucklove lucklove added this to the v1.2.1 milestone Oct 22, 2020
@codecov-io
Copy link

codecov-io commented Oct 22, 2020

Codecov Report

Merging #858 into master will decrease coverage by 0.03%.
The diff coverage is 51.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #858      +/-   ##
==========================================
- Coverage   53.09%   53.06%   -0.04%     
==========================================
  Files         261      263       +2     
  Lines       19014    19038      +24     
==========================================
+ Hits        10096    10102       +6     
- Misses       7358     7370      +12     
- Partials     1560     1566       +6     
Flag Coverage Δ
#cluster 45.00% <51.02%> (-0.06%) ⬇️
#dm 25.25% <47.16%> (+0.02%) ⬆️
#integrate 47.93% <51.54%> (-0.03%) ⬇️
#playground 22.15% <ø> (-0.02%) ⬇️
#tiup 10.77% <ø> (ø)
#unittest 20.96% <8.33%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/cluster/operation/action.go 51.51% <ø> (ø)
components/cluster/command/prune.go 47.61% <47.61%> (ø)
components/dm/command/prune.go 50.00% <50.00%> (ø)
pkg/cluster/manager.go 65.82% <50.00%> (-0.22%) ⬇️
components/cluster/command/display.go 27.41% <100.00%> (-7.03%) ⬇️
components/cluster/command/root.go 45.50% <100.00%> (+0.32%) ⬆️
components/dm/command/display.go 84.61% <100.00%> (+30.91%) ⬆️
components/dm/command/root.go 54.54% <100.00%> (+0.37%) ⬆️
pkg/cluster/operation/scale_in.go 54.00% <100.00%> (ø)
pkg/utils/http_client.go 66.66% <0.00%> (-5.56%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d1b803c...b225d2a. Read the comment docs.

Signed-off-by: lucklove <[email protected]>
Copy link
Contributor

@lonng lonng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Oct 22, 2020
pkg/cluster/manager.go Outdated Show resolved Hide resolved
@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Oct 23, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Oct 23, 2020
@lucklove lucklove merged commit 2471475 into pingcap:master Oct 23, 2020
@lucklove lucklove deleted the prune branch October 23, 2020 03:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/LGT2 Indicates that a PR has LGTM 2. type/enhancement Categorizes issue or PR as related to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants