SKS-1903: Fix deleting a PG might create multiple deletion tasks at the same time #151

Merged
merged 3 commits into from
Oct 25, 2023

Conversation

haijianyang
Contributor

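Fix duplicate deletion tasks being created when deleting a placement group

Cause

When placement groups are deleted in batch through the SDK, each placement group gets its own deletion task. CAPE polls synchronously, and when a deletion task times out, CAPE's next reconcile immediately tries the deletion again. Tower does not guard against this, so the same placement group could end up being deleted by multiple tasks concurrently.

Solution

1. Query Tower for the placement groups that need to be deleted.
2. Filter out the ones that are already being deleted (to avoid creating duplicate deletion tasks).
3. Delete the placement groups that are not already being deleted.
4. The cluster controller waits until all placement groups have been deleted; otherwise it requeues.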
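The four steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code from this PR: `FakeTower`, `PlacementGroup`, the `Deleting` flag, `deletePlacementGroups`, and `reconcileDelete` are all hypothetical names, and the real Tower SDK calls and status fields are assumed rather than shown.

```go
package main

import (
	"fmt"
	"strings"
)

// PlacementGroup is a hypothetical, simplified view of a Tower placement group.
type PlacementGroup struct {
	ID       string
	Name     string
	Deleting bool // assumed flag: true while a deletion task is in flight
}

// FakeTower is an in-memory stand-in for Tower, not the real SDK.
type FakeTower struct {
	Groups      []*PlacementGroup
	DeleteCalls map[string]int // deletion tasks created per group ID
}

// FindByNamePrefix returns the groups whose name starts with prefix (step 1).
func (t *FakeTower) FindByNamePrefix(prefix string) []*PlacementGroup {
	var found []*PlacementGroup
	for _, pg := range t.Groups {
		if strings.HasPrefix(pg.Name, prefix) {
			found = append(found, pg)
		}
	}
	return found
}

// StartDelete simulates creating an asynchronous deletion task.
func (t *FakeTower) StartDelete(pg *PlacementGroup) {
	pg.Deleting = true
	t.DeleteCalls[pg.ID]++
}

// FinishDeletes simulates all in-flight deletion tasks completing.
func (t *FakeTower) FinishDeletes() {
	var kept []*PlacementGroup
	for _, pg := range t.Groups {
		if !pg.Deleting {
			kept = append(kept, pg)
		}
	}
	t.Groups = kept
}

// deletePlacementGroups starts a deletion only for groups that are not
// already being deleted (steps 2 and 3) and reports how many matching
// groups still exist.
func deletePlacementGroups(t *FakeTower, prefix string) int {
	groups := t.FindByNamePrefix(prefix)
	for _, pg := range groups {
		if pg.Deleting {
			continue // a task already exists; do not create a duplicate
		}
		t.StartDelete(pg)
	}
	return len(groups)
}

// reconcileDelete models step 4: requeue until no matching group remains.
func reconcileDelete(t *FakeTower, prefix string) (requeue bool) {
	return deletePlacementGroups(t, prefix) > 0
}

func main() {
	tower := &FakeTower{
		Groups: []*PlacementGroup{
			{ID: "1", Name: "cluster-a-pg-1"},
			{ID: "2", Name: "cluster-a-pg-2"},
			{ID: "3", Name: "cluster-b-pg-1"},
		},
		DeleteCalls: map[string]int{},
	}
	fmt.Println(reconcileDelete(tower, "cluster-a-")) // true: tasks created
	fmt.Println(reconcileDelete(tower, "cluster-a-")) // true: still deleting
	fmt.Println(tower.DeleteCalls["1"])               // 1: no duplicate task
	tower.FinishDeletes()
	fmt.Println(reconcileDelete(tower, "cluster-a-")) // false: all gone
}
```

The point of the sketch is that a second reconcile arriving while tasks are still running only requeues; it never issues another delete for the same group.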

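Testing

Test environment: 3-host nested cluster

1. Create a cluster with 1 CP node and 10 node groups of 3 workers each, then scale it down to 1 node group of 3 workers. The node groups and placement groups are deleted normally.

2. Delete the cluster above. All node groups are deleted normally.

3.1 Create a cluster with 1 CP node and 10 node groups of 3 workers each.
3.2 Start a script that creates one placement group for the cluster every second.
3.3 Then delete the cluster.
3.4 Keep switching the MongoDB primary across hosts (by pausing the host that currently runs the primary).
3.5 Observed that the placement group deletion tasks hit errors:
image

image

3.6 For one selected placement group, its multiple deletion tasks ran one after another in time order; no concurrent deletion was observed.
image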


codecov bot commented Oct 20, 2023

Codecov Report

Merging #151 (64de1ff) into master (f855a41) will decrease coverage by 0.42%.
Report is 1 commits behind head on master.
The diff coverage is 27.41%.

@@            Coverage Diff             @@
##           master     #151      +/-   ##
==========================================
- Coverage   56.77%   56.35%   -0.42%     
==========================================
  Files          17       17              
  Lines        3160     3208      +48     
==========================================
+ Hits         1794     1808      +14     
- Misses       1210     1244      +34     
  Partials      156      156              
Files                                                   Coverage Δ
controllers/elfcluster_controller.go                    70.40% <100.00%> (+0.81%) ⬆️
...ntrollers/elfmachine_controller_placement_group.go   72.34% <100.00%> (+0.26%) ⬆️
pkg/service/vm.go                                       0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes

controllers/elfmachine_controller_placement_group.go
@@ -630,8 +630,12 @@ func (r *ElfMachineReconciler) deletePlacementGroup(ctx *context.MachineContext)
 		return false, nil
 	}
 
-	if err := ctx.VMService.DeleteVMPlacementGroupsByName(ctx, *placementGroup.Name); err != nil {
+	if pgNames, err := ctx.VMService.DeleteVMPlacementGroupsByName(ctx, *placementGroup.Name); err != nil {
Collaborator

Let's add a func DeleteVMPlacementGroupsByNamePrefix() here, to distinguish it from DeleteVMPlacementGroupByName.

Contributor Author

Use different funcs for deleting a single PG and for deleting multiple PGs? The deletion logic should be reusable. Should DeleteVMPlacementGroupByName call DeleteVMPlacementGroupsByNamePrefix?

Collaborator

A single PG is deleted by Name, while multiple PGs are deleted by NamePrefix; using one func for both would be confusing.
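One way to reconcile the two positions in this thread is to keep two clearly named entry points that share a single private helper. The sketch below only illustrates that shape: `pgStore` and `deleteWhere` are hypothetical, and the signatures are deliberately simplified, so they do not match the actual methods in pkg/service/vm.go.

```go
package main

import (
	"fmt"
	"strings"
)

// pgStore is a hypothetical in-memory list of placement group names.
type pgStore struct {
	names []string
}

// deleteWhere holds the shared deletion logic: it removes every name
// accepted by match and returns the removed names.
func (s *pgStore) deleteWhere(match func(name string) bool) []string {
	var deleted, kept []string
	for _, n := range s.names {
		if match(n) {
			deleted = append(deleted, n)
		} else {
			kept = append(kept, n)
		}
	}
	s.names = kept
	return deleted
}

// DeleteVMPlacementGroupByName deletes only the group with this exact name.
func (s *pgStore) DeleteVMPlacementGroupByName(name string) []string {
	return s.deleteWhere(func(n string) bool { return n == name })
}

// DeleteVMPlacementGroupsByNamePrefix deletes every group whose name
// starts with prefix.
func (s *pgStore) DeleteVMPlacementGroupsByNamePrefix(prefix string) []string {
	return s.deleteWhere(func(n string) bool { return strings.HasPrefix(n, prefix) })
}

func main() {
	s := &pgStore{names: []string{"cluster-a-pg", "cluster-a-pg-md-0", "cluster-b-pg"}}
	fmt.Println(s.DeleteVMPlacementGroupByName("cluster-a-pg"))      // [cluster-a-pg]
	fmt.Println(s.DeleteVMPlacementGroupsByNamePrefix("cluster-a-")) // [cluster-a-pg-md-0]
}
```

With this split, a caller holding an exact name cannot accidentally delete other groups that merely share its prefix, while the deletion logic itself still lives in one place.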

@haijianyang haijianyang requested a review from jessehu October 24, 2023 06:53
@haijianyang haijianyang requested a review from jessehu October 24, 2023 07:53
@jessehu jessehu changed the title SKS-1903: Fix deleting a placement group would create multiple deletion tasks at the same time SKS-1903: Fix deleting a PG might create multiple deletion tasks at the same time Oct 24, 2023
@haijianyang haijianyang merged commit 4a4ddd4 into smartxworks:master Oct 25, 2023