SKS-2227: Fix deleting a PG might create multiple deletion tasks at the same time #164

Merged: 1 commit merged on Dec 13, 2023

@haijianyang haijianyang commented Dec 12, 2023

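Issue: Deleting a cluster creates duplicate placement group deletion tasks

Reproduction

Create any cluster in an ELF nested cluster environment (not yet reproduced on non-nested clusters), then delete it. The issue reproduces with some probability.

Locally, I created several placement groups sharing the same name prefix and then called DeleteVMPlacementGroupsByNamePrefix twice in a row. This also produced duplicate placement group deletion tasks. We can therefore conclude that Tower does not handle the request idempotently: on the second delete call, the placement groups' EntityAsyncStatus is still nil, so duplicate deletion tasks are created.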

	deleteVMPlacementGroupParams := clientvmplacementgroup.NewDeleteVMPlacementGroupParams()
	deleteVMPlacementGroupParams.RequestBody = &models.VMPlacementGroupDeletionParams{
		Where: &models.VMPlacementGroupWhereInput{
			NameStartsWith:       TowerString(namePrefix),
			EntityAsyncStatusNot: nil,
		},
	}

In summary, when a cluster is deleted, two back-to-back reconciles of the ElfCluster can trigger this problem.
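The race described above can be illustrated with a small in-memory model. This is a sketch only: `fakeTower`, `placementGroup`, `deleteByNamePrefix`, and `markDeleting` are hypothetical names invented for illustration and are not part of the Tower API or this repository. It assumes the server creates one task per matching group and only updates EntityAsyncStatus later, asynchronously.

```go
package main

import (
	"fmt"
	"strings"
)

// placementGroup is a hypothetical, simplified stand-in for a Tower placement group.
type placementGroup struct {
	Name              string
	EntityAsyncStatus *string // nil until the server marks the group as being deleted
}

// fakeTower is a hypothetical in-memory server; it is NOT the real Tower API.
type fakeTower struct {
	groups []*placementGroup
	tasks  []string // one entry per deletion task created
}

// deleteByNamePrefix mimics a non-idempotent delete: it creates a task for every
// matching group whose status is still nil, without checking for existing tasks.
func (t *fakeTower) deleteByNamePrefix(prefix string) int {
	created := 0
	for _, g := range t.groups {
		if strings.HasPrefix(g.Name, prefix) && g.EntityAsyncStatus == nil {
			t.tasks = append(t.tasks, "delete:"+g.Name)
			created++
		}
	}
	return created
}

// markDeleting simulates the asynchronous step that eventually updates the status.
func (t *fakeTower) markDeleting() {
	status := "DELETING"
	for _, g := range t.groups {
		g.EntityAsyncStatus = &status
	}
}

func main() {
	tower := &fakeTower{groups: []*placementGroup{{Name: "pg-a"}, {Name: "pg-b"}}}

	// Two calls arrive before the status is updated, so both create tasks.
	first := tower.deleteByNamePrefix("pg-")
	second := tower.deleteByNamePrefix("pg-")
	fmt.Println(first, second, len(tower.tasks))

	// Once the status has been updated, a further call creates nothing.
	tower.markDeleting()
	fmt.Println(tower.deleteByNamePrefix("pg-"))
}
```

In this model the duplicate tasks appear only in the window before the status changes, which is why spacing out the two calls lowers the chance of hitting it.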

Change

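Because Tower does not handle the request idempotently, duplicate deletion tasks are hard to rule out entirely; we can only reduce the probability of them occurring as far as possible.
Revert to the original approach: set a timeout and poll the deletion task. This avoids two consecutive reconciles of the ElfCluster within a short period.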

	withLatestStatusTask, err := svr.WaitTask(ctx, taskID, config.WaitTaskTimeoutForPlacementGroupOperation, config.WaitTaskInterval)
	if err != nil {
		return pgNames, errors.Wrapf(err, "failed to wait for placement groups with name prefix %s deleting task to complete in %s: taskID %s", namePrefix, config.WaitTaskTimeoutForPlacementGroupOperation, taskID)
	}

Test

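  1. On a nested cluster, created a cluster with 10 placement groups and then deleted it. Repeated the test three times; no duplicate deletion tasks were found.

  2. On a non-nested cluster, created a cluster with 10 placement groups and then deleted it. Repeated the test three times; no duplicate deletion tasks were found.

  3. Ran the E2E tests 6 times; no duplicate deletion tasks were found.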
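A check like the one performed by hand in these tests could be automated along the following lines. This is a sketch under assumptions: `deletionTask` and `duplicateDeletions` are hypothetical names, and it assumes each deletion task record exposes the ID of the placement group it targets.

```go
package main

import "fmt"

// deletionTask is a hypothetical record of a deletion task; field names are illustrative.
type deletionTask struct {
	ID         string
	ResourceID string // ID of the placement group the task deletes
}

// duplicateDeletions groups task IDs by resource and keeps only the resources
// that are targeted by more than one deletion task.
func duplicateDeletions(tasks []deletionTask) map[string][]string {
	byResource := make(map[string][]string)
	for _, task := range tasks {
		byResource[task.ResourceID] = append(byResource[task.ResourceID], task.ID)
	}
	for resource, ids := range byResource {
		if len(ids) < 2 {
			delete(byResource, resource) // deleting during range is safe in Go
		}
	}
	return byResource
}

func main() {
	tasks := []deletionTask{
		{ID: "t1", ResourceID: "pg-1"},
		{ID: "t2", ResourceID: "pg-2"},
		{ID: "t3", ResourceID: "pg-1"},
	}
	fmt.Println(duplicateDeletions(tasks))
}
```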

@haijianyang haijianyang requested a review from jessehu December 12, 2023 07:19

codecov bot commented Dec 12, 2023

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison is base (a30b311) 59.92% compared to head (decbfeb) 59.72%.

| Files | Patch % | Lines |
|---|---|---|
| pkg/service/vm.go | 0.00% | 13 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #164      +/-   ##
==========================================
- Coverage   59.92%   59.72%   -0.20%     
==========================================
  Files          20       20              
  Lines        3603     3615      +12     
==========================================
  Hits         2159     2159              
- Misses       1296     1308      +12     
  Partials      148      148              


@haijianyang haijianyang merged commit 5e75d87 into smartxworks:master Dec 13, 2023
1 of 3 checks passed