SKS-2227: Fix deleting a PG might create multiple deletion tasks at the same time #164
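Issue
Deleting a cluster produces duplicate placement group deletion tasks.
Reproduction
In an ELF nested cluster environment (not yet reproduced in a non-nested environment), create any cluster and then delete it. The issue reproduces with some probability.
Creating several placement groups with the same name prefix locally and then calling DeleteVMPlacementGroupsByNamePrefix twice in a row also produced duplicate placement group deletion tasks. This indicates that Tower does not handle the deletion idempotently: on the second call the placement group's EntityAsyncStatus is still nil, so a duplicate deletion task is created.
In summary, the problem can occur when ElfCluster is reconciled twice in quick succession while a cluster is being deleted.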
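The failure mode described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `fakeTower`, `placementGroup`, and `deleteByNamePrefix` are hypothetical stand-ins rather than the real Tower client, and the model assumes one deletion task per call that matches a group whose `EntityAsyncStatus` is still nil.

```go
package main

import (
	"fmt"
	"strings"
)

// placementGroup is a hypothetical stand-in for a Tower placement group.
type placementGroup struct {
	Name              string
	EntityAsyncStatus *string // stays nil until the pending deletion is recorded
}

// fakeTower is a hypothetical in-memory model, not the real Tower client.
type fakeTower struct {
	groups []*placementGroup
	tasks  []string // IDs of the deletion tasks created so far
}

// deleteByNamePrefix mimics the behaviour observed in the PR: a deletion task
// is created whenever a matching group still has a nil EntityAsyncStatus, and
// the status is NOT updated synchronously, so the call is not idempotent.
func (t *fakeTower) deleteByNamePrefix(prefix string) {
	matched := false
	for _, g := range t.groups {
		if strings.HasPrefix(g.Name, prefix) && g.EntityAsyncStatus == nil {
			matched = true
		}
	}
	if matched {
		t.tasks = append(t.tasks, fmt.Sprintf("task-%d", len(t.tasks)+1))
	}
}

// countTasks issues the delete request `calls` times back to back and
// returns how many deletion tasks were created.
func countTasks(calls int) int {
	t := &fakeTower{groups: []*placementGroup{
		{Name: "demo-pg-1"},
		{Name: "demo-pg-2"},
	}}
	for i := 0; i < calls; i++ {
		t.deleteByNamePrefix("demo-")
	}
	return len(t.tasks)
}

func main() {
	fmt.Println("tasks after two consecutive calls:", countTasks(2))
}
```

In this model the second call cannot tell that a deletion is already in flight, which matches the two back-to-back reconciles described above.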
Change
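Because Tower does not handle the deletion idempotently, duplicate deletion tasks are hard to rule out entirely. The best we can do is reduce the probability that they occur.
Revert to the original approach: set a timeout and poll the deletion task. This avoids two consecutive ElfCluster reconciles within a short period.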
Test