Fix podgroup not created #3561

Merged (1 commit) on Jul 9, 2024

@liuyuanchun11 liuyuanchun11 commented Jul 5, 2024

During a rolling upgrade of a ReplicaSet, pg_controller occasionally receives the addPod event first and creates a podgroup. It then receives the addReplicaSet event (replicas = 0) and deletes the corresponding podgroup (this deletion exists to solve the pg fc problem). The subsequent updateReplicaSet event (replicas = 1) is received but not processed. As a result, the pod ends up without its podgroup.
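
The event ordering described above can be replayed with a toy, in-memory model. This is only a sketch under stated assumptions: the `controller` type, its methods, the `podgroup-` name prefix, and the `handleUpdate` switch are hypothetical stand-ins for illustration, not Volcano's actual pg_controller code.

```go
package main

import "fmt"

// controller is a hypothetical, in-memory stand-in for pg_controller.
type controller struct {
	podGroups    map[string]bool     // podgroup name -> exists
	pods         map[string][]string // replicaset name -> pod names
	handleUpdate bool                // whether updateReplicaSet events are processed
}

func newController(handleUpdate bool) *controller {
	return &controller{
		podGroups:    map[string]bool{},
		pods:         map[string][]string{},
		handleUpdate: handleUpdate,
	}
}

// pgName derives an illustrative podgroup name from the replicaset name.
func pgName(rs string) string { return "podgroup-" + rs }

// addPod records the pod and creates the podgroup for its owner.
func (c *controller) addPod(rs, pod string) {
	c.pods[rs] = append(c.pods[rs], pod)
	c.podGroups[pgName(rs)] = true
}

// addReplicaSet deletes the podgroup when the replicaset has zero replicas.
func (c *controller) addReplicaSet(rs string, replicas int32) {
	if replicas == 0 {
		delete(c.podGroups, pgName(rs))
	}
}

// updateReplicaSet is a no-op unless handleUpdate is set; when it is set,
// the podgroup is recreated for pods that already exist.
func (c *controller) updateReplicaSet(rs string, replicas int32) {
	if !c.handleUpdate {
		return
	}
	if replicas == 0 {
		delete(c.podGroups, pgName(rs))
	}
	if replicas > 0 && len(c.pods[rs]) > 0 {
		c.podGroups[pgName(rs)] = true
	}
}

// replay runs the problematic ordering and reports whether the podgroup exists at the end.
func replay(handleUpdate bool) bool {
	c := newController(handleUpdate)
	c.addPod("rs-a", "pod-1")     // pod event arrives first
	c.addReplicaSet("rs-a", 0)    // replicas = 0 deletes the podgroup
	c.updateReplicaSet("rs-a", 1) // replicas = 1 arrives afterwards
	return c.podGroups[pgName("rs-a")]
}

func main() {
	fmt.Println("without update handling:", replay(false)) // false
	fmt.Println("with update handling:", replay(true))     // true
}
```

In this model the podgroup only survives the sequence when the update event is acted on, which mirrors the behaviour reported here.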

issue: #3563

@volcano-sh-bot
Contributor

Welcome @liuyuanchun11!

It looks like this is your first PR to volcano-sh/volcano.

Thank you, and welcome to Volcano. 😃

@volcano-sh-bot volcano-sh-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 5, 2024
@googs1025
Member

Is there a scenario that can be reproduced?

@liuyuanchun11
Contributor Author

> Is there a scenario that can be reproduced?

This issue occurs with high probability when multiple Deployments are upgraded in rolling mode. The key point is that the addPod event is received before the ReplicaSet event.

        err := pg.vcClient.SchedulingV1beta1().PodGroups(rs.Namespace).Delete(context.TODO(), pgName, metav1.DeleteOptions{})
        if err != nil && !apierrors.IsNotFound(err) {
            klog.Errorf("Failed to delete PodGroup <%s/%s>: %v", rs.Namespace, pgName, err)
        }
    } else if *rs.Spec.Replicas > 0 {

Member

The else is unnecessary, and we should add a comment explaining why this branch was added.

Contributor Author

Sorry, I don't understand what you mean by "else is not needed". Are you saying that the webhook already guarantees that replicas cannot be negative, so there's no need for the > 0 check?
I'll add the comment later.

Member

Sorry for the misunderstanding. I mean we can use `if *rs.Spec.Replicas > 0` directly; there is no need for `else if`.

Contributor Author

Modified based on review comments.

    if podList != nil {
        for _, pod := range podList.Items {
            klog.V(4).Infof("Try to create podgroup for pod %s/%s", pod.Namespace, pod.Name)
            err := pg.createNormalPodPGIfNotExist(&pod)

Member

How about first checking whether the pg associated with the rs exists, and creating the pg only when it doesn't, to avoid unnecessary API requests? This is a rare case, after all.

Contributor Author

createNormalPodPGIfNotExist already checks whether the podgroup exists.
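
For readers unfamiliar with the pattern, a check-then-create helper can be sketched as below. This is a minimal illustration, assuming an in-memory `store` in place of the API server; `store`, `errNotFound`, and `createIfNotExist` are hypothetical names and this is not the body of createNormalPodPGIfNotExist.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for a "not found" API error.
var errNotFound = errors.New("not found")

// store is a hypothetical in-memory replacement for the podgroup API.
type store struct {
	groups  map[string]struct{}
	creates int // number of create calls actually issued
}

// get returns errNotFound when the named podgroup does not exist.
func (s *store) get(name string) error {
	if _, ok := s.groups[name]; !ok {
		return errNotFound
	}
	return nil
}

// create adds the podgroup and counts the call.
func (s *store) create(name string) {
	s.groups[name] = struct{}{}
	s.creates++
}

// createIfNotExist looks the podgroup up first and creates it only when it is missing.
// It reports whether a create was issued.
func (s *store) createIfNotExist(name string) (bool, error) {
	err := s.get(name)
	if err == nil {
		return false, nil // already present, nothing to do
	}
	if !errors.Is(err, errNotFound) {
		return false, err // unexpected lookup failure
	}
	s.create(name)
	return true, nil
}

func main() {
	s := &store{groups: map[string]struct{}{}}
	// Several pods of the same owner map to the same podgroup name.
	for _, pod := range []string{"pod-1", "pod-2", "pod-3"} {
		created, err := s.createIfNotExist("podgroup-rs-a")
		fmt.Printf("%s: created=%v err=%v\n", pod, created, err)
	}
	fmt.Println("create calls:", s.creates)
}
```

With a helper of this shape, calling it once per pod issues at most one create per podgroup, so a separate existence check in the caller would add little.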

Member

Ok.

@volcano-sh-bot volcano-sh-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 8, 2024
@william-wang
Member

@liuyuanchun11 please open an issue describing the bug and associate this PR with it.

@liuyuanchun11 liuyuanchun11 changed the title Fix podgroup not created Bug Fix podgroup not created Bug (#3563) Jul 8, 2024
@volcano-sh-bot volcano-sh-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 8, 2024
@liuyuanchun11
Contributor Author

liuyuanchun11 commented Jul 9, 2024

> @liuyuanchun11 please log a issue and add the description for the bug and associate the pr with issue.

Done, the PR has been associated with issue #3563.

@volcano-sh-bot volcano-sh-bot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jul 9, 2024
@volcano-sh-bot volcano-sh-bot added retest-not-required-docs-only approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 9, 2024
@liuyuanchun11
Contributor Author

/reopen

@volcano-sh-bot volcano-sh-bot reopened this Jul 9, 2024
@volcano-sh-bot
Contributor

@liuyuanchun11: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot volcano-sh-bot added do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. and removed retest-not-required-docs-only labels Jul 9, 2024
@volcano-sh-bot volcano-sh-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 9, 2024
Signed-off-by: liuyuanchun <[email protected]>
@volcano-sh-bot volcano-sh-bot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jul 9, 2024
@liuyuanchun11 liuyuanchun11 changed the title Fix podgroup not created Bug (#3563) Fix podgroup not created Jul 9, 2024
@wangyang0616
Member

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 9, 2024
Member

@william-wang william-wang left a comment

/approve

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: william-wang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 9, 2024
@volcano-sh-bot volcano-sh-bot merged commit ed25410 into volcano-sh:master Jul 9, 2024
14 checks passed
@liuyuanchun11 liuyuanchun11 deleted the fix_pg_not_create branch July 11, 2024 00:46