Pods cannot be restarted when using scheduling gates #2937

Closed
bacherfl opened this issue Feb 1, 2024 · 0 comments · Fixed by #2946
Labels: bug, lifecycle-operator

bacherfl commented Feb 1, 2024

After the initial deployment of a pod for a workload (i.e. once the related KeptnWorkloadVersion is in a completed state), pods belonging to that workload stay stuck in a Pending state when they are restarted.

This is because the mutating webhook always adds the scheduling gate to the pod (see

if scheduled := handleScheduling(a.SchedulingGatesEnabled, a.Log, pod); scheduled {
), even though the related KeptnWorkloadVersion is already in a completed state.

The fix in #2926 will not help in this case either, since the replica set of a pod does not change when the pod is restarted.

A potential solution for this is to do a lookup of a matching KeptnWorkloadVersion for a pod before adding the scheduling gate, as done in the scheduler:

func (sMgr *WorkloadManager) Permit(ctx context.Context, pod *corev1.Pod) Status {
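To make the proposal concrete, here is a minimal sketch of the lookup-before-gating idea. It does not use the real Keptn or Kubernetes types: `Pod`, `VersionKey`, `Phase`, `needsGate`, `gatePod`, and the gate name `example.com/pre-deployment-checks` are all hypothetical stand-ins, and the map stands in for whatever API lookup the webhook would actually perform.

```go
package main

// Phase is a hypothetical, simplified status of a KeptnWorkloadVersion.
type Phase string

const (
	PhasePending   Phase = "Pending"
	PhaseCompleted Phase = "Completed"
)

// VersionKey identifies a workload version (hypothetical stand-in).
type VersionKey struct {
	Workload string
	Version  string
}

// Pod is a trimmed-down stand-in for corev1.Pod with only the fields
// this sketch needs.
type Pod struct {
	Name            string
	Workload        string
	Version         string
	SchedulingGates []string
}

// exampleGate is an assumed gate name, not the one used by Keptn.
const exampleGate = "example.com/pre-deployment-checks"

// needsGate reports whether the pod should be gated: only when no matching
// workload version exists yet, or when it has not completed.
func needsGate(pod Pod, versions map[VersionKey]Phase) bool {
	phase, found := versions[VersionKey{Workload: pod.Workload, Version: pod.Version}]
	return !found || phase != PhaseCompleted
}

// gatePod adds the scheduling gate only if needsGate says so, and returns
// whether the pod is gated afterwards. It never adds the gate twice.
func gatePod(pod *Pod, versions map[VersionKey]Phase) bool {
	if !needsGate(*pod, versions) {
		return false
	}
	for _, g := range pod.SchedulingGates {
		if g == exampleGate {
			return true
		}
	}
	pod.SchedulingGates = append(pod.SchedulingGates, exampleGate)
	return true
}
```

With this check in place, a restarted pod whose workload version has already completed is admitted without a gate, while a pod for a new or still-pending version is gated as before.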
