feat: add pendingTimeout for non-deadline timeout Fixes #10341 #12762

Open
wants to merge 6 commits into
base: main

Conversation

drawlerr
Contributor

@drawlerr drawlerr commented Mar 7, 2024

Fixes #10341

Motivation

The current Template timeout parameter can handle nodes that spend too long in the Pending phase, but once a node progresses to Running, the timeout is carried over to activeDeadlineSeconds. That leaves no way to set a timeout that covers only the Pending phase.

Modifications

I added a new pendingTimeout field to Template that should only apply while a node is in pending phase. Existing logic for timeout was re-used and expanded, and requeueAfter was added to help ensure more timely deadline enforcement.

Verification

Added TestCheckTemplateTimeouts to operator_test and manually verified the change using a workflow with sleeping and unschedulable pods, with various combinations of pendingTimeout and timeout.

@Joibel Joibel self-assigned this Mar 15, 2024
@drawlerr drawlerr marked this pull request as ready for review June 17, 2024 15:17
@Joibel
Member

Joibel commented Jul 2, 2024

If my workflow fails because of pendingTimeout, the workflow is marked as failed, but the pod is not deleted. This is surprising. Is that what you see, @drawlerr?

I merged in main for #13214, but that hasn't helped.

if templateDeadline != nil {
	if !pendingOnly && (pod.Spec.ActiveDeadlineSeconds == nil || time.Since(*templateDeadline).Seconds() < float64(*pod.Spec.ActiveDeadlineSeconds)) {
		newActiveDeadlineSeconds := int64(time.Until(*templateDeadline).Seconds())
		if newActiveDeadlineSeconds <= 1 {
Member

It seems slightly odd that we cut off with 1 second remaining on the deadline - any reason this is not zero?

@tooptoop4
Contributor

tooptoop4 commented Sep 29, 2024

🚢 this would fix aws/amazon-vpc-cni-k8s#2808

Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a pendingTimeout parameter
3 participants