Description
Observed Behavior:
We are running generative AI workloads as KEDA scaled jobs, but we noticed that the scaled job pods are terminated prematurely by Karpenter after 14 minutes.
For example, I placed a message in an SQS queue, which triggered a KEDA job to start a pod. This pod ran for 14 minutes before being terminated.
We have set the following annotation on both the scaled job and its pod so that Karpenter does not disrupt the GPU nodes.
Karpenter logs:
Expected Behavior:
KEDA scaled jobs should run to completion without Karpenter terminating their pods.
Reproduction Steps (Please include YAML):
Keda scaled job manifest file details.
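The manifest itself is not reproduced in this report. For anyone trying to reproduce the issue, here is a minimal sketch of one thing worth checking: whether the annotation ends up on the pod template rather than only on the ScaledJob's own metadata. It assumes the annotation in question is Karpenter's `karpenter.sh/do-not-disrupt: "true"`, which the report does not confirm. The manifest below and the helper names are hypothetical, not the reporter's actual configuration.

```python
# Assumption: the annotation meant in this report is Karpenter's
# `karpenter.sh/do-not-disrupt`; the report does not show it.
DO_NOT_DISRUPT = "karpenter.sh/do-not-disrupt"


def dig(obj, *keys):
    """Walk nested dicts, returning {} when any level is missing or None."""
    for key in keys:
        if not isinstance(obj, dict):
            return {}
        obj = obj.get(key)
    return obj if isinstance(obj, dict) else {}


def pod_template_annotations(scaled_job):
    """Annotations that pods created from the ScaledJob will carry.

    A KEDA ScaledJob embeds a Job spec under `spec.jobTargetRef`, so the
    pod template lives at `spec.jobTargetRef.template`.
    """
    return dig(scaled_job, "spec", "jobTargetRef", "template",
               "metadata", "annotations")


def pods_opt_out_of_disruption(scaled_job):
    """True only if the pod template itself carries the annotation."""
    return pod_template_annotations(scaled_job).get(DO_NOT_DISRUPT) == "true"


# Hypothetical manifest: annotated at the top level only.
top_level_only = {
    "kind": "ScaledJob",
    "metadata": {"name": "example", "annotations": {DO_NOT_DISRUPT: "true"}},
    "spec": {"jobTargetRef": {"template": {"spec": {"containers": []}}}},
}

# Hypothetical manifest: annotated on the pod template as well.
on_pod_template = {
    "kind": "ScaledJob",
    "metadata": {"name": "example", "annotations": {DO_NOT_DISRUPT: "true"}},
    "spec": {
        "jobTargetRef": {
            "template": {
                "metadata": {"annotations": {DO_NOT_DISRUPT: "true"}},
                "spec": {"containers": []},
            }
        }
    },
}

if __name__ == "__main__":
    print(pods_opt_out_of_disruption(top_level_only))
    print(pods_opt_out_of_disruption(on_pod_template))
```

The check is deliberately narrow. It only reports where the annotation sits in a manifest loaded as a dictionary, and says nothing about why Karpenter terminated the pod in this case.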
Versions:
Kubernetes Version (kubectl version):
Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4-eks-036c24b
Karpenter Version: 0.34.5
Keda Version: ghcr.io/kedacore/keda:2.13.0