When doing a canary rollout with maxUnavailable: 0, if I have an AnalysisRun fail, all of the canary pods are immediately terminated even though the previous revision has no available pods.
Looks like my change in GetReplicasForScaleDown is what broke it:
// GetReplicasForScaleDown returns the number of replicas to consider for scaling down.
// isStableRS indicates if the supplied ReplicaSet is the stableRS
func GetReplicasForScaleDown(rs *appsv1.ReplicaSet, isStableRS bool) int32 {
	if rs == nil {
		return int32(0)
	}
	if *rs.Spec.Replicas < rs.Status.AvailableReplicas {
		// The ReplicaSet is already going to scale down replicas since the availableReplica count is bigger
		// than the spec count. The controller uses the .Spec.Replicas to prevent the controller from
		// assuming the extra replicas (availableReplica - .Spec.Replicas) are going to remain available.
		// Otherwise, the controller use those extra replicas to scale down more replicas and potentially
		// violate the min available.
		return *rs.Spec.Replicas
	}
	if isStableRS && rs.Status.AvailableReplicas < *rs.Spec.Replicas { // this logic should not apply
		// The stable ReplicaSet might be scaled up, but its pods may be unavailable.
		// In this case we need to return the spec.Replicas so that the controller will still
		// consider scaling down this ReplicaSet. Without this, a rollout update could become stuck
		// not scaling down the stable, in order to make room for more canaries.
		return *rs.Spec.Replicas
	}
	return rs.Status.AvailableReplicas
}
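To make the flagged branch concrete, here is a minimal, self-contained sketch of the same decision logic. It assumes a hypothetical simplified `replicaSet` struct in place of `appsv1.ReplicaSet`, so it runs without the Kubernetes API packages, and the replica counts are made up for illustration. It is not the controller's actual code path, only the counting rule in isolation.

```go
package main

import "fmt"

// replicaSet is a hypothetical, simplified stand-in for appsv1.ReplicaSet.
// It keeps only the two fields the counting logic reads.
type replicaSet struct {
	specReplicas      int32 // stand-in for *rs.Spec.Replicas
	availableReplicas int32 // stand-in for rs.Status.AvailableReplicas
}

// replicasForScaleDown mirrors the branches of GetReplicasForScaleDown
// quoted above, using the simplified struct.
func replicasForScaleDown(rs *replicaSet, isStableRS bool) int32 {
	if rs == nil {
		return 0
	}
	if rs.specReplicas < rs.availableReplicas {
		// Extra available pods are about to go away; count only the spec.
		return rs.specReplicas
	}
	if isStableRS && rs.availableReplicas < rs.specReplicas {
		// The branch flagged with "this logic should not apply":
		// unavailable stable pods are counted as if they were available.
		return rs.specReplicas
	}
	return rs.availableReplicas
}

func main() {
	// Assumed scenario: the stable ReplicaSet was just scaled back up to 5
	// but none of its pods are available yet; the canary has 5 healthy pods.
	stable := &replicaSet{specReplicas: 5, availableReplicas: 0}
	canary := &replicaSet{specReplicas: 5, availableReplicas: 5}

	fmt.Println("stable counted as:", replicasForScaleDown(stable, true))
	fmt.Println("canary counted as:", replicasForScaleDown(canary, false))
}
```

In this sketch the stable ReplicaSet is counted as 5 even though none of its pods are available. If the caller adds these counts together when deciding how far it may scale down (an assumption about the caller, not something shown in this issue), the canary pods would look surplus, which is consistent with the behavior reported here.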
Summary
When doing a canary rollout with maxUnavailable: 0, if I have an AnalysisRun fail, all of the canary pods are immediately terminated even though the previous revision has no available pods.
Diagnostics
Versions of all tools:
Repro yaml can be found here.
Steps to repro:
The moment the AnalysisRun fails you'll see:
All of the canary pods are terminated immediately and you're left with close to 0 ready pods.
Paste the logs from the rollout controller
Logs for the entire controller:
https://gist.github.com/a4ff10e0c07013f824594f7d84b21c4e
Logs for a specific rollout:
https://gist.github.com/b13fae524c2192da376643a425dadeba
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.