Argo rollouts will scale down stable when canary is missing pods #2050

MarkSRobinson · 2022-05-25T04:25:14Z

Summary

Argo scaled down the stable RS while the new ReplicaSet was not yet fully available. In this case, it scaled the stable RS from 110 pods down to 0, but didn't update the routing until the new RS was 100% ready.

Argo shouldn't scale down the stable RS until it switches to using the new canary as stable.
Argo shouldn't scale down the stable RS until the canary RS is ready.
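
To make the expected behavior concrete, here is a minimal sketch of the guard I would expect before the stable RS is scaled down. This is not Argo Rollouts code: `ReplicaSetState`, `is_fully_available`, and `may_scale_down_stable` are hypothetical names, and the canary pod counts in the usage note are illustrative, not taken from the logs.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSetState:
    """Hypothetical snapshot of a ReplicaSet (not an Argo Rollouts type)."""
    desired_replicas: int
    available_replicas: int


def is_fully_available(rs: ReplicaSetState) -> bool:
    # A ReplicaSet counts as ready only when every desired pod is available.
    return rs.desired_replicas > 0 and rs.available_replicas >= rs.desired_replicas


def may_scale_down_stable(canary: ReplicaSetState,
                          service_switched: bool,
                          deadline_passed: bool) -> bool:
    # Assumed rule: the scale-down deadline alone is not enough. The canary
    # must be fully available AND the stable service must already point at it.
    return deadline_passed and service_switched and is_fully_available(canary)
```

With a canary that wants 110 pods but has only 80 available, `may_scale_down_stable(ReplicaSetState(110, 80), False, True)` returns `False` even though the deadline has passed, so the stable RS would keep its pods until the service switch has happened.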

Diagnostics

What version of Argo Rollouts are you running?

1.2.0

time="2022-05-24T07:57:11Z" level=info msg="Previous weights: &TrafficWeights{Canary:WeightDestination{Weight:100,ServiceName:api-v2-canary,PodTemplateHash:5bb4f8b6b6,},Stable:WeightDestination{Weight:0,ServiceName:api-v2-stable,PodTemplateHash:779d9dfb64,},Additional:[]WeightDestination{},Verified:nil,}" namespace=cs-team rollout=api-v2
--
  |   | time="2022-05-24T07:57:11Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:11Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:11Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:11Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:11Z" level=info msg="Set 'scale-down-deadline' annotation on 'api-v2-779d9dfb64' to 2022-05-24T07:57:41Z (30s)" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:12Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:12Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:12Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:12Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:12Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:14Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:14Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:14Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:33Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:33Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:33Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:34Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:34Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:34Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:35Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:36Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:37Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:38Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:39Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:39Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:40Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:40Z" level=info msg="RS 'api-v2-779d9dfb64' has not reached the scaleDownTime" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:41Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:41Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:41Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:41Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:57:41Z" level=info msg="Found 110 available pods in old RS cs-team/api-v2-779d9dfb64" namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:41Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:41Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:41Z" level=info msg="Enqueueing parent of cs-team/api-v2-779d9dfb64: Rollout cs-team/api-v2"
  |   | time="2022-05-24T07:57:41Z" level=info msg="Scaled down ReplicaSet api-v2-779d9dfb64 (revision 3292) from 110 to 0" event_reason=ScalingReplicaSet namespace=cs-team rollout=api-v2
  |   | time="2022-05-24T07:57:41Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"cs-team\", Name:\"api-v2\", UID:\"2a76d2a7-e138-4113-b3aa-eb1ec96011cd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1213092265\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down ReplicaSet api-v2-779d9dfb64 (revision 3292) from 110 to 0"

....



time="2022-05-24T07:59:48Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
--
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T07:59:59Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:08Z" level=info msg="delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available" namespace=cs-team rollout=api-v2 service=api-v2-stable
  |   | time="2022-05-24T08:00:25Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"cs-team\", Name:\"api-v2\", UID:\"2a76d2a7-e138-4113-b3aa-eb1ec96011cd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"1213103929\", FieldPath:\"\"}): type: 'Normal' reason: 'SwitchService' Switched selector for service 'api-v2-stable' from '779d9dfb64' to '5bb4f8b6b6'"
  |   | time="2022-05-24T08:00:25Z" level=info msg="Switched selector for service 'api-v2-stable' from '779d9dfb64' to '5bb4f8b6b6'" event_reason=SwitchService namespace=cs-team rollout=api-v2

I've tried to keep these logs brief, but I can provide the full logs if you would like.

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

harikrongali · 2022-05-26T18:36:14Z

can you post manifest files that you are using?

MarkSRobinson · 2022-05-26T19:54:28Z

@harikrongali

I've attached the Rollout manifest. I can attach others if they'd be useful. We're using LinkerD as the service mesh topology in the cluster.

rollout.txt

harikrongali · 2022-05-31T19:54:08Z

@perenesenko can you provide your findings here?

perenesenko · 2022-05-31T20:05:53Z

@MarkSRobinson
Could you provide the whole rollout content with the status field:

kubectl get rollout [rolloutname] -o yaml

Could you also provide the rollout info:

kubectl argo rollouts get rollouts [rolloutname]

perenesenko · 2022-05-31T20:11:47Z

From the logs, the canary pods should have reached 100% ready, because the first line shows:

Previous weights: &TrafficWeights{Canary:WeightDestination{Weight:100,ServiceName:api-v2-canary,.....

We shift the traffic only once the number of ready canary pods reaches the desired number, so it should have been at 100%.
Then we switch the service labels. But this does not happen because the canary pods are not 100% ready:

delaying service switch from 779d9dfb64 to 5bb4f8b6b6: ReplicaSet not fully available

We have to figure out why 100% of the pods were ready at the previous step but not at the step that switches the labels.

MarkSRobinson · 2022-06-02T23:56:10Z

@perenesenko

Here's the full rollout object
rollout.txt

What I can figure happened is that we had a networking problem causing 30 pods to fall off being ready (either the node itself died or networking on the node died, we're not certain).

From the logs, it sounds like this could be prevented by either forcing the networking switch even though pods are missing, or cancelling the scale-down until the networking has been switched successfully.

…rgoproj#2050 Signed-off-by: Jack Andersen <[email protected]>

jandersen-plaid · 2022-08-10T19:17:29Z

I think the key issue is that the rollout is continuing reconciliation when the selectors are delayed in being swapped.

That is, https://sourcegraph.com/github.com/argoproj/argo-rollouts/-/blob/rollout/service.go?L284 returns nil as if everything is normal when we cannot swap the replicaset hashes of the stable and canary replicasets. This allows the reconciliation to proceed onward to this step: https://sourcegraph.com/github.com/argoproj/argo-rollouts@25f40d2bb8d6432e54d4eba8f37842af6f0138ad/-/blob/rollout/canary.go?L56:14-56:37 where the canary set (which should be the stable set, but was delayed in being set to stable because of unhealthy replicas) is set to 0 and the stable (which is already 0) is set to 100.

If there was an error within the service "ensureSVCTargets" function then the controller would skip this cycle (as is expected) and not continue to update the weights on the rollout (resulting in no traffic loss).

I have added a pull request that I think will fix this issue, but let me know if I am incorrect here.
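
To make the argument above concrete, here is a minimal Go sketch of the control flow being described. It is not the controller's actual code: `replicaSet`, `ensureStableSelector`, `reconcile`, and `errSwitchDelayed` are hypothetical names, and the 80-of-110 availability figure is only illustrative. It shows the proposed behavior, where a delayed selector switch returns an error so the same reconciliation pass never reaches the scale-down.

```go
package main

import (
	"errors"
	"fmt"
)

// replicaSet is a simplified, hypothetical stand-in for the few fields
// this sketch needs; it is not the Kubernetes ReplicaSet type.
type replicaSet struct {
	Hash      string
	Desired   int32
	Available int32
}

// errSwitchDelayed is a hypothetical error signalling that the selector
// switch could not happen in this reconciliation pass.
var errSwitchDelayed = errors.New("delaying service switch: ReplicaSet not fully available")

// ensureStableSelector models the service-switch step. Returning an error
// (rather than nil) when the new ReplicaSet is not fully available is the
// behavior proposed in the comment above.
func ensureStableSelector(newRS replicaSet) error {
	if newRS.Available < newRS.Desired {
		return errSwitchDelayed
	}
	return nil
}

// reconcile scales the old stable ReplicaSet to zero only after the
// selector switch has succeeded; on error it leaves the old set untouched.
func reconcile(newRS replicaSet, oldStable *replicaSet) error {
	if err := ensureStableSelector(newRS); err != nil {
		return err // skip the rest of this cycle
	}
	oldStable.Desired = 0
	return nil
}

func main() {
	oldStable := replicaSet{Hash: "779d9dfb64", Desired: 110, Available: 110}
	newRS := replicaSet{Hash: "5bb4f8b6b6", Desired: 110, Available: 80}

	err := reconcile(newRS, &oldStable)
	fmt.Println("error:", err)
	fmt.Println("old stable desired replicas:", oldStable.Desired)
}
```

If `ensureStableSelector` returned nil in the unavailable case instead, `reconcile` would fall through and set the old stable set to zero, which is the failure mode reported in this issue.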

mubarak-j · 2022-09-13T20:38:39Z

One of our apps with over 170 pods suffered an outage due to this bug (#2235). Reading this issue, it seems the bug has had the potential to hit ever since we started using dynamicStableScale: true (~30 releases/rollouts since then). What seems to have triggered it this time was a longer-than-usual delay for EKS to spin up new nodes/pods during the rollout (~5 mins longer), so the new replicaset took that much longer to become ready.

Disabling dynamicStableScale as a workaround can be cost-prohibitive, especially with long canary releases. I'm curious whether this bug hits in the last step. In my case, would changing the last step from 50% to 100% traffic into a smaller increment, e.g. 95% to 100%, and therefore waiting on fewer pods to become ready, reduce the chances of encountering this bug?
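
As a rough illustration of the arithmetic behind that question, the Go sketch below counts how many additional canary pods the final step would wait on. It assumes a simple `ceil(total * weight / 100)` sizing rule and a round figure of 170 pods; both are assumptions made for this example, not the controller's exact calculation, and `canaryReplicasAt` and `podsToWaitFor` are hypothetical helpers.

```go
package main

import "fmt"

// canaryReplicasAt returns the canary pod count for a traffic weight,
// assuming (for illustration only) count = ceil(total * weight / 100).
func canaryReplicasAt(total, weight int32) int32 {
	return (total*weight + 99) / 100
}

// podsToWaitFor returns how many additional canary pods must become
// available when stepping from one weight to the next.
func podsToWaitFor(total, fromWeight, toWeight int32) int32 {
	return canaryReplicasAt(total, toWeight) - canaryReplicasAt(total, fromWeight)
}

func main() {
	const total = 170 // illustrative pod count
	fmt.Println("50% to 100%:", podsToWaitFor(total, 50, 100))
	fmt.Println("95% to 100%:", podsToWaitFor(total, 95, 100))
}
```

A smaller final increment shrinks the number of pods that have to come up during the last step, which may narrow the window for the problem, but it does not by itself change the scale-down logic discussed in this thread.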

zachaller · 2022-11-09T21:34:07Z

This one is also interesting: #1820

jandersen-plaid · 2022-11-26T05:42:44Z

For what it is worth, I think #2187 is ready to go -- I had originally scoped it out to all configurations of argo rollouts, but some of the end to end tests actually rely on swapping service selectors when the canary replicaset is still not ready, so I dialed it back to just dynamicStableScale.

I have applied the patch of the change to the release-1.3 branch (jandersen-plaid#1) if watchers of this issue want to try it out themselves. It is up to the maintainers if they will accept it into the next minor release or a patch release of 1.3.

Please test out this new version before you put it into a production environment: I was not able to construct a consistent test for this because the failure mode relies on obstructing pods from becoming ready at a specific point in time. That being said, I am confident that it is ready to be tested and cautiously rolled out.

Should also help with #1820 and #2235

zachaller · 2022-11-27T04:34:23Z

@jandersen-plaid Thanks for updating that. I will also take a look at what you have done to "dial" it back a bit. I have been very slowly working on another fix for this that just changes the calculation to account for availability instead of introducing an error state. If it pans out, I think it would make a bit more sense to go that route. However, if it does not pan out, I think what you did will end up making sense as well. I will try to get the calculation changes figured out soon; I have just been busy with lots of other things. But I should have some time to really dedicate to finishing it.
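
One possible reading of "changing the calculation to account for availability" is sketched below in Go. This is a guess at the general idea, not the code from the PR mentioned here: `stableReplicasToKeep` is a hypothetical helper that sizes the stable ReplicaSet from the number of canary pods that are actually available, rather than from the number that were requested.

```go
package main

import "fmt"

// stableReplicasToKeep returns how many stable pods should remain so that
// stable plus *available* canary pods still cover the desired total.
// Hypothetical helper for illustration; not the controller's calculation.
func stableReplicasToKeep(total, canaryAvailable int32) int32 {
	keep := total - canaryAvailable
	if keep < 0 {
		return 0
	}
	return keep
}

func main() {
	// Illustrative numbers: 110 desired pods, only 80 canary pods available.
	fmt.Println("stable pods to keep:", stableReplicasToKeep(110, 80))
	// Once every canary pod is available, the stable set can go to zero.
	fmt.Println("stable pods to keep:", stableReplicasToKeep(110, 110))
}
```

Under this approach no error state is needed: the stable set simply shrinks no faster than the canary becomes available.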

jandersen-plaid · 2022-11-28T00:27:33Z

I have been very slowly working at another fix for this with just changing the calculation to account for availability instead of introducing an error state if it pans out I think it would make a bit more sense to go that route

Great! I took a look at the PR and your approach overall seems more correct to me (adding error states generally leads to difficulty in discerning when the state is updated 😢 ), so I look forward to the final result!

For what it is worth, I had the exact same end to end tests fail for me as well (TestALBExperimentStepNoSetWeight and TestIstioUpdateInMiddleZeroCanaryReplicas). I think that their success actually depends on the replicasets not being ready. There is likely a timing issue between when the tests think the rollout is in a "final" state vs. when the rollout is actually in a final state with an available replicaset. Adjusting those tests to account for different conditions before ExpectRevisionPodCount("3", 1) (for TestIstioUpdateInMiddleZeroCanaryReplicas in https://github.com/argoproj/argo-rollouts/blob/master/test/e2e/istio_test.go#L277) and the Assert (for TestALBExperimentStepNoSetWeight in https://github.com/argoproj/argo-rollouts/blob/master/test/e2e/aws_test.go#L156-L167) should be enough to get all tests passing.

Adding the condition that dynamicStableScale: true effectively skips these tests and ensures that existing tested behavior will be kept (as opposed to adjusting the tests to account for the new behavior). I felt this was generally okay, considering the failure mode with dynamicStableScale is more dire than normal rollouts (double pods available in normal rollouts vs. 0 in the old RS and 100% in the new RS for dynamicStableScale), and the tests that were failing with dynamicStableScale: false were failing because they relied on service selectors being switched before the replicasets were ready.
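As a rough illustration of that scoping (a sketch only, not the code in #2187), the guard can be thought of as below. The type `rolloutState`, its fields, and the function `shouldDelaySelectorSwitch` are hypothetical names made up for this example.

```go
package main

// rolloutState is a hypothetical, simplified view of a rollout used only
// for this illustration; it does not mirror the real Argo Rollouts types.
type rolloutState struct {
	DynamicStableScale bool  // whether dynamicStableScale is enabled
	CanaryDesired      int32 // replicas the canary ReplicaSet should have
	CanaryAvailable    int32 // replicas of the canary that are available
}

// shouldDelaySelectorSwitch reports whether swapping the service selectors
// should be held back. The guard only applies when dynamicStableScale is
// enabled, so behavior without it is left unchanged.
func shouldDelaySelectorSwitch(s rolloutState) bool {
	if !s.DynamicStableScale {
		return false
	}
	return s.CanaryAvailable < s.CanaryDesired
}
```

In this sketch a rollout with dynamicStableScale disabled is never delayed, which matches the intent of leaving the existing tested behavior alone.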

zachaller · 2022-12-01T23:23:47Z

@jandersen-plaid Here is the PR. @MarkSRobinson, are you able to reproduce this reliably enough that you could help test it if I were to get you a build?

MarkSRobinson · 2022-12-02T22:36:47Z

@zachaller The bug doesn't reliably happen. So I can test this out but it might take a while to get feedback. My concern is that this PR is built on the 1.4 branch and I'm not entirely sure I want to test all the changes in production.

Let me see if I can backport this to 1.3 release branch.

MarkSRobinson · 2022-12-02T22:54:49Z

Ok, fix back-ported - #2449

zachaller · 2022-12-05T15:57:08Z

@MarkSRobinson Do you plan on building a Docker image with that patch based on 1.3, or would you like me to? Also note that I refactored the PR a bit to simplify it.

MarkSRobinson · 2022-12-06T02:18:06Z

@zachaller I built it and pushed it to our internal repo. We're testing it out on the testing cluster right now.

jstewart612 · 2022-12-07T21:15:45Z

@MarkSRobinson status on this? This has now caused TWO production outages for our organization and, as far as we are concerned, is a massive bug that needs immediate fixing.

raxod502-plaid · 2022-12-07T21:32:25Z

(Just so you know, @MarkSRobinson, @jandersen-plaid, and I aren't affiliated with the Argo projects; we're users like yourself. Right there with you: this issue has caused outages for us as well, and we're excited to help get it fixed as soon as possible.)

jstewart612 · 2022-12-08T00:05:00Z

@raxod502-plaid @jandersen-plaid @MarkSRobinson apologies: was doing a lot of avatar clicking to see who was officially on the project and mistook Mark for one of them.

MarkSRobinson added the bug Something isn't working label May 25, 2022

jandersen-plaid added a commit to jandersen-plaid/argo-rollouts that referenced this issue Aug 10, 2022

fix: return an error when we cannot swap the replicaset hashes fixes a…

94533f3

…rgoproj#2050

jandersen-plaid mentioned this issue Aug 10, 2022

fix: return an error when we cannot swap the replicaset hashes fixes #2050 #2187

Closed

6 tasks

jandersen-plaid added a commit to jandersen-plaid/argo-rollouts that referenced this issue Aug 10, 2022

fix: return an error when we cannot swap the replicaset hashes fixes a…

408bc79

…rgoproj#2050 Signed-off-by: Jack Andersen <[email protected]>

zachaller mentioned this issue Sep 7, 2022

Service selector switched after older replica scaled down to zero causing an outage #2235

Closed

2 tasks

harikrongali added this to the v1.4 milestone Oct 20, 2022

jandersen-plaid added a commit to jandersen-plaid/argo-rollouts that referenced this issue Nov 8, 2022

fix: return an error when we cannot swap the replicaset hashes fixes a…

fa4f666

…rgoproj#2050 Signed-off-by: Jack Andersen <[email protected]>

jandersen-plaid added a commit to jandersen-plaid/argo-rollouts that referenced this issue Nov 8, 2022

fix: return an error when we cannot swap the replicaset hashes fixes a…

3b82b66

…rgoproj#2050 Signed-off-by: Jack Andersen <[email protected]>

jandersen-plaid added a commit to jandersen-plaid/argo-rollouts that referenced this issue Nov 26, 2022

fix: return an error when we cannot swap the replicaset hashes fixes a…

c265366

…rgoproj#2050 Signed-off-by: Jack Andersen <[email protected]>

jandersen-plaid added a commit to jandersen-plaid/argo-rollouts that referenced this issue Nov 26, 2022

fix: return an error when we cannot swap the replicaset hashes fixes a…

87b617f

…rgoproj#2050 Signed-off-by: Jack Andersen <[email protected]>

zachaller mentioned this issue Dec 1, 2022

fix(trafficrouting): Do not block the switch of service selectors for single pod failures #2441

Merged

zachaller closed this as completed in #2441 Dec 12, 2022
