
Rollout is not scaling down old replicasets properly #70

Closed · jessesuen opened this issue May 14, 2019 · 3 comments
Labels: bug (Something isn't working)

jessesuen commented May 14, 2019

Here is a rollout in a Suspended state, but with four ReplicaSets scaled to two replicas each:

[screenshot: the rollout's ReplicaSets, four of them scaled above zero]

The expectation is that in a steady state (Suspended), only two ReplicaSets (active and preview) should be scaled higher than 0.

I ran a diff against the last three revisions (12, 11, 9) of the ReplicaSet. I'm not sure what happened to ReplicaSet revision 10. Notice that the only differences are in metadata and status; the ReplicaSet spec is the same, which means the pod template is the same. The bug, however, is that the ReplicaSet hash names are not the same.

$ diff rs-12 rs-11
6,8c6,7
<     rollout.argoproj.io/revision: '12'
<     rollout.argoproj.io/revision-history: '10'
<   creationTimestamp: '2019-05-14T21:54:49Z'
---
>     rollout.argoproj.io/revision: '11'
>   creationTimestamp: '2019-05-14T22:16:39Z'
16c15
<     rollouts-pod-template-hash: 65c456b799
---
>     rollouts-pod-template-hash: 7d58696fd9
18c17
<   name: web-service-integration-65c456b799
---
>   name: web-service-integration-7d58696fd9
27c26
<   resourceVersion: '95549146'
---
>   resourceVersion: '95556597'
29,30c28,29
<     /apis/apps/v1/namespaces/fdp-connectivity-web-service-integration-usw2-ppd-qal/replicasets/web-service-integration-65c456b799
<   uid: e584ea76-7692-11e9-9427-0a985b86565a
---
>     /apis/apps/v1/namespaces/fdp-connectivity-web-service-integration-usw2-ppd-qal/replicasets/web-service-integration-7d58696fd9
>   uid: f1c6de3d-7695-11e9-9427-0a985b86565a
36c35
<       rollouts-pod-template-hash: 65c456b799
---
>       rollouts-pod-template-hash: 7d58696fd9
57c56
<         rollouts-pod-template-hash: 65c456b799
---
>         rollouts-pod-template-hash: 7d58696fd9
$ diff rs-12 rs-9
6,7c6
<     rollout.argoproj.io/revision: '12'
<     rollout.argoproj.io/revision-history: '10'
---
>     rollout.argoproj.io/revision: '9'
16c15
<     rollouts-pod-template-hash: 65c456b799
---
>     rollouts-pod-template-hash: 748b545485
18c17
<   name: web-service-integration-65c456b799
---
>   name: web-service-integration-748b545485
27c26
<   resourceVersion: '95549146'
---
>   resourceVersion: '95539841'
29,30c28,29
<     /apis/apps/v1/namespaces/fdp-connectivity-web-service-integration-usw2-ppd-qal/replicasets/web-service-integration-65c456b799
<   uid: e584ea76-7692-11e9-9427-0a985b86565a
---
>     /apis/apps/v1/namespaces/fdp-connectivity-web-service-integration-usw2-ppd-qal/replicasets/web-service-integration-748b545485
>   uid: e58000cc-7692-11e9-9427-0a985b86565a
36c35
<       rollouts-pod-template-hash: 65c456b799
---
>       rollouts-pod-template-hash: 748b545485
57c56
<         rollouts-pod-template-hash: 65c456b799
---
>         rollouts-pod-template-hash: 748b545485

This implies that hashing may produce different hashes for the same pod template.

During this time, we know from talking to the user that the rollout's spec.template.spec was changed only to modify resource requests/limits to equivalent values (e.g. 2000m -> '2'). I suspect the underlying issue is that when we call controller.ComputeHash(), it does not consider these values to be the same, and so it produces different pod template hashes.
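
To make the suspicion concrete, here is a minimal standalone Go sketch (not code from this repo) showing that two equivalent quantities compare as equal while carrying different parsed internal state, which is exactly the kind of difference a reflection-based deep hash such as the one behind controller.ComputeHash() would pick up:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	a := resource.MustParse("2000m")
	b := resource.MustParse("2")

	// Semantically these are the same amount of CPU.
	fmt.Println(a.Cmp(b) == 0) // true

	// But the parsed internal state differs (2000 at scale -3 vs.
	// 2 at scale 0), so a deep, reflection-based hash of a pod
	// template containing them can produce different results.
	fmt.Printf("%#v\n%#v\n", a, b)
}
```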

jessesuen added the bug label May 14, 2019

jessesuen commented

We confirmed that the pod template hash is sensitive to these resource-format differences. The solution is to remarshal the object to normalize it before computing the pod template hash.
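
A minimal sketch of that idea, with a hypothetical helper name (not necessarily the exact change that landed): round-trip the template through JSON so equivalent quantities are re-parsed into a single canonical form before delegating to the upstream hash:

```go
package hash

import (
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/controller"
)

// normalizedComputeHash is a hypothetical helper illustrating the fix:
// marshal the pod template to JSON and unmarshal it into a fresh object
// so that equivalent resource quantities (e.g. "2000m" vs. "2") are
// re-parsed into one canonical internal form before hashing.
func normalizedComputeHash(template *corev1.PodTemplateSpec, collisionCount *int32) (string, error) {
	data, err := json.Marshal(template)
	if err != nil {
		return "", err
	}
	normalized := corev1.PodTemplateSpec{}
	if err := json.Unmarshal(data, &normalized); err != nil {
		return "", err
	}
	// Delegate to the upstream hash once the object is normalized.
	return controller.ComputeHash(&normalized, collisionCount), nil
}
```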

jessesuen commented

The pod template hash inconsistency has been resolved in #75.

A second fix is still needed to scale down the old ReplicaSets.
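
For context, an illustrative sketch (hypothetical helper, not the actual controller code) of the invariant that second fix needs to enforce: any ReplicaSet whose hash matches neither the active nor the preview ReplicaSet should be scaled to zero.

```go
package rollout

import appsv1 "k8s.io/api/apps/v1"

// oldReplicaSetsToScaleDown is an illustrative helper, not the actual
// controller code: it selects every ReplicaSet that is neither the
// active nor the preview ReplicaSet but is still scaled above zero,
// so the reconciler can scale it down to 0.
func oldReplicaSetsToScaleDown(all []*appsv1.ReplicaSet, activeHash, previewHash string) []*appsv1.ReplicaSet {
	var old []*appsv1.ReplicaSet
	for _, rs := range all {
		hash := rs.Labels["rollouts-pod-template-hash"]
		if hash == activeHash || hash == previewHash {
			continue // the two ReplicaSets allowed to stay scaled up
		}
		if rs.Spec.Replicas != nil && *rs.Spec.Replicas > 0 {
			old = append(old, rs)
		}
	}
	return old
}
```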

jessesuen commented

Fixed.
