Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argo rollout getting degraded after using update. #41

Closed
mangglesh opened this issue Jul 12, 2024 · 4 comments
Closed

argo rollout getting degraded after using update. #41

mangglesh opened this issue Jul 12, 2024 · 4 comments

Comments

@mangglesh
Copy link

mangglesh commented Jul 12, 2024

get_clean_dict function in generate file is generating wrong steps config for rollout object, and making steps in strategy
for argo rollout crd invalid.

self.spec.strategy before clean function for rollout object
{'canary': {'maxSurge': '25%', 'maxUnavailable': 0, 'steps': [{'setWeight': 10}, {'pause': {'duration': '1h'}}, {'setWeight': 20}, {'pause': {}}]}}

after clean function
get_clean_dict(self)['spec']['strategy']
{'canary': {'maxSurge': '25%', 'maxUnavailable': 0, 'steps': [{'setWeight': 10}, {'pause': {'duration': '1h'}}, {'setWeight': 20}, {}]}}

{'pause': {}} has been changed to {}
making rollout object invalid.

after update rollout is marked degraded with message
message: >-
InvalidSpec: The Rollout "example-rollout-vartical" is invalid:
spec.strategy.steps[3]: Invalid value: "step.Experiment: true step.Pause:
true step.SetWeight: true step.Analysis: true step.SetCanaryScale: true
step.SetHeaderRoute: true step.SetMirrorRoutes: true": Step must have one of
the following set: experiment, setWeight, setCanaryScale or pause

steps to reproduce

step 1

create argo rollout with following config.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: >
{"apiVersion":"argoproj.io/v1alpha1","kind":"Rollout","metadata":{"annotations":{},"name":"example-rollout-vartical","namespace":"default"},"spec":{"minReadySeconds":30,"replicas":10,"revisionHistoryLimit":3,"selector":{"matchLabels":{"app":"nginx"}},"strategy":{"canary":{"maxSurge":"25%","maxUnavailable":0,"steps":[{"setWeight":10},{"pause":{"duration":"1h"}},{"setWeight":20},{"pause":{}}]}},"template":{"metadata":{"labels":{"app":"nginx"}},"spec":{"containers":[{"image":"nginx:1.15.4","name":"nginx","ports":null,"resources":{"limits":{"cpu":"500m","memory":"128Mi"},"requests":{"cpu":"250m","memory":"64Mi"}}}]}}}}
rollout.argoproj.io/revision: '6'
creationTimestamp: '2024-07-10T10:28:18Z'
generation: 18
labels:
k8slens-edit-resource-version: v1alpha1
managedFields:
- apiVersion: argoproj.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
f:spec: {}
manager: kubectl-client-side-apply
operation: Update
time: '2024-07-10T10:28:18Z'
- apiVersion: argoproj.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
f:spec: {}
manager: rollouts-controller
operation: Update
time: '2024-07-12T17:20:16Z'
- apiVersion: argoproj.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
f:spec:
f:template: {}
manager: OpenAPI-Generator
operation: Update
time: '2024-07-12T18:08:07Z'
- apiVersion: argoproj.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
f:spec:
f:strategy:
f:canary:
.: {}
f:steps: {}
manager: node-fetch
operation: Update
time: '2024-07-12T18:10:57Z'
- apiVersion: argoproj.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
f:status:
.: {}
f:conditions: {}
f:observedGeneration: {}
f:phase: {}
manager: rollouts-controller
operation: Update
subresource: status
time: '2024-07-12T18:10:57Z'
name: example-rollout-vartical
namespace: default
resourceVersion: '249246189'
uid: 37f55c60-42bf-4163-b084-ecb01f453d89
selfLink: >-
/apis/argoproj.io/v1alpha1/namespaces/default/rollouts/example-rollout-vartical
status:
HPAReplicas: 2
availableReplicas: 2
blueGreen: {}
canary: {}
conditions:
- lastTransitionTime: '2024-07-12T17:20:48Z'
lastUpdateTime: '2024-07-12T17:20:48Z'
message: Rollout has minimum availability
reason: AvailableReason
status: 'True'
type: Available
- lastTransitionTime: '2024-07-12T17:21:43Z'
lastUpdateTime: '2024-07-12T17:21:43Z'
message: Rollout is paused
reason: RolloutPaused
status: 'False'
type: Paused
- lastTransitionTime: '2024-07-12T17:22:14Z'
lastUpdateTime: '2024-07-12T17:22:14Z'
message: RolloutCompleted
reason: RolloutCompleted
status: 'True'
type: Completed
- lastTransitionTime: '2024-07-12T18:10:57Z'
lastUpdateTime: '2024-07-12T18:10:57Z'
message: Rollout is healthy
reason: RolloutHealthy
status: 'True'
type: Healthy
- lastTransitionTime: '2024-07-12T17:21:43Z'
lastUpdateTime: '2024-07-12T18:10:57Z'
message: >-
ReplicaSet "example-rollout-vartical-db494c4f7" has successfully
progressed.
reason: NewReplicaSetAvailable
status: 'True'
type: Progressing
currentPodHash: db494c4f7
currentStepHash: 66b4c88d5
currentStepIndex: 4
observedGeneration: '18'
phase: Healthy
readyReplicas: 2
replicas: 2
restartedAt: '2024-07-12T17:15:20Z'
selector: app=nginx
stableRS: db494c4f7
updatedReplicas: 2
spec:
minReadySeconds: 30
replicas: 2
restartAt: '2024-07-12T17:15:20Z'
revisionHistoryLimit: 3
selector:
matchLabels:
app: nginx
strategy:
canary:
maxSurge: 25%
maxUnavailable: 0
steps:
- setWeight: 10
- pause:
duration: 1h
- setWeight: 20
- pause: {}
template:
metadata:
labels:
app: nginx
spec:
containers:
- image: nginx:1.15.4
name: nginx
resources:
limits:
cpu: '0.55'
memory: 144179Ki

step 2

update the resource using update function in huraku using

resource.update()

@mangglesh mangglesh changed the title argo rollout getting degraded after change update. argo rollout getting degraded after using update. Jul 12, 2024
@mangglesh
Copy link
Author

image

@haxsaw
Copy link
Owner

haxsaw commented Jul 14, 2024

HI there- would you be able to provide a bit more info please? Could you let me know if you're using either the hikaru metapackage or the separate hikaru-core and hikaru-models, and regardless of which could you tell me the versions involved?

Edited: After taking a bit more time I see that you have shown what you expect to happen; sorry for not having time for that earlier. I have an idea about what's going on and I'll investigate in that direction, but if what I'm thinking is the case then it isn't clear what can be done about it.

The gist of my suspicion is this: the clean dict function is geared towards making minimal acceptable dicts for K8s, and to do that there are rules that are followed that K8s seems to want that prune empty entries from a model and not represent them in dicts. For example, we don't send empty dicts around for a value that would take a dict, and this seems to agree with what K8s expects (it doesn't seem to like empty lists/dicts). This clearly isn't a hard-fast rule-- as you point out we send an empty dict as a list element. It kind of depends on the kind of thing we're marshalling into a dict.

If this is indeed the problem, then we may have a bit of an issue on our hands because loosening this up to benefit CRDs may actually break talking to regular K8s resources. I could look into developing multiple rules for handling empty entries based on whether we're marshalling a CRD or a regular resource, but doing that could break other CRDs that observe the practices that K8s uses.

It could be a little tricky and may require input from users. I'll have a look and a bit of a think and follow up in a bit.

@haxsaw
Copy link
Owner

haxsaw commented Jul 14, 2024

So I had a peek into the code and first thing I read was the doc string, which states:

`

Turns an instance of a HikaruBase into a dict without values of None

This function returns a Python dict object that represents the hierarchy
of objects starting at ``obj`` and recusing into any nested objects.
The returned dict **does not** include any key/value pairs where the value
of the key is None or empty.

`

So yeah, there's our problem, and it's documented behavior, behavior that's meant to satisfy K8s. Since the last element in the list is a dict whose only key has an 'empty' value, we don't marshall the key out. Returnig an empty dict for the final list element is probably a gray area according to these rules, but I certainly see where/why it's happening (it actually is ignoring the empty dict and returns the default value which is an empty dict, which is what you're seeing).

Since this is documented behavior, this isn't so much a request for a bug fix but rather a feature request. It certainly can't (and probably shouldn't) be done automatically for CRDs as it might break the code of a lot of people who are playing by the existing rules here. I could introduce something like an optional argument, allows_empty, that causes get_clean_dict() to ignore these rules, but that may wind up biting people later if a CRD ever contains standard K8s objects-- they would no longer marshall out correctly. I'm open to ideas as to a solution, but it certainly won't be simply changing Hikaru to now allow None/empty values-- that'll break too many other things. Got any ideas?

@haxsaw
Copy link
Owner

haxsaw commented Sep 4, 2024

Since no one has responded with a better idea and this is working as documented I'm going to close this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants