We sometimes observe a problem where a Rollout becomes permanently stuck in a degraded state due to the following error:
Message: RolloutAborted: Rollout aborted update to revision 2: Unable to scale ReplicaSet for template 'canary-preview' to desired replica count '1': Operation cannot be fulfilled on replicasets.apps "podinfo-test-argo-rollouts-55848cf857-2-0-canary-preview": the object has been modified; please apply your changes to the latest version and try again
This sometimes happens when the experiment tries to scale up the preview ReplicaSet. The issue is very rare: we run a lot of tests and have seen it multiple times already, but when I intentionally tried to reproduce it to collect info, the test had to run ~90 times before it happened.
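For context, this is the apiserver's standard optimistic-concurrency conflict: an update was sent based on a stale resourceVersion. client-go ships retry.RetryOnConflict for exactly this situation; below is a minimal sketch with illustrative names, not code taken from the controller:

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// scaleWithRetry re-reads the ReplicaSet and retries the update whenever the
// apiserver reports a resourceVersion conflict.
func scaleWithRetry(ctx context.Context, cs kubernetes.Interface, ns, name string, replicas int32) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		rs, err := cs.AppsV1().ReplicaSets(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		rs.Spec.Replicas = &replicas
		_, err = cs.AppsV1().ReplicaSets(ns).Update(ctx, rs, metav1.UpdateOptions{})
		return err
	})
}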
Reproduction steps
Apply manifests (attached)
Wait for rollout to become complete
Immediately run kubectl patch rollout podinfo-test-argo-rollouts --type=json '-p=[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"stefanprodan/podinfo:3.1.1"}]'
Pray for reproduction
Diagnostics
Argo Rollouts version is v1.1.1.
k8s version 1.21 (EKS).
After reproducing the issue, I tried to collect every possible piece of debug info: object outputs, controller logs, etc.
All of it is attached.
The underlying issue is that resource conflict errors are a fact of life in Kubernetes, and the experiment controller needs to accommodate them by retrying/re-reconciling the experiment. I think the fix for this should be easy:
func (ec *experimentContext) scaleTemplateRS(rs *appsv1.ReplicaSet, template v1alpha1.TemplateSpec, templateStatus *v1alpha1.TemplateStatus, desiredReplicaCount int32, experimentReplicas int32) {
	...
	_, _, err := ec.scaleReplicaSetAndRecordEvent(rs, desiredReplicaCount)
	if err != nil {
		// check if this is a resource conflict error and don't fail
		templateStatus.Status = v1alpha1.TemplateStatusError
		templateStatus.Message = fmt.Sprintf("Unable to scale ReplicaSet for template '%s' to desired replica count '%v': %v", templateStatus.Name, desiredReplicaCount, err)
	} else {
		...
	}
}
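A minimal sketch of that check, assuming k8s.io/apimachinery/pkg/api/errors is available to the experiment controller; the helper name isRetryableScaleError is illustrative, not existing Argo Rollouts code:

import (
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
)

// isRetryableScaleError reports whether a failed scale should be retried on
// the next reconciliation instead of marking the template as Error.
// IsConflict matches exactly the "the object has been modified; please apply
// your changes to the latest version and try again" failure above.
func isRetryableScaleError(err error) bool {
	return k8serrors.IsConflict(err)
}

In the err != nil branch above, the controller could then return without setting TemplateStatusError whenever isRetryableScaleError(err) is true, letting the next reconciliation retry the scale against the latest ReplicaSet resourceVersion.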
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.