Operation cannot be fulfilled on replicasets.apps: the object has been modified #1775

Closed
artem-nefedov opened this issue Jan 14, 2022 · 1 comment · Fixed by #1778
Labels
bug Something isn't working

artem-nefedov commented Jan 14, 2022

Summary

We sometimes see a Rollout get permanently stuck in the degraded state with the following error:

Message:         RolloutAborted: Rollout aborted update to revision 2: Unable to scale ReplicaSet for template 'canary-preview' to desired replica count '1': Operation cannot be fulfilled on replicasets.apps "podinfo-test-argo-rollouts-55848cf857-2-0-canary-preview": the object has been modified; please apply your changes to the latest version and try again

This sometimes happens when the experiment tries to scale up the preview replica. The issue is very rare. We run a lot of tests and have seen it multiple times already, but when I deliberately tried to reproduce it to collect information, the test ran ~90 times before it happened.

Reproduction steps

  1. Apply manifests (attached)
  2. Wait for rollout to become complete
  3. Immediately run kubectl patch rollout podinfo-test-argo-rollouts --type=json '-p=[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"stefanprodan/podinfo:3.1.1"}]'
  4. Pray for reproduction

Diagnostics

Argo Rollouts version is v1.1.1.
Kubernetes version is 1.21 (EKS).

After reproducing the issue, I collected all the debug information I could: object output, controller logs, etc.
All of it is attached:


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

artem-nefedov added the bug label Jan 14, 2022
jessesuen (Member) commented

The issue is that resource conflict errors are a fact of life, and the experiment controller needs to accommodate this possibility by retrying/re-reconciling the experiment. I think the fix for this should be easy.

func (ec *experimentContext) scaleTemplateRS(rs *appsv1.ReplicaSet, template v1alpha1.TemplateSpec, templateStatus *v1alpha1.TemplateStatus, desiredReplicaCount int32, experimentReplicas int32) {
...
	_, _, err := ec.scaleReplicaSetAndRecordEvent(rs, desiredReplicaCount)
	if err != nil { //  check if this is resource conflict error and don't fail.
		templateStatus.Status = v1alpha1.TemplateStatusError
		templateStatus.Message = fmt.Sprintf("Unable to scale ReplicaSet for template '%s' to desired replica count '%v': %v", templateStatus.Name, desiredReplicaCount, err)
	} else {
...
	}
}
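
The retry idea described above could be sketched roughly as follows. This is a standalone toy model, not the controller's code and not necessarily what the eventual fix does: `fakeStore`, `replicaSet`, `errConflict`, and `scaleWithRetry` are all hypothetical names that stand in for the API server, the ReplicaSet object, and the 409 Conflict error. In a real controller one would more likely detect the conflict with `IsConflict` from `k8s.io/apimachinery/pkg/api/errors`, and either requeue the experiment or wrap the update in `retry.RetryOnConflict` from `k8s.io/client-go/util/retry`.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical stand-in for the API server's 409 Conflict
// response ("the object has been modified").
var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// replicaSet is a toy model of a stored object under optimistic concurrency.
type replicaSet struct {
	resourceVersion int
	replicas        int32
}

// fakeStore is an in-memory stand-in for the API server (assumption for this sketch).
type fakeStore struct {
	current replicaSet
}

// get returns a copy of the latest stored object.
func (s *fakeStore) get() replicaSet { return s.current }

// update succeeds only if the caller's copy carries the latest resourceVersion.
func (s *fakeStore) update(rs replicaSet) (replicaSet, error) {
	if rs.resourceVersion != s.current.resourceVersion {
		return replicaSet{}, errConflict
	}
	rs.resourceVersion++
	s.current = rs
	return rs, nil
}

// scaleWithRetry tries to set the replica count. On a conflict it re-reads the
// object and tries again, up to maxAttempts times. Any other error is returned
// immediately. It reports how many update attempts were made.
func scaleWithRetry(s *fakeStore, stale replicaSet, desired int32, maxAttempts int) (replicaSet, int, error) {
	rs := stale
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		rs.replicas = desired
		updated, err := s.update(rs)
		if err == nil {
			return updated, attempt, nil
		}
		if !errors.Is(err, errConflict) {
			return replicaSet{}, attempt, err
		}
		rs = s.get() // refresh to the latest version before retrying
	}
	return replicaSet{}, maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, errConflict)
}

func main() {
	store := &fakeStore{current: replicaSet{resourceVersion: 1}}
	stale := store.get()

	// Another writer modifies the object after we read it, so our copy is stale.
	if _, err := store.update(store.get()); err != nil {
		panic(err)
	}

	updated, attempts, err := scaleWithRetry(store, stale, 1, 3)
	fmt.Println(updated.replicas, attempts, err)
}
```

The point of the sketch is only that a conflict is treated as transient: the first update with the stale copy is rejected, the object is re-read, and the second attempt succeeds instead of marking the template as errored.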
