stuck on progressing after upgrade to v1.0.1 #1359
Comments
There have been many changes in both the controller's logic and the CRD since v0.8.2, so you may need to migrate the Rollout CRs to the latest version to make it work.
Stack trace, but formatted:
The only possible explanation is that pauseCond is nil while c.rollout.Status.ControllerPause is true, so the early return is skipped and pauseCond.StartTime is dereferenced:
func (c *rolloutContext) reconcileBlueGreenPause(activeSvc, previewSvc *corev1.Service) {
	...
	pauseCond := getPauseCondition(c.rollout, v1alpha1.PauseReasonBlueGreenPause)
	if pauseCond == nil && !c.rollout.Status.ControllerPause {
		if pauseCond == nil {
			c.log.Info("pausing")
		}
		c.pauseContext.AddPauseCondition(v1alpha1.PauseReasonBlueGreenPause)
		return
	}
	if !c.pauseContext.CompletedBlueGreenPause() {
		c.log.Info("pause incomplete")
		if c.rollout.Spec.Strategy.BlueGreen.AutoPromotionSeconds > 0 {
			c.checkEnqueueRolloutDuringWait(pauseCond.StartTime, c.rollout.Spec.Strategy.BlueGreen.AutoPromotionSeconds) // <<<< panic.
		}
	} else {
		c.log.Infof("pause completed")
		c.pauseContext.RemovePauseCondition(v1alpha1.PauseReasonBlueGreenPause)
	}
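To illustrate the failure mode in the excerpt above: the marked line can only see a nil pauseCond when the first branch is skipped, which requires ControllerPause to be true. Below is a minimal, self-contained sketch of that branch structure with a nil guard added. The types Rollout and PauseCondition and the function reconcilePause are hypothetical stand-ins, not the controller's real types, and this is not necessarily how the project fixed the panic.

```go
package main

import (
	"fmt"
	"time"
)

// PauseCondition is a hypothetical stand-in for the rollout's pause condition.
type PauseCondition struct {
	StartTime time.Time
}

// Rollout is a hypothetical, trimmed-down stand-in holding only the fields
// that matter for this illustration.
type Rollout struct {
	ControllerPause      bool
	PauseCondition       *PauseCondition
	AutoPromotionSeconds int32
}

// reconcilePause mirrors the branch structure of the excerpt, with a nil
// guard added before the pause condition is dereferenced.
func reconcilePause(r *Rollout) string {
	cond := r.PauseCondition
	if cond == nil && !r.ControllerPause {
		return "pausing"
	}
	if cond == nil {
		// ControllerPause is set but the condition is missing: without this
		// guard, cond.StartTime below would panic with a nil dereference.
		return "inconsistent pause state"
	}
	if r.AutoPromotionSeconds > 0 {
		deadline := cond.StartTime.Add(time.Duration(r.AutoPromotionSeconds) * time.Second)
		return "waiting until " + deadline.UTC().Format(time.RFC3339)
	}
	return "pause incomplete"
}

func main() {
	// The state suspected in this issue: paused by the controller, no condition.
	fmt.Println(reconcilePause(&Rollout{ControllerPause: true, AutoPromotionSeconds: 30}))
}
```

Returning early on the inconsistent state is only one option; the real controller may prefer to re-add the condition or log and requeue.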
I was trying to reproduce the error with the following steps, but couldn't get the same results.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-bluegreen
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollout-bluegreen
  template:
    metadata:
      labels:
        app: rollout-bluegreen
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
  strategy:
    blueGreen:
      activeService: rollout-bluegreen-active
      previewService: rollout-bluegreen-preview
      autoPromotionEnabled: false
---
apiVersion: v1
kind: Service
metadata:
  name: rollout-bluegreen-active
spec:
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
  selector:
    app: rollout-bluegreen
---
apiVersion: v1
kind: Service
metadata:
  name: rollout-bluegreen-preview
spec:
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
  selector:
    app: rollout-bluegreen
@eilonmonday, could you provide more details about how to reproduce the error?
Guys, this seems to be an issue with the autoPromotionEnabled flag.
@eilonmonday, I tested with
Hi guys, I'm experiencing a similar issue with
@BillyMorgan, are you seeing the stack trace in the logs as well?
This panic should be resolved in v1.0.3. Please give it a try.
There is another issue when using the previewReplicaCount feature. This is being fixed and will be in a v1.0.4 release.
We have been using Argo Rollouts for 2 years (version 0.8.2). Lately, we have been considering upgrading our production environment to v1.0.1. Since it is a very sensitive environment, we upgraded our staging environment first, 1 month ago.
Everything went well, and we had hundreds of deployments to staging with Argo Rollouts v1.0.1.
All of a sudden, the whole thing stopped working: after a deployment, once all pods are up, the rollout is stuck on progressing while both (blue and green) are healthy.
It happened in 2 different clusters.
I tried to investigate the logs:
time="2021-07-18T09:46:17Z" level=error msg="Recovered from panic: runtime error: invalid memory address or nil pointer dereference\ngoroutine 160 [running]:\nruntime/debug.Stack(0xc000b4d438, 0x1c18d40, 0x2d971f0)\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x9f\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1.1(0xc002f69d50, 0xc000b4db30)\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:149 +0x5b\npanic(0x1c18d40, 0x2d971f0)\n\t/usr/local/go/src/runtime/panic.go:965 +0x1b9\ngithub.com/argoproj/argo-rollouts/rollout.(*rolloutContext).reconcileBlueGreenPause(0xc002bcaa80, 0xc001caa000, 0xc00260d900)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/bluegreen.go:177 +0x5fe\ngithub.com/argoproj/argo-rollouts/rollout.(*rolloutContext).rolloutBlueGreen(0xc002bcaa80, 0x1ea0d24, 0x17)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/bluegreen.go:48 +0x17d\ngithub.com/argoproj/argo-rollouts/rollout.(*rolloutContext).reconcile(0xc002bcaa80, 0xc00001d800, 0xc002bcaa80)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/context.go:79 +0x1e5\ngithub.com/argoproj/argo-rollouts/rollout.(*Controller).syncHandler(0xc0009ae000, 0xc000c0d560, 0x19, 0x0, 0x0)\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:387 +0x51a\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1.1(0x0, 0x0)\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:153 +0x7c\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem.func1(0x21621d0, 0xc00007c1e0, 0x1e8bdd7, 0x7, 0xc001c4de60, 0xc00076bec0, 0x1b8c1a0, 0xc003010710, 0x0, 0x0)\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:157 +0x323\ngithub.com/argoproj/argo-rollouts/utils/controller.processNextWorkItem(0x21621d0, 0xc00007c1e0, 0x1e8bdd7, 0x7, 0xc001c4de60, 0xc00076bec0, 0xc0005cce01)\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:169 +0x9a\ngithub.com/argoproj/argo-rollouts/utils/controller.RunWorker(...)\n\t/go/src/github.com/argoproj/argo-rollouts/utils/controller/controller.go:104\ngithub.com/argoproj/argo-rollouts/rollout.(*Controller).Run.func1()\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:319 +0xa5\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc001e04450)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x5f\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001e04450, 0x2107e40, 0xc001c236b0, 0x7e6e01, 0xc0000a2f00)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0x9b\nk8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001e04450, 0x3b9aca00, 0x0, 0x1, 0xc0000a2f00)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x98\nk8s.io/apimachinery/pkg/util/wait.Until(0xc001e04450, 0x3b9aca00, 0xc0000a2f00)\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90 +0x4d\ncreated by github.com/argoproj/argo-rollouts/rollout.(*Controller).Run\n\t/go/src/github.com/argoproj/argo-rollouts/rollout/controller.go:318 +0xac\n" namespace=monday rollout=staging-monday-api
time="2021-07-18T09:46:17Z" level=error msg="rollout syncHandler error: Recovered from Panic" namespace=monday rollout=staging-monday-api
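The msg field in the log above carries the whole stack trace on one line, with newlines and tabs escaped as \n and \t. As a reading aid, here is a small sketch that turns such a quoted value back into multi-line text using Go's strconv.Unquote. The helper name unescapeMsg and the shortened sample value are illustrative assumptions; extracting the msg value from the full log line is left out.

```go
package main

import (
	"fmt"
	"strconv"
)

// unescapeMsg (hypothetical helper) takes a double-quoted msg value, as it
// appears after msg= in the log line, and interprets its escape sequences.
func unescapeMsg(quoted string) (string, error) {
	return strconv.Unquote(quoted)
}

func main() {
	// Shortened sample taken from the log above; the raw string keeps the
	// backslash escapes exactly as they were logged.
	raw := `"Recovered from panic: runtime error: invalid memory address or nil pointer dereference\ngoroutine 160 [running]:\npanic(0x1c18d40, 0x2d971f0)\n\t/usr/local/go/src/runtime/panic.go:965 +0x1b9\n"`

	out, err := unescapeMsg(raw)
	if err != nil {
		fmt.Println("could not unquote msg:", err)
		return
	}
	fmt.Print(out)
}
```

With the trace unescaped, the frame at rollout/bluegreen.go:177 inside reconcileBlueGreenPause is easier to spot.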
I tried to:
Delete Argo Rollouts and its namespace, and then reinstall: the same issue occurred.
Then I uninstalled again and installed v0.8.2, and the deployment passed.
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.