New Relic AnalysisTemplate not working (Rollout stuck) #2024

Closed
pragmaticivan opened this issue May 7, 2022 · 8 comments
Labels
bug (Something isn't working), good first issue (Good for newcomers)

Comments

@pragmaticivan

Summary

I tried a rollout without an AnalysisTemplate and it works as expected (Linkerd + Nginx).

When I add an AnalysisTemplate, the rollout gets stuck.

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: application-name
  metrics:
  - name: success-rate
    successCondition: result.successRate >= 0.95
    provider:
      newRelic:
        profile: argo-rollouts-newrelic-nonprod-credentials
        query: FROM Transaction SELECT percentage(count(*), WHERE httpResponseCode != 500) as successRate where appName = '{{ args.application-name }}'

usage:

      analysis:
        templates:
        - templateName: success-rate
        args:
        - name: application-name
          value: myapp

Diagnostics

What version of Argo Rollouts are you running?
1.2.0

# Paste the logs from the rollout controller

# Logs for the entire controller:

time="2022-05-07T15:41:12Z" level=info msg="Running initial measurement" analysisrun=rollouts-demo-db57cbfcd-6 metric=success-rate namespace=shared
time="2022-05-07T15:41:12Z" level=info msg="Taking 1 Measurement(s)..." analysisrun=rollouts-demo-db57cbfcd-6 namespace=shared
time="2022-05-07T15:41:12Z" level=info msg="Started syncing rollout" generation=8 namespace=shared resourceVersion=312747643 rollout=rollouts-demo
time="2022-05-07T15:41:12Z" level=info msg="Started syncing rollout" generation=1 namespace=shared resourceVersion=312698611 rollout=podinfo
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x19d9889]

goroutine 270 [running]:
github.com/argoproj/argo-rollouts/analysis.(*Controller).runMeasurements.func1({{{0xc000f5c9e0, 0xc}, {0x0, 0x0}, {0x0, 0x0}, 0x0, {0xc000f44280, 0x1a}, {0x0, ...}, ...}, ...})
/go/src/github.com/argoproj/argo-rollouts/analysis/analysis.go:334 +0x409
created by github.com/argoproj/argo-rollouts/analysis.(*Controller).runMeasurements
/go/src/github.com/argoproj/argo-rollouts/analysis/analysis.go:319 +0x34b

# Logs for a specific rollout:

time="2022-05-07T15:44:25Z" level=info msg="syncing service" namespace=shared rollout=rollouts-demo service=rollouts-demo-stable
time="2022-05-07T15:44:25Z" level=info msg="syncing service" namespace=shared rollout=rollouts-demo service=rollouts-demo-canary
time="2022-05-07T15:44:25Z" level=info msg="Started syncing rollout" generation=8 namespace=shared resourceVersion=312747643 rollout=rollouts-demo


---
<!-- Issue Author: Don't delete this message to encourage other users to support your issue! -->
**Message from the maintainers**:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.
pragmaticivan added the bug label May 7, 2022
@pragmaticivan
Author

I also tested this example: https://github.com/edmocosta/rollouts-demo

It might be worse than I thought:

cd69-2 namespace=shared
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x19d9889]

goroutine 723 [running]:
github.com/argoproj/argo-rollouts/analysis.(*Controller).runMeasurements.func1({{{0xc0014c8d70, 0x10}, {0xc0014c8d80, 0x3}, {0xc0014c8d83, 0x3}, 0x0, {0xc001c32cc0, 0x3b}, {0xc001c32d00, ...}, ...}, ...})
	/go/src/github.com/argoproj/argo-rollouts/analysis/analysis.go:334 +0x409
created by github.com/argoproj/argo-rollouts/analysis.(*Controller).runMeasurements
	/go/src/github.com/argoproj/argo-rollouts/analysis/analysis.go:319 +0x34b

This actually explodes the whole controller.
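
For readers wondering why one failed measurement takes down the whole controller: in Go, an unrecovered panic in any goroutine terminates the entire process, which matches the trace above where the panic originates in a goroutine created by `runMeasurements`. The sketch below is not the argo-rollouts code. `measurement` and `runSafely` are hypothetical names, and it only illustrates one general way to contain such a panic with `recover`.

```go
package main

import (
	"fmt"
	"sync"
)

// measurement is a hypothetical stand-in for one metric measurement task.
type measurement func() string

// runSafely runs each task in its own goroutine and turns a panic in any
// task into an error result instead of letting it terminate the process.
func runSafely(tasks []measurement) []string {
	results := make([]string, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task measurement) {
			defer wg.Done()
			// Deferred calls run last-in first-out, so the panic is
			// recovered before wg.Done signals completion.
			defer func() {
				if r := recover(); r != nil {
					results[i] = fmt.Sprintf("error: recovered from panic: %v", r)
				}
			}()
			results[i] = task()
		}(i, task)
	}
	wg.Wait()
	return results
}

func main() {
	var missing *struct{ name string } // nil pointer, similar in spirit to the trace
	tasks := []measurement{
		func() string { return "ok" },
		func() string { return missing.name }, // nil pointer dereference
	}
	for _, r := range runSafely(tasks) {
		fmt.Println(r)
	}
}
```

Without the deferred `recover`, the second task would crash the program and the first task's result would never be printed.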

@pragmaticivan
Author

After rolling back to the previous version, I was able to figure out the problem.

A simple mistake in the New Relic account-id used to surface an error in 1.1.1. In 1.2.0 that error is swallowed, and the controller hits the nil pointer dereference shown above instead.
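
The general failure pattern described here can be sketched in Go: a constructor returns a nil value together with an error, and if the caller discards the error and uses the value anyway, the result is a nil pointer dereference rather than a readable message. Everything below is hypothetical (`provider`, `newProvider`, `measure`) and is not taken from the argo-rollouts source. It only shows the guarded version of the pattern under that assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// provider is a hypothetical stand-in for a metric provider client.
type provider struct{ accountID string }

// run reads a field, so calling it on a nil *provider would panic.
func (p *provider) run() string { return "measured with account " + p.accountID }

// newProvider returns a nil provider and an error when the account ID is missing.
func newProvider(accountID string) (*provider, error) {
	if accountID == "" {
		return nil, errors.New("account ID is not set")
	}
	return &provider{accountID: accountID}, nil
}

// measure reports the construction error as a failed measurement instead of
// discarding it and calling a method on the nil provider.
func measure(accountID string) (string, error) {
	p, err := newProvider(accountID)
	if err != nil {
		return "", fmt.Errorf("creating provider: %w", err)
	}
	return p.run(), nil
}

func main() {
	for _, id := range []string{"12345", ""} {
		out, err := measure(id)
		if err != nil {
			fmt.Println("measurement failed:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

Dropping the `if err != nil` check in `measure` would reproduce the shape of the crash: `p` is nil for the empty account ID, and `p.run()` dereferences it.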

harikrongali added the good first issue label May 23, 2022
@nikhil1raghav
Contributor

I want to work on this issue. @pragmaticivan, can you please tell me where the error that is swallowed in 1.2.0 is located?
The panic occurs in the goroutine created at /go/src/github.com/argoproj/argo-rollouts/analysis/analysis.go:319. I checked that it creates a newRelic provider, but I found that the account ID is handled the same way in 1.1.1 and 1.2.0.

@shakefu

shakefu commented Jul 21, 2022

FYI this critical bug broke our entire production cluster for many hours since the Rollout controller kept crash looping.

@shakefu

shakefu commented Jul 21, 2022

@perenesenko @jessesuen @leoluz @harikrongali It looks like the fix for this is already in master: https://github.com/argoproj/argo-rollouts/blob/master/analysis/analysis.go#L329

Any idea when you'll cut a bugfix release to resolve this? I don't see any releases since May. I'd rather not have to fork the repository to get a fix out since this is a production issue for us.

@harikrongali
Contributor

@shakefu We can do a patch release next week.

@shakefu

shakefu commented Jul 21, 2022

@harikrongali Thanks so much!

zachaller added a commit to zachaller/argo-rollouts that referenced this issue Jul 21, 2022
Signed-off-by: zachaller <[email protected]>
leoluz pushed a commit that referenced this issue Jul 25, 2022
Signed-off-by: zachaller <[email protected]>
@leoluz
Contributor

leoluz commented Jul 26, 2022

Released argo-rollouts patch v1.2.2 to address this issue:
https://github.com/argoproj/argo-rollouts/releases/tag/v1.2.2

If the problem remains, please reopen this issue.
