Panic and SIGSEGV for the ADOT collector for CloudWatch(AWS) #9978

Closed
kullu-ashish opened this issue May 12, 2022 · 2 comments · Fixed by #17285
Labels: bug (Something isn't working), comp:aws (AWS components)

@kullu-ashish

Describe the bug
I get the following panic while deploying the ADOT collector for AWS CloudWatch, following this document: https://docs.aws.amazon.com/eks/latest/userguide/configure-cw.html

kubectl logs my-collector-cloudwatch-collector-666665859f-7lrtq -n default
2022/05/12 01:17:51 AWS OTel Collector version: v0.17.0
2022/05/12 01:17:51 found no extra config, skip it, err: open /opt/aws/aws-otel-collector/etc/extracfg.txt: no such file or directory
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x8a664a]

Steps to reproduce
Not reliably reproducible. I could not replicate the issue on a different EKS cluster.

What did you expect to see?
I expected to see:
kubectl apply -f collector-config-cloudwatch.yaml
opentelemetrycollector.opentelemetry.io/my-collector-cloudwatch created
clusterrole.rbac.authorization.k8s.io/otel-prometheus-role created
clusterrolebinding.rbac.authorization.k8s.io/otel-prometheus-role-binding created

What did you see instead?
Instead, I saw:
2022/05/12 01:17:51 AWS OTel Collector version: v0.17.0
2022/05/12 01:17:51 found no extra config, skip it, err: open /opt/aws/aws-otel-collector/etc/extracfg.txt: no such file or directory
panic: runtime error: invalid memory address or nil pointer dereference

[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x8a664a]
goroutine 1 [running]:
go.uber.org/zap.(*Logger).check(0x0, 0x1, {0x3610ca9, 0x1b})
go.uber.org/[email protected]/logger.go:270 +0x6a
go.uber.org/zap.(*Logger).Warn(0x0, {0x3610ca9, 0x0}, {0xc0004337c0, 0x1, 0x1})
go.uber.org/[email protected]/logger.go:199 +0x3e
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter.(*Config).Validate(0xc0007dd180)
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/config.go:97 +0x289
go.opentelemetry.io/collector/config.(*Config).Validate(0xc0006daf20)
go.opentelemetry.io/[email protected]/config/config.go:74 +0x2a8
go.opentelemetry.io/collector/service.(*configProvider).Get(0xc000481aa0, {0x3c16150, 0xc00012e000}, {0xc00070a3f0, 0xc00070a6c0, 0xc00070a960, 0xc00070a240})
go.opentelemetry.io/[email protected]/service/config_provider.go:167 +0x2de
go.opentelemetry.io/collector/service.(*Collector).setupConfigurationComponents(0xc000716000, {0x3c16150, 0xc00012e000})
go.opentelemetry.io/[email protected]/service/collector.go:171 +0xa3
go.opentelemetry.io/collector/service.(*Collector).Run(0xc000716000, {0x3c16150, 0xc00012e000})
go.opentelemetry.io/[email protected]/service/collector.go:221 +0x24e
main.newCommand.func1(0xc000710000, {0x35d8b36, 0x1, 0x1})
github.com/aws-observability/aws-otel-collector/cmd/awscollector/main.go:117 +0xf5
github.com/spf13/cobra.(*Command).execute(0xc000710000, {0xc000128010, 0x1, 0x1})
github.com/spf13/[email protected]/command.go:856 +0x60e
github.com/spf13/cobra.(*Command).ExecuteC(0xc000710000)
github.com/spf13/[email protected]/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/[email protected]/command.go:902
main.runInteractive({{0xc00070a3f0, 0xc00070a6c0, 0xc00070a960, 0xc00070a240}, {{0x35f3e9b, 0x12}, {0x35f29a7, 0x12}, {0x3b32b10, 0x7}}, ...})
github.com/aws-observability/aws-otel-collector/cmd/awscollector/main.go:83 +0x5d
main.run({{0xc00070a3f0, 0xc00070a6c0, 0xc00070a960, 0xc00070a240}, {{0x35f3e9b, 0x12}, {0x35f29a7, 0x12}, {0x3b32b10, 0x7}}, ...})
github.com/aws-observability/aws-otel-collector/cmd/awscollector/main_others.go:42 +0xf8
main.main()
github.com/aws-observability/aws-otel-collector/cmd/awscollector/main.go:76 +0x2d8
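
Reading the trace from the top: the first value printed for zap.(*Logger).check and zap.(*Logger).Warn is 0x0. For a method frame that first value is the receiver, so the logger pointer appears to have been nil when awsemfexporter's (*Config).Validate called Warn. Below is a minimal sketch of that failure mode. It uses a hypothetical `Logger` type and a hypothetical `warnRecovered` helper rather than zap itself, and it recovers the panic only so the message can be printed.

```go
package main

import "fmt"

// Logger is a hypothetical stand-in for a structured logger; it is not zap's type.
type Logger struct {
	level int
}

// Warn reads a field through the receiver, so a nil receiver faults here,
// inside the logger method, rather than at the call site.
func (l *Logger) Warn(msg string) {
	if l.level <= 1 { // dereferences l
		fmt.Println("WARN:", msg)
	}
}

// warnRecovered calls Warn and returns the recovered panic text, or "" if
// the call completed normally.
func warnRecovered(l *Logger, msg string) (panicText string) {
	defer func() {
		if r := recover(); r != nil {
			panicText = fmt.Sprint(r)
		}
	}()
	l.Warn(msg)
	return ""
}

func main() {
	var unset *Logger // never assigned, like a logger field left nil
	fmt.Println("nil logger:", warnRecovered(unset, "example warning"))
	fmt.Println("set logger:", warnRecovered(&Logger{level: 1}, "example warning"))
}
```

Calling a method on a nil pointer is legal in Go; the fault only happens once the method touches the receiver's fields, which would explain why the top frames are inside zap rather than in config.go.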

What version did you use?
AWS OTel Collector version: v0.17.0

What config did you use?
Config: Same as given in the document

Environment
OS: Node AMI ami-0e4286e3300d2ec0f (amazon-eks-node-1.20-v20220406)
Compiler (if manually compiled): not manually compiled

Additional context

Since the error came from the Validate function in https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/awsemfexporter/config.go, apparently triggered by some of the metric settings, I changed the exporters section from:

  awsemf:
    region: "<AWS_REGION>"
    namespace: ContainerInsights/Prometheus
    log_group_name: '/aws/containerinsights/${CLUSTER_NAME}/prometheus'
    resource_to_telemetry_conversion:
      enabled: true
    dimension_rollup_option: NoDimensionRollup
    parse_json_encoded_attr_values: [Sources, kubernetes]
    metric_declarations:
      - dimensions: [[EKS_Cluster, EKS_Namespace, EKS_PodName]]

to:

  awsemf:
    region: "<AWS_REGION>"
    namespace: ContainerInsights/Prometheus
    log_group_name: '/aws/containerinsights/${CLUSTER_NAME}/prometheus'
    resource_to_telemetry_conversion:
      enabled: true

With that change it started working. I simply removed the last 4 lines. My question is: why does it not work with metric_declarations?
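
One plausible reading of the trace, offered as an assumption rather than a confirmed diagnosis: Validate walks the metric declarations and logs a warning for any it rejects, and that warning goes through a logger that has not been set yet. With no declarations in the config, the warning path is never taken, so the nil logger is never touched. The sketch below illustrates that pattern with a nil guard. Every name in it (`Logger`, `Declaration`, `MetricNameSelectors`, `check`, `Config`) is hypothetical, the rule that a declaration needs at least one name selector is assumed for illustration, and this is not the awsemfexporter source or the change that landed in #17285.

```go
package main

import (
	"errors"
	"fmt"
)

// Logger is a hypothetical logger whose methods dereference the receiver.
type Logger struct{ warnings []string }

// Warn records the message; it would panic if l were nil.
func (l *Logger) Warn(msg string) { l.warnings = append(l.warnings, msg) }

// Declaration is a hypothetical, reduced metric declaration.
type Declaration struct {
	Dimensions          [][]string
	MetricNameSelectors []string
}

// check rejects declarations with no name selectors (an assumed rule).
func (d Declaration) check() error {
	if len(d.MetricNameSelectors) == 0 {
		return errors.New("no metric name selectors defined")
	}
	return nil
}

// Config is a hypothetical exporter config; logger may still be nil
// when Validate runs.
type Config struct {
	Declarations []Declaration
	logger       *Logger
}

// Validate drops invalid declarations and warns about each one.
// The nil check gives the warning path a usable logger.
func (c *Config) Validate() error {
	if c.logger == nil {
		c.logger = &Logger{}
	}
	var valid []Declaration
	for _, d := range c.Declarations {
		if err := d.check(); err != nil {
			c.logger.Warn("dropped metric declaration: " + err.Error())
			continue
		}
		valid = append(valid, d)
	}
	c.Declarations = valid
	return nil
}

func main() {
	// A declaration with dimensions only, mirroring the shape quoted above.
	cfg := &Config{Declarations: []Declaration{
		{Dimensions: [][]string{{"EKS_Cluster", "EKS_Namespace", "EKS_PodName"}}},
	}}
	fmt.Println(cfg.Validate(), len(cfg.Declarations), cfg.logger.warnings)
}
```

Without the nil check in `Validate`, the same input would reach `Warn` with a nil receiver and panic, which matches the shape of the reported trace.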

@kullu-ashish kullu-ashish added the bug Something isn't working label May 12, 2022
@mx-psi mx-psi added the comp:aws AWS components label May 12, 2022
@codeboten
Contributor

@Aneurysm9 please assign a priority to this one

@github-actions
Contributor

github-actions bot commented Nov 8, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

4 participants