panic: runtime error: slice bounds out of range with v0.82.0 #24908

Closed
stephenhong opened this issue Aug 4, 2023 · 4 comments · Fixed by #24979
Labels
bug (Something isn't working) · needs triage (New item requiring triage)

Comments

@stephenhong

Component(s)

No response

What happened?

Description

The collector is shutting down and restarting repeatedly due to the following error:

2023-08-04T13:20:30.557-0400	info	MetricsExporter	{"kind": "exporter", "data_type": "metrics", "name": "logging", "resource metrics": 17, "metrics": 166, "data points": 314}
panic: runtime error: slice bounds out of range [-63:] [recovered]
	panic: runtime error: slice bounds out of range [-63:]
goroutine 215 [running]:
go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End.func1()
	go.opentelemetry.io/otel/[email protected]/trace/span.go:383 +0x2a
go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End(0xc000b7b080, {0x0, 0x0, 0xc001aae8ca?})
	go.opentelemetry.io/otel/[email protected]/trace/span.go:421 +0xa29
panic({0x6e8c620, 0xc00063f9e0})
	runtime/panic.go:884 +0x213
go.opentelemetry.io/collector/pdata/internal/data/protogen/metrics/v1.(*Metric).MarshalToSizedBuffer(0xc000650140, {0xc001aae000, 0x202, 0xc071})
	go.opentelemetry.io/collector/[email protected]/internal/data/protogen/metrics/v1/metrics.pb.go:2246 +0x45c
go.opentelemetry.io/collector/pdata/internal/data/protogen/metrics/v1.(*ScopeMetrics).MarshalToSizedBuffer(0xc001086380, {0xc001aae000, 0xdbb, 0xc071})
	go.opentelemetry.io/collector/[email protected]/internal/data/protogen/metrics/v1/metrics.pb.go:2198 +0x23c
go.opentelemetry.io/collector/pdata/internal/data/protogen/metrics/v1.(*ResourceMetrics).MarshalToSizedBuffer(0xc000a96060, {0xc001aae000, 0xde3, 0xc071})
	go.opentelemetry.io/collector/[email protected]/internal/data/protogen/metrics/v1/metrics.pb.go:2144 +0x25c
go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/metrics/v1.(*ExportMetricsServiceRequest).MarshalToSizedBuffer(0xc001c2fd40, {0xc001aae000, 0xc071, 0xc071})
	go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/metrics/v1/metrics_service.pb.go:352 +0xac
go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/metrics/v1.(*ExportMetricsServiceRequest).Marshal(0xc0000ec400?)
	go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/metrics/v1/metrics_service.pb.go:332 +0x56
google.golang.org/protobuf/internal/impl.legacyMarshal({{}, {0x826d708, 0xc0006e2140}, {0x0, 0x0, 0x0}, 0x0})
	google.golang.org/[email protected]/internal/impl/legacy_message.go:402 +0xa2
google.golang.org/protobuf/proto.MarshalOptions.marshal({{}, 0xc0?, 0x0, 0x0}, {0x0, 0x0, 0x0}, {0x826d708, 0xc0006e2140})
	google.golang.org/[email protected]/proto/encode.go:166 +0x27b
google.golang.org/protobuf/proto.MarshalOptions.MarshalAppend({{}, 0x40?, 0x82?, 0xb?}, {0x0, 0x0, 0x0}, {0x81de340?, 0xc0006e2140?})
	google.golang.org/[email protected]/proto/encode.go:125 +0x79
github.com/golang/protobuf/proto.marshalAppend({0x0, 0x0, 0x0}, {0x7f3cdfde93e8?, 0xc001c2fd40?}, 0x70?)
	github.com/golang/[email protected]/proto/wire.go:40 +0xa5
github.com/golang/protobuf/proto.Marshal(...)
	github.com/golang/[email protected]/proto/wire.go:23
google.golang.org/grpc/encoding/proto.codec.Marshal({}, {0x70b8240, 0xc001c2fd40})
	google.golang.org/[email protected]/encoding/proto/proto.go:45 +0x4e
google.golang.org/grpc.encode({0x7f3cdfde9378?, 0xc74f210?}, {0x70b8240?, 0xc001c2fd40?})
	google.golang.org/[email protected]/rpc_util.go:633 +0x44
google.golang.org/grpc.prepareMsg({0x70b8240?, 0xc001c2fd40?}, {0x7f3cdfde9378?, 0xc74f210?}, {0x0, 0x0}, {0x82248b0, 0xc0001bc0a0})
	google.golang.org/[email protected]/stream.go:1766 +0xd2
google.golang.org/grpc.(*clientStream).SendMsg(0xc0006f6480, {0x70b8240?, 0xc001c2fd40})
	google.golang.org/[email protected]/stream.go:882 +0xfd
google.golang.org/grpc.invoke({0x8234bd8?, 0xc0009ec2d0?}, {0x7636bed?, 0x4?}, {0x70b8240, 0xc001c2fd40}, {0x70b8380, 0xc000010540}, 0x0?, {0xc000886060, ...})
	google.golang.org/[email protected]/call.go:75 +0xa8
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryClientInterceptor.func1({0x8234bd8, 0xc0009ec210}, {0x7636bed, 0x3f}, {0x70b8240, 0xc001c2fd40}, {0x70b8380, 0xc000010540}, 0xc001d6a000, 0x77f86f8, ...)
	go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:100 +0x3e4
google.golang.org/grpc.(*ClientConn).Invoke(0xc001d6a000, {0x8234bd8, 0xc0009ec210}, {0x7636bed, 0x3f}, {0x70b8240, 0xc001c2fd40}, {0x70b8380, 0xc000010540}, {0xc000abb390, ...})
	google.golang.org/[email protected]/call.go:40 +0x24d
go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/metrics/v1.(*metricsServiceClient).Export(0xc000915098, {0x8234bd8, 0xc0009ec210}, 0xc0010c5770?, {0xc000abb390, 0x1, 0x1})
	go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/metrics/v1/metrics_service.pb.go:272 +0xc9
go.opentelemetry.io/collector/pdata/pmetric/pmetricotlp.(*grpcClient).Export(0x8234bd8?, {0x8234bd8?, 0xc0009ec210?}, {0xc0009ec1e0?}, {0xc000abb390?, 0xc000fca180?, 0x2?})
	go.opentelemetry.io/collector/[email protected]/pmetric/pmetricotlp/grpc.go:41 +0x30
go.opentelemetry.io/collector/exporter/otlpexporter.(*baseExporter).pushMetrics(0xc000970580, {0x8234ba0?, 0xc0009ec1e0?}, {0x8234bd8?})
	go.opentelemetry.io/collector/exporter/[email protected]/otlp.go:107 +0x87
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).Export(0x8234bd8?, {0x8234ba0?, 0xc0009ec1e0?})
	go.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:54 +0x34
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send(0xc000af4708, {0x8257508, 0xc000a054a0})
	go.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:197 +0x96
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send(0xc000b36e60, {0x8257508, 0xc000a054a0})
	go.opentelemetry.io/collector/[email protected]/exporterhelper/queued_retry.go:384 +0x596
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send(0xc00094ec00, {0x8257508, 0xc000a054a0})
	go.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:125 +0x88
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1({0x8257508, 0xc000a054a0})
	go.opentelemetry.io/collector/[email protected]/exporterhelper/queued_retry.go:195 +0x39
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1()
	go.opentelemetry.io/collector/[email protected]/exporterhelper/internal/bounded_memory_queue.go:47 +0xb6
created by go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers
	go.opentelemetry.io/collector/[email protected]/exporterhelper/internal/bounded_memory_queue.go:42 +0x45

Steps to Reproduce

Run the OTel Collector v0.82.0 with the config.yaml below.
Multiple apps send traces, metrics, and logs to this collector, but I'm not sure exactly which data triggers the panic.

Expected Result

No runtime panic; the collector keeps running and exporting.

Actual Result

The collector panics with the error above and restarts repeatedly.

Collector version

v0.82.0

Environment information

Environment

OS: AmazonLinux2

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:8443"
  prometheus:
    config:
      scrape_configs:
        - job_name: '$NR_ACCOUNT_NAME/otel-self-metrics-gateway-$aws_region'
          scrape_interval: 1m
          static_configs:
            - targets: [ '0.0.0.0:9999' ]

exporters:
  logging:
    verbosity: normal
  splunk_hec:
    # Splunk HTTP Event Collector token.
    token: $SPLUNK_TOKEN
    # URL to a Splunk instance to send data to.
    endpoint: $SPLUNK_ENDPOINT
    # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
    source: "otel"
    # Optional Splunk source type: https://docs.splunk.com/Splexicon:Sourcetype
    sourcetype: "otel"
    # Splunk index, optional name of the Splunk index targeted.
    index: $SPLUNK_INDEX
    # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
    max_connections: 200
    # Whether to disable gzip compression over HTTP. Defaults to false.
    disable_compression: false
    # HTTP timeout when sending data. Defaults to 10s.
    timeout: 10s
  otlp:
    endpoint: $OTLP_ENDPOINT
    headers:
      api-key: $NR_API_KEY
    compression: gzip
  datadog:
    api:
      site: datadoghq.com
      key: $DD_API_KEY

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 10
  batch:
    send_batch_size: 4096
    send_batch_max_size: 4096
  filter:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          # comment a metric to remove from exclusion rule
          - otelcol_exporter_queue_capacity
          # - otelcol_exporter_queue_size
          - otelcol_exporter_enqueue_failed_spans
          - otelcol_exporter_enqueue_failed_log_records
          - otelcol_exporter_enqueue_failed_metric_points
          # - otelcol_exporter_sent_metric_points
          - otelcol_exporter_send_failed_metric_points
          # - otelcol_exporter_sent_spans
          - otelcol_process_runtime_heap_alloc_bytes
          - otelcol_process_runtime_total_alloc_bytes
          - otelcol_processor_batch_timeout_trigger_send
          # - otelcol_process_memory_rss
          - otelcol_process_runtime_total_sys_memory_bytes
          # - otelcol_process_cpu_seconds
          - otelcol_process_uptime
          # - otelcol_receiver_accepted_metric_points
          # - otelcol_receiver_refused_metric_points
          # - otelcol_receiver_accepted_spans
          # - otelcol_receiver_refused_spans
          - otelcol_scraper_errored_metric_points
          - otelcol_scraper_scraped_metric_points
          - scrape_samples_scraped
          - scrape_samples_post_metric_relabeling
          - scrape_series_added
          - scrape_duration_seconds
          # - up

extensions:
  health_check:
    endpoint: "0.0.0.0:8080"
  pprof:
  zpages:
    endpoint: "0.0.0.0:11400"

service:
  extensions: [pprof, zpages, health_check]
  telemetry:
    metrics:
      level: detailed
      address: 0.0.0.0:9999
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging, otlp, datadog]
      processors: [memory_limiter, batch]
    metrics:
      receivers: [otlp, prometheus]
      exporters: [logging, otlp, datadog]
      processors: [memory_limiter, batch, filter]
    logs:
      receivers: [otlp]
      exporters: [logging, splunk_hec]
      processors: [memory_limiter, batch]

Log output

(Same panic and stack trace as shown in the Description above.)

Additional context

No response

@stephenhong added the bug and needs triage labels on Aug 4, 2023
@dmitryax (Member) commented Aug 5, 2023

Looks like another instance of open-telemetry/opentelemetry-collector#6794.

@stephenhong, did you have this issue before 0.82.0?

@mx-psi, @songy23, @mackjmr: is it possible that the Datadog exporter recently started mutating the original metrics pdata?
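
For context on the panic itself: the generated protobuf code sizes the output buffer first and then marshals fields back to front, so if another component mutates the shared pdata between sizing and marshaling, the running index can underflow. The toy program below is not collector code, just an illustration of that mechanism under the assumption that the message grows after the buffer was sized; it reproduces the same "slice bounds out of range [-N:]" message.

```go
package main

import "fmt"

// marshalToSizedBuffer mimics how the generated gogo/protobuf code writes a
// message into a buffer that was sized up front, filling it back to front.
func marshalToSizedBuffer(fields [][]byte, buf []byte) []byte {
	i := len(buf)
	for _, f := range fields {
		i -= len(f)      // if the data grew after sizing, i goes negative...
		copy(buf[i:], f) // ...and buf[i:] panics: slice bounds out of range [-N:]
	}
	return buf[i:]
}

func main() {
	fields := [][]byte{[]byte("gauge"), []byte("bytes")}
	buf := make([]byte, 10) // buffer sized for the current contents

	// In the real bug another component mutates the shared pdata between
	// sizing and marshaling; here we simulate that by growing the message.
	fields = append(fields, []byte("extra_label"))

	fmt.Printf("%q\n", marshalToSizedBuffer(fields, buf))
}
```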

@mx-psi (Member) commented Aug 7, 2023

This could be related to the Datadog exporter. My current guess is DataDog/opentelemetry-mapping-go#101, which was enabled in the Collector in #23445. I'll confirm with @gbbr and open a PR to set MutatesData to true.
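
For readers following along, this is roughly what MutatesData controls: the pipeline's fan-out step can hand read-only consumers the shared pdata, but must give mutating consumers their own copy. A minimal sketch of that idea using the public consumer and pdata APIs (not the actual fanoutconsumer implementation, and the function name fanOut is made up here):

```go
package fanoutsketch

import (
	"context"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// fanOut forwards the same metrics to several consumers, cloning the data
// for any consumer that declares MutatesData so it cannot corrupt the copy
// another exporter is still marshaling.
func fanOut(ctx context.Context, md pmetric.Metrics, consumers []consumer.Metrics) error {
	for _, c := range consumers {
		data := md
		if c.Capabilities().MutatesData {
			data = pmetric.NewMetrics()
			md.CopyTo(data) // mutating consumers get their own deep copy
		}
		if err := c.ConsumeMetrics(ctx, data); err != nil {
			return err
		}
	}
	return nil
}
```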

mx-psi added a commit that referenced this issue Aug 7, 2023
**Description:** 

Correctly set `MutatesData` to `true` on the Datadog metrics exporter.
This was leading to panics when using multiple exporters.

**Link to tracking Issue:** Fixes #24908
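
For reference, a minimal sketch of how a metrics exporter built on exporterhelper declares this capability, assuming the collector's exporterhelper API around v0.82; the package name and pushMetrics are placeholders, not the actual Datadog exporter factory code.

```go
package ddexportersketch // hypothetical package, for illustration only

import (
	"context"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/exporter"
	"go.opentelemetry.io/collector/exporter/exporterhelper"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// pushMetrics is a placeholder for the exporter's real push function,
// which converts and sends the (possibly mutated) metrics.
func pushMetrics(ctx context.Context, md pmetric.Metrics) error { return nil }

func createMetricsExporter(ctx context.Context, set exporter.CreateSettings, cfg component.Config) (exporter.Metrics, error) {
	return exporterhelper.NewMetricsExporter(
		ctx, set, cfg,
		pushMetrics,
		// Declaring MutatesData tells the pipeline's fanout consumer to pass
		// this exporter a clone instead of the shared pdata.
		exporterhelper.WithCapabilities(consumer.Capabilities{MutatesData: true}),
	)
}
```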
@stephenhong (Author) commented

@dmitryax I saw this issue in v0.80.0 as well during local testing. I was using an old version of the OTel Java agent, and the collector was not using the Datadog exporter. The error didn't show up again after switching the collector to v0.82.0, so I thought it was fixed. But when I enabled the Datadog exporter, the error came up again.

@mx-psi (Member) commented Aug 7, 2023

Thanks for the report, @stephenhong. It is expected that you would also see this in v0.80.0 if the underlying cause is DataDog/opentelemetry-mapping-go#101. This will be fixed in v0.83.0 by the PR that closed this issue.
