Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

otlp.setLabel panic on non-compliant attribute array values #13703

Closed
leehow1988 opened this issue Jul 16, 2024 · 11 comments · Fixed by elastic/apm-data#346
Closed

otlp.setLabel panic on non-compliant attribute array values #13703

leehow1988 opened this issue Jul 16, 2024 · 11 comments · Fixed by elastic/apm-data#346
Assignees

Comments

@leehow1988
Copy link

leehow1988 commented Jul 16, 2024

APM Server version (8.8.1 and 8.10.0 and 8.14.3):

apm-server exits for unknown reason, logs can see otpl related stack

Steps to reproduce:

We have a lot of services in our test environment, some of them use OpenTelemetry, some of them use Elastic APM Agent We found today that apm-server exits shortly after starting, as shown in the attached file

I started ELK using docker-compose,yaml, and the part about apm-server is as follows:

  apm-server:
    image: docker.elastic.co/apm/apm-server:8.14.3
    cap_add: ["CHOWN", "DAC_OVERRIDE", "SETGID", "SETUID"]
    cap_drop: ["ALL"]
    ports:
      - 8200:8200
    depends_on:
      - elasticsearch
      - kibana
    networks:
      - elk
    volumes:
      - ./apm-server/config/apm-server.yml:/usr/share/apm-server/apm-server.yml
      - /home/elk_data/apm-server/logs:/usr/share/apm-server/logs
      - /home/elk_data/apm-server/data:/usr/share/apm-server/data
    healthcheck:
      interval: 10s
      retries: 12
      test: curl --write-out 'HTTP %{http_code}' --fail --silent --output /dev/null http://localhost:8200/

apm-server.yaml content

apm-server:
  host: "0.0.0.0:8200"

  kibana:
    enabled: true
    host: "192.168.3.155:5601"
    username: "elastic"
    password: "***"

#-------------------------- Elasticsearch output --------------------------
output.elasticsearch:
  hosts: ["192.168.3.155:9200"]
  enabled: true
  username: "elastic"
  password: "***"

docker inspect elk-apm-server-1 |grep ExitCode -C 10

"State": {
    "Status": "exited",
    "Running": false,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "Dead": false,
    "Pid": 0,
    "ExitCode": 2,
    "Error": "",
    "StartedAt": "2024-07-16T08:08:36.738792834Z",
    "FinishedAt": "2024-07-16T08:11:38.74994692Z",
    "Health": {
        "Status": "unhealthy",
        "FailingStreak": 17,
        "Log": [
            {
                "Start": "2024-07-16T16:10:52.268632964+08:00",
                "End": "2024-07-16T16:10:52.7597741+08:00",
                "ExitCode": 127,
                "Output": "/bin/sh: 1: curl: not found\n"
            },
            {
                "Start": "2024-07-16T16:11:02.763336696+08:00",
                "End": "2024-07-16T16:11:02.860034212+08:00",
                "ExitCode": 127,
                "Output": "/bin/sh: 1: curl: not found\n"
            },
            {
                "Start": "2024-07-16T16:11:12.891897649+08:00",
                "End": "2024-07-16T16:11:12.998785585+08:00",
                "ExitCode": 127,
                "Output": "/bin/sh: 1: curl: not found\n"
            },
            {
                "Start": "2024-07-16T16:11:23.005486378+08:00",
                "End": "2024-07-16T16:11:23.096787177+08:00",
                "ExitCode": 127,
                "Output": "/bin/sh: 1: curl: not found\n"
            },
            {
                "Start": "2024-07-16T16:11:33.098991014+08:00",
                "End": "2024-07-16T16:11:33.258413466+08:00",
                "ExitCode": 127,
                "Output": "/bin/sh: 1: curl: not found\n"
            }
        ]
    }
}

Provide logs (if relevant): see attachment file

error.log

@leehow1988 leehow1988 added the bug label Jul 16, 2024
@carsonip carsonip self-assigned this Jul 19, 2024
@carsonip
Copy link
Member

The core of the issue is shown in your error log:

panic: interface conversion: interface {} is nil, not string
goroutine 3684 [running]:
github.com/elastic/apm-data/input/otlp.setLabel({0xc00502e960, 0x1b}, 0xc003ccef20, {0x1c9da00?, 0xc000ca7710?})
        github.com/elastic/[email protected]/input/otlp/metadata.go:508 +0x5f1
github.com/elastic/apm-data/input/otlp.TranslateTransaction.func1({0xc004fcc600, 0x1b}, {0xc0012a8e50?, 0xc0050564dc?})
        github.com/elastic/[email protected]/input/otlp/traces.go:290 +0x8be
go.opentelemetry.io/collector/pdata/pcommon.Map.Range({0xc0041ebaa0?, 0xc0050564dc?}, 0xc000b7a978)
        go.opentelemetry.io/collector/[email protected]/pcommon/map.go:222 +0x97

I managed to reproduce the issue by crafting an invalid otlp payload. It is invalid because according to OTel spec, arrays MUST be homogeneous, and nulls SHOULD NOT be allowed in arrays.

Please double check your OTel collectors and understand why they are sending attributes that do not comply to OTel spec. At the same time, we will improve handling in apm-server to handle this kind of invalid payload.

@leehow1988
Copy link
Author

thanks for reply. Hope that later versions will ignore this invalid payload and let the apm-server run normally.

@kruskall
Copy link
Member

kruskall commented Aug 28, 2024

Released v1.11.0 (https://github.com/elastic/apm-data/releases/tag/v1.11.0)

@rubvs let's wait for dependabot to pick up the new version and update it in apm-server 🙂

@kruskall
Copy link
Member

Dependabot PR Merged: #13950

@matglas
Copy link

matglas commented Sep 2, 2024

When can the new release be expected for apm-server?

@rubvs
Copy link

rubvs commented Sep 2, 2024

When can the new release be expected for apm-server?

@matglas, the fix is scheduled to be shipped in 8.15.2

@carsonip carsonip changed the title apm-server exits for unknown reason, logs can see otpl related stack otlp.setLabel panic on non-compliant attribute array values Sep 9, 2024
@kikmon
Copy link

kikmon commented Oct 9, 2024

H
I updated to 8.15.2 and I'm sill getting some "panic: interface conversion: interface {} is nil, not string" issues
Is it possible the issue isn't entirely fixed ?

@rubvs
Copy link

rubvs commented Oct 9, 2024

Hi @kikmon, can you give some more details please; setup, input, output expected, error logs, etc. Thanks

@kikmon
Copy link

kikmon commented Oct 10, 2024

Setup is quite basic.
Just running 8.15.2 windows version of ES and APM.
Input is a Jenkins server using the opentelemetry plugin
apm_server works fine with an empty test instance of Jenkins, but when fed with the real instance, apm_server panics after a few seconds.
in the latest log, there seem to be some tags with empty value. Maybe that's what's making apm panic (would be nice to know what stat exactly made it panic)

Here's the full callstack if it helps
panic: interface conversion: interface {} is nil, not string

goroutine 172 [running]:
github.com/elastic/apm-data/input/otlp.setLabel({0xc0003a1600, 0x1b}, 0xc000a98000, {0x20b95c0?, 0xc000586a98?})
github.com/elastic/[email protected]/input/otlp/metadata.go:532 +0x5f1
github.com/elastic/apm-data/input/otlp.TranslateTransaction.func1({0xc00011b1a0, 0x1b}, {0xc00044a570?, 0xc0019b6b08?})
github.com/elastic/[email protected]/input/otlp/traces.go:295 +0x90a
go.opentelemetry.io/collector/pdata/pcommon.Map.Range({0xc0003118e0?, 0xc0019b6b08?}, 0xc0006da968)
go.opentelemetry.io/collector/[email protected]/pcommon/map.go:222 +0x97
github.com/elastic/apm-data/input/otlp.TranslateTransaction({0xc0003118e0?, 0xc0019b6b08?}, {0xc000311940?, 0xc0019b6b08?}, {0xc000344070?, 0xc0019b6b08?}, 0xc000a98000)
github.com/elastic/[email protected]/input/otlp/traces.go:269 +0x2b7
github.com/elastic/apm-data/input/otlp.(*Consumer).convertSpan(0xc0002abb60, {0xc000311880?, 0xc0019b6b08?}, {0xc000344070?, 0xc0019b6b08?}, 0xc0004d6840, 0x0, 0xc0005869c0)
github.com/elastic/[email protected]/input/otlp/traces.go:209 +0x8da
github.com/elastic/apm-data/input/otlp.(*Consumer).convertScopeSpans(0xc0002abb60, {0xc000344070?, 0xc0019b6b08?}, 0xc0004d6840, 0x0, 0xc0005869c0)
github.com/elastic/[email protected]/input/otlp/traces.go:151 +0x70
github.com/elastic/apm-data/input/otlp.(*Consumer).convertResourceSpans(0xc0002abb60, {0xc0004261e0?, 0xc0019b6b08?}, {0xc000a84e00?, 0x1?, 0x3cb9340?}, 0xc0005869c0)
github.com/elastic/[email protected]/input/otlp/traces.go:139 +0x1ac
github.com/elastic/apm-data/input/otlp.(*Consumer).ConsumeTracesWithResult(0xc0002abb60, {0x28dcfb8, 0xc00132c3f0}, {0xc00052e180?, 0xc0019b6b08?})
github.com/elastic/[email protected]/input/otlp/traces.go:115 +0x332
github.com/elastic/apm-server/internal/beater/otlp.(*tracesService).Export(0xc0004fe238, {0x28dcfb8, 0xc00132c3f0}, {0xc00052e180?, 0xc0019b6b08?})
github.com/elastic/apm-server/internal/beater/otlp/grpc.go:100 +0x9d
go.opentelemetry.io/collector/pdata/ptrace/ptraceotlp.rawTracesServer.Export({{0x28d3690?, 0xc0004fe238?}}, {0x28dcfb8, 0xc00132c3f0}, 0xc00052e180)
go.opentelemetry.io/collector/[email protected]/ptrace/ptraceotlp/grpc.go:89 +0xea
go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler.func1({0x28dcfb8?, 0xc00132c3f0?}, {0x24309c0?, 0xc00052e180?})
go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:310 +0xcb
github.com/elastic/apm-server/internal/beater.(*Runner).Run.AnonymousRateLimit.func22({0x28dcfb8, 0xc00132c3f0}, {0x24309c0, 0xc00052e180}, 0x18d0165?, 0xc00052e300)
github.com/elastic/apm-server/internal/beater/interceptors/ratelimit.go:59 +0x174
google.golang.org/grpc.getChainUnaryHandler.func1({0x28dcfb8, 0xc00132c3f0}, {0x24309c0, 0xc00052e180})
google.golang.org/[email protected]/server.go:1196 +0xb2
github.com/elastic/apm-server/internal/beater.(*Runner).Run.Auth.func21({0x28dcfb8, 0xc00132c330}, {0x24309c0, 0xc00052e180}, 0xc000f14560?, 0xc000a84dc0)
github.com/elastic/apm-server/internal/beater/interceptors/auth.go:76 +0x16a
google.golang.org/grpc.getChainUnaryHandler.func1({0x28dcfb8, 0xc00132c330}, {0x24309c0, 0xc00052e180})
google.golang.org/[email protected]/server.go:1196 +0xb2
github.com/elastic/apm-server/internal/beater.(*Runner).Run.Timeout.func20({0x28dcfb8?, 0xc00132c330?}, {0x24309c0?, 0xc00052e180?}, 0xc000f14560?, 0xc00052e300?)
github.com/elastic/apm-server/internal/beater/interceptors/timeout.go:40 +0x2e
google.golang.org/grpc.getChainUnaryHandler.func1({0x28dcfb8, 0xc00132c330}, {0x24309c0, 0xc00052e180})
google.golang.org/[email protected]/server.go:1196 +0xb2
github.com/elastic/apm-server/internal/beater/interceptors.Metrics.(*metricsInterceptor).Interceptor.func1({0x28dcfb8, 0xc00132c330}, {0x24309c0, 0xc00052e180}, 0xc000f14560, 0xc000a84d40)
github.com/elastic/apm-server/internal/beater/interceptors/metrics.go:92 +0x26c
google.golang.org/grpc.getChainUnaryHandler.func1({0x28dcfb8, 0xc00132c330}, {0x24309c0, 0xc00052e180})
google.golang.org/[email protected]/server.go:1196 +0xb2
github.com/elastic/apm-server/internal/beater.(*Runner).Run.Logging.func19({0x28dcfb8, 0xc00132c330}, {0x24309c0, 0xc00052e180}, 0xc000f14560, 0xc000a84c80)
github.com/elastic/apm-server/internal/beater/interceptors/logging.go:54 +0x1ce
google.golang.org/grpc.getChainUnaryHandler.func1({0x28dcfb8, 0xc00132c330}, {0x24309c0, 0xc00052e180})
google.golang.org/[email protected]/server.go:1196 +0xb2
github.com/elastic/apm-server/internal/beater.(*Runner).Run.ClientMetadata.func18({0x28dcfb8, 0xc00132c2d0}, {0x24309c0, 0xc00052e180}, 0xc000f14560?, 0xc000a84c40)
github.com/elastic/apm-server/internal/beater/interceptors/metadata.go:66 +0x276
google.golang.org/grpc.getChainUnaryHandler.func1({0x28dcfb8, 0xc00132c2d0}, {0x24309c0, 0xc00052e180})
google.golang.org/[email protected]/server.go:1196 +0xb2
go.elastic.co/apm/module/apmgrpc/v2.NewUnaryServerInterceptor.func1({0x28dcfb8?, 0xc00132c2d0?}, {0x24309c0?, 0xc00052e180?}, 0xc000f14560?, 0xc00052e300?)
go.elastic.co/apm/module/apmgrpc/[email protected]/server.go:71 +0xf6
google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1({0x28dcfb8, 0xc00132c2d0}, {0x24309c0, 0xc00052e180}, 0xc000f14560, 0x80?)
google.golang.org/[email protected]/server.go:1187 +0x85
go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/trace/v1._TraceService_Export_Handler({0x21dd100, 0xc0003af190}, {0x28dcfb8, 0xc00132c2d0}, 0xc000286680, 0xc0003a86c0)
go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/trace/v1/trace_service.pb.go:312 +0x143
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000314200, {0x28dcfb8, 0xc00132c210}, {0x28e7ba0, 0xc000638600}, 0xc000352360, 0xc0002eaf90, 0x3c7dba0, 0x0)
google.golang.org/[email protected]/server.go:1379 +0xdf8
google.golang.org/grpc.(*Server).handleStream(0xc000314200, {0x28e7ba0, 0xc000638600}, 0xc000352360)
google.golang.org/[email protected]/server.go:1790 +0xe8b
google.golang.org/grpc.(*Server).serveStreams.func2.1()
google.golang.org/[email protected]/server.go:1029 +0x8b
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 157
google.golang.org/[email protected]/server.go:1040 +0x125

@rubvs
Copy link

rubvs commented Oct 15, 2024

Hi @kikmon , sorry for the late response. The fix was not introduced in 8.15.2 and will be implemented in an upcoming release.

@kikmon
Copy link

kikmon commented Oct 16, 2024

Thanks, it explains a lot :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants