From 1ac2f27190c8bc0064beed51e2b5d87bb87e5f07 Mon Sep 17 00:00:00 2001
From: Damien Mathieu <42@dmathieu.com>
Date: Fri, 25 Oct 2024 09:08:21 +0200
Subject: [PATCH] State of Profiling blog post (#5475)

Co-authored-by: Severin Neumann
Co-authored-by: Severin Neumann
Co-authored-by: Pablo Baeyens
---
 .cspell/en-words.txt                    |   1 +
 content/en/blog/2024/state-profiling.md | 117 ++++++++++++++++++++++++
 static/refcache.json                    |  32 +++++++
 3 files changed, 150 insertions(+)
 create mode 100644 content/en/blog/2024/state-profiling.md

diff --git a/.cspell/en-words.txt b/.cspell/en-words.txt
index 7f3a83d42974..d22571563ed6 100644
--- a/.cspell/en-words.txt
+++ b/.cspell/en-words.txt
@@ -34,6 +34,7 @@ discoverability
 dotnet
 Dyla
 dynatrace
+ebpf
 emailservice
 EMEA
 erlang
diff --git a/content/en/blog/2024/state-profiling.md b/content/en/blog/2024/state-profiling.md
new file mode 100644
index 000000000000..70010be554bd
--- /dev/null
+++ b/content/en/blog/2024/state-profiling.md
@@ -0,0 +1,117 @@
+---
+title: The State of Profiling
+linkTitle: Profiling state
+date: 2024-10-25
+cSpell:ignore: Baeyens Florian Geisendörfer Kalkanis Lehner Mathieu Rühsen
+author: >-
+  [Damien Mathieu](https://github.com/dmathieu) (Elastic), [Pablo
+  Baeyens](https://github.com/mx-psi) (Datadog), [Felix
+  Geisendörfer](https://github.com/felixge) (Datadog), [Christos
+  Kalkanis](https://github.com/christos68k) (Elastic), [Morgan
+  McLean](https://github.com/mtwo) (Splunk), [Florian
+  Lehner](https://github.com/florianl) (Elastic), [Tim
+  Rühsen](https://github.com/rockdaboot) (Elastic)
+issue: https://github.com/open-telemetry/opentelemetry.io/issues/5477
+sig: Profiling SIG
+---
+
+A little over six months ago, OpenTelemetry announced
+[support for the profiling signal](/blog/2024/profiling/). While the signal is
+still in development and isn’t yet recommended for production use, the Profiling
+SIG has made substantial progress on many fronts.
+
+This post provides a summary of the progress the Profiling SIG has made over the
+past six months.
+
+## OTLP improvements
+
+Profiles were added as a new signal type to OTLP in
+[v1.3.0](https://github.com/open-telemetry/opentelemetry-proto/releases/tag/v1.3.0),
+though this area is still marked as unstable as we continue to make changes to
+it.
+
+While our original intent was to keep wire compatibility with
+[pprof](https://github.com/google/pprof), that goal proved impractical, so the
+Profiling SIG
+[has decided](https://github.com/open-telemetry/opentelemetry-proto/issues/567#issuecomment-2286565449)
+to refactor the protocol and not aim for strict compatibility with pprof.
+Instead, we will aim for convertibility, as we already do for other signals.
+This shift is still a work in progress and is causing several breaking changes
+to the profiling section of the protocol. Note that this has no impact on the
+stable sections that make up the majority of OTLP, such as metrics, spans,
+logs, and resources.
+
+## eBPF agent improvements
+
+Back in June, the
+[donation of the Elastic Continuous Profiling Agent](/blog/2024/elastic-contributes-continuous-profiling-agent/)
+was finalized. Since then, the
+[opentelemetry-ebpf-profiler](https://github.com/open-telemetry/opentelemetry-ebpf-profiler)
+repository has been buzzing with improvements.
+
+Our next goal for the eBPF agent is for it to run as a Collector receiver. Once
+this is complete, the Collector can be run on every node as an agent, which
+collects profiles for that host and forwards them using OTLP. This architecture
+will allow us to extract some specific parts of the agent that aren’t strictly
+profiling-related, such as retrieving host metadata and system metrics, and move
+them to processors, making the agent lighter and more modular.
+
+## Collector support
+
+Since
+[v0.112.0](https://github.com/open-telemetry/opentelemetry-collector/releases/tag/v0.112.0),
+the OpenTelemetry Collector can receive, process, and export profiling data,
+including profile ingestion and export using OTLP.
+
+You can try it out by enabling the `service.profilesSupport`
+[feature gate](https://github.com/open-telemetry/opentelemetry-collector/blob/main/featuregate/README.md#controlling-gates)
+in your Collector, followed by a configuration similar to the following, which
+ingests and exports data using OTLP:
+
+```yaml
+receivers:
+  otlp:
+    protocols:
+      grpc:
+exporters:
+  otlp:
+    endpoint: 'localhost:4317'
+service:
+  pipelines:
+    profiles:
+      receivers: [otlp]
+      exporters: [otlp]
+```
+
+While you can already use this feature in the Collector, we do not yet recommend
+doing so in production: it is still under heavy development and is expected to
+have breaking changes, such as the OTLP changes mentioned above.
+
+However, this support in the Collector means that any Collector receiver,
+processor, or exporter can now start adding profiles support. We strongly
+encourage this, both to allow smoother integration in the future and to find
+potential issues early. If you want to report a bug or contribute to this
+effort, see the
+[help-wanted profiling issues in the contrib repository](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22+label%3Adata%3Aprofiles).
+
+## Semantic Conventions and Specification
+
+To improve interoperability, the Profiling SIG has also worked on
+[OpenTelemetry Semantic Conventions for profiling](/docs/specs/semconv/attributes-registry/profile/).
+In addition, there is ongoing work to introduce a
+[profiling OpenTelemetry specification](https://github.com/open-telemetry/opentelemetry-specification/pull/4197).
+This work will continue and should enable broad adoption across different
+platforms, tools, and other OTel signals.
+
+## What’s next?
+
+Support for profiles in OpenTelemetry is moving very quickly, and while we’re
+still far from being able to provide a stable signal, we are happy to report
+that folks can start hacking on it and integrating it into their modules.
+
+If you’re interested in helping profiling move forward, or run into issues when
+integrating it, the Profiling SIG is always happy to give or receive help.
+
+You can find us on
+[#otel-profiles](https://cloud-native.slack.com/archives/C03J794L0BV) in the
+CNCF Slack.
diff --git a/static/refcache.json b/static/refcache.json
index 508800b3d219..0871d6726e69 100644
--- a/static/refcache.json
+++ b/static/refcache.json
@@ -1831,6 +1831,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-08-09T10:45:49.257983-04:00"
   },
+  "https://cloud-native.slack.com/archives/C03J794L0BV": {
+    "StatusCode": 200,
+    "LastSeen": "2024-10-24T15:10:31.184402+02:00"
+  },
   "https://cloud-native.slack.com/archives/C041APFBYQP": {
     "StatusCode": 200,
     "LastSeen": "2024-01-30T05:18:18.947225-05:00"
@@ -4999,6 +5003,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-06-12T11:21:46.656082+02:00"
   },
+  "https://github.com/google/pprof": {
+    "StatusCode": 200,
+    "LastSeen": "2024-10-24T15:10:16.695786+02:00"
+  },
   "https://github.com/gosnmp/gosnmp": {
     "StatusCode": 200,
     "LastSeen": "2024-01-18T19:55:40.84138-05:00"
@@ -5991,6 +5999,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-30T16:14:54.058976-05:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues": {
+    "StatusCode": 200,
+    "LastSeen": "2024-10-24T15:10:27.834953+02:00"
+  },
   "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/16462": {
     "StatusCode": 200,
     "LastSeen": "2024-01-30T05:18:40.093521-05:00"
@@ -6243,6 +6255,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-07-02T09:23:49.72181125Z"
   },
+  "https://github.com/open-telemetry/opentelemetry-collector/releases/tag/v0.112.0": {
+    "StatusCode": 200,
+    "LastSeen": "2024-10-24T15:10:25.832305+02:00"
+  },
   "https://github.com/open-telemetry/opentelemetry-collector/releases/tag/v0.63.0": {
     "StatusCode": 200,
     "LastSeen": "2024-01-30T16:04:58.261649-05:00"
@@ -6503,6 +6519,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-30T16:15:25.802104-05:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-ebpf-profiler": {
+    "StatusCode": 200,
+    "LastSeen": "2024-10-24T15:10:22.597683+02:00"
+  },
   "https://github.com/open-telemetry/opentelemetry-erlang": {
     "StatusCode": 200,
     "LastSeen": "2024-01-18T19:10:24.771487-05:00"
@@ -6971,10 +6991,18 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-18T19:37:06.679199-05:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-proto/issues/567#issuecomment-2286565449": {
+    "StatusCode": 200,
+    "LastSeen": "2024-10-24T15:10:18.85325+02:00"
+  },
   "https://github.com/open-telemetry/opentelemetry-proto/issues/new": {
     "StatusCode": 200,
     "LastSeen": "2024-08-09T10:45:27.522647-04:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-proto/releases/tag/v1.3.0": {
+    "StatusCode": 200,
+    "LastSeen": "2024-10-24T15:10:14.278497+02:00"
+  },
   "https://github.com/open-telemetry/opentelemetry-python": {
     "StatusCode": 200,
     "LastSeen": "2024-01-18T19:37:16.269952-05:00"
@@ -7123,6 +7151,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-01-18T20:05:26.46768-05:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-specification/pull/4197": {
+    "StatusCode": 200,
+    "LastSeen": "2024-10-24T15:10:29.718998+02:00"
+  },
   "https://github.com/open-telemetry/opentelemetry-specification/releases/tag/v1.17.0": {
     "StatusCode": 200,
     "LastSeen": "2024-01-30T05:18:18.661983-05:00"