Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect OTel instrumentations in applications #1101

Merged
merged 16 commits into from
Aug 26, 2024

Conversation

grcevski
Copy link
Contributor

@grcevski grcevski commented Aug 22, 2024

With this PR we add support for automatically detecting OpenTelemetry traces and metrics instrumentation inside processes, to remove them from being monitored by Beyla.

General implementation details:

  1. We monitor the routes in the PIDs filter and look for /v1/traces and /v1/metrics. If we encounter them, we set flags on the ServiceID that it exports traces, metrics or both.
  2. Traces and metrics exporters, check the already exporting flags and avoid duplicating the signals.
  3. Span metrics and service graph metrics are unaffected, only OpenTelemetry metrics will be disabled.
  4. The detection is on by default, but it can be turned off in the discovery configuration.

This feature is particularly useful when we monitor all applications in a cluster. If certain applications are instrumented with the OpenTelemetry SDKs, we'll avoid them from now.

Limitations:

  1. Given that we rely on seeing a call to /v1/traces and /v1/metrics, there might be some initial time window where Beyla will duplicate metrics/traces until it detects that the application is already instrumented.
  2. We don't inspect the bodies of traces and grpc requests, therefore we are unable to selectively disable some metrics based on what's instrumented in the application.

Test plan:

  • Ensure all of our integration tests still work with the change as-is
  • Add tests with instrumented applications
  • Add tests for system wide instrumentation, which uses a different type of filter
  • Add tests for ensuring the signals will be duplicated if the feature is turned off
  • Ensure both OTel and Prom export are tested with this new mode

TODO:

  • Unit tests
  • Integration tests
  • Docs

@codecov-commenter
Copy link

codecov-commenter commented Aug 22, 2024

Codecov Report

Attention: Patch coverage is 95.37037% with 5 lines in your changes missing coverage. Please review.

Project coverage is 81.97%. Comparing base (3fd9525) to head (c3d483d).
Report is 4 commits behind head on main.

Files Patch % Lines
pkg/internal/request/span.go 85.71% 4 Missing ⚠️
pkg/export/alloy/traces.go 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1101      +/-   ##
==========================================
+ Coverage   73.63%   81.97%   +8.34%     
==========================================
  Files         172      140      -32     
  Lines       13091    11514    -1577     
==========================================
- Hits         9640     9439     -201     
+ Misses       2901     1551    -1350     
+ Partials      550      524      -26     
Flag Coverage Δ
integration-test 57.11% <86.11%> (+5.88%) ⬆️
k8s-integration-test 59.25% <70.37%> (+6.37%) ⬆️
oats-test 37.08% <52.77%> (+3.33%) ⬆️
unittests 52.37% <59.25%> (+6.74%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@marctc marctc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM so far. Do you plan add docs about the new flag in this PR?

@grcevski
Copy link
Contributor Author

Yeah, I'll add docs on how to disable and what it does.

Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff!

Copy link
Contributor

@marctc marctc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! IMHO those integration tests are a bit overkill for this feature, but if there are not causing much slowing down, should be alright!

Copy link
Contributor

@rafaelroquetto rafaelroquetto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice!

@@ -0,0 +1,24 @@
# Build the testserver binary
# Docker command must be invoked from the projec root directory
FROM golang:1.22 AS builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is still tagged as WIP, perhaps switch this to 1.23 already given #1105 is also coming

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It anyway doesn't matter, as even if Beyla is compiled with 1.23, it should support components compiled in earlier versions.

@grcevski grcevski changed the title WIP: Detect OTel instrumentations in applications Detect OTel instrumentations in applications Aug 26, 2024
@grcevski grcevski removed the wip label Aug 26, 2024
@grcevski grcevski merged commit 7adc34c into grafana:main Aug 26, 2024
8 checks passed
@grcevski grcevski deleted the detect_otel branch August 26, 2024 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants