-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for exporting internal metrics via OpenTelemetry library. #3816
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kirbyquerby Thanks for starting this effort.
@Aneurysm9 @rakyll @dashpole can you please review? |
@bogdandrutu would you mind taking a look at this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we will have to maintain OC and OTel instrumentation side-by-side for a long time? It seems likely that OC and OTel instrumentation will diverge subtly over time without extensive testing...
As an alternative, I would propose that we use both opencensus and opentelemetry client libraries at the same time while we migrate. We can use the OpenCensus bridge for OpenTelemetry to make both opencensus and opentelemetry clients serve metrics on the same prometheus endpoint.
We could roll this out with the following steps:
- Switch from the OpenCensus prometheus exporter to a "wrapped" OpenTelemetry prometheus exporter. Verify that self-obs metrics on the prometheus endpoint remain the same.
- Add the OpenTelemetry API/SDK and use the same prometheus exporter that OC uses.
- Migrate each metric from OC to OTel individually, and verify that the metric doesn't change.
@dashpole thanks for the suggestion! I tried the bridge, but ran into two problems:
Any idea where I'm going wrong? |
@dashpole thoughts? It looks like your suggestion doesn't fit the use case being tested. |
I looked into this yesterday, and it definitely won't work as-is. But I think the bridge should support this use-case, so i'm going to do some further investigation to see if I can extend the bridge to cover this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are starting with a "public" change that we don't want to support in the future (flags), and the flag is a no-op. Why do we need this then?
@bogdandrutu The intent of the flag is to allow us to iteratively migrate without breaking existing functionality and then delete the OpenCensus code once we're sure everything works. The alternative would be to switch everything out all at once -- would that be preferable? |
Not sure how is the plan going to work, please describe. I am not suggesting to do a big change, but to do incremental changes that will not be hocked until the last PR when we are ready with everything, then we can have that change flag protected |
@bogdandrutu the current plan is to make incremental changes guarded behind a flag that defaults to OpenCensus, change the default to OpenTelemetry once we're fairly confident we have feature parity, and then finally remove the flag (along with the OpenCensus code) once we're confident the migration is stable. Are you suggesting we still make incremental changes but make the new code completely unreachable, and only add the flag once all of the code has been added? |
555aa4e
to
a672918
Compare
@bogdandrutu I've removed the flag and made a const bool to control the behavior instead. PTAL. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
@bogdandrutu it looks like the use of a constant that can be set at build time to control the choice of OTel or OpenCensus metrics will be easily converted to a feature gate when we have such a system. I think this PR can be merged as-is. |
a672918
to
afc1336
Compare
Thanks for taking a look, Anthony! I've rebased to fix the go.mod/go.sum merge conflicts. |
Added a version tag to the collector's own telemetry metrics because today when looking at those metrics there is no way to know the collector's version, this can also help to know what collector versions are running out there. I based the "flag" to control this on [PR 3816](#3816)
Description: add support for exporting internal metrics via OpenTelemetry library.
This change adds a new flag
--metrics-backend
which can be set toopentelemetry
to use the OpenTelemetry Prometheus exporter for internal metrics instead of the current OpenCensus implementation. The flag defaults toopencensus
, preserving the current functionality. There will be follow up PRs that actually set up the existing internal metrics to be exported via the OpenTelemetry library (currently, the exporter will start but will not expose any metrics).Link to tracking Issue: Updates #816
Testing: Added unit tests for the new flag and manually tested to make sure the current internal metrics are still exported correctly using OpenCensus when
--metrics-backend
isn't set./cc @odeke-em