Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for exporting internal metrics via OpenTelemetry library. #3816

Merged
merged 1 commit into from
Sep 23, 2021

Conversation

kirbyquerby
Copy link
Member

Description: add support for exporting internal metrics via OpenTelemetry library.

This change adds a new flag --metrics-backend which can be set to opentelemetry to use the OpenTelemetry Prometheus exporter for internal metrics instead of the current OpenCensus implementation. The flag defaults to opencensus, preserving the current functionality. There will be follow up PRs that actually set up the existing internal metrics to be exported via the OpenTelemetry library (currently, the exporter will start but will not expose any metrics).

Link to tracking Issue: Updates #816

Testing: Added unit tests for the new flag and manually tested to make sure the current internal metrics are still exported correctly using OpenCensus when --metrics-backend isn't set.

/cc @odeke-em

@kirbyquerby kirbyquerby requested review from a team and tigrannajaryan August 10, 2021 19:21
Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kirbyquerby Thanks for starting this effort.

config/configtelemetry/configtelemetry.go Outdated Show resolved Hide resolved
service/telemetry.go Show resolved Hide resolved
@alolita
Copy link
Member

alolita commented Aug 16, 2021

@Aneurysm9 @rakyll @dashpole can you please review?

@kirbyquerby
Copy link
Member Author

@bogdandrutu would you mind taking a look at this PR?

Copy link
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean we will have to maintain OC and OTel instrumentation side-by-side for a long time? It seems likely that OC and OTel instrumentation will diverge subtly over time without extensive testing...

As an alternative, I would propose that we use both opencensus and opentelemetry client libraries at the same time while we migrate. We can use the OpenCensus bridge for OpenTelemetry to make both opencensus and opentelemetry clients serve metrics on the same prometheus endpoint.

We could roll this out with the following steps:

  1. Switch from the OpenCensus prometheus exporter to a "wrapped" OpenTelemetry prometheus exporter. Verify that self-obs metrics on the prometheus endpoint remain the same.
  2. Add the OpenTelemetry API/SDK and use the same prometheus exporter that OC uses.
  3. Migrate each metric from OC to OTel individually, and verify that the metric doesn't change.

@kirbyquerby
Copy link
Member Author

@dashpole thanks for the suggestion! I tried the bridge, but ran into two problems:

Any idea where I'm going wrong?

@alolita
Copy link
Member

alolita commented Aug 24, 2021

@dashpole thoughts? It looks like your suggestion doesn't fit the use case being tested.

@dashpole
Copy link
Contributor

I looked into this yesterday, and it definitely won't work as-is. But I think the bridge should support this use-case, so i'm going to do some further investigation to see if I can extend the bridge to cover this.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are starting with a "public" change that we don't want to support in the future (flags), and the flag is a no-op. Why do we need this then?

@kirbyquerby
Copy link
Member Author

@bogdandrutu The intent of the flag is to allow us to iteratively migrate without breaking existing functionality and then delete the OpenCensus code once we're sure everything works. The alternative would be to switch everything out all at once -- would that be preferable?

@bogdandrutu
Copy link
Member

Not sure how is the plan going to work, please describe. I am not suggesting to do a big change, but to do incremental changes that will not be hocked until the last PR when we are ready with everything, then we can have that change flag protected

@kirbyquerby
Copy link
Member Author

@bogdandrutu the current plan is to make incremental changes guarded behind a flag that defaults to OpenCensus, change the default to OpenTelemetry once we're fairly confident we have feature parity, and then finally remove the flag (along with the OpenCensus code) once we're confident the migration is stable.

Are you suggesting we still make incremental changes but make the new code completely unreachable, and only add the flag once all of the code has been added?

@kirbyquerby
Copy link
Member Author

@bogdandrutu I've removed the flag and made a const bool to control the behavior instead. PTAL.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@Aneurysm9
Copy link
Member

@bogdandrutu it looks like the use of a constant that can be set at build time to control the choice of OTel or OpenCensus metrics will be easily converted to a feature gate when we have such a system. I think this PR can be merged as-is.

@kirbyquerby
Copy link
Member Author

Thanks for taking a look, Anthony! I've rebased to fix the go.mod/go.sum merge conflicts.

@bogdandrutu bogdandrutu merged commit 6318840 into open-telemetry:main Sep 23, 2021
@kirbyquerby kirbyquerby deleted the issue-816-1 branch September 23, 2021 19:11
tigrannajaryan pushed a commit that referenced this pull request Oct 1, 2021
Added a version tag to the collector's own telemetry metrics because today when looking at those metrics there is no way to know the collector's version, this can also help to know what collector versions are running out there.
I based the "flag" to control this on [PR 3816](#3816)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants