Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collector metrics names reported with OpenCensus and OpenTelemetry are inconsistent #5882

Closed
dmitryax opened this issue Aug 10, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@dmitryax
Copy link
Member

Currently Collector uses OpenCensus SDK to report its own metrics. Migration to OpenTelemetry metrics SDK started with a feature gate telemetry.useOtelForInternalMetrics. There is a significant difference in the produced metrics.

OpenCensus metrics utilize the prometheus exporter with the default metrics namespace set to otelcol. It means that all the reported metrics get the otelcol prefix, and we have metrics reported as otelcol_exporter_enqueue_failed_log_records, otelcol_process_uptime, etc.

OpenTelemetry metrics SDK doesn't have a capability to specify the namespace on all the metrics. So the metrics would be exporter_enqueue_failed_log_records, process_uptime, etc.

Before proceeding with the migration process, we need to decide what metric names would look like long term. In particular, do we want to keep otelcol prefix or remove it?

  1. If we want to keep the prefix, we should probably stop using the namespace configuration on the OpenCensus Prometheus exporter and make sure all the metrics have the otelcol prefix set explicitly. This should be a no-op change for end users of the default collector's metrics.

  2. If we want to remove the prefix, it will be a significant breaking change for current OTel Collector users. We should probably run it through a deprecation process using another feature gate with an extended (up to 1 year?) duration.

@dmitryax dmitryax added the bug Something isn't working label Aug 10, 2022
@dashpole
Copy link
Contributor

The "short_name" attribute is intended to be used for the prometheus namespace, but is currently a work-in-progress:

Also, I'm working on a migration OTel for OC -> OTel to help mitigate or prevent breaking changes when migrating:

@dmitryax
Copy link
Member Author

As discussed during a SIG meeting today, we will be able to move otelcol prefix from OC Prometheus exporter namespace to the Scope short_name attribute which is WIP ^. Prometheus exporter will be translating Scope short_name attribute to a prefix, so we will get consistent metrics across OC and OTel SDKs

@dmitryax
Copy link
Member Author

dmitryax commented Aug 10, 2022

@dashpole given that we currently expose Collector's metrics in Prometheus format only, do you think it makes sense to set otelcol as prefix on metrics instead of the namespace for now until we have an option to use Scope short_name? At least it'll bring the exposed metrics to a consistent state from Prometheus standpoint and will allow us to migrate to Scope short_name without breaking users with telemetry.useOtelForInternalMetrics enabled.

@bogdandrutu
Copy link
Member

The OC metrics are not at all present anyway when using telemetry.useOtelForInternalMetrics, since the bridge which is the most important part is not available.

@dashpole
Copy link
Contributor

I'm fine prefixing everything we use the OTel API for with otel for now. I'm not sure its worth the effort, but if people actually want to use otel instrumentation right now, we can definitely do that. I 100% agree with not breaking existing telemetry :)

@bogdandrutu
Copy link
Member

Just one small clarification:

Currently Collector uses OpenCensus SDK to report its own metrics. Migration to OpenTelemetry metrics SDK started with a feature gate telemetry.useOtelForInternalMetrics. There is a significant difference in the produced metrics.

Currently all these metrics are "not available" (not different) if telemetry.useOtelForInternalMetrics enabled. Do we care about the prefix fix right now or wait until the bridge is implemented?

@dashpole
Copy link
Contributor

I'd prefer waiting. I don't see the urgency in changing the OTel metric names while the OTel metric SDK is being rewritten.

@bogdandrutu
Copy link
Member

Closed by #6223 since now all metrics with prometheus otel have also the prefix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants