Prometheus fails to scrape Process/Runtime Instrumentation metrics due to issues with units #1617

Closed
1 of 2 tasks
kharabasz opened this issue Mar 18, 2024 · 11 comments
Labels
comp:instrumentation.runtime Things related to OpenTelemetry.Instrumentation.Runtime

Comments

@kharabasz

kharabasz commented Mar 18, 2024

Issue with OpenTelemetry.Instrumentation.Process & OpenTelemetry.Instrumentation.Runtime

List of all OpenTelemetry NuGet packages and versions that you are using (e.g. OpenTelemetry 1.3.2):

  • OpenTelemetry.Exporter.Prometheus.AspNetCore 1.8.0-beta.1
  • OpenTelemetry.Extensions.Hosting 1.8.0-beta.1
  • OpenTelemetry.Instrumentation.AspNetCore 1.7.1
  • OpenTelemetry.Instrumentation.EventCounters 1.5.1-alpha.1
  • OpenTelemetry.Instrumentation.Http 1.7.1
  • OpenTelemetry.Instrumentation.Process 0.5.0-beta.4
  • OpenTelemetry.Instrumentation.Runtime 1.7.0

Runtime version (e.g. net462, net48, net6.0, net7.0 etc. You can find this information from the *.csproj file):

  • net8.0

Is this a feature request or a bug?

  • [ ] Feature Request
  • [x] Bug

What is the expected behavior?

  • .AddProcessInstrumentation() coupled with .AddPrometheusExporter() should correctly export metrics to Prometheus.
  • .AddRuntimeInstrumentation() coupled with .AddPrometheusExporter() should correctly export metrics to Prometheus.

What is the actual behavior?

We receive errors like the following when Prometheus scrapes the target:

  • unit "seconds" not a suffix of metric "process_cpu_time_seconds_total"
  • unit "bytes" not a suffix of metric "process_runtime_dotnet_gc_allocations_size_bytes_total"

Process and Runtime metrics exported in Prometheus format do not conform to the OpenMetrics specification. In several cases the unit declared in the metadata does not match the metric name suffix, which causes the Prometheus server and the OpenTelemetry Collector's Prometheus receiver to throw errors while scraping the target.

The units are declared in the metadata for those metrics as follows:

# UNIT process_cpu_time_seconds_total seconds
...
# UNIT process_runtime_dotnet_gc_allocations_size_bytes_total bytes

OpenMetrics specification regarding units: https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#unit
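
For illustration, the check behind these errors can be sketched in a few lines of C#. This is a simplified reading of the error message and of the unit section of the specification linked above, not the Prometheus parser itself; the class and method names (UnitSuffixRule, IsSatisfied) are made up for this example.

```csharp
using System;

public static class UnitSuffixRule
{
    // Hypothetical helper: returns true when the metric name ends with
    // "_" + unit, or when no unit is declared at all. It mirrors the
    // wording of the scrape error, nothing more.
    public static bool IsSatisfied(string metricName, string unit)
    {
        if (string.IsNullOrEmpty(unit))
        {
            return true; // nothing to check without a unit
        }

        return metricName.EndsWith("_" + unit, StringComparison.Ordinal);
    }
}
```

With this sketch, UnitSuffixRule.IsSatisfied("process_cpu_time_seconds_total", "seconds") returns false, while the same call with "process_cpu_time_seconds" returns true. The trailing _total in the name used on the # UNIT line is what makes the check fail.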

Additional Context

OpenTelemetry configuration:

var otel = services.AddOpenTelemetry();

// Add OpenTelemetry Metrics and export to Prometheus.
otel.WithMetrics(meterProvider => meterProvider
    .AddAspNetCoreInstrumentation() // Inbound HTTP connections: https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/src/OpenTelemetry.Instrumentation.AspNetCore
    .AddHttpClientInstrumentation() // Outbound HTTP connections: https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/src/OpenTelemetry.Exporter.Prometheus.HttpListener
    .AddEventCountersInstrumentation(options =>
    {
        // https://learn.microsoft.com/en-us/dotnet/core/diagnostics/event-counters
        options.AddEventSources("Microsoft.Data.SqlClient.EventSource"); // https://github.com/dotnet/SqlClient/blob/v5.1.5/src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlClientEventSource.cs#L73
        options.AddEventSources("Microsoft.EntityFrameworkCore"); // https://github.com/dotnet/efcore/blob/v8.0.3/src/EFCore/Infrastructure/EntityFrameworkEventSource.cs#L45
    })
    .AddMeter("Microsoft.AspNetCore.Diagnostics") // https://learn.microsoft.com/en-us/dotnet/core/diagnostics/built-in-metrics-aspnetcore#microsoftaspnetcorediagnostics
    .AddMeter("Microsoft.AspNetCore.Routing") // https://learn.microsoft.com/en-us/dotnet/core/diagnostics/built-in-metrics-aspnetcore#microsoftaspnetcorerouting
    .AddMeter("Microsoft.AspNetCore.Server.Kestrel") // https://learn.microsoft.com/en-us/dotnet/core/diagnostics/built-in-metrics-aspnetcore#microsoftaspnetcoreserverkestrel
    .AddProcessInstrumentation() // https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Instrumentation.Process
    .AddRuntimeInstrumentation() // https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Instrumentation.Runtime
    .AddPrometheusExporter());
    
// ...

application.MapPrometheusScrapingEndpoint("/_internal/metrics");

With appsettings.json containing:

  "OTEL_DOTNET_EXPERIMENTAL_ASPNETCORE_ENABLE_GRPC_INSTRUMENTATION": "true"

kharabasz added the comp:instrumentation.runtime label Mar 18, 2024

kharabasz changed the title from "Prometheus fails to scrape Runtime Instrumentation metrics due to issues with units" to "Prometheus fails to scrape Process/Runtime Instrumentation metrics due to issues with units" Mar 19, 2024

@xiang17
Contributor

xiang17 commented Apr 1, 2024

You can configure the Prometheus exporter to disable the _total suffix with this option: AddPrometheusExporter(o => o.DisableTotalNameSuffixForCounters = true)

That option was added in open-telemetry/opentelemetry-dotnet#5305 and is included in releases starting with 1.8.0-beta.1: https://github.com/open-telemetry/opentelemetry-dotnet/blob/dbec6f845a4295f908c85909602283802514b1b2/src/OpenTelemetry.Exporter.Prometheus.AspNetCore/CHANGELOG.md?plain=1#L19

The units would look like this:

# UNIT process_cpu_time_seconds seconds

# UNIT process_runtime_dotnet_gc_allocations_size_bytes bytes

@kharabasz
Author

Thanks - I will take a look at the latest release this week!

@Abrynos

Abrynos commented Apr 5, 2024

> You can configure the Prometheus exporter to disable the _total suffix with this option: AddPrometheusExporter(o => o.DisableTotalNameSuffixForCounters = true)

This does not fix process_runtime_dotnet_jit_il_compiled_size_bytes_total.

Using the following dependencies:

<PackageVersion Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" Version="1.8.0-rc.1" />
<PackageVersion Include="OpenTelemetry.Extensions.Hosting" Version="1.8.0" />
<PackageVersion Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.8.0" />
<PackageVersion Include="OpenTelemetry.Instrumentation.Http" Version="1.8.0" />
<PackageVersion Include="OpenTelemetry.Instrumentation.Runtime" Version="1.8.0" />

@xiang17
Contributor

xiang17 commented Apr 11, 2024

@Abrynos

I couldn't reproduce your issue. Can you create a minimal repro app?

I can get the expected metric:

# TYPE process_runtime_dotnet_jit_il_compiled_size_bytes counter
# UNIT process_runtime_dotnet_jit_il_compiled_size_bytes bytes
# HELP process_runtime_dotnet_jit_il_compiled_size_bytes Count of bytes of intermediate language that have been compiled since the process start.
process_runtime_dotnet_jit_il_compiled_size_bytes{otel_scope_name="OpenTelemetry.Instrumentation.Runtime",otel_scope_version="1.8.0"} 319661 1712867337352

with this code snippet:

builder.Services.AddOpenTelemetry()
    .WithMetrics(builder => builder
        .AddPrometheusExporter(options => options.DisableTotalNameSuffixForCounters = true)
        .AddRuntimeInstrumentation()
        .AddProcessInstrumentation());

var app = builder.Build();

app.UseOpenTelemetryPrometheusScrapingEndpoint();

using the following dependencies:

    <PackageReference Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" Version="1.8.0-rc.1" />
    <PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.8.0" />
    <PackageReference Include="OpenTelemetry.Instrumentation.Process" Version="0.5.0-beta.5" />
    <PackageReference Include="OpenTelemetry.Instrumentation.Runtime" Version="1.8.0" />
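
If it helps when comparing outputs, a saved scrape response can be checked for this mismatch with a small helper along these lines. It is only a sketch: UnitLineScanner and FindMismatches are hypothetical names, and the helper only looks at # UNIT lines, assuming the simple rule that the metric name must end in an underscore followed by the unit.

```csharp
using System;
using System.Collections.Generic;

public static class UnitLineScanner
{
    // Hypothetical helper: scans exposition text and returns every
    // "# UNIT <name> <unit>" line whose name does not end in "_<unit>".
    public static List<string> FindMismatches(string expositionText)
    {
        var mismatches = new List<string>();

        foreach (var rawLine in expositionText.Split('\n'))
        {
            var line = rawLine.Trim(); // also drops a trailing '\r'
            if (!line.StartsWith("# UNIT ", StringComparison.Ordinal))
            {
                continue; // only UNIT metadata lines are inspected
            }

            // Expected shape: "#", "UNIT", metric name, unit.
            var parts = line.Split(' ', StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length != 4)
            {
                continue; // not a well-formed UNIT line; skipped in this sketch
            }

            var name = parts[2];
            var unit = parts[3];
            if (!name.EndsWith("_" + unit, StringComparison.Ordinal))
            {
                mismatches.Add(line);
            }
        }

        return mismatches;
    }
}
```

Passing it the text returned by the scraping endpoint gives back an empty list when every # UNIT line satisfies the rule, and otherwise the offending lines.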

@Abrynos

Abrynos commented Apr 17, 2024

@xiang17 I'm sorry, but it seems to have been a fluke that I can't reproduce anymore. Feel free to ignore my comment.

@DamianEdwards

DamianEdwards commented Apr 19, 2024

I'm seeing this issue too in the .NET Aspire samples repo after updating to OpenTelemetry.Exporter.Prometheus.AspNetCore 1.8.0-rc.1.

To work around it fully, I had to downgrade to 1.8.0-beta.1 and disable the _total suffix with the option highlighted above, i.e.:

builder.Services.AddOpenTelemetry()
    // BUG: Part of the workaround for https://github.com/open-telemetry/opentelemetry-dotnet-contrib/issues/1617
    .WithMetrics(metrics => metrics.AddPrometheusExporter(options => options.DisableTotalNameSuffixForCounters = true));

@JanVargovsky

The _total suffixes have been there for a while, and Prometheus had no problems with this format:

# TYPE process_runtime_dotnet_gc_allocations_size_bytes_total counter
# UNIT process_runtime_dotnet_gc_allocations_size_bytes_total bytes
# HELP process_runtime_dotnet_gc_allocations_size_bytes_total Count of bytes allocated on the managed GC heap since the process start. .NET objects are allocated from this heap. Object allocations from unmanaged languages such as C/C++ do not use this heap.
process_runtime_dotnet_gc_allocations_size_bytes_total 261876203736 1713947149366

But when I upgraded from 1.7.0/1.7.1 to 1.8.1/1.8.0-rc.1 (detailed NuGet versions below), it added otel_* labels, and now Prometheus returns the error mentioned above:

# TYPE process_runtime_dotnet_gc_allocations_size_bytes_total counter
# UNIT process_runtime_dotnet_gc_allocations_size_bytes_total bytes
# HELP process_runtime_dotnet_gc_allocations_size_bytes_total Count of bytes allocated on the managed GC heap since the process start. .NET objects are allocated from this heap. Object allocations from unmanaged languages such as C/C++ do not use this heap.
process_runtime_dotnet_gc_allocations_size_bytes_total{otel_scope_name="OpenTelemetry.Instrumentation.Runtime",otel_scope_version="1.8.0"} 9932785464 1713947144241

Old NuGet packages:

        <PackageReference Include="OpenTelemetry" Version="1.7.0" />
        <PackageReference Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" Version="1.7.0-rc.1" />
        <PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.7.0" />
        <PackageReference Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.7.1" />
        <PackageReference Include="OpenTelemetry.Instrumentation.Http" Version="1.7.1" />
        <PackageReference Include="OpenTelemetry.Instrumentation.Runtime" Version="1.7.0" />

New NuGet packages:

        <PackageReference Include="OpenTelemetry" Version="1.8.1" />
        <PackageReference Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" Version="1.8.0-rc.1" />
        <PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.8.1" />
        <PackageReference Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.8.1" />
        <PackageReference Include="OpenTelemetry.Instrumentation.Http" Version="1.8.1" />
        <PackageReference Include="OpenTelemetry.Instrumentation.Runtime" Version="1.8.0" />

@xiang17
Contributor

xiang17 commented Sep 17, 2024

@JanVargovsky please take a look at the above solution with DisableTotalNameSuffixForCounters. It should have fixed the issue.

@kharabasz
Author

kharabasz commented Sep 25, 2024

I can confirm the DisableTotalNameSuffixForCounters option fixed our issues. Thanks!

@Kielek
Contributor

Kielek commented Sep 25, 2024

@kharabasz, do you need more help here, or can we close the issue?

@kharabasz
Author

@Kielek It can be closed - thank you
