[Do Not Merge] Add description doc for DotNet Runtime metrics #404

Closed (wants to merge 15 commits)

`src/OpenTelemetry.Instrumentation.Runtime/description.md` (175 additions, 0 deletions)

# Runtime metrics description

Review comment (Member):

Maybe "Runtime metrics overview" or "Runtime metric details" here instead of "description"? Also we should link to this from README.


Metric names are prefixed with the `process.runtime.dotnet.` namespace, following
the general guidance for runtime metrics in the
[OpenTelemetry specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/runtime-environment-metrics.md#runtime-environment-specific-metrics---processruntimeenvironment).
The process metrics are the exception: they use the `process.` namespace.
Instrument units [follow](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/README.md#instrument-units)
the Unified Code for Units of Measure, as the specification recommends.
Review comment (Member) on lines +3 to +7:

Re-worded this a bit:

Metric names are prefixed with the process.runtime.dotnet namespace following
the general guidance in the OpenTelemetry Specification. Instrument units follow the Unified Code for Units of Measure as outlined in the OpenTelemetry specification.


## GC related metrics

The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsGcEnabled` switch.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-----------------------------------------------|--------------------------|-----------|-------------------|------------|------------------|------------------|
| process.runtime.dotnet.**gc.count** | Garbage Collection count | `{times}` | ObservableCounter | `Int64` | gen | gen0, gen1, gen2 |

Review comment (Member):

This is suggested by @noahfalk - for folks who have been using the EventCounters, we might consider something like "legacy time in GC" and make it an explicit opt-in (with a clear API name indicating that it has an unclear definition, and we don't recommend it).

Reply (Contributor):

fwiw we should be able to give a precise definition for the legacy counter too, it just isn't always useful or intuitive. I'm pretty sure the existing counter is computed like this:

GC_n = most recent GC that has ended prior to computing the value
GC_n-1 = GC that ended prior to GC_n

T_end_n = time when GC_n ended
T_end_n-1 = time when GC_n-1 ended
T_start_n = time when GC_n started

inter_gc_time = T_end_n - T_end_n-1
gc_duration = T_end_n - T_start_n
counter_value = 100 * gc_duration / inter_gc_time

As an example if GC_n-1 ran from 2.1 sec to 2.2 sec and GC_n ran from 2.5 sec to 3.0 sec then
inter_gc_time = 3.0 - 2.2 = 0.8
gc_duration = 3.0 - 2.5 = 0.5
counter_value = 100 * 0.5 / 0.8 = 62.5%

People usually assume that the value measures the fraction of GC time during a fixed sampling interval (say every 5 minutes) but really it is measuring the fraction of GC time during a variable interval between the most recent GC and the one before that. If the most recent two GCs happen to be close together the counter may appear unexpectedly high and this often causes confusion.
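The formula in this comment can be modeled in a few lines of Python. This is a sketch of the arithmetic as described here, not the runtime's implementation; the function name and the second set of timestamps are hypothetical.

```python
def legacy_time_in_gc_percent(prev_gc_end: float, gc_start: float, gc_end: float) -> float:
    """Model of the legacy "% Time in GC since last GC" value (hypothetical helper).

    prev_gc_end: time in seconds when GC_n-1 ended (T_end_n-1)
    gc_start:    time in seconds when GC_n started (T_start_n)
    gc_end:      time in seconds when GC_n ended (T_end_n)
    """
    inter_gc_time = gc_end - prev_gc_end  # T_end_n - T_end_n-1
    gc_duration = gc_end - gc_start       # T_end_n - T_start_n
    return 100 * gc_duration / inter_gc_time


# Worked example from the comment: GC_n-1 ran 2.1-2.2 s, GC_n ran 2.5-3.0 s.
print(round(legacy_time_in_gc_percent(2.2, 2.5, 3.0), 1))  # 62.5

# Hypothetical case: GC_n-1 ends at 10.00 s, GC_n runs from 10.01 s to 10.05 s.
print(round(legacy_time_in_gc_percent(10.00, 10.01, 10.05), 1))  # 80.0
```

The second call shows the effect described above: GC_n took only 40 ms, but because it ended just 50 ms after GC_n-1, the counter reports 80%.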


- [GC.CollectionCount](https://docs.microsoft.com/dotnet/api/system.gc.collectioncount):
The number of times garbage collection has occurred for the specified generation
of objects.
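Because `gc.count` is an ObservableCounter, each observation carries the cumulative value of `GC.CollectionCount` for one `gen` attribute value, and the activity within an interval is the difference between consecutive observations. The Python sketch below shows that subtraction; the readings and the `collections_in_interval` helper are hypothetical.

```python
from typing import Dict


def collections_in_interval(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Collections per generation between two cumulative readings (hypothetical helper)."""
    return {gen: current[gen] - previous.get(gen, 0) for gen in current}


# Hypothetical cumulative readings of process.runtime.dotnet.gc.count, keyed by `gen`.
reading_t0 = {"gen0": 120, "gen1": 30, "gen2": 4}
reading_t1 = {"gen0": 135, "gen1": 33, "gen2": 4}

print(collections_in_interval(reading_t0, reading_t1))
# {'gen0': 15, 'gen1': 3, 'gen2': 0}
```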

### Additional GC metrics only available for NETCOREAPP3_1_OR_GREATER

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|--------------------------------------------------|--------------------------------------------------|-------|-------------------|------------|------------------|----------------------------|
| process.runtime.dotnet.**gc.allocated.bytes** | Bytes allocated over the lifetime of the process | `By` | ObservableCounter | `Int64` | | |
| process.runtime.dotnet.**gc.fragmentation.size** | GC fragmentation size | `By` | ObservableGauge | `Int64` | gen | gen0, gen1, gen2, loh, poh |

- [GC.GetTotalAllocatedBytes](https://docs.microsoft.com/dotnet/api/system.gc.gettotalallocatedbytes):
Gets a count of the bytes allocated over the lifetime of the process. The returned
value does not include any native allocations. The value is an approximate count.

- [GCGenerationInfo.FragmentationAfterBytes](https://docs.microsoft.com/dotnet/api/system.gcgenerationinfo.fragmentationafterbytes#system-gcgenerationinfo-fragmentationafterbytes):
Gets the fragmentation in bytes on exit from the reported collection.
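`gc.allocated.bytes` is cumulative, so an allocation rate is the difference between two observations divided by the time between them. The Python sketch below uses hypothetical sample values and a hypothetical `allocation_rate` helper; since `GC.GetTotalAllocatedBytes` is documented as approximate, any rate derived from it is approximate too.

```python
def allocation_rate(bytes_t0: int, bytes_t1: int, t0_seconds: float, t1_seconds: float) -> float:
    """Approximate bytes allocated per second between two cumulative observations."""
    elapsed = t1_seconds - t0_seconds
    if elapsed <= 0:
        raise ValueError("t1_seconds must be later than t0_seconds")
    return (bytes_t1 - bytes_t0) / elapsed


# Hypothetical observations of gc.allocated.bytes taken 10 seconds apart.
print(allocation_rate(1_000_000_000, 1_250_000_000, 0.0, 10.0))  # 25000000.0
```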

### Additional GC metrics only available for NET6_0_OR_GREATER

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-----------------------------------------|--------------------|-------|-----------------|------------|------------------|----------------------------|
| process.runtime.dotnet.**gc.committed** | GC Committed Bytes | `By` | ObservableGauge | `Int64` | | |
| process.runtime.dotnet.**gc.heapsize**  | GC Heap Size       | `By`  | ObservableGauge | `Int64`    | gen              | gen0, gen1, gen2, loh, poh |

- [GCMemoryInfo.TotalCommittedBytes](https://docs.microsoft.com/dotnet/api/system.gcmemoryinfo.totalcommittedbytes?view=net-6.0#system-gcmemoryinfo-totalcommittedbytes):
Gets the total committed bytes of the managed heap.

- [GC.GetGCMemoryInfo().GenerationInfo[i].SizeAfterBytes](https://docs.microsoft.com/dotnet/api/system.gcgenerationinfo):
Represents the size in bytes of a generation on exit from the GC reported in GCMemoryInfo.
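Combining `gc.fragmentation.size` with `gc.heapsize` for the same `gen` value gives a rough fragmentation ratio per generation. The Python sketch below is a hypothetical derived calculation, not something this package reports; it assumes the fragmentation bytes are counted within the generation size and that both values come from the same GC. The sample values are made up.

```python
from typing import Dict


def fragmentation_ratio(
    fragmentation_bytes: Dict[str, int], heap_size_bytes: Dict[str, int]
) -> Dict[str, float]:
    """Fraction of each generation's size reported as fragmentation (hypothetical helper).

    Assumes fragmentation is counted inside the generation size;
    generations with a size of zero are reported as 0.0.
    """
    ratios = {}
    for gen, size in heap_size_bytes.items():
        frag = fragmentation_bytes.get(gen, 0)
        ratios[gen] = frag / size if size > 0 else 0.0
    return ratios


# Hypothetical observations keyed by the `gen` attribute.
frag = {"gen0": 0, "gen1": 1_000, "gen2": 50_000, "loh": 200_000, "poh": 0}
size = {"gen0": 100_000, "gen1": 20_000, "gen2": 1_000_000, "loh": 800_000, "poh": 0}
print(fragmentation_ratio(frag, size))
# {'gen0': 0.0, 'gen1': 0.05, 'gen2': 0.05, 'loh': 0.25, 'poh': 0.0}
```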

## JIT Compiler related metrics

The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsJitEnabled` switch.

These metrics are only available for NET6_0_OR_GREATER.

Review comment by @CodeBlanch (Member), Jun 17, 2022:

Using the preprocessor directives in this doc like NET6_0_OR_GREATER is very technical. Maybe just say like...

These metrics are available when targeting .NET6 or later.

Review comment (Member):

I think this was opened only to solicit feedback.


| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-------------------------------------------------|--------------------------|-------------|-------------------|------------|------------------|------------------|
| process.runtime.dotnet.**il.bytes.jitted** | IL Bytes Jitted | `By` | ObservableCounter | `Int64` | | |
| process.runtime.dotnet.**methods.jitted.count** | Number of Methods Jitted | `{methods}` | ObservableCounter | `Int64` | | |
| process.runtime.dotnet.**time.in.jit** | Time spent in JIT | `ns` | ObservableCounter | `Int64` | | |

- [JitInfo.GetCompiledILBytes](https://docs.microsoft.com/dotnet/api/system.runtime.jitinfo.getcompiledilbytes?view=net-6.0#system-runtime-jitinfo-getcompiledilbytes(system-boolean)):
Gets the number of bytes of intermediate language that have been compiled.
The scope of this value is global.

- [JitInfo.GetCompiledMethodCount](https://docs.microsoft.com/dotnet/api/system.runtime.jitinfo.getcompiledmethodcount?view=net-6.0#system-runtime-jitinfo-getcompiledmethodcount(system-boolean)):
Gets the number of methods that have been compiled.
The scope of this value is global.

Review comment (Member):

Do we need to state The scope of this value is global. on all of these? Could probably just drop that since it seems obvious (IMO) or move it to the heading so we only state it once? 🤷


- [JitInfo.GetCompilationTime](https://docs.microsoft.com/dotnet/api/system.runtime.jitinfo.getcompilationtime?view=net-6.0#system-runtime-jitinfo-getcompilationtime(system-boolean)):
Gets the amount of time the JIT compiler has spent compiling methods.
The scope of this value is global.
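`JitInfo.GetCompilationTime` returns a .NET `TimeSpan`, whose ticks are 100-nanosecond units, while `time.in.jit` is reported in `ns`. The Python sketch below shows that unit conversion and one derived value, average JIT time per compiled method. It assumes the metric is derived from the tick count in this way; the helper names and sample numbers are hypothetical.

```python
NANOSECONDS_PER_TICK = 100  # one .NET TimeSpan tick is 100 ns


def ticks_to_nanoseconds(ticks: int) -> int:
    """Convert a TimeSpan tick count to nanoseconds (hypothetical helper)."""
    return ticks * NANOSECONDS_PER_TICK


def average_jit_time_ns(compilation_ticks: int, methods_jitted: int) -> float:
    """Average JIT time per compiled method, in nanoseconds (hypothetical helper)."""
    if methods_jitted == 0:
        return 0.0
    return ticks_to_nanoseconds(compilation_ticks) / methods_jitted


# Hypothetical readings: 1.5 s of JIT time (15,000,000 ticks) across 3,000 methods.
print(ticks_to_nanoseconds(15_000_000))        # 1500000000
print(average_jit_time_ns(15_000_000, 3_000))  # 500000.0
```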

## Threading related metrics

The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsThreadingEnabled` switch.

These metrics are only available for NETCOREAPP3_1_OR_GREATER.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-------------------------------------------------------------|--------------------------------------|-------------|-------------------|------------|------------------|------------------|
| process.runtime.dotnet.**monitor.lock.contention.count** | Monitor Lock Contention Count | `{times}` | ObservableCounter | `Int64` | | |
| process.runtime.dotnet.**threadpool.thread.count** | ThreadPool Thread Count | `{threads}` | ObservableGauge | `Int32` | | |
| process.runtime.dotnet.**threadpool.completed.items.count** | ThreadPool Completed Work Item Count | `{items}` | ObservableCounter | `Int64` | | |
| process.runtime.dotnet.**threadpool.queue.length** | ThreadPool Queue Length | `{items}` | ObservableGauge | `Int64` | | |
| process.runtime.dotnet.**active.timer.count** | Number of Active Timers | `{timers}` | ObservableGauge | `Int64` | | |

- [Monitor.LockContentionCount](https://docs.microsoft.com/dotnet/api/system.threading.monitor.lockcontentioncount?view=netcore-3.1):
Gets the number of times there was contention when trying to take the monitor's
lock.
- [ThreadPool.ThreadCount](https://docs.microsoft.com/dotnet/api/system.threading.threadpool.threadcount?view=netcore-3.1):
Gets the number of thread pool threads that currently exist.
- [ThreadPool.CompletedWorkItemCount](https://docs.microsoft.com/dotnet/api/system.threading.threadpool.completedworkitemcount?view=netcore-3.1):
Gets the number of work items that have been processed so far.
- [ThreadPool.PendingWorkItemCount](https://docs.microsoft.com/dotnet/api/system.threading.threadpool.pendingworkitemcount?view=netcore-3.1):
Gets the number of work items that are currently queued to be processed.
- [Timer.ActiveCount](https://docs.microsoft.com/dotnet/api/system.threading.timer.activecount?view=netcore-3.1):
Gets the number of timers that are currently active. An active timer is registered
to tick at some point in the future, and has not yet been canceled.
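The threading metrics mix cumulative counters (`monitor.lock.contention.count`, `threadpool.completed.items.count`) with point-in-time gauges (`threadpool.queue.length`, `threadpool.thread.count`, `active.timer.count`). The Python sketch below shows how the two kinds are typically read differently: counters are differenced into per-second rates, while gauges are used as-is. The `ThreadingSnapshot` type, the `per_second` helper, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThreadingSnapshot:
    """Hypothetical snapshot of some threading metrics at one point in time."""
    timestamp_s: float
    completed_items: int   # threadpool.completed.items.count (cumulative)
    lock_contentions: int  # monitor.lock.contention.count (cumulative)
    queue_length: int      # threadpool.queue.length (point-in-time gauge)


def per_second(earlier: ThreadingSnapshot, later: ThreadingSnapshot) -> dict:
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("snapshots must be in chronological order")
    return {
        "completed_items_per_s": (later.completed_items - earlier.completed_items) / elapsed,
        "lock_contentions_per_s": (later.lock_contentions - earlier.lock_contentions) / elapsed,
        # Gauges are not differenced: the latest value is meaningful on its own.
        "queue_length": later.queue_length,
    }


a = ThreadingSnapshot(timestamp_s=0.0, completed_items=10_000, lock_contentions=50, queue_length=3)
b = ThreadingSnapshot(timestamp_s=5.0, completed_items=12_500, lock_contentions=65, queue_length=7)
print(per_second(a, b))
# {'completed_items_per_s': 500.0, 'lock_contentions_per_s': 3.0, 'queue_length': 7}
```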

## Process related metrics

Review comment (Member):

I have a general question - do we want these to be in the OpenTelemetry.Instrumentation.Runtime package, or we want them to be in a different package (and if yes, what should be the name of that package)?

Reply (Contributor):

I'd vote separate package and OpenTelemetry.Instrumentation.Process as a potential name?

Reply by @xiang17 (Contributor, Author), Jun 21, 2022:

Created a PR to remove it: #446, and an issue to add them in a new package: #447.

Reply:

I have the CPU metrics percentage collector and will also push that to the new namespace (#437). Thanks @reyang for the ping about the new namespace.


The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsProcessEnabled` switch.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-------------------------|----------------------------------------|----------------|-------------------|------------|------------------|------------------|
| process.cpu.utilization | CPU utilization of this process | `1` | ObservableGauge | `Double` | | |
| process.cpu.time | Processor time of this process | `s` | ObservableCounter | `Int64` | state | user, system |
| process.memory.usage | The amount of physical memory in use | `By` | ObservableGauge | `Int64` | | |
| process.memory.virtual | The amount of committed virtual memory | `By` | ObservableGauge | `Int64` | | |

- CPU utilization:
- [Process.TotalProcessorTime](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.totalprocessortime)
divided by ([Environment.ProcessorCount](https://docs.microsoft.com/dotnet/api/system.environment.processorcount)
\* ([DateTime.Now](https://docs.microsoft.com/dotnet/api/system.datetime.now) -
[Process.StartTime](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.starttime)))

- CPU Time:
- [Process.UserProcessorTime](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.userprocessortime):
Gets the user processor time for this process.
- [Process.PrivilegedProcessorTime](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.privilegedprocessortime):
Gets the privileged processor time for this process.

- Memory usage: [Process.GetCurrentProcess().WorkingSet64](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.workingset64):
Gets the amount of physical memory, in bytes, allocated for the currently
active process.
- Memory virtual: [Process.GetCurrentProcess().VirtualMemorySize64](https://docs.microsoft.com/dotnet/api/system.diagnostics.process.virtualmemorysize64):
Gets the amount of the virtual memory, in bytes, allocated for the currently
active process.
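The CPU utilization formula and the `state` split for `process.cpu.time` can be sketched in Python, with `datetime`/`timedelta` standing in for the .NET `DateTime`/`TimeSpan` values. The function names and sample values are hypothetical. As written, the formula averages CPU usage over the whole lifetime of the process rather than over a recent window.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def lifetime_cpu_utilization(
    total_processor_time: timedelta,
    processor_count: int,
    start_time: datetime,
    now: datetime,
) -> float:
    """TotalProcessorTime / (ProcessorCount * (Now - StartTime))."""
    return total_processor_time / (processor_count * (now - start_time))


def cpu_time_measurements(
    user: timedelta, privileged: timedelta
) -> List[Tuple[float, Dict[str, str]]]:
    """CPU time in (fractional) seconds, tagged with the `state` attribute."""
    return [
        (user.total_seconds(), {"state": "user"}),
        (privileged.total_seconds(), {"state": "system"}),
    ]


# Hypothetical process: started 100 s ago on a 4-processor machine.
start = datetime(2022, 6, 1, 12, 0, 0)
now = start + timedelta(seconds=100)
user_time = timedelta(seconds=60)
privileged_time = timedelta(seconds=20)

# TotalProcessorTime is the sum of user and privileged processor time.
print(lifetime_cpu_utilization(user_time + privileged_time, 4, start, now))  # 0.2
print(cpu_time_measurements(user_time, privileged_time))
# [(60.0, {'state': 'user'}), (20.0, {'state': 'system'})]
```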

Question: the EventCounter implementation exposes a metric named `working-set`
backed by `Environment.WorkingSet`. Is it equal to the
`Process.GetCurrentProcess().WorkingSet64` property? I need to decide which is more
suitable for showing users the memory usage of the process, or whether to include both.

- [Environment.WorkingSet](https://docs.microsoft.com/en-us/dotnet/api/system.environment.workingset?view=net-6.0):
A 64-bit signed integer containing the number of bytes of physical memory mapped
to the process context.

## Assemblies related metrics

The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsAssembliesEnabled` switch.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|-------------------------------------------|-----------------------------|----------------|-----------------|------------|------------------|------------------|
| process.runtime.dotnet.**assembly.count** | Number of Assemblies Loaded | `{assemblies}` | ObservableGauge | `Int64` | | |

- [AppDomain.GetAssemblies](https://docs.microsoft.com/dotnet/api/system.appdomain.getassemblies):
Gets the assemblies that have been loaded into the execution context of this
application domain. The metric reports how many assemblies are returned.

## Exception counter metric

The metrics in this section can be enabled by setting the
`RuntimeMetricsOptions.IsExceptionCounterEnabled` switch.

| Name                                       | Description                                 | Units          | Instrument Type   | Value Type | Attribute Key(s) | Attribute Values |
|--------------------------------------------|---------------------------------------------|----------------|-------------------|------------|------------------|------------------|
| process.runtime.dotnet.**exception.count** | Number of exceptions thrown in managed code | `{exceptions}` | ObservableCounter | `Int64`    |                  |                  |

- [AppDomain.FirstChanceException](https://docs.microsoft.com/dotnet/api/system.appdomain.firstchanceexception):
Occurs when an exception is thrown in managed code, before the runtime searches
the call stack for an exception handler in the application domain.
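An ObservableCounter for exceptions needs a running total that an event handler increments and the metric callback reads. The Python sketch below models that pattern with a lock-protected counter; the class and method names are hypothetical stand-ins for a `FirstChanceException` handler and an observable-counter callback, not this package's code. Because the event fires before any handler is found, the total includes exceptions that are later caught.

```python
import threading


class ExceptionCounter:
    """Hypothetical running total, incremented per exception and read on collection."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._count = 0

    def on_first_chance_exception(self) -> None:
        # Stand-in for the AppDomain.FirstChanceException event handler.
        with self._lock:
            self._count += 1

    def observe(self) -> int:
        # Stand-in for the ObservableCounter callback: report the cumulative total.
        with self._lock:
            return self._count


counter = ExceptionCounter()
for _ in range(3):
    counter.on_first_chance_exception()
print(counter.observe())  # 3
```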

## Currently out of scope

process.runtime.dotnet.**time-in-gc** is out of scope for now. (Its DisplayName in the
[EventCounter implementation](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/RuntimeEventSource.cs#L96)
is "% Time in GC since last GC".) A new metric should replace it, based on the new
`GC.GetTotalPauseDuration()` API. That API has been added to the code but is not yet
available; it is targeted for the 7.0.0 milestone in the .NET Runtime repo.
See [dotnet/runtime#65989](https://github.com/dotnet/runtime/issues/65989).
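For contrast with the legacy counter, a cumulative pause total such as the one `GC.GetTotalPauseDuration()` returns can be turned into a fraction over a fixed interval by differencing two samples. The Python sketch below assumes that approach; how the replacement metric will actually be defined is not settled in this document, and the `gc_pause_fraction` helper and sample values are hypothetical.

```python
def gc_pause_fraction(
    pause_t0_s: float, pause_t1_s: float, wall_t0_s: float, wall_t1_s: float
) -> float:
    """Fraction of wall-clock time spent paused for GC over a fixed interval.

    pause_t0_s / pause_t1_s: cumulative GC pause time (seconds) at each sample
    wall_t0_s / wall_t1_s:   wall-clock time (seconds) at each sample
    """
    interval = wall_t1_s - wall_t0_s
    if interval <= 0:
        raise ValueError("wall_t1_s must be later than wall_t0_s")
    return (pause_t1_s - pause_t0_s) / interval


# Hypothetical cumulative pause totals sampled 60 s apart: 1.2 s of pauses in that minute.
print(round(gc_pause_fraction(10.0, 11.2, 0.0, 60.0), 3))  # 0.02
```

Unlike the legacy value, the denominator here is the sampling interval chosen by the caller, so two GCs that happen to run close together do not inflate the result.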