Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Instrumentation.Process] Implement threadSafe Refresh() #664

Merged
merged 13 commits into from
Oct 10, 2022
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ public static MeterProviderBuilder AddProcessInstrumentation(
configure?.Invoke(options);

var instrumentation = new ProcessMetrics(options);
builder.AddMeter(ProcessMetrics.MeterInstance.Name);
builder.AddMeter(instrumentation.MeterInstance.Name);
return builder.AddInstrumentation(() => instrumentation);
}
}
48 changes: 31 additions & 17 deletions src/OpenTelemetry.Instrumentation.Process/ProcessMetrics.cs
Original file line number Diff line number Diff line change
Expand Up @@ -20,42 +20,56 @@

namespace OpenTelemetry.Instrumentation.Process;

internal class ProcessMetrics
internal sealed class ProcessMetrics
{
internal static readonly AssemblyName AssemblyName = typeof(ProcessMetrics).Assembly.GetName();
internal static readonly Meter MeterInstance = new(AssemblyName.Name, AssemblyName.Version.ToString());
private static readonly Diagnostics.Process CurrentProcess = Diagnostics.Process.GetCurrentProcess();
internal readonly Meter MeterInstance = new(AssemblyName.Name, AssemblyName.Version.ToString());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering, if creating 2 instances of ProcessMetrics class, do we expect every instance create its own meter with the same name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that in the SDK, MetricStreamIdentity is maintained, so there will be only one MetricStreamIdentity in this case.

What would happen is both of the callbacks from the instruments would be firing, but there will only be one export.
The second one will hit this https://github.com/open-telemetry/opentelemetry-dotnet/blob/7f71283c0bf2ee0e2c5e3c9617295ddaf20c7a0c/src/OpenTelemetry/Metrics/MetricReaderExt.cs#L51

Please refer to the testcases for the experiments I've done.
Thank you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there will be only one MetricStreamIdentity in this case.

I doubt that. MetricStreamIdentity is maintained at a MetricReader level within the SDK. In the scenario, where we have two instances of ProcessMetrics class due to multiple MeterProviders we would also have multiple readers each maintaining their own set of MetricStreamIdentity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I think each reader has its own storage with the current implementation - usage of instance for ProcessMetrics;
so it may not be required to maintain MetricStreamIdentity at the reader level.


static ProcessMetrics()
private readonly Diagnostics.Process currentProcess = Diagnostics.Process.GetCurrentProcess();
private double? memoryUsage;
private double? virtualMemoryUsage;

public ProcessMetrics(ProcessInstrumentationOptions options)
{
// TODO: change to ObservableUpDownCounter
MeterInstance.CreateObservableGauge(
this.MeterInstance.CreateObservableGauge(
"process.memory.usage",
() =>
{
CurrentProcess.Refresh();
return CurrentProcess.WorkingSet64;
if (!this.memoryUsage.HasValue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these callbacks to be thread-safe they need to be inside a lock. For example:

() =>
{
    lock(this)
    {
        if (!this.memoryUsage.HasValue)
        {
            this.Snapshot();
        }
        var value = this.memoryUsage.Value;
        this.memoryUsage = null;
        return value;
    }
}

{
this.Snapshot();
}

var value = this.memoryUsage.Value;
this.memoryUsage = null;
return value;
},
unit: "By",
description: "The amount of physical memory in use.");
description: "The amount of physical memory allocated for this process.");

// TODO: change to ObservableUpDownCounter
MeterInstance.CreateObservableGauge(
this.MeterInstance.CreateObservableGauge(
"process.memory.virtual",
() =>
{
CurrentProcess.Refresh();
return CurrentProcess.VirtualMemorySize64;
if (!this.virtualMemoryUsage.HasValue)
{
this.Snapshot();
}

var value = this.virtualMemoryUsage.Value;
this.virtualMemoryUsage = null;
return value;
},
unit: "By",
description: "The amount of committed virtual memory.");
description: "The amount of virtual memory allocated for this process that cannot be shared with other processes.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting "process.memory.virtual" to be Process.VirtualMemorySize64. I saw you linked to #673 which had a very nice explanation for why you made the change. I think you were attempting to be very thorough and precise in making this choice so I want to offer something equally detailed to give an alternate perspective on the problem.

When we get into details, I think 'committed memory' is unfortunately an ambiguous term and the best way to understand what the OTel spec writers intended will be for us to ask them and clarify the spec. In Mark Russinovich's blog post he repeatedly refers to memory that "counts against the commit limit", but he never actually says "Memory is 'committed memory' if and only if it counts against the commit limit". I think that would be an intuitive definition in context of his article and probably it is the definition he intended the reader to apply despite never stating it pedantically. For simplicity we can just call that his intended definition. His definition is probably technically precise for how some portion of the Windows Kernel uses the term, but I suspect it is not how most people understand and use it.

IMO the common definition of committed memory that most developers would use is that it is all the memory addresses a process can read from without triggering an access violation. This includes (at least) three categories of memory:

  • memory backed solely by RAM
  • memory backed by a swapfile/pagefile
  • memory backed by some other mapped file on disk

The first two agree with Mark's definition above so the difference is entirely whether the third bullet is included. To justify that it is common for most people to include file backed memory as part of 'committed memory' this article has a screenshot of the tool VMMap at the top. I consider VMMap to be the de-facto standard for investigating and categorizing virtual memory usage (incidentally its also a tool MarkR made). Looking at the table in that image is a column called 'Committed' and the first two rows are 'Image' and 'Mapped File'. If committed virtual memory did not include memory backed by files on disk then we'd expect those columns to be 0, but they aren't.

As a second example, this link is a help page for the !address command in windbg which shows information about the virtual address space. There is an example output part way down the page showing:

0:000> !address 75831234
Usage:                  Image
Base Address:           75831000
End Address:            758f6000
Region Size:            000c5000
Type:                   01000000MEM_IMAGE
State:                  00001000MEM_COMMIT
Protect:                00000020PAGE_EXECUTE_READ
More info:              lmv m kernel32
More info:              !lmi kernel32
More info:              ln 0x75831234

The debugger says that this address is part of the kernel32 image and it is on a memory page that is marked both as being MEM_IMAGE and MEM_COMMIT. Mark's blog post says specifically that "The virtual memory in Testlimit's address space where its executable and system DLL images are mapped therefore don't count toward the commit limit" and yet windbg describes this exact type of memory as being MEM_COMMIT.

I think the way VMMap and Windbg refer to committed memory is the way most developers would use the term. That definition corresponds to using Process.VirtualMemorySize64 as the metric value. Of course we can always ask the OTel spec writers if that is the definition they intended. Hope that helps!

}

/// <summary>
/// Initializes a new instance of the <see cref="ProcessMetrics"/> class.
/// </summary>
/// <param name="options">The options to define the metrics.</param>
public ProcessMetrics(ProcessInstrumentationOptions options)
private void Snapshot()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you seen any practical examples where the lack of a snapshot caused trouble? For comparison, the CPU usage, GC memory usage, and total VM usage EventCounters have been present in .NET Core since .NET Core 3.1, there is nothing that forces them to be read as a single snapshot, and I am not aware of any user feedback saying that it has ever caused someone trouble.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing the comment here:
#718

{
this.currentProcess.Refresh();
this.memoryUsage = this.currentProcess.WorkingSet64;
this.virtualMemoryUsage = this.currentProcess.PrivateMemorySize64;
}
}