Improve CPU/Memory metrics collection at Akka.Cluster.Metrics #4142
Comments
Also, in Scala they use the Sigar library, which seems to have bindings for .NET. @Aaronontheweb Should we port this? It would require users to have binaries of this library for their OS, but it may be a workable approach anyway.
Good to know @IgorFedchenko - I had no idea that they had .NET support. Do you know if that library does anything that requires elevated permissions?
I cannot find any particular permission requirements in related articles; it seems the quickest way is to download the binaries from here and check myself (here are some code samples I found). So we need to give it a try once we work on this issue. Here is a nice Wiki for the library.
Comparing CPU load measurements between Akka.Cluster.Metrics and the Perf Counter on Windows, they are quite accurate.
@Arkatufus https://stackoverflow.com/a/7455860/377476 Basically the
What about what I suggested above @Arkatufus ? Avoiding perf counters is a good idea given that they aren't x-plat - want some sort of abstraction that works on all supported runtimes.
FWIW In the past I've found that querying Actually, come and think of it, I frequently saw some of the
IMHO, we should probably just track WorkingSet64 and GC.GetTotalMemory - for routing purposes that's probably accurate enough. In .NET 6, as @Arkatufus pointed out in our call this morning, we can dual target and add support for the new x-plat runtime performance APIs: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/available-counters
What's the difference between Both APIs are available in .NET Standard 2.0.
Are we calling
We can implement the latest Microsoft cross platform performance metrics
I've used it to collect the comparison data to create the graphs above, it's a lot easier to use since we only have a single source of truth to get all our numbers from.
CPU numbers look good x-plat on both of the tested platforms so far - and the memory tracking issues are consistently off on both platforms, which makes me think it's just a matter of calling
Forgot to mention that I induced artificial memory pressure on the test, that's why the memory chart looks different
I'm seeing weird behavior when I'm using
So the only reason our MNTR specs are passing today: akka.net/src/contrib/cluster/Akka.Cluster.Metrics/Collectors/DefaultCollector.cs Lines 61 to 64 in 8bf4a61
closed via #6203
Introduction
Once #4126 is merged, we will need to improve the metrics loading that is implemented in DefaultCollector.
The basic idea is to collect:
Using that information, AdaptiveLoadBalancingRoutingLogic will calculate the availability of each node and perform "smart" routing.
Ideally, we should collect a more complete list:
What we have right now
CPU
There is no API available in netstandard2.1 to collect CPU metrics out of the box, like PerformanceCounters. So what we are doing now is using the Process.TotalProcessorTime property to get the processor time dedicated to the current process. Given the total time elapsed, we can estimate the CPU usage of the current process.
But for total CPU usage, this approach would require getting info on all processes with Process.GetProcesses(), which is very time consuming when there are lots of processes (especially since we have to deal with access violation exceptions here).
So total CPU usage is currently just the same as current process CPU usage. This is more or less fine for routing based on .NET process load, but not ideal if other heavy processes are running on the machine.
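The estimation described above can be sketched as follows. This is a minimal illustration, not the actual DefaultCollector code: the class name CpuUsageSketch, its method names, and the choice to normalize by Environment.ProcessorCount are assumptions made for the example.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, for illustration only.
public static class CpuUsageSketch
{
    // Fraction of the machine's total CPU capacity consumed between two
    // samples of Process.TotalProcessorTime, clamped to [0, 1].
    public static double Estimate(
        TimeSpan cpuBefore, TimeSpan cpuAfter, TimeSpan wallElapsed, int processorCount)
    {
        if (wallElapsed <= TimeSpan.Zero || processorCount <= 0)
            return 0.0;

        double cpuMs = (cpuAfter - cpuBefore).TotalMilliseconds;
        // Capacity is wall-clock time multiplied by the number of logical cores.
        double capacityMs = wallElapsed.TotalMilliseconds * processorCount;
        return Math.Min(1.0, Math.Max(0.0, cpuMs / capacityMs));
    }

    // Samples the current process twice, 'interval' apart, and estimates its CPU usage.
    public static double SampleCurrentProcess(TimeSpan interval)
    {
        using (var process = Process.GetCurrentProcess())
        {
            var stopwatch = Stopwatch.StartNew();
            TimeSpan before = process.TotalProcessorTime;
            Thread.Sleep(interval);
            process.Refresh(); // drop any cached process information before re-reading
            TimeSpan after = process.TotalProcessorTime;
            return Estimate(before, after, stopwatch.Elapsed, Environment.ProcessorCount);
        }
    }
}
```

Calling CpuUsageSketch.SampleCurrentProcess(TimeSpan.FromMilliseconds(200)) returns the share of CPU used by this process only; as noted above, load from other processes on the machine is not visible to it.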
Memory
Candidate list includes:
GC.GetTotalMemory to get the currently allocated managed memory size. There is also GC.GetGCMemoryInfo, which provides a struct with a TotalAvailableMemoryBytes property, but this method is only available starting with .NET Core 3.0, and we are targeting netstandard2.1
PerformanceCounters, which work under Windows and have a Mono implementation. There are some other Windows-only ways to get metrics.
Process class, which provides multiple memory-related properties
Using P/Invoke and working with the native API
Getting the output of OS-specific shell commands
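For reference, the cross-platform candidates from this list (GC.GetTotalMemory and the Process properties) can be read in one place, roughly like this. The MemorySnapshot type and its member names are hypothetical and exist only for this sketch.

```csharp
using System;
using System.Diagnostics;

// Hypothetical container for the memory values discussed in this issue.
public sealed class MemorySnapshot
{
    public MemorySnapshot(long managedBytes, long workingSetBytes, long privateBytes, long virtualBytes)
    {
        ManagedBytes = managedBytes;
        WorkingSetBytes = workingSetBytes;
        PrivateBytes = privateBytes;
        VirtualBytes = virtualBytes;
    }

    public long ManagedBytes { get; }     // GC.GetTotalMemory: managed heap only
    public long WorkingSetBytes { get; }  // Process.WorkingSet64: physical memory in use by the process
    public long PrivateBytes { get; }     // Process.PrivateMemorySize64: managed + unmanaged private memory
    public long VirtualBytes { get; }     // Process.VirtualMemorySize64: size of the virtual address space

    // Reads all values for the current process only.
    public static MemorySnapshot Capture()
    {
        using (var process = Process.GetCurrentProcess())
        {
            return new MemorySnapshot(
                GC.GetTotalMemory(forceFullCollection: false),
                process.WorkingSet64,
                process.PrivateMemorySize64,
                process.VirtualMemorySize64);
        }
    }
}
```

None of these values describe the machine as a whole, which is the limitation the rest of this section discusses.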
Currently, we are using the cross-platform sources available for netstandard2.1: the Process class.
First issue
Same as for CPU: it is quite heavy to get information on all processes. So the current implementation treats MemoryUsage as current process usage, which is useful, but not ideal for node routing.
Second issue
Another issue is understanding what "used" memory means, and getting "available" memory info.
To track unmanaged memory as well as managed, Process.PrivateMemorySize64 is used instead of GC.GetTotalMemory. It works well by itself. But it is hard to know the upper limit for this value, because it is not the physical memory allocated from RAM (see documentation).
Getting "available" memory is much trickier, and I did not find anything in the .NET Core SDK to get this value. Ideally, we would get the available size of installed physical memory (or the available part of it in a cloud environment). So far, Process.VirtualMemorySize64 is used, but it is just the number of bytes in the virtual address space and does not correlate much with the memory that is really available. Still, it is one of the upper bounds for available memory, and can be used to get the % of memory load (relative to other nodes).
In my understanding, the ideal would be to load the Available MBytes PerformanceCounter (but on all platforms) to get available memory, and to find some way to load the total installed memory. These two would allow us to get % Used Memory on the node and perform routing, and to provide all the different Process properties in addition, like WorkingSet, PrivateMemorySize64, and others.
Maybe there is some other convenient approach. The main idea here is that the current used / available relation is Process.PrivateMemorySize64 / Process.VirtualMemorySize64: it is always in the range [0, 1] and reflects the memory load, so we can compare nodes based on it. But a value of 0.5 does not guarantee that there is any available memory on the node at all, so we need more accurate values to calculate a node's memory capacity.
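To make the used / available relation concrete, here is a small sketch of the ratio described above. The names MemoryLoadSketch, UsedFraction, and CurrentProcessLoad are hypothetical, and the clamping is an assumption added for safety rather than something the current collector is known to do.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, for illustration only.
public static class MemoryLoadSketch
{
    // Ratio of used bytes to an upper bound, clamped to [0, 1].
    public static double UsedFraction(long usedBytes, long upperBoundBytes)
    {
        if (upperBoundBytes <= 0)
            return 0.0;
        return Math.Min(1.0, Math.Max(0.0, (double)usedBytes / upperBoundBytes));
    }

    // The relation used so far: private bytes relative to the virtual address space size.
    // The denominator is not physical RAM, so the result is only meaningful
    // when compared against the same ratio on other nodes.
    public static double CurrentProcessLoad()
    {
        using (var process = Process.GetCurrentProcess())
        {
            return UsedFraction(process.PrivateMemorySize64, process.VirtualMemorySize64);
        }
    }
}
```

If a reliable cross-platform source for total and available physical memory were found, the same UsedFraction helper could be fed (total - available) and total instead, which would give the % Used Memory value this issue is asking for.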