-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[receiver/hostmetrics] Add available memory state (optionally) #19149
[receiver/hostmetrics] Add available memory state (optionally) #19149
Conversation
…utes thatt do not sum up well by default
Foresight Summary
View More Details⭕ build-and-test-windows workflow has finished in 5 seconds (41 minutes 41 seconds less than
|
Job | Failed Steps | Tests | |
---|---|---|---|
windows-unittest-matrix | - 🔗 | N/A | See Details |
windows-unittest | - 🔗 | N/A | See Details |
✅ telemetrygen workflow has finished in 1 minute 16 seconds (51 seconds less than main
branch avg.) and finished at 14th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
build-dev | - 🔗 | N/A | See Details |
publish-latest | - 🔗 | N/A | See Details |
publish-stable | - 🔗 | N/A | See Details |
✅ check-links workflow has finished in 1 minute 27 seconds and finished at 14th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
changed files | - 🔗 | N/A | See Details |
check-links | - 🔗 | N/A | See Details |
❌ build-and-test workflow has finished in 36 minutes 41 seconds (27 minutes 47 seconds less than main
branch avg.) and finished at 14th Mar, 2023. 1 job failed.
Job | Failed Steps | Tests | |
---|---|---|---|
unittest-matrix (1.19, connector) | - 🔗 | ✅ 113 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, connector) | - 🔗 | ✅ 113 ❌ 0 ⏭ 0 🔗 | See Details |
correctness-metrics | - 🔗 | ✅ 2 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, internal) | - 🔗 | ✅ 583 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, internal) | - 🔗 | ✅ 583 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, extension) | - 🔗 | ✅ 538 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, processor) | - 🔗 | ✅ 1545 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, extension) | - 🔗 | ✅ 538 ❌ 0 ⏭ 0 🔗 | See Details |
correctness-traces | - 🔗 | ✅ 17 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, processor) | - 🔗 | ✅ 1545 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, receiver-0) | - 🔗 | ✅ 2609 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, receiver-0) | - 🔗 | ✅ 2609 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, exporter) | - 🔗 | ✅ 2488 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, exporter) | - 🔗 | ✅ 2488 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, receiver-1) | - 🔗 | ✅ 1940 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, receiver-1) | - 🔗 | ✅ 1940 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.20, other) | - 🔗 | ✅ 4728 ❌ 0 ⏭ 0 🔗 | See Details |
unittest-matrix (1.19, other) | - 🔗 | ✅ 4728 ❌ 0 ⏭ 0 🔗 | See Details |
integration-tests | - 🔗 | ✅ 55 ❌ 0 ⏭ 0 🔗 | See Details |
setup-environment | - 🔗 | N/A | See Details |
check-codeowners | - 🔗 | N/A | See Details |
check-collector-module-version | - 🔗 | N/A | See Details |
checks | CodeGen 🔗 | N/A | See Details |
lint-matrix (receiver-0) | - 🔗 | N/A | See Details |
lint-matrix (receiver-1) | - 🔗 | N/A | See Details |
lint-matrix (processor) | - 🔗 | N/A | See Details |
lint-matrix (exporter) | - 🔗 | N/A | See Details |
lint-matrix (extension) | - 🔗 | N/A | See Details |
lint-matrix (connector) | - 🔗 | N/A | See Details |
lint-matrix (internal) | - 🔗 | N/A | See Details |
lint-matrix (other) | - 🔗 | N/A | See Details |
build-examples | - 🔗 | N/A | See Details |
lint | - 🔗 | N/A | See Details |
unittest (1.20) | - 🔗 | N/A | See Details |
unittest (1.19) | - 🔗 | N/A | See Details |
cross-compile (darwin, amd64) | - 🔗 | N/A | See Details |
cross-compile (darwin, arm64) | - 🔗 | N/A | See Details |
cross-compile (linux, 386) | - 🔗 | N/A | See Details |
cross-compile (linux, amd64) | - 🔗 | N/A | See Details |
cross-compile (linux, arm) | - 🔗 | N/A | See Details |
cross-compile (linux, arm64) | - 🔗 | N/A | See Details |
cross-compile (linux, ppc64le) | - 🔗 | N/A | See Details |
cross-compile (windows, amd64) | - 🔗 | N/A | See Details |
cross-compile (windows, 386) | - 🔗 | N/A | See Details |
build-package (rpm) | - 🔗 | N/A | See Details |
windows-msi | - 🔗 | N/A | See Details |
build-package (deb) | - 🔗 | N/A | See Details |
publish-check | - 🔗 | N/A | See Details |
publish-dev | - 🔗 | N/A | See Details |
publish-stable | - 🔗 | N/A | See Details |
✅ changelog workflow has finished in 2 minutes 39 seconds and finished at 14th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
changelog | - 🔗 | N/A | See Details |
✅ load-tests workflow has finished in 6 minutes 49 seconds (6 minutes 27 seconds less than main
branch avg.) and finished at 14th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
loadtest (TestTraceAttributesProcessor) | - 🔗 | ✅ 3 ❌ 0 ⏭ 0 🔗 | See Details |
loadtest (TestIdleMode) | - 🔗 | ✅ 1 ❌ 0 ⏭ 0 🔗 | See Details |
loadtest (TestMetric10kDPS|TestMetricsFromFile) | - 🔗 | ✅ 6 ❌ 0 ⏭ 0 🔗 | See Details |
loadtest (TestTraceNoBackend10kSPS|TestTrace1kSPSWithAttrs) | - 🔗 | ✅ 8 ❌ 0 ⏭ 0 🔗 | See Details |
loadtest (TestMetricResourceProcessor|TestTrace10kSPS) | - 🔗 | ✅ 12 ❌ 0 ⏭ 0 🔗 | See Details |
loadtest (TestBallastMemory|TestLog10kDPS) | - 🔗 | ✅ 18 ❌ 0 ⏭ 0 🔗 | See Details |
setup-environment | - 🔗 | N/A | See Details |
loadtest (TestTraceBallast1kSPSWithAttrs|TestTraceBallast1kSPSAddAttrs) | - 🔗 | ✅ 10 ❌ 0 ⏭ 0 🔗 | See Details |
✅ prometheus-compliance-tests workflow has finished in 3 minutes 31 seconds (4 minutes less than main
branch avg.) and finished at 14th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
prometheus-compliance-tests | - 🔗 | ✅ 21 ❌ 0 ⏭ 0 🔗 | See Details |
✅ e2e-tests workflow has finished in 13 minutes 28 seconds and finished at 14th Mar, 2023.
Job | Failed Steps | Tests | |
---|---|---|---|
kubernetes-test (v1.26.0) | - 🔗 | N/A | See Details |
kubernetes-test (v1.25.3) | - 🔗 | N/A | See Details |
kubernetes-test (v1.24.7) | - 🔗 | N/A | See Details |
kubernetes-test (v1.23.13) | - 🔗 | N/A | See Details |
*You can configure Foresight comments in your organization settings page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a safe solution, but not 100% sure it's sustainable in the long run.
@rogercoll @djaglowski WDYT?
Another option would be to introduce additional optional metrics like system.linux.memory.slab
and system.memory.available
. BTW is the available state value useful on other platforms? If not, it can be system.linux.memory.available
receiver/hostmetricsreceiver/internal/scraper/memoryscraper/metadata.yaml
Outdated
Show resolved
Hide resolved
receiver/hostmetricsreceiver/internal/scraper/memoryscraper/config.go
Outdated
Show resolved
Hide resolved
receiver/hostmetricsreceiver/internal/scraper/memoryscraper/memory_scraper_linux.go
Outdated
Show resolved
Hide resolved
…nfig.go Co-authored-by: Dmitrii Anoshin <[email protected]>
…mory_scraper_linux.go Co-authored-by: Dmitrii Anoshin <[email protected]>
Looks like it is: https://github.com/shirou/gopsutil/blob/c7c548d6bdb46b39cd36ae0815926de70ab1b8ca/mem/mem.go#L22 |
@@ -88,6 +88,15 @@ load: | |||
cpu_average: <false|true> | |||
``` | |||
|
|||
### Memory | |||
|
|||
`enable_additional_memory_states` specifies whether to enable reporting of a couple of additional memory state metrics: `slab_reclaimable`, `slab_unreclaimable` and `available`. When those are enabled, all the memory states don't sum up well (the result exceeds `total` memory) (default: `false`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a hill I'm going to die on, but in my opinion we should be looking to replace this metric with one or more metrics that use attributes appropriately to differentiate portions of a meaningful total.
I'm not an expert at memory management but it seems someone should be able to clearly state the relationships between these values. As a (probably wrong) starting point:
total = buffered + cached + inactive + free + used
slab = slab_reclaimable + slab_unreclaimable
available = free + slab_reclaimable
?
If we had a list of these relationships, it should be fairly easy to identify a correct data model. Without it, I don't know how to evaluate such a solution except to say that overloading a metric has never been necessary elsewhere, as far as I am aware. Again, this isn't intended to be blocking - just trying to articulate why in my opinion we are having to invent novel ways to tweak a metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right, we're dealing with a complex issue that doesn't have a simple and universal solution. However, I believe that including the ability to report available memory is a valuable feature for users, despite its potential drawbacks.
Memory available only has a unique value for Linux systems, as it was introduced by the kernel in version 3.16. For other platforms, it seems that Gopsutils is computing the value from others:
I would consider introducing |
@rogercoll thank you for sharing your thoughts on this topic! I do agree with some of what you've said, but there's another issue that I think we should consider as well. While it's true that the Linux kernel provides the |
@dmitryax @djaglowski what do you think? |
I'd lean toward introducing new optional metrics It should provide users with a better source in case they want to utilize both available and total linux memory metrics. |
Fine, I can rewrite my PR according to this then. Just to be sure:
|
Right, this should be available on Linux only, but I don't think we need to have
As pointed out by @rogercoll, it looks like the metric value is unique for Linux only, and other Unix-based OSes like FreeBSD can get that value deferred from other states. I'd like to get an agreement on this from @rogercoll and @djaglowski before starting implementation. We should also start by adding Also, these issues should be handled as separate follow-up PRs. Let's keep this one for the discussion. And thank you, @lootek, for helping with that! I think we are getting closer to finally resolving #7417 |
So is this gonna duplicate what we currently have? Or replace the existing memory states at some point?
ok, this makes sense. Unless we want to provide memory available for those as a kind of "handy shortcut".
Good point, for sure we should!
ok 👍
My pleasure :) And if possible I'd like to take part in having this finally resolved. |
We will need to remove |
Since such removal would be a breaking change I guess it'll be a typical 2-stage process (first the deprecation, then actiual removal)? Or are we gonna add a config setting like I did here in this PR? |
I would not tie the metrics with the values provided by Gopsutils (the underlying library could be changed), instead, to the ones provided by each kernel. If a user requires those metrics to be provided for all the systems by the collector, the user could use the metrics generation processor to generate the new ones (at the moment does not support metric attributes, I can open an issue/pr for that) as they are currently being provided. Specs issue to add the new system.memory.slab metric into the specification, which will solve the first issue of system.memory states not summing up to the total value. Please feel free to add extra information or change the proposal. |
Right, it should be 2-step approach
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Description:
Add
available
memory state to hostmetricsreceiver.Add
enable_additional_memory_states
configuration option to memoryscraper that controls reporting of memory state metrics:slab_reclaimable
,slab_unreclaimable
andavailable
. This is because when those are enabled, all the memory states don't sum up well (the result exceedstotal
memory).Link to tracking Issue:
#7417
Testing:
Updated the relevant memoryscraper unit tests.
Documentation:
The hostmetricsreceiver/memoryscraper docs were updated accordingly.