-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reduce resource usage #103
Conversation
The impact of this proposed change is shown in the following screenshot: The "regular" |
Differential function view comparing metricbeat (as of https://snapshots.elastic.co/8.12.0-f9668451/summary-8.12.0-SNAPSHOT.html) with metricbeat-own (metricbeat with this patch applied): |
0cb8e6e
to
80dde66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I'm not an expert in the system metrics, while I do not see any issue in caching the information for 5min, I'd like to have a second option.
@fearful-symmetry do you see any issues with caching the cgroup paths for 5min?
caching the paths seems reasonable, I think? Particularly for V2, when we're just reading everything out of the same unified hierarchy. |
metric/system/cgroup/util.go
Outdated
cacheEntry, ok := tmp.(pathListWithTime) | ||
if ok { | ||
// If the cached entry for controllerPath is not older than 5 minutes, | ||
// return the cached entry. | ||
if time.Since(cacheEntry.added) < time.Duration(5*time.Minute) { | ||
cPaths.V2 = cacheEntry.pathList.V2 | ||
continue | ||
} | ||
} | ||
// Consider the existing entry for controllerPath invalid. The entry can | ||
// (1) not be casted to pathListWithTime or (2) is older than 5 minutes. | ||
r.v2ControllerPathCache.Delete(controllerPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using a sync.Map
why not use a struct wrapping regular map and a mutex? This would give us better type safety and we wouldn't need the type assertion in this code here. It should also make the logic simpler here, as it would be just about whether the cache entry has expired or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did introduce the synchronization mechanism as it is not guaranteed that Reader
is only used in a sequential order. And as multiple Go routine can use Reader
and a regular map
is not Go thread safe, I went with sync.Map
. I did choose sync.Map
over a struct with a regular map and a mutex, as it is harder to abuse and therefore more safe. While I agree on the added complexity around sync.Map
handling, it prevents direct accessing a map in a struct, without holding the sync mutex.
Is this ok for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally get the need for a synchronization here! It's just that the documentation of sync.Map
itself doesn't recommend using it over a regular map with a mutex, citing type safety as one of the reasons :)
The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content.
I also searched through the existing elastic-agent-system-metrics
and elastic-agent
codebases and I'm not seeing any uses of sync.Map
, probably for this same reason. So I'm hesitant to break that pattern here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While sync.Map
is not used in the two named repositories, other beats and Elastic projects are using it - https://github.com/search?type=code&q=org%3Aelastic+lang%3Ago+sync.Map
I updated the PR to keep the consistency of elastic-agent-system-metrics
Reduce GC load by preallocating memory. Signed-off-by: Florian Lehner <[email protected]>
As multiple PIDs can use the same v2 cgroup, cache the result to reduce CPU usage and number of syscalls.
Make sure v2 entries do not stay forever in the cache. Signed-off-by: Florian Lehner <[email protected]>
Signed-off-by: Florian Lehner <[email protected]>
de8db9b
to
210c199
Compare
## What does this PR do? Reintroduce the improvements from #103. This PR got reverted with #113 because of #109. The issue, that got reported in #109, was fixed with #116. So its time to bring back the performance improvements by reintroducing the cache for cgroup v2. ## Why is it important? <!-- Mandatory Explain here the WHY, or the rationale/motivation for the changes. --> ## Checklist <!-- Mandatory Add a checklist of things that are required to be reviewed in order to have the PR approved List here all the items you have verified BEFORE sending this PR. Please DO NOT remove any item, striking through those that do not apply. (Just in case, strikethrough uses two tildes. ~~Scratch this.~~) --> - [x] My code follows the style guidelines of this project - [x] I have commented my code, particularly in hard-to-understand areas - [ ] ~~I have added tests that prove my fix is effective or that my feature works~~ - [ ] ~~I have added an entry in `CHANGELOG.md`~~ --------- Signed-off-by: Florian Lehner <[email protected]>
What does this PR do?
Reduce resource usage by preallocating memory and using caches.
Why is it important?
elastic/elastic-agent-system-metrics
is a direct dependency of beats likemetricbeat
.As shown in the above screenshot,
metricbeat
spends most of its time inelastic/elastic-agent-system-metrics
doing syscalls to walk directories.Checklist
I have added tests that prove my fix is effective or that my feature worksI have added an entry inCHANGELOG.md