nnf-dm + clientmountd cpu usage #117
We've been investigating this. Early signs point to Go garbage collection and HTTP/2 health checks. Both can be tuned.
@behlendorf which version of Go are you using to build the daemons?
We're building with the RHEL 8.9 version of Go.
We have found that we can reduce the CPU usage by approximately 66% by tuning garbage collection and the frequency of the HTTP/2 health checks. Compared with your original observation of 10m of CPU time over 14 days, we were able to get CPU usage down to 3m26s over 12 days (22-Dec to 02-Jan). This can be done by setting the following environment variables in the systemd unit file:
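The variable list itself was not captured above, so the following is only a sketch of what such a systemd drop-in might look like. `GOGC` is the standard Go runtime GC knob, and the `HTTP2_*` variables are the ones Kubernetes client-go consults for its HTTP/2 health-check (ping) interval; the exact names and values committed to nnf-deploy may differ, and everything shown here is illustrative.

```shell
# Hypothetical systemd drop-in sketching the tuning described above.
# GOGC (standard Go runtime knob) and the HTTP2_* variables (read by
# Kubernetes client-go for HTTP/2 health-check pings) are assumptions;
# the exact names/values committed to nnf-deploy are not shown here.
d="$(mktemp -d)/clientmountd.service.d"
mkdir -p "$d"
cat > "$d/override.conf" <<'EOF'
[Service]
# Let the heap grow 4x between GC cycles (default GOGC=100, i.e. 2x)
Environment=GOGC=400
# Send HTTP/2 health-check pings far less frequently
Environment=HTTP2_READ_IDLE_TIMEOUT_SECONDS=120
Environment=HTTP2_PING_TIMEOUT_SECONDS=30
EOF
grep -c '^Environment=' "$d/override.conf"   # → 3
```

On a real node the drop-in would live under `/etc/systemd/system/<daemon>.service.d/` and take effect after `systemctl daemon-reload` and a service restart.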
These environment variables have been checked into master in nnf-deploy for each daemon (i.e. nnf-dm, clientmountd). Those variables can be found here.
The current solution is to start/stop these daemons at will. Flux will do that: flux-framework/flux-coral2#166
We've observed that the `nnf-dm` and `clientmountd` daemons generate a surprising amount of system noise on the computes even when they should be idle. For reference, over the last 2 weeks they've been lightly used yet have racked up ~10 minutes of CPU time each. This is compared to most other idle system daemons, which report <5 seconds of CPU usage over the same time period. The `nnf-dm` and `clientmountd` usage is similar across compute nodes.

Corosync, which is required for gfs2, generates even more noise on the compute nodes. One possible mitigation would be to only start/stop the pacemaker service on computes when a gfs2 filesystem has been requested. This could be done either by Flux when setting up the computes or by `clientmountd`, which is already running there.