Slow performance depending on the volume used by the podman machine #19467
Comments
@vrothberg @ashley-cui Can one of you take a look and try to reproduce?

Sorry, I am currently flooded.

Perhaps this is podman machine attempting to figure out the disk space used in a volume, which would burn a lot of CPU and hammer the disk as it walks the directory tree? If Podman Desktop is asking for the size of volumes, or for container modifications on images, this could cause Podman to get very active.

I looked at this; hold tight. My observation is that in the slow case, the service is being flooded with requests, including a lot of df (expensive!) requests. PD folks are looking at that, and I will keep an eye on it.

Out of curiosity, is there an update on this issue?
I can reproduce using only podman on macOS, and there is a huge difference between rootless and rootful.

Pre-reqs: a podman machine (rootless by default). Then, if I run 5 containers and do the df command, it is very fast. Now, if I switch to rootful, then it's like 7s (see the sketch below).
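A minimal reconstruction of those steps (the exact commands were not preserved in this comment; the image name and container names are placeholders):

```
# pre-reqs: a rootless podman machine (defaults assumed)
podman machine init
podman machine start

# run 5 containers; "alpine" is a placeholder image
for i in $(seq 1 5); do
  podman run -d --name "perf-test-$i" alpine sleep infinity
done

# the df command: fast in rootless mode
time podman system df

# switch the machine to rootful
podman machine stop
podman machine set --rootful
podman machine start

# re-create the containers on the rootful side, then time again (~7s reported above)
for i in $(seq 1 5); do
  podman run -d --name "perf-test-$i" alpine sleep infinity
done
time podman system df
```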
Thanks @benoitf! I can even reproduce on Fedora with Podman v4.6. Very similar numbers: 0.071s rootless vs 7.562s root.
With an instrumented binary:

Processing containers seems to be the bottleneck:
I did an strace on the podman machine df on Stevan's machine. The 2GB scan of /var/lib/containers/storage/ is where the time is spent: each file gets a newfstatat(), which sounds normal, then 2 lgetxattr (security.capability) and 2 llistxattr (securi…).

Daniel
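A hedged example of a command that would produce that kind of syscall summary (not necessarily the exact invocation used in this thread):

```
sudo strace -f -c -e trace=newfstatat,lgetxattr,llistxattr podman system df
```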
Also, while the podman machine df is running, top on the host only shows qemu using more CPU than normal.
rootless doesn't use the naive diff since metacopy up is not supported for overlay in a user namespace. We could probably drop the naive diff in any case and just look at the directory size:

```
diff --git a/drivers/overlay/overlay.go b/drivers/overlay/overlay.go
index 6b6f20637..7e91dd161 100644
--- a/drivers/overlay/overlay.go
+++ b/drivers/overlay/overlay.go
@@ -2162,10 +2162,6 @@ func (d *Driver) getLowerDiffPaths(id string) ([]string, error) {
 // and its parent and returns the size in bytes of the changes
 // relative to its base filesystem directory.
 func (d *Driver) DiffSize(id string, idMappings *idtools.IDMappings, parent string, parentMappings *idtools.IDMappings, mountLabel string) (size int64, err error) {
-	if d.options.mountProgram == "" && (d.useNaiveDiff() || !d.isParent(id, parent)) {
-		return d.naiveDiff.DiffSize(id, idMappings, parent, parentMappings, mountLabel)
-	}
-
 	p, err := d.getDiffPath(id)
 	if err != nil {
 		return 0, err
```
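With the early return removed, DiffSize only measures the layer's own diff directory. A rough, hypothetical shell equivalent of what remains (the layer ID is a placeholder, and this is an illustration rather than the actual code path):

```
sudo du -sb /var/lib/containers/storage/overlay/<layer-id>/diff
```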
Yes, that does the trick. At the moment, that drops it from 19.3s to 0.11s on my machine. A mind-blowing difference. I think it's OK to just look at the directory size; at least for this use case.
Computing the diff size for the rootful overlay storage driver used the naive diff. The reasoning was that we have made use of rootful copy-up. The downside is a mind-blowing performance penalty in comparison to the rootless case. Hence, drop the naive diff and only compute the size of the directory, which is absolutely sufficient for the motivating use case of podman-system-df. This drops the execution of system-df from 19.3s to 0.11s listing 5 containers and 1 image.

Fixes: github.com/containers/podman/issues/19467
Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Valentin Rothberg <[email protected]>
I opened containers/storage#1688. |
Mainly to merge fixes for containers#19467 into the main branch.

Fixes: containers#19467
Signed-off-by: Valentin Rothberg <[email protected]>
The performance issue in containers#19467 drove me to add a benchmark for system-df to avoid regressing on it in the future. Comparing current HEAD to v4.6.0 yields:

```
/home/vrothberg/containers/podman/bin/podman system df ran 201.47 times faster than /usr/bin/podman system df
```

Signed-off-by: Valentin Rothberg <[email protected]>
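The output above matches the style of a benchmarking tool such as hyperfine; a hedged example of producing a similar comparison (the binary paths are assumptions):

```
hyperfine --warmup 2 \
  "$HOME/containers/podman/bin/podman system df" \
  "/usr/bin/podman system df"
```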
Issue Description
When using Podman Desktop, I'm getting strange behavior where the podman service running in the podman machine becomes unresponsive.
Please see the corresponding Podman Desktop issue:
Podman Desktop Issue
Checking in the podman machine, I can see that the podman service is consuming a lot of CPU. There are probably requests being sent by Podman Desktop while podman is already overloaded. When stopping Podman Desktop, the podman service returns to normal CPU usage.

Maybe this is due to the time podman takes to calculate something that Podman Desktop constantly asks for: it overloads podman and, as there is no cache, it never recovers.
Steps to reproduce the issue
I'm not entirely sure how to reproduce the issue. I think it appears after I have been using `podman save` and `podman load`, which create a large file. On the other side:

- when mounting the `$HOME:$HOME` folder, I'm hitting the issue
- when doing `podman save` into an empty folder and mounting it to the podman machine, I don't get the issue (see the sketch below)
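A hedged sketch of the setup described above (the exact commands were not given; the image name and tarball path are assumptions):

```
# machine with $HOME mounted (the problematic case)
podman machine init -v $HOME:$HOME
podman machine start

# create a large tarball in the mounted folder via save/load
podman save -o $HOME/image.tar quay.io/some/image:latest
podman load -i $HOME/image.tar
```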
Describe the results you received
Slow performance
Describe the results you expected
Podman stays responsive
podman info output
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
Yes
Additional environment details
macOS Ventura (Intel)