
[Docker] psutil reports memory stats about host instead of container #2100

Open
willronchetti opened this issue Apr 26, 2022 · 13 comments
Labels
bug docker linux vm any container (e.g. docker) or virtual OS (e.g. VMWare)

Comments

@willronchetti

willronchetti commented Apr 26, 2022

Summary

When psutil runs on AWS Fargate, the memory info it reports describes the underlying host machine instead of the container.

  • OS: Dockerized Debian buster on AWS Fargate
  • Architecture: 64bit
  • Psutil version: 5.9.0
  • Python version: 3.7.12
  • Type: core

Description

I run a containerized web application that uses psutil to determine when a process has used too much memory and needs to be killed and restarted. This mechanism stopped working on AWS Fargate because the Process.memory_percent() value is (I believe) computed against the underlying Fargate host, not the container. By definition I have no visibility into the host, so I am surprised that psutil reports the host's memory values. I know it is reporting host values because my tasks are configured with a constant maximum amount of memory (8 GB), but my RSS percentage indicates that psutil thinks more memory is available than there actually is, which causes my tasks to hit OOM errors.

Relevant code: https://github.com/4dn-dcic/fourfront/blob/master/src/encoded/memlimit.py#L51-L62

Output from CloudWatch:

Restarting process. Memory usage: 529305600 (limit 524288000); Percentage 3.1474589064501415 (limit None)

Perhaps I am missing something here, but working backwards from the percentage shows that psutil is treating roughly 16 GB as the total memory, which is not the right value. I'm not sure where else that number could come from if not the host machine. This may be expected behavior under AWS Fargate, but if so, the limitation should be documented.
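The "work backwards" step is plain arithmetic. With its default memtype="rss", Process.memory_percent() computes rss / total * 100, where total is psutil's total physical memory, so total = rss * 100 / percent. A minimal sketch (the helper name is hypothetical):

```python
def implied_total_memory(rss_bytes: int, percent: float) -> float:
    """Return the total that rss_bytes was divided by to get percent.

    Assumes percent = rss_bytes / total * 100, as with
    psutil.Process.memory_percent() and its default memtype="rss".
    """
    if percent <= 0:
        raise ValueError("percent must be positive")
    return rss_bytes * 100.0 / percent


# Values copied from the CloudWatch log line in this issue.
total = implied_total_memory(529305600, 3.1474589064501415)
print(f"{total:,.0f} bytes ~= {total / 2**30:.1f} GiB")
```

For the logged values this comes to roughly 16.8 × 10^9 bytes (about 15.7 GiB), a host-sized total rather than the 8 GB task limit.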

@dbwiddis
Contributor

dbwiddis commented May 8, 2022

Similar question asked in #2076. From what I can tell, psutil has a facility for using a custom /proc path, which could point to one of the various /proc plugins that give container-level results.

@btaens

btaens commented Nov 14, 2024

This is still very much an issue.
There should be an option or flag to read the memory state from /sys/fs/cgroup/memory.max and /sys/fs/cgroup/memory.current.
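A minimal sketch of the suggested option, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup and visible from inside the container. The function names are hypothetical, not psutil APIs. Note that memory.current counts the whole cgroup (every process in the container, plus page cache charged to it), not a single process, and that memory.max holds the literal string "max" when no limit is set.

```python
from pathlib import Path
from typing import Optional, Tuple


def cgroup_v2_memory(cgroup_dir: str = "/sys/fs/cgroup") -> Tuple[int, Optional[int]]:
    """Return (current_bytes, limit_bytes) read from cgroup v2 files.

    limit_bytes is None when memory.max holds the literal "max" (no limit).
    """
    root = Path(cgroup_dir)
    current = int((root / "memory.current").read_text().strip())
    raw_max = (root / "memory.max").read_text().strip()
    limit = None if raw_max == "max" else int(raw_max)
    return current, limit


def container_memory_percent(cgroup_dir: str = "/sys/fs/cgroup") -> Optional[float]:
    """Cgroup-wide usage as a percentage of its limit, or None if unlimited."""
    current, limit = cgroup_v2_memory(cgroup_dir)
    if limit is None:
        return None  # no cgroup limit; a caller could fall back to host totals
    return current * 100.0 / limit
```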

@giampaolo
Owner

giampaolo commented Nov 15, 2024

Process memory is retrieved from /proc/{PID}/statm.
System memory is retrieved from /proc/meminfo.

As I understand it from this ticket (#2100) and #2076, reading /proc from within the container returns info about the host, not the container. And that is a problem. I would say the container is at fault here, but anyway.

psutil.PROCFS_PATH was introduced to somewhat soften this problem: you can set psutil.PROCFS_PATH to a path other than /proc that exposes info about a "remote" host (e.g. a container). But not all info is retrieved from /proc; some APIs are implemented in C and don't rely on /proc (e.g. psutil.disk_partitions() and psutil.users() [1]), so psutil.PROCFS_PATH is not a complete solution.

Also, psutil should be smart enough to understand if it's running in a container, and automatically return the right info. But I have no idea how, and I suspect it's gonna be messy. E.g. there are different container technologies out there, and I imagine they do things differently and not in a standard fashion.

[1] could somebody check what psutil.disk_partitions() and psutil.users() return if invoked from the container? Info about the container or the host?

@dbwiddis
Contributor

> Also, psutil should be smart enough to understand if it's running in a container, and automatically return the right info. But I have no idea how, and I suspect it's gonna be messy

Similar discussion in my Java-based project: oshi/oshi#2632 (comment)

The issue with the proposed cgroup paths is that they are docker-only and there are other containers.

TLDR: an API conforming to this spec is probably the way to go: https://github.com/opencontainers/runtime-spec/blob/main/spec.md

@giampaolo
Owner

giampaolo commented Nov 15, 2024

Hey Daniel! Hope you're good.

> The issue with the proposed cgroup paths is that they are docker-only and there are other containers.

Yeah, that's what I thought. I bumped into this excellent article that you linked in one of those tickets:
https://fabiokung.com/2014/03/13/memory-inside-linux-containers/
It's worse than I thought. ^^: The article is from 2014, but it seems nothing has changed. Here's a summary of it:

Virtual memory

  • /sys/fs/cgroup/memory.stat could be used as a replacement for /proc/meminfo, but the /sys/fs/cgroup location is non-standard, so it should be determined from /proc/self/mountinfo first (assuming this works with containers hehe).
  • /sys/fs/cgroup/memory.stat is present also when you're not in a container, so there should be another mechanism to decide whether to use that or /proc/meminfo. Meaning: how to reliably detect if we're in a container, and which one it is?
  • /sys/fs/cgroup/memory.stat format is not standard and may differ across containers (sigh!).
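The mountinfo lookup from the first bullet could look roughly like this, assuming cgroup v2 (filesystem type cgroup2). Per proc(5), each /proc/self/mountinfo line carries a variable number of optional fields terminated by a lone "-", after which come the filesystem type, mount source and super options; the mount point is the fifth field. The helper name is hypothetical, and the sketch does not decode octal escapes in paths.

```python
from typing import Optional


def find_cgroup2_mount(mountinfo_text: str) -> Optional[str]:
    """Return the mount point of the first cgroup2 filesystem, or None.

    Line layout (see proc(5)):
      ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTS [OPTIONAL...] - FSTYPE SOURCE SUPEROPTS
    The optional fields vary in number, so split on the " - " separator.
    Octal escapes in paths (e.g. \\040 for a space) are not decoded here.
    """
    for line in mountinfo_text.splitlines():
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        fields = left.split()
        tail = right.split()
        if tail and tail[0] == "cgroup2" and len(fields) >= 5:
            return fields[4]  # 5th field: mount point
    return None
```

On a live system you would pass it the contents of /proc/self/mountinfo and then look for memory.stat under the returned directory.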

Swap

  • /proc/vmstat also does not work with containers, but it's unclear what the replacement for it is.
  • The sysinfo(2) syscall also does not work with containers.

Others

  • The free and top CLI tools also read from /proc, so they have the same problems (wow!). I now wonder how people cope with this in general. Weird...
  • /proc/PID/* and /proc/net/* seem to be namespace-aware (yay!), meaning that the psutil.Process(), psutil.net_connections() and psutil.net_io_counters() APIs should work.
  • It's unclear whether /proc/stat and /proc/cpuinfo work with containers. These are the files behind the psutil.cpu_*() APIs, which are also important.
  • psutil.disk_partitions() and psutil.users() rely on the C library functions getmntent(3) and getutent(3), respectively. It's unclear whether these work with containers.

@giampaolo
Owner

> /sys/fs/cgroup/memory.stat could be used as a replacement for /proc/meminfo

This seems doable to me. @dbwiddis out of curiosity, did you implement this in oshi?

@giampaolo
Owner

> how to reliably detect if we're in a container, and which one it is?

Also not standard and quite a mess:
https://stackoverflow.com/questions/20010199/how-to-determine-if-a-process-runs-inside-lxc-docker

@dbwiddis
Contributor

dbwiddis commented Nov 15, 2024

> /sys/fs/cgroup/memory.stat could be used as a replacement for /proc/meminfo

> This seems doable to me. @dbwiddis out of curiosity, did you implement this in oshi?

Nope. I have utility functions that are one-liners to fetch those values for those who know they're running in Docker. But as you say, we may not even know that at runtime.

> so there should be another mechanism to decide whether to use that or /proc/meminfo

How about:

  1. Read /proc/meminfo
  2. Attempt to reserve whatever free memory is available (e.g., malloc())
  3. If it works (malloc() returns a non-NULL pointer), release the memory (free()).
  4. If it fails (malloc() returns NULL and sets errno to ENOMEM), you have less memory than reported. Start checking alternate file locations in a logical sequence and repeat from step 2.
  5. If all else fails, repeat steps 2-4 using a binary search to find the limit yourself :)

Not actually suggesting this, but it would work...

@dbwiddis
Contributor

The serious approach:

  1. Have a config parameter the user can set to tell you what container you are using.
  2. Have an API based on the OCI spec to return the appropriate limits programmatically based on the config, e.g., if the config is set to "docker", read from the cgroup path.
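A rough sketch of that config-driven design. The runtime names, the registry and the function names are all hypothetical; the "docker" entry simply assumes a cgroup v2 memory.max file, which, as discussed in this thread, is not a universal rule.

```python
import os
from typing import Callable, Dict, Optional, Tuple


def _limit_from_meminfo(root: str) -> Optional[int]:
    """Bare metal / VM: MemTotal from <root>/meminfo, in bytes."""
    with open(os.path.join(root, "meminfo")) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024  # meminfo reports kB
    return None


def _limit_from_cgroup_v2(root: str) -> Optional[int]:
    """cgroup v2: <root>/memory.max in bytes, or None if it says "max"."""
    with open(os.path.join(root, "memory.max")) as f:
        raw = f.read().strip()
    return None if raw == "max" else int(raw)


# Hypothetical registry: config value -> (default root, reader).
_READERS: Dict[Optional[str], Tuple[str, Callable[[str], Optional[int]]]] = {
    None: ("/proc", _limit_from_meminfo),
    "docker": ("/sys/fs/cgroup", _limit_from_cgroup_v2),
}


def memory_limit(runtime: Optional[str] = None, root: Optional[str] = None) -> Optional[int]:
    """Return the memory limit for the configured runtime (None = no limit found)."""
    try:
        default_root, reader = _READERS[runtime]
    except KeyError:
        raise ValueError(f"unsupported container runtime: {runtime!r}") from None
    return reader(root or default_root)
```

Making the runtime an explicit setting rather than auto-detecting it sidesteps the detection problem raised in this thread, at the cost of asking the user for one configuration value.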

@dbwiddis
Contributor

> how to reliably detect if we're in a container, and which one it is?

Apparently, figuring out what process 1 is (e.g. which cgroup it belongs to) is a really good clue.

       /proc/pid/cgroup (since Linux 2.6.24)
              This file describes control groups to which the process
              with the corresponding PID belongs.  The displayed
              information differs for cgroups version 1 and version 2
              hierarchies.

              For each cgroup hierarchy of which the process is a
              member, there is one entry containing three colon-
              separated fields:

                  hierarchy-ID:controller-list:cgroup-path
# cat /proc/1/cgroup | tail -n 1
0::/system.slice/docker.service

vs. bare metal:

# cat /proc/1/cgroup | tail -n 1
0::/init.scope

@giampaolo
Owner

giampaolo commented Nov 15, 2024

Not sure why, but this is how it looks to me when I'm "logged" into a Docker container:

root@ca6ea707f92c:/workspace# cat /proc/1/cgroup
0::/

I do have this file though:

root@ca6ea707f92c:/workspace# cat /.dockerenv 
root@ca6ea707f92c:/workspace# 
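The two clues from this thread, the cgroup path of PID 1 and Docker's /.dockerenv file, can be combined into a heuristic like the one below. It is only a guess. A PID 1 cgroup path of "/" is what a container with its own cgroup namespace typically sees, but a cgroup v2 host whose init is not systemd can show the same thing. The keyword list is an assumption that only covers Docker, and the function name is hypothetical.

```python
from typing import Tuple

# Assumed keyword list; other runtimes name their cgroups differently.
RUNTIME_KEYWORDS: Tuple[str, ...] = ("docker",)


def looks_like_container(pid1_cgroup: str, dockerenv_exists: bool) -> bool:
    """Heuristic guess, not a reliable test.

    pid1_cgroup: contents of /proc/1/cgroup.
    dockerenv_exists: whether /.dockerenv exists (Docker creates it).
    """
    if dockerenv_exists:
        return True
    lines = pid1_cgroup.strip().splitlines()
    if not lines:
        return False
    # Last line is "hierarchy-ID:controller-list:cgroup-path".
    path = lines[-1].split(":", 2)[-1]
    if path == "/":
        # Typical inside a private cgroup namespace, but also possible on a
        # cgroup v2 host whose init is not systemd.
        return True
    return any(word in path for word in RUNTIME_KEYWORDS)
```

On a live system the inputs would come from reading /proc/1/cgroup and from os.path.exists("/.dockerenv").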

@giampaolo
Owner

giampaolo commented Nov 16, 2024

Another thing I don't understand is that /sys/fs/cgroup/memory.stat lacks many of the fields I see mentioned online, like total_rss, total_cache and total_shmem. Below is what it looks like for me.
I'm also missing crucial files like /sys/fs/cgroup/memory/memory.limit_in_bytes.

root@8e6587cacdc8:/workspace/svn/psutil# cat /sys/fs/cgroup/memory.stat 
anon 843776
file 6664192
kernel 5922816
kernel_stack 32768
pagetables 73728
sec_pagetables 0
percpu 8008
sock 0
vmalloc 8192
shmem 0
zswap 0
zswapped 0
file_mapped 0
file_dirty 0
file_writeback 0
swapcached 0
anon_thp 0
file_thp 0
shmem_thp 0
inactive_anon 0
active_anon 815104
inactive_file 6664192
active_file 0
unevictable 0
slab_reclaimable 5581784
slab_unreclaimable 169264
slab 5751048
workingset_refault_anon 0
workingset_refault_file 0
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgscan 0
pgsteal 0
pgscan_kswapd 0
pgscan_direct 0
pgscan_khugepaged 0
pgsteal_kswapd 0
pgsteal_direct 0
pgsteal_khugepaged 0
pgfault 487218
pgmajfault 0
pgrefill 0
pgactivate 0
pgdeactivate 0
pglazyfree 0
pglazyfreed 0
zswpin 0
zswpout 0
thp_fault_alloc 0
thp_collapse_alloc 0
root@8e6587cacdc8:/workspace/svn/psutil# 
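The fields shown above (anon, file, shmem, ...) are cgroup v2 names. total_rss, total_cache, total_shmem and memory.limit_in_bytes belong to the older cgroup v1 memory controller, and in v2 the limit lives in memory.max instead. Below is a hedged sketch that parses memory.stat and maps either naming onto rough rss/cache/shmem buckets. The mapping is an approximation, and the helper names are hypothetical.

```python
from typing import Dict


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse a cgroup memory.stat file: one "<key> <value>" pair per line."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def rough_breakdown(stats: Dict[str, int]) -> Dict[str, int]:
    """Approximate rss/cache/shmem from either cgroup version's field names."""
    if "anon" in stats:  # cgroup v2 naming, as in the output above
        return {"rss": stats["anon"], "cache": stats.get("file", 0),
                "shmem": stats.get("shmem", 0)}
    # cgroup v1 naming (hierarchical totals)
    return {"rss": stats.get("total_rss", 0), "cache": stats.get("total_cache", 0),
            "shmem": stats.get("total_shmem", 0)}
```

For the output above this yields rss 843776, cache 6664192 and shmem 0.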

@giampaolo giampaolo changed the title [AWS Fargate] psutil shows host machine memory [Docker] psutil reports memory stats about host instead of container Nov 17, 2024
@giampaolo giampaolo added docker vm any container (e.g. docker) or virtual OS (e.g. VMWare) labels Nov 17, 2024
@giampaolo
Owner

psutil.net_io_counters() has the same problem (see #1011 (comment)).

4 participants