Explicitly set FD limit #545
Thank you @michaelklishin! |
I think Docker defaults to Unfortunately, if we're set to something lower than we want, there's not really much we can do inside the container (besides including a warning in the container logs):
$ docker run -it --rm --ulimit nofile=1024:1024 bash
bash-5.1# ulimit -n 1048576
bash: ulimit: open files: cannot modify limit: Operation not permitted
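Since the limit cannot be raised from inside the container, a warning in the logs is about the only thing left. Below is a minimal sketch of such a check. The helper name `warn_if_nofile_low` and the floor of 8192 are illustrative assumptions, not something the image's entrypoint is known to contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the official entrypoint): warn when the
# open file limit is below a wanted floor. Both values are passed in so the
# logic can be exercised without touching the real limit.
warn_if_nofile_low() {
  local current="$1" wanted="$2"
  # An "unlimited" value can never be too low.
  if [ "$current" = "unlimited" ]; then
    return 0
  fi
  if [ "$current" -lt "$wanted" ]; then
    echo "WARNING: open file limit ($current) is below $wanted;" \
         "it can only be raised from outside the container," \
         "e.g. with 'docker run --ulimit nofile=...'" >&2
    return 1
  fi
  return 0
}

# Example call; 8192 is an assumed floor, pick whatever suits the deployment.
warn_if_nofile_low "$(ulimit -n)" 8192 || true
```

The function only reports the problem and never tries to change the limit, which matches the "Operation not permitted" result shown in the transcript above.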
overriding ERL_MAX_PORTS env var also works. Should we still aim for sane defaults in the Dockerfiles? |
I'm not sure why it would be any different? This is part of the sandboxing that Docker provides: 😕
$ docker pull rabbitmq
Using default tag: latest
latest: Pulling from library/rabbitmq
Digest: sha256:3d4c70ec5fc84c27efaeb56c50aafcac4fd8583b61398cc028e4876f84ae73d8
Status: Image is up to date for rabbitmq:latest
docker.io/library/rabbitmq:latest
$ docker run -it --rm --ulimit nofile=1024:1024 rabbitmq bash
root@f41cafc78bb2:/# ulimit -n 1048576
bash: ulimit: open files: cannot modify limit: Operation not permitted
Yes, absolutely, although I'm still not sure I understand what useful changes we could actually make here (since we can't control the |
:-)
we still can control what portion erlang consumes via ERL_MAX_PORTS |
Yes, you can go down, but you cannot go up. 😅 (And the issue here is an environment where the default is too low, right?) I'm not so sure about adding more shell code to set |
nope, the issue here was an environment with fd max set to 11billion something |
@tianon modern Linux default for file handle limit was last good in 1998, as you know. We should use something at least in the 8K range. Our recommendation to most users is 50K+. Setting |
As far as I can find looking through RabbitMQ and Erlang source code to try and understand what's the issue here (and how these things are interacting), the relevant bit of RabbitMQ source code is probably https://github.com/rabbitmq/rabbitmq-server/blob/7abf749a60458aba9d6c9e6bdec1d3b0b2254007/deps/rabbit_common/src/file_handle_cache.erl#L1543-L1564 ? That On Linux, So, if As far as a fix, I think the best we could do here is something in the entrypoint script (or maybe even in the (Another trick is that different versions of |
Yes, see this discussion: rabbitmq/erlang-rpm#104 (comment)
In both environments, the Erlang VM determines the max number of file descriptors available to it and uses that value. I can't exactly determine how at this time but that's the behavior we see. Grep the |
Ahh, Red Hat is the missing piece of the puzzle here (https://access.redhat.com/solutions/1479623 was specifically enlightening, especially the bits about RHEL 9). I spun up an Alma Linux machine and was finally able to reproduce. On my Debian system, I cannot increase the I've now also verified that either of Given that this is exhibiting outside containers too, I wonder if maybe this is something that This is roughly the sort of implementation I had in mind, but it doesn't feel like
if [ -z "${ERL_MAX_PORTS:-}" ] && nofile="$(ulimit -n)" && [ "$nofile" -gt 65536 ]; then
	export ERL_MAX_PORTS=65536
fi
Should this maybe also consider the amount of memory the system has? Other system load? It feels like it's tough to come up with something generic here that isn't going to unexpectedly break large users who have been up until now perfectly happy with RabbitMQ using an enormous amount of RAM (and have configured it to do so on purpose). Perhaps it would be better/safer to test for (Would love to hear your thoughts, @michaelklishin ❤️)
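One caveat with the snippet in the comment above: if the shell ever reports the limit as `unlimited`, the numeric `-gt` test fails and the cap is silently skipped. Below is a hedged variant that handles that case. The function name `cap_erl_max_ports` is hypothetical, and the default cap of 65536 is taken from the snippet above.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the entrypoint logic discussed above.
# Sets ERL_MAX_PORTS only when the user has not set it and the reported
# open file limit is above the cap (or is reported as "unlimited").
cap_erl_max_ports() {
  local nofile="$1" cap="${2:-65536}"
  # Respect an explicit user setting.
  if [ -n "${ERL_MAX_PORTS:-}" ]; then
    return 0
  fi
  if [ "$nofile" = "unlimited" ] || [ "$nofile" -gt "$cap" ]; then
    export ERL_MAX_PORTS="$cap"
  fi
}

# Example call using the current shell's limit.
cap_erl_max_ports "$(ulimit -n)"
```

Passing the limit as an argument keeps the decision separate from the `ulimit` call, so the behaviour on hosts with very large limits can be checked without actually running on one.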
I'd go with We have discovered the same thing in rabbitmq/erlang-rpm#104: on CentOS Stream 9, the default file descriptor limit is dramatically higher, which is generally a good thing but in Erlang's case, increases initial memory allocation significantly. |
Testing for We should put a visible warning into release notes, too. |
This is a good point. @michaelklishin we should consider handling this in |
…dows See discussion here: docker-library/rabbitmq#545
See the linked PR for a starting point. I believe we can close the issue here. |
Nice, sounds good; thanks! 👍 Closing in favor of rabbitmq/rabbitmq-server#5684 🤘 |
…dows See discussion here: docker-library/rabbitmq#545 (cherry picked from commit 28d6b19)
…dows See discussion here: docker-library/rabbitmq#545 (cherry picked from commit 28d6b19) (cherry picked from commit a36657b)
…dows See discussion here: docker-library/rabbitmq#545 (cherry picked from commit 28d6b19) (cherry picked from commit a36657b) (cherry picked from commit fa687d1)
This change works around an issue in the rabbitmq-server version shipped with Ubuntu 20.04. Large emptyfiles limits in docker containers which run on hosts with low emptyfiles limits cause rabbitmq to break. It will use 100% CPU of a single core when started, then time out (i.e. fail to start). An erlang process 'erl_child_setup' will continue to use 100% CPU even after rabbitmq has failed to start (see docker-library/rabbitmq#545). The change adds a default emptyfiles limit to rabbitmq-server via /etc/default/rabbitmq-server, which is created in the Dockerfile before rabbitmq is installed. The default is generous: 65536. Additionally, the change adds a new container environment variable RABBITMQ_EMPTYFILES_LIMIT which, when set, will override the default above.
* Update kolla-ansible from branch 'master' to 1b74b18c2eb4eff7c38e010965aa34f2a353c4c5
  - Merge "Add CentOS Stream 9 / Rocky Linux 9 host support"
  - Add CentOS Stream 9 / Rocky Linux 9 host support

    Added c9s jobs are non voting, as agreed on PTG to focus on Rocky Linux 9.

    Since both CS9 and RL9 have higher default fd limit (1073741816 vs 1048576 in CS8) - lowering that for:
    * RMQ - because Erlang allocates memory based on this (see [1], [2], [3]).
    * MariaDB - because Galera cluster bootstrap failed

    Changed openvswitch_db healthcheck, because for unknown reason the usual check (using lsof on /run/openvswitch/db.sock) is hanging on "Bad file descriptor" (even with privileged: true).

    [1]: docker-library/rabbitmq#545
    [2]: rabbitmq/cluster-operator#959 (comment)
    [3]: systemd/systemd@a8b627a

    Depends-On: https://review.opendev.org/c/openstack/tenks/+/856296
    Depends-On: https://review.opendev.org/c/openstack/kolla-ansible/+/856328
    Depends-On: https://review.opendev.org/c/openstack/kolla-ansible/+/856443
    Needed-By: https://review.opendev.org/c/openstack/kolla/+/836664
    Co-Authored-By: Michał Nasiadka <[email protected]>
    Change-Id: I3f7b480519aea38c3927bee7fb2c23eea178554d
This is a backport from Zed.

cephadm bits to use package from distro backported from I30f071865b9b0751f1336414a0ae82571a332530

Added c9s jobs are non voting, as agreed on PTG to focus on Rocky Linux 9.

Since both CS9 and RL9 have higher default fd limit (1073741816 vs 1048576 in CS8) - lowering that for:
* RMQ - because Erlang allocates memory based on this (see [1], [2], [3]).
* MariaDB - because Galera cluster bootstrap failed

Changed openvswitch_db healthcheck, because for unknown reason the usual check (using lsof on /run/openvswitch/db.sock) is hanging on "Bad file descriptor" (even with privileged: true).

Added kolla_base_distro_version helper var.

[1]: docker-library/rabbitmq#545
[2]: rabbitmq/cluster-operator#959 (comment)
[3]: systemd/systemd@a8b627a

Depends-On: https://review.opendev.org/c/openstack/ansible-collection-kolla/+/864993
Depends-On: https://review.opendev.org/c/openstack/kolla-ansible/+/864971
Depends-On: https://review.opendev.org/c/openstack/kolla-ansible/+/864973
Depends-On: https://review.opendev.org/c/openstack/kolla-ansible/+/870499
Co-Authored-By: Michał Nasiadka <[email protected]>
Change-Id: I3f7b480519aea38c3927bee7fb2c23eea178554d
On Linux, ERL_MAX_PORTS defaults to 65,536. However, the Docker container for the `mq` service uses the host ulimits instead. When the ulimit is infinite, the container can take a very long time to start while burning a lot of CPU for nothing, depending on the host settings. This patch explicitly sets ulimits to 65,536, as recommended for `mq`. See docker-library/rabbitmq#545 for details.
Currently we don't set any FD limit, which means we are at the mercy of the host's default settings, so RabbitMQ can OOM.
Please see here for the context: rabbitmq/cluster-operator#959
What has to be done: Something like
CMD ulimit -n 1024
has to be added. Happy to PR myself. Let me know.
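Note that a `CMD ulimit -n 1024` line would replace the image's default command rather than run before it, so the limit would have to be lowered in something that then starts the server. Below is a minimal sketch of such a wrapper. The function name `lower_nofile` and the `NOFILE_SOFT_LIMIT` variable are hypothetical, and the 65536 default is an assumption borrowed from values mentioned elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not the image's actual entrypoint: lower the soft
# open file limit if it is above the target, then hand over to the command.
lower_nofile() {
  local target="$1" current
  current="$(ulimit -S -n)"
  # Lowering a soft limit is always permitted; raising it may not be,
  # so do nothing when the current value is already at or below the target.
  if [ "$current" = "unlimited" ] || [ "$current" -gt "$target" ]; then
    ulimit -S -n "$target"
  fi
}

# Only act when invoked with a command to run, e.g.:
#   wrapper.sh rabbitmq-server
if [ "$#" -gt 0 ]; then
  lower_nofile "${NOFILE_SOFT_LIMIT:-65536}"
  exec "$@"
fi
```

Only the soft limit is touched here, which leaves the hard limit available to anyone who deliberately wants a higher value. Whether that is the right trade-off for the image is a judgment call.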