RabbitMQ high CPU usage on idle VM #3855
Comments
Would like to add that I'm also seeing this on GCP K8s 1.8.7-gke.1, approx. 60% CPU usage at idle. Edit: chart: rabbitmq-0.6.17. Edit: upgraded to 0.6.25, no change.
We're seeing this as well (k8s 1.8.10, rabbitmq-0.6.25). This is caused by a longstanding Erlang issue related to the nofile ulimit, which has been known since at least 2014. If you disable the liveness and readiness probes, the CPU usage drops back to normal. See https://github.com/bitnami/bitnami-docker-rabbitmq/pull/63
Cool! The solution of disabling the readiness and liveness probes worked so far. But is there any option to change the ulimit in the docker image, the chart, or the deployment itself?
@robermorales the docker image including the fix (https://github.com/bitnami/bitnami-docker-rabbitmq/pull/69) is being prepared by bitnami; then the default value should be OK, but we should still expose it in the chart values.
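For anyone who wants to experiment before the fixed image and chart values land, here is a purely illustrative sketch of lowering the nofile limit from the pod spec by overriding the container command; the 65536 value and the direct rabbitmq-server exec are assumptions, not the image's actual fix, and bypassing the image entrypoint will skip its setup scripts.

```yaml
# Hypothetical StatefulSet/Deployment fragment: cap the open-files limit
# before starting the broker, since a huge default nofile limit is what
# slows Erlang down. Values and command are illustrative only.
containers:
  - name: rabbitmq
    command:
      - /bin/bash
      - -ec
      - |
        ulimit -n 65536
        exec rabbitmq-server
```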
Thanks!
The fix has been released in the bitnami docker images. Since #4591 the chart references the image by a floating tag, and I'm not sure if existing deployments will actually pick up the fixed image. Maybe we should remove the floating tag and use read-only tags. In any case, we should bump to an image version that includes the fix.
I still have the exact same issue.
@rips-hb I also have some high CPU usage, but it's periodic, not constant. I found out it's the liveness/readiness probes.
@thomas-riccardi for me it is constant, unfortunately, and I could only resolve it by disabling liveness and readiness as suggested in this thread. Since it is only a test system that is no problem, but I would rather have these checks on a production system. I will investigate a bit more and if I find something else I will create a new ticket.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
Still an issue.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
The fix https://github.com/bitnami/bitnami-docker-rabbitmq/pull/63 is incomplete and does not set the ulimit for the liveness/readiness probes. So this is still an issue.
@macropin you are right.
@macropin @thomas-riccardi As I see it, the fix in bitnami/bitnami-docker-rabbitmq#63 is not only incomplete, it is completely inapplicable, because it modifies a Docker entrypoint that we do not use.
Also, I concur with @thomas-riccardi's findings: I did some testing too, and it turns out that even setting ridiculously low ulimit values does not make a difference.
Could it be fixed at the chart level?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
@TomaszUrugOlszewski It should be fixed by #8140 (and subsequent fixes), modulo the issue about disk or memory alarms (see #8635).
RabbitMQ 3.5.7, same issue: high CPU load.
RabbitMQ 3.7.8, Erlang 21.1, same issue: high CPU usage at roughly 200 qps.
But was this fixed?
I'm seeing the exact same behavior on 3.7.8 (Erlang 21.1), with very high CPU usage when idle; as a workaround, disabling both the readiness and liveness checks seems to fix the issue.
Hi, thanks for the feedback. If this issue is constant, then maybe it makes sense to change the readiness/liveness probes to simple TCP port checking. Thoughts on that?
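For illustration only (this is not what the chart ships), such simpler probes could look like the sketch below; the port and timings are placeholders, with 15672 being the management port the original reporter used and 5672 (plain AMQP) an alternative.

```yaml
# Hypothetical probe overrides: raw TCP checks instead of exec probes,
# so no process is spawned inside the container on every check.
livenessProbe:
  tcpSocket:
    port: 15672
  initialDelaySeconds: 60
  periodSeconds: 30
readinessProbe:
  tcpSocket:
    port: 15672
  periodSeconds: 30
```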
@javsalgar I tried again with the latest version of the chart (rabbitmq-4.0.1) with RabbitMQ 3.7.9 (Erlang 21.1), and this problem does not seem to happen anymore.
Hi everyone, do you still suffer from high CPU usage because of the readiness/liveness probes in the latest versions of the chart? I agree with @javsalgar: we can make simpler probes (such as TCP port checking) or decrease the frequency if you're running into issues because of that.
In my case, I greatly reduced the frequency of the probes.
What values did you use, @desaintmartin?
```yaml
livenessProbe:
  timeoutSeconds: 30
  periodSeconds: 30
readinessProbe:
  timeoutSeconds: 30
  periodSeconds: 30
```
So the "fix" isn't a fix, but rather a workaround. This should be reopened. Were the ulimits ever updated for the probes?
Hi @macropin, currently the liveness/readiness probes use curl requests against the management API.
Oh, that's great. I missed that change. So the high CPU usage was coming from the old exec probes?
Yes, it was one of the reasons why the CPU was so high. That's why the probes were moved to curl against the management API.
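As a rough illustration of the curl-based approach (not necessarily the exact command the chart uses), the probe can query a management API health endpoint; the endpoint path, port, and credential environment variables below are assumptions that depend on the RabbitMQ version and image.

```yaml
# Hedged sketch of a curl-based exec probe against the management API.
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -ec
      - 'curl -f -u "$RABBITMQ_USERNAME:$RABBITMQ_PASSWORD" http://127.0.0.1:15672/api/healthchecks/node'
  initialDelaySeconds: 120
  periodSeconds: 30
  timeoutSeconds: 20
```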
Hi, I've found another reason why RabbitMQ can have noticeable CPU usage when idle or under a light load. RabbitMQ runs on Erlang and uses its scheduling capabilities. To schedule processes, Erlang uses scheduler threads, and their number by default depends on the number of logical cores. This is a problem when running in docker/kubernetes, because RabbitMQ will think it has more resources than it actually has. In our case, a RabbitMQ node is running on a server with 40 cores, but we limit it to 1 core in Kubernetes. Erlang then runs 40 scheduler threads that are constantly context switching, which generates the CPU usage. When I set the number of scheduler threads to 1, the CPU usage dropped from 23% to 3%. I did that by setting RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS to "+S 1:1" (the numbers after +S are the total number of scheduler threads and how many of them are online).
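A minimal sketch of how that can be wired into a hand-written Deployment or StatefulSet; the image tag, CPU limit, and container name below are placeholders, not values taken from the chart.

```yaml
# Illustrative container spec: keep Erlang's scheduler count in line with
# the container's CPU limit instead of the host's logical core count.
containers:
  - name: rabbitmq
    image: rabbitmq:3.7-management   # placeholder image/tag
    resources:
      limits:
        cpu: "1"
    env:
      - name: RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS
        value: "+S 1:1"              # total schedulers : schedulers online
```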
@Artimi how did you set "RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS" with the rabbitmq helm chart?
@infa-ddeore we are actually not using helm, just a custom-made Kubernetes deployment. I just wrote it here because I had similar problems to those in this issue.
@infa-ddeore the feature seems to be missing indeed. We could also use the downward API to get the container's CPU limit and derive the scheduler count from it.
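A hedged sketch of that downward-API idea, assuming a container named rabbitmq; the $(VAR) expansion inside env values is standard Kubernetes behaviour, but the CPU_LIMIT variable name is made up for illustration.

```yaml
# Expose the container's CPU limit and reuse it for the Erlang scheduler
# count. divisor "1" yields whole cores (rounded up for fractional limits).
env:
  - name: CPU_LIMIT
    valueFrom:
      resourceFieldRef:
        containerName: rabbitmq
        resource: limits.cpu
        divisor: "1"
  - name: RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS
    value: "+S $(CPU_LIMIT):$(CPU_LIMIT)"
```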
@thomas-riccardi thanks for the pointers; for now I will update the stable/rabbitmq chart locally with this variable and deploy that.
@Artimi setting "RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS" to "+S 1:1" doesn't seem to help me; I will try disabling the liveness and readiness checks. My RabbitMQ is 3.7.8 and Erlang is 21.
Thanks for reporting it, @Artimi. I just created a PR so the user has a couple of parameters to limit the number of scheduler threads.
Same issue in helm chart rabbitmq-ha-1.47.1.
Is this a request for help?: Yes
Is this a BUG REPORT or FEATURE REQUEST? (choose one): FEATURE REQUEST
Version of Helm and Kubernetes:
Helm: 2.8.0
Kubernetes server: 1.7.12-gke.1 (GKE on GCP)
Which chart:
stable/rabbitmq
What happened:
High CPU usage on an idle VM with only RabbitMQ running, generated by the readiness/liveness probes. Based on Stackdriver charts I see 100% CPU usage on an n1-standard-2 VM. After forking the chart and replacing the probes with a simple tcpSocket check on port 15672, it decreased to ~0%.
What you expected to happen:
The chart should allow customizing the health checks, e.g. use tcpSocket or httpGet instead of exec probes.
How to reproduce it (as minimally and precisely as possible):
Run it on GCP with the n1-standard-2 VM type and watch the CPU usage.