This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

RabbitMQ high CPU usage on idle VM #3855

Closed

TomaszUrugOlszewski opened this issue Feb 23, 2018 · 42 comments · Fixed by #13044 or #10377

Comments

@TomaszUrugOlszewski

Is this a request for help?: Yes


Is this a BUG REPORT or FEATURE REQUEST? (choose one): FEATURE REQUEST

Version of Helm and Kubernetes:
Helm: 2.8.0
Kubectl server: 1.7.12-gke.1 (current) (It's GCP)

Which chart:
stable/rabbitmq

What happened:
High CPU usage on an idle VM with only RabbitMQ running, generated by the readiness/liveness probes. Based on the Stackdriver charts I see 100% CPU usage on an n1-standard-2 VM. After forking the chart and replacing the probes with a simple tcpSocket check on port 15672, usage dropped to ~0%.

[Screenshot: Stackdriver CPU usage chart]

What you expected to happen:
The chart should allow customizing the health checks, e.g. using tcpSocket or httpGet probes instead of exec probes.
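
For illustration, such a probe could look roughly like the sketch below (plain Kubernetes tcpSocket probes against the management port; the timing values are assumptions, not the chart's defaults):

  livenessProbe:
    tcpSocket:
      port: 15672          # management port; 5672 (AMQP) would also work
    initialDelaySeconds: 30
    periodSeconds: 10
  readinessProbe:
    tcpSocket:
      port: 15672
    initialDelaySeconds: 10
    periodSeconds: 10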

How to reproduce it (as minimally and precisely as possible):
Run it on GCP on an n1-standard-2 VM and watch the CPU usage.

@sbnl

sbnl commented Mar 15, 2018

I'd like to add that I'm also seeing this on GCP K8s 1.8.7-gke.1: approx. 60% CPU usage at idle on an n1-standard-1 (3.7 GB) free tier instance.
erl_child_setup and beam.smp are the main consumers, at around 25% each.

Edit: chart rabbitmq-0.6.17.

Edit: upgraded to 0.6.25, no change.

@macropin

macropin commented Apr 5, 2018

We're seeing this as well (k8s 1.8.10, rabbitmq-0.6.25). This is caused by a longstanding Erlang issue related to the nofile ulimit, which has been known since at least 2014.

If you disable the liveness and readiness probes, you will find that the idle usage comes down a lot.

See https://github.com/bitnami/bitnami-docker-rabbitmq/pull/63
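
For reference, with this chart disabling the probes amounts to roughly the following values (a sketch; the exact enabled flags depend on the chart version, so treat the key names as assumptions):

  livenessProbe:
    enabled: false
  readinessProbe:
    enabled: false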

@robermorales
Contributor

Cool! The solution of disabling the readiness and liveness probes has worked so far. But is there any option to change the ulimit in the Docker image, the chart, or the deployment itself?

@thomas-riccardi
Contributor

@robermorales the Docker image including the fix (https://github.com/bitnami/bitnami-docker-rabbitmq/pull/69) is being prepared by Bitnami; once it is released the default value should be OK, but we should still expose it in the chart values.

@robermorales
Contributor

thanks!

@thomas-riccardi
Contributor

The fix has been released in docker images 3.7.4-r4 and 3.6.15-r4 (and their aliases: 3.7.4, 3.7, 3.6.15 and 3.6).

Since #4591 the values.yaml became non-prod, and the default image tag is the floating tag 3.7.4 instead of 3.7.4-r1 (values-production.yaml still refers to 3.7.4-r1).
Alas, we also default to pullPolicy: IfNotPresent, so in practice the floating tag is not a great idea...

I'm not sure values-production.yaml is a good pattern; maybe it could just override some values instead of redefining everything (only redis and rabbitmq use it).

Maybe we should remove the floating tag and use immutable tags: 3.7.4-r4.

In any case, we should bump to -r4 to get the high CPU usage fix.
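
A sketch of what pinning an immutable tag could look like in values.yaml (the image block layout is an assumption based on the usual chart conventions):

  image:
    registry: docker.io
    repository: bitnami/rabbitmq
    tag: 3.7.4-r4            # immutable tag that includes the ulimit fix
    pullPolicy: IfNotPresent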

@rips-hb

rips-hb commented Jun 20, 2018

I still have the exact same issue in 3.7.6-r8.

@thomas-riccardi
Contributor

@rips-hb I also see some high CPU usage, but it's periodic, not constant. I found out that it's the probes (rabbitmqctl status) that use that much CPU periodically; it's a different issue (a separate ticket should probably be created here), and I'm not sure how to fix it.

@rips-hb

rips-hb commented Jun 20, 2018

@thomas-riccardi for me it is unfortunately constant, and I could resolve it by disabling the liveness and readiness probes as suggested in this thread. Since it is only a test system that is not a problem, but I would rather have these checks on a production system. I will investigate a bit more, and if I find something else I will create a new ticket.

@stale

stale bot commented Aug 19, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale bot added the lifecycle/stale label Aug 19, 2018
@nerumo
Contributor

nerumo commented Aug 19, 2018

still an issue

stale bot removed the lifecycle/stale label Aug 19, 2018
@stale

stale bot commented Sep 18, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale bot added the lifecycle/stale label Sep 18, 2018
@macropin

macropin commented Sep 25, 2018

The fix https://github.com/bitnami/bitnami-docker-rabbitmq/pull/63 is incomplete and does not set the ulimit for the liveness/readiness probes. So this is still an issue.

stale bot removed the lifecycle/stale label Sep 25, 2018
@thomas-riccardi
Contributor

@macropin you are right.
However, I did not find a difference in execution time for rabbitmqctl status (or rabbitmqctl node_health_check) with and without ulimit -n 1024 (or ulimit -n 65536), so adding ulimit -n to the health checks will probably not help.

@thomas-riccardi
Contributor

@macropin see also rabbitmq-ha advancements: #7378 (and #7752).

@intelfx
Contributor

intelfx commented Oct 2, 2018

@macropin @thomas-riccardi As I see it, the fix in bitnami/bitnami-docker-rabbitmq#63 is not only incomplete, it is completely inapplicable, because it modifies a Docker entrypoint that we do not use.

@intelfx
Contributor

intelfx commented Oct 2, 2018

Also I concur with @thomas-riccardi's findings: I did some testing too, and it turns out that even setting a ridiculously low ulimit -n 128 does not help to reduce either the probes' CPU time or the overall Pod's CPU usage.

@alexsandro-xpt

Could this be fixed at helm install time? I have the same problem here.

@stale

stale bot commented Nov 5, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale bot added the lifecycle/stale label Nov 5, 2018
@thomas-riccardi
Contributor

@TomaszUrugOlszewski It should be fixed by #8140 (and subsequent fixes), modulo the issue about disk or memory alarms (see #8635).

@leixu26

leixu26 commented Nov 9, 2018

3.5.7: same issue, high CPU load.

@leixu26

leixu26 commented Nov 9, 2018

RabbitMQ 3.7.8, Erlang 21.1: same issue, high CPU usage. QPS is about 200/s.

@alexsandro-xpt

But was this fixed?

@dnetguru

I'm seeing the exact same behavior on 3.7.8 (Erlang 21.1), with very high CPU usage when idle. As a workaround, disabling both the readiness and liveness checks seems to fix the issue.

@javsalgar
Collaborator

Hi,

Thanks for the feedback. If this issue is constant, then maybe it makes sense to change the readiness/liveness probes to simple TCP port checks. Thoughts on that?

@dnetguru

dnetguru commented Dec 8, 2018

@javsalgar I tried again with the latest version of the chart (rabbitmq-4.0.1) with RabbitMQ 3.7.9 (Erlang 21.1), and this problem does not seem to happen anymore.

@juan131
Collaborator

juan131 commented Jan 2, 2019

Hi everyone,

Do you still suffer from high CPU usage because of the readiness/liveness probes in the latest versions of the chart?

I agree with @javsalgar: we can make the probes simpler (such as a TCP port check) or decrease their frequency if you're running into issues because of that.

@desaintmartin
Collaborator

In my case, I greatly reduced the frequency of the probes.

@juan131
Collaborator

juan131 commented Jan 3, 2019

What values did you use @desaintmartin ?

@desaintmartin
Collaborator

  livenessProbe:
    timeoutSeconds: 30
    periodSeconds: 30
  readinessProbe:
    timeoutSeconds: 30
    periodSeconds: 30

@macropin

macropin commented Jan 4, 2019

So the "fix" isn't a fix, rather a workaround. This should be reopened. Were the ulimits ever updated for the probes?

@juan131
Collaborator

juan131 commented Jan 4, 2019

Hi @macropin

Currently the liveness/readiness probes use curl instead of rabbitmqctl. Why do you consider it necessary to update the ulimits on the probes?
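
For context, a curl-based exec probe looks roughly like this (a sketch only; the exact endpoint, credential environment variables and timings used by the chart are assumptions here):

  livenessProbe:
    exec:
      command:
        - sh
        - -c
        - curl --fail --user "$RABBITMQ_USERNAME:$RABBITMQ_PASSWORD" http://127.0.0.1:15672/api/healthchecks/node
    periodSeconds: 30
    timeoutSeconds: 20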

@macropin

macropin commented Jan 5, 2019

Oh, that's great. I missed that change. The high CPU usage of rabbitmqctl was due to it not inheriting the entrypoint's ulimit, which caused a CPU usage issue with the Erlang VM.

@juan131
Collaborator

juan131 commented Jan 8, 2019

Yes, it was one of the reasons why the CPU usage was so high. That's why the probes were moved to curl in #8140.

@Artimi

Artimi commented Apr 1, 2019

Hi, I've found another reason why RabbitMQ can have noticeable CPU usage when idle or under a light load. RabbitMQ runs on Erlang and uses its scheduling capabilities. To schedule processes, Erlang uses scheduler threads, and by default their number depends on the number of logical cores. This is a problem when running in Docker/Kubernetes, because RabbitMQ will think it has more resources than it actually has. In our case, a RabbitMQ node runs on a server with 40 cores, but we limit it to 1 core in Kubernetes. Erlang will run 40 scheduler threads that are constantly context switching, which generates the CPU usage. When I set the number of scheduler threads to 1, the CPU usage dropped from 23% to 3%.
You can check how many scheduler threads you are using with rabbitmqctl status:

$ rabbitmqctl status
...
{erlang_version,
     "Erlang/OTP 20 [erts-9.3.3.3] [source] [64-bit] [smp:40:40] [ds:40:40:10] [async-threads:640] [hipe] [kernel-poll:true]\n"},
...

The numbers after smp are the number of scheduler threads. For more info see the Erlang scheduler details. You can set the value using an environment variable:

RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 1:1"
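
In a plain Kubernetes Deployment (the setup described in this comment), that could be wired up roughly like this; the container name and image are placeholders:

  containers:
    - name: rabbitmq          # placeholder name
      image: rabbitmq:3.7     # placeholder image
      env:
        - name: RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS
          value: "+S 1:1"     # 1 scheduler thread, 1 online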

@infa-ddeore

@Artimi how did you set "RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS" with the rabbitmq Helm chart?

@Artimi

Artimi commented Apr 11, 2019

@infa-ddeore We are actually not using Helm, just a custom-made Kubernetes deployment. I wrote it here because I had problems similar to the ones in this issue.

@thomas-riccardi
Contributor

thomas-riccardi commented Apr 11, 2019

@infa-ddeore the feature indeed seems to be missing.
It could easily be added, like I did in #12908 for the metrics container.

We could also use the downward API to get the CPU requests or limits and automatically generate the correct value for RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS in the command. Or just use Helm templating.
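
A rough sketch of the downward API approach (untested; the container name is a placeholder, and the divisor rounds the CPU limit up to whole cores):

  env:
    - name: CPU_LIMIT
      valueFrom:
        resourceFieldRef:
          containerName: rabbitmq   # placeholder
          resource: limits.cpu
          divisor: "1"              # whole cores, rounded up
    - name: RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS
      value: "+S $(CPU_LIMIT):$(CPU_LIMIT)"   # dependent env var expansion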

@infa-ddeore

@thomas-riccardi thanks for the pointers. For now I will update the stable/rabbitmq chart locally with this variable and deploy that.

@infa-ddeore

@Artimi setting "RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS" to "+S 1:1" doesn't seem to help me; I will try disabling the liveness and readiness checks. My RabbitMQ is 3.7.8 and Erlang is 21.

@juan131
Collaborator

juan131 commented Apr 15, 2019

Thanks for reporting it @Artimi

I just created a PR so the user has a couple of parameters to limit the number of scheduler threads.

@zzguang520

Same issue in Helm chart rabbitmq-ha-1.47.1.
