# [bitnami/rabbitmq] RabbitMQ high CPU usage while idle #11116
Hi, it looks to me that this is something more related to RabbitMQ itself, so the upstream devs should recommend which settings suit best to avoid the issue (in case it is a matter of settings only, and not a bug).
Use REST APIs for liveness/readiness probes, instead of spawning expensive erlang processes. Fixes #11116 Signed-off-by: Orgad Shaneh <[email protected]>
Same issue here. After a restart it works fine for some time, and then CPU goes to 100% of what is available. Debug output from the node with the issue, compared with a healthy node, is in the screenshots of the original report.

Values:

```yaml
replicaCount: 3
resources:
requests:
cpu: 500m
memory: 700Mi
limits:
cpu: 600m
memory: 900Mi
...
metrics:
enabled: true
clustering:
forceBoot: true
readinessProbe:
periodSeconds: 60
timeoutSeconds: 40
livenessProbe:
periodSeconds: 60
timeoutSeconds: 40
```
I have observed this unusual CPU usage on chart v10.3.9 (RabbitMQ 3.10.20) running on GKE: roughly 0.4 CPU used per pod while idle. At least for now, here is my workaround: the idea is to ensure that the Erlang VM uses only one scheduler (and not many more) for the "CTL" commands, so that the CLI tools invoked by the probes stay cheap; see the sketch below. EDIT: it doesn't solve everything, see my comment further down.
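A minimal sketch of that workaround in the chart's values, assuming the standard `RABBITMQ_CTL_ERL_ARGS` environment variable (which `rabbitmqctl`/`rabbitmq-diagnostics` read); the exact flags from the original comment were not preserved, so `+S 1:1` here is illustrative:

```yaml
# Sketch: cap the Erlang VM spawned by the CLI tools at one scheduler
# thread so each probe invocation stays cheap.
extraEnvVars:
  - name: RABBITMQ_CTL_ERL_ARGS
    value: "+S 1:1"   # 1 scheduler, 1 online
```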
To set it, we can pass extra environment variables via `extraEnvVars` in the chart values. However, as far as I understand, RabbitMQ does not recommend using the CLI tools for probes at all. So, something like this seems to fix the CPU usage:
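A sketch of such an override, assuming a plain TCP check against the chart's named `amqp` container port (later comments in this thread confirm that is the approach used; the timings here are illustrative):

```yaml
# Sketch: replace the CLI-based probes with cheap TCP checks, so no
# extra Erlang VM is spawned on every probe invocation.
customLivenessProbe:
  tcpSocket:
    port: amqp          # container port 5672
  initialDelaySeconds: 120
  periodSeconds: 30
  timeoutSeconds: 20
customReadinessProbe:
  tcpSocket:
    port: amqp
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 20
```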
(liveness loses functionality this way, so we might want something better here)
My above comment was too quick: I do get an immediate improvement, but according to my first analysis it doesn't solve everything. I agree with @Igor-lkm's remark: we should avoid the CLI tools for probes.
We are also facing this issue. What would be the best solution for it?
Here's an example. The Authorization header contains the base64 encoding of `user:password` (in this example, `user:p4ssw0rd`), which you can generate with `echo -n 'user:p4ssw0rd' | base64`:

```yaml
customLivenessProbe:
  failureThreshold: 6
  httpGet:
    httpHeaders:
      - name: "Authorization"
        value: "Basic dXNlcjpwNHNzdzByZA=="
    path: "/api/health/checks/virtual-hosts"
    port: 15672
  initialDelaySeconds: 120
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 20
customReadinessProbe:
  failureThreshold: 3
  httpGet:
    httpHeaders:
      - name: "Authorization"
        value: "Basic dXNlcjpwNHNzdzByZA=="
    path: "/api/health/checks/local-alarms"
    port: 15672
  initialDelaySeconds: 10
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 20
```
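Note that these probes hit the management API on port 15672, so the management plugin must be enabled (the Bitnami chart enables it by default); unlike the default CLI probes, an HTTP check spawns no Erlang VM on the node.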
And here is the same thing in Terraform:

```hcl
resource "random_password" "root" {
length = 16
min_lower = 1
min_upper = 1
min_numeric = 1
special = false
}
resource "random_password" "erlang_cookie" {
length = 16
min_lower = 1
min_upper = 1
min_numeric = 1
special = false
}
locals {
root-user = {
username = "Admin"
password = random_password.root.result
}
}
resource "helm_release" "rabbitmq" {
name = "my-rabbitmq"
namespace = "namespace"
repository = "https://charts.bitnami.com/bitnami"
chart = "rabbitmq"
version = "11.13.0"
timeout = 240
values = [yamlencode({
replicaCount = 1
auth = {
username = local.root-user.username
password = local.root-user.password
erlangCookie = random_password.erlang_cookie.result
}
customLivenessProbe = {
httpGet = {
path = "/api/health/checks/virtual-hosts"
port = 15672
httpHeaders = [{
name = "Authorization"
value = "Basic ${base64encode("${local.root-user.username}:${local.root-user.password}")}"
}]
}
initialDelaySeconds = 120
periodSeconds = 30
timeoutSeconds = 20
failureThreshold = 6
successThreshold = 1
}
customReadinessProbe = {
httpGet = {
path = "/api/health/checks/local-alarms"
port = 15672
httpHeaders = [{
name = "Authorization"
value = "Basic ${base64encode("${local.root-user.username}:${local.root-user.password}")}"
}]
}
initialDelaySeconds = 10
periodSeconds = 30
timeoutSeconds = 20
failureThreshold = 3
successThreshold = 1
}
})]
}
```
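One caveat with this setup: the Basic auth header is rendered from `auth.password` at deploy time, so if the credentials are rotated outside this Terraform state, the probes start failing with 401s (and the kubelet restarts the pods) until the release is re-applied.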
Thanks for updating this issue and creating the associated PR. The team will review it and provide feedback. Once the PR is merged, this issue will be closed automatically.
Use REST APIs for liveness/readiness probes, instead of spawning expensive erlang processes. Reapply of #11117 and #11180. Fixes #11116. Signed-off-by: Orgad Shaneh <[email protected]> (cherry picked from commit 73966c6) Signed-off-by: Juan José Martos <[email protected]> Co-authored-by: Juan José Martos <[email protected]>
Thanks for reporting. I'll try to look into this next week.
@orgads I think it was a mix of unexpected events. I went through many upgrades and never had any CPU peak/problem (which is why the upgrade was my first suspect), but downgrading did not actually solve the problem. I then updated the customLivenessProbe and customReadinessProbe (since I was using a load definition), and now the CPU stays very low even when idle. I will monitor over the next few hours, but it seems that this solved the problem.
So, my summary: what causes the high CPU usage is RabbitMQ's CLI tools, which the chart's default probes invoke on every check (see the sketch below), and RabbitMQ actually does not recommend using the CLI tools for probes. This PR fixed it: #16082. However, it only fixes it if …
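For reference, the expensive defaults look roughly like this; a paraphrased sketch, not the chart's exact template code:

```yaml
# Sketch of the pre-#16082 default probes: every invocation boots a
# fresh Erlang VM just to query the broker, which is what burns CPU.
livenessProbe:
  exec:
    command:
      - /bin/bash
      - -ec
      - rabbitmq-diagnostics -q ping
readinessProbe:
  exec:
    command:
      - /bin/bash
      - -ec
      - rabbitmq-diagnostics -q check_running
```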
Great summary @Igor-lkm, thanks!
I've updated from RabbitMQ 3.11.5 to 3.13.3 (chart 11.2.0 to 14.4.4) and noticed my CPU usage triple. Setting the probes' port to amqp seems to fix the CPU usage, but @Igor-lkm above seems to mention that it also loses functionality? That doesn't seem good. I also see that with that snippet alone, all the other default timing values are lost; should we put them back? Is @orgads's solution with httpGet and headers the better one?
In fact, just setting the port to amqp (and thereby falling back to the default timings) caused my second and third nodes to keep getting killed. @orgads's solution worked.
### Name and Version
bitnami/rabbitmq 10.1.11
### What steps will reproduce the bug?
Just run it and watch the CPU usage.
### What do you see instead?
High CPU usage while idle
### Additional information
The liveness and readiness probes execute a separate Erlang process on every check, which is very expensive. This was reported (and fixed) in helm/charts#3855, but the fix never made it into the Bitnami chart.