Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change 'rabbitmqctl status' to a wget | grep to save CPU #5009

Conversation

wanderboessenkool
Copy link
Contributor

  • This reduces CPU usage from 250 millis on idle to 25 millis on idle
  • Default rabbitmq user needs administrator privileges
SUMMARY

In the default configuration for the awx-rabbit container, the healthchekcs (livenessProbe and readinessProbe) use the comand 'rabbitmqctl status'. This causes high CPU usage, even on idle (250 millis). By changing the healthchecks to an authenticated request to http://localhost:15672/api/healthchecks/node, and grepping for the desired result in the output, this can be reduced to less than 25 millis, a factor ten saving.

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME
  • Installer
AWX VERSION
awx: 7.0.0
ADDITIONAL INFORMATION

There are other projects suffering the same from the CPU hungry behavior of rabbitmqctl status, see:

- This reduces CPU usage from 250 millis on idle to 25 millis on idle
- Default rabbitmq user needs administrator privileges
@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@wenottingham
Copy link
Contributor

What ensures wget and ash are in the base container image?

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@wanderboessenkool
Copy link
Contributor Author

What ensures wget and ash are in the base container image?

wget is provided by busybox, as is ash. They are both present in the official awx-rabbit image, which pulls from rabbitmq:${RABBITMQ_VERSION}-management-alpine, which pulls from rabbitmq:${RABBITMQ_VERSION}-alpine, which pulls from alpine:3.10, which has both installed by default.

When you put in your own {{ kubernetes_rabbitmq_image }} it would depend. I can switch the shell to /bin/sh or /bin/bash, since they are present as well in the same image and probably present in more base images than ash.

If you want to get rid of wget it will have to be replaced by curl, or a custom python script that gets added to the image.

Copy link
Contributor

@ryanpetrello ryanpetrello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matburt you have any thoughts on this?

@matburt
Copy link
Member

matburt commented Oct 17, 2019

This is pretty compelling! I'd be willing to merge this if you can show you verification methodology, how you sampled usage, and provide me a way to verify it.

@wanderboessenkool
Copy link
Contributor Author

This is pretty compelling! I'd be willing to merge this if you can show you verification methodology, how you sampled usage, and provide me a way to verify it.

Without the change, with rabbitmqctl status:

oc adm top pod --containers

POD                  NAME            CPU(cores)   MEMORY(bytes)                      
awx-0                awx-memcached   0m           7Mi                        
awx-0                awx-web         0m           552Mi              
awx-0                awx-celery      5m           458Mi                    
awx-0                awx-rabbit      206m         102M
postgresql-2-8w8qd   postgresql      4m           87Mi

With the change to http based healthchecks, grepping for {"status":"ok"}:
oc adm top pod --containers

POD                  NAME            CPU(cores)   MEMORY(bytes)
awx-0                awx-memcached   0m           7Mi                        
awx-0                awx-web         0m           552Mi                                   
awx-0                awx-celery      5m           702Mi                                   
awx-0                awx-rabbit      16m          118Mi
postgresql-2-8w8qd   postgresql      5m           91Mi

In prometheus: rate(container_cpu_usage_seconds_total{namespace="awx", container_name="awx-rabbit"}[1m])
(substituting namespace="awx" for whichever namespace awx is running in.)

Copy link
Member

@shanemcd shanemcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am all for this change but we can't use /bin/ash because our downstream images are RHEL-based. Please switch to /bin/sh or /bin/bash then I'll +1.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@wanderboessenkool
Copy link
Contributor Author

@shanecmd. I've changed /bin/ash to /bin/sh.

Looking at the downstream image they do not seem to have wget installed, but they do have curl. That doesn't help in trying to unify the two. I've added another change that moves the healthcheck into a small inline python program.

Going forward that healthcheck could be moved into a separate script file, but it would have to be added to both the awx and the downstream images.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

Copy link
Member

@shanemcd shanemcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really impressive stuff. Thank you for the solid work and quick turnaround time.

@shanemcd
Copy link
Member

I just noticed that we're templating out the username / password in the deployment object itself. Can we move this to read from an env var?

@wanderboessenkool
Copy link
Contributor Author

I just noticed that we're templating out the username / password in the deployment object itself. Can we move this to read from an env var?

Yes we can :-)

Copy link
Contributor

@ryanpetrello ryanpetrello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shanemcd
Copy link
Member

Apologies for the noisy feedback, juggling a lot atm and just returned from vacation today. Another thought: we could do something like this to pull this into a ConfigMap / mounted file which would be cleaner than inlining the whole script.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@wanderboessenkool
Copy link
Contributor Author

Apologies for the noisy feedback, juggling a lot atm and just returned from vacation today. Another thought: we could do something like this to pull this into a ConfigMap / mounted file which would be cleaner than inlining the whole script.

@shanemcd That would definitely be nicer on the yaml for the healthchecks, but it might make it harder for people to debug. If wanted I can make those changes tomorrow.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@shanemcd
Copy link
Member

Thanks for this!

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded (gate pipeline).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants