Important cluster metrics to monitor #546
Before we start editing any doc pages, we need to be able to preview them. This makes it easy to get started on OS X with a single command: `make preview`. We committed the Pipfile, which locks down the Python dependencies. Partner-in-crime: @MarcialRosales
These are the 5 most important metrics for determining whether a RabbitMQ cluster is healthy. The plan is to continue with the top 5 node metrics.
I'll keep an eye on this, thanks.
Cluster links (partitions) and alarms are also "top" in my book. Can we change this to "important metrics", since this is not a competition? :)
The exact question was "Which are the top 5 RabbitMQ metrics that we should monitor?", and I instantly thought "If you can only monitor 5 metrics about your RabbitMQ cluster, these would be it", to the tune of "Everybody's Free To Wear Sunscreen". Partitions and alarms are essential checks, but are they metrics? For example, partitions would be addressed by the first metric: "This metric needs to be collected from each node so that we can be confident that each node is able to communicate with every other node." Alarms would be addressed by the most important node metrics (a separate PR, maybe?): Erlang run queue, memory, disk, file descriptors and socket descriptors.
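To make the node-level checks concrete, here is a minimal Python sketch that flags the node metrics listed above (Erlang run queue, memory, disk, file descriptors, socket descriptors) against thresholds. It assumes each node is a dict shaped like an entry from the RabbitMQ management HTTP API's `/api/nodes` endpoint, using fields such as `run_queue`, `mem_used`, `mem_limit`, `disk_free`, `disk_free_limit`, `fd_used`, `fd_total`, `sockets_used` and `sockets_total`; these field names may differ between RabbitMQ versions. The function `node_warnings` and the threshold values are hypothetical, for illustration only.

```python
from __future__ import annotations

from typing import Any

# Hypothetical defaults for illustration only; tune them per deployment.
RUN_QUEUE_MAX = 10
RATIO_MAX = 0.8


def _over(used: float, limit: float, ratio: float) -> bool:
    """True when `used` exceeds `ratio` of `limit` (multiplying avoids division by zero)."""
    return used > ratio * limit


def node_warnings(node: dict[str, Any],
                  run_queue_max: int = RUN_QUEUE_MAX,
                  ratio_max: float = RATIO_MAX) -> list[str]:
    """Return warnings for one node; an empty list means nothing crossed a threshold.

    Field names are assumed to match /api/nodes entries (hypothetical mapping).
    """
    name = node.get("name", "<unknown>")
    warnings: list[str] = []
    # Erlang run queue: processes waiting for a scheduler to run them.
    if node["run_queue"] > run_queue_max:
        warnings.append(f"{name}: Erlang run queue is {node['run_queue']}")
    # The memory alarm fires once mem_used exceeds mem_limit, so warn earlier.
    if _over(node["mem_used"], node["mem_limit"], ratio_max):
        warnings.append(f"{name}: memory use is close to the limit")
    # The disk alarm fires once disk_free drops below disk_free_limit,
    # so warn when the limit is more than ratio_max of the remaining free space.
    if _over(node["disk_free_limit"], node["disk_free"], ratio_max):
        warnings.append(f"{name}: free disk space is close to the limit")
    if _over(node["fd_used"], node["fd_total"], ratio_max):
        warnings.append(f"{name}: file descriptors are close to exhaustion")
    if _over(node["sockets_used"], node["sockets_total"], ratio_max):
        warnings.append(f"{name}: socket descriptors are close to exhaustion")
    return warnings
```

Checking every node and alerting on a non-empty list is one way to wire this up; the same numbers could equally be exported as gauges and alerted on in the monitoring system itself.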
The presence of certain things can be thought of as a boolean gauge, but fair enough.
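Following the boolean-gauge idea, here is a minimal Python sketch of how partitions and alarms could be exposed as 0/1 gauges, so they can be graphed and alerted on like any other metric. It assumes node dicts shaped like RabbitMQ management API `/api/nodes` entries, using the `partitions`, `mem_alarm` and `disk_free_alarm` fields; the function names `boolean_gauges` and `cluster_gauges` are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable

GAUGE_KEYS = ("partitioned", "memory_alarm", "disk_alarm")


def boolean_gauges(node: dict[str, Any]) -> dict[str, int]:
    """Map the presence of a partition or alarm on one node to a 0/1 gauge value."""
    return {
        # A non-empty partitions list means this node has detected a partition
        # with at least one peer.
        "partitioned": int(bool(node.get("partitions"))),
        "memory_alarm": int(bool(node.get("mem_alarm"))),
        "disk_alarm": int(bool(node.get("disk_free_alarm"))),
    }


def cluster_gauges(nodes: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Roll per-node gauges up to the cluster: 1 if any node reports the condition."""
    per_node = [boolean_gauges(n) for n in nodes]
    return {key: max((g[key] for g in per_node), default=0) for key in GAUGE_KEYS}
```

Taking the maximum across nodes keeps the cluster-level gauge at 1 as long as any single node is affected, which is one reasonable way to roll per-node booleans up to the cluster.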
When pipenv is not installed, `$(realpath ..)` is blank, and we cannot use it as a target. We have to hard-code the location to `/usr/local/bin/pipenv`. Default the locale to a well-known one, otherwise python3 fails to install.
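The fallback described in this commit appears to live in the Makefile; purely for illustration, here is the same idea as a minimal Python sketch: look pipenv up on `PATH` with `shutil.which`, fall back to the hard-coded `/usr/local/bin/pipenv`, and default the locale environment variables. Setting `LC_ALL`/`LANG` to `en_US.UTF-8` is an assumption about which "well-known" locale is meant, and `resolve_pipenv` and `build_env` are hypothetical names.

```python
from __future__ import annotations

import os
import shutil
from typing import Callable, Mapping, Optional

# Hard-coded fallback, matching the path named in the commit message.
FALLBACK_PIPENV = "/usr/local/bin/pipenv"
# Assumed "well-known" locale; any UTF-8 locale installed on the machine should work.
DEFAULT_LOCALE = "en_US.UTF-8"


def resolve_pipenv(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return pipenv's path from PATH, or the fallback when it is not installed yet."""
    return which("pipenv") or FALLBACK_PIPENV


def build_env(base: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Copy `base`, filling in LC_ALL and LANG only when they are not already set."""
    env = dict(base)
    env.setdefault("LC_ALL", DEFAULT_LOCALE)
    env.setdefault("LANG", DEFAULT_LOCALE)
    return env
```

A caller could then run something like `subprocess.run([resolve_pipenv(), "install"], env=build_env(), check=True)`; using `setdefault` means a locale the user has already configured is left untouched.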
Closing per discussion with @gerhard. This PR has been superseded by monitoring guide and toolchain updates in
This question came up recently, so we wanted to add it to the metrics docs.
It's still a WIP, but we wanted to share it early so that we can discuss it.