-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add prometheus exporter to the gateway #344
Conversation
Signed-off-by: Paul Cuzner <[email protected]>
New settings control whether the promtheus endpoint is enabled, and if so what port it's bound to. enable_prometheus_exporter prometheus_port Signed-off-by: Paul Cuzner <[email protected]>
Provides the NVMeOFCollector that handles the prometheus scrape request, and makes all the required rpc calls to gather gateways stats. Signed-off-by: Paul Cuzner <[email protected]>
Signed-off-by: Paul Cuzner <[email protected]>
The main "serve" method can now start the prometheus exporter based on the config option being set. Signed-off-by: Paul Cuzner <[email protected]>
Here's an example of what gets returned to prometheus
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK to me. But, as I have no idea about Prometheus I'm not sure I'm the right person to review.
2 questions:
- Shouldn't we have a stop function in the code?
- Should we add a test to github for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to see the Prometheus client working! Great work! I left a few comments over there.
Signed-off-by: Paul Cuzner <[email protected]>
Requires at least 0.19 to satisfy the requirement of a https endpoint. Signed-off-by: Paul Cuzner <[email protected]>
Signed-off-by: Paul Cuzner <[email protected]>
The prometheus_bdev_pools setting is a comma separated string of those rbd pools that should emit bdev metrics. If this setting is not defined, metrics for all bdevs will be emitted. Signed-off-by: Paul Cuzner <[email protected]>
The following changes have been made; - enable endpoint to be https (requiring 0.19 or above) - use a prometheus_bdev_pools parameter to govern metrics returned - use infometricfamily for simple static metric - additional log messages added for debug purposes Signed-off-by: Paul Cuzner <[email protected]>
@gbregman the prometheus endpoint runs as a daemon thread and doesn't maintain state, so shutdown is generally not a consideration. As far as a github CI test is concerned, I think that makes sense. Once we get this in, I'm happy to help with that. |
@pcuzner, some comments and questions:
@oritwas WDYS? |
@caroav to answer your questions;
|
Signed-off-by: Paul Cuzner <[email protected]>
Signed-off-by: Paul Cuzner <[email protected]>
Signed-off-by: Paul Cuzner <[email protected]>
The exporter now runs a background thread to collect the spdk stats to reduce the effect of DoS attacks. Also, the spdk rpc calls are now timed and returned within the prometheus payload Signed-off-by: Paul Cuzner <[email protected]>
Signed-off-by: Paul Cuzner <[email protected]>
As per comments from Aviv. Signed-off-by: Paul Cuzner <[email protected]>
Signed-off-by: Paul Cuzner <[email protected]>
Signed-off-by: Paul Cuzner <[email protected]>
Signed-off-by: Paul Cuzner <[email protected]>
Signed-off-by: Paul Cuzner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good! Thanks for addressing my comments.
I left a couple of cosmetic comments, but the only thing I'm more concerned about is the HTTP fallkback when certificates are not found.
Also note that I check https mode by adding a crt and key to the container, and it switched to https mode (confirmed with curl) |
I ran tests for the updated way to handle SSL on/off, scenarios and log messages are below; SSL requested, but keys not found
SSL requested and keys found
SSL disabled by config
exporter disabled
|
@epuertat please review SSL changes |
do not merge - once outstanding concerned are addressed I'll squash the commits first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks a lot @pcuzner for the work done!
if ssl: | ||
cert_filepath = config.get('mtls', 'server_cert') | ||
key_filepath = config.get('mtls', 'server_key') | ||
|
||
if os.path.exists(cert_filepath) and os.path.exists(key_filepath): | ||
httpd_ok = start_httpd(port=port, certfile=cert_filepath, keyfile=key_filepath) | ||
else: | ||
httpd_ok = False | ||
logger.error("Unable to start prometheus exporter - missing cert/key file(s)") | ||
else: | ||
# SSL mode explicitly disabled by config option | ||
httpd_ok = start_httpd(port=port) | ||
|
||
if httpd_ok: | ||
logger.info(f"Prometheus exporter running in {mode} mode, listening on port {port}") | ||
REGISTRY.register(NVMeOFCollector(spdk_rpc_client, config)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this kind of flows is where exceptions work best:
if ssl: | |
cert_filepath = config.get('mtls', 'server_cert') | |
key_filepath = config.get('mtls', 'server_key') | |
if os.path.exists(cert_filepath) and os.path.exists(key_filepath): | |
httpd_ok = start_httpd(port=port, certfile=cert_filepath, keyfile=key_filepath) | |
else: | |
httpd_ok = False | |
logger.error("Unable to start prometheus exporter - missing cert/key file(s)") | |
else: | |
# SSL mode explicitly disabled by config option | |
httpd_ok = start_httpd(port=port) | |
if httpd_ok: | |
logger.info(f"Prometheus exporter running in {mode} mode, listening on port {port}") | |
REGISTRY.register(NVMeOFCollector(spdk_rpc_client, config)) | |
try: | |
if ssl: | |
# No need the check for certificates, this method already raises an Exception if not found | |
start_http_server( | |
port=port, | |
certfile=config.get('mtls', 'server_cert'), | |
keyfile=config.get('mtls', 'server_key') | |
) | |
else: | |
# SSL mode explicitly disabled by config option | |
httpd_ok = start_httpd(port=port) | |
except Exception: | |
# logger.exception() automatically attaches the Exception raised | |
logger.exception("Failed to start the prometheus http server") | |
else: | |
logger.info(f"Prometheus exporter running in {mode} mode, listening on port {port}") | |
REGISTRY.register(NVMeOFCollector(spdk_rpc_client, config)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I check for files to give more meaningful error messages.
|
||
def start_httpd(**kwargs): | ||
"""Start the prometheus http endpoint, catching any exception""" | ||
logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another nit here: it's a common practice to get the logger at the top of the file, rather than inside every method, as __name__
is constant throughout a file, so all these calls will get the same singleton:
import logging
logger = logging.getLogger(__name__)
|
||
|
||
def timer(method): | ||
def call(self, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but with decorators one should do this, so that the returned method has all the metadata/types/signature/docstring/... of the decorated method
:
def call(self, *args, **kwargs): | |
@functools.wraps(method) | |
def call(self, *args, **kwargs): |
It seems at every change, I'm introducing further issues - with all these rewrites, it would have been more expedient for others to provide the feature 😄 |
This PR introduces an embedded prometheus exporter to the gateway, to support external monitoring and potentially alerting.
Signed-off-by: Paul Cuzner [email protected]