Make custom metrics work with gunicorn reload #2873
Comments
@axsaucedo @RafalSkolasinski any updates on this ticket?
@anggao we discussed it yesterday morning when we started revisiting the discussion around persistence, but it seems to be quite nuanced: we haven't been able to identify a simple way to address this that doesn't end up being a relatively big hack. We could remove the worker ID, or provide an option to disable the worker ID through an env variable, but that seems like it may be addressing just this edge case. @cliveseldon, not sure if you would have any thoughts on this?
@axsaucedo Thank you! Can you elaborate on how you plan to get rid of the worker ID? Are you planning to do aggregation at the Python server layer?
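To make the aggregation question concrete, here is a minimal sketch of what dropping the `worker-id` label and summing the remaining series could look like. It is plain Python and does not use any Seldon or Prometheus client API; the `Sample` shape, the function name `aggregate_without_label`, and the metric and label values are all hypothetical. Summing is assumed to be the right merge for counters only; gauges would need a different strategy.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical sample shape: (metric name, labels, value).
Labels = Dict[str, str]
Sample = Tuple[str, Labels, float]
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def aggregate_without_label(
    samples: Iterable[Sample], drop: str = "worker-id"
) -> Dict[SeriesKey, float]:
    """Sum samples that differ only in the dropped label (suitable for counters)."""
    totals: Dict[SeriesKey, float] = defaultdict(float)
    for name, labels, value in samples:
        # Keep every label except the one being dropped, so series from
        # different workers collapse into a single key.
        kept = frozenset((k, v) for k, v in labels.items() if k != drop)
        totals[(name, kept)] += value
    return dict(totals)


# Made-up samples from two workers.
samples = [
    ("requests_total", {"worker-id": "101", "model": "a"}, 3.0),
    ("requests_total", {"worker-id": "202", "model": "a"}, 4.0),
    ("requests_total", {"worker-id": "202", "model": "b"}, 1.0),
]
aggregated = aggregate_without_label(samples)
```

With this shape the number of exposed series depends only on the remaining labels, not on how many worker processes have existed.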
Right now the custom metrics are exposed through a process in the model container. Since the metrics contain the label `worker-id`, each restart/reload of gunicorn creates a new data series. This causes an issue, as both the old and new data series exist and are sent back in the Prometheus scrape response. With an auto-reload process in our model (in order to avoid potential OOM), this results in a Prometheus response of over 10MB after running the model for a while, which causes scrape timeouts and a huge memory bump in the Prometheus server.
I think we need a better way to work around this issue, as reloading gunicorn seems to be a common method to avoid OOM in model deployments.
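The growth can be illustrated with a small simulation. This is only a sketch under assumptions: the in-memory `store`, the metric name `my_custom_counter`, the starting PID, and the reload and worker counts are made up for illustration and are not taken from the actual deployment or from the Seldon code.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the exposed metrics:
# each key is (metric name, worker-id label value).
Series = Dict[Tuple[str, str], float]


def record(store: Series, metric: str, worker_id: str, value: float) -> None:
    """Add a value to the series identified by metric name and worker-id."""
    key = (metric, worker_id)
    store[key] = store.get(key, 0.0) + value


def simulate_reloads(reloads: int, workers: int) -> Series:
    """Simulate gunicorn reloads, each one spawning a fresh set of workers."""
    store: Series = {}
    next_pid = 1000  # made-up starting worker ID
    for _ in range(reloads):
        for _ in range(workers):
            # A reloaded worker gets a new ID, so it writes to a new series;
            # the series left behind by the old worker is never removed.
            record(store, "my_custom_counter", str(next_pid), 1.0)
            next_pid += 1
    return store


store = simulate_reloads(reloads=50, workers=4)
print(len(store))  # 200 series for a single metric
```

The series count scales with reloads × workers for every custom metric, which is consistent with the scrape response growing the longer the model runs.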