Liveness probe kills Seldon engine container with long init waiting time (Python wrapper) #674
Comments
I think it's 20 seconds of init and then 3 retries, so effectively 35 seconds. I recently had to increase the retries, so now it's effectively 55 seconds (SeldonIO/seldon-operator#22). It would be good to make that configurable, as the linked issue shows this can be a problem even for cases that are not downloading a model.
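As a rough check of the numbers quoted above, the window before a container that never passes its liveness probe gets restarted can be approximated as the initial delay plus one probe period per allowed failure. The 20-second delay comes from this thread. The 5-second period and the retry count of 7 are assumptions inferred from the quoted totals, not values confirmed from the operator source.

```python
def effective_startup_window(initial_delay_s, period_s, failure_threshold):
    """Approximate seconds before a container that never passes its
    liveness probe is restarted."""
    return initial_delay_s + period_s * failure_threshold


# Assumed: 5 s between probes (inferred from the 35 s and 55 s totals).
print(effective_startup_window(20, 5, 3))  # 35
print(effective_startup_window(20, 5, 7))  # 55
```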
I've had a look and a chat with @gsunner to see what changes are necessary. It seems they would require small modifications to the Python wrapper, together with potential changes to the Seldon Operator. Both are explained below. For context, the readiness and liveness probes are currently both configured to the
Python Wrapper Changes
Regarding the changes to the Python wrapper, there would be a new
Operator changes
The operator would also have to change, so that the Seldon engine containers are deployed with a liveness probe that reaches the
It would be great to get thoughts on these planned changes, especially from @cliveseldon, as it seems they will need changes on the operator (for the liveness endpoint).
I think there are two issues to handle
We could say that for 2. you need to provide your own readiness and liveness probes.
Oh interesting. I agree with those points. One thing on my mind is that we may not need to modify the current wrappers to change the liveness probe URL. The reasoning is that right now both readiness and liveness probes seem to point to the
I don't think this is correct. See here
Note that we have two sets of probes: the probes for the model container and the probes for the engine container. I suspect what @axsaucedo was referring to about liveness and readiness pointing to the same path was the engine probes. This came up yesterday with a timeout for RH opendatahub. The fix for that was: This is relevant because the engine ready endpoint checks that the whole graph is ready, including the model container.
This can be closed, as it can be resolved by adding your own liveness probe to the model.
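A minimal sketch of what "adding your own liveness probe" could look like, written as a Python dict that mirrors a Kubernetes container spec. The probe field names (`tcpSocket`, `initialDelaySeconds`, `periodSeconds`, `failureThreshold`) are standard Kubernetes fields. The container name, image, port 9000 and the 120-second delay are hypothetical values chosen for illustration, and the helper `with_liveness_probe` is not part of Seldon.

```python
import json


def with_liveness_probe(container, port, initial_delay_s,
                        period_s=5, failure_threshold=3):
    """Return a copy of a container spec with a TCP liveness probe attached."""
    probe = {
        "tcpSocket": {"port": port},
        "initialDelaySeconds": initial_delay_s,
        "periodSeconds": period_s,
        "failureThreshold": failure_threshold,
    }
    return {**container, "livenessProbe": probe}


# Hypothetical model container; the port and delay are assumptions.
model = {"name": "classifier", "image": "example/model:0.1"}
spec = with_liveness_probe(model, port=9000, initial_delay_s=120)
print(json.dumps(spec, indent=2))
```

The resulting dict could be serialised into the container entry of a deployment manifest, so that a slow model load gets a longer grace period than the default.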
The Seldon engine containers get killed if the init function takes more than 20 seconds. This is the default behaviour, because the init function of the Python wrapper is triggered after the Flask server has been initialised and the pod has been marked as initialised. This is problematic if the container needs to perform init work that depends on parameters passed to the init function (through Seldon deploy params). An example of this is the PyTorch Hub integration, which crashes if the selected model is large enough for the download to take longer than 20 seconds (#642).
It is possible to avoid this issue if the init work is done before the wrapping class definition (i.e. by accessing the parameters directly from the env variables). The liveness probe doesn't get triggered in this case, as the container is not marked as ready: the download executes synchronously before the wrapper definition is even reached.
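The workaround described above could be sketched roughly as follows. It assumes the deploy parameters arrive as a JSON list of `{"name", "value", "type"}` entries in an environment variable named `PREDICTIVE_UNIT_PARAMETERS`. The `slow_download` function, the `model_name` parameter and the `MyModel` class are hypothetical placeholders.

```python
import json
import os


def read_params(environ=os.environ):
    """Parse deploy parameters from the environment (assumed JSON list format)."""
    raw = environ.get("PREDICTIVE_UNIT_PARAMETERS", "[]")
    return {p["name"]: p["value"] for p in json.loads(raw)}


def slow_download(model_name):
    """Hypothetical placeholder for a long-running model download."""
    return {"name": model_name}


# Module-level work: this runs synchronously at import time, before the
# wrapper class below is defined and before any server is started.
PARAMS = read_params()
WEIGHTS = slow_download(PARAMS.get("model_name", "default"))


class MyModel:
    def __init__(self, **kwargs):
        # Nothing slow happens here; the weights are already loaded.
        self.weights = WEIGHTS

    def predict(self, X, features_names=None):
        return X
```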
As we look to use reusable model servers as a more common design pattern, where the model weights or binaries are downloaded from an object store, we will need to think of a standard way to handle longer loading times during initialisation.
This could be tackled by making the parameters accessible as env variables by default, or alternatively by adding a PRE function that would execute before the Flask server is initialised.
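To make the PRE function idea concrete, here is a minimal sketch of the ordering it implies. Everything in it is hypothetical: `start_server`, the `pre` classmethod and the `serve` callable are illustrative names, not part of the existing wrapper.

```python
def start_server(model_cls, params, serve):
    """Run an optional class-level `pre` hook, then build the model and serve it."""
    pre = getattr(model_cls, "pre", None)
    if callable(pre):
        # Slow work (e.g. downloads) happens before the server accepts traffic.
        pre(**params)
    model = model_cls(**params)
    return serve(model)


events = []


class HubModel:
    @classmethod
    def pre(cls, **params):
        events.append("pre")

    def __init__(self, **params):
        events.append("init")


# The lambda stands in for starting the web server.
start_server(HubModel, {"model_name": "resnet18"}, lambda m: events.append("serve"))
print(events)  # ['pre', 'init', 'serve']
```

Because the hook runs before the server starts, the container would not yet be answering probes while the slow work is in progress, which is the same effect as the module-level workaround but with the parameters passed in explicitly.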