Liveness probe kills seldon engine container when model predict function takes a long time to send a result #753
Comments
Interesting. This would actually get fixed once #684 is merged (as a quick fix), as that PR will enable multiple workers, which will be able to respond to the liveness probe while the longer request is processing. In the meantime you can configure your own liveness probes to allow a longer time - have you tried extending the liveness probe times?
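For reference, a minimal sketch of what extending the probe on the model container could look like - the container name `classifier`, the image, and port 9000 are placeholders, so adjust them to whatever your own deployment actually uses:

```yaml
# Hypothetical SeldonDeployment overriding the model container's liveness probe.
# Names, image, and port 9000 are illustrative only.
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: my-model
spec:
  predictors:
  - name: default
    replicas: 1
    graph:
      name: classifier            # must match the container name below
      type: MODEL
    componentSpecs:
    - spec:
        containers:
        - name: classifier
          image: my-org/my-model:0.1        # placeholder image
          livenessProbe:
            tcpSocket:
              port: 9000                    # placeholder port for the model server
            initialDelaySeconds: 60
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 10            # ~5 minutes of failures before a restart
```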
Alright, I configured my liveness probe with an insanely large timeoutSeconds. No dice. It just fails like clockwork:
Note my model is REST based.
The other strange thing is that it always happens at the EXACT same time no matter what I set my liveness probe to. What is going on here?
I think it's the engine liveness probe failing rather than the one on your model, just like in #674 (comment). Could you try again using the latest snapshot version, 0.3.2-SNAPSHOT?
I agree @ryandawsonuk, but I don't know how to change the liveness/readiness probe (if it is even possible). Let me see if I can get the SNAPSHOT installed. Is there any workaround to this issue, though? It is preventing ALL of our models from being deployed under Seldon right now.
Have you tried with the latest master install of the seldon-operator? The liveness probe fix that @ryandawsonuk mentioned should resolve your issue.
Just tried 0.3.2-SNAPSHOT, same thing. Are you saying I should try master of everything or just the seldon-operator?
Just the seldon-operator should be good enough. Can you check the yaml of your deployment and see what the liveness probe for the engine points to?
@cliveseldon On 0.3.2-SNAPSHOT:
Another question: is there a way to just shut off the seldon-container-engine liveness probe completely, or fake it as a temporary workaround?
Not at present. So it's suggesting the Java code is not able to respond to the liveness probe within a timeout of 2 secs while processing the return of the image. Are you sure the Java process has enough memory and is not garbage collecting or otherwise stalled during this time? Have you set the engine resource request to a high value, and does your cluster node have enough memory?
If Java was running out of memory, wouldn't we see that in the Spring logs? The pod's node has 64GB of RAM, that's plenty. I've limited my pod to 4GB; I can up it to 8GB if you think that will help?
It's more about adding more resources to the engine.
Which engine? Do you mean my pod or the seldon-container-engine? I think you mean the latter. Both are running on a host with 64GB of RAM. Are you talking about giving Java more memory? How do I do that exactly? Here is the java command you guys run:
One thing I suggest is to run it in '-server' mode or set higher defaults for -Xms and maybe -Xmx. I was reading Issue #398 but the doc link goes to a 404. How can I allocate more resources to the seldon-engine-container sidecar?
you can add a …
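As a rough sketch (not the exact snippet from this reply): Seldon Core documents an `svcOrchSpec` block on the predictor for setting engine resources and environment variables such as `JAVA_OPTS`. Something along these lines, assuming your version supports it (it may not be available in 0.3.0), with all values illustrative:

```yaml
# Hedged sketch: giving the engine/service-orchestrator more memory via svcOrchSpec.
# Field availability depends on the Seldon Core version; values are illustrative.
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: my-model
spec:
  predictors:
  - name: default
    svcOrchSpec:
      resources:
        requests:
          memory: 4Gi
        limits:
          memory: 8Gi
      env:
      - name: JAVA_OPTS                 # assumed to be picked up by the engine JVM
        value: "-server -Xms2g -Xmx6g"
    graph:
      name: classifier
      type: MODEL
    componentSpecs:
    - spec:
        containers:
        - name: classifier
          image: my-org/my-model:0.1    # placeholder
```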
Ok, slightly different now. The readiness probe failed, not the liveness one. That was with 4Gi. I'm going to double it and try again (if possible - talking to the admin about getting more resources).
Eureka! 8Gi worked!!!!!!! Alright, so how can this error be diagnosed a bit better in the future? And is there anything future model deployers should know? I mean, this really hit me out of left field without a lot to go on.
Great! Some metrics from a k8s dashboard that show the Java engine is thrashing/garbage collecting would be ideal. We should be able to add engine memory and CPU metrics, or something that shows the pod is under heavy load.
Thanks again Clive et al. I would have given you a few bucks for just a warning or alert message from the sidecar container that memory is low. What I do suggest is that in future releases deployers have more control over the seldon-container-engine sidecar resources etc.
Have created #761 as a follow-up action
Seldon-Core 0.3.0
Client Version: 1.15
Server Version: 1.13
What's happening is that we have a long pipeline for model segmentation (a color image gets segmented into hundreds of images, some DL is performed, then everything is reassembled and finally returned).
The whole pipeline works until the very end, when we are returning the largish image (4032, 3024, 3) over the wire. The network latency is probably around 1-2 minutes through my VPN connection. As a result, the liveness probe kills the container like so:
I can literally watch these messages appear as Seldon is serializing the image and sending it over the wire. As a result, the client sees a 502 response to its request.
How can I prevent this from happening? This is very similar to Issue #674 but happens at the end of predict.