Seldon core operator is restarting due to failed renewal of lease #4147
Comments
Is this related to kubernetes-sigs/kubebuilder#2604?
Is there anything particular about your cluster that would cause resource locks to fail?
Similar issue: kedacore/keda#2836
One option might be to allow longer deadlines, so that users can handle noisy clusters or network issues?
Hi @cliveseldon @axsaucedo, many thanks for the change.
There are no explicit docs at present. Setting these values requires understanding the Kubernetes leader election process as described in the controller-runtime docs. We look forward to hearing how you get on. Adding docs based on your experience as a PR would also be welcome. Feel free to open an issue.
Sure, thank you. May I know when the Seldon 1.15 release is planned?
Describe the bug
The Seldon Core operator pod keeps restarting because it fails to retrieve the resource lock. The error logs are below.
{"version": "1.0", "level": "INFO", "host": "ccs-seldon-75dfcb5bf9-bwkfv.ccs", "system": "ml-inf-seldon", "type": "log", "log": {"message": "E0610 16:36:08.554283 7 leaderelection.go:325] error retrieving resource lock ccs/a33bd623.machinelearning.seldon.io: Get \"https://10.254.0.1:443/apis/coordination.k8s.io/v1/namespaces/ccs/leases/a33bd623.machinelearning.seldon.io\": context deadline exceeded"}, "time": "2022-06-10T16:36:09.511Z"}
{"version": "1.0", "level": "INFO", "host": "ccs-seldon-75dfcb5bf9-bwkfv.ccs", "system": "ml-inf-seldon", "type": "log", "log": {"message": "I0610 16:36:08.554523 7 leaderelection.go:278] failed to renew lease ccs/a33bd623.machinelearning.seldon.io: timed out waiting for the condition"}, "time": "2022-06-10T16:36:09.512Z"}
{"version": "1.0", "level": "INFO", "host": "ccs-seldon-75dfcb5bf9-bwkfv.ccs", "system": "ml-inf-seldon", "type": "log", "log": {"message": "setup : problem running manager"}, "time": "2022-06-10T16:36:09.512Z"}
I would like more insight into this issue, and to know whether it is related to kubernetes/client-go#966.
To reproduce
Expected behaviour
The Seldon pod should not restart; it should retry the lease renewal.
Environment
Cloud Provider: Bare Metal
Kubernetes Cluster Version:
[root:ccs-01-control-01 /root]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.5", GitCommit:"e338cf2c6d297aa603b50ad3a301f761b4173aa6", GitTreeState:"clean", BuildDate:"2020-12-09T11:18:51Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.5", GitCommit:"e338cf2c6d297aa603b50ad3a301f761b4173aa6", GitTreeState:"clean", BuildDate:"2020-12-09T11:10:32Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Deployed Seldon System Images: v1.11.0
Model Details