connect timed out with cluster-manager and api server #523

Closed
kaysonx opened this issue Apr 24, 2019 · 4 comments

kaysonx commented Apr 24, 2019

I followed the instructions to install Seldon Core with Helm, but got the following error:

Unexpected error trying to create CRD with:
io.kubernetes.client.ApiException: java.net.SocketTimeoutException: connect timed out

Failed to instantiate [io.seldon.clustermanager.k8s.SeldonDeploymentWatcher]: Constructor threw exception; nested exception is io.kubernetes.client.ApiException: java.net.SocketTimeoutException: connect timed out

I've also checked the Role/RoleBinding/ServiceAccount; they are as expected from the Helm charts.

Any suggestions, folks?

BTW, the commands I used are:

helm install ./seldon-core-crd-0.2.6.tgz  --name seldon-core-crd  --set usage_metrics.enabled=true

helm install ./seldon-core-0.2.6.tgz  --name seldon-core  --namespace seldon \
--set apife.image.name=my-private-registry/apife:0.2.6 \
--set cluster_manager.image.name=my-private-registry/cluster-manager:0.2.6 \
--set engine.image.name=my-private-registry/engine:0.2.6 \
--set redis.image.name=my-private-registry/redis:4.0.1
@ukclivecox
Contributor

This seems to suggest the cluster-manager pod can't connect to the k8s API. Is there anything special about the cluster you are running on? Does it allow k8s API access from your namespace or does the RBAC of your cluster disallow this?
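One quick way to narrow this down, as a rough sketch (assuming curl is available in the cluster-manager image or via a shell in the pod; the pod name below is a placeholder), is to call the API server directly with the mounted service-account token:

kubectl -n seldon exec -it <cluster-manager-pod> -- sh

# Inside the pod: the token and CA cert are mounted here by default.
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
     -H "Authorization: Bearer $TOKEN" \
     https://kubernetes.default.svc/version

A timeout here would point to a network/egress problem; a 401/403 would point to RBAC instead.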

@kaysonx
Author

kaysonx commented Apr 24, 2019

Thanks for your reply!

The cluster is OK; we already have a custom scheduler, written with the client-go SDK, running on it.
I just checked the pod, and the token is mounted correctly:
/var/run/secrets/kubernetes.io/serviceaccount from seldon-token-mwcpb

and the corresponding role is:

apiVersion: v1
items:
- apiVersion: rbac.authorization.k8s.io/v1
  kind: Role
  metadata:
    creationTimestamp: 2019-04-24T09:34:00Z
    name: seldon-local
    namespace: seldon
    resourceVersion: "40929244"
    selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/seldon/roles/seldon-local
    uid: 176d39f6-6674-11e9-a41d-70106fba9de6
  rules:
  - apiGroups:
    - '*'
    resources:
    - deployments
    - services
    verbs:
    - '*'
  - apiGroups:
    - machinelearning.seldon.io
    resources:
    - '*'
    verbs:
    - '*'
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

I guess the k8s Java SDK uses this mounted token at /var/run/secrets/kubernetes.io/serviceaccount.
Is that right? Or does the role need more permissions?
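For the permissions side, a rough sketch of how to check what the pod's service account is actually allowed to do (assuming the service account is named seldon, as the seldon-token-mwcpb mount suggests):

# Namespaced permissions granted by the Role above.
kubectl auth can-i list seldondeployments.machinelearning.seldon.io \
    --as=system:serviceaccount:seldon:seldon -n seldon

# CRD creation is cluster-scoped, so it would need a ClusterRole rather than this namespaced Role.
kubectl auth can-i create customresourcedefinitions \
    --as=system:serviceaccount:seldon:seldon

That said, a missing permission would normally come back as a 403 from the API server; a plain connect timed out looks more like a connectivity problem.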

@ukclivecox
Contributor

Yes, it will be using the seldon-local RBAC and the default token that k8s adds to the pod. Is your cluster set up with any restrictions?
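One kind of restriction that produces exactly this symptom (a connect timeout rather than a 403) is a default-deny egress NetworkPolicy, or a firewall between the namespace and the API server. A quick check, sketched with a hypothetical throwaway pod (any image with curl would do):

# Any default-deny policy here would block the cluster-manager's egress to the API server.
kubectl get networkpolicy -n seldon

# Reachability test from a throwaway pod in the seldon namespace.
# Any HTTP status back (even 401/403) proves connectivity; a hang means the route is blocked.
kubectl -n seldon run api-check --rm -it --restart=Never --image=curlimages/curl --command -- \
    curl -sk -o /dev/null -w '%{http_code}\n' https://kubernetes.default.svc/healthz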

@ukclivecox
Contributor

Please reopen if this is still an issue on the 0.4.0 release.

agrski pushed a commit that referenced this issue Dec 2, 2022