Seldon core operator issue on EKS with Calico "Address is not allowed" and Proposed Solution #3967

Closed
marianobilli opened this issue Mar 4, 2022 · 4 comments

@marianobilli
Contributor

marianobilli commented Mar 4, 2022

Describe the bug

When deploying validating webhooks on EKS clusters that use the Calico CNI plugin instead of the AWS VPC CNI, the EKS control plane cannot reach the private pod IPs assigned by Calico, because those addresses are not routable from the AWS-managed control plane network.

This is the case for the seldon-core-operator and its validating webhook.

The following error appears when applying a new SeldonDeployment object:

Error from server (InternalError): error when creating "seldondeployment.yaml": Internal error occurred: failed calling webhook "v1.vseldondeployment.kb.io": Post "https://seldon-webhook-service.seldon-system.svc:443/validate-machinelearning-seldon-io-v1-seldondeployment?timeout=10s": Address is not allowed
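For context, the failing call is the API server invoking the webhook through its Service. A ValidatingWebhookConfiguration of roughly the following shape would produce that call. The webhook name, Service, namespace, path, port, and 10 s timeout are taken from the error message above; the object name, rules, and remaining fields are assumptions for illustration, not Seldon's actual manifest. The Service resolves to the operator pod's Calico-assigned IP, which is the address the EKS control plane cannot reach.

    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingWebhookConfiguration
    metadata:
      name: example-seldon-validating-webhook   # hypothetical name
    webhooks:
    - name: v1.vseldondeployment.kb.io
      clientConfig:
        service:
          # Service in front of the seldon-controller-manager pod
          name: seldon-webhook-service
          namespace: seldon-system
          path: /validate-machinelearning-seldon-io-v1-seldondeployment
          port: 443
        # caBundle omitted; it is normally supplied at install time
      rules:                       # assumed rule scope
      - apiGroups: ["machinelearning.seldon.io"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["seldondeployments"]
      admissionReviewVersions: ["v1"]
      sideEffects: None
      failurePolicy: Fail          # assumed; an unreachable webhook then rejects the request
      timeoutSeconds: 10           # matches ?timeout=10s in the error

With failurePolicy: Fail, an unreachable endpoint surfaces exactly as the "failed calling webhook" error shown, rather than letting the object through.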

To reproduce

  1. Set up an EKS cluster with the Calico network plugin
  2. Install the seldon-core-operator from the Helm chart
  3. Apply any demo SeldonDeployment
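For step 3, a minimal SeldonDeployment along the lines of Seldon Core's SKLearn iris example should be enough to trigger the webhook. The names, namespace, and modelUri below are assumptions; any valid SeldonDeployment should hit the same error, since the webhook is called before the object is stored.

    apiVersion: machinelearning.seldon.io/v1
    kind: SeldonDeployment
    metadata:
      name: iris-model        # hypothetical name
      namespace: seldon       # assumed namespace; must already exist
    spec:
      name: iris
      predictors:
      - name: default
        replicas: 1
        graph:
          name: classifier
          implementation: SKLEARN_SERVER
          modelUri: gs://seldon-models/sklearn/iris   # assumed model location

Saved as seldondeployment.yaml and applied with kubectl apply -f, this matches the file name in the error above.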

Proposed Solution

This is a fairly well-known issue with admission webhooks on EKS clusters that run Calico.

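The easy solution is to set hostNetwork: true on the seldon-controller-manager deployment, so that the webhook server listens on the node's own VPC IP, which the EKS control plane can reach.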

For this, the Helm template deployment_seldon-controller-manager.yaml needs to be modified as follows:

    spec:
      hostNetwork: {{ .Values.manager.hostNetwork }}     # addition (unquoted so it renders as a boolean)
      containers:
      - args:
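One Kubernetes detail worth handling alongside this: a pod with hostNetwork: true and the default ClusterFirst DNS policy falls back to the node's resolver, so in-cluster names under *.svc would not resolve from it. Kubernetes provides dnsPolicy: ClusterFirstWithHostNet for this case. The sketch below adds it conditionally; whether the operator actually needs cluster DNS is an assumption, and the conditional is just one way to keep the default behaviour unchanged when hostNetwork is false. The value is left unquoted so Helm renders a YAML boolean rather than the string 'true', which the API server would reject for hostNetwork.

    spec:
      hostNetwork: {{ .Values.manager.hostNetwork }}
      {{- if .Values.manager.hostNetwork }}
      # Keep resolving cluster Service names while on the node's network
      dnsPolicy: ClusterFirstWithHostNet
      {{- end }}
      containers:
      - args: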

It will also be necessary to allow changing the default exposed ports, since with hostNetwork the operator binds them directly on the node, where they may collide with ports already in use.
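A sketch of what configurable ports could look like, assuming a hypothetical manager.webhookPort value. The same number has to feed the container port (which, under hostNetwork, is opened on the node itself) and the webhook Service's targetPort; otherwise the Service on port 443 forwards to the wrong port. The operator process must also be told to listen on that port; how that is passed is not shown, because the exact flag is not confirmed in this issue. The port number is only an example.

    # values.yaml (hypothetical key)
    manager:
      webhookPort: 4443   # example value; pick a port that is free on every node

    # deployment_seldon-controller-manager.yaml, manager container
    ports:
    - name: webhook-server
      containerPort: {{ .Values.manager.webhookPort }}
      protocol: TCP

    # webhook Service template (file name not given in this issue)
    ports:
    - port: 443
      targetPort: {{ .Values.manager.webhookPort }}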

And the default values.yaml needs the following addition, defaulting to false:

    manager:
      hostNetwork: false # Change this value to true if you are running in an AWS EKS cluster with a network plugin such as Calico.
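On an affected EKS cluster, the value would then be flipped in a small override file passed to Helm with -f (or equivalently with --set manager.hostNetwork=true), leaving the chart default untouched for everyone else. The file name is hypothetical; the only key is the manager.hostNetwork value proposed above.

    # values-eks-calico.yaml (hypothetical file name)
    manager:
      hostNetwork: true   # EKS + Calico: serve the webhook from the node's network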

Environment

  • Cloud Provider: AWS
  • Network Plugin: Calico
  • Kubernetes Cluster Version: Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
  • Deployed Seldon System Images: seldon-core-executor:1.13.1 & seldon-core-operator:1.13.1
@marianobilli
Contributor Author

If you want me to do the PR, please let me know.

@axsaucedo
Contributor

Thanks for reporting, @marianobilli. We haven't seen this before; we used to deploy Seldon Core without issues with Calico back when we tried it on DigitalOcean, but it seems this may have changed. Yes, a PR would be quite useful; let us know if you need any pointers for the PR 👍

@marianobilli
Contributor Author

@cliveseldon @axsaucedo given that #3971 was already merged, we could close this issue.
However, I don't want to mess up your workflow for 1.14.0 https://github.com/SeldonIO/seldon-core/projects/43, so let me know if you want me to do anything.

@ukclivecox
Contributor

If that fixes your issue, then yes, we will close this. Keep us updated if there are any other issues.
