istio-pilot deployment fails when there is no LoadBalancer in the cluster #287

Open
orfeas-k opened this issue Jun 7, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@orfeas-k
Contributor

orfeas-k commented Jun 7, 2023

Context

While running this test for kubeflow-tensorboards-operator (a simple "deploy and relate the minimum required charms" test), we noticed that the istio-pilot charm would fail at different points in its execution. The error always looked like the one below, but the failing hook varied: besides config-changed, it could also be install or istio-pilot-relation-changed.

...
  File "./src/charm.py", line 827, in _get_address_from_loadbalancer
    if len(ingresses) != 1:
TypeError: object of type 'NoneType' has no len()
unit-istio-pilot-0: 15:19:55 ERROR juju.worker.uniter.operation hook "config-changed" (via hook dispatching script: dispatch) failed: exit status 1

Reproduce

sudo snap install microk8s --classic --channel=1.24/stable
sudo usermod -a -G microk8s $USER
sudo chown -f -R $USER ~/.kube
microk8s status --wait-ready
microk8s.enable hostpath-storage dns 
sudo snap install juju --classic
juju bootstrap microk8s micro
  • git clone git@github.com:canonical/kubeflow-tensorboards-operator.git and cd kubeflow-tensorboards-operator
  • Install the test requirements: python3 -m pip install -r requirements-integration.txt
  • Run the test: tox -e integration

Cause

It turns out the MicroK8s cluster didn't have a LoadBalancer, and Istio relies heavily on the underlying LB to configure the gateway (ingressgateway). Without one, the gateway keeps waiting for an external IP, and istio-pilot fails in the following lines, where it checks the length of the ingresses object before verifying that it is not None. That said, this is probably a case the istio-pilot charm should be able to handle.

P.S. The error went away as soon as I added an LB with microk8s enable metallb:10.64.140.43-10.64.140.49
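
For illustration, here is a minimal sketch of the kind of guard described above. It is not the charm's actual code: the Ingress dataclass and the function name get_address_from_ingresses are hypothetical stand-ins, and raising ValueError on more than one entry is an assumption about the desired behaviour. The only point is that the None check comes before len().

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ingress:
    """Hypothetical stand-in for one entry of a Service's load balancer ingress list."""

    ip: Optional[str] = None
    hostname: Optional[str] = None


def get_address_from_ingresses(ingresses: Optional[List[Ingress]]) -> Optional[str]:
    """Return the single ingress address, or None if no address is available."""
    # Without a LoadBalancer the ingress list can be None (as in the traceback)
    # or empty, so test for that before calling len().
    if not ingresses:
        return None
    if len(ingresses) != 1:
        # Assumption: more than one entry is treated as an error.
        raise ValueError(f"Expected exactly one ingress, got {len(ingresses)}")
    ingress = ingresses[0]
    # Prefer the IP and fall back to the hostname.
    return ingress.ip or ingress.hostname
```

With this shape, the caller can treat a None result as "no address yet" instead of crashing the hook.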

@ca-scribner
Contributor

There are some tests that check against the different responses we might get for the LoadBalancer IP, but this case was missed. We can probably add another mock load balancer to the unit tests to prove this is happening, then add some logic to handle the issue.
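
A rough sketch of what such a test could look like, assuming the service is represented by an object exposing status.loadBalancer.ingress. Both make_service and address_from_service are hypothetical helpers written for this example, not functions from the charm or its test suite.

```python
import unittest
from types import SimpleNamespace


def make_service(ingress):
    """Build a fake Service object exposing status.loadBalancer.ingress."""
    load_balancer = SimpleNamespace(ingress=ingress)
    return SimpleNamespace(status=SimpleNamespace(loadBalancer=load_balancer))


def address_from_service(svc):
    """Hypothetical function under test: a None-safe address lookup."""
    ingresses = svc.status.loadBalancer.ingress
    if not ingresses or len(ingresses) != 1:
        return None
    entry = ingresses[0]
    return getattr(entry, "ip", None) or getattr(entry, "hostname", None)


class TestAddressFromService(unittest.TestCase):
    def test_no_load_balancer(self):
        # The missed case: no LoadBalancer in the cluster, so ingress is None.
        self.assertIsNone(address_from_service(make_service(None)))

    def test_empty_ingress(self):
        self.assertIsNone(address_from_service(make_service([])))

    def test_ip(self):
        svc = make_service([SimpleNamespace(ip="10.64.140.43")])
        self.assertEqual(address_from_service(svc), "10.64.140.43")

    def test_hostname(self):
        svc = make_service([SimpleNamespace(ip=None, hostname="lb.example.com")])
        self.assertEqual(address_from_service(svc), "lb.example.com")
```

Saved to a file, this can be run with python3 -m unittest; the first test is the one that would have caught the TypeError.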

kimwnasptd added the bug label Jun 13, 2023
@rgildein
Contributor

rgildein commented Jun 7, 2024

I hit the same issue, and I'm using metallb configured like this:

IPADDR=$(ip -4 -j route get 2.2.2.2 | jq -r '.[] | .prefsrc') 
microk8s enable metallb:$IPADDR-$IPADDR

My environment:

$ microk8s version
MicroK8s v1.28.10 revision 6829
$ microk8s status
microk8s is running
high-availability: no
  datastore master nodes: 127.0.0.1:19001
  datastore standby nodes: none
addons:
  enabled:
    community            # (core) The community addons repository
    dashboard            # (core) The Kubernetes dashboard
    dns                  # (core) CoreDNS
    ha-cluster           # (core) Configure high availability on the current node
    helm                 # (core) Helm - the package manager for Kubernetes
    helm3                # (core) Helm 3 - the package manager for Kubernetes
    hostpath-storage     # (core) Storage class; allocates storage from host directory
    metallb              # (core) Loadbalancer for your Kubernetes cluster
    metrics-server       # (core) K8s Metrics Server for API access to service metrics
    registry             # (core) Private image registry exposed on localhost:32000
    storage              # (core) Alias to hostpath-storage add-on, deprecated
  disabled:
    argocd               # (community) Argo CD is a declarative continuous deployment for Kubernetes.
    dashboard-ingress    # (community) Ingress definition for Kubernetes dashboard
    easyhaproxy          # (community) EasyHAProxy can configure HAProxy automatically based on ingress labels
    fluentd              # (community) Elasticsearch-Fluentd-Kibana logging and monitoring
    gopaddle             # (community) DevSecOps and Multi-Cloud Kubernetes Platform
    inaccel              # (community) Simplifying FPGA management in Kubernetes
    istio                # (community) Core Istio service mesh services
    jaeger               # (community) Kubernetes Jaeger operator with its simple config
    keda                 # (community) Kubernetes-based Event Driven Autoscaling
    knative              # (community) Knative Serverless and Event Driven Applications
    linkerd              # (community) Linkerd is a service mesh for Kubernetes and other frameworks
    microcks             # (community) Open source Kubernetes Native tool for API Mocking and Testing
    openebs              # (community) OpenEBS is the open-source storage solution for Kubernetes
    openfaas             # (community) OpenFaaS serverless framework
    osm-edge             # (community) osm-edge is a lightweight SMI compatible service mesh for the edge-computing.
    parking              # (community) Static webserver to park a domain. Works with EasyHAProxy.
    portainer            # (community) Portainer UI for your Kubernetes cluster
    shifu                # (community) Kubernetes native IoT software development framework.
    sosivio              # (community) Kubernetes Predictive Troubleshooting, Observability, and Resource Optimization
    traefik              # (community) traefik Ingress controller
    trivy                # (community) Kubernetes-native security scanner
    cert-manager         # (core) Cloud native certificate management
    cis-hardening        # (core) Apply CIS K8s hardening
    host-access          # (core) Allow Pods connecting to Host services smoothly
    ingress              # (core) Ingress controller for external access
    mayastor             # (core) OpenEBS MayaStor
    minio                # (core) MinIO object storage
    observability        # (core) A lightweight observability stack for logs, traces and metrics
    prometheus           # (core) Prometheus operator for monitoring and logging
    rbac                 # (core) Role-Based Access Control for authorisation
    rook-ceph            # (core) Distributed Ceph storage using Rook

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5821.

This message was autogenerated

@rgildein
Contributor

Using a range instead of a single IP works fine, and istio-pilot is now running without issue.

@mvlassis
Contributor

mvlassis commented Aug 6, 2024

A user also encountered this issue while deploying CKF 1.9, as mentioned in this issue. Users should try the following:

  • Ensure a LoadBalancer is enabled and correctly configured
  • If not using a LoadBalancer, configure a different gateway_service_type in istio-ingressgateway

@orfeas-k
Contributor Author

I hit this when testing in a proxy environment, and the workaround was to disable and re-enable metallb.

@paulomach

paulomach commented Dec 12, 2024

This also happens when the LB has no more addresses available.
It would be helpful to have a more descriptive message and to block the charm.
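
As a rough sketch of that idea, the function below maps the ingress list to a status name and a descriptive message instead of letting the hook crash. The function name describe_gateway_status, the dict-based ingress entries, and the message wording are all assumptions made for this example; in a real charm the "blocked" result would presumably be turned into something like the ops library's BlockedStatus.

```python
from typing import List, Optional, Tuple


def describe_gateway_status(ingresses: Optional[List[dict]]) -> Tuple[str, str]:
    """Map the gateway's ingress list to a (status, message) pair."""
    # None or empty covers both "no LoadBalancer" and "LB pool exhausted".
    if ingresses is None or len(ingresses) == 0:
        return (
            "blocked",
            "Gateway service has no external address: check that a LoadBalancer "
            "is enabled and has free addresses in its pool",
        )
    if len(ingresses) > 1:
        return ("blocked", f"Expected one gateway ingress address, found {len(ingresses)}")
    entry = ingresses[0]
    address = entry.get("ip") or entry.get("hostname")
    if not address:
        return ("blocked", "Gateway ingress entry has neither an ip nor a hostname")
    return ("active", f"Gateway address: {address}")
```

Whether blocked or waiting is the better status here is a design choice for the maintainers; the sketch only shows that the message can name the likely cause.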
