EXTERNAL-IP <pending> forever #208
Comments
@frafra do you run into the same issue when downloading and installing helm onto the k3os host and invoking from there?
@dweomer I actually get a different error:
@frafra did you make sure that k3s was fully up and running (it can take anywhere from 20 seconds to a few minutes) before attempting the helm invocation? Take a look at https://github.com/rancher/k3s/blob/master/scripts/sonobuoy for some inspiration on automating this. Specifically, note that there are effectively three wait phases (kubeconfig, nodes, services) and that you will want to include similar waits.
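For reference, a rough sketch of those three wait phases; the exact commands are assumptions, not copied from the referenced script:

```sh
#!/bin/sh
# Illustrative wait loop to run before invoking helm on a k3s host.
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

# Phase 1: kubeconfig - wait for k3s to write its kubeconfig file
while [ ! -f "$KUBECONFIG" ]; do sleep 2; done

# Phase 2: nodes - wait until every node reports Ready
until kubectl wait --for=condition=Ready node --all --timeout=10s; do
  sleep 2
done

# Phase 3: services - wait for core workloads such as CoreDNS
kubectl -n kube-system rollout status deployment/coredns --timeout=300s
```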
@dweomer thanks, but I waited more than 20 minutes and it still fails. Helm v2.14.3.
@dweomer I'm seeing pretty much the same problem when I try to run this on my Hetzner instance.
It's still pending after about 17 hours, but I also saw that the WordPress container crashed because it couldn't connect to the database.
I have the same event happening. I cannot see what caused it.
Scratch that: I just tested installing WordPress via a HelmChart manifest file and it's pending.
I'm seeing the same thing on a LoadBalancer I just deployed...
Seeing the same on a freshly provisioned k3s cluster; I installed Longhorn (hoping to evaluate it and use it for NFS mounts or just regular distributed block storage), but the longhorn-frontend service stays on `<pending>`.
When I check the status of one of the pods, I see the following:
I also have this issue. The solution is as follows; obviously, replace the IP with the desired IP and SERVICENAME with the right name.
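A likely form of that patch command, assuming the externalIPs approach the comment describes (the IP and SERVICENAME are placeholders):

```sh
# Assigns a fixed external IP to the stuck Service; values are placeholders.
kubectl patch svc SERVICENAME -p '{"spec":{"externalIPs":["203.0.113.10"]}}'
```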
I solved it by changing the service away from type LoadBalancer.
I've been working toward a better solution and was setting up Traefik. Could you give me some more information on how you set that up? EDIT: I currently have stable/traefik working by running the patch command on it, and that seems to work fine, as far as I know.
In case you're still interested (this is for Traefik 2.x):
Of course you will also have to define
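For what it's worth, a minimal sketch of pinning the Traefik 2.x Service address, assuming the official Helm chart's values file; the `service.spec` passthrough and the IP below are illustrative, not taken from this thread:

```yaml
# values.yaml (sketch): ask the chart to request a fixed LoadBalancer IP.
service:
  spec:
    loadBalancerIP: 203.0.113.10   # replace with your machine's IP
```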
How did you determine what IP address to give? I'm fairly certain that this error occurs because it is binding port 80 on the same IP address that Traefik is already listening on. But I can't find out how k3s allocates EXTERNAL-IPs.
So the machine itself has an IP that you probably already know; that's the one I'm talking about. Actually, I have since found that my above solution is horribly wrong.
I believe this might have had something to do with the fact that Hetzner didn't offer a load-balancer-as-a-service at the time, and Kubernetes doesn't come with an implementation for LoadBalancer; it's something you need to provide using something like MetalLB. I do believe k3s comes with klipper-lb, which you can disable with --no-deploy servicelb. On Hetzner Cloud you might need something like MetalLB for the IP to be assigned. Hetzner now seems to offer a load-balancer-as-a-service just like AWS, Google Cloud, etc. do, and that might be usable here. I might be completely off here as I'm pretty new to the Kubernetes world.
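To make that concrete, a minimal sketch of the bare-metal route this comment describes: disable the bundled load balancer and let MetalLB hand out addresses. The invocation and address range are illustrative, and the ConfigMap shown is the pre-0.13 MetalLB configuration format:

```sh
# Start the k3s server without Klipper (illustrative invocation):
k3s server --no-deploy servicelb

# Give MetalLB a layer-2 pool to allocate EXTERNAL-IPs from:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: config
  namespace: metallb-system
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250
EOF
```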
I was confused by this as well, coming into k3os with a bare-metal test lab. It turns out this "external IP pending forever" is exactly what one would expect in a cluster with no load balancer. This article describes the flow of a working setup with HTTP traffic coming in, going through MetalLB to Traefik, and on to the end service. In particular I found this insightful: "A Kubernetes LoadBalancer service is not a load balancer, it's a target for a load balancer, and typically this load balancer is external [to the cluster]." This article explores multiple ways of solving load balancing depending on your environment and goals. If I'm understanding correctly, in my case:
Looking for SSL? If you have a single instance of Traefik you can follow their docs. But k3os by default has Traefik configured for high availability (running on each node), and as of 2.0 Traefik only supports HA Let's Encrypt in their Enterprise version. So this will additionally require setting up a cert-manager service on your cluster and configuring your Ingress rules to take advantage of it. You don't have to use Cloudflare DNS as the article suggests, but you may have to implement a webhook if your DNS provider is not supported. It's a lot of steps to go through, and considerably more effort than I was expecting just to get traffic into the cluster. It seems like there should be a guide for this linked from the main k3os README; I can't imagine someone wanting to set up a cluster and not caring about ingress. I'll try it out and post back here if I get it working.
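As a sketch of the cert-manager step mentioned here (names and email are placeholders; this assumes cert-manager is already installed):

```yaml
# ClusterIssuer for Let's Encrypt, solving HTTP-01 challenges via Traefik.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com          # placeholder address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        ingress:
          class: traefik
```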
Ok, I got this working and it wasn't so bad.
```sh
ssh rancher@labc1
sudo su
mount -o remount,rw /k3os/system
vim /k3os/system/config.yaml
reboot
```

Modify config.yaml:
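A sketch of what the relevant part of /k3os/system/config.yaml might look like, assuming the goal is to disable the bundled Klipper load balancer as discussed above:

```yaml
# Sketch only: pass --no-deploy servicelb to the k3s server.
k3os:
  k3s_args:
  - server
  - "--no-deploy"
  - "servicelb"
```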
values.yaml:
```sh
kubectl -n kube-system get svc traefik -o yaml > traefik.yaml
# (modify the file as needed)
kubectl delete -f traefik.yaml
kubectl apply -f traefik.yaml
```
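"Modify the file as needed" most likely means pinning the address; an illustrative edit to the Service spec inside traefik.yaml (the IP is an example drawn from a MetalLB pool):

```yaml
# Only the relevant part of the Service is shown.
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.1.240   # an address from your pool
```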
That's it! Appears to be working from my end now. SSL is next up.
This doesn't work for me. Klipper doesn't go away even though I have my config correct and the service restarted. Any ideas as to how to get rid of Klipper completely?
For me the issue was that, when upgrading to a newer version of k3s, the naming scheme apparently changed at some point, meaning I had a svclb-traefik DaemonSet (running rancher/klipper-lb:v0.2.0) and a svclb-traefik- one (running rancher/klipper-lb:v0.4.4).
The latter stayed pending because the old one (still functioning without issue) had already taken the respective ports on all nodes. I resolved this by disabling the old DaemonSet, roughly as sketched below: I first patched it with a non-existent node selector, then confirmed that the new DaemonSet became available on all nodes (and that my public sites were actually reachable), and finally deleted the old DaemonSet manually.
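A sketch of that sequence, assuming the DaemonSet names above; the label used for the dead-end selector is arbitrary:

```sh
# Park the old DaemonSet on a node selector that no node matches:
kubectl -n kube-system patch daemonset svclb-traefik \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-existent":"true"}}}}}'

# Once the new DaemonSet is available on every node (and traffic works):
kubectl -n kube-system delete daemonset svclb-traefik
```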
How to reproduce:
Result:
Additional details:
- Used /home/rancher instead of /opt for persistence, because of "/opt mentioned by README.md but it does not exist" #207
- `kubectl get svc -n kube-system` shows that Traefik got the right external IP
- Warning FailedScheduling 12s (x5 over 21s) default-scheduler 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
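For anyone hitting the same FailedScheduling event: it usually means Klipper pods already hold the host ports. A quick, illustrative way to see what is holding them:

```sh
# List the svclb pods that bind host ports 80/443 on each node:
kubectl -n kube-system get pods -o wide | grep svclb
```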