diff --git a/content/en/docs/tasks/debug-application-cluster/debug-service.md b/content/en/docs/tasks/debug-application-cluster/debug-service.md
index 872d956e4e5c6..f60d7b9ae9c25 100644
--- a/content/en/docs/tasks/debug-application-cluster/debug-service.md
+++ b/content/en/docs/tasks/debug-application-cluster/debug-service.md
@@ -7,8 +7,7 @@ title: Debug Services
---

{{% capture overview %}}
-An issue that comes up rather frequently for new installations of Kubernetes is
-that a `Service` is not working properly. You've run your `Deployment` and
+An issue that comes up rather frequently for new installations of Kubernetes is that a `Service` is not working properly. You've run your `Deployment` and
created a `Service`, but you get no response when you try to access it.
This document will hopefully help you to figure out what's going wrong.

@@ -19,10 +18,8 @@ This document will hopefully help you to figure out what's going wrong.

## Conventions

-Throughout this doc you will see various commands that you can run.  Some
-commands need to be run within a `Pod`, others on a Kubernetes `Node`, and others
-can run anywhere you have `kubectl` and credentials for the cluster. To make it
-clear what is expected, this document will use the following conventions.
+Throughout this doc you will see various commands that you can run. Some
+commands need to be run within a `Pod`, others on a Kubernetes `Node`, and others can run anywhere you have `kubectl` and credentials for the cluster. To make it clear what is expected, this document will use the following conventions.

If the command "COMMAND" is expected to run in a `Pod` and produce "OUTPUT":

@@ -48,29 +45,29 @@ OUTPUT

## Running commands in a Pod

For many steps here you will want to see what a `Pod` running in the cluster
-sees.  The simplest way to do this is to run an interactive busybox `Pod`:
+sees. The simplest way to do this is to run an interactive busybox `Pod`:

```none
-$ kubectl run -it --rm --restart=Never busybox --image=busybox sh
-If you don't see a command prompt, try pressing enter.
+kubectl run -it --rm --restart=Never busybox --image=busybox sh
/ #
```
+{{< note >}}
+If you don't see a command prompt, try pressing enter.
+{{< /note >}}

If you already have a running `Pod` that you prefer to use, you can run a
command in it using:

```shell
-$ kubectl exec <POD-NAME> -c <CONTAINER-NAME> -- <COMMAND>
+kubectl exec <POD-NAME> -c <CONTAINER-NAME> -- <COMMAND>
```
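+
+For example, with a `Pod` named `mypod` running a single busybox container
+(an illustrative name, not one created by this walk-through), you could read
+its DNS configuration without opening an interactive shell:
+
+```shell
+# "mypod" is an example name; substitute one of your own Pods
+kubectl exec mypod -- cat /etc/resolv.conf
+```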

## Setup

-For the purposes of this walk-through, let's run some `Pods`. Since you're
-probably debugging your own `Service` you can substitute your own details, or you
-can follow along and get a second data point.
+For the purposes of this walk-through, let's run some `Pods`. Since you're probably debugging your own `Service` you can substitute your own details, or you can follow along and get a second data point.

```shell
-$ kubectl run hostnames --image=k8s.gcr.io/serve_hostname \
+kubectl run hostnames --image=k8s.gcr.io/serve_hostname \
                        --labels=app=hostnames \
                        --port=9376 \
                        --replicas=3
@@ -108,7 +105,7 @@ spec:
Confirm your `Pods` are running:

```shell
-$ kubectl get pods -l app=hostnames
+kubectl get pods -l app=hostnames
NAME                        READY     STATUS    RESTARTS   AGE
hostnames-632524106-bbpiw   1/1       Running   0          2m
hostnames-632524106-ly40y   1/1       Running   0          2m
@@ -117,13 +114,9 @@ hostnames-632524106-tlaok   1/1       Running   0          2m

## Does the Service exist?

-The astute reader will have noticed that we did not actually create a `Service`
-yet - that is intentional. This is a step that sometimes gets forgotten, and
-is the first thing to check.
+The astute reader will have noticed that we did not actually create a `Service` yet - that is intentional. This is a step that sometimes gets forgotten, and is the first thing to check.

-So what would happen if I tried to access a non-existent `Service`? Assuming you
-have another `Pod` that consumes this `Service` by name you would get something
-like:
+So what would happen if I tried to access a non-existent `Service`? Assuming you have another `Pod` that consumes this `Service` by name you would get something like:

```shell
u@pod$ wget -O- hostnames
@@ -134,23 +127,23 @@ wget: unable to resolve host address 'hostnames'
```

So the first thing to check is whether that `Service` actually exists:

```shell
-$ kubectl get svc hostnames
+kubectl get svc hostnames
No resources found.
Error from server (NotFound): services "hostnames" not found
```

-So we have a culprit, let's create the `Service`.  As before, this is for the
+So we have a culprit; let's create the `Service`. As before, this is for the
walk-through - you can use your own `Service`'s details here.

```shell
-$ kubectl expose deployment hostnames --port=80 --target-port=9376
+kubectl expose deployment hostnames --port=80 --target-port=9376
service/hostnames exposed
```

And read it back, just to be sure:

```shell
-$ kubectl get svc hostnames
+kubectl get svc hostnames
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
hostnames   ClusterIP   10.0.1.175   <none>        80/TCP    5s
```

@@ -186,8 +179,7 @@ Name: hostnames
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local
```

-If this fails, perhaps your `Pod` and `Service` are in different
-`Namespaces`, try a namespace-qualified name:
+If this fails, perhaps your `Pod` and `Service` are in different `Namespaces`; try a namespace-qualified name:

```shell
u@pod$ nslookup hostnames.default
@@ -197,9 +189,7 @@ Name: hostnames.default
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local
```

-If this works, you'll need to adjust your app to use a cross-namespace name, or
-run your app and `Service` in the same `Namespace`. If this still fails, try a
-fully-qualified name:
+If this works, you'll need to adjust your app to use a cross-namespace name, or run your app and `Service` in the same `Namespace`. If this still fails, try a fully-qualified name:

```shell
u@pod$ nslookup hostnames.default.svc.cluster.local
@@ -210,7 +200,7 @@ Address 1: 10.0.1.175 hostnames.default.svc.cluster.local
```

Note the suffix here: "default.svc.cluster.local". The "default" is the
-`Namespace` we're operating in.  The "svc" denotes that this is a `Service`.
+`Namespace` we're operating in. The "svc" denotes that this is a `Service`.
The "cluster.local" is your cluster domain, which COULD be different in your
own cluster.

@@ -229,8 +219,7 @@ Name: hostnames.default.svc.cluster.local
Address: 10.0.1.175
```

-If you are able to do a fully-qualified name lookup but not a relative one, you
-need to check that your `/etc/resolv.conf` file is correct.
+If you are able to do a fully-qualified name lookup but not a relative one, you need to check that your `/etc/resolv.conf` file is correct.

```shell
u@pod$ cat /etc/resolv.conf
@@ -239,26 +228,25 @@ search default.svc.cluster.local svc.cluster.local cluster.local example.com
options ndots:5
```

-The `nameserver` line must indicate your cluster's DNS `Service`.  This is
+The `nameserver` line must indicate your cluster's DNS `Service`. This is
passed into `kubelet` with the `--cluster-dns` flag.
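+
+If you want to cross-check that address, you can read back the cluster's DNS
+`Service` itself (named `kube-dns` in most installs, though yours may differ);
+its `CLUSTER-IP` should match the `nameserver` line above:
+
+```shell
+# the Service name may differ in your install
+kubectl get svc --namespace=kube-system kube-dns
+```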

The `search` line must include an appropriate suffix for you to find the
-`Service` name.  In this case it is looking for `Services` in the local
+`Service` name. In this case it is looking for `Services` in the local
`Namespace` (`default.svc.cluster.local`), `Services` in all `Namespaces`
-(`svc.cluster.local`), and the cluster (`cluster.local`). Depending on your own
-install you might have additional records after that (up to 6 total). The
-cluster suffix is passed into `kubelet` with the `--cluster-domain` flag.  We
+(`svc.cluster.local`), and the cluster (`cluster.local`). Depending on your own install you might have additional records after that (up to 6 total). The
+cluster suffix is passed into `kubelet` with the `--cluster-domain` flag. We
assume that is "cluster.local" in this document, but yours might be different,
in which case you should change that in all of the commands above.

The `options` line must set `ndots` high enough that your DNS client library
-considers search paths at all.  Kubernetes sets this to 5 by default, which is
+considers search paths at all. Kubernetes sets this to 5 by default, which is
high enough to cover all of the DNS names it generates.

### Does any Service exist in DNS?

If the above still fails - DNS lookups are not working for your `Service` - we
-can take a step back and see what else is not working.  The Kubernetes master
+can take a step back and see what else is not working. The Kubernetes master
`Service` should always work:

```shell
@@ -277,8 +265,7 @@ debugging your own `Service`, debug DNS.

## Does the Service work by IP?

Assuming we can confirm that DNS works, the next thing to test is whether your
-`Service` works at all. From a node in your cluster, access the `Service`'s
-IP (from `kubectl get` above).
+`Service` works at all. From a node in your cluster, access the `Service`'s IP (from `kubectl get` above).

```shell
u@node$ curl 10.0.1.175:80
@@ -291,17 +278,17 @@ u@node$ curl 10.0.1.175:80
hostnames-bvc05
```

-If your `Service` is working, you should get correct responses.  If not, there
+If your `Service` is working, you should get correct responses. If not, there
are a number of things that could be going wrong. Read on.

## Is the Service correct?

It might sound silly, but you should really double and triple check that your
-`Service` is correct and matches your `Pod`'s port.  Read back your `Service`
+`Service` is correct and matches your `Pod`'s port. Read back your `Service`
and verify it:

```shell
-$ kubectl get service hostnames -o json
+kubectl get service hostnames -o json
```
```json
{
@@ -341,47 +328,45 @@ $ kubectl get service hostnames -o json
 }
```

-Is the port you are trying to access in `spec.ports[]`? Is the `targetPort`
-correct for your `Pods` (many `Pods` choose to use a different port than the
-`Service`)? If you meant it to be a numeric port, is it a number (9376) or a
-string "9376"? If you meant it to be a named port, do your `Pods` expose a port
-with the same name? Is the port's `protocol` the same as the `Pod`'s?
+* Is the port you are trying to access in `spec.ports[]`?
+* Is the `targetPort` correct for your `Pods` (many `Pods` choose to use a different port than the `Service`)?
+* If you meant it to be a numeric port, is it a number (9376) or a string "9376"?
+* If you meant it to be a named port, do your `Pods` expose a port with the same name?
+* Is the port's `protocol` the same as the `Pod`'s?
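+
+To pull out just the fields these questions refer to, a JSONPath query can
+help (a sketch; adjust the path for whichever field you are checking). For
+example, to print each port's `targetPort`:
+
+```shell
+# prints the targetPort of each port on the hostnames Service
+kubectl get service hostnames -o jsonpath='{.spec.ports[*].targetPort}'
+```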
Now let's check that the `Pods` you ran are +exists and is resolved by DNS. Now let's check that the `Pods` you ran are actually being selected by the `Service`. Earlier we saw that the `Pods` were running. We can re-check that: ```shell -$ kubectl get pods -l app=hostnames +kubectl get pods -l app=hostnames NAME READY STATUS RESTARTS AGE hostnames-0uton 1/1 Running 0 1h hostnames-bvc05 1/1 Running 0 1h hostnames-yp2kp 1/1 Running 0 1h ``` -The "AGE" column says that these `Pods` are about an hour old, which implies that -they are running fine and not crashing. +The "AGE" column says that these `Pods` are about an hour old, which implies that they are running fine and not crashing. The `-l app=hostnames` argument is a label selector - just like our `Service` -has. Inside the Kubernetes system is a control loop which evaluates the -selector of every `Service` and saves the results into an `Endpoints` object. +has. Inside the Kubernetes system is a control loop which evaluates the selector of every `Service` and saves the results into an `Endpoints` object. ```shell -$ kubectl get endpoints hostnames +kubectl get endpoints hostnames NAME ENDPOINTS hostnames 10.244.0.5:9376,10.244.0.6:9376,10.244.0.7:9376 ``` This confirms that the endpoints controller has found the correct `Pods` for -your `Service`. If the `hostnames` row is blank, you should check that the +your `Service`. If the `hostnames` row is blank, you should check that the `spec.selector` field of your `Service` actually selects for `metadata.labels` -values on your `Pods`. A common mistake is to have a typo or other error, such -as the `Service` selecting for `run=hostnames`, but the `Deployment` specifying -`app=hostnames`. +values on your `Pods`. A common mistake is to have a typo or other error, such +as the `Service` selecting for `run=hostnames`, but the `Deployment` specifying `app=hostnames`. ## Are the Pods working? @@ -404,17 +389,16 @@ u@pod$ wget -qO- 10.244.0.7:9376 hostnames-yp2kp ``` -We expect each `Pod` in the `Endpoints` list to return its own hostname. If +We expect each `Pod` in the `Endpoints` list to return its own hostname. If this is not what happens (or whatever the correct behavior is for your own -`Pods`), you should investigate what's happening there. You might find -`kubectl logs` to be useful or `kubectl exec` directly to your `Pods` and check -service from there. +`Pods`), you should investigate what's happening there. You might find +`kubectl logs` to be useful or `kubectl exec` directly to your `Pods` and check service from there. Another thing to check is that your `Pods` are not crashing or being restarted. Frequent restarts could lead to intermittent connectivity issues. ```shell -$ kubectl get pods -l app=hostnames +kubectl get pods -l app=hostnames NAME READY STATUS RESTARTS AGE hostnames-632524106-bbpiw 1/1 Running 0 2m hostnames-632524106-ly40y 1/1 Running 0 2m @@ -432,8 +416,7 @@ suspect. Let's confirm it, piece by piece. ### Is kube-proxy running? -Confirm that `kube-proxy` is running on your `Nodes`. You should get something -like the below: +Confirm that `kube-proxy` is running on your `Nodes`. You should get something like the below: ```shell u@node$ ps auxw | grep kube-proxy @@ -441,10 +424,9 @@ root 4194 0.4 0.1 101864 17696 ? Sl Jul04 25:43 /usr/local/bin/kube-proxy ``` Next, confirm that it is not failing something obvious, like contacting the -master. To do this, you'll have to look at the logs. Accessing the logs +master. To do this, you'll have to look at the logs. 

Another thing to check is that your `Pods` are not crashing or being restarted.
Frequent restarts could lead to intermittent connectivity issues.

```shell
-$ kubectl get pods -l app=hostnames
+kubectl get pods -l app=hostnames
NAME                        READY     STATUS    RESTARTS   AGE
hostnames-632524106-bbpiw   1/1       Running   0          2m
hostnames-632524106-ly40y   1/1       Running   0          2m
@@ -432,8 +416,7 @@ suspect. Let's confirm it, piece by piece.

### Is kube-proxy running?

-Confirm that `kube-proxy` is running on your `Nodes`. You should get something
-like the below:
+Confirm that `kube-proxy` is running on your `Nodes`. You should get something like the below:

```shell
u@node$ ps auxw | grep kube-proxy
@@ -441,10 +424,9 @@ root 4194 0.4 0.1 101864 17696 ? Sl Jul04 25:43 /usr/local/bin/kube-proxy
```

Next, confirm that it is not failing something obvious, like contacting the
-master.  To do this, you'll have to look at the logs.  Accessing the logs
+master. To do this, you'll have to look at the logs. Accessing the logs
depends on your `Node` OS. On some OSes it is a file, such as
-/var/log/kube-proxy.log, while other OSes use `journalctl` to access logs. You
-should see something like:
+`/var/log/kube-proxy.log`, while other OSes use `journalctl` to access logs. You should see something like:

```none
I1027 22:14:53.995134    5063 server.go:200] Running in resource-only container "/kube-proxy"
@@ -472,12 +454,11 @@ and then retry.

### Is kube-proxy writing iptables rules?

One of the main responsibilities of `kube-proxy` is to write the `iptables`
-rules which implement `Services`.  Let's check that those rules are getting
+rules which implement `Services`. Let's check that those rules are getting
written.

The kube-proxy can run in "userspace" mode, "iptables" mode or "ipvs" mode.
-Hopefully you are using the "iptables" mode or "ipvs" mode. You
-should see one of the following cases.
+Hopefully you are using the "iptables" mode or "ipvs" mode. You should see one of the following cases.

#### Userspace

```shell
u@node$ iptables-save | grep hostnames
@@ -488,7 +469,7 @@ u@node$ iptables-save | grep hostnames
```

There should be 2 rules for each port on your `Service` (just one in this
-example) - a "KUBE-PORTALS-CONTAINER" and a "KUBE-PORTALS-HOST".  If you do
+example) - a "KUBE-PORTALS-CONTAINER" and a "KUBE-PORTALS-HOST". If you do
not see these, try restarting `kube-proxy` with the `-v` flag set to 4, and
then look at the logs again.

@@ -512,9 +493,7 @@ u@node$ iptables-save | grep hostnames
```

There should be 1 rule in `KUBE-SERVICES`, 1 or 2 rules per endpoint in
-`KUBE-SVC-(hash)` (depending on `SessionAffinity`), one `KUBE-SEP-(hash)` chain
-per endpoint, and a few rules in each `KUBE-SEP-(hash)` chain. The exact rules
-will vary based on your exact config (including node-ports and load-balancers).
+`KUBE-SVC-(hash)` (depending on `SessionAffinity`), one `KUBE-SEP-(hash)` chain per endpoint, and a few rules in each `KUBE-SEP-(hash)` chain. The exact rules will vary based on your exact config (including node-ports and load-balancers).

#### IPVS

@@ -542,7 +521,7 @@ hostnames-0uton
```

If this fails and you are using the userspace proxy, you can try accessing the
-proxy directly.  If you are using the iptables proxy, skip this section.
+proxy directly. If you are using the iptables proxy, skip this section.

Look back at the `iptables-save` output above, and extract the
port number that `kube-proxy` is using for your `Service`. In the above
@@ -567,9 +546,7 @@ then look at the logs again.

This can happen when the network is not properly configured for "hairpin"
traffic, usually when `kube-proxy` is running in `iptables` mode and Pods
-are connected with bridge network. The `Kubelet` exposes a `hairpin-mode`
-[flag](/docs/admin/kubelet/) that allows endpoints of a Service to loadbalance back to themselves
-if they try to access their own Service VIP. The `hairpin-mode` flag must either be
-set to `hairpin-veth` or `promiscuous-bridge`.
+are connected with a bridge network. The `Kubelet` exposes a `hairpin-mode`
+[flag](/docs/admin/kubelet/) that allows endpoints of a Service to load-balance back to themselves if they try to access their own Service VIP. The `hairpin-mode` flag must either be set to `hairpin-veth` or `promiscuous-bridge`.

-The common steps to trouble shoot this are as follows:
+The common steps to troubleshoot this are as follows: