500 Internal Server Error / no hosts available #268

Closed
mlamothe opened this issue Jul 15, 2019 · 16 comments

@mlamothe

I'm setting up KrakenD for the first time. I'm trying to use DNS service discovery ("sd": "dns") in tandem with Consul. If I specify a static IP everything works fine, but DNS service discovery isn't working.

For example, this simple curl request returns a 500:
>curl http://10.30.54.185:8000/consul --verbose

*   Trying 10.30.54.185...
* TCP_NODELAY set
* Connected to 10.30.54.185 (10.30.54.185) port 8000 (#0)
> GET /consul HTTP/1.1
> Host: 10.30.54.185:8000
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< X-Krakend: Version 0.9.0
< X-Krakend-Completed: false
< Date: Mon, 15 Jul 2019 19:42:36 GMT
< Content-Length: 0
<
* Connection #0 to host 10.30.54.185 left intact

The configuration file, cat /etc/krakend/krakend.json:

{
  "version": 2,
  "name": "CAMP API Gateway",
  "timeout": "3000ms",
  "cache_ttl": "300s",
  "port": 8000,
  "output_encoding": "json",
  "extra_config": {
    "github_com/devopsfaith/krakend-gologging": {
      "level": "DEBUG",
      "prefix": "[KRAKEND]",
      "syslog": true,
      "stdout": true,
      "format": "custom",
      "custom_format": "%{message}"
    },
    "github_com/devopsfaith/krakend-cors": {
      "allow_origins": [ "*" ],
      "allow_methods": [ "POST", "GET", "PUT", "DELETE" ],
      "allow_headers": [
        "Origin",
        "Authorization",
        "Content-Type",
        "Accept",
        "X-Auth-Token"
      ],
      "expose_headers": [ "Content-Length" ],
      "max_age": "12h"
    }
  },
  "endpoints": [
    {
      "endpoint": "/consul",
      "timeout": "15s",
      "method": "GET",
      "headers_to_pass": ["*", "Cookie"],
      "backend": [
        {
          "url_pattern": "/v1/status/leader",
          "sd": "dns",
          "host": [
            "http://consul.service.consul:8500"
          ],
          "disable_host_sanitize": true
        }
      ]
    }
  ]
}

Note: For 'host', I also tried "consul.service.consul:8500" but it made no difference.
Log file (notice logging is set to DEBUG level):

krakend.service - Krakend API Gateway
   Loaded: loaded (/lib/systemd/system/krakend.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-07-15 19:39:27 UTC; 6min ago
     Docs: http://krakend.io
 Main PID: 5661 (krakend)
    Tasks: 9 (limit: 4599)
   CGroup: /system.slice/krakend.service
           └─5661 /usr/bin/krakend run -c /etc/krakend/krakend.json

Jul 15 19:39:27 krakend1-vm [KRAKEND][5661]: AMQP: http://consul.service.consul:8500: no amqp consumer defined
Jul 15 19:39:27 krakend1-vm [KRAKEND][5661]: AMQP: http://consul.service.consul:8500: no amqp producer defined
Jul 15 19:39:27 krakend1-vm [KRAKEND][5661]: CEL: no extra config detected for backend /v1/status/leader
Jul 15 19:39:27 krakend1-vm [KRAKEND][5661]: CEL: no extra config detected for pipe /consul
Jul 15 19:39:27 krakend1-vm [KRAKEND][5661]: JOSE: singer disabled for the endpoint /consul
Jul 15 19:39:27 krakend1-vm [KRAKEND][5661]: JOSE: validator disabled for the endpoint /consul
Jul 15 19:40:04 krakend1-vm krakend[5661]: [GIN] 2019/07/15 - 19:40:04 | 500 |        67.9µs |    10.30.54.185 | GET      /consul
Jul 15 19:40:04 krakend1-vm krakend[5661]: Error #01: no hosts available
Jul 15 19:42:36 krakend1-vm krakend[5661]: [GIN] 2019/07/15 - 19:42:36 | 500 |        13.5µs |    10.30.54.185 | GET      /consul
Jul 15 19:42:36 krakend1-vm krakend[5661]: Error #01: no hosts available

Results of dig on consul.service.consul:

; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> consul.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37758
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;consul.service.consul.         IN      A

;; ANSWER SECTION:
consul.service.consul.  0       IN      A       10.30.54.161
consul.service.consul.  0       IN      A       10.30.54.171

;; Query time: 4 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Mon Jul 15 19:47:27 UTC 2019
;; MSG SIZE  rcvd: 82

Those IP addresses are correct - there are two consul servers.

Also: dig consul.service.consul SRV +short

1 1 8300 consul2-vm.node.stage-vm.consul.
1 1 8300 consul1-vm.node.stage-vm.consul.

I can ping those two addresses.
This is running on Ubuntu Server 18.04.2 LTS
I'm not sure what I'm missing. I've been banging away at this all day. Any help is much appreciated.

@kpacha
Member

kpacha commented Jul 15, 2019

hi, @mlamothe and welcome to the KrakenD community!

Why are you adding the port to the service name? As you can see, the SRV record already tells the system which port each node is using.

Also, the scheme prefix (http://) should be removed.

So, for the service registered as consul, your host should be consul.service.consul, and the KrakenD DNS SD module will resolve that name using the underlying OS. In your scenario, the DNS module will inform the load balancer that there are 2 hosts available (each with its related port), and the LB will select which one should be used for every request.

BTW, you can customize your hostname lookup by replacing the actual value of dnssrv.DefaultLookup with your custom lookup implementation if using the OS is not what you want (e.g. asking an external service for the name resolution).
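
The resolution step described above can be sketched in a few lines of Go. This is only an illustration, not KrakenD's code: it assumes the lookup hook has the same shape as the standard library's net.LookupSRV, and the names lookupSRV, resolveHosts and staticLookup are hypothetical. The static resolver simply returns the two SRV records shown earlier in this thread.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// lookupSRV has the same signature as net.LookupSRV, so the resolver can be
// swapped for a custom implementation (hypothetical hook for this sketch).
type lookupSRV func(service, proto, name string) (string, []*net.SRV, error)

// resolveHosts turns the SRV answers for name into "host:port" strings.
func resolveHosts(lookup lookupSRV, name string) ([]string, error) {
	// Empty service and proto make net.LookupSRV query name directly.
	_, records, err := lookup("", "", name)
	if err != nil {
		return nil, err
	}
	hosts := make([]string, 0, len(records))
	for _, r := range records {
		// SRV targets are fully qualified; drop the trailing dot.
		target := strings.TrimSuffix(r.Target, ".")
		hosts = append(hosts, net.JoinHostPort(target, strconv.Itoa(int(r.Port))))
	}
	return hosts, nil
}

// staticLookup is a stand-in resolver returning the records from this thread.
func staticLookup(_, _, _ string) (string, []*net.SRV, error) {
	return "", []*net.SRV{
		{Target: "consul2-vm.node.stage-vm.consul.", Port: 8300, Priority: 1, Weight: 1},
		{Target: "consul1-vm.node.stage-vm.consul.", Port: 8300, Priority: 1, Weight: 1},
	}, nil
}

func main() {
	hosts, err := resolveHosts(staticLookup, "consul.service.consul")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(hosts)
}
```

Passing net.LookupSRV instead of staticLookup would query the OS-configured resolver. Note that the port comes from the SRV record, which is why the host entry in the configuration carries neither a scheme nor a port.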

cheers!

@mlamothe
Author

mlamothe commented Jul 15, 2019

Thanks for your help - I changed the config file but the errors are exactly the same as before.
Snippet:

{
      "endpoint": "/consul",
      "output_encoding": "string",
      "timeout": "15s",
      "method": "GET",
      "headers_to_pass": ["*", "Cookie"],
      "backend": [
        {
          "encoding": "string",
          "url_pattern": "/v1/status/leader",
          "sd": "dns",
          "host": [
            "consul.service.consul"
          ],
          "disable_host_sanitize": true
        }
      ]
    }

@kpacha
Member

kpacha commented Jul 15, 2019

Is the KrakenD service running on the same machine where you executed the dig command?

also, is the OS running the KrakenD instance able to resolve the names consul2-vm.node.stage-vm.consul and consul1-vm.node.stage-vm.consul?

@mlamothe
Author

mlamothe commented Jul 15, 2019

Howdy,

Is the KrakenD service running on the same machine where you executed the dig command?

Yes, everything I wrote above was executed on the VM running KrakenD.

also, is the OS running the KrakenD instance able to resolve the names consul2-vm.node.stage-vm.consul and consul1-vm.node.stage-vm.consul?

Yes:
$host consul1-vm.node.stage-vm.consul
>consul1-vm.node.stage-vm.consul has address 10.30.54.161

and
$host consul2-vm.node.stage-vm.consul
>consul2-vm.node.stage-vm.consul has address 10.30.54.171

@kpacha
Member

kpacha commented Jul 16, 2019

Can you provide us a way to reproduce your scenario? I've been playing with a Vagrantfile but it's taking me too much time to get the details right

@mlamothe
Author

I've never used Vagrant... Is there any more troubleshooting I can do on my end? I really wish I could "see" why KrakenD is returning a 500. Is it crashing? Is it getting a result back from Consul that it doesn't like?

@mlamothe
Author

As for KrakenD, this is a brand new VM. The following steps were performed:

  sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 5DE6FD698AD6FDD2
  echo "deb http://repo.krakend.io/apt stable main" | sudo tee /etc/apt/sources.list.d/krakend.list
  sudo apt-get update
  sudo apt-get install -y krakend

The configuration file shown above was created with chmod 644 applied.

The only other noteworthy part was configuring /etc/systemd/resolved.conf and iptables to point to the consul box.
/etc/systemd/resolved.conf

[Resolve]
DNS=10.30.54.161 10.30.54.171
#FallbackDNS=
Domains=~consul
#LLMNR=no
#MulticastDNS=no
#DNSSEC=no
#Cache=yes
#DNSStubListener=yes

And relevant iptables: $sudo iptables -t nat -L

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
DNAT       udp  --  anywhere             consul1-vm.node.stage-vm.consul  udp dpt:domain to:10.30.54.161:8600
DNAT       tcp  --  anywhere             consul1-vm.node.stage-vm.consul  tcp dpt:domain to:10.30.54.161:8600
DNAT       udp  --  anywhere             consul2-vm.node.stage-vm.consul  udp dpt:domain to:10.30.54.171:8600
DNAT       tcp  --  anywhere             consul2-vm.node.stage-vm.consul  tcp dpt:domain to:10.30.54.171:8600

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

Thanks again for all your help.

@mlamothe
Author

Update:
I've been experimenting with the config file.
So, host -t SRV consul.service.consul returns:

consul.service.consul has SRV record 1 1 8300 consul2-vm.node.stage-vm.consul.
consul.service.consul has SRV record 1 1 8300 consul1-vm.node.stage-vm.consul.

If I take one of those, (say consul1-vm.node.stage-vm.consul.) and put it in KrakenD's config file, switching to static service discovery, then I get an error.

But if I use the IP address that resolves to (http://10.30.54.161), that DOES work.

One interesting thing is if I run host -t SRV consul1-vm.node.stage-vm.consul.
I get consul1-vm.node.stage-vm.consul has no SRV record

Is it possible that's the problem? If so, any idea what might fix this issue?

Thanks,
Marcel

@kpacha
Member

kpacha commented Jul 17, 2019

hi, @mlamothe

The SRV records contain the hostnames and ports of the registered service instances. Those hostnames have no SRV record of their own because they refer to machines, not services. An SRV record can also return IP addresses instead of hostnames, which is why KrakenD does not manipulate the returned names. It just expects the underlying OS to be able to resolve them if required.

I finally got an environment up and running that should look like yours.

config required for the DNS layer:

#/etc/dnsmasq.d/10-consul
server=/consul/127.0.0.1#8600
#/etc/dnsmasq.conf
port=53
domain-needed
bogus-priv
strict-order
listen-address=127.0.0.1
#/etc/resolv.conf
nameserver 127.0.0.1

Since the consul services are not exposing an API (the port 8300 is for the RPC inter-node communications), I've registered the krakend service with this definition

{"service": {"name": "krakend", "tags": ["gateway"], "port": 8000}}

and changed the configuration of the supu endpoint (https://github.com/devopsfaith/krakend-ce/blob/master/krakend.json#L31-L79) so the backend definition looks like this

                    "host": [
                        "krakend.service.consul"
                    ],
                    "sd": "dns",
                    "disable_host_sanitize": true,
                    "url_pattern": "/__debug/supu",

before testing the gateway integration, I've checked the dnsmasq setup

vagrant@krakend:~$ dig krakend.service.consul SRV +short
1 1 8000 krakend.node.dc1.consul.
vagrant@krakend:~$ dig krakend.node.dc1.consul. +short
172.20.20.12

finally:

vagrant@krakend:~$ curl -i http://localhost:8000/supu
HTTP/1.1 200 OK
Cache-Control: public, max-age=3600
Content-Type: application/json; charset=utf-8
X-Krakend: Version 0.9.0
X-Krakend-Completed: true
Date: Wed, 17 Jul 2019 14:01:35 GMT
Content-Length: 18

{"message":"pong"}

it's the same result you'll get using the default configuration (notice the /__debug endpoint is only available when the krakend is started with the -d flag)

@mlamothe
Author

Thank you, @kpacha. It seems like the biggest difference between your setup and mine is you used dnsmasq whereas I used systemd. I will see if I can get it to work using your method and will update the ticket one way or another.

@kpacha
Member

kpacha commented Jul 17, 2019

there is some documentation regarding how to configure the DNS integration for your consul service... and it covers both solutions plus several others. so I guess it should be 100% transparent. (https://learn.hashicorp.com/consul/security-networking/forwarding)

On the other hand, you cannot consume RPC services with the default HTTP client, and you are mixing up the RPC service (port 8300) with the HTTP API service (port 8500, exposed only on localhost).

Here, I tried to consume the same endpoint you have in your configuration against the second consul node

vagrant@n1:~$ curl -i http://172.20.20.11:8300/v1/status/leader
curl: (56) Recv failure: Connection reset by peer
2019/07/17 14:51:40 [ERR] consul.rpc: unrecognized RPC byte: 71 from=172.20.20.10:47010

Here, I fixed the port

vagrant@n1:~$ curl -i http://172.20.20.11:8500/v1/status/leader
curl: (7) Failed to connect to 172.20.20.11 port 8500: Connection refused

Here, against the right port at the local interface

vagrant@n1:~$ curl -i http://localhost:8500/v1/status/leader
HTTP/1.1 200 OK
Content-Type: application/json
Vary: Accept-Encoding
Date: Wed, 17 Jul 2019 14:54:57 GMT
Content-Length: 19

"172.20.20.10:8300"

That's why I registered KrakenD itself as a service and replaced http://127.0.0.1:8000 with krakend.service.consul.
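
As a side note on the "unrecognized RPC byte: 71" log line above: 71 is the ASCII code for "G", the first byte of an HTTP GET request line. The reading that Consul's RPC port treats the first byte of a connection as a protocol selector is an assumption here, but it would explain the message. A minimal Go sketch (the helper name firstRequestByte is hypothetical) shows where the number comes from:

```go
package main

import "fmt"

// firstRequestByte returns the first byte a client sends for an HTTP/1.1
// request with the given method, i.e. the first letter of the method name.
func firstRequestByte(method, path string) byte {
	requestLine := method + " " + path + " HTTP/1.1\r\n"
	return requestLine[0]
}

func main() {
	b := firstRequestByte("GET", "/v1/status/leader")
	// Print the byte both as a decimal number and as a character.
	fmt.Printf("first byte: %d (%q)\n", b, b)
}
```

In other words, the HTTP client was talking to a port that expects a different protocol, so no port change on the client side alone can make the RPC port answer an HTTP request.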

@mlamothe
Author

mlamothe commented Jul 24, 2019

I know it's been a while, but I've been working on this problem this whole time. Even though I've never worked with Go, I decided to clone the repo, build it and step through it with a debugger.

Long story short, when the app attempts to resolve my hostname, it gets back the error "cannot unmarshal DNS message". See this GitHub issue for more details: golang/go#27546.

In my case, Consul returns "1 1 80 0a1e3730.addr.stage-vm.consul.". Those hexadecimal digits before the "addr" part represent the IP address of the service (10.30.55.48). My understanding is that the Go team made a deliberate change to reject these kinds of host names.
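
The hex-to-IP mapping mentioned above can be checked with a short Go sketch. The helper name decodeAddrLabel is hypothetical, and the code only handles the 4-byte (IPv4) case seen in this thread:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net"
	"strings"
)

// decodeAddrLabel extracts the leading hex label from a Consul-style
// "<hex>.addr.<dc>.consul." target and decodes it as an IPv4 address.
func decodeAddrLabel(target string) (net.IP, error) {
	label := strings.SplitN(target, ".", 2)[0]
	raw, err := hex.DecodeString(label)
	if err != nil {
		return nil, fmt.Errorf("label %q is not hex: %w", label, err)
	}
	if len(raw) != net.IPv4len {
		return nil, fmt.Errorf("label %q decodes to %d bytes, want %d", label, len(raw), net.IPv4len)
	}
	return net.IP(raw), nil
}

func main() {
	ip, err := decodeAddrLabel("0a1e3730.addr.stage-vm.consul.")
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(ip) // 10.30.55.48
}
```

Each pair of hex digits is one octet: 0a is 10, 1e is 30, 37 is 55 and 30 is 48.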

The actual cause is that Consul is returning compressed DNS records, and Go made a change that no longer allows these - even though they're perfectly valid. See: golang/go#24870

You can close this ticket if you like. I have my answer: It's Go, not KrakenD :)

@skjoher

skjoher commented Aug 16, 2019

Any update or fix for "Error #1: no hosts available"?

@kpacha
Member

kpacha commented Aug 26, 2019

@skjoher the issue here was with the way the services were registered at the consul layer. Is that your case?

@kpacha kpacha closed this as completed Jan 28, 2020
@parshantjindal8

@kpacha I have the very same scenario as @mlamothe, but I am using dnsmasq. Can you help? The error I am getting is "no hosts available".

@github-actions

github-actions bot commented Apr 6, 2022

This issue was marked as resolved a long time ago and now has been automatically locked as there has not been any recent activity after it. You can still open a new issue and reference this link.

@github-actions github-actions bot added the locked label Apr 6, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 6, 2022