internal DNS queries fail sometimes during build #2482
@smarterclayton - potentially the same issue as #2024
My suspicion is that the upstream is either taking too long or not responding. We should be able to reproduce this against the internal name server and determine whether it's a timeout/cache problem.
I suspect SkyDNS (as configured for OpenShift) is the problem here. It seems to be answering queries for things it shouldn't, acting as an open resolver, which is already a problem; but beyond that, it doesn't answer consistently if it's chained to more than one nameserver. So if you configure a node's /etc/resolv.conf with two nameservers (one for internal components, one for "real" DNS), you'll get inconsistent results. Here's a sequence of queries where I have just that setup. I use a side dnsmasq installation for resolving the actual OpenShift hosts, in addition to the regular DNS. 172.16.4.81 is the master where SkyDNS is running.
Thereafter it randomly answers with NOERROR or NXDOMAIN. Obviously this plays hob with builds and deploys contacting the master, and it would be the same problem if you're building or pulling from an internal repository. You'll only see the problem in the container where SkyDNS is inserted ahead of the other resolvers from the host, and only when you have multiple upstream nameservers that give different results. IMNSHO, SkyDNS should not be chaining requests to resolve any domains it doesn't own. That's behavior we must be able to disable to avoid deploying open resolvers. Even if it's configured to do that, it should consult the resolver chain in the correct order, not randomly select an upstream server. What options do we have for configuring SkyDNS?
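To make the failure mode above concrete, a reproduction along these lines might be a loop of identical queries against the master's SkyDNS; the master address 172.16.4.81 comes from the comment above, while the queried hostname is purely illustrative:

```bash
# Repeat the same query against SkyDNS on the master and watch the status
# field flip between NOERROR and NXDOMAIN when more than one upstream is chained.
for i in $(seq 1 10); do
  dig @172.16.4.81 node1.example.com A +noall +comments | grep 'status:'
  sleep 1
done
```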
@sosiouxme it looks like we can configure SkyDNS not to forward
Looking for the config file options that feed into that config... It should really be the default to not forward requests, BTW.
@sosiouxme I've tested setting
@ncdc I think for OpenShift we need some more options in https://github.com/openshift/origin/blob/master/pkg/cmd/server/origin/master.go#L773-L796, although if we just hardcode do-not-forward that would be fine (at least for now). It shouldn't spew warnings, though; it's not noteworthy to respond NXDOMAIN and let the next resolver handle it...
I think what we need is NoRec https://github.com/skynetservices/skydns/blob/master/server/config.go#L41
@sosiouxme NoRec isn't in our vendored copy, but we can update.
I've updated the vendored copy and enabled NoRec. It seems to be working - it responds with SERVFAIL. I do see this show up in the log every time I try to curl the docker-registry service via DNS:
Not exactly sure what that means.
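For reference, a quick way to check the NoRec behavior from a node might look like the following; the addresses and the cluster domain are assumptions, and per the comment above the out-of-zone query is expected to come back SERVFAIL rather than being forwarded:

```bash
# In-zone name: SkyDNS should still answer authoritatively.
dig @172.16.4.81 docker-registry.default.svc.cluster.local A

# Out-of-zone name: with NoRec enabled, expect SERVFAIL and no forwarding.
dig @172.16.4.81 google.com A
```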
If you update, be sure to grab the upstream patch we have applied over it.
Well, as long as the resolver moves on to the next one, I guess it's not important if the return code is a little funny and there is some extra spew in the log (we should probably fix the log spew, though; it's not an error of any kind).
If SkyDNS doesn't forward requests, what happens when a container asks for something that isn't in SkyDNS? e.g. google.com
It should move on to the next nameserver, which will be the host's nameserver[s].
Should the hosts be able to use SkyDNS for resolution? This came up in a dev list thread regarding using SkyDNS for finding the registry at the host level. If the hosts use SkyDNS for resolution, then the "next nameserver" would again be SkyDNS, unless we configure SkyDNS to forward...
@thoraxe I've tested this on my host, with this for /etc/resolv.conf:
It works fine for DNS resolution on the host. It will resolve cluster DNS entries. It will resolve non-cluster DNS entries like google.com - skydns doesn't find google.com, so the host tries the next nameserver (my x.x.x.x entry). I can't say I've specifically looked at how resolution works in the containers with this change in place.
Yeah, you would have to test from inside a container launched pointing at SkyDNS as its only resolver. My assumption is that your host tries to resolve the entry, SkyDNS rejects, and then it moves to the next resolver. Who actually answered your query for google.com? The normal DNS server?
Yes
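A hypothetical /etc/resolv.conf matching the setup described above (these are placeholder addresses, not the commenter's actual file):

```
# /etc/resolv.conf on the host (illustrative)
nameserver 172.16.4.81   # master running SkyDNS: answers only cluster names
nameserver x.x.x.x       # regular upstream resolver: answers everything else
```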
This is from within a pod; 192.168.122.90 is my master, and 192.168.122.1 is libvirt's dnsmasq assigned via DHCP to the host where this pod is running.
I guess this is a positive sign?
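The sort of checks that produce a trace like the one referenced above, run from inside the pod (assuming dig is available in the image; the queried names are illustrative):

```bash
# Query each resolver from the pod's /etc/resolv.conf directly and compare answers.
cat /etc/resolv.conf
dig @192.168.122.90 docker-registry.default.svc.cluster.local A   # master / SkyDNS
dig @192.168.122.1  google.com A                                  # libvirt dnsmasq from DHCP
```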
Containers don't get SkyDNS as their only resolver. Currently, containers get the /etc/resolv.conf from the host plus SkyDNS at the top. Andy's proposing putting it at the front of the host /etc/resolv.conf too, in which case it wouldn't need inserting at all.
#2569 prevents skydns from recursing requests it doesn't know about. So this should be working now; @TomasTomecek can you verify? Shortly, we will switch from having kubernetes insert skydns as the first resolver to the expectation that nodes will have it at the top of their /etc/resolv.conf (which docker already passes on to containers it deploys). This way, kubernetes pods, docker containers like builders spun off from a build (without help from kubernetes), and nodes will all have the same environment for DNS resolution (except that the node also has /etc/hosts).
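One way to sanity-check that plan on a node, given that docker copies the host's /etc/resolv.conf into containers it starts (assuming no --dns overrides are in play): the node and a throwaway container should show the same nameserver ordering.

```bash
# Compare the node's resolver configuration with what a plain docker container sees.
cat /etc/resolv.conf
docker run --rm busybox cat /etc/resolv.conf
```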
We disabled skydns by removing it from the master config (as suggested by @csrwng). I guess I can verify this (but since this wasn't happening consistently, it will be hard to do).
Fixed by the changes to put the master DNS entry in the host /etc/resolv.conf and disabling the open forwarding on the master.
FYI, we hit this issue in the lab in Brno some time back, but the issue wasn't the recursion; it was the fact that the resolv.conf being pushed from the DHCP server contained entries that were not reachable. Disabling PEERDNS on our minions and only using the SkyDNS resolver fixed our issues, until today, that is, when we updated to this version.
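For anyone else hitting the DHCP-pushed-resolver problem described above, disabling PEERDNS is a per-interface setting in the RHEL/CentOS network scripts; the interface name below is illustrative:

```bash
# /etc/sysconfig/network-scripts/ifcfg-eth0 (fragment): stop the DHCP client
# from overwriting /etc/resolv.conf with nameservers pushed by the DHCP server.
PEERDNS="no"
```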
still hitting this with 1.0.5
Can you reach out to @mfojtik and myself offline so we can debug this?
@smarterclayton already talking to @sosiouxme. EDIT: looks like it's related to this: https://docs.openshift.com/enterprise/3.0/admin_guide/iptables.html#restarting
I've got that issue on origin 1.5.1, and sometimes the application won't contact the database via the service name.
@metal3d Two things I'd check: are the service endpoints for the kubernetes service all reachable from the node where you saw the failures?
Also, are there any errors logged by the dnsmasq service?
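A couple of concrete ways to check those two things from the affected node (a working oc client and journald are assumed; the master host is a placeholder to be filled in from the endpoint list):

```bash
# Are the kubernetes service endpoints (the masters) reachable from this node?
oc get endpoints kubernetes -n default
MASTER_HOST=172.16.135.11                       # placeholder: repeat for each endpoint listed
curl -k "https://${MASTER_HOST}:8443/healthz"

# Any recent errors from the dnsmasq service?
journalctl -u dnsmasq --since "1 hour ago" | grep -iE 'error|refused|timeout'
```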
Hi @sdodson:
172.16.135.11:8443 and 172.16.135.12:8443 are my masters (etcd is also installed on node1 to have 3 servers). In containers, resolv.conf points at the node the pod runs on (e.g. 172.16.135.15). On each node, I've got:
So I tried changing the node configuration to set dnsIP to 172.30.0.1, and it seems to work. One more thing: now that I've set up the node config to hit SkyDNS, I see name resolution:
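For reference, that change corresponds to the dnsIP field in the node configuration; a hypothetical check (the file path may differ depending on how the cluster was installed):

```bash
# Confirm the node sends container DNS traffic to the cluster DNS address.
grep dnsIP /etc/origin/node/node-config.yaml
# expected output: dnsIP: 172.30.0.1
```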
That was not the case before, while "no-resolv" remained in the dnsmasq configuration, I think. Note that I've installed openshift-origin with openshift-ansible on a fresh CentOS 7 installation (I just installed and enabled the NetworkManager service). We have the same issue on CentOS 7 installed on Scaleway.io: sometimes dnsmasq fails to resolve a service name, just as on our 5 bare-metal machines here.
The problem is that now that I query SkyDNS directly, I get no IP rotation when I resolve a service name. It's a pity not to benefit from dnsmasq's options, caching, and so on...
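One way to keep dnsmasq's caching in front while still resolving cluster names is to forward only the cluster domain to SkyDNS. This is a sketch, not the configuration shipped by openshift-ansible, and the domain and addresses are assumptions:

```bash
# Hypothetical dnsmasq drop-in: cache everything locally, send only cluster
# names to SkyDNS, and use a normal upstream for the rest.
cat <<'EOF' > /etc/dnsmasq.d/cluster-dns.conf
no-resolv
server=/cluster.local/172.30.0.1
server=8.8.8.8
cache-size=1000
EOF
systemctl restart dnsmasq
```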
Forget what I said: I get the same error without going through dnsmasq, so the problem seems to come from SkyDNS or something similar:
Original issue description:
My git clone during build failed with:
Unfortunately, this is not 100% reproducible. Only happens sometimes. When I compare docker inspect, /etc/hosts, and /etc/resolv.conf of the "wrong" and "good" build containers, they match precisely.
Logs: container's hosts, container's resolv.conf, host's resolv.conf, docker inspect $build_container
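For completeness, the kinds of commands used to collect the artifacts listed above (the container id is a placeholder taken from docker ps -a):

```bash
# Compare the DNS-related state of a "wrong" and a "good" build container.
build_container=abc123                                           # placeholder: id from `docker ps -a`
docker inspect "$build_container"
docker exec "$build_container" cat /etc/hosts /etc/resolv.conf   # only if it is still running
cat /etc/hosts /etc/resolv.conf                                  # the same files on the host
```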