internal DNS queries fail sometimes during build #2482

Closed

TomasTomecek opened this issue May 26, 2015 · 34 comments
@TomasTomecek
Contributor

My git clone during build failed with:

fatal: Unable to look up our.internal.git.redhat.com (port 9418) (Name or service not known)

Unfortunately, this is not 100% reproducible; it only happens sometimes. When I compare the docker inspect output, /etc/hosts, and /etc/resolv.conf of a "wrong" and a "good" build container, they match precisely.
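
In case it helps, a loop like this run inside the build container should eventually surface the intermittent failure (just a sketch; getent exercises the same NSS/DNS path as git's lookup):

while true; do
  getent hosts our.internal.git.redhat.com || echo "lookup FAILED at $(date)"
  sleep 1
done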

Logs

container's hosts

172.17.0.7      openshift-base-20150526-081936
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

container's resolv.conf

nameserver 172.16.125.48
nameserver 172.16.125.39
search default.cluster.local default.svc.cluster.local svc.cluster.local cluster.local internal.domain.redhat.com

host's resolv.conf

# Generated by NetworkManager
domain internal.domain.redhat.com
search internal.domain.redhat.com
nameserver 172.16.125.39

docker inspect $build_container

[{
    "AppArmorProfile": "",
    "Args": [
        "--verbose",
        "inside-build",
        "--input",
        "osv3"
    ],
    "Config": {
        "AttachStderr": false,
        "AttachStdin": false,
        "AttachStdout": false,
        "Cmd": [
            "dock",
            "--verbose",
            "inside-build",
            "--input",
            "osv3"
        ],
        "CpuShares": 0,
        "Cpuset": "",
        "Domainname": "",
        "Entrypoint": null,
        "Env": [
            "BUILD=<too-long>",
            "SOURCE_REPOSITORY=git://our.internal.git.redhat.com/path",
            "DOCK_PLUGINS=<too-long>",
            "DOCKER_SOCKET=/var/run/docker.sock",
            "SOURCE_URI=git://our.internal.git.redhat.com/path",
            "SOURCE_REF=branch",
            "OUTPUT_REGISTRY=172.17.42.1:5000",
            "OUTPUT_IMAGE=image-name:tag",
            "KUBERNETES_PORT_443_TCP=tcp://172.30.0.2:443",
            "KUBERNETES_RO_SERVICE_PORT=80",
            "KUBERNETES_RO_PORT_80_TCP_ADDR=172.30.0.1",
            "KUBERNETES_SERVICE_HOST=172.30.0.2",
            "KUBERNETES_PORT_443_TCP_PROTO=tcp",
            "KUBERNETES_PORT_443_TCP_ADDR=172.30.0.2",
            "KUBERNETES_RO_SERVICE_HOST=172.30.0.1",
            "KUBERNETES_RO_PORT_80_TCP_PROTO=tcp",
            "KUBERNETES_SERVICE_PORT=443",
            "KUBERNETES_PORT_443_TCP_PORT=443",
            "KUBERNETES_PORT=tcp://172.30.0.2:443",
            "KUBERNETES_RO_PORT=tcp://172.30.0.1:80",
            "KUBERNETES_RO_PORT_80_TCP=tcp://172.30.0.1:80",
            "KUBERNETES_RO_PORT_80_TCP_PORT=80",
            "container=docker"
        ],
        "ExposedPorts": null,
        "Hostname": "openshift-base-20150526-081936",
        "Image": "buildroot",
        "Labels": {
            "Architecture": "x86_64",
            "Build_Host": "build.host.redhat.com",
            "Name": "rhel-server-docker",
            "Release": "4",
            "Vendor": "Red Hat, Inc.",
            "Version": "7.1",
            "io.kubernetes.pod.name": "default/openshift-base-20150526-081936"
        },
        "MacAddress": "",
        "Memory": 0,
        "MemorySwap": 0,
        "NetworkDisabled": false,
        "OnBuild": null,
        "OpenStdin": false,
        "PortSpecs": null,
        "StdinOnce": false,
        "Tty": false,
        "User": "",
        "Volumes": null,
        "WorkingDir": ""
    },
    "Created": "2015-05-26T06:22:58.532428212Z",
    "Driver": "devicemapper",
    "ExecDriver": "native-0.2",
    "ExecIDs": null,
    "HostConfig": {
        "Binds": [
            "/var/run/docker.sock:/var/run/docker.sock",
            "/var/lib/openshift/openshift.local.volumes/pods/2e4fe713-036f-11e5-a053-fa163ed7ae77/containers/custom-build/542e4dcf44a8178414394da6fa4e232d9b781adb8e95dbfc10fe22225e783847:/dev/termination-log"
        ],
        "CapAdd": null,
        "CapDrop": null,
        "CgroupParent": "",
        "ContainerIDFile": "",
        "CpuShares": 0,
        "CpusetCpus": "",
        "Devices": null,
        "Dns": [
            "172.16.125.48",
            "172.16.125.39"
        ],
        "DnsSearch": [
            "default.cluster.local",
            "default.svc.cluster.local",
            "svc.cluster.local",
            "cluster.local",
            "internal.domain.redhat.com"
        ],
        "ExtraHosts": null,
        "IpcMode": "container:91b09605c5bd6daac79d9bf0f56032063266536c7c9bd36bced4ee2e1270363d",
        "Links": null,
        "LogConfig": {
            "Config": null,
            "Type": "json-file"
        },
        "LxcConf": null,
        "Memory": 0,
        "MemorySwap": 0,
        "MountRun": false,
        "NetworkMode": "container:91b09605c5bd6daac79d9bf0f56032063266536c7c9bd36bced4ee2e1270363d",
        "PidMode": "",
        "PortBindings": null,
        "Privileged": true,
        "PublishAllPorts": false,
        "ReadonlyRootfs": false,
        "RestartPolicy": {
            "MaximumRetryCount": 0,
            "Name": ""
        },
        "SecurityOpt": null,
        "Ulimits": null,
        "VolumesFrom": null
    },
    "HostnamePath": "/var/lib/docker/containers/91b09605c5bd6daac79d9bf0f56032063266536c7c9bd36bced4ee2e1270363d/hostname",
    "HostsPath": "/var/lib/docker/containers/91b09605c5bd6daac79d9bf0f56032063266536c7c9bd36bced4ee2e1270363d/hosts",
    "Id": "542e4dcf44a8178414394da6fa4e232d9b781adb8e95dbfc10fe22225e783847",
    "Image": "bc256449b7ad6a5b4c6936e2f13e65733fb54f5ea1462eaba22741b6a180ad25",
    "LogPath": "/var/lib/docker/containers/542e4dcf44a8178414394da6fa4e232d9b781adb8e95dbfc10fe22225e783847/542e4dcf44a8178414394da6fa4e232d9b781adb8e95dbfc10fe22225e783847-json.log",
    "MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c440,c491",
    "Name": "/k8s_custom-build.40afc205_openshift-base-20150526-081936_default_2e4fe713-036f-11e5-a053-fa163ed7ae77_8a7cf8eb",
    "NetworkSettings": {
        "Bridge": "",
        "Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "IPAddress": "",
        "IPPrefixLen": 0,
        "IPv6Gateway": "",
        "LinkLocalIPv6Address": "",
        "LinkLocalIPv6PrefixLen": 0,
        "MacAddress": "",
        "PortMapping": null,
        "Ports": null
    },
    "Path": "dock",
    "ProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c440,c491",
    "ResolvConfPath": "/var/lib/docker/containers/91b09605c5bd6daac79d9bf0f56032063266536c7c9bd36bced4ee2e1270363d/resolv.conf",
    "RestartCount": 0,
    "State": {
        "Dead": false,
        "Error": "",
        "ExitCode": 1,
        "FinishedAt": "2015-05-26T06:23:55.03157924Z",
        "OOMKilled": false,
        "Paused": false,
        "Pid": 0,
        "Restarting": false,
        "Running": false,
        "StartedAt": "2015-05-26T06:23:19.996966349Z"
    },
    "Volumes": {
        "/dev/termination-log": "/var/lib/openshift/openshift.local.volumes/pods/2e4fe713-036f-11e5-a053-fa163ed7ae77/containers/custom-build/542e4dcf44a8178414394da6fa4e232d9b781adb8e95dbfc10fe22225e783847",
        "/var/run/docker.sock": "/run/docker.sock"
    },
    "VolumesRW": {
        "/dev/termination-log": true,
        "/var/run/docker.sock": true
    },
    "VolumesRelabel": {
        "/dev/termination-log": "",
        "/var/run/docker.sock": ""
    }
}
]
@csrwng
Contributor

csrwng commented May 26, 2015

@smarterclayton - potentially the same issue as #2024

@smarterclayton
Contributor

My suspicion is that the upstream is either taking too long or not responding. We should be able to reproduce this against the internal name server and determine whether it's a timeout/cache problem.

@sosiouxme
Member

I suspect SkyDNS (as configured for OpenShift) is the problem here. It seems to be answering queries for things it shouldn't, acting as an open resolver, which is already a problem in itself; but beyond that, it doesn't answer consistently when it's chained to more than one nameserver. So if you configure a node's /etc/resolv.conf with two nameservers (one for internal components, one for "real" DNS), you'll get inconsistent results.

Here's a sequence of queries where I have just that setup. I use a side dnsmasq installation for resolving the actual OpenShift hosts, in addition to the regular DNS. 172.16.4.81 is the master where SkyDNS is running.

# dig @172.16.4.81 master.osv3.example.com

; <<>> DiG 9.9.4-RedHat-9.9.4-18.el7_1.1 <<>> @172.16.4.81 master.osv3.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35166
;; flags: qr aa rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;master.osv3.example.com.       IN      A

;; ANSWER SECTION:
master.osv3.example.com. 0      IN      A       172.16.4.81

;; Query time: 6 msec
;; SERVER: 172.16.4.81#53(172.16.4.81)
;; WHEN: Thu May 28 09:54:09 EDT 2015
;; MSG SIZE  rcvd: 57

# dig @172.16.4.81 master.osv3.example.com

[...]
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 50953
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

[...]

;; AUTHORITY SECTION:
example.com.            843     IN      SOA     sns.dns.icann.org. noc.dns.icann.org. 2015050902 7200 3600 1209600 3600

[note this gets cached for future queries, I don't see the authority section again]

;; Query time: 3 msec
;; SERVER: 172.16.4.81#53(172.16.4.81)
;; WHEN: Thu May 28 09:54:12 EDT 2015
;; MSG SIZE  rcvd: 109

# dig @172.16.4.81 master.osv3.example.com
[...]
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 46433
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
[...]
# dig @172.16.4.81 master.osv3.example.com
[...]
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15514
;; flags: qr aa rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
[...]

Thereafter it randomly answers with NOERROR or NXDOMAIN. Obviously this plays hob with builds and deploys contacting the master, and it would be the same problem if you're building or pulling from an internal repository. You'll only see the problem in the container where SkyDNS is inserted ahead of the other resolvers from the host, and only when you have multiple upstream nameservers that give different results.

IMNSHO SkyDNS should not be chaining requests to resolve any domains it doesn't own. That's behavior we must be able to disable to avoid deploying open resolvers. Even if it's configured to do that, it should consult the resolver chain in the correct order, not randomly select an upstream server.

What options do we have for configuring SkyDNS?

/CC @brenton @thoraxe @detiber @sdodson

@ncdc
Contributor

ncdc commented May 28, 2015

@sosiouxme it looks like we can configure SkyDNS not to forward

@smarterclayton
Contributor

On May 28, 2015, at 10:52 AM, Luke Meyer [email protected] wrote:

[quoted @sosiouxme's comment above]

See https://github.com/skynetservices/skydns/blob/master/server/config.go
/CC @brenton @thoraxe

@sosiouxme
Member

Looking for the config file options that feed into that config...

Not forwarding requests should really be the default, BTW.

@ncdc
Contributor

ncdc commented May 28, 2015

@sosiouxme I've tested setting config.Nameservers = []string{} and that stops it from forwarding. It also prints the following any time a DNS lookup fails:

skydns: can not forward, no name servers defined

and this any time you look up e.g. the docker-registry service:

skydns: incomplete CNAME chain: no nameservers configured can not lookup name

DNS resolution does work, though, and I'm able to do this: curl -v docker-registry.default.svc.cluster.local:5000/healthz.
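
A quick way to sanity-check this from a node is with dig (a sketch; the master address is the one from earlier in this thread):

# cluster-owned name: skydns should still answer
dig @172.16.4.81 docker-registry.default.svc.cluster.local

# external name: with no nameservers configured, skydns should return no
# answer, and the client should fall through to the next resolver
dig @172.16.4.81 google.com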

@sosiouxme
Member

@ncdc I think for OpenShift we need some more options in https://github.com/openshift/origin/blob/master/pkg/cmd/server/origin/master.go#L773-L796, although if we just hardcode do-not-forward that would be fine (at least for now). It shouldn't spew warnings though; it's not noteworthy to respond NXDOMAIN and let the next resolver handle it...

@ncdc
Contributor

ncdc commented May 28, 2015

@sosiouxme NoRec isn't in our vendored copy, but we can update.

@ncdc
Contributor

ncdc commented May 28, 2015

I've updated the vendored copy and enabled NoRec. It seems to be working - it responds with SERVFAIL. I do see this show up in the log every time I try to curl the docker-registry service via DNS:

skydns: incomplete CNAME chain: rcode is not equal to success

Not exactly sure what that means.
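
For anyone verifying, the new behavior should look roughly like this from a node (a sketch; the master address is the one used earlier in this thread):

# non-cluster name: with NoRec enabled, skydns refuses to recurse
dig @172.16.4.81 google.com
# expect: status: SERVFAIL, and the client moves on to the next nameserver

# cluster name: still answered authoritatively
dig @172.16.4.81 docker-registry.default.svc.cluster.local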

@smarterclayton
Contributor

If you update, be sure to grab the upstream patch we have applied over it.

On May 28, 2015, at 11:36 AM, Andy Goldstein [email protected] wrote:

[quoted @ncdc's comment above]

@sosiouxme
Member

Well, as long as the resolver moves on to the next one, I guess it's not important if the return code is a little funny and there is some extra spew in the log (we should probably fix the log spew though; it's not an error of any kind).

@thoraxe
Contributor

thoraxe commented May 28, 2015

If SkyDNS doesn't forward requests, what happens when a container asks for something that isn't in SkyDNS? E.g. google.com

@sdodson
Member

sdodson commented May 28, 2015

It should move on to the next nameserver, which will be the host's nameserver[s].

@thoraxe
Contributor

thoraxe commented May 28, 2015

Should the hosts be able to use SkyDNS for resolution? This came up in a dev list thread regarding using SkyDNS for finding the registry at the host level.

If the hosts use SkyDNS for resolution, then the "next nameserver" would again be SkyDNS, unless we configure SkyDNS to forward...

@ncdc
Contributor

ncdc commented May 28, 2015

@thoraxe I've tested this on my host, with this for /etc/resolv.conf:

nameserver 127.0.0.1 (skydns)
nameserver x.x.x.x (my normal dns server)

It works fine for DNS resolution on the host. It resolves cluster DNS entries, and it resolves non-cluster DNS entries like google.com: skydns doesn't find google.com, so the host tries the next nameserver (my x.x.x.x entry). I can't say I've specifically looked at how resolution works in the containers with this change in place.

@thoraxe
Contributor

thoraxe commented May 28, 2015

Yeah you would have to test from inside a container launched pointing at SkyDNS as its only resolver.

My assumption is that your host tries to resolve the entry, SkyDNS rejects it, and then it moves on to the next resolver. Who actually answered your query for google.com? The normal DNS server?

@ncdc
Contributor

ncdc commented May 28, 2015

My assumption is that your host tries to resolve the entry, SkyDNS rejects, and then it moves to the next resolver. Who actually answered your query for google.com? The normal DNS server?

Yes

@sdodson
Member

sdodson commented May 28, 2015

This is from within a pod: 192.168.122.90 is my master, and 192.168.122.1 is libvirt's dnsmasq, assigned via DHCP to the host where this pod is running.

-bash-4.3# cat /etc/resolv.conf 
nameserver 192.168.122.90
nameserver 192.168.122.1
search default.local local example.com
-bash-4.3# nslookup google.com
;; Got SERVFAIL reply from 192.168.122.90, trying next server
Server:         192.168.122.1
Address:        192.168.122.1#53

Non-authoritative answer:
Name:   google.com
Address: 216.58.217.142

@thoraxe
Contributor

thoraxe commented May 28, 2015

I guess this is a positive sign?

@sosiouxme
Member

Containers don't get SkyDNS as their only resolver; currently, they get the /etc/resolv.conf from the host plus SkyDNS at the top. Andy's proposing putting it at the front of the host's /etc/resolv.conf too, in which case it wouldn't need inserting at all.

@sosiouxme
Member

#2569 prevents skydns from recursing on requests it doesn't know about. So this should be working now; @TomasTomecek, can you verify?

Shortly, we will switch from having kubernetes insert skydns as the first resolver, to the expectation that nodes will have it at the top of their /etc/resolv.conf (which docker already passes on to containers it deploys). This way, kubernetes pods, docker containers like builders spun off from a build (without help from kubernetes), and nodes will all have the same environment for DNS resolution (except the node also has /etc/hosts).
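
For illustration, a node's /etc/resolv.conf would then look something like this (addresses here are just examples reusing ones from this thread), and docker would pass it on to containers unchanged:

# /etc/resolv.conf on the node
nameserver 172.16.4.81     # skydns on the master, listed first
nameserver 172.16.125.39   # the regular upstream nameserver
search internal.domain.redhat.com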

@TomasTomecek
Contributor Author

We disabled skydns by removing it from the master config (as suggested by @csrwng). I guess I can verify this (but since it wasn't happening every time, it will be hard to do).

@danmcp closed this as completed Jun 4, 2015
@smarterclayton
Contributor

Fixed by the changes to put the master DNS entry in the host's /etc/resolv.conf and by disabling the open forwarding on the master.


@knrc

knrc commented Jun 8, 2015

FYI, we hit this issue in the lab in Brno some time back, but the issue wasn't the recursion; it was the fact that the resolv.conf being pushed from the DHCP server contained entries that were not reachable. Disabling PEERDNS on our minions and only using the SkyDNS resolver fixed our issues, that is, until today, when we updated to this version.

@TomasTomecek
Contributor Author

still hitting this with 1.0.5

@smarterclayton
Contributor

Can you reach out to @mfojtik and me offline so we can debug this?


@TomasTomecek
Contributor Author

@smarterclayton already talking to @sosiouxme

EDIT: looks like it's related to this: https://docs.openshift.com/enterprise/3.0/admin_guide/iptables.html#restarting
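
If the iptables restart behavior described there is what's biting us (restarting the iptables service flushes the NAT rules docker added, taking container networking, DNS included, down with it), a sketch of the sequence:

systemctl restart iptables   # flushes docker's NAT/masquerade rules
systemctl restart docker     # re-creates them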

@metal3d

metal3d commented Jun 16, 2017

I've got this issue on Origin 1.5.1, and sometimes the application can't contact the database via its service name.
nslookup fails about 1 time in 10 requests.
The container's resolv.conf points at the node where it runs (e.g. 172.16.1.15), and dnsmasq forwards to 172.30.0.1, which is the kubernetes service.
Any help would be appreciated.

@sdodson
Member

sdodson commented Jun 16, 2017

@metal3d Two things I'd check. First, are the service endpoints for the kubernetes service all reachable from the node where you saw the failures?

oc describe svc/kubernetes

Also, are there any errors logged by dnsmasq service?
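
Something like the following from the affected node covers both checks (a sketch; <master-ip> is a placeholder, and 8053 is the port the master DNS endpoints normally listen on):

# query one of the kubernetes-service DNS endpoints directly
dig @<master-ip> -p 8053 kubernetes.default.svc.cluster.local

# scan dnsmasq's logs for errors
journalctl -u dnsmasq --since today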

@metal3d

metal3d commented Jun 16, 2017

Hi @sdodson:

$ oc describe svc/kubernetes -n default
Name:			kubernetes
Namespace:		default
Labels:			component=apiserver
			provider=kubernetes
Selector:		<none>
Type:			ClusterIP
IP:			172.30.0.1
Port:			https	443/TCP
Endpoints:		172.16.135.11:8443,172.16.135.12:8443
Port:			dns	53/UDP
Endpoints:		172.16.135.11:8053,172.16.135.12:8053
Port:			dns-tcp	53/TCP
Endpoints:		172.16.135.11:8053,172.16.135.12:8053
Session Affinity:	ClientIP
No events.

172.16.135.11:8443 and 172.16.135.12:8443 are my masters (etcd is also installed on node1 to have 3 servers)

In containers, resolv.conf points at the node the pod runs on (e.g. 172.16.135.15).

On each node, I've got:

$ cat /etc/dnsmasq.d/origin-dns.conf 
no-resolv
domain-needed
server=/cluster.local/172.30.0.1

dnsmasq has no errors in its logs; it's just that, sometimes, nslookup on a service name fails in containers... about 5% of requests fail.

So, right now, I've tried changing the node configuration to set dnsIP to 172.30.0.1, and it seems to work.
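
For reference, that's this key in the node config (I believe the standard path from openshift-ansible is /etc/origin/node/node-config.yaml):

# fragment of node-config.yaml
dnsIP: 172.30.0.1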

One more thing: now that I've set up the node config to hit SkyDNS, I see name resolution:

Every 2.0s: nslookup galera                          Fri Jun 16 13:06:53 2017

Server:         172.30.0.1
Address:        172.30.0.1#53

Name:   galera.test.svc.cluster.local
Address: 10.130.0.86
Name:   galera.test.svc.cluster.local
Address: 10.131.0.38
Name:   galera.test.svc.cluster.local
Address: 10.131.0.39

That was not the case before, when "no-resolv" was still in the dnsmasq configuration, I think.

Note that I installed openshift-origin with openshift-ansible on a fresh CentOS 7 installation (I just installed and enabled the NetworkManager service).

We have the same issue on CentOS 7 installed on Scaleway.io: sometimes dnsmasq fails to resolve a service name, just as on our 5 bare-metal machines here.

@metal3d

metal3d commented Jun 16, 2017

The problem is that, now that I use SkyDNS directly, I get no IP rotation when I resolve a service name. It's a pity not to benefit from dnsmasq's options, caching, and so on...

@metal3d

metal3d commented Jun 16, 2017

Forget what I said; I get the same error without going through dnsmasq, so the problem seems to come from SkyDNS or something like that:

# in container:
$ cat /etc/resolv.conf
search myshop.svc.cluster.local svc.cluster.local cluster.local priv.paas.smile.fr
nameserver 172.30.0.1
nameserver 8.8.8.8
options ndots:5

$ while true; do nslookup galera; sleep 1; done
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'galera': Name does not resolve
nslookup: can't resolve '(null)': Name does not resolve

Name:      galera
Address 1: 10.130.0.88
Address 2: 10.130.0.89
Address 3: 10.131.0.41
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'galera': Name does not resolve
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'galera': Name does not resolve
nslookup: can't resolve '(null)': Name does not resolve

Name:      galera
Address 1: 10.130.0.88
Address 2: 10.130.0.89
Address 3: 10.131.0.41
nslookup: can't resolve '(null)': Name does not resolve

Name:      galera
Address 1: 10.130.0.88
Address 2: 10.130.0.89
Address 3: 10.131.0.41
