
Service in docker-compose resolved to a wrong IP, resulting in connection refused #41766

Closed
zffocussss opened this issue Dec 9, 2020 · 18 comments

Comments

@zffocussss

zffocussss commented Dec 9, 2020

Description

I am using docker-compose to manage my Docker services. I have several containers running in the same docker-compose network, but to my surprise, when container A connected to container B by service name, the connection was refused because the service name resolved to a wrong IP.

Steps to reproduce the issue:
This is the first time I have seen this strange behavior.

I cannot reproduce it on demand.

It worked again after I restarted container B.

Describe the results you received:
Another container in the same docker-compose network is refused when trying to connect to it, as the name resolves to a wrong IP.

Describe the results you expected:
I expect the IP to be resolved correctly.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:50 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 14
 Running: 14
 Paused: 0
 Stopped: 0
Images: 84
Server Version: 18.09.2
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-117-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.796GiB
Name: hk-gino-dev-03
ID: 3OS3:JEQT:O7VV:4ZPA:TL7E:IIFD:GDEQ:VYP3:4IX5:WABO:7C7X:K25G
Docker Root Dir: /data/docker/docker-data
Debug Mode (client): false
Debug Mode (server): false
Username: zffocus
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):
Ubuntu 16.04

docker-compose info

docker-compose version 1.23.2, build 1110ad01
docker-py version: 3.6.0
CPython version: 3.6.7
OpenSSL version: OpenSSL 1.1.0f  25 May 2017
@thaJeztah
Member

Docker 18.09 reached EOL last year. Are you still able to reproduce this on a current (19.03 or 20.10) version of docker?

We would also need to have a minimal reproducer (compose file using standard images and steps to reproduce), otherwise it will not be possible to investigate the issue.

@zffocussss
Author

Docker 18.09 reached EOL last year. Are you still able to reproduce this on a current (19.03 or 20.10) version of docker?

We would also need to have a minimal reproducer (compose file using standard images and steps to reproduce), otherwise it will not be possible to investigate the issue.

I can provide the docker-compose.yml files.

Service A:

version: '3'
services:
  nginx:
    container_name: nginx
    image: nginx
    extra_hosts:
      - "DOCKER-BRIDGE:${DOCKER_BRIDGE_IP}"
    volumes:
      - "/data/nginx/conf/nginx.conf:/etc/nginx/nginx.conf"
      - "/data/nginx/conf/conf.d:/etc/nginx/conf.d"
      - "/data/nginx/file:/var/www/html"
      - "/data/nginx/log/drafter:/data/nginx/log/drafter"
    network_mode:
      compose_kong-net

networks:
  custom_network:
    external:
      name: compose_kong-net

The nginx configuration in service A that proxies to service B:

location ~* ^/doxturbo/ {
    rewrite ^/doxturbo/(.*)$ /$1 break;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $host;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_pass http://doxturbo:8088;
}

Service B:

version: "3.6"

services:
  doxturbo:
    container_name: doxturbo
    build:
      context: ../backend
      dockerfile: ${PWD}/Dockerfile
      args:
        - DEVOPS_ACCOUNT=${DEVOPS_ACCOUNT}
        - DEVOPS_PWD=${DEVOPS_PWD}
    image: "doxturbo:local"
    environment:
      BRANCH: ${BRANCH}
      REDIS: redis://redis:6379
      ENV: ${ENV}
      VERSION: ${VERSION}
      VERSION_FULL: ${VERSION}
      CONSUL_ADDRESS: http://consul:8500
    ports:
      - "8088:8088"
    volumes:
      - "../backend:/doxturbo"
    working_dir:
      /doxturbo
    command:
      - /bin/bash
      - -c
      - |
        set -exu
        run my application
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8088/healthcheck/"]
      interval: 30s
      timeout: 7s
      retries: 3
    network_mode:
      compose_kong-net

networks:
  custom_network:
    external:
      name: compose_kong-net

@laxmanpradhan

laxmanpradhan commented Dec 14, 2020

@thaJeztah
I think I am having the same issue. Essentially the docker swarm DNS server has the wrong IP address in it's A records. The IP addresses are all minus 1 from what they should be, ie 10.0.4.8 -> 10.0.4.7.

I deploy a stack to docker swarm: I use docker-compose to create three services and add them to the same overlay network. I should be able to ping one from the other using ping stack_service-name.stack_network-name, e.g. ping nextcloud-mariadb.infraTest_infraNet. The resulting output shows that the IP address it resolves to is shifted by 1. Given that it is always shifted by 1, I think it is reasonable to assume there is a bug somewhere in the DNS records of the overlay network.

# ping nextcloud-mariadb.infraTest_infraNet
PING nextcloud-mariadb.infraTest_infraNet (10.0.4.7): 56 data bytes
^C
--- nextcloud-mariadb.infraTest_infraNet ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss

It looks up IP 10.0.4.7; however, if I inspect the network, I see the actual IP for that service is 10.0.4.8, and indeed I can ping 10.0.4.8 and it works. This behaviour is the same for all the services that I deploy via a stack: the resolved IP is always the actual IP -1.

Notes:

  • Using adminer, I can connect to the nextcloud-mariadb database using either the full container ID infraTest_nextcloud-mariadb.1.icovmyqaweew7co5il5tef1kh or the correct IP address 10.0.4.8.
  • Using busybox and adminer, I can ping the above host name and IP address as well
  • Using adminer I cannot access the database using infraTest_nextcloud.infraTest_infraNet
  • Using adminer or busybox, I cannot ping infraTest_nextcloud.infraTest_infraNet, as the IP address it looks for is the correct IP -1
  • Using busybox, I cannot nslookup infraTest_nextcloud.infraTest_infraNet, as it does not seem to find a DNS entry

Steps to reproduce:

  1. Deploy this docker-compose file, or any docker-compose file that has at least 2 services in a user-defined network.
version: "3.8"

services:

  nextcloud-mariadb:
    image: mariadb
    volumes:
      - /zfs/nextcloud-mariadb:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=xxx
      - MYSQL_PASSWORD=xxx
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
    networks:
      - infraNet
    ports:
      - "3306:3306"

  adminer:
    image: adminer
    networks:
      - infraNet
    ports:
      - target: 8080
        published: 8081
        protocol: tcp
        mode: host

  busybox:
    image: busybox
    networks:
      - infraNet
    command: sleep 3000

networks:
  infraNet:
    external: false
  2. Deploy using this command: docker stack deploy --compose-file docker-compose.yml infraTest

  3. Run docker network inspect infraNet

"1c1d429ff92f91b0784259ed220729fd854c7f8ed7f1ce63724b19b07d2f0ce2": {
    "Name": "infraTest_busybox.1.o7558rpwomdn2qq49zrp7gdtc",
    "EndpointID": "813071bee520338ba994610492fa1187da2af873f227d564b6eccd8e0adad885",
    "MacAddress": "02:42:0a:00:04:06",
    "IPv4Address": "10.0.4.6/24",
    "IPv6Address": ""
},
"5c5b95629752eefec659dc3a5ca68a3cccd0f5f4a0cb84884eaab2b27c861db1": {
    "Name": "infraTest_nextcloud-mariadb.1.icovmyqaweew7co5il5tef1kh",
    "EndpointID": "b3902f3fb0e47fe32b54ab21b3a8a972cf3b4dc983780e11b0666872a89cf6fd",
    "MacAddress": "02:42:0a:00:04:08",
    "IPv4Address": "10.0.4.8/24",
    "IPv6Address": ""
},
"690e876df510e2daa6705b613b404db689d93ddeb278a36066273e9e4ea94f09": {
    "Name": "infraTest_adminer.1.twjwt7bditsipwme1f3ksun20",
    "EndpointID": "5b6097bbad23b8ab5dfaa9dc2c16be9f6395ba8fbf31d489667b4880bd0e3c1a",
    "MacAddress": "02:42:0a:00:04:03",
    "IPv4Address": "10.0.4.3/24",
    "IPv6Address": ""
},
"lb-infraTest_infraNet": {
    "Name": "infraTest_infraNet-endpoint",
    "EndpointID": "74fbc070ae07a41ad242d8a8011edd6eaa7f90a37051786471259f9d131e9b54",
    "MacAddress": "02:42:0a:00:04:04",
    "IPv4Address": "10.0.4.4/24",
    "IPv6Address": ""
}
  4. Exec into any other container such as busybox: docker exec -it infraTest_busybox.1.zesvs77dm5z91drms72mvu0zo /bin/sh

  5. Ping infraTest_nextcloud.infraTest_infraNet will show that it is looking for 10.0.4.7 instead of the correct 10.0.4.8

  6. This can be repeated with any service in the stack; the IPs it looks for are always -1 the actual IP. It can also be tested using ping from any container (I have tested busybox and the adminer container that I was using)

  7. Networks can be seen with docker network ls

# docker network ls
NETWORK ID          NAME                 DRIVER              SCOPE
110358e5eaf6        bridge               bridge              local
05f25e23a5b0        docker_gwbridge      bridge              local
07fa0145d282        host                 host                local
ef1v7e8m0f8v        infraTest_infraNet   overlay             swarm
s4enetps5h3q        ingress              overlay             swarm
7585fb8b387e        none                 null                local
  8. See also, from an Ubuntu container I made for testing:
# dig nextcloud-mariadb.infraTest_infraNet

; <<>> DiG 9.16.1-Ubuntu <<>> nextcloud-mariadb.infraTest_infraNet
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35788
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;nextcloud-mariadb.infraTest_infraNet. IN A

;; ANSWER SECTION:
nextcloud-mariadb.infraTest_infraNet. 600 IN A  10.0.4.7

;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Mon Dec 14 00:28:33 PST 2020
;; MSG SIZE  rcvd: 106

and

# nslookup nextcloud-mariadb.infraTest_infraNet
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
Name:   nextcloud-mariadb.infraTest_infraNet
Address: 10.0.4.7

System Info:

# docker version
Client: Docker Engine - Community
 Version:           20.10.0
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        7287ab3
 Built:             Tue Dec  8 18:59:40 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.0
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       eeddea2
  Built:            Tue Dec  8 18:57:45 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

and

# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)

Server:
 Containers: 8
  Running: 6
  Paused: 0
  Stopped: 2
 Images: 13
 Server Version: 20.10.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: q4sjkufcnz78gmr1hy4vto2hm
  Is Manager: true
  ClusterID: ki7cos41zqioz2hzgtl4lkgk9
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.1.26
  Manager Addresses:
   192.168.1.26:2377
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.73-1-pve
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 16GiB
 Name: dockerHost
 ID: CIMP:V2AO:ZOX2:ZJEU:HH7K:FZEQ:ITJO:QTHP:NPSN:32D5:FCXY:Y2F4
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio weight support
WARNING: No blkio weight_device support

@laxmanpradhan

laxmanpradhan commented Dec 14, 2020

I rolled back to the previous stable version, 19.03.14, and it has the same problem. The adminer service is on 10.0.2.8, but pinging the service from another container looks up 10.0.2.7.

# docker network inspect infraTest_infraNet 
[
    {
        "Name": "infraTest_infraNet",
        "Id": "88b5fxxwm1tyt4g1ff4j8uohc",
        "Created": "2020-12-14T19:39:00.448742183Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.2.0/24",
                    "Gateway": "10.0.2.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "1375951be63b533b68f124e47ed260562ff14ee093ad30e5f9c8a6f043a02521": {
                "Name": "infraTest_busybox.1.udfdx59gfian6vrsrgcg820y2",
                "EndpointID": "4140b6ef6f072df1784bdab87f71f3d26f2e42402bc4497bfd00f0d3e4d90bc1",
                "MacAddress": "02:42:0a:00:02:0b",
                "IPv4Address": "10.0.2.11/24",
                "IPv6Address": ""
            },
            "ef69da0c727ef51be1d32e0f0576df491c8d21e0501ae1674c7c8661b1836055": {
                "Name": "infraTest_adminer.1.mq1pnyuw0coy0ju5qn0i8a3b0",
                "EndpointID": "295a27a09df3b2716b096b648080feb00b03c790fabe7e59b9e7c9d0aeb4436b",
                "MacAddress": "02:42:0a:00:02:08",
                "IPv4Address": "10.0.2.8/24",
                "IPv6Address": ""
            },
            "lb-infraTest_infraNet": {
                "Name": "infraTest_infraNet-endpoint",
                "EndpointID": "46733680a666078200b5d2141a059c52bc7abdb9433e98963d13b42d2893beaa",
                "MacAddress": "02:42:0a:00:02:04",
                "IPv4Address": "10.0.2.4/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4098"
        },
        "Labels": {
            "com.docker.stack.namespace": "infraTest"
        },
        "Peers": [
            {
                "Name": "eeb9b467cc7f",
                "IP": "192.168.1.26"
            }
        ]
    }
]


# docker exec -u root -it infraTest_busybox.1.udfdx59gfian6vrsrgcg820y2 /bin/sh

/ # ping adminer.infraTest_infraNet
PING adminer.infraTest_infraNet (10.0.2.7): 56 data bytes
^C
--- adminer.infraTest_infraNet ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

# docker version
Client: Docker Engine - Community
 Version:           19.03.14
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        5eb3275d40
 Built:             Tue Dec  1 19:20:26 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.14
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       5eb3275d40
  Built:            Tue Dec  1 19:18:53 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

@thaJeztah
Member

@laxmanpradhan the IP address you're seeing is likely the VIP of the service itself; if you inspect the network with the -v / --verbose option, do you see a Services node in the JSON output?

@laxmanpradhan

laxmanpradhan commented Dec 14, 2020

@thaJeztah, thanks for your reply. Yes, you are correct: using the verbose option I see that the IP being resolved is the VIP of the service (which is 1 less than the container IPs). So that leads to the question of why it is not able to connect on that IP. Why can't I ping the service from another container on the same overlay network? Using the adminer example, trying to connect to the database gives the error "Host is unreachable". Shouldn't any request to the service VIP get redirected to one of the container IPs in that service?

Note: I can connect using the full task name InfraTest_nextcloud-mariadb.1.wztwijuhouvab5m1s4vt0g6xw and 10.0.4.3, but not InfraTest_nextcloud-mariadb, which is what I want. 10.0.4.2, the service VIP, doesn't work either.

# docker network inspect -v InfraTest_infraNet 
[
    {
        "Name": "InfraTest_infraNet",
        "Id": "a0p98pu4ixc18fv4awi315o9k",
        "Created": "2020-12-14T21:12:29.314191832Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.4.0/24",
                    "Gateway": "10.0.4.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "0f818cdd9a4106bf5078e595417fb31120898664a571e5b379df6d79add4db56": {
                "Name": "InfraTest_nextcloud-mariadb.1.wztwijuhouvab5m1s4vt0g6xw",
                "EndpointID": "a8c0a7fff99201e8bf60f9ac0a1c557ecc83442a515362917d53f5785f75de1a",
                "MacAddress": "02:42:0a:00:04:03",
                "IPv4Address": "10.0.4.3/24",
                "IPv6Address": ""
            },
            "64667a9e80de0cca7a8f31b4fbc2097306c96c986d93f20571a7d2829ce3d34b": {
                "Name": "InfraTest_busybox.1.3lrfk3j6owz9ahmz74oqhpmxt",
                "EndpointID": "916e5b1621cbd140c31128f4dc977894e05267a525fc03a79c97a4bd486f587f",
                "MacAddress": "02:42:0a:00:04:08",
                "IPv4Address": "10.0.4.8/24",
                "IPv6Address": ""
            },
            "6e84e46f454c58ab5c095188e9c7d6bc76ba0bf288084226dbaf83f7b8ead857": {
                "Name": "InfraTest_adminer.1.ri5e33zuh2dhy545woqbcxwbc",
                "EndpointID": "858bfb50ae800058e5b5ff6a69f688aaa60a8bf368c55ac82e1466e09d25dcce",
                "MacAddress": "02:42:0a:00:04:06",
                "IPv4Address": "10.0.4.6/24",
                "IPv6Address": ""
            },
            "lb-InfraTest_infraNet": {
                "Name": "InfraTest_infraNet-endpoint",
                "EndpointID": "53fdfefbd4d51110aec01e9edaf32e964121b6083b975d76e78f57f38812e19f",
                "MacAddress": "02:42:0a:00:04:04",
                "IPv4Address": "10.0.4.4/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4100"
        },
        "Labels": {
            "com.docker.stack.namespace": "InfraTest"
        },
        "Peers": [
            {
                "Name": "4dc98c7e5f08",
                "IP": "192.168.1.26"
            }
        ],
        "Services": {
            "InfraTest_adminer": {
                "VIP": "10.0.4.5",
                "Ports": [],
                "LocalLBIndex": 266,
                "Tasks": [
                    {
                        "Name": "InfraTest_adminer.1.ri5e33zuh2dhy545woqbcxwbc",
                        "EndpointID": "858bfb50ae800058e5b5ff6a69f688aaa60a8bf368c55ac82e1466e09d25dcce",
                        "EndpointIP": "10.0.4.6",
                        "Info": {
                            "Host IP": "192.168.1.26"
                        }
                    }
                ]
            },
            "InfraTest_busybox": {
                "VIP": "10.0.4.7",
                "Ports": [],
                "LocalLBIndex": 267,
                "Tasks": [
                    {
                        "Name": "InfraTest_busybox.1.3lrfk3j6owz9ahmz74oqhpmxt",
                        "EndpointID": "916e5b1621cbd140c31128f4dc977894e05267a525fc03a79c97a4bd486f587f",
                        "EndpointIP": "10.0.4.8",
                        "Info": {
                            "Host IP": "192.168.1.26"
                        }
                    }
                ]
            },
            "InfraTest_nextcloud-mariadb": {
                "VIP": "10.0.4.2",
                "Ports": [],
                "LocalLBIndex": 264,
                "Tasks": [
                    {
                        "Name": "InfraTest_nextcloud-mariadb.1.wztwijuhouvab5m1s4vt0g6xw",
                        "EndpointID": "a8c0a7fff99201e8bf60f9ac0a1c557ecc83442a515362917d53f5785f75de1a",
                        "EndpointIP": "10.0.4.3",
                        "Info": {
                            "Host IP": "192.168.1.26"
                        }
                    }
                ]
            }
        }
    }
]
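The Services node in the verbose inspect output above can be checked programmatically. A minimal sketch in Python (the helper name service_ips is my own, and the embedded JSON is a trimmed sample of the output shown above) that maps each swarm service to its VIP and task endpoint IPs:

```python
import json

# Trimmed sample of the `docker network inspect -v` output shown above.
SAMPLE = """
[{"Name": "InfraTest_infraNet",
  "Services": {
    "InfraTest_adminer": {"VIP": "10.0.4.5",
                          "Tasks": [{"EndpointIP": "10.0.4.6"}]},
    "InfraTest_nextcloud-mariadb": {"VIP": "10.0.4.2",
                                    "Tasks": [{"EndpointIP": "10.0.4.3"}]}}}]
"""

def service_ips(inspect_json):
    """Map each swarm service name to (VIP, [task endpoint IPs])."""
    network = json.loads(inspect_json)[0]
    return {name: (svc["VIP"], [t["EndpointIP"] for t in svc["Tasks"]])
            for name, svc in network.get("Services", {}).items()}

for name, (vip, task_ips) in service_ips(SAMPLE).items():
    print(f"{name}: VIP={vip} tasks={task_ips}")
```

Comparing the VIP column against what the resolver returns (e.g. via nslookup inside a task) makes it easy to see whether a name is resolving to the VIP or to a task IP.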

@laxmanpradhan

OK, my post above can be ignored; I don't think it is related to Docker. I was running Docker in an LXC container on Proxmox. I switched to a VM and the DNS service works as expected.

@debugtux

debugtux commented Jul 21, 2021

EDIT: Please use the updated patch of the post below #41766 (comment)

I can confirm this behavior: the DNS system reports the wrong container IPv4 address, off by one from the actual address, when using Docker in swarm mode inside an LXC container (Arch Linux image) created by LXD.

I did, however, find a small workaround: adding a "hostname:" entry to the docker-compose stack file. When the hostname is set equal to the service name, querying by the service name resolves to the correct IPv4 address.

What the fix looks like for the latest discussed situation:

version: "3.8"

services:
  nextcloud-mariadb:
    image: mariadb
    hostname: nextcloud-mariadb
    ...

What the fix looks like for the OP situation:

version: '3'

services:
  nginx:
    container_name: nginx
    hostname: nginx
    image: nginx
    ...
  doxturbo:
    container_name: doxturbo
    hostname: doxturbo
    image: "doxturbo:local"
    ...

Version info of the test system used:

LXD/LXC version: 4.13
Container kernel version: 5.12.1-arch1-1
Docker version: 20.10.7 (OS/Arch: linux/amd64)
Containerd version: v1.5.2
Stack yml compose version: 3.8

@jigneshkhatri

@debugtux that hostname entry seems to be working, but do you have any documentation link for it? I cannot find official documentation on how it works and what it is doing.

@debugtux

debugtux commented Oct 3, 2022

[PATCH UPDATE]

I did some more debugging after noticing very rare instability in the containers (failed connections due to a wrong IP address). It turns out the patch creates a race condition on the name being resolved, because the name carries both the correct and the incorrect IP address at the same time; every so often the resolver returns the wrong one.

However, the patch still works and is stable when the service name differs from the hostname. Query only by the given hostname and the correct IP address is provided consistently; the service name will still resolve to an address that is off by one. My preferred way of implementing this is prefixing the service name with "service_" and querying the desired hostname as before.

An example of the updated patch with a nginx container to be resolved at 'nginx':

services:
  service_nginx:
    image: nginx
    hostname: nginx
    ...

One last thing to note: I am unable to replicate this problematic IP assignment with docker-compose. I only encounter it with docker swarm (docker stack deploy), which is worth noting since this issue was originally filed against docker-compose and I am not the OP.

@debugtux

debugtux commented Oct 3, 2022

@jigneshkhatri

@debugtux that hostname entry seems to be working, but do you have any documentation link for it? I cannot find official documentation on how it works and what it is doing.

Unfortunately I couldn't find any specific documentation on the network implications of these compose options other than the standard reference: https://docs.docker.com/compose/compose-file/#hostname

But it appears that Docker uses a special kind of networking and name resolution for its service names. The hostname field is probably some form of override that names the container the standard way. This is just the speculation and train of thought that inspired my patch.

@fchapo

fchapo commented Mar 15, 2023

I have the same issue, and the workaround with hostname works as well. I also tried network aliases, which did not work.

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.2
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.16.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
  scan: Docker Scan (Docker Inc.)
    Version:  v0.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-scan

Server:
 Containers: 32
  Running: 16
  Paused: 0
  Stopped: 16
 Images: 22
 Server Version: 23.0.1
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: p05hpda90nxa0eqkoxp1wjcat
  Is Manager: true
  ClusterID: aosxlfeja2iclfy64afveaanw
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: -
  Manager Addresses:
   -:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2456e983eb9e37e47538f59ea18f2043c9a73640
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.14.0-162.18.1.el9_1.x86_64
 Operating System: Red Hat Enterprise Linux 9.1 (Plow)
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 23.22GiB
 Name: -
 ID: 08744b23-8f46-41fd-85f6-199b8c4aa6ca
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

@fchapo

fchapo commented Mar 15, 2023

@debugtux @jigneshkhatri

I found another workaround that might not have the race condition. It seems that if you use endpoint_mode: dnsrr (https://docs.docker.com/network/overlay/#bypass-the-routing-mesh-for-a-swarm-service) it works as expected.

services:
  nginx:
    image: nginx
    deploy:
        endpoint_mode: dnsrr  

I also noted that this bug (i.e. without the hostname or endpoint_mode fix) makes replicas undiscoverable. If you have a service replicated 4 times and you try to list the replicas with PHP as follows:
print_r(gethostbynamel('SERVICENAME'));
PHP will return only one IP, and that IP is off by one from one of the real IPs.
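For readers not using PHP, a minimal Python sketch of the same check (resolve_all_ipv4 is my own helper name; inside a swarm task you would pass your own service's name instead of the placeholder):

```python
import socket

def resolve_all_ipv4(host):
    """Return the unique IPv4 addresses the local resolver returns for `host`."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET)
    return sorted({info[4][0] for info in infos})

# Inside a swarm task, resolve_all_ipv4("SERVICENAME") should list every
# replica's IP when endpoint_mode: dnsrr is set; with the default VIP mode
# it returns only the single virtual IP.
print(resolve_all_ipv4("localhost"))
```

This mirrors PHP's gethostbynamel: one address back means the name is resolving to a single record (the VIP), not the per-replica A records.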

@bsousaa

bsousaa commented Jun 1, 2023

There is not much we can do without a reproducer. I'm closing the issue - though feel free to open a new one (and link to this issue) if you have reproducible steps.

@bsousaa closed this as not planned (won't fix, can't repro, duplicate, stale) on Jun 1, 2023
@Drallas

Drallas commented Sep 20, 2023

@debugtux Thanks for the fix; it also works for me.

@fchapo Unfortunately I get a failure, services.service_linkace-dbhost.deploy Additional property endpoint_mode is not allowed, when applying endpoint_mode: dnsrr to my compose file.

@fchapo

fchapo commented Sep 21, 2023

@Drallas which version of Compose are you using? Make sure you use version 3.8 or higher.

Example of my compose file, which explicitly sets the compose version:

version: '3.8'

services:
  proxy:
    image: somerepo/example
    ports:
      - target: 80
        published: 80
        mode: host
    deploy:
      mode: replicated
      replicas: 1
      endpoint_mode: dnsrr

@Drallas

Drallas commented Sep 21, 2023

@fchapo This is my issue: InvalidArgument desc = EndpointSpec: port published with ingress mode can't be used with dnsrr mode.

The solution is to force host mode and not use the routing mesh.

I guess I want ingress mode, in order to load-balance my services.

But if needed, mode: host can be set explicitly for endpoint_mode: dnsrr to work.

deploy:
      replicas: 1
      endpoint_mode: dnsrr
    ports:
      - target: 8080
        published: 8764
        protocol: tcp
        mode: host
       #- "8764:8080"

@fchapo

fchapo commented Sep 26, 2023

It depends on how you load balance. I use a separate service for that, so I don't want ingress mode. And depending on what you're doing, you may not need to publish any port at all, in which case endpoint_mode: dnsrr should again work as a workaround for this ticket's issue :)

version: '3.8'

services:
  proxy:
    image: somerepo/example
    deploy:
      mode: replicated
      replicas: 1
      endpoint_mode: dnsrr
