docker keeps saying that container is 'unhealthy' #1582

Closed
stempst0r opened this issue Dec 2, 2020 · 12 comments

Comments

@stempst0r

I'm running the container on a Raspberry Pi 2 with HypriotOS.
Everything seems to run fine, but Docker reports the container as unhealthy.

$ docker ps
CONTAINER ID        IMAGE                          COMMAND                  CREATED             STATUS                      PORTS                                                                                                  NAMES
409142cbefe3        haugene/transmission-openvpn   "dumb-init /etc/open…"   10 minutes ago      Up 10 minutes (unhealthy)   0.0.0.0:9091->9091/tcp                                                                                 transmission-openvpn

I also checked whether the VPN is up: running curl ifconfig.io inside the container returns a different IP than running it on the host, so the VPN is working.
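The same comparison can be scripted. A minimal sketch, assuming `curl` is available both on the host and inside the container (as it was here) and that ifconfig.io answers with the bare IP address; the helper name `vpn_ip_differs` is hypothetical:

```shell
# Hypothetical helper: compare the public IP seen from the host with the one
# seen from inside the container. Assumes curl exists in both places and that
# ifconfig.io replies with the bare IP address.
vpn_ip_differs() {
    container="$1"
    host_ip=$(curl -s ifconfig.io)
    container_ip=$(docker exec "$container" curl -s ifconfig.io)
    echo "host: $host_ip, container: $container_ip"
    # Succeeds only if the container got an answer and it differs from the host's
    [ -n "$container_ip" ] && [ "$host_ip" != "$container_ip" ]
}
```

Usage on the host: `vpn_ip_differs transmission-openvpn && echo "VPN is up"`. This only compares outbound IPs; it says nothing about whether the container's health check passes.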

docker-compose

$ cat docker-compose.yml
version: '3.3'
services:
    transmission-openvpn:
        cap_add:
            - NET_ADMIN
        volumes:
            - '/mnt/nas/torrent:/data'
        environment:
            - OPENVPN_PROVIDER=VPNUNLIMITED
            - OPENVPN_CONFIG=fr,lu,ro
            - OPENVPN_USERNAME=***
            - OPENVPN_PASSWORD=***
            - LOCAL_NETWORK=192.168.188.0/24
        dns:
            - 192.168.188.5
            - 1.1.1.1
        logging:
            driver: json-file
            options:
                max-size: 10m
        ports:
            - '9091:9091'
        image: haugene/transmission-openvpn
        restart: on-failure
        container_name: transmission-openvpn

Logs

$ docker logs 409142cbefe3
Starting container with revision: 42d00652323b7b7fbee1689a54219599f4795a04
Creating TUN device /dev/net/tun
Using OpenVPN provider: VPNUNLIMITED
3 servers found in OPENVPN_CONFIG, ro chosen randomly
Starting OpenVPN using config ro.ovpn
Setting OpenVPN credentials...
adding route to local network 192.168.188.0/24 via 172.22.0.1 dev eth0
Wed Dec  2 09:42:11 2020 OpenVPN 2.4.9 armv7-alpine-linux-musleabihf [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on Apr 20 2020
Wed Dec  2 09:42:11 2020 library versions: OpenSSL 1.1.1g  21 Apr 2020, LZO 2.10
Wed Dec  2 09:42:11 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Wed Dec  2 09:42:15 2020 TCP/UDP: Preserving recently used remote address: [AF_INET]185.144.83.13:1194
Wed Dec  2 09:42:15 2020 UDP link local: (not bound)
Wed Dec  2 09:42:15 2020 UDP link remote: [AF_INET]185.144.83.13:1194
Wed Dec  2 09:42:16 2020 WARNING: 'link-mtu' is used inconsistently, local='link-mtu 1542', remote='link-mtu 1602'
Wed Dec  2 09:42:16 2020 WARNING: 'cipher' is used inconsistently, local='cipher BF-CBC', remote='cipher AES-256-CBC'
Wed Dec  2 09:42:16 2020 WARNING: 'auth' is used inconsistently, local='auth SHA1', remote='auth SHA512'
Wed Dec  2 09:42:16 2020 WARNING: 'keysize' is used inconsistently, local='keysize 128', remote='keysize 256'
Wed Dec  2 09:42:16 2020 [openvpn2.vpnunlimitedapp.com] Peer Connection Initiated with [AF_INET]185.144.83.13:1194
Wed Dec  2 09:42:22 2020 TUN/TAP device tun0 opened
Wed Dec  2 09:42:22 2020 /sbin/ip link set dev tun0 up mtu 1500
Wed Dec  2 09:42:22 2020 /sbin/ip addr add dev tun0 local 10.200.0.74 peer 10.200.0.73
Wed Dec  2 09:42:22 2020 /etc/openvpn/tunnelUp.sh tun0 1500 1553 10.200.0.74 10.200.0.73 init
Up script executed with tun0 1500 1553 10.200.0.74 10.200.0.73 init
Updating TRANSMISSION_BIND_ADDRESS_IPV4 to the ip of tun0 : 10.200.0.74
Updating Transmission settings.json with values from env variables
Using existing settings.json for Transmission /data/transmission-home/settings.json
Overriding bind-address-ipv4 because TRANSMISSION_BIND_ADDRESS_IPV4 is set to 10.200.0.74
Overriding download-dir because TRANSMISSION_DOWNLOAD_DIR is set to /data/completed
Overriding incomplete-dir because TRANSMISSION_INCOMPLETE_DIR is set to /data/incomplete
Overriding rpc-port because TRANSMISSION_RPC_PORT is set to 9091
Overriding watch-dir because TRANSMISSION_WATCH_DIR is set to /data/watch
sed'ing True to true

-------------------------------------
Transmission will run as
-------------------------------------
User name:   root
User uid:    0
User gid:    0
-------------------------------------

STARTING TRANSMISSION
Transmission startup script complete.
Wed Dec  2 09:42:23 2020 Initialization Sequence Completed

Host system:
$ uname -a
Linux black-pearl 5.4.72-v7+ #1356 SMP Thu Oct 22 13:56:54 BST 2020 armv7l GNU/Linux

$ docker -v
Docker version 19.03.14, build 5eb3275d40

Hope I provided everything needed. Thanks for the awesome container!

@haugene
Owner

haugene commented Dec 3, 2020

Hmm. It will say that the container is unhealthy as long as the health check script is failing. That doesn't necessarily mean that it doesn't work, but it would be interesting to see what fails in the script. Thank you for a very well-specified issue; it's nice to come to one like this compared to the ones that basically just say "it doesn't work" 😆

If you run docker inspect transmission-openvpn you should find a "State" → "Health" section. Can you post that?
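To see only that section instead of the whole `docker inspect` dump, a Go template can be passed with `--format`. A minimal sketch; the helper name `show_health` is hypothetical, and it assumes the container defines a health check (otherwise there is no `.State.Health` to print):

```shell
# Hypothetical helper: print the health status on one line, then the full
# health record (status, failing streak, recent probe results) as JSON.
show_health() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
    docker inspect --format '{{json .State.Health}}' "$1"
}
```

Usage: `show_health transmission-openvpn`. Docker keeps only the most recent probe results in `Log`, so this is a short window into what the check has been doing.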

@stempst0r
Author

First, thanks for your response! Sorry for the delay; I didn't have access to the machine for the last few days. Here is the output of docker inspect transmission-openvpn:

"Health": {
                "Status": "unhealthy",
                "FailingStreak": 2420,
                "Log": [
                    {
                        "Start": "2020-12-06T22:24:53.614170248Z",
                        "End": "2020-12-06T22:24:59.305009928Z",
                        "ExitCode": 1,
                        "Output": "PING google.com (216.58.214.142): 56 data bytes\n64 bytes from 216.58.214.142: seq=0 ttl=113 time=97.965 ms\n\n--- google.com ping statistics ---\n2 packets transmitted, 1 packets received, 50% packet loss\nround-trip min/avg/max = 97.965/97.965/97.965 ms\nNetwork is down\n"
                    },
                    {
                        "Start": "2020-12-06T22:25:59.348762669Z",
                        "End": "2020-12-06T22:26:05.00527991Z",
                        "ExitCode": 1,
                        "Output": "PING google.com (172.217.17.238): 56 data bytes\n64 bytes from 172.217.17.238: seq=0 ttl=113 time=89.376 ms\n\n--- google.com ping statistics ---\n2 packets transmitted, 1 packets received, 50% packet loss\nround-trip min/avg/max = 89.376/89.376/89.376 ms\nNetwork is down\n"
                    },
                    {
                        "Start": "2020-12-06T22:27:05.045131429Z",
                        "End": "2020-12-06T22:27:10.693205139Z",
                        "ExitCode": 1,
                        "Output": "PING google.com (172.217.17.238): 56 data bytes\n64 bytes from 172.217.17.238: seq=0 ttl=113 time=89.192 ms\n\n--- google.com ping statistics ---\n2 packets transmitted, 1 packets received, 50% packet loss\nround-trip min/avg/max = 89.192/89.192/89.192 ms\nNetwork is down\n"
                    },
                    {
                        "Start": "2020-12-06T22:28:10.729377572Z",
                        "End": "2020-12-06T22:28:16.389096899Z",
                        "ExitCode": 1,
                        "Output": "PING google.com (172.217.17.238): 56 data bytes\n64 bytes from 172.217.17.238: seq=0 ttl=113 time=89.678 ms\n\n--- google.com ping statistics ---\n2 packets transmitted, 1 packets received, 50% packet loss\nround-trip min/avg/max = 89.678/89.678/89.678 ms\nNetwork is down\n"
                    },
                    {
                        "Start": "2020-12-06T22:29:16.432009562Z",
                        "End": "2020-12-06T22:29:22.053042577Z",
                        "ExitCode": 1,
                        "Output": "PING google.com (172.217.17.238): 56 data bytes\n64 bytes from 172.217.17.238: seq=0 ttl=113 time=89.960 ms\n\n--- google.com ping statistics ---\n2 packets transmitted, 1 packets received, 50% packet loss\nround-trip min/avg/max = 89.960/89.960/89.960 ms\nNetwork is down\n"
                    }
                ]

To me it simply looks like an unstable connection, since there is packet loss? The Pi is wired, so the connection within my private network should be fine. I run a Pi-hole instance as local DNS; maybe it blocks some addresses? I set the container's DNS servers to 1.1.1.1 and 1.0.0.1 (both Cloudflare DNS) and restarted the container. Maybe that solves the problem?

@stempst0r
Author

Changing the DNS didn't solve the issue after all.

@clement-z
Contributor

Could you try:

  • pinging google.com manually (without the -c 2 used in the health check script)
  • the same with other target hosts
  • the same from the host OS

I'm interested to see whether there is a pattern in the failed pings (for example, maybe only the first one is ever blocked?)
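When comparing many runs, it helps to pull the numbers out of the statistics line. A minimal sketch, assuming the two summary formats seen in this thread: the host's ping prints `13 packets transmitted, 13 received, 0% packet loss, time 30ms`, while the container's (BusyBox) ping prints `2 packets transmitted, 1 packets received, 50% packet loss`. The helper name `ping_summary` is hypothetical:

```shell
# Hypothetical helper: read ping output on stdin and print
# "transmitted received loss%" taken from the statistics line.
# Accepts both "N packets received" (BusyBox) and "N received" (host ping).
ping_summary() {
    sed -n 's/^\([0-9]*\) packets transmitted, \([0-9]*\) \(packets \)\{0,1\}received, \([0-9.]*\)% packet loss.*/\1 \2 \4%/p'
}
```

Usage: `docker exec transmission-openvpn ping -c 10 google.com | ping_summary`, repeated for other hosts and on the host itself, gives one comparable line per run.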


Note: your public IP (from 5 days ago) seems to appear in the first log message. You might want to redact it.

@stempst0r
Author

Sorry again for the delayed answer.
Before testing today I ran "docker-compose pull" and "docker-compose restart" to make sure I had the latest version.

Ping from host

$ sudo ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=118 time=6.11 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=118 time=5.88 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=118 time=6.10 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=118 time=5.93 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=118 time=5.77 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=118 time=5.63 ms
64 bytes from 8.8.8.8: icmp_seq=7 ttl=118 time=6.05 ms
64 bytes from 8.8.8.8: icmp_seq=8 ttl=118 time=5.88 ms
64 bytes from 8.8.8.8: icmp_seq=9 ttl=118 time=6.30 ms
64 bytes from 8.8.8.8: icmp_seq=10 ttl=118 time=6.45 ms
64 bytes from 8.8.8.8: icmp_seq=11 ttl=118 time=5.65 ms
64 bytes from 8.8.8.8: icmp_seq=12 ttl=118 time=6.27 ms
64 bytes from 8.8.8.8: icmp_seq=13 ttl=118 time=5.83 ms
^C
--- 8.8.8.8 ping statistics ---
13 packets transmitted, 13 received, 0% packet loss, time 30ms
rtt min/avg/max/mdev = 5.625/5.987/6.445/0.250 ms

Ping from inside container

$ docker exec -it transmission-openvpn ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=117 time=31.947 ms
64 bytes from 8.8.8.8: seq=1 ttl=117 time=31.625 ms
64 bytes from 8.8.8.8: seq=2 ttl=117 time=31.710 ms
64 bytes from 8.8.8.8: seq=3 ttl=117 time=33.671 ms
64 bytes from 8.8.8.8: seq=4 ttl=117 time=31.833 ms
64 bytes from 8.8.8.8: seq=5 ttl=117 time=32.920 ms
64 bytes from 8.8.8.8: seq=6 ttl=117 time=31.922 ms
64 bytes from 8.8.8.8: seq=7 ttl=117 time=35.927 ms
64 bytes from 8.8.8.8: seq=8 ttl=117 time=32.099 ms
64 bytes from 8.8.8.8: seq=9 ttl=117 time=31.884 ms
64 bytes from 8.8.8.8: seq=10 ttl=117 time=33.152 ms
64 bytes from 8.8.8.8: seq=11 ttl=117 time=31.798 ms
64 bytes from 8.8.8.8: seq=12 ttl=117 time=45.929 ms
64 bytes from 8.8.8.8: seq=13 ttl=117 time=32.481 ms
^C
--- 8.8.8.8 ping statistics ---
14 packets transmitted, 14 packets received, 0% packet loss
round-trip min/avg/max = 31.625/33.492/45.929 ms

I also pinged for longer. No packet loss:

  • from the host (Raspberry Pi wired directly to the router)
  • from inside the container
  • from other machines on the same network

Could the problem be somewhere here? Is there maybe one network too many? (I'm running Pi-hole and Transmission containers on this host.)

$ docker network ls
NETWORK ID          NAME                           DRIVER              SCOPE
10030c99fea8        bridge                         bridge              local
f596526e18dd        host                           host                local
f589a7b9031d        none                           null                local
1eac474e1897        openvpn-as_default             bridge              local
bffb0c723851        pihole_default                 bridge              local
53775edeee44        transmission-openvpn_default   bridge              local

@clement-z
Contributor

Hi @stempst0r, I was able to reproduce the issue manually on my container. I'm pretty sure it is related to the -w switch not doing what we think it does (and what the docs say?). BusyBox provides timeout, which we could use instead if needed. I will look into it and open a PR.

@stempst0r
Author

Thanks for your time!
Docker is quite new to me and I'm having a hard time understanding the virtual networking. If I can help with any further information about the error, let me know.

@clement-z
Contributor

clement-z commented Dec 19, 2020

Hmm, I can't seem to reproduce it anymore, and ping's -w does seem to work as it should... I don't really know what changed on my side.

@stempst0r is the issue still present for you?

@clement-z
Contributor

And if it is, can you try running the following (in order, until one works):

  • `docker exec container_name ping -c 2 -w 5 google.com`
  • `docker exec container_name ping -c 2 -w 10 google.com`
  • `docker exec container_name timeout 5 ping -c 2 google.com`

For me all of these work fine now, but the first one gave me the same 50% loss error the last time I tried, even though 5 seconds hadn't elapsed... Nothing should have changed network-wise on my side.
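The "in order until it works" procedure can be run as a small loop inside the container (for example after `docker exec -it transmission-openvpn sh`). A minimal sketch covering only the `-w` variants; the helper name `first_working_deadline` is hypothetical:

```shell
# Hypothetical helper: try `ping -c 2 -w <deadline>` with each deadline in
# turn and print the first one for which ping exits successfully.
first_working_deadline() {
    host="$1"
    shift
    for deadline in "$@"; do
        if ping -c 2 -w "$deadline" "$host" >/dev/null 2>&1; then
            echo "$deadline"
            return 0
        fi
    done
    # No deadline in the list was long enough (or the host is unreachable)
    return 1
}
```

Usage: `first_working_deadline google.com 5 10 15` prints the smallest listed deadline that produced a passing ping.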

@stempst0r
Author

Here are my results:

$ docker exec transmission-openvpn ping -c 2 -w 5 google.com
PING google.com (172.217.17.142): 56 data bytes
64 bytes from 172.217.17.142: seq=0 ttl=118 time=45.912 ms

--- google.com ping statistics ---
2 packets transmitted, 1 packets received, 50% packet loss
round-trip min/avg/max = 45.912/45.912/45.912 ms
$ docker exec transmission-openvpn ping -c 2 -w 10 google.com
PING google.com (216.58.214.14): 56 data bytes
64 bytes from 216.58.214.14: seq=0 ttl=118 time=56.604 ms
64 bytes from 216.58.214.14: seq=1 ttl=118 time=40.971 ms

--- google.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 40.971/48.787/56.604 ms
$ docker exec transmission-openvpn timeout 5 ping -c 2 google.com
PING google.com (216.58.214.14): 56 data bytes
64 bytes from 216.58.214.14: seq=0 ttl=118 time=105.033 ms

Hope this info helps fix the bug. To be honest, I don't know in which file I would change -w 5 to -w 10 to fix it myself.
Thanks for your efforts!

@stempst0r
Author

OK, with your tips I was able to fix it on my side. Here is what I did:

I edited the health check script inside the container, /etc/scripts/healthcheck.sh, and changed line 14 from
ping -c 2 -w 5 $HOST # Get at least 2 responses and timeout after 5 seconds
to
ping -c 2 -w 10 $HOST # Get at least 2 responses and timeout after 10 seconds

Afterwards I restarted the container with docker-compose restart, and now it is listed as healthy.
That was basically a small crash course in how Docker containers work and how to edit things with 'vi' 🤣
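The same edit can be made without an interactive editor. A minimal sketch using `sed -i` (BusyBox `sed` supports it), assuming the ping line looks exactly like the one quoted above; the helper name `set_ping_deadline` is hypothetical:

```shell
# Hypothetical helper: rewrite the -w value of the `ping -c 2 -w N` line
# in a health check script, in place. Usage: set_ping_deadline FILE SECONDS
set_ping_deadline() {
    file="$1"
    seconds="$2"
    sed -i "s/ping -c 2 -w [0-9][0-9]*/ping -c 2 -w $seconds/" "$file"
}
```

Inside the container, defining the function and running `set_ping_deadline /etc/scripts/healthcheck.sh 10` followed by a restart should have the same effect as the manual edit. Only the ping arguments are rewritten; the trailing comment on that line is left as it was.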

I don't know whether this value should be changed globally. Maybe the issue only occurs with VPNUnlimited as the provider?

Anyway thanks for all the effort. I'll close the issue with this comment :)

@clement-z
Contributor

I am using PIA, and I also had the same error as you at some point. The timeout is only there so that we don't wait indefinitely for the ping. Increasing it should have no effect on how the container works or on its ability to detect that the connection has dropped.

Note that if you ever create a new container (e.g. after a new docker run), you will have to redo this edit. I already had this patch ready in one of my branches, but since I couldn't reproduce the issue anymore, I was waiting for your reply (to see whether going to -w 10 fixed your issue) before asking for a merge here.
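To illustrate the point about the timeout, here is a minimal sketch of a ping-based health check with a configurable deadline. It is not the project's actual healthcheck.sh; `check_network`, `HEALTH_HOST` and `HEALTH_PING_DEADLINE` are hypothetical names, and the default deadline of 10 seconds mirrors the fix above:

```shell
# Hypothetical health-check logic. -w only caps how long a single check may
# run; if no replies come back at all, ping still exits non-zero, so a longer
# deadline does not mask a dropped connection.
check_network() {
    host="${HEALTH_HOST:-google.com}"
    deadline="${HEALTH_PING_DEADLINE:-10}"
    if ! ping -c 2 -w "$deadline" "$host"; then
        echo "Network is down"
        return 1
    fi
    return 0
}
```

Making the deadline a variable rather than a hard-coded value would let slower VPN links raise it without editing files inside the container.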
