Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugin panics when running in multiple containers #3189

Closed
patrickmglynn opened this issue Jan 19, 2021 · 10 comments
Closed

Plugin panics when running in multiple containers #3189

patrickmglynn opened this issue Jan 19, 2021 · 10 comments

Comments

@patrickmglynn
Copy link

Describe the bug
After upgrading to the latest Docker version, I now receive an "Unable to connect to plugin" message from Loki in the dockerd logs. As a result logs no longer push from Swarm services into my Loki instance.

To Reproduce
Steps to reproduce the behavior:

  1. Confirm Docker version 20:
# docker --version
Docker version 20.10.2, build 2291f61
  1. Confirm latest Loki Docker plugin is in use:
# docker plugin disable loki --force
# docker plugin upgrade loki grafana/loki-docker-driver:latest --grant-all-permissions
# docker plugin enable loki
  1. Configure /etc/docker/daemon.json
{
    "debug" : true,
    "log-driver": "loki",
    "log-opts": {
        "loki-url": "http://<loki-endpoint>:3100/loki/api/v1/push",
        "loki-batch-size": "400",
        "no-file": "false",
        "keep-file": "true",
        "max-size": "5m",
        "max-file": "1"
    }
}
...
# systemctl restart docker
  1. Query Dockerd logs related to Loki
# journalctl -u docker.service | grep loki

Expected behavior
Logs pushing to Loki instance from services running in Swarm.

Environment:

  • Infrastructure: Docker Swarm running on a cluster of AWS EC2 instances
  • Deployment tool: Ansible
  • Loki version
loki, version 2.0.0 (branch: HEAD, revision: 6978ee5d)
  build date:       2020-10-26T15:54:56Z
  go version:       go1.14.2
  platform:         linux/amd64

Screenshots, Promtail config, or terminal output
The offending log lines output from running journalctl -xe are included below:

Jan 19 04:15:59 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:15:59Z" level=error msg="\t/src/loki/vendor/github.com/prometheus/client_golang/prometheus/registry.go:401 +0xad" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:15:59 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:15:59Z" level=error msg="github.com/grafana/loki/pkg/promtail/client.newMetrics(0x2abc7c0, 0xc0000c8870, 0xc000746d80)" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:15:59 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:15:59Z" level=error msg="\t/src/loki/pkg/promtail/client/client.go:114 +0x757" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:15:59 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:15:59Z" level=error msg="github.com/grafana/loki/pkg/promtail/client.New(0x2abc7c0, 0xc0000c8870, 0xc00039ab40, 0x3b9aca00, 0x190, 0x0, 0x0, 0x0, 0x0, 0x0, ...)" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:15:59 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:15:59Z" level=error msg="\t/src/loki/pkg/promtail/client/client.go:166 +0x1d0" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:15:59 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:15:59Z" level=error msg="\t/src/loki/cmd/docker-driver/loki.go:36 +0x1fc" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:15:59 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:15:59Z" level=error msg="\t/src/loki/cmd/docker-driver/driver.go:101 +0x774" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:15:59 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:15:59Z" level=error msg="\t/src/loki/cmd/docker-driver/http.go:45 +0x23b" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:16:03 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:16:03Z" level=info msg="level=info ts=2021-01-19T04:16:03.149234848Z caller=driver.go:77 msg=\"starting logging driver for container\" id=0ab7fb52027dfb0c2741d58bfc1cf44590aec2ebc0ab08acfa6d59aaed029105 config=\"map[keep-file:true loki-batch-size:400 loki-url:http://<loki-endpoint>:3100/loki/api/v1/push max-file:1 max-size:5m no-file:false]\" file=/run/docker/logging/23d89f02da00fd9541367014d3839ef4709be7761bb41dc79c56c2c0e2009c09 logpath=/var/log/docker/0ab7fb52027dfb0c2741d58bfc1cf44590aec2ebc0ab08acfa6d59aaed029105/json.log" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:16:03 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:16:03Z" level=error msg="\t/src/loki/vendor/github.com/prometheus/client_golang/prometheus/registry.go:401 +0xad" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:16:03 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:16:03Z" level=error msg="github.com/grafana/loki/pkg/promtail/client.newMetrics(0x2abc7c0, 0xc0000c8870, 0xc0007471a0)" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:16:03 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:16:03Z" level=error msg="\t/src/loki/pkg/promtail/client/client.go:114 +0x757" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:16:03 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:16:03Z" level=error msg="github.com/grafana/loki/pkg/promtail/client.New(0x2abc7c0, 0xc0000c8870, 0xc00039afc0, 0x3b9aca00, 0x190, 0x0, 0x0, 0x0, 0x0, 0x0, ...)" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:16:03 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:16:03Z" level=error msg="\t/src/loki/pkg/promtail/client/client.go:166 +0x1d0" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:16:03 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:16:03Z" level=error msg="\t/src/loki/cmd/docker-driver/loki.go:36 +0x1fc" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:16:03 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:16:03Z" level=error msg="\t/src/loki/cmd/docker-driver/driver.go:101 +0x774" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:16:03 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:16:03Z" level=error msg="\t/src/loki/cmd/docker-driver/http.go:45 +0x23b" plugin=cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467
Jan 19 04:16:03 <swarm.host.fqdn> dockerd[31013]: time="2021-01-19T04:16:03.149705058Z" level=warning msg="Unable to connect to plugin: /run/docker/plugins/cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467/loki.sock/LogDriver.StartLogging: Post http://%2Frun%2Fdocker%2Fplugins%2Fcf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467%2Floki.sock/LogDriver.StartLogging: EOF, retrying in 1s"

Further detail about the logging driver plugin:

[
    {
        "Config": {
            "Args": {
                "Description": "",
                "Name": "",
                "Settable": null,
                "Value": null
            },
            "Description": "Loki Logging Driver",
            "DockerVersion": "17.09.0-ce",
            "Documentation": "https://github.com/grafana/loki",
            "Entrypoint": [
                "/bin/docker-driver"
            ],
            "Env": [
                {
                    "Description": "Set log level to output for plugin logs",
                    "Name": "LOG_LEVEL",
                    "Settable": [
                        "value"
                    ],
                    "Value": "info"
                }
            ],
            "Interface": {
                "Socket": "loki.sock",
                "Types": [
                    "docker.logdriver/1.0"
                ]
            },
            "IpcHost": false,
            "Linux": {
                "AllowAllDevices": false,
                "Capabilities": null,
                "Devices": null
            },
            "Mounts": null,
            "Network": {
                "Type": "host"
            },
            "PidHost": false,
            "PropagatedMount": "",
            "User": {},
            "WorkDir": "",
            "rootfs": {
                "diff_ids": [
                    "sha256:d7a265a16adfd67580bb5abe0f1e8486824890c63913631e46b6f05886228dc0"
                ],
                "type": "layers"
            }
        },
        "Enabled": true,
        "Id": "cf308cb0b9640b3e11df151775cfae122e0d057a0ef1bef16805314a81852467",
        "Name": "loki:latest",
        "PluginReference": "docker.io/grafana/loki-docker-driver:latest",
        "Settings": {
            "Args": [],
            "Devices": [],
            "Env": [
                "LOG_LEVEL=info"
            ],
            "Mounts": []
        }
    }
]

Additional troubleshooting steps
I have cleaned up unused networks/images etc. with docker system prune -f and have uninstalled and re-installed the Loki plugin.

@rayvbr
Copy link

rayvbr commented Jan 19, 2021

I have the same problem, although not using Swarm. I can easily reproduce on a clean Docker install and starting multiple containers using Loki logging plugin driver. It seems to be caused by a dual registration inside the Prometheus client:

dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="2021/01/19 19:12:20 http: panic serving @: duplicate metrics collector registration attempted" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="goroutine 55 [running]:" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="net/http.(*conn).serve.func1(0xc000332000)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/usr/local/go/src/net/http/server.go:1801 +0x147" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="panic(0x2450320, 0xc0003a0460)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/usr/local/go/src/runtime/panic.go:975 +0x47a" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0xc00007e8c0, 0xc00012e700, 0x8, 0x8)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/src/loki/vendor/github.com/prometheus/client_golang/prometheus/registry.go:401 +0xad" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="github.com/grafana/loki/pkg/promtail/client.newMetrics(0x2abc7c0, 0xc00007e8c0, 0xc0002b1380)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/src/loki/pkg/promtail/client/client.go:114 +0x757" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="github.com/grafana/loki/pkg/promtail/client.New(0x2abc7c0, 0xc00007e8c0, 0xc0000ca990, 0x3b9aca00, 0x64, 0x0, 0x0, 0x0, 0x0, 0x0, ...)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/src/loki/pkg/promtail/client/client.go:166 +0x1d0" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20.686160914Z" level=warning msg="Unable to connect to plugin: /run/docker/plugins/5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d/loki.sock/LogDriver.StartLogging: Post http://%2Frun%2Fdocker%2Fplugins%2F5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d%2Floki.sock/LogDriver.StartLogging: EOF, retrying in 1s"
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="main.New(0xc00032ea50, 0xc0000bcd00, 0x40, 0xc000290440, 0xf, 0xc000290480, 0x9, 0xc0000b6880, 0x4, 0x4, ...)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/src/loki/cmd/docker-driver/loki.go:36 +0x1fc" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="main.(*driver).StartLogging(0xc0006942d0, 0xc0002b0de0, 0x54, 0xc00032ea50, 0xc0000bcd00, 0x40, 0xc000290440, 0xf, 0xc000290480, 0x9, ...)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/src/loki/cmd/docker-driver/driver.go:101 +0x774" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="main.handlers.func1(0x2ac9280, 0xc0004742a0, 0xc00045ea00)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/src/loki/cmd/docker-driver/http.go:45 +0x23b" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="net/http.HandlerFunc.ServeHTTP(0xc0004562f0, 0x2ac9280, 0xc0004742a0, 0xc00045ea00)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/usr/local/go/src/net/http/server.go:2042 +0x44" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="net/http.(*ServeMux).ServeHTTP(0xc000042840, 0x2ac9280, 0xc0004742a0, 0xc00045ea00)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/usr/local/go/src/net/http/server.go:2417 +0x1ad" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="net/http.serverHandler.ServeHTTP(0xc0002ee000, 0x2ac9280, 0xc0004742a0, 0xc00045ea00)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/usr/local/go/src/net/http/server.go:2843 +0xa3" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="net/http.(*conn).serve(0xc000332000, 0x2ad1080, 0xc00034c000)" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/usr/local/go/src/net/http/server.go:1925 +0x8ad" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="created by net/http.(*Server).Serve" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d
dockerd[1957]: time="2021-01-19T19:12:20Z" level=error msg="\t/usr/local/go/src/net/http/server.go:2969 +0x36c" plugin=5289ecdf1e9fa5e2efa88ef5d4cad637cf81b525e9304083b3fd6d5d8df7384d

@rayvbr
Copy link

rayvbr commented Jan 19, 2021

Issue seems to be caused by 628c1fa

@patrickmglynn
Copy link
Author

Confirmed that rolling back the plugin to prior commit 54d1d3b resolved the issue for me,

@ruanbekker
Copy link

Thanks @patrickmglynn , ran into the same issue and as you mentioned the prior commit worked for me as well

@IgorOhrimenko
Copy link

Have some problem and need to check status loki.
The check consists in starting the container from ansible:

- name: Check whether alive loki
  shell: >
      docker run --log-driver=loki --log-opt loki-url="http://loki.lan/loki/api/v1/push" --rm -i alpine true
      && docker run --log-driver=loki --log-opt loki-url="http://loki.lan/loki/api/v1/push" --rm -i alpine true

Why twice? I do not know, but the first start after:
docker plugin disable --force loki:latest && docker plugin enable loki:latest
return 0, only second and other starts return 125.

@hervenicol
Copy link
Contributor

I think it's more related to the loki plugin version than Docker version, as I have the same behaviour with a fresh install of Docker 19.03.14.

Reverting to commit 54d1d3b fixed:

# docker plugin disable loki --force
# docker plugin upgrade loki grafana/loki-docker-driver:master-54d1d3b --grant-all-permissions --skip-remote-check
# docker plugin enable loki

@rayvbr
Copy link

rayvbr commented Feb 1, 2021

@patrickmglynn why did you close this issue? I believe the issue hasn’t been patched, or am I mistaken?

@patrickmglynn patrickmglynn reopened this Feb 1, 2021
@rayvbr
Copy link

rayvbr commented Feb 1, 2021

@patrickmglynn maybe you can change the issue title to better reflect the source of the problem. Something along the lines of": "Plugin panics when running in multiple containers"

@patrickmglynn patrickmglynn changed the title Unable to connect to logging plugin in Swarm Plugin panics when running in multiple containers Feb 1, 2021
@cyriltovena
Copy link
Contributor

cyriltovena commented Feb 2, 2021

This is fixed in master. Isn't it ? I've sent a PR to fix it recently

@ruanbekker
Copy link

@cyriltovena I tested master today and worked perfectly.

cyriltovena pushed a commit to cyriltovena/loki that referenced this issue Jun 11, 2021
* Turn memcached circuit-breaker on by default

Will trip after 10 failures within 10 seconds

Signed-off-by: Bryan Boreham <[email protected]>

* Change circuit-breaker log fields to avoid clash

The names 'from' and 'to' are used elsewhere as dates, so avoid re-using them here as strings

Signed-off-by: Bryan Boreham <[email protected]>

* Update CHANGELOG

Signed-off-by: Bryan Boreham <[email protected]>

* Updated doc

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>
cyriltovena pushed a commit to cyriltovena/loki that referenced this issue Jun 11, 2021
…3190)

* Turn memcached circuit-breaker on by default

Will trip after 10 failures within 10 seconds

Signed-off-by: Bryan Boreham <[email protected]>

* Change circuit-breaker log fields to avoid clash

The names 'from' and 'to' are used elsewhere as dates, so avoid re-using them here as strings

Signed-off-by: Bryan Boreham <[email protected]>

* Update CHANGELOG

Signed-off-by: Bryan Boreham <[email protected]>

* Updated doc

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

Co-authored-by: Bryan Boreham <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants