Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker startup broken in master for aboot due to systemd upgrade #7372

Closed
bluecmd opened this issue Apr 19, 2021 · 8 comments · Fixed by #7376
Closed

Docker startup broken in master for aboot due to systemd upgrade #7372

bluecmd opened this issue Apr 19, 2021 · 8 comments · Fixed by #7376

Comments

@bluecmd
Copy link
Contributor

bluecmd commented Apr 19, 2021

Description

Using current image I am trying to boot it on my Arista 7050 QX32. Image calls itself image-master.641-782b02d9.

NOTE: This worked fine using the latest image from April 17 (image-master.640-f8d38aee).

I receive the following on boot: [FAILED] Failed to start Docker Application Container Engine.

Extra logs:

admin@sonic:~$ sudo journalctl -u docker.service | cat
-- Journal begins at Mon 2021-04-19 20:43:40 UTC, ends at Mon 2021-04-19 20:50:01 UTC. --
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.695767629Z" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.695886690Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.695967652Z" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.695986151Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.697591666Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0  <nil>}]" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.697684434Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.697814892Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206204b0, CONNECTING" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.698511345Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206204b0, READY" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.699131813Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0  <nil>}]" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.699186427Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.699330895Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206207a0, CONNECTING" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.699599924Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206207a0, READY" module=grpc
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.744818050Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.745163700Z" level=warning msg="Your kernel does not support cgroup memory limit"
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.745199154Z" level=warning msg="Unable to find cpu cgroup in mounts"
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.745229266Z" level=warning msg="Unable to find blkio cgroup in mounts"
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.745259210Z" level=warning msg="Unable to find cpuset cgroup in mounts"
Apr 19 20:43:47 sonic dockerd[547]: time="2021-04-19T20:43:46.745341484Z" level=warning msg="mountpoint for pids not found"
Apr 19 20:43:47 sonic dockerd[547]: Error starting daemon: Devices cgroup isn't mounted
Apr 19 20:43:44 sonic systemd[1]: Starting Docker Application Container Engine...

Steps to reproduce the issue:

  1. Download master
  2. Boot
  3. It fails, at least on Arista 7050 QX32

Describe the results you received:

Docker fails to boot.

Describe the results you expected:

Image booted with default configuration.

Output of show version:

N/A, show does not work without Docker.

Output of show techsupport:

N/A, show does not work without Docker.

Additional information you deem important (e.g. issue happens only occasionally):

admin@sonic:~$ sudo generate_dump
Command: show services timedout after 5 minutes.
Command: show reboot-cause timedout after 5 minutes.
Command: show interface counters timedout after 5 minutes.
Command: show queue counters timedout after 5 minutes.
/usr/local/bin/generate_dump: line 216: Traceback: unbound variable
@bluecmd
Copy link
Contributor Author

bluecmd commented Apr 19, 2021

For reference, here is the journalctl -u docker from a working 635 build:

admin@sonic:~$ sudo journalctl -u docker | cat
-- Logs begin at Mon 2021-04-19 20:56:04 UTC, end at Mon 2021-04-19 20:58:56 UTC. --
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.353804341Z" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.353918436Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.354022624Z" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.354042142Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.354092471Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0  <nil>}]" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.354177042Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.355418125Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420608770, CONNECTING" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.358662309Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420608770, READY" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.358920442Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0  <nil>}]" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.358968195Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.359039486Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420608a70, CONNECTING" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.359274582Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420608a70, READY" module=grpc
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.406995394Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.407482831Z" level=warning msg="Your kernel does not support swap memory limit"
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.407604237Z" level=warning msg="Your kernel does not support cgroup rt period"
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.407632046Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.408583592Z" level=info msg="Loading containers: start."
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.717391503Z" level=info msg="Loading containers: done."
Apr 19 20:56:09 sonic systemd[1]: Starting Docker Application Container Engine...
Apr 19 20:56:11 sonic dockerd[551]: time="2021-04-19T20:56:11.993035569Z" level=warning msg="failed to retrieve runc version: unknown output format: runc version 1.0.0-rc93\ncommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec\nspec: 1.0.2-dev\ngo: go1.13.15\nlibseccomp: 2.3.3\n"
Apr 19 20:56:12 sonic dockerd[551]: time="2021-04-19T20:56:12.036854551Z" level=info msg="Docker daemon" commit=0dd43dd graphdriver(s)=overlay2 version=18.09.8
Apr 19 20:56:12 sonic dockerd[551]: time="2021-04-19T20:56:12.037028607Z" level=info msg="Daemon has completed initialization"
Apr 19 20:56:12 sonic dockerd[551]: time="2021-04-19T20:56:12.052211952Z" level=info msg="API listen on /var/run/docker.sock"
Apr 19 20:56:12 sonic systemd[1]: Started Docker Application Container Engine.

@bluecmd bluecmd changed the title Docker startup broken on build #641 (working on #635) Docker startup broken on build #641 (working on #640) Apr 19, 2021
@bluecmd
Copy link
Contributor Author

bluecmd commented Apr 19, 2021

I bisected this to working with Build 640. Looking at the diff between the original 635 and 641 I suspected the systemd upgrade suggesting me to try the 640 image which seems to work.

Output of the 640 working build:

admin@sonic:~$ sudo journalctl -u docker.service | cat
-- Logs begin at Mon 2021-04-19 21:17:34 UTC, end at Mon 2021-04-19 21:19:06 UTC. --
Apr 19 21:17:38 sonic systemd[1]: Starting Docker Application Container Engine...
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.109965445Z" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.111309116Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.111584435Z" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.111609083Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.113104632Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0  <nil>}]" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.113474130Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.113770828Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42074fac0, CONNECTING" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.115966023Z" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0  <nil>}]" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.116283972Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.116529709Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4200291f0, CONNECTING" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.117603989Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4200291f0, READY" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.120021438Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42074fac0, READY" module=grpc
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.165512117Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.166144237Z" level=warning msg="Your kernel does not support swap memory limit"
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.166256756Z" level=warning msg="Your kernel does not support cgroup rt period"
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.166283320Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.167294639Z" level=info msg="Loading containers: start."
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.488288917Z" level=info msg="Loading containers: done."
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.762405763Z" level=warning msg="failed to retrieve runc version: unknown output format: runc version 1.0.0-rc93\ncommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec\nspec: 1.0.2-dev\ngo: go1.13.15\nlibseccomp: 2.3.3\n"
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.807777057Z" level=info msg="Docker daemon" commit=0dd43dd graphdriver(s)=overlay2 version=18.09.8
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.807951885Z" level=info msg="Daemon has completed initialization"
Apr 19 21:17:40 sonic dockerd[541]: time="2021-04-19T21:17:40.823728898Z" level=info msg="API listen on /var/run/docker.sock"
Apr 19 21:17:40 sonic systemd[1]: Started Docker Application Container Engine.
admin@sonic:~$ show version

SONiC Software Version: SONiC.master.640-f8d38aee
Distribution: Debian 10.9
Kernel: 4.19.0-12-2-amd64
Build commit: f8d38aee
Build date: Sat Apr 17 07:59:19 UTC 2021
Built by: johnar@jenkins-worker-2

Platform: x86_64-arista_7050_qx32
HwSKU: Arista-7050-QX32
ASIC: broadcom
ASIC Count: 1
Serial Number: JPE15291848
Uptime: 21:19:21 up 2 min,  1 user,  load average: 3.72, 2.05, 0.79

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-syncd-brcm             latest                846fc514b57f        691MB
docker-syncd-brcm             master.640-f8d38aee   846fc514b57f        691MB
docker-sflow                  latest                f6a2b1f6e712        411MB
docker-sflow                  master.640-f8d38aee   f6a2b1f6e712        411MB
docker-snmp                   latest                415d63dabc0a        440MB
docker-snmp                   master.640-f8d38aee   415d63dabc0a        440MB
docker-teamd                  latest                bb2e71b1963d        410MB
docker-teamd                  master.640-f8d38aee   bb2e71b1963d        410MB
docker-nat                    latest                69e5d0c12b75        413MB
docker-nat                    master.640-f8d38aee   69e5d0c12b75        413MB
docker-router-advertiser      latest                11fecc37f99e        399MB
docker-router-advertiser      master.640-f8d38aee   11fecc37f99e        399MB
docker-platform-monitor       latest                5c0ea3d8a534        608MB
docker-platform-monitor       master.640-f8d38aee   5c0ea3d8a534        608MB
docker-lldp                   latest                864b67cef4c9        439MB
docker-lldp                   master.640-f8d38aee   864b67cef4c9        439MB
docker-dhcp-relay             latest                55de9d43a2d5        406MB
docker-dhcp-relay             master.640-f8d38aee   55de9d43a2d5        406MB
docker-database               latest                a796f97a0a59        399MB
docker-database               master.640-f8d38aee   a796f97a0a59        399MB
docker-orchagent              latest                958df19865b3        429MB
docker-orchagent              master.640-f8d38aee   958df19865b3        429MB
docker-macsec                 latest                8db2e616a365        413MB
docker-macsec                 master.640-f8d38aee   8db2e616a365        413MB
docker-sonic-telemetry        latest                0314ebd2908d        489MB
docker-sonic-telemetry        master.640-f8d38aee   0314ebd2908d        489MB
docker-sonic-mgmt-framework   latest                69be32a7e761        619MB
docker-sonic-mgmt-framework   master.640-f8d38aee   69be32a7e761        619MB
docker-fpm-frr                latest                40f0faaac5e9        427MB
docker-fpm-frr                master.640-f8d38aee   40f0faaac5e9        427MB

@bluecmd
Copy link
Contributor Author

bluecmd commented Apr 19, 2021

I tested building my own master with a revert of 4369361 and that seems to work.

@bluecmd bluecmd changed the title Docker startup broken on build #641 (working on #640) Docker startup broken due to systemd upgrade Apr 19, 2021
@bluecmd bluecmd changed the title Docker startup broken due to systemd upgrade Docker startup broken in master due to systemd upgrade Apr 19, 2021
@lguohan
Copy link
Collaborator

lguohan commented Apr 19, 2021

do you have this one? systemd.unified_cgroup_hierarchy=0 in the /proc/cmdline after the system boots? this one is required.

@lguohan
Copy link
Collaborator

lguohan commented Apr 19, 2021

we missed aboot changes in this commit 63885ca

@bluecmd bluecmd changed the title Docker startup broken in master due to systemd upgrade Docker startup broken in master for aboot due to systemd upgrade Apr 19, 2021
@lguohan
Copy link
Collaborator

lguohan commented Apr 20, 2021

@Staphylo , we need this in the aboot.j2

@Staphylo
Copy link
Collaborator

@lguohan, I'll make a PR right away

Staphylo added a commit to Staphylo/sonic-buildimage that referenced this issue Apr 20, 2021
Recent systemd upgrade from sonic-net#7228 requires an extra cmdline parameter
for dockerd to start properly.
Updating boot0 was missed as part of the systemd upgrade change.
This change fixes sonic-net#7372

Signed-off-by: Samuel Angebault <[email protected]>
@Staphylo
Copy link
Collaborator

Staphylo commented Apr 20, 2021

@bluecmd, you can temporarily work around this issue by running the following command on your switch.
echo systemd.unified_cgroup_hierarchy=0 > /host/kernel-params

The content of the kernel-params file is used as part of the kernel cmdline generation at boot time on non-secureboot images.
Let me know if you run into any issue with this.

yxieca pushed a commit that referenced this issue Apr 20, 2021
Why I did it
Recent systemd upgrade from #7228 requires an extra cmdline parameter for dockerd to start properly.
Updating boot0 was missed as part of the systemd upgrade change.

How I did it
Just added the missing cmdline parameter in files/Aboot/boot0.j2
This change fixes #7372

How to verify it
Boot the image and dockerd should start normally.
lguohan pushed a commit that referenced this issue May 2, 2021
Why I did it
Recent systemd upgrade from #7228 requires an extra cmdline parameter for dockerd to start properly.
Updating boot0 was missed as part of the systemd upgrade change.

How I did it
Just added the missing cmdline parameter in files/Aboot/boot0.j2
This change fixes #7372

How to verify it
Boot the image and dockerd should start normally.
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-buildimage that referenced this issue May 23, 2021
Why I did it
Recent systemd upgrade from sonic-net#7228 requires an extra cmdline parameter for dockerd to start properly.
Updating boot0 was missed as part of the systemd upgrade change.

How I did it
Just added the missing cmdline parameter in files/Aboot/boot0.j2
This change fixes sonic-net#7372

How to verify it
Boot the image and dockerd should start normally.
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this issue Aug 7, 2021
Why I did it
Recent systemd upgrade from sonic-net#7228 requires an extra cmdline parameter for dockerd to start properly.
Updating boot0 was missed as part of the systemd upgrade change.

How I did it
Just added the missing cmdline parameter in files/Aboot/boot0.j2
This change fixes sonic-net#7372

How to verify it
Boot the image and dockerd should start normally.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants