Observed Issue:
In a multi-ASIC environment, we have seen that (especially during bootup) if orchagent (OA) exits with an error (for example, no response from syncd because an ASIC is missing), the swss service does not transition from the wait state to the stop state even though the docker container itself has exited. This causes the swss service to remain active and the syncd and teamd containers to keep running.
Expected behaviour:
Ideally, the docker-wait-any script running in the wait() call of the swss script should detect the docker container going down, and systemd should move the service from the wait phase to the stop phase. The root cause looks like a timing issue: if the container exits too early, docker-wait-any can sleep continuously here https://github.com/Azure/sonic-buildimage/pull/5628/files#diff-1172d3e4fc4e138bee494d1a7ae1eb1cd432f335bd646cca2e70e878a19465a7R49 because the container is not in the running state.
The check of the Docker running state and the accompanying sleep were added as part of PR #5628.
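To illustrate the suspected race, here is a minimal sketch of a wait loop that bounds the time spent waiting for the container to reach the running state. It is not the actual docker-wait-any code: `wait_for_exit`, `get_status`, `startup_grace`, and the return values are hypothetical names introduced for illustration only. The point is that an unbounded "sleep until running" loop never returns when the container has already exited, while a bounded one does.

```python
import time
from typing import Callable


def wait_for_exit(get_status: Callable[[], str],
                  startup_grace: float = 5.0,
                  poll_interval: float = 0.1,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Block until the container stops, without hanging if it never runs.

    get_status is a hypothetical callable returning the container state
    as a string such as "created", "running", or "exited".
    """
    waited = 0.0
    # Phase 1: wait for "running", but only for a bounded grace period.
    # An unbounded version of this loop is where the hang would occur
    # if the container exited before the first status check.
    while get_status() != "running":
        if waited >= startup_grace:
            return "never-running"
        sleep(poll_interval)
        waited += poll_interval

    # Phase 2: the container was seen running; poll until it stops.
    while get_status() == "running":
        sleep(poll_interval)
    return "exited"
```

In a real script, `get_status` could wrap something like `docker inspect --format '{{.State.Status}}' swss5`; either return value would then let the caller fall through to the stop phase instead of sleeping forever.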
Logs:
1. The swss5 container is not running (syncd5 and teamd5 are running), but the systemd service swss@5 is still up:
abdosi@xxxx:~$ docker ps | grep swss
3ee9c4017ee0 docker-orchagent:latest "/usr/bin/docker-ini…" 36 hours ago Up 36 hours swss1
6493d6d7f5c8 docker-orchagent:latest "/usr/bin/docker-ini…" 36 hours ago Up 36 hours swss4
0ed7b7986c03 docker-orchagent:latest "/usr/bin/docker-ini…" 36 hours ago Up 36 hours swss0
fb2f95bde191 docker-orchagent:latest "/usr/bin/docker-ini…" 36 hours ago Up 36 hours swss2
1cfec5ffc96e docker-orchagent:latest "/usr/bin/docker-ini…" 36 hours ago Up 36 hours swss3
abdosi@xxxx:~$ sudo systemctl status swss@5
● swss@5.service - switch state service
Loaded: loaded (/usr/lib/systemd/system/swss@.service; disabled; vendor preset: enabled)
Active: active (running) since Sat 2022-02-26 06:25:34 UTC; 1 day 9h ago
Process: 13836 ExecStop=/usr/local/bin/swss.sh stop 5 (code=exited, status=0/SUCCESS)
Process: 14482 ExecStartPre=/usr/local/bin/swss.sh start 5 (code=exited, status=0/SUCCESS)
Main PID: 14756 (swss.sh)
Tasks: 5 (limit: 4915)
CGroup: /system.slice/system-swss.slice/swss@5.service
├─14756 /bin/bash /usr/local/bin/swss.sh wait 5
└─16457 python /usr/bin/docker-wait-any -s swss5 -d syncd5 teamd5
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable
3. Even though the wait step of the swss@5 service is running docker-wait-any, it does not detect that the swss5 container is not running; docker wait on swss5 returns immediately:
abdosi@xxxx:~$ docker wait swss5
0