
swss service in active state even though swss docker has exited #10134

Closed
abdosi opened this issue Mar 2, 2022 · 2 comments · Fixed by #11595
Labels: Triaged (this issue has been triaged)

Comments

@abdosi
Contributor

abdosi commented Mar 2, 2022

Observed Issue:
In a multi-ASIC environment, we have seen that it is possible (especially during boot-up) for OA (orchagent) to error out (for example, with no response from syncd when an ASIC is missing) and for the swss service not to transition from the wait state to the stop state, even though the docker itself has exited. This causes the swss service to remain active and the syncd and teamd dockers to keep running.

Expected behaviour:
Ideally, the docker-wait-any script running in the wait() call of the swss script should detect the docker going down, and systemctl should then move the service from the wait phase to the stop phase. The root cause looks like a timing issue: if the docker exits too early, docker-wait-any can sleep continuously here https://github.com/Azure/sonic-buildimage/pull/5628/files#diff-1172d3e4fc4e138bee494d1a7ae1eb1cd432f335bd646cca2e70e878a19465a7R49 because the docker is not in the running state.
Checking the Docker running state and adding the sleep was done as part of PR #5628.
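To make the suspected race concrete, here is a minimal sketch in Python. It is a hypothetical model, not the real docker-wait-any code and not the fix merged in #11595: the function names, the `StateFn` callback and the `max_polls` guard are assumptions made for illustration. The container state lookup is injected so the sketch runs without Docker; on a real box it might be backed by `docker inspect -f '{{.State.Status}}' <name>`. The first waiter only leaves its loop once a container is "running", so a container that has already exited keeps it sleeping forever; the second also treats "exited"/"dead" as terminal.

```python
# Hypothetical model of the docker-wait-any startup race described in this issue.
# It is NOT the real docker-wait-any code; the container-state lookup is injected
# as a callable so the sketch runs without Docker.
import time
from typing import Callable, Optional

# Maps a container name to a Docker status string such as
# "created", "running", "exited" or "dead".
StateFn = Callable[[str], str]

TERMINAL_STATES = ("exited", "dead")


def wait_until_running_buggy(name: str, get_state: StateFn, poll_s: float = 0.0,
                             max_polls: Optional[int] = None) -> int:
    """Sleep until `name` is running; return the number of polls it took.

    If the container has already exited, it never becomes "running", so this
    loops forever. `max_polls` exists only so the demo can terminate.
    """
    polls = 0
    while get_state(name) != "running":
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"{name} never reached 'running'")
        time.sleep(poll_s)
    return polls


def wait_until_running_or_exited(name: str, get_state: StateFn, poll_s: float = 0.0) -> str:
    """Sleep until `name` is running OR has already stopped; return that state.

    Treating an early exit as terminal lets the caller return immediately so
    systemd can move the service from the wait phase to the stop phase.
    """
    while True:
        state = get_state(name)
        if state == "running" or state in TERMINAL_STATES:
            return state
        time.sleep(poll_s)


if __name__ == "__main__":
    # swss5 exited before the waiter started; syncd5 is still up.
    states = {"swss5": "exited", "syncd5": "running"}
    print(wait_until_running_or_exited("swss5", states.__getitem__))  # exited
    try:
        wait_until_running_buggy("swss5", states.__getitem__, max_polls=5)
    except TimeoutError as err:
        print(err)  # swss5 never reached 'running'
```

The design point is only that "already stopped" must be an exit condition of any pre-wait polling loop; otherwise a container that dies before the waiter starts is indistinguishable from one that has not started yet.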

Logs:

1. The swss5 docker is not running (syncd5 and teamd5 are running), but the systemctl service for swss@5 is still up:

```
abdosi@xxxx:~$ docker ps | grep swss
3ee9c4017ee0        docker-orchagent:latest           "/usr/bin/docker-ini…"   36 hours ago        Up 36 hours                             swss1
6493d6d7f5c8        docker-orchagent:latest           "/usr/bin/docker-ini…"   36 hours ago        Up 36 hours                             swss4
0ed7b7986c03        docker-orchagent:latest           "/usr/bin/docker-ini…"   36 hours ago        Up 36 hours                             swss0
fb2f95bde191        docker-orchagent:latest           "/usr/bin/docker-ini…"   36 hours ago        Up 36 hours                             swss2
1cfec5ffc96e        docker-orchagent:latest           "/usr/bin/docker-ini…"   36 hours ago        Up 36 hours                             swss3
```

```
abdosi@xxxx:~$ sudo systemctl status swss@5
● swss@5.service - switch state service
   Loaded: loaded (/usr/lib/systemd/system/swss@.service; disabled; vendor preset: enabled)
   Active: active (running) since Sat 2022-02-26 06:25:34 UTC; 1 day 9h ago
  Process: 13836 ExecStop=/usr/local/bin/swss.sh stop 5 (code=exited, status=0/SUCCESS)
  Process: 14482 ExecStartPre=/usr/local/bin/swss.sh start 5 (code=exited, status=0/SUCCESS)
 Main PID: 14756 (swss.sh)
    Tasks: 5 (limit: 4915)
   CGroup: /system.slice/system-swss.slice/swss@5.service
           ├─14756 /bin/bash /usr/local/bin/swss.sh wait 5
           └─16457 python /usr/bin/docker-wait-any -s swss5 -d syncd5 teamd5

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable
```

2. Even though the swss@5 service's wait phase is running docker-wait-any, it is not able to detect that the swss5 docker is not running:

```
abdosi@xxxx:~$ docker wait swss5
0
```
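The logs above show systemd reporting swss@5 as active while swss5 is missing from `docker ps`. A minimal diagnostic sketch for spotting that mismatch is below; it is not part of SONiC, and `is_mismatch` and `check` are hypothetical names. It relies only on `systemctl is-active <unit>` (prints e.g. "active") and `docker inspect -f '{{.State.Running}}' <name>` (prints "true" or "false", nothing on stdout if the container does not exist).

```python
# Hypothetical helper: flags a systemd unit that is "active" while its
# container is not running, the mismatch shown in the swss@5 logs.
import subprocess


def is_mismatch(unit_state: str, container_running: str) -> bool:
    """unit_state: output of `systemctl is-active <unit>` (e.g. "active").

    container_running: output of `docker inspect -f '{{.State.Running}}' <name>`
    ("true"/"false"; empty if the container does not exist).
    """
    return unit_state.strip() == "active" and container_running.strip() != "true"


def check(unit: str, container: str) -> bool:
    """Query systemctl and docker; return True if they disagree."""
    unit_state = subprocess.run(["systemctl", "is-active", unit],
                                capture_output=True, text=True).stdout
    running = subprocess.run(["docker", "inspect", "-f", "{{.State.Running}}", container],
                             capture_output=True, text=True).stdout
    return is_mismatch(unit_state, running)
```

On an affected multi-ASIC box, `check("swss@5", "swss5")` would be expected to return True while the issue is present.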

@abdosi
Contributor Author

abdosi commented Mar 2, 2022

@judyjoseph created the issue for tracking.

cc @rlhui

@abdosi
Contributor Author

abdosi commented Mar 2, 2022

FYI: @anamehra

@prsunny added the Triaged label on Apr 13, 2022
@abdosi linked a pull request on Aug 3, 2022 that will close this issue