-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[docker-wait-any] immediately start to wait #11595
[docker-wait-any] immediately start to wait #11595
Conversation
It could happen that a container has already crashed but docker-wait-any will wait forever till it starts. It should, however, immediately exit to make the serivce restart. Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
@stepanblyschak looks like this will fix: #10134 |
@stepanblyschak original change was done as part of this PR: #5628 can you make sure this change does not break above PR use-case ? |
@abdosi I checked all scenarios are working except multi-asic one |
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/easycla |
It could happen that a container has already crashed but docker-wait-any will wait forever till it starts. It should, however, immediately exit to make the serivce restart. #### Why I did it It is observed in some circumstances that the auto-restart mechanism does not work. Specifically for ```swss.service```, ```orchagent``` had crashed before ```docker-wait-any``` started in ```swss.sh```. This led ```docker-wait-any``` wait forever for ```swss``` to be in ```"Running"``` state and it results in: ``` CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 1abef1ecebff bcbca2b74df6 "/usr/local/bin/supe…" 22 hours ago Up 22 hours what-just-happened 3c924d405cd5 docker-lldp:latest "/usr/bin/docker-lld…" 22 hours ago Up 22 hours lldp eb2b12a98c13 docker-router-advertiser:latest "/usr/bin/docker-ini…" 22 hours ago Up 22 hours radv d6aac4a46974 docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours mgmt-framework d880fd07aab9 docker-platform-monitor:latest "/usr/bin/docker_ini…" 22 hours ago Up 22 hours pmon 75f9e22d4fdd docker-snmp:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours snmp 76d570a4bd1c docker-sonic-telemetry:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours telemetry ee49f50344b3 docker-syncd-mlnx:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours syncd 1f0b0bab3687 docker-teamd:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours teamd 917aeeaf9722 docker-orchagent:latest "/usr/bin/docker-ini…" 22 hours ago Exited (0) 22 hours ago swss 81a4d3e820e8 docker-fpm-frr:latest "/usr/bin/docker_ini…" 22 hours ago Up 22 hours bgp f6eee8be282c docker-database:latest "/usr/local/bin/dock…" 22 hours ago Up 22 hours database ``` The check for ```"Running"``` state is not needed because for cold boot case we do ```start_peer_and_dependent_services``` and for warm boot case the loop will retry to wait for container if this container is doing warm boot: https://github.com/stepanblyschak/sonic-buildimage/blob/d01a91a569c9d545b30e8f81994b02d0c2513971/files/image_config/misc/docker-wait-any#L56 #### How I did it Removed the check for ```"Running"```. #### How to verify it Kill swss before ```docker-wait-any``` is reached and verify auto restart will restart swss serivce.
This reverts commit 8431d3a.
This reverts commit a8b2a53.
It could happen that a container has already crashed but docker-wait-any
will wait forever till it starts. It should, however, immediately exit
to make the serivce restart.
Signed-off-by: Stepan Blyschak [email protected]
Why I did it
It is observed in some circumstances that the auto-restart mechanism does not work. Specifically for
swss.service
,orchagent
had crashed beforedocker-wait-any
started inswss.sh
. This leddocker-wait-any
wait forever forswss
to be in"Running"
state and it results in:The check for
"Running"
state is not needed because for cold boot case we dostart_peer_and_dependent_services
and for warm boot case the loop will retry to wait for container if this container is doing warm boot:https://github.com/stepanblyschak/sonic-buildimage/blob/d01a91a569c9d545b30e8f81994b02d0c2513971/files/image_config/misc/docker-wait-any#L56
How I did it
Removed the check for
"Running"
.How to verify it
Kill swss before
docker-wait-any
is reached and verify auto restart will restart swss serivce.Which release branch to backport (provide reason below if selected)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)