[scripts/fast-reboot] Shutdown remaining containers through systemd #2133
The current implementation has two issues:
1. If the containers in the "docker ps" output are ordered so that database is first in the list, "systemctl stop database" followed by "docker kill database" stops all other containers through systemd and defeats this optimization.
2. After "docker kill database" there are many errors from daemons such as hostcfgd, system-healthd, caclmgrd, etc. It also causes those daemons to hang when they receive SIGTERM, which delays the following "systemctl stop database".
In the new implementation, services are implicitly stopped by systemd in the correct order. If a certain container needs an optimization that kills the container instead of stopping it, the container may implement this optimization in its /usr/local/bin/*.sh script. It is also more efficient, since independent services might be stopped in parallel.
NOTE: This fix is relevant for the regular SONiC image and not for the Kubernetes-enabled SONiC image. Kubernetes integration in SONiC might still have this issue, which might be a design flaw.
Signed-off-by: Stepan Blyschak <[email protected]>
scripts/fast-reboot
Outdated
if test -f /usr/local/bin/ctrmgr_tools.py
then
    debug "Stopping all remaining containers ..."
Is this debug flag misplaced, considering the kill/shutdown happens implicitly now through systemd?
IMO this branch is for kubernetes and can be removed. @renukamanavalan
Signed-off-by: Stepan Blyschak <[email protected]>
@renukamanavalan could you please address the question above? If this can be removed, let's go ahead.
The current solution of stopping docker service would work.
…2133)
The current implementation has two issues:
1. In case containers from "docker ps" output are ordered in a way that database is first in the list, the "systemctl stop database" followed by "docker kill database" will stop all other containers through systemd and ruin this optimization
2. After "docker kill database" there are lots of errors from daemons like hostcfgd, system-healthd, caclmgrd, etc. Also it causes those daemons to hang when received SIGTERM making a delay on following "systemctl stop database".
In the new implementation, services are implicitly stopped by systemd in the order that is correct. If a certain container needs an optimization that will kill the container instead of stopping it the container may implement this optimization in its /usr/local/bin/*.sh script.
It is also more optimal since independent services might be stopped in parallel.
- What I did
Stop services using systemd
- How I did it
Stop services using systemd
- How to verify it
Run warm-reboot.
Signed-off-by: Stepan Blyschak <[email protected]>
How I did it Advance swss submodule head to include: c3fb52b 2022-02-04 | Fix for missing lossless PG profile on certain ports (sonic-swss-common update for Vnet tables sonic-net#2133) (HEAD -> 202012, github/202012) [Ying Xie] Signed-off-by: Ying Xie [email protected]
```
08495be [scripts/fast-reboot] Shutdown remaining containers through systemd (sonic-net#2133)
fa07373 [scripts/fast-reboot] stop timers in advance (sonic-net#2131)
00ef80e [show][muxcable] Decrease the timeout for show mux status/hwmode (sonic-net#2130)
```
Signed-off-by: Stepan Blyschak <[email protected]>
…ystemd (sonic-net#2133)" This reverts commit 23e9398.
…ystemd (#2133)" (#2161)
This reverts commit 23e9398.
- What I did
Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)".
The reverted PR is part of a story that refactors the warm/fast shutdown sequence to stop services gracefully instead of killing them without regard to ordering and dependency requirements, which creates several issues and is error-prone for the future. That PR must be merged together with sonic-net/sonic-buildimage#10510. However, #10510 is blocked by an issue in swss-common (sonic-net/sonic-swss-common#603), and a fix by MSFT is in review (sonic-net/sonic-swss-common#606). I am reverting it because its dependency is still blocked and we cannot update the submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed.
…ystemd (sonic-net#2133)" This reverts commit a5f55aa.
…ystemd (#2133)" (#2166)
- What I did
This reverts commit a5f55aa.
Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)".
The reverted PR is part of a story that refactors the warm/fast shutdown sequence to stop services gracefully instead of killing them without regard to ordering and dependency requirements, which creates several issues and is error-prone for the future. That PR must be merged together with sonic-net/sonic-buildimage#10510. However, #10510 is blocked by an issue in swss-common (sonic-net/sonic-swss-common#603), and a fix by MSFT is in review (sonic-net/sonic-swss-common#606). I am reverting it because its dependency is still blocked and we cannot update the submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed.
- How I did it
git revert a5f55aa
- How to verify it
Run tests
Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (sonic-net/sonic-utilities#2133)" (sonic-net/sonic-utilities#2166)
…hrough systemd (sonic-net#2133)" (sonic-net#2161)" This reverts commit 288c2d8.
…hrough systemd (#2133)" (#2161)" (#2184) Reverts #2161 Revert a revert. This must be merged together with sonic-net/sonic-buildimage#10510
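The current implementation has two issues:
1. If the containers in the "docker ps" output are ordered so that database is first in the list, "systemctl stop database" followed by "docker kill database" stops all other containers through systemd and defeats this optimization.
2. After "docker kill database" there are many errors from daemons such as hostcfgd, system-healthd, caclmgrd, etc. It also causes those daemons to hang when they receive SIGTERM, which delays the following "systemctl stop database".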
In the new implementation, services are implicitly stopped by systemd in the order that is correct. If a certain container needs an optimization that will kill the container instead of stopping it the container may implement this optimization in its /usr/local/bin/*.sh script.
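To illustrate the per-container optimization mentioned above, here is a minimal sketch of a stop helper that kills a container instead of stopping it gracefully. It is not taken from any SONiC script: the function name `stop_container`, the `FAST_SHUTDOWN` variable and the `DOCKER` override are all hypothetical, and how a real /usr/local/bin/*.sh script detects a fast or warm reboot is not shown in this PR. Only `docker kill` and `docker stop` are real commands.

```shell
# Hypothetical sketch, not the actual SONiC service script.
# DOCKER defaults to "echo docker" so the commands are only printed (dry run);
# set DOCKER=docker to act on real containers.
DOCKER=${DOCKER:-echo docker}

stop_container() {
    # $1 = container name
    # FAST_SHUTDOWN is an assumed flag standing in for whatever check the
    # container's own stop script uses to detect a fast/warm reboot.
    if [ "${FAST_SHUTDOWN:-0}" = "1" ]; then
        # Optimization: kill immediately, skipping graceful shutdown.
        $DOCKER kill "$1"
    else
        # Default: graceful stop.
        $DOCKER stop "$1"
    fi
}
```

With the default dry-run setting, `FAST_SHUTDOWN=1; stop_container swss` only prints the `docker kill` command it would run. Keeping the decision inside the service's own stop script means systemd still controls ordering, while each container chooses how it goes down.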
It is also more optimal since independent services might be stopped in parallel.
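The first issue (stopping database takes the other containers down with it) can be pictured with a toy model. The sketch below does not talk to systemd; the `DEPS` table and the `stop_cascade` function are hypothetical and the dependency pairs are illustrative only, not read from real unit files. It assumes that stopping a service first stops every service that requires it.

```shell
# Toy model only: each line is "<service> <service it requires>".
# The pairs are made up for illustration.
DEPS='swss database
syncd swss
bgp database'

# Print the services that go down when "$1" is stopped:
# dependents first (recursively), then the service itself.
stop_cascade() {
    for dependent in $(printf '%s\n' "$DEPS" | awk -v t="$1" '$2 == t { print $1 }'); do
        stop_cascade "$dependent"
    done
    echo "$1"
}
```

In this model `stop_cascade database` lists every service in the table, while `stop_cascade syncd` lists only syncd. That is why the result of the old loop depended on where database appeared in the "docker ps" output.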
Signed-off-by: Stepan Blyschak [email protected]
What I did
Stop services using systemd
How I did it
Stop services using systemd
How to verify it
Run warm-reboot.
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)
Required to cherry-pick: 202012, 202111.
DEPENDS ON: sonic-net/sonic-buildimage#10510 sonic-net/sonic-buildimage#10511