Description
After warm-reboot, the common warm-start states are initialized -> replayed -> reconciled.
vxlanmgrd is one such process, but it stays stuck (for more than 10 minutes) waiting for orchagent to reconcile.
WARMBOOT_FINALIZER reports that orchagent is not reconciled, but goes ahead with finalizing warmboot anyway.
Some time later, a new warm-reboot is issued, but RESTARTCHECK times out on all 5 (maximum allowed) retries because orchagent never reconciled after the previous warmboot.
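A minimal sketch of the check WARMBOOT_FINALIZER appears to perform, assuming each component's warm-start state can be read into a simple mapping. The `WarmStartState` enum, the `unreconciled` helper, and the snapshot values are hypothetical and for illustration only; they are not SONiC APIs.

```python
from enum import IntEnum
from typing import Dict, List


class WarmStartState(IntEnum):
    """Warm-start states in the order a component normally passes through them."""
    INITIALIZED = 1
    REPLAYED = 2
    RECONCILED = 3


def unreconciled(states: Dict[str, WarmStartState]) -> List[str]:
    """Return the names of components that have not reached RECONCILED, sorted."""
    return sorted(
        name for name, state in states.items()
        if state != WarmStartState.RECONCILED
    )


# Hypothetical snapshot: orchagent never got past REPLAYED.
snapshot = {
    "orchagent": WarmStartState.REPLAYED,
    "other_component": WarmStartState.RECONCILED,
}
print(unreconciled(snapshot))  # ['orchagent']
```

The problem described here is what happens after such a list is produced: the finalizer logs it and finalizes anyway, so the unreconciled orchagent carries over into the next warm-reboot.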
Steps to reproduce the issue:
Run test_cont_warm_reboot (the error was seen on KVM).
The issue is seen after 40 successful iterations.
Check syslog after the failure: warm-reboot failed because the orchagent (OA) RESTARTCHECK failed.
Feb 11 13:32:53.221127 vlab-01 NOTICE swss#orchagent: :- checkWarmStart: orchagent doing warm start, restore count 40
Feb 11 13:32:56.663604 vlab-01 INFO swss#supervisord 2021-02-11 13:32:56,647 INFO spawned: 'vxlanmgrd' with pid 147
Feb 11 13:32:56.786068 vlab-01 NOTICE swss#vxlanmgrd: :- main: --- Starting vxlanmgrd ---
Feb 11 13:32:56.786622 vlab-01 NOTICE swss#vxlanmgrd: :- checkWarmStart: vxlanmgrd doing warm start, restore count 40
Feb 11 13:32:56.793019 vlab-01 NOTICE swss#vxlanmgrd: :- setWarmStartState: vxlanmgrd warm start state changed to initialized
Feb 11 13:32:57.649618 vlab-01 INFO swss#supervisord 2021-02-11 13:32:57,648 INFO success: vxlanmgrd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Feb 11 13:33:02.033737 vlab-01 NOTICE swss#vxlanmgrd: :- setWarmStartState: vxlanmgrd warm start state changed to replayed
Feb 11 13:33:02.034131 vlab-01 NOTICE swss#vxlanmgrd: :- main: Waiting Until Orchagent is reconciled. Current 40. Waited 0 secs
Feb 11 13:33:03.037357 vlab-01 NOTICE swss#vxlanmgrd: :- main: Waiting Until Orchagent is reconciled. Current 40. Waited 1 secs
Feb 11 13:33:04.042161 vlab-01 NOTICE swss#vxlanmgrd: :- main: Waiting Until Orchagent is reconciled. Current 40. Waited 2 secs
..
Feb 11 13:38:06.515441 vlab-01 NOTICE root: WARMBOOT_FINALIZER : Some components didn't finish reconcile: orchagent ...
Feb 11 13:38:06.523556 vlab-01 NOTICE root: WARMBOOT_FINALIZER : Finalizing warmboot...
Feb 11 13:38:07.124812 vlab-01 INFO systemd[1]: warmboot-finalizer.service: Succeeded.
..
Feb 11 13:41:54.584331 vlab-01 NOTICE admin: Saving counters folder before warmboot...
Feb 11 13:41:58.865214 vlab-01 NOTICE swss#orchagent_restart_check: :- main: Wait time for response from orchagent set to 2000 milliseconds
Feb 11 13:41:58.865214 vlab-01 NOTICE swss#orchagent_restart_check: :- main: Number of retries for the request to orchagent is set to 5
Feb 11 13:41:58.868188 vlab-01 INFO swss#orchagent_restart_check: :- subscribe: subscribed to RESTARTCHECKREPLY
Feb 11 13:42:06.910388 vlab-01 NOTICE swss#orchagent_restart_check: :- main: RESTARTCHECK for timed out
Feb 11 13:42:06.919690 vlab-01 NOTICE swss#orchagent_restart_check: :- main: requested orchagent to do warm restart state check, retry count: 4
Feb 11 13:42:07.266429 vlab-01 NOTICE swss#vxlanmgrd: :- main: Waiting Until Orchagent is reconciled. Current 40. Waited 543 secs
Feb 11 13:42:08.267817 vlab-01 NOTICE swss#vxlanmgrd: :- main: Waiting Until Orchagent is reconciled. Current 40. Waited 544 secs
Feb 11 13:42:08.921712 vlab-01 NOTICE swss#orchagent_restart_check: :- main: RESTARTCHECK for timed out
Feb 11 13:42:08.925296 vlab-01 NOTICE swss#orchagent_restart_check: :- main: requested orchagent to do warm restart state check, retry count: 5
Feb 11 13:42:09.269244 vlab-01 NOTICE swss#vxlanmgrd: :- main: Waiting Until Orchagent is reconciled. Current 40. Waited 545 secs
Feb 11 13:42:10.270319 vlab-01 NOTICE swss#vxlanmgrd: :- main: Waiting Until Orchagent is reconciled. Current 40. Waited 546 secs
Feb 11 13:42:10.924137 vlab-01 NOTICE swss#orchagent_restart_check: :- main: RESTARTCHECK for timed out
Feb 11 13:42:11.137017 vlab-01 NOTICE admin: warm-reboot failure (0) cleanup ...
..
..
Feb 11 13:46:44.864286 vlab-01 NOTICE swss#vxlanmgrd: :- main: Waiting Until Orchagent is reconciled. Current 40. Waited 819 secs
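The RESTARTCHECK timing in the log above (a 2000 ms wait, 5 retries, and a final timeout about 12 seconds after the subscribe at 13:41:58) fits a model of one initial request plus up to 5 retries, each waiting 2000 ms for a reply. The sketch below encodes that model; `send_request` and `wait_for_reply` are hypothetical callbacks, and the real orchagent_restart_check may count attempts differently.

```python
from typing import Callable, Optional


def restart_check(
    send_request: Callable[[], None],
    wait_for_reply: Callable[[int], Optional[str]],
    timeout_ms: int = 2000,
    retries: int = 5,
) -> Optional[str]:
    """Send a restart-check request, retrying on timeout.

    wait_for_reply(timeout_ms) returns the reply, or None if it timed out.
    Makes 1 + retries attempts in total; returns None if every attempt times out.
    """
    for attempt in range(retries + 1):
        if attempt > 0:
            print(f"requested orchagent to do warm restart state check, retry count: {attempt}")
        send_request()
        reply = wait_for_reply(timeout_ms)
        if reply is not None:
            return reply
        print("RESTARTCHECK timed out")
    return None


def worst_case_ms(timeout_ms: int = 2000, retries: int = 5) -> int:
    """Upper bound on time spent waiting when orchagent never replies."""
    return (retries + 1) * timeout_ms
```

Under this model, an orchagent that never answers makes every attempt time out, and the check gives up after `worst_case_ms()` = 12000 ms, consistent with the 13:41:58 to 13:42:10 window in the log.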
Describe the results you expected:
Services should reconcile after warm start, and the orchagent RESTARTCHECK should not fail when a warm-reboot is issued.
Output of show version:
SONiC-OS-HEAD.0-11937d37
Additional information you deem important (e.g. issue happens only occasionally):
Thanks @prsunny. But I think 1647 will not solve the warm-reboot issue captured here. The change will make vxlanmgr stop waiting for orchagent reconciliation. But if orchagent is not reconciled, won't RESTARTCHECK still fail when the next warm-reboot is issued? I think it will still fail with:
Feb 11 13:42:08.925296 vlab-01 NOTICE swss#orchagent_restart_check: :- main: requested orchagent to do warm restart state check, retry count: 5
Feb 11 13:42:09.269244 vlab-01 NOTICE swss#vxlanmgrd: :- main: Waiting Until Orchagent is reconciled. Current 40. Waited 545 secs
Feb 11 13:42:10.270319 vlab-01 NOTICE swss#vxlanmgrd: :- main: Waiting Until Orchagent is reconciled. Current 40. Waited 546 secs
Feb 11 13:42:10.924137 vlab-01 NOTICE swss#orchagent_restart_check: :- main: RESTARTCHECK for timed out
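The vxlanmgrd lines are consistent with a loop that polls orchagent's warm-start state about once per second with no upper bound. The sketch below models that loop, with an optional deadline to show where a bound would sit. It is an illustration only: `get_orchagent_state`, `timeout_secs`, and the injected `sleep` are hypothetical, and this is not the vxlanmgrd source.

```python
import time
from typing import Callable, Optional


def wait_until_reconciled(
    get_orchagent_state: Callable[[], str],
    timeout_secs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll about once per second until orchagent reports 'reconciled'.

    Returns True once reconciled. With timeout_secs=None the loop never gives up,
    matching the behavior in the log; otherwise it returns False after
    timeout_secs polls. 'waited' counts 1-second sleeps, so it only
    approximates elapsed wall-clock time.
    """
    waited = 0
    while get_orchagent_state() != "reconciled":
        if timeout_secs is not None and waited >= timeout_secs:
            return False
        print(f"Waiting Until Orchagent is reconciled. Waited {waited} secs")
        sleep(1)
        waited += 1
    return True
```

As the comment above points out, bounding or removing this wait only unblocks vxlanmgrd; it does not make orchagent itself reconcile, so the next RESTARTCHECK can still time out.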
Describe the results you received:
The error was caught by test_cont_warm_reboot on the KVM test. Artifacts are here https://dev.azure.com/mssonic/build/_build/results?buildId=3698&view=artifacts&pathAsName=false&type=publishedArtifacts
The failure was in the 42nd iteration.
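When going through artifacts from a run like this one, a small script can scan the syslog for this failure signature: it counts RESTARTCHECK timeouts and reports the longest wait logged by vxlanmgrd. This is a sketch that matches the message text in the syslog excerpt of this issue; `scan_syslog` and its result keys are hypothetical names.

```python
import re
from typing import Dict, Iterable

# Message fragments copied from the syslog excerpt in this issue.
TIMEOUT_MARKER = "RESTARTCHECK for timed out"
WAIT_RE = re.compile(r"Waiting Until Orchagent is reconciled\..*Waited (\d+) secs")


def scan_syslog(lines: Iterable[str]) -> Dict[str, int]:
    """Count RESTARTCHECK timeouts and find the longest vxlanmgrd wait (seconds)."""
    timeouts = 0
    max_wait = 0
    for line in lines:
        if TIMEOUT_MARKER in line:
            timeouts += 1
        match = WAIT_RE.search(line)
        if match:
            max_wait = max(max_wait, int(match.group(1)))
    return {"restartcheck_timeouts": timeouts, "max_vxlanmgrd_wait_secs": max_wait}
```

For example, `with open("/var/log/syslog", errors="replace") as f: print(scan_syslog(f))` on an extracted log; a nonzero timeout count together with a wait of several minutes points to this issue.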