The issue was found on a multi-asic platform.
teamd was started before swss restarted. On its init, swss cleaned up the State DB
entries populated by teamd, which left the system in a bad state
from which it could not recover on its own.
The root cause is the Feature Table handling done in hostcfgd.
The Feature Table handling has no knowledge of the state of swss when it starts teamd (or syncd),
so it can start teamd (or syncd) while swss is about to be stopped (because of crash/error handling) but has not completely stopped and is still active.
Issue flow during init:
• Linux starts the swss/teamd/syncd/hostcfgd services
• syncd crashes
• syncd/swss/teamd begin stopping (the services are still active)
• hostcfgd processes the Feature Table in parallel and starts teamd, which populates State DB
• Linux finally stops and restarts swss, which cleans up the State DB entries populated in the previous step.
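The ordering above can be illustrated with a toy model. This is only a sketch for reasoning about the race, not SONiC code: the class `InitRaceModel`, its methods, and the State DB key are hypothetical, and the flush is modelled as a full clear even though the real log reports a partial STATE flush.

```python
from dataclasses import dataclass, field


@dataclass
class InitRaceModel:
    """Toy model of the init sequence; names here are illustrative only."""
    state_db: dict = field(default_factory=dict)
    swss_active: bool = True
    swss_stop_pending: bool = False
    events: list = field(default_factory=list)

    def syncd_crash(self):
        # The crash schedules a stop of swss, but swss still reports active.
        self.swss_stop_pending = True
        self.events.append("syncd crashed; swss stop pending, still active")

    def hostcfgd_start_teamd(self):
        # hostcfgd acts on the Feature Table only and never consults swss state.
        self.state_db["LAG_TABLE|PortChannel0001"] = {"state": "ok"}  # hypothetical entry
        self.events.append("teamd started; State DB populated")

    def swss_restart(self):
        # On init, swss flushes State DB (simplified here to a full clear).
        self.state_db.clear()
        self.swss_stop_pending = False
        self.events.append("swss restarted; State DB flushed")


def run_issue_flow():
    model = InitRaceModel()
    model.syncd_crash()
    model.hostcfgd_start_teamd()
    populated = dict(model.state_db)  # snapshot before the flush
    model.swss_restart()
    return populated, model


populated_before_restart, final_model = run_issue_flow()
print("before swss restart:", populated_before_restart)
print("after swss restart: ", final_model.state_db)
```

In this model teamd keeps running after the flush, but the entries it wrote are gone and nothing triggers it to write them again, which matches the unrecoverable state described above.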
swss2 is notified that orchagent was killed:
Nov 19 01:41:30.843894 STG02-0101-0102-01T1 INFO swss2#supervisor-proc-exit-listener: Process orchagent exited unxepectedly. Terminating supervisor...
teamd2 is started from hostcfgd (swss2 is still active):
Nov 19 01:41:51.649500 STG02-0101-0102-01T1 INFO systemd[1]: Reloading.
Nov 19 01:41:51.733279 STG02-0101-0102-01T1 INFO hostcfgd: Running cmd: 'sudo systemctl start [email protected]'
Nov 19 01:41:51.744685 STG02-0101-0102-01T1 INFO systemd[1]: Starting TEAMD container...
The swss2 service is finally stopped and restarted:
Nov 19 01:42:14.064349 STG02-0101-0102-01T1 INFO systemd[1]: [email protected]: Service hold-off time over, scheduling restart.
Nov 19 01:42:14.064899 STG02-0101-0102-01T1 INFO systemd[1]: Stopped switch state service.
Nov 19 01:42:14.065865 STG02-0101-0102-01T1 INFO systemd[1]: Starting switch state service...
Nov 19 01:42:14.071078 STG02-0101-0102-01T1 NOTICE root: Starting swss2 service...
Nov 19 01:42:14.075242 STG02-0101-0102-01T1 NOTICE root: Locking /tmp/swss-syncd-lock2 from swss2 service
Nov 19 01:42:14.080174 STG02-0101-0102-01T1 NOTICE root: Locked /tmp/swss-syncd-lock2 (10) from swss2 service
Nov 19 01:42:14.443526 STG02-0101-0102-01T1 NOTICE root: Warm boot flag: swss2 false.
Nov 19 01:42:14.447624 STG02-0101-0102-01T1 NOTICE root: Flushing APP, ASIC, COUNTER, CONFIG, and partial STATE databases ...
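To make the size of the race window concrete, the timestamps in the excerpts above can be compared directly. The snippet below is a small sketch: the helper `parse_ts` is hypothetical, and because syslog lines carry no year, a placeholder year is assumed purely so the timestamps can be parsed.

```python
from datetime import datetime

PLACEHOLDER_YEAR = 2000  # assumption: the log lines do not include a year


def parse_ts(line, year=PLACEHOLDER_YEAR):
    """Parse the leading 'Mon DD HH:MM:SS.ffffff' stamp of a syslog line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")


# Three lines copied from the excerpts above.
orchagent_exit = "Nov 19 01:41:30.843894 STG02-0101-0102-01T1 INFO swss2#supervisor-proc-exit-listener: Process orchagent exited unxepectedly. Terminating supervisor..."
teamd_start = "Nov 19 01:41:51.744685 STG02-0101-0102-01T1 INFO systemd[1]: Starting TEAMD container..."
db_flush = "Nov 19 01:42:14.447624 STG02-0101-0102-01T1 NOTICE root: Flushing APP, ASIC, COUNTER, CONFIG, and partial STATE databases ..."

exit_to_teamd = (parse_ts(teamd_start) - parse_ts(orchagent_exit)).total_seconds()
teamd_to_flush = (parse_ts(db_flush) - parse_ts(teamd_start)).total_seconds()

print(f"orchagent exit -> teamd start: {exit_to_teamd:.3f} s")
print(f"teamd start -> DB flush:       {teamd_to_flush:.3f} s")
```

For these lines, teamd2 was started roughly 20.9 seconds after orchagent exited, and the database flush followed roughly 22.7 seconds after that, so swss2 remained active for well over 40 seconds while its stop was pending.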
This issue is more prominent on multi-asic platforms, as there are more services to act on,
but it can occur on single-asic platforms as well.