Dual ToR initialization requires the muxcable to be active on one ToR and standby on the peer ToR when the mux is healthy.
When orchagent malfunctions, as observed in the lab due to a hardware vendor SAI bug, it fails to create or tear down tunnels and leaves the overall mux state as unknown. During initialization, linkmgrd reads the overall mux state, and when it finds unknown (no tunnel created), it probes xcvrd for the current mux state. Xcvrd reports either active or standby, so linkmgrd switches the mux state to match xcvrd. However, because of the SAI bug, orchagent fails to switch the mux, the overall state remains unknown, and the loop continues.
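The loop described above can be illustrated with a small simulation. This is only a sketch of the reported behavior, not linkmgrd or orchagent code: `FakeOrchagent`, `init_mux`, and the `sai_healthy` flag are hypothetical names introduced for illustration, and the `max_rounds` cap exists only so the simulation terminates.

```python
from dataclasses import dataclass


@dataclass
class FakeOrchagent:
    """Hypothetical stand-in for orchagent; sai_healthy=False models the SAI bug."""
    sai_healthy: bool
    state_db_mux_state: str = "unknown"

    def set_mux_state(self, target: str) -> None:
        # With the SAI bug, the tunnel operation fails and the state stays unknown.
        if self.sai_healthy:
            self.state_db_mux_state = target


def init_mux(orch: FakeOrchagent, xcvrd_state: str, max_rounds: int = 5) -> int:
    """Return the number of probe/set rounds used, or -1 if the state never left unknown."""
    for round_no in range(1, max_rounds + 1):
        if orch.state_db_mux_state != "unknown":
            return round_no - 1
        # State is unknown: take the state xcvrd reports and ask orchagent to match it.
        orch.set_mux_state(xcvrd_state)
    return -1 if orch.state_db_mux_state == "unknown" else max_rounds


if __name__ == "__main__":
    print(init_mux(FakeOrchagent(sai_healthy=True), "standby"))
    print(init_mux(FakeOrchagent(sai_healthy=False), "standby"))
```

With a healthy SAI the sketch converges after one round. With the bug the state never leaves unknown, which corresponds to the repeating probe/set sequence in the sample logs.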
A side problem was also noticed. Once the BRCM SAI is fixed and orchagent succeeds in switching the mux state, communication between xcvrd and orchagent continues for ~30 min. It is not clear whether backlogged requests were being serviced or what else caused this communication.
Other Observations:
Restarting the swss service did not resolve the issue, since pmon was not restarted.
Backlogged requests should not occur, as linkmgrd will not start a new sequence before the current one completes.
Steps to reproduce the issue:
Load a Gemini image from after 1/26 on ToRs with a complete set of muxcables
Reboot the ToR or run a config reload
Notice the initialization loop in the syslog and in swss.rec
Fix the swss ipinip issue, and notice the communication between xcvrd and orchagent
Describe the results you received:
Init loop
Describe the results you expected:
There should be no init loop
Additional information you deem important (e.g. issue happens only occasionally):
Sample logs:
/var/log/syslog.54.gz:Feb 5 23:50:13.770607 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:66 setMuxState: Ethernet52: setting mux to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.770681 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:167 handleSetMuxState: Ethernet52: setting mux state to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.776129 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:134 addOrUpdateMuxPortMuxState: Ethernet76: state db mux state: unknown
/var/log/syslog.54.gz:Feb 5 23:50:13.776210 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:84 probeMuxState: Ethernet76
/var/log/syslog.54.gz:Feb 5 23:50:13.780099 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:180 processProbeMuxState: Ethernet60: app db mux state: standby
/var/log/syslog.54.gz:Feb 5 23:50:13.780193 BN9-0101-0301-01LT0 INFO linkmgrd: link_manager/LinkManagerStateMachine.cpp:408 handleProbeMuxStateNotification: Ethernet60: Initializing MUX state 'Standby' to match xcvrd state
/var/log/syslog.54.gz:Feb 5 23:50:13.780254 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:66 setMuxState: Ethernet60: setting mux to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.780335 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:167 handleSetMuxState: Ethernet60: setting mux state to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.785580 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:134 addOrUpdateMuxPortMuxState: Ethernet84: state db mux state: unknown
/var/log/syslog.54.gz:Feb 5 23:50:13.785639 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:84 probeMuxState: Ethernet84
/var/log/syslog.54.gz:Feb 5 23:50:13.789881 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:180 processProbeMuxState: Ethernet68: app db mux state: standby
/var/log/syslog.54.gz:Feb 5 23:50:13.789966 BN9-0101-0301-01LT0 INFO linkmgrd: link_manager/LinkManagerStateMachine.cpp:408 handleProbeMuxStateNotification: Ethernet68: Initializing MUX state 'Standby' to match xcvrd state
/var/log/syslog.54.gz:Feb 5 23:50:13.790022 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:66 setMuxState: Ethernet68: setting mux to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.790093 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:167 handleSetMuxState: Ethernet68: setting mux state to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.796877 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:134 addOrUpdateMuxPortMuxState: Ethernet0: state db mux state: unknown
/var/log/syslog.54.gz:Feb 5 23:50:13.797048 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:84 probeMuxState: Ethernet0
/var/log/syslog.54.gz:Feb 5 23:50:13.800972 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:180 processProbeMuxState: Ethernet76: app db mux state: standby
/var/log/syslog.54.gz:Feb 5 23:50:13.801041 BN9-0101-0301-01LT0 INFO linkmgrd: link_manager/LinkManagerStateMachine.cpp:408 handleProbeMuxStateNotification: Ethernet76: Initializing MUX state 'Standby' to match xcvrd state
/var/log/syslog.54.gz:Feb 5 23:50:13.801089 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:66 setMuxState: Ethernet76: setting mux to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.801163 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:167 handleSetMuxState: Ethernet76: setting mux state to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.806510 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:134 addOrUpdateMuxPortMuxState: Ethernet104: state db mux state: unknown
/var/log/syslog.54.gz:Feb 5 23:50:13.806594 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:84 probeMuxState: Ethernet104
/var/log/syslog.54.gz:Feb 5 23:50:13.810597 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:180 processProbeMuxState: Ethernet84: app db mux state: standby
/var/log/syslog.54.gz:Feb 5 23:50:13.810685 BN9-0101-0301-01LT0 INFO linkmgrd: link_manager/LinkManagerStateMachine.cpp:408 handleProbeMuxStateNotification: Ethernet84: Initializing MUX state 'Standby' to match xcvrd state
/var/log/syslog.54.gz:Feb 5 23:50:13.810741 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:66 setMuxState: Ethernet84: setting mux to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.810808 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:167 handleSetMuxState: Ethernet84: setting mux state to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.816795 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:134 addOrUpdateMuxPortMuxState: Ethernet4: state db mux state: unknown
/var/log/syslog.54.gz:Feb 5 23:50:13.816901 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:84 probeMuxState: Ethernet4
/var/log/syslog.54.gz:Feb 5 23:50:13.820122 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:180 processProbeMuxState: Ethernet0: app db mux state: standby
/var/log/syslog.54.gz:Feb 5 23:50:13.820210 BN9-0101-0301-01LT0 INFO linkmgrd: link_manager/LinkManagerStateMachine.cpp:408 handleProbeMuxStateNotification: Ethernet0: Initializing MUX state 'Standby' to match xcvrd state
/var/log/syslog.54.gz:Feb 5 23:50:13.820266 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:66 setMuxState: Ethernet0: setting mux to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.820331 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:167 handleSetMuxState: Ethernet0: setting mux state to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.825988 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:134 addOrUpdateMuxPortMuxState: Ethernet52: state db mux state: unknown
/var/log/syslog.54.gz:Feb 5 23:50:13.826083 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:84 probeMuxState: Ethernet52
/var/log/syslog.54.gz:Feb 5 23:50:13.830941 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:180 processProbeMuxState: Ethernet104: app db mux state: standby
/var/log/syslog.54.gz:Feb 5 23:50:13.831026 BN9-0101-0301-01LT0 INFO linkmgrd: link_manager/LinkManagerStateMachine.cpp:408 handleProbeMuxStateNotification: Ethernet104: Initializing MUX state 'Standby' to match xcvrd state
/var/log/syslog.54.gz:Feb 5 23:50:13.831082 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:66 setMuxState: Ethernet104: setting mux to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.831156 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:167 handleSetMuxState: Ethernet104: setting mux state to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.836662 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:134 addOrUpdateMuxPortMuxState: Ethernet60: state db mux state: unknown
/var/log/syslog.54.gz:Feb 5 23:50:13.836743 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:84 probeMuxState: Ethernet60
/var/log/syslog.54.gz:Feb 5 23:50:13.840634 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:180 processProbeMuxState: Ethernet4: app db mux state: standby
/var/log/syslog.54.gz:Feb 5 23:50:13.840719 BN9-0101-0301-01LT0 INFO linkmgrd: link_manager/LinkManagerStateMachine.cpp:408 handleProbeMuxStateNotification: Ethernet4: Initializing MUX state 'Standby' to match xcvrd state
/var/log/syslog.54.gz:Feb 5 23:50:13.840775 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:66 setMuxState: Ethernet4: setting mux to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.840840 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:167 handleSetMuxState: Ethernet4: setting mux state to standby
/var/log/syslog.54.gz:Feb 5 23:50:13.846341 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:134 addOrUpdateMuxPortMuxState: Ethernet68: state db mux state: unknown
/var/log/syslog.54.gz:Feb 5 23:50:13.846532 BN9-0101-0301-01LT0 INFO linkmgrd: DbInterface.cpp:84 probeMuxState: Ethernet68
/var/log/syslog.54.gz:Feb 5 23:50:13.850446 BN9-0101-0301-01LT0 INFO linkmgrd: MuxManager.cpp:180 processProbeMuxState: Ethernet52: app db mux state: standby
/var/log/syslog.54.gz:Feb 5 23:50:13.850525 BN9-0101-0301-01LT0 INFO linkmgrd:
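To check a longer capture for the same pattern, a short script can flag ports whose state DB mux state is reported as unknown again after linkmgrd has already issued a set request. This is a heuristic sketch: the regular expressions are derived only from the message formats visible in the sample above, and `ports_reverting_to_unknown` is a hypothetical helper, not part of any SONiC tool.

```python
import re

# Patterns based on the linkmgrd messages shown in the sample logs.
SET_RE = re.compile(r"setMuxState: (Ethernet\d+): setting mux to (\w+)")
UNKNOWN_RE = re.compile(
    r"addOrUpdateMuxPortMuxState: (Ethernet\d+): state db mux state: unknown"
)


def ports_reverting_to_unknown(lines):
    """Count, per port, how often 'unknown' is reported after a set request."""
    pending = set()   # ports with a set request not yet followed by 'unknown'
    reverted = {}     # port -> number of set-then-unknown occurrences
    for line in lines:
        match = SET_RE.search(line)
        if match:
            pending.add(match.group(1))
            continue
        match = UNKNOWN_RE.search(line)
        if match and match.group(1) in pending:
            port = match.group(1)
            reverted[port] = reverted.get(port, 0) + 1
            pending.discard(port)
    return reverted


if __name__ == "__main__":
    import sys
    # Usage: zcat syslog.54.gz | python3 this_script.py
    for port, count in sorted(ports_reverting_to_unknown(sys.stdin).items()):
        print(f"{port}: unknown reported after a set request {count} time(s)")
```

A port that shows up repeatedly in the output is likely stuck in the init loop. A single occurrence may also be ordinary startup ordering, so the counts are a hint rather than proof.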
**Output of `show version`:**
```
(paste your output here)
```
**Attach debug file `sudo generate_dump`:**
```
(paste your output here)
```