[libteam][warm-reboot] fix issue in teamd warm-reboot that teamd starts #8227
with state of tdport from previous warm-reboot.

In lacp_event_watch_port_flush_data we increment nr_of_tdports and add the tdport to lacp->wr.state. If lacp->wr.state already contains this tdport, we do not set the new state for it but append a new item to lacp->wr.state instead. As a result, if we performed a warm-reboot while a PortChannel member was down, and the member came up after the reboot, the next warm-reboot will initialize teamd with the PortChannel member in the down state.

Example dump of the PortChannel0002 file (single member Ethernet24) when this issue is reproduced:

```
admin@sonic:~$ sudo cat /host/warmboot/teamd/PortChannel0002
0 4 Ethernet24 0 Ethernet24 1 Ethernet24 1 Ethernet24 1
```

Fix this issue by searching for the existing tdport in lacp->wr.state and setting the enabled flag in it, or appending a new item if the tdport is not found.

Signed-off-by: Stepan Blyschak <[email protected]>
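The update-or-append logic described in this commit message can be sketched roughly as follows. This is an illustrative model only, not the code from the SONiC libteam patch: `struct saved_port`, `struct saved_state`, `find_port`, `save_port_state`, and the fixed table sizes are all hypothetical names chosen for the example, and the real layout of lacp->wr.state may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PORTS 16
#define NAME_LEN 32

/* Hypothetical stand-in for one saved port entry (not the libteam struct). */
struct saved_port {
    char name[NAME_LEN];
    bool enabled;
};

/* Hypothetical stand-in for the saved warm-reboot state (lacp->wr). */
struct saved_state {
    size_t nr_of_tdports;
    struct saved_port ports[MAX_PORTS];
};

/* Return the saved entry for `name`, or NULL if there is none. */
struct saved_port *find_port(struct saved_state *st, const char *name)
{
    for (size_t i = 0; i < st->nr_of_tdports; i++) {
        if (strcmp(st->ports[i].name, name) == 0)
            return &st->ports[i];
    }
    return NULL;
}

/*
 * Record the state of a port. An existing entry is updated in place;
 * a new entry is appended only when the port is not present yet.
 * Returns 0 on success, -1 if the table is full or the name is too long.
 */
int save_port_state(struct saved_state *st, const char *name, bool enabled)
{
    struct saved_port *p = find_port(st, name);

    if (p != NULL) {
        p->enabled = enabled;   /* update instead of appending a duplicate */
        return 0;
    }
    if (st->nr_of_tdports >= MAX_PORTS || strlen(name) >= NAME_LEN)
        return -1;

    p = &st->ports[st->nr_of_tdports++];
    strcpy(p->name, name);
    p->enabled = enabled;
    return 0;
}
```

With this shape, saving Ethernet24 first as down and later as up leaves a single Ethernet24 entry marked enabled, rather than the repeated entries seen in the dump.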
Is this upstreamed?
@judyjoseph to check.
@lguohan The warm-reboot feature for teamd is not upstreamed.
@lguohan Can we proceed with the approval flow?
Thanks for the fix. Yes, it looks like this was missed earlier, based on the comments Pavel had put in the code. We can take it. In my tests, however, I could not find a functional impact, although I do see that the file (e.g. /host/warmboot/teamd/PortChannel0002) has wrong data: the member interface gets appended on multiple warm-reboots. The LAG interface state comes up correctly once the member interface comes back up. Can you share the exact test sequence where you see the issue?
You can also run the warm-reboot sad test cases in the following order: first the LAG member sad test, then the VLAN member sad test. The VLAN member sad test will fail with a downtime of a few seconds.
LGTM
with state of tdport from previous warm-reboot.
If the LAG was down before the reboot, lacp->wr is not cleared.
In lacp_event_watch_port_flush_data we increment nr_of_tdports and add
the tdport to lacp->wr.state. If lacp->wr.state already contains this
tdport, we do not set the new state for it but append a new item to
lacp->wr.state instead. As a result, if we performed a warm-reboot while
a PortChannel member was down, and the member came up after the reboot,
the next warm-reboot will initialize teamd with the PortChannel member
in the down state.
Example dump of the PortChannel0002 file (single member Ethernet24) when
this issue is reproduced: `0 4 Ethernet24 0 Ethernet24 1 Ethernet24 1 Ethernet24 1`
Fix this issue by calling stop_wr_mode() when the LAG was down. This was probably intended but missed.
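The idea behind this fix can be modelled with a small sketch. The body of stop_wr_mode() is not shown in this PR, so the code below only assumes that it discards the saved warm-reboot state. `struct wr_port`, `struct wr_state`, `model_stop_wr_mode`, and `model_restore` are hypothetical names for illustration, not the libteam implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define WR_MAX_PORTS 16
#define WR_NAME_LEN 32

/* Hypothetical saved entry for one PortChannel member. */
struct wr_port {
    char name[WR_NAME_LEN];
    bool enabled;
};

/* Hypothetical model of the saved warm-reboot state (lacp->wr). */
struct wr_state {
    bool active;                 /* still in warm-reboot mode? */
    size_t nr_of_tdports;
    struct wr_port ports[WR_MAX_PORTS];
};

/* Assumed behaviour of stop_wr_mode(): forget all saved state. */
void model_stop_wr_mode(struct wr_state *wr)
{
    memset(wr, 0, sizeof(*wr));
}

/*
 * Load the entries saved by the previous warm-reboot. If the LAG was
 * down at that time, the saved entries are discarded immediately so
 * that later flushes start from an empty table instead of appending
 * to stale data. Returns the number of entries kept.
 */
size_t model_restore(struct wr_state *wr, bool lag_was_up,
                     const struct wr_port *saved, size_t count)
{
    if (count > WR_MAX_PORTS)
        count = WR_MAX_PORTS;
    if (count > 0)
        memcpy(wr->ports, saved, count * sizeof(*saved));
    wr->nr_of_tdports = count;
    wr->active = true;

    if (!lag_was_up)
        model_stop_wr_mode(wr);  /* the step that was missing */

    return wr->nr_of_tdports;
}
```

In this model, restoring with the LAG previously down leaves zero saved entries, so a later flush cannot pile new items on top of entries from an earlier warm-reboot.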
Signed-off-by: Stepan Blyschak [email protected]
#### Why I did it
To fix an issue seen in warm-reboot-sad test cases.
#### How I did it
I fixed it in the SONiC libteam patch that adds warm-reboot support. Details are in the commit description.
#### How to verify it
Run the warm-reboot-sad test on the t0-56 topology.