-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't put down LAG interface when it starts in WR mode #2257
Conversation
|
||
- /* initialize carrier control */ | ||
- err = team_carrier_set(ctx->th, false); | ||
+ err = team_carrier_set(ctx->th, ctx->warm_start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of warm start, should we keep the carrier state as is? I,E, don't set the carrier to kernel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we start teamd after system restart, the carrier is down. We need it to be up (if the carrier was up before reboot).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking more about teamd warm-restart. If the carrier was down before warm-restart of teamd docker, should we unconditionally set carrier to true after warm restart?
} | ||
- | ||
- lacp->carrier_up = false; | ||
+ lacp->carrier_up = ctx->warm_start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably could read the carrier state back and save it to "lacp->carrier_up"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we don't do system reboot your approach will work. When we do the system reboot the carrier state would be 'down' on reading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
If we want to support docker level warm restart, sounds like we may need save the carrier state before warm restart or system warm reboot, and restore them afterwards?
return lacp_set_carrier(lacp, true); | ||
+ } | ||
} | ||
+ if (lacp->ctx->warm_start && !lacp->ctx->warm_start_started) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we need this two lines here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to put the LAG interface up only once, when teamd started. After teamd calculate the interface is allowed to be up in legit way, we disable our shortcut here. So the shortcut mode should work only once, on the start of teamd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your explanation. I probably missed something here, Isn't "lacp_carrier_init" to set the carrier state after teamd started and this function (lacp_update_carrier) is only hit if we do port update etc later? "lacp_carrier_init" should handle the different reboot modes, but I am not sure if we need do something special in this function for warm reboot/restart case.
Address PR comments |
return lacp_set_carrier(lacp, true); | ||
+ } | ||
} | ||
+ if (lacp->ctx->warm_start && !lacp->ctx->warm_start_started) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change, even if teamd was started with warmboot mode, we should still set the carrier according to the port enabled state (e,g, number of enabled port less than the min_ports , we should put it down. etc.) during the run time to keep the state right? Do you think lacp_carrier_init should be able to restore to whatever state before warmboot and we don't need do anything special here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we start teamd in WR mode we don't have any information about current desired state of the LAG interface.
If the box wasn't restarted, we could read the current state of teamd, and consider that no interfaces wasn't put down during transient period. Then teamd will read saved LACP PDUs, and restore previous state of itself. If some state changed since stop of previous instance of teamd, the new instance of teamd will catch it up eventually (up to 90 seconds in the worst case)
If the box was restarted, we must create LAG interface and change it state to the saved state ASAP. This patch doesn't have this behavior. Current patch doesn't change state the LAG interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because, if the previous state was down, we don't put the LAG interface up, only because we're in WR mode.
But if the previous state was up, we put the interface up as soon as we get into lacp_update_carrier function, which is fast. And after the interface is up after we have enough links in the group we put teamd behavior to the normal mode.
What is bad in my patch. If the previous state of LAG was down, and after start we have LAG in down state, it'd be up as soon as we add one interface to a group, so no min_links check. I'm thinking currently how I can fix that.
Introduced the WR carrier logic. Updated PR overview |
Includes below commits ``` 0d5e68f5a [GCU] Ignore bgpraw table in GCU operation (#2628) 22757b1f3 Add interface link-training command into the CLI doc (#2257) f4f857e10 [GCU] Ignore bgpraw in GCU applier (#2623) b5ac60036 [muxcable][config] Add support to enable/disable ceasing to be an advertisement interface when `radv` service is stopped (#2622) 981f9531e [chassis][voq] Add "show fabric reachability" command. (#2528) fba87f43f Revert (#2599) d6d7ab37f [warm-reboot] Use kexec_file_load instead of kexec_load when available (#2608) db4683d40 fix show techsupport error (#2597) 3d8e9c62d [GCU] Prohibit removal of PFC_WD POLL_INTERVAL field (#2545) 163e766cc [techsupport] include APPL_STATE_DB dump (#2607) 8703773eb YANG Validation for ConfigDB Updates: RADIUS_SERVER (#2604) c2d746d4f Remove TODO comment which is no longer relevant (#2600) f09da9983 [show] Add bgpraw to show run all (#2537) 39ac5641b Extend fast-reboot STATE_DB entry timer (#2577) ```
- What I did
I changed teamd logic carrier logic for LAG interface in WR mode:
- How I did it
When teamd starts it reads a carrier state for LAG interface. If the state is down, we don't need any special logic here, so teamd will work with the carrier in normal mode. if the state is up, we enable wr_carrier mode. In this mode we will prevent the LAG interface going down. We exits from this mode when either teamd logic decides it's time to raise the interface up, or we have a timer expired.
Current timer value is 3 second. We introduce the timer to be sure we will exit from the mode eventually.
- How to verify it
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)