From db146ca0fe93d4e38483b068facea055bd22638c Mon Sep 17 00:00:00 2001 From: Saikrishna Arcot Date: Mon, 1 Aug 2022 12:13:24 -0700 Subject: [PATCH 1/5] Add initial HLD for LACP PDU timeout change Signed-off-by: Saikrishna Arcot --- ...ing LACP PDU timeout during warm-reboot.md | 96 +++++++++++++++++++ 1 file changed, 96 insertions(+) create mode 100644 doc/lag/Increasing LACP PDU timeout during warm-reboot.md diff --git a/doc/lag/Increasing LACP PDU timeout during warm-reboot.md b/doc/lag/Increasing LACP PDU timeout during warm-reboot.md new file mode 100644 index 0000000000..cd49f1bcf3 --- /dev/null +++ b/doc/lag/Increasing LACP PDU timeout during warm-reboot.md @@ -0,0 +1,96 @@ +# Introduction +## Overview + +During warm-reboot, the control plane can be down for a maximum of 90 seconds. +This is beacuse LACP PDUs are sent every 30 seconds, and the protocol allows for +up to 3 LACP PDUs to be missed before the LAG is considered down and data +traffic is disrupted. + +It would be beneficial if it's possible to temporarily increase the timeout for +LACP PDUs on a LAG on both sides. Specifically, prior to starting warm-reboot, +the timeout could be increased by some amount (beyond the limits of the +protocol), and after warm-reboot, the timeout would be restored to the normal +value. + +## Requirements + +- Switch running a supported SONiC on both sides of the LAG + +## Assumptions + +TODO + +## Limitations + + +Such a change is going against the LACP protocol, and as such, can only be +supported if both sides of the LAG are running SONiC, and they are running a +version of SONiC that understands this. If the peer side is not running a +supported version of SONiC, or it is not running SONiC, then the current +behavior is preseved. + +# Background + +LACP supports two rates for sending PDUs. There is a short rate, where a PDU is +sent every 1 second, and a long rate, where a PDU is sent every 30 seconds. Both +sides know what rate to expect from the other side. 
If 3 LACP PDUs are missed, +then the LAG is considered to be down, and data traffic is stopped. This results +in an effective timeout of 3 seconds when using the short rate and 90 seconds +when using the long rate. + +# Changing Max Retries for Warmboot + +As part of a SONiC device starting the warmboot process, LACP PDUs are sent to +all of the peers, to refresh the timers on the peers. This allows the warmboot +process the full 90 seconds for control plane to come back up and for PDUs to be +sent again after warmboot. However, if the peer device is notified that this +device is going through warmboot, then the number of retries can be increased, +and the timeout can be raised. + +# Protocol + +When warmboot is starting, along with refreshing the LACP PDUs, an additional +Ethernet packet will be sent to the peer specifying the number of retries to +perform. This Ethernet packet will have an ethertype of 0x6300, and will not +have an IPv4 or IPv6 layer on top of it. Instead, there will instead be multiple +TLV fields, similar to LACP. + +The TLV types will be defined as follows: + +| Value | Description | +|-------|---------------------| +| 0x01 | Actor Information | +| 0x02 | Partner Information | +| 0x03 | Retry Count | + +Both Actor Information and Partner Information have the following content: + +| Starting byte | Length | Description | +|---------------|--------|-----------------| +| 0 | 2 | System Priority | +| 2 | 6 | System ID | +| 8 | 2 | Key | +| 10 | 2 | Port Priority | +| 12 | 2 | Port | + +Retry count have the following content: + +| Starting byte | Length | Description | +|---------------|--------|-----------------| +| 0 | 2 | New retry count | + +When the retry count needs to be changed, the sending device must send a packet +with ethertype 0x6300, and the data will contain the Actor Information, Partner +Information, and Retry Count TLVs. 
The receiving device must validate the actor +and partner information, and then update the retry count as specified. No +acknowledgment packet is sent back. + +# CLI + +No new CLI options or config options will be added, as this is not meant to be +configurable. + +# References + +- [libteam](https://github.com/jpirko/libteam) +- [IEEE 802.3ad Standard for LACP](http://www.ieee802.org/3/ad/public/mar99/seaman_1_0399.pdf) From 06d371e94753ae48d810a41ba408e69094ba7d4e Mon Sep 17 00:00:00 2001 From: Saikrishna Arcot Date: Fri, 26 Aug 2022 14:20:01 -0700 Subject: [PATCH 2/5] Update HLD --- ...ing LACP PDU timeout during warm-reboot.md | 48 +++++++++++++------ 1 file changed, 33 insertions(+), 15 deletions(-) diff --git a/doc/lag/Increasing LACP PDU timeout during warm-reboot.md b/doc/lag/Increasing LACP PDU timeout during warm-reboot.md index cd49f1bcf3..a01fea4078 100644 --- a/doc/lag/Increasing LACP PDU timeout during warm-reboot.md +++ b/doc/lag/Increasing LACP PDU timeout during warm-reboot.md @@ -38,22 +38,12 @@ then the LAG is considered to be down, and data traffic is stopped. This results in an effective timeout of 3 seconds when using the short rate and 90 seconds when using the long rate. -# Changing Max Retries for Warmboot - -As part of a SONiC device starting the warmboot process, LACP PDUs are sent to -all of the peers, to refresh the timers on the peers. This allows the warmboot -process the full 90 seconds for control plane to come back up and for PDUs to be -sent again after warmboot. However, if the peer device is notified that this -device is going through warmboot, then the number of retries can be increased, -and the timeout can be raised. - # Protocol -When warmboot is starting, along with refreshing the LACP PDUs, an additional -Ethernet packet will be sent to the peer specifying the number of retries to -perform. This Ethernet packet will have an ethertype of 0x6300, and will not -have an IPv4 or IPv6 layer on top of it. 
Instead, there will instead be multiple -TLV fields, similar to LACP. +To change the number of retries, an Ethernet packet of the fillowing structure +will be sent. This Ethernet packet will have an ethertype of 0x6300, and will +not have an IPv4 or IPv6 layer on top of it. Instead, there will instead be +multiple TLV fields, similar to LACP. The TLV types will be defined as follows: @@ -85,10 +75,38 @@ Information, and Retry Count TLVs. The receiving device must validate the actor and partner information, and then update the retry count as specified. No acknowledgment packet is sent back. +The custom ethertype is so that if the packet is sent to non-SONiC devices +or SONiC devices running an older version, then it can be silently discarded, +without it getting incorrectly handled by the peer device and without it getting +forwarded to a different device. + +# Changing Max Retries for Warmboot + +As part of a SONiC device starting the warmboot process, LACP PDUs are sent to +all of the peers, to refresh the timers on the peers. This allows the warmboot +process the full 90 seconds for control plane to come back up and for PDUs to be +sent again after warmboot. + +Now, in addition to refreshing the PDUs timer, the above-specified Ethernet +packet (with ethertype 0x6300) will be sent to the peer devices, with the new +retry count set to 5. This notifies the peer device that for this device that +for this LAG, it should allow up to 5 retries before bringing down this LAG. If +the peer device is not running SONiC or is not running a version of SONiC, then +the expectation is that the packet will just be dropped at the peer device with +no handling. In either case, teamd will not wait for any acknowledgment packet. + +After warmboot is done, and teamd has started up after warmboot, it will send +the above-specified Ethernet packet again, but this time, the new retry count is +set to 3, thus restoring it to the standard value. 
+ # CLI No new CLI options or config options will be added, as this is not meant to be -configurable. +user-configurable. + +# Pull requests + +None as of right now. # References From 3db118541214e52b48699dd2d5742438712496cb Mon Sep 17 00:00:00 2001 From: Saikrishna Arcot Date: Mon, 29 Aug 2022 23:41:28 -0700 Subject: [PATCH 3/5] Minor changes to HLD Signed-off-by: Saikrishna Arcot --- ...asing LACP PDU timeout during warm-reboot.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/doc/lag/Increasing LACP PDU timeout during warm-reboot.md b/doc/lag/Increasing LACP PDU timeout during warm-reboot.md index a01fea4078..c30e30b1a6 100644 --- a/doc/lag/Increasing LACP PDU timeout during warm-reboot.md +++ b/doc/lag/Increasing LACP PDU timeout during warm-reboot.md @@ -14,21 +14,24 @@ value. ## Requirements -- Switch running a supported SONiC on both sides of the LAG +- Switch running a supported SONiC with patches in libteam for this feature on + both sides of the LAG ## Assumptions -TODO - -## Limitations - - Such a change is going against the LACP protocol, and as such, can only be supported if both sides of the LAG are running SONiC, and they are running a version of SONiC that understands this. If the peer side is not running a supported version of SONiC, or it is not running SONiC, then the current behavior is preseved. +## Limitations + +There's a chance that the packet to change the retry count doesn't get to the +peer device (due to congestion, for example). Because the protocol as defined +below doesn't wait for an acknowledgment packet, there's no reliable way of +knowing whether it was handled by the peer device or not. + # Background LACP supports two rates for sending PDUs. 
There is a short rate, where a PDU is @@ -62,12 +65,14 @@ Both Actor Information and Partner Information have the following content: | 8 | 2 | Key | | 10 | 2 | Port Priority | | 12 | 2 | Port | +| 14 | 2 | Padding | Retry count have the following content: | Starting byte | Length | Description | |---------------|--------|-----------------| | 0 | 2 | New retry count | +| 2 | 2 | Padding | When the retry count needs to be changed, the sending device must send a packet with ethertype 0x6300, and the data will contain the Actor Information, Partner From e150543a0ab200dd98c97796a63ccd139f878765 Mon Sep 17 00:00:00 2001 From: Saikrishna Arcot Date: Thu, 23 Feb 2023 15:36:20 -0800 Subject: [PATCH 4/5] Update HLD with the new design Signed-off-by: Saikrishna Arcot --- ...ing LACP PDU timeout during warm-reboot.md | 198 ++++++++++++------ 1 file changed, 129 insertions(+), 69 deletions(-) diff --git a/doc/lag/Increasing LACP PDU timeout during warm-reboot.md b/doc/lag/Increasing LACP PDU timeout during warm-reboot.md index c30e30b1a6..33fd5a5942 100644 --- a/doc/lag/Increasing LACP PDU timeout during warm-reboot.md +++ b/doc/lag/Increasing LACP PDU timeout during warm-reboot.md @@ -1,5 +1,22 @@ -# Introduction -## Overview +# Increasing LACP PDU timeout during warm-reboot # + +## Table of Contents + +### Revision + +### Scope + +This high-level design document is to add a feature to teamd and define a +custom LACP PDU packet to allow changing the number of maximum retries done +before the LAG session is torn down. + +### Definitions + +* LACP: Link Aggregation Control Protocol +* PDU: Protocol Data Unit +* LAG: Link Aggregation Group + +### Overview During warm-reboot, the control plane can be down for a maximum of 90 seconds. 
This is beacuse LACP PDUs are sent every 30 seconds, and the protocol allows for @@ -12,27 +29,19 @@ the timeout could be increased by some amount (beyond the limits of the protocol), and after warm-reboot, the timeout would be restored to the normal value. -## Requirements +### Requirements - Switch running a supported SONiC with patches in libteam for this feature on both sides of the LAG -## Assumptions - -Such a change is going against the LACP protocol, and as such, can only be -supported if both sides of the LAG are running SONiC, and they are running a -version of SONiC that understands this. If the peer side is not running a -supported version of SONiC, or it is not running SONiC, then the current -behavior is preseved. +### Architecture Design -## Limitations +There's no change to the overall SONiC architecture. There are no new processes +or containers added or removed with this change. -There's a chance that the packet to change the retry count doesn't get to the -peer device (due to congestion, for example). Because the protocol as defined -below doesn't wait for an acknowledgment packet, there's no reliable way of -knowing whether it was handled by the peer device or not. +### High-Level Design -# Background +#### Background LACP supports two rates for sending PDUs. There is a short rate, where a PDU is sent every 1 second, and a long rate, where a PDU is sent every 30 seconds. Both @@ -41,77 +50,128 @@ then the LAG is considered to be down, and data traffic is stopped. This results in an effective timeout of 3 seconds when using the short rate and 90 seconds when using the long rate. -# Protocol +#### Protocol + +To change the number of retries, a new LACP version 0xf1 will be defined. This +version will indicate that two new TLV types, Actor Retry Count (0x80) and +Partner Retry Count (0x81), are defined. 
+ +The packet structure for LACP version 0xf1 will look as follows: + +| Starting byte | Length | Description | Value | +|---------------|--------|----------------------------------|-------| +| 0 | 1 | LACP Version | 0xf1 | +| 1 | 1 | Actor Info TLV Type | 0x01 | +| 2 | 1 | Actor Info TLV Length | 20 | +| 3 | 18 | Actor Info TLV Data | | +| 21 | 1 | Partner Info TLV Type | 0x02 | +| 22 | 1 | Partner Info TLV Length | 20 | +| 23 | 18 | Partner Info TLV Data | | +| 41 | 1 | Collector Info TLV Type | 0x03 | +| 42 | 1 | Collector Info TLV Length | 16 | +| 43 | 14 | Collector Info TLV Data | | +| 57 | 1 | Actor Retry Count TLV Type | 0x80 | +| 58 | 1 | Actor Retry Count TLV Length | 4 | +| 59 | 2 | Actor Retry Count TLV Data | | +| 61 | 1 | Partner Retry Count TLV Type | 0x81 | +| 62 | 1 | Partner Retry Count TLV Length | 4 | +| 63 | 2 | Partner Retry Count TLV Data | | +| 65 | 1 | Terminator TLV Type | 0x00 | +| 66 | 1 | Terminator TLV Length | 0 | +| 67 | 42 | Padding | | + +Compared to the regular LACP PDU packet, the changes are as follows: +* The LACP Version field has been changed from 0x01 to 0xf1. +* Two TLVs (Actor Retry Count, and Partner Retry Count) have been added after + the Collector Info TLV. +* The padding has been reduced from 50 bytes to 42 bytes. + +The Actor Retry Count and Partner Retry Count TLVs have the following content: -To change the number of retries, an Ethernet packet of the fillowing structure -will be sent. This Ethernet packet will have an ethertype of 0x6300, and will -not have an IPv4 or IPv6 layer on top of it. Instead, there will instead be -multiple TLV fields, similar to LACP. +| Starting byte | Length | Description | +|---------------|--------|-----------------| +| 0 | 1 | Retry count | +| 1 | 1 | Padding | -The TLV types will be defined as follows: +If either side wants to use a non-standard retry count (i.e. retry count set to +something besides 3), then they must send a LACP version 0xf1 packet. 
This +packet will include the retry count of both peers. The receiving device must +validate the peer's information and then update the retry count that the peer +wants to use. -| Value | Description | -|-------|---------------------| -| 0x01 | Actor Information | -| 0x02 | Partner Information | -| 0x03 | Retry Count | +This retry count is valid until any of the following occurs: -Both Actor Information and Partner Information have the following content: +* A new retry count is sent +* A duration of 3 minutes times the retry count passes +* The LACP session goes down for whatever reason (because the new retry count + expires, because the link goes down, etc.) +* The peer device sends a version 0x01 LACP PDU (without the retry count TLVs) -| Starting byte | Length | Description | -|---------------|--------|-----------------| -| 0 | 2 | System Priority | -| 2 | 6 | System ID | -| 8 | 2 | Key | -| 10 | 2 | Port Priority | -| 12 | 2 | Port | -| 14 | 2 | Padding | +Except for the first event, after any of these happen, the standard retry count +of 3 applies. -Retry count have the following content: +If both sides want to use the standard retry count of 3 instead, they are +recommended to send a regular LACP version 0x01 packet, so that the current +standard is being followed. -| Starting byte | Length | Description | -|---------------|--------|-----------------| -| 0 | 2 | New retry count | -| 2 | 2 | Padding | +#### Changing Max Retries for Warmboot + +As part of a SONiC device starting the warmboot process, currently, LACP PDUs +are sent to all of the peers, to refresh the timers on the peers. This allows +the warmboot process the full 90 seconds for control plane to come back up and +for PDUs to be sent again after warmboot. + +Now, the retry count on the local device will be changed to 5 retries (instead +of the standard 3 retries). This will cause teamd to send out LACP PDUs with +the above-defined version 0xf1 of the protocol, including the new retry count. 
+This should be done only after verifying through some method that the peer side +understands this feature. Teamd will not wait for an acknowledgment packet. + +After warmboot is done, and teamd has started up after warmboot, teamd will now +be using the default standard retry count of 3. Because of this, it will send a +standard LACP PDU packet (with version 0x01). When the peer teamd client +receives this packet, it will know that this side's retry count should be +changed back to 3. + +### SAI API + +There are no changes needed in the SAI API or by vendors. + +### Configuration and management + +#### CLI -When the retry count needs to be changed, the sending device must send a packet -with ethertype 0x6300, and the data will contain the Actor Information, Partner -Information, and Retry Count TLVs. The receiving device must validate the actor -and partner information, and then update the retry count as specified. No -acknowledgment packet is sent back. +There will be two CLIs added to get and set the retry count. These are: -The custom ethertype is so that if the packet is sent to non-SONiC devices -or SONiC devices running an older version, then it can be silently discarded, -without it getting incorrectly handled by the peer device and without it getting -forwarded to a different device. +* `config portchannel retry-count get ` +* `config portchannel retry-count set ` -# Changing Max Retries for Warmboot +The portchannel name argument must refer to a valid, existing portchannel. +The retry count argument must be between 3 and 10. -As part of a SONiC device starting the warmboot process, LACP PDUs are sent to -all of the peers, to refresh the timers on the peers. This allows the warmboot -process the full 90 seconds for control plane to come back up and for PDUs to be -sent again after warmboot. +Changes made with this CLI are NOT preserved across reboots and are not saved +in any DB. 
-Now, in addition to refreshing the PDUs timer, the above-specified Ethernet -packet (with ethertype 0x6300) will be sent to the peer devices, with the new -retry count set to 5. This notifies the peer device that for this device that -for this LAG, it should allow up to 5 retries before bringing down this LAG. If -the peer device is not running SONiC or is not running a version of SONiC, then -the expectation is that the packet will just be dropped at the peer device with -no handling. In either case, teamd will not wait for any acknowledgment packet. +### Restrictions/Limitations -After warmboot is done, and teamd has started up after warmboot, it will send -the above-specified Ethernet packet again, but this time, the new retry count is -set to 3, thus restoring it to the standard value. +Such a change as described in this HLD is going against the LACP protocol, and +as such, can only be supported if both sides of the LAG are running SONiC, and +they are running a version of SONiC that understands this. If the peer side is +not running a supported version of SONiC, or it is not running SONiC, then +setting a custom retry count may cause the LAG to go down. -# CLI +### Testing Requirements/Design -No new CLI options or config options will be added, as this is not meant to be -user-configurable. +To test this feature, a T0 topology with SONiC neighbors will be used. Test +cases will be added to get and set the retry count via CLI. In addition, a test +case will be added to increase the retry count and do a warm-reboot, and verify +that after warm-reboot, the SONiC neighbors did not bring down the LAG, and +that after the T0 comes up, the retry count has been set to 3. # Pull requests -None as of right now. 
+* [sonic-net/sonic-utilities: Add CLI configuration options for teamd retry count feature](https://github.com/sonic-net/sonic-utilities/pull/2642) +* [sonic-net/sonic-buildimage: teamd: Add support for custom retry counts for LACP sessions](https://github.com/sonic-net/sonic-buildimage/pull/13453) # References From 97958ad1b410cdb533633acf8dcf890fc0d7516d Mon Sep 17 00:00:00 2001 From: Saikrishna Arcot Date: Fri, 21 Jul 2023 12:12:32 -0700 Subject: [PATCH 5/5] Update teamd retry count HLD with additional details and a PR Signed-off-by: Saikrishna Arcot --- ...ing LACP PDU timeout during warm-reboot.md | 46 +++++++++++++++---- 1 file changed, 37 insertions(+), 9 deletions(-) diff --git a/doc/lag/Increasing LACP PDU timeout during warm-reboot.md b/doc/lag/Increasing LACP PDU timeout during warm-reboot.md index 33fd5a5942..b53d079aa5 100644 --- a/doc/lag/Increasing LACP PDU timeout during warm-reboot.md +++ b/doc/lag/Increasing LACP PDU timeout during warm-reboot.md @@ -93,11 +93,13 @@ The Actor Retry Count and Partner Retry Count TLVs have the following content: | 0 | 1 | Retry count | | 1 | 1 | Padding | -If either side wants to use a non-standard retry count (i.e. retry count set to -something besides 3), then they must send a LACP version 0xf1 packet. This -packet will include the retry count of both peers. The receiving device must -validate the peer's information and then update the retry count that the peer -wants to use. +If either side wants to use a non-standard retry count for a member port (i.e. +retry count set to something besides 3), then they must send a LACP version +0xf1 packet. This packet will include the retry count of both peers for that +member port. The receiving device must validate the peer's information and then +update the retry count that the peer wants to use. This retry count will apply +only to that member port, and a separate packet will need to be sent for each +member port. 
This retry count is valid until any of the following occurs: @@ -105,14 +107,24 @@ This retry count is valid until any of the following occurs: * A duration of 3 minutes times the retry count passes * The LACP session goes down for whatever reason (because the new retry count expires, because the link goes down, etc.) -* The peer device sends a version 0x01 LACP PDU (without the retry count TLVs) +* The peer device sends a version 0x01 LACP PDU (only after 60 seconds) Except for the first event, after any of these happen, the standard retry count of 3 applies. +In the case of the last event, where a 0x01 LACP PDU is received, the retry +count is reset to 3 only once 60 seconds have passed since the last 0xf1 LACP +PDU with a non-standard retry count. In other words, if a 0x01 LACP PDU is +received within 60 seconds of a 0xf1 LACP PDU with a non-standard retry count, +the retry count is not reset to 3. This is meant to act as a transition +mechanism during image upgrades. + If both sides want to use the standard retry count of 3 instead, they are -recommended to send a regular LACP version 0x01 packet, so that the current -standard is being followed. +recommended (but not required) to send a regular LACP version 0x01 packet, so +that the current standard is being followed. For SONiC's purposes, if a 0xf1 +LACP PDU is received by a device, then it will also respond with a 0xf1 LACP +PDU. This will act as part of a feature presence test, to determine if the peer +device supports this feature. #### Changing Max Retries for Warmboot @@ -133,9 +145,24 @@ standard LACP PDU packet (with version 0x01). When the peer teamd client receives this packet, it will know that this side's retry count should be changed back to 3. +### Feature Test + +To test if a neighbor device has this feature, the following checks will be +done: + +* Based on the LLDP neighbor table, check to see if the remote device claims to + be a SONiC device. 
Specifically, check to see if the system description + contains SONiC. If desired, a version check could be made here as well. If + there is no LLDP data, or the remote device is not a SONiC device, then + assume that this feature is not supported, and stop here. +* From a Python script, send a version 0xf1 LACP PDU packet, with the retry + count for both sides set to 3. If the neighbor device responds with a valid + 0xf1 LACP PDU packet, then this indicates that the feature is supported. If + not, then this feature is likely not supported. + ### SAI API -There are no changes needed in the SAI API or by vendors. +There are no changes needed in the SAI API or in the implementation by vendors. ### Configuration and management @@ -172,6 +199,7 @@ that after the T0 comes up, the retry count has been set to 3. * [sonic-net/sonic-utilities: Add CLI configuration options for teamd retry count feature](https://github.com/sonic-net/sonic-utilities/pull/2642) * [sonic-net/sonic-buildimage: teamd: Add support for custom retry counts for LACP sessions](https://github.com/sonic-net/sonic-buildimage/pull/13453) +* [sonic-net/sonic-mgmt: Add test cases for teamd retry count feature](https://github.com/sonic-net/sonic-mgmt/pull/8152) # References 
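For reference, the version 0xf1 packet layout from the Protocol section of the HLD can be sanity-checked with a short Python sketch. This is not the teamd implementation, which lives in libteam. The helper names `_tlv` and `build_lacpdu_f1` are hypothetical, and the sketch assumes, as the HLD table does, that byte 0 is the LACP version field, so the Ethernet header and anything else preceding the version byte are out of scope.

```python
import struct

LACP_VERSION_RETRY = 0xF1  # version value defined in the HLD

# Data sizes (in bytes) taken from the packet structure table in the HLD.
INFO_DATA_LEN = 18       # Actor/Partner Info TLV data
COLLECTOR_DATA_LEN = 14  # Collector Info TLV data
PADDING_LEN = 42         # trailing padding


def _tlv(tlv_type: int, data: bytes) -> bytes:
    """Encode one TLV; the length byte covers type + length + data."""
    return struct.pack("!BB", tlv_type, len(data) + 2) + data


def build_lacpdu_f1(actor_info: bytes, partner_info: bytes,
                    collector_info: bytes,
                    actor_retry: int, partner_retry: int) -> bytes:
    """Hypothetical helper: assemble the 0xf1 PDU, starting at the version byte."""
    if len(actor_info) != INFO_DATA_LEN or len(partner_info) != INFO_DATA_LEN:
        raise ValueError("actor/partner info must be 18 bytes")
    if len(collector_info) != COLLECTOR_DATA_LEN:
        raise ValueError("collector info must be 14 bytes")
    for count in (actor_retry, partner_retry):
        if not 0 <= count <= 0xFF:
            raise ValueError("retry count must fit in one byte")

    return b"".join([
        bytes([LACP_VERSION_RETRY]),
        _tlv(0x01, actor_info),
        _tlv(0x02, partner_info),
        _tlv(0x03, collector_info),
        _tlv(0x80, bytes([actor_retry, 0])),    # retry count + 1 byte padding
        _tlv(0x81, bytes([partner_retry, 0])),  # retry count + 1 byte padding
        b"\x00\x00",                            # terminator: type 0x00, length 0
        bytes(PADDING_LEN),
    ])


# Placeholder (all-zero) info fields; a real PDU carries the LAG's actual values.
pdu = build_lacpdu_f1(bytes(18), bytes(18), bytes(14),
                      actor_retry=5, partner_retry=3)
print(len(pdu), hex(pdu[57]), pdu[59])  # prints: 109 0x80 5
```

The total of 109 bytes matches the table (offset 67 plus 42 bytes of padding), and the Actor Retry Count TLV lands at offset 57, right after the Collector Info TLV.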