Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SWSS] Static LAG Support #3090

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

kannansel
Copy link

What I did
Static lag support changes in SWSS module
sonic-net/SONiC#1039

Why I did it
PXE application doesnt support LAG during boot. and also some legacy device doesnt support lacp.
To achieve more bandwidth with those legacy devices static lag will be useful

How I verified it

UT:-
        Test cases
1       Create static port channel with static flag     pass    pass
2       verify static has option flag true or false     pass    pass
3       Add static member see the portchannel is up     pass    pass
4       verify teamd is created with round-robin option by default
pass    pass
5       Remove last portchannel member check port channel down  pass
pass
6       Remove portchannel member check port channel still up   pass
pass
7       verify teamdctl config dump     pass    pass
8       verify teamdctl state dump      pass    pass
9       shutdown the portchannel check the kernel state pass    pass
10      no shutdown the portchannel check the kernel state      pass
pass
11      "Check the show output matches the review comment
root@sonic:~# show inter port
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not
available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev       Protocol     Ports
-----  -------------  -----------
-----------------------------------------
   01  PortChannel01  LACP(A)(Up)  Ethernet16(S)
   02  PortChannel02  NONE(A)(Up)  Ethernet48(S) Ethernet64(S)
Ethernet32(S)
root@sonic:~#
12      teamnl is set to roundrobin     pass    pass
13      save and reload and verify portchannel is up    pass    pass
14      "docker restart teamd
teamd stopped
swss stopped
syncd stopped

swss started
syncd started
teamd started"  pass    pass
15      warm-reboot fails even without any port channel config  fail
16      verify teamd settles doesnt hog cpu with 100% cpu usage pass
17      "trying with static port channel config on non supported
branches
port channel will be configured as LACP."               pass

Not Supported Options
1. Min links and
2. fall back are not supported

Details if related

Patch explanation
1. static lag supported with option roundrobin.
2. tlm_teamd -> teamdctl changes to handle json dump for static lag.
3. test cases -> updated

    Why ?
    Static lag support changes in SWSS module
    sonic-net/SONiC#1039

    Patch explanation
    1. static lag supported with option roundrobin.
    2. tlm_teamd -> teamdctl changes to handle json dump for static lag.
    3. test cases -> updated

    UT:-
            Test cases
    1       Create static port channel with static flag     pass    pass
    2       verify static has option flag true or false     pass    pass
    3       Add static member see the portchannel is up     pass    pass
    4       verify teamd is created with round-robin option by default
    pass    pass
    5       Remove last portchannel member check port channel down  pass
    pass
    6       Remove portchannel member check port channel still up   pass
    pass
    7       verify teamdctl config dump     pass    pass
    8       verify teamdctl state dump      pass    pass
    9       shutdown the portchannel check the kernel state pass    pass
    10      no shutdown the portchannel check the kernel state      pass
    pass
    11      "Check the show output matches the review comment
    root@sonic:~# show inter port
    Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not
    available,
           S - selected, D - deselected, * - not synced
      No.  Team Dev       Protocol     Ports
    -----  -------------  -----------
    -----------------------------------------
       01  PortChannel01  LACP(A)(Up)  Ethernet16(S)
       02  PortChannel02  NONE(A)(Up)  Ethernet48(S) Ethernet64(S)
    Ethernet32(S)
    root@sonic:~#
    12      teamnl is set to roundrobin     pass    pass
    13      save and reload and verify portchannel is up    pass    pass
    14      "docker restart teamd
    teamd stopped
    swss stopped
    syncd stopped

    swss started
    syncd started
    teamd started"  pass    pass
    15      warm-reboot fails even without any port channel config  fail
    16      verify teamd settles doesnt hog cpu with 100% cpu usage pass
    17      "trying with static port channel config on non supported
    branches
    port channel will be configured as LACP."               pass

    Not Supported Options
    1. Min links and
    2. fall back are not supported
@saiarcot895
Copy link
Contributor

I think the fallback LAG feature described here does the same thing basically, except that it allows LACP to be established once the peer device is in a state to start sending LACP packets?

There is one bug related to the LAG containing multiple ports that's fixed in sonic-net/sonic-buildimage#10823.

@kannansel
Copy link
Author

I think the fallback LAG feature described here does the same thing basically, except that it allows LACP to be established once the peer device is in a state to start sending LACP packets?

There is one bug related to the LAG containing multiple ports that's fixed in sonic-net/sonic-buildimage#10823.

Hi Sai,

  1. Static LAG doesnt involve any protocol or timer (will be more suitable where we will not rely on any lacp control packets).
  2. Will be useful in customer scenarios (as stop gap solution along with PxE) if we face any customer related issues with LACP control packet receive and forward handling.
  3. LACP fallback feature does involve lacp protocol timer (and also can oscillate between up/down if there is any LACP tx/rx issue in the network).
  4. In master quickly tried only single port comes up right currently... above change is not yet pushed.

Static LAG will be useful and will be immune to any LACP protocol operation (except Link up/down events).

Thanks,
Kannan.S

conf << "'{\"device\":\"" << alias << "\","
<< "\"hwaddr\":\"" << mac_boot.to_string() << "\","
<< "\"runner\":{"
<< "\"name\":\"roundrobin\"";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe based on HLD discussion the default mode should be loadbalance and not round robin. Can you please clarify why do we have roundrobin defined as default?
sonic-net/SONiC#1039 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition if we have roundrobin how is it achieved in the hardware. If I understand roundrobin is to send one packet per link in roundrobin fashion https://man.archlinux.org/man/teamd.conf.5.en . However we don't have such hash algorithm in SAI.
If SAI is going with loadbalance we should have the same defined in teamd as well

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dgsudharsan ,

Apologies for the delay... unwell last week (this week too ... )

No Round Robin is only for Linux traffic (CPU traffic), Right now it is not mapped to saiars.h (which support packet spraying in hardware for ECMP and LAG)
Regarding spec will modify to round robin..

With loadbalance facing issue of 100% teamd always thats the reason for the change to roundrobin.

Thanks,
Kannan.S

/// @param storage a reference to the temporary storage
///

void ValuesStore::extract_values_static(const std::string & lag_name, json_t * root, HashOfRecords & storage)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to have a separate function to extract values for static? This appears to be duplicate of the existing API. Can you clarify the difference?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

teamdctl state information for Static lag and Dynamic LACP are little different.

Dynamic LACP has the following fields along with setup
{
"ports": {
"Ethernet48": {
"ifinfo": {
"dev_addr": "00:e0:ec:cc:a8:de",
"dev_addr_len": 6,
"ifindex": 40,
"ifname": "Ethernet48"
},
"link": {
"duplex": "half",
"speed": 0,
"up": false
},
"link_watches": {
"list": {
"link_watch_0": {
"delay_down": 0,
"delay_up": 0,
"down_count": 5,
"name": "ethtool",
"up": false
}
},
"up": false
},
"runner": {
"actor_lacpdu_info": {
"key": 103,
"port": 49,
"port_priority": 255,
"state": 5,
"system": "00:e0:ec:cc:a8:de",
"system_priority": 65535
},
"aggregator": {
"id": 0,
"selected": false
},
"key": 103,
"partner_lacpdu_info": {
"key": 0,
"port": 0,
"port_priority": 0,
"state": 0,
"system": "00:00:00:00:00:00",
"system_priority": 0
},
"partner_retry_count": 3,
"prio": 255,
"selected": false,
"state": "disabled"
}
}
},
"runner": {
"active": true,
"enable_retry_count_feature": true,
"fallback": true,
"fast_rate": false,
"retry_count": 3,
"select_policy": "lacp_prio",
"sys_prio": 65535
},
"setup": {
"daemonized": true,
"dbus_enabled": false,
"debug_level": 1,
"kernel_team_mode_name": "loadbalance",
"pid": 297,
"pid_file": "/var/run/teamd/PortChannel03.pid",
"runner_name": "lacp",
"zmq_enabled": false
},

"team_device": {
"ifinfo": {
"dev_addr": "00:e0:ec:cc:a8:de",
"dev_addr_len": 6,
"ifindex": 50,
"ifname": "PortChannel03"
}
}
}

Static Lag will not contain the runner option (Dynamic LACP needs runner option to determine peer lacp link protocol information)

{
"ports": {
"Ethernet32": {
"ifinfo": {
"dev_addr": "00:e0:ec:cc:a8:de",
"dev_addr_len": 6,
"ifindex": 39,
"ifname": "Ethernet32"
},
"link": {
"duplex": "half",
"speed": 0,
"up": true
},
"link_watches": {
"list": {
"link_watch_0": {
"delay_down": 0,
"delay_up": 0,
"down_count": 3,
"name": "ethtool",
"up": true
}
},
"up": true
}
},
"Ethernet64": {
"ifinfo": {
"dev_addr": "00:e0:ec:cc:a8:de",
"dev_addr_len": 6,
"ifindex": 37,
"ifname": "Ethernet64"
},
"link": {
"duplex": "half",
"speed": 0,
"up": false
},
"link_watches": {
"list": {
"link_watch_0": {
"delay_down": 0,
"delay_up": 0,
"down_count": 3,
"name": "ethtool",
"up": false
}
},
"up": false
}
}
},
"setup": {
"daemonized": true,
"dbus_enabled": false,
"debug_level": 1,
"kernel_team_mode_name": "roundrobin",
"pid": 49,
"pid_file": "/var/run/teamd/PortChannel02.pid",
"runner_name": "roundrobin",
"zmq_enabled": false
},
"team_device": {
"ifinfo": {
"dev_addr": "00:e0:ec:cc:a8:de",
"dev_addr_len": 6,
"ifindex": 9,
"ifname": "PortChannel02"
}
}
}

Since Static doesnt have those runner information modifying existing LAG Member table done for LACP can create more collaterals, thats the reason for separating it out between Static and LACP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants