Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vlanmgr] Disable arp_evict_nocarrier for vlan host intf #2469

Merged
merged 2 commits into from
Oct 6, 2022

Conversation

lolyu
Copy link
Contributor

@lolyu lolyu commented Sep 27, 2022

What I did
To fix: sonic-net/sonic-buildimage#11924

This PR depends on: sonic-net/sonic-linux-kernel#293

Signed-off-by: Longxiang Lyu [email protected]

Why I did it
Disable the sysctl parameter arp_evict_nocarrier for the host vlan interfaces.
So if its member ports are all down and host vlan interfaces become NO-CARRIER, the neighbor learned will not be flushed.

How I verified it
Build image together with the kernel change:

  1. verify the sysctl param arp_evict_no_carrier for Vlan1000
# cat /proc/sys/net/ipv4/conf/Vlan1000/arp_evict_nocarrier
0
  1. check the neighbor learned from Vlan1000
# ip neighbor | grep Vlan1000
192.168.0.13 dev Vlan1000 lladdr 82:7b:a1:ac:3d:0c STALE
192.168.0.15 dev Vlan1000 lladdr ea:5e:4b:f2:84:0e STALE
192.168.0.9 dev Vlan1000 lladdr 72:ce:37:c0:39:08 STALE
192.168.0.24 dev Vlan1000 lladdr ea:47:b9:01:4c:17 STALE
192.168.0.4 dev Vlan1000 lladdr 02:e1:1a:90:eb:03 STALE
192.168.0.11 dev Vlan1000 lladdr 56:a0:28:b5:b7:0a STALE
192.168.0.6 dev Vlan1000 lladdr 52:52:cb:52:7b:05 STALE
192.168.0.21 dev Vlan1000 lladdr 8e:a7:11:a3:1c:14 STALE
192.168.0.23 dev Vlan1000 lladdr 8e:83:cc:23:3b:16 STALE
192.168.0.2 dev Vlan1000 lladdr e2:48:09:46:76:01 STALE
192.168.0.17 dev Vlan1000 lladdr 16:0a:b6:0e:4d:10 STALE
192.168.0.12 dev Vlan1000 lladdr be:26:ff:0f:7e:0b STALE
192.168.0.19 dev Vlan1000 lladdr da:1b:c4:ca:10:12 STALE
192.168.0.14 dev Vlan1000 lladdr 66:e0:b9:f1:09:0d STALE
192.168.0.8 dev Vlan1000 lladdr c2:4d:53:03:bc:07 STALE
192.168.0.10 dev Vlan1000 lladdr 42:55:42:18:c7:09 STALE
192.168.0.25 dev Vlan1000 lladdr e6:75:31:cc:2e:18 STALE
192.168.0.5 dev Vlan1000 lladdr be:b2:73:06:4d:04 STALE
192.168.0.20 dev Vlan1000 lladdr 2e:d9:c9:5f:00:13 STALE
192.168.0.7 dev Vlan1000 lladdr ce:87:30:f2:d7:06 STALE
192.168.0.22 dev Vlan1000 lladdr ce:9e:72:38:21:15 STALE
192.168.0.16 dev Vlan1000 lladdr 76:0c:c4:6e:09:0f STALE
192.168.0.3 dev Vlan1000 lladdr 42:97:97:7d:c9:02 STALE
192.168.0.18 dev Vlan1000 lladdr 62:7d:fb:3e:87:11 STALE
  1. shutdown all the member ports of Vlan1000
  2. verify Vlan1000 device becomes NO-CARRIER and neighbors are kept
# ip -j -p link show Vlan1000
[ {
        "ifindex": 59,
        "link": "Bridge",
        "ifname": "Vlan1000",
        "flags": [ "NO-CARRIER","BROADCAST","MULTICAST","ALLMULTI","UP" ],
        "mtu": 9100,
        "qdisc": "noqueue",
        "operstate": "LOWERLAYERDOWN",
        "linkmode": "DEFAULT",
        "group": "default",
        "txqlen": 1000,
        "link_type": "ether",
        "address": "00:aa:bb:cc:dd:ee",
        "broadcast": "ff:ff:ff:ff:ff:ff"
    } ]
# ip neighbor | grep Vlan1000
192.168.0.13 dev Vlan1000  FAILED
192.168.0.15 dev Vlan1000  FAILED
192.168.0.9 dev Vlan1000  FAILED
192.168.0.24 dev Vlan1000  FAILED
192.168.0.4 dev Vlan1000  FAILED
192.168.0.11 dev Vlan1000  FAILED
192.168.0.6 dev Vlan1000  FAILED
192.168.0.21 dev Vlan1000  FAILED
192.168.0.23 dev Vlan1000  FAILED
192.168.0.2 dev Vlan1000  FAILED
192.168.0.17 dev Vlan1000  FAILED
192.168.0.12 dev Vlan1000  FAILED
192.168.0.19 dev Vlan1000  FAILED
192.168.0.14 dev Vlan1000  FAILED
192.168.0.8 dev Vlan1000  FAILED
192.168.0.10 dev Vlan1000  FAILED
192.168.0.25 dev Vlan1000  FAILED
192.168.0.5 dev Vlan1000  FAILED
192.168.0.20 dev Vlan1000  FAILED
192.168.0.7 dev Vlan1000  FAILED
192.168.0.22 dev Vlan1000  FAILED
192.168.0.16 dev Vlan1000  FAILED
192.168.0.3 dev Vlan1000  FAILED
192.168.0.18 dev Vlan1000  FAILED

Details if related

@lolyu lolyu marked this pull request as ready for review September 29, 2022 02:48
@lolyu lolyu requested a review from prsunny as a code owner September 29, 2022 02:48
zjswhhh
zjswhhh previously approved these changes Sep 29, 2022
prsunny
prsunny previously approved these changes Sep 30, 2022
@lolyu lolyu dismissed stale reviews from prsunny and zjswhhh via 4720301 October 6, 2022 03:52
@lolyu
Copy link
Contributor Author

lolyu commented Oct 6, 2022

hi @prsunny @zjswhhh, resolved a conflict, could you please review again?

@lolyu lolyu requested review from prsunny and zjswhhh October 6, 2022 03:53
@prsunny prsunny merged commit c8d4905 into sonic-net:master Oct 6, 2022
yxieca pushed a commit that referenced this pull request Oct 25, 2022
* [vlanmgr] Disable `arp_evict_nocarrier` for vlan host intf
liat-grozovik pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Nov 10, 2022
Advance sonic-swss submodule to pick up new commits:

dbdf31c [counters] Improve performance by polling only configured ports buffer queue/pg counters sonic-net/sonic-swss#2473
ab4f804 [portsorch] remove port OID from saiOidToAlias map on port deletion sonic-net/sonic-swss#2483
ab29920 [QoS] Support dynamic headroom calculation for Barefoot platforms sonic-net/sonic-swss#2412
15beee4 Add support for voq counters in portsorch. sonic-net/sonic-swss#2467
c8d4905 [vlanmgr] Disable arp_evict_nocarrier for vlan host intf sonic-net/sonic-swss#2469
31c9321 [chassis][voq]Collect counters for fabric links sonic-net/sonic-swss#1944

Signed-off-by: Kebo Liu <[email protected]>
@liuh-80
Copy link
Contributor

liuh-80 commented Nov 11, 2022

@lolyu , currently swss-common repo validation failed on this UT:

2022-11-11T06:20:03.0329331Z test_vlan.py::TestVlan::test_VlanMemberLinkDown FAILED [ 91%]

Here is a test PR which change nothing:
sonic-net/sonic-swss-common#707

https://dev.azure.com/mssonic/build/_build/results?buildId=172876&view=logs&j=3f6395b2-1619-5ebe-f305-2aedcf353cb5&t=f9db2383-9349-50ba-61c3-4bde15cec59e

2022-11-11T07:13:34.8909924Z _______________________ TestVlan.test_VlanMemberLinkDown _______________________
2022-11-11T07:13:34.8910205Z
2022-11-11T07:13:34.8910532Z self = <test_vlan.TestVlan object at 0x7f136768c190>
2022-11-11T07:13:34.8910999Z dvs = <conftest.DockerVirtualSwitch object at 0x7f1367347d60>
2022-11-11T07:13:34.8911234Z
2022-11-11T07:13:34.8911563Z def test_VlanMemberLinkDown(self, dvs):
2022-11-11T07:13:34.8911870Z
2022-11-11T07:13:34.8912219Z # TODO: add_ip_address has a dependency on cdb within dvs,
2022-11-11T07:13:34.8912688Z # so we still need to setup the db. This should be refactored.
2022-11-11T07:13:34.8913099Z dvs.setup_db()
2022-11-11T07:13:34.8913366Z
2022-11-11T07:13:34.8913629Z vlan = "1000"
2022-11-11T07:13:34.8913941Z vlan_ip = "192.168.0.1/21"
2022-11-11T07:13:34.8914289Z interface = "Ethernet0"
2022-11-11T07:13:34.8914640Z vlan_interface = "Vlan%s" % vlan
2022-11-11T07:13:34.8914998Z server_ip = "192.168.0.100"
2022-11-11T07:13:34.8915488Z vlan_intf_sysctl_param_path = "/proc/sys/net/ipv4/conf/%s/arp_evict_nocarrier" % vlan_interface
2022-11-11T07:13:34.8915923Z
2022-11-11T07:13:34.8916237Z self.dvs_vlan.create_vlan(vlan)
2022-11-11T07:13:34.8916652Z vlan_oid = self.dvs_vlan.get_and_verify_vlan_ids(1)[0]
2022-11-11T07:13:34.8917089Z self.dvs_vlan.verify_vlan(vlan_oid, vlan)
2022-11-11T07:13:34.8917549Z self.dvs_vlan.create_vlan_member(vlan, interface)
2022-11-11T07:13:34.8918005Z self.dvs_vlan.verify_vlan_member(vlan_oid, interface)
2022-11-11T07:13:34.8918628Z dvs.set_interface_status(interface, "up")
2022-11-11T07:13:34.8919046Z dvs.add_ip_address(vlan_interface, vlan_ip)
2022-11-11T07:13:34.8919643Z dvs.runcmd("ip neigh replace %s lladdr 11:22:33:44:55:66 dev %s nud stale" % (server_ip, vlan_interface))
2022-11-11T07:13:34.8920219Z
2022-11-11T07:13:34.8920610Z > neigh_oid = self.dvs_vlan.app_db.wait_for_n_keys("NEIGH_TABLE", 1)[0]
2022-11-11T07:13:34.8920874Z
2022-11-11T07:13:34.8921139Z test_vlan.py:513:
2022-11-11T07:13:34.8921539Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2022-11-11T07:13:34.8921709Z
2022-11-11T07:13:34.8922003Z self = <dvslib.dvs_database.DVSDatabase object at 0x7f1367262a00>
2022-11-11T07:13:34.8922749Z table_name = 'NEIGH_TABLE', num_keys = 1, wait_at_least_n_keys = False
2022-11-11T07:13:34.8923227Z polling_config = PollingConfig(polling_interval=0.01, timeout=20.0, strict=True)
2022-11-11T07:13:34.8923630Z failure_message = None
2022-11-11T07:13:34.8923783Z
2022-11-11T07:13:34.8924052Z def wait_for_n_keys(
2022-11-11T07:13:34.8924333Z self,
2022-11-11T07:13:34.8924609Z table_name: str,
2022-11-11T07:13:34.8924900Z num_keys: int,
2022-11-11T07:13:34.8925229Z wait_at_least_n_keys: bool = False,
2022-11-11T07:13:34.8925627Z polling_config: PollingConfig = PollingConfig(),
2022-11-11T07:13:34.8926012Z failure_message: str = None,
2022-11-11T07:13:34.8926509Z ) -> List[str]:
2022-11-11T07:13:34.8926888Z """Wait for the specified number of keys to exist in the table.
2022-11-11T07:13:34.8927222Z
2022-11-11T07:13:34.8927474Z Args:
2022-11-11T07:13:34.8927819Z table_name: The name of the table from which to fetch the keys.
2022-11-11T07:13:34.8928260Z num_keys: The expected number of keys to retrieve from the table.
2022-11-11T07:13:34.8928718Z polling_config: The parameters to use to poll the db.
2022-11-11T07:13:34.8929138Z failure_message: The message to print if the call times out. This will only take effect
2022-11-11T07:13:34.8929569Z if the PollingConfig is set to strict.
2022-11-11T07:13:34.8929841Z
2022-11-11T07:13:34.8930069Z Returns:
2022-11-11T07:13:34.8930435Z The keys stored in the table. If no keys are found, then an empty List is returned.
2022-11-11T07:13:34.8930783Z """
2022-11-11T07:13:34.8930980Z
2022-11-11T07:13:34.8931224Z def access_function():
2022-11-11T07:13:34.8931542Z keys = self.get_keys(table_name)
2022-11-11T07:13:34.8931850Z if wait_at_least_n_keys:
2022-11-11T07:13:34.8932177Z return (len(keys) >= num_keys, keys)
2022-11-11T07:13:34.8932476Z else:
2022-11-11T07:13:34.8932778Z return (len(keys) == num_keys, keys)
2022-11-11T07:13:34.8933067Z
2022-11-11T07:13:34.8933357Z status, result = wait_for_result(
2022-11-11T07:13:34.8933758Z access_function, self._disable_strict_polling(polling_config)
2022-11-11T07:13:34.8934110Z )
2022-11-11T07:13:34.8934344Z
2022-11-11T07:13:34.8934612Z if not status:
2022-11-11T07:13:34.8934891Z message = failure_message or (
2022-11-11T07:13:34.8935281Z f"Unexpected number of keys: expected={num_keys}, received={len(result)} "
2022-11-11T07:13:34.8935956Z f'({result}), table="{table_name}"'
2022-11-11T07:13:34.8936238Z )
2022-11-11T07:13:34.8936521Z > assert not polling_config.strict, message
2022-11-11T07:13:34.8937726Z E AssertionError: Unexpected number of keys: expected=1, received=2 (('Vlan1000:192.168.0.100', 'eth0:172.17.0.1')), table="NEIGH_TABLE"
2022-11-11T07:13:34.8938078Z
2022-11-11T07:13:34.8938378Z dvslib/dvs_database.py:395: AssertionError
2022-11-11T07:13:34.8938794Z =============================== warnings summary ===============================

@liuh-80
Copy link
Contributor

liuh-80 commented Nov 11, 2022

As my understand, some PR need cherry-pick to swss-common 202205 branch to fix the UT failed issue.

@liuh-80
Copy link
Contributor

liuh-80 commented Nov 11, 2022

@lolyu , I think this UT can't run on OS version <=20025 because this kernel patch not apply to them:
#2469

This UT need update to ignore older OS version.

liuh-80 added a commit that referenced this pull request Nov 14, 2022
liuh-80 added a commit to liuh-80/sonic-swss that referenced this pull request Nov 14, 2022
liuh-80 added a commit to liuh-80/sonic-swss that referenced this pull request Nov 14, 2022
lolyu added a commit to lolyu/sonic-swss that referenced this pull request Nov 14, 2022
liuh-80 pushed a commit that referenced this pull request Nov 14, 2022
lolyu added a commit that referenced this pull request Jan 4, 2023
What I did
Cherry-pick back #2469 into 202205.

Why I did it
It is reverted in 202205: #2518 because of the flakiness of test_vlan.py

How I verified it
This depends on the following PRs:
#2504
#2541
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[202205][dualtor][active-standby] The routes to tun0 are removed after shutdown all ports in a vlan
5 participants