[Dual-ToR] Tunnel route creation/removal causes packet duplication during interfaces recovering from DOWN to UP #16161
Comments
@ayurkiv-nvda, can you confirm that the reported 'duplicate' comes from VLAN flooding? If so, in real life this I/O would be flooded to different servers and dropped by all but the actual receiver. @prsunny, I think VLAN flooding should be disabled in the dual-ToR case?
Hi @ayurkiv-nvda, could you please share the logs?
Yes, this is a simultaneous flood to all ports in the VLAN that are currently UP (no FDB entry, but the neighbor exists).
Yes, we have proxy ARP enabled on our Dual-ToR setup.
But as far as I know, proxy ARP packet flooding is related to broadcast flooding.
I could not reproduce this with public 202211 image: https://sonic-build.azurewebsites.net/ui/sonic/pipelines/1/builds/342905/artifacts?branchName=202211
@ayurkiv-nvda, could you please share the link to the 202211 image you are using?
@lolyu I used 202211.66-1e1974709_Internal. I think it is here:
I will also check SONiC.202211.342905-adb43ff1f; maybe it was fixed in an earlier build.
I checked it on SONiC.202211.342905-adb43ff1f, 202211.362946-9ffa92cc6 and 202211.70-e7ce179b7_Internal.
Managed to reproduce it on
Do we have any progress on this bug?
@ayurkiv-nvda, so the issue is that after the port is admin up, when the neighbor is added and the tunnel route is removed, the FDB entry is not there, right?
The tunnel route is removed at
Hello @lolyu
Hi @ayurkiv-nvda, thanks for the clarification. The current dualtor-io test infrastructure cannot tell on which port a packet is received, and the test case fails with an error if this unknown-unicast flooding happens. This is a gap in the test infrastructure, so let's mark it as a to-do item.
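The point made in this thread (flooded copies are dropped by every server except the real receiver, while the test infrastructure cannot tell which port received a packet) can be illustrated with a minimal Python sketch. It assumes non-promiscuous server NICs that filter on destination MAC, and a capture that simply counts every copy seen on any server-facing port; the port names, MAC addresses and the delivered_copies helper are hypothetical and are not taken from the dualtor-io test code.

```python
from typing import Dict, List

# Hypothetical server-facing ports and the MAC of the server behind each one.
PORT_TO_SERVER_MAC: Dict[str, str] = {
    "Ethernet4": "02:00:00:00:00:04",
    "Ethernet8": "02:00:00:00:00:12",
    "Ethernet12": "02:00:00:00:00:24",
}
# Hypothetical destination MAC: the server behind Ethernet8.
DST_MAC = "02:00:00:00:00:12"


def delivered_copies(egress_ports: List[str], dst_mac: str,
                     port_to_server_mac: Dict[str, str]) -> Dict[str, int]:
    """Compare what real servers accept with what an all-port capture sees."""
    # Assumption: a non-promiscuous server NIC drops frames whose
    # destination MAC is not its own, so only the real receiver accepts one.
    accepted = sum(1 for p in egress_ports if port_to_server_mac.get(p) == dst_mac)
    # Assumption: a capture listening on every server-facing port counts each copy.
    captured = len(egress_ports)
    return {"accepted_by_servers": accepted, "seen_by_capture": captured}


if __name__ == "__main__":
    flooded = list(PORT_TO_SERVER_MAC)  # unknown unicast: every UP port
    print(delivered_copies(flooded, DST_MAC, PORT_TO_SERVER_MAC))
    # {'accepted_by_servers': 1, 'seen_by_capture': 3}
    print(delivered_copies(["Ethernet8"], DST_MAC, PORT_TO_SERVER_MAC))
    # {'accepted_by_servers': 1, 'seen_by_capture': 1}
```

Under these assumptions a flood is harmless to the servers themselves (one copy accepted) but shows up as extra copies to anything that observes all ports at once, which is why the test reports it as duplication.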
Hi @ayurkiv-nvda, could you please get this fixed, as it is specific to the mlnx platform?
Description
The problem is related to #16085.
Found while running the community test dualtor_io/test_link_failure.py::test_active_link_admin_down_config_reload_link_up_downstream_standby[active-active].
Steps to reproduce the issue:
config save -y
config reload -y
Describe the results you received:
After step #4 (config reload -y), routes to the tunnel are not created, as described in #16085 (traffic generated in step #5 will not reach the destination). There are no routes for the mux IPs (192.168.0.2, 192.168.0.4, 192.168.0.6, ... 192.168.0.48).
Interface startup (config interface startup Ethernet4, etc.) triggers a tunnel route creation event for all mux interfaces. Shortly after this, when the port is UP, the tunnel route is removed.
The DUT also starts sending ARP requests, or it may receive an ARP reply during a GARP notification (once every 10 s).
There is a gap of about 150 ms between the removal of the tunnel route and the creation of the new FDB entry.
If packets are sent during this gap while the neighbor is known but there is no FDB entry, they are flooded to all VLAN interfaces that are currently UP.
In our case, traffic to a few mux IPs (192.168.0.6, 192.168.0.18, 192.168.0.36) is flooded.
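The race described above can be sketched as a small Python model of the downstream forwarding decision. This is a simplified illustration under stated assumptions, not SONiC, orchagent or SAI code: the VlanState structure, the forward/egress_at helpers, the MAC address and the Ethernet8/Ethernet12 port names are hypothetical, the neighbor is assumed to be already resolved, and the 150 ms value is the approximate gap observed above. A destination with a tunnel route is sent to the tunnel; once the route is removed, a known neighbor without an FDB entry is treated as unknown unicast and flooded to every UP VLAN member until the FDB entry appears.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical values for illustration only.
DST_IP = "192.168.0.18"
DST_MAC = "02:00:00:00:00:12"
UP_VLAN_PORTS = ["Ethernet4", "Ethernet8", "Ethernet12"]
SERVER_PORT = "Ethernet8"  # port on which the server's MAC is eventually learned


@dataclass
class VlanState:
    up_ports: List[str]
    tunnel_routes: Set[str] = field(default_factory=set)     # IPs routed to the peer ToR
    neighbors: Dict[str, str] = field(default_factory=dict)  # IP -> MAC
    fdb: Dict[str, str] = field(default_factory=dict)        # MAC -> port


def forward(state: VlanState, dst_ip: str) -> List[str]:
    """Return the egress port(s) chosen for a downstream packet to dst_ip."""
    if dst_ip in state.tunnel_routes:
        return ["tunnel"]
    mac = state.neighbors.get(dst_ip)
    if mac is None:
        return []  # simplification: unresolved neighbor, nothing is sent
    port = state.fdb.get(mac)
    if port is not None:
        return [port]  # known unicast
    return list(state.up_ports)  # unknown unicast: flood to every UP VLAN member


def egress_at(t_ms: int, route_removed_ms: int = 0, fdb_learned_ms: int = 150) -> List[str]:
    """Egress for a packet sent at t_ms, relative to the tunnel route removal."""
    # Assumption: the neighbor entry is already known for the whole window.
    state = VlanState(up_ports=list(UP_VLAN_PORTS), neighbors={DST_IP: DST_MAC})
    if t_ms < route_removed_ms:
        state.tunnel_routes.add(DST_IP)
    if t_ms >= fdb_learned_ms:
        state.fdb[DST_MAC] = SERVER_PORT
    return forward(state, DST_IP)


if __name__ == "__main__":
    for t in (-10, 0, 100, 149, 150):
        print(t, egress_at(t))
    # -10 ['tunnel']
    # 0 ['Ethernet4', 'Ethernet8', 'Ethernet12']
    # 100 ['Ethernet4', 'Ethernet8', 'Ethernet12']
    # 149 ['Ethernet4', 'Ethernet8', 'Ethernet12']
    # 150 ['Ethernet8']
```

In this model every packet sent between the route removal and the FDB learn is flooded, so the amount of duplication scales with the traffic rate multiplied by the length of that gap.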
Flow:
NOTE:
The behavior is described for 192.168.0.18.
It is identical for 192.168.0.36.
However, there are no create_route/remove_route logs for 192.168.0.6, even though its traffic behavior is identical to that of 192.168.0.18 and 192.168.0.36. The reason for the missing logs is not yet clear to me; I will run a few more tests and update the ticket.
Describe the results you expected:
No duplication.
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):