[Dual-ToR] The tunnel route of the standby ToR cannot be restored after config reload when the mux ports are admin DOWN #16085

Closed
ayurkiv-nvda opened this issue Aug 9, 2023 · 6 comments · Fixed by #17784
Labels
Dual ToR Platform ♊ Issues found on dual ToR platforms Issue for 202205 Issue for 202211 Triaged this issue has been triaged

ayurkiv-nvda commented Aug 9, 2023

Description

Found while running the community test dualtor_io/test_link_failure.py::test_active_link_admin_down_config_reload_downstream[active-active]

Steps to reproduce the issue:

  1. Configure an Active-Active Dual-ToR setup and make sure that both switches have all mux ports active.
  2. Shut down ALL mux interfaces:
config interface shutdown Ethernet4
config interface shutdown Ethernet8
config interface shutdown Ethernet12
config interface shutdown Ethernet16
config interface shutdown Ethernet20
config interface shutdown Ethernet24
config interface shutdown Ethernet28
config interface shutdown Ethernet32
config interface shutdown Ethernet36
config interface shutdown Ethernet40
config interface shutdown Ethernet44
config interface shutdown Ethernet48
config interface shutdown Ethernet52
config interface shutdown Ethernet56
config interface shutdown Ethernet60
config interface shutdown Ethernet64
config interface shutdown Ethernet68
config interface shutdown Ethernet72
config interface shutdown Ethernet76
config interface shutdown Ethernet80
config interface shutdown Ethernet84
config interface shutdown Ethernet88
config interface shutdown Ethernet92
config interface shutdown Ethernet96
  3. Run traffic from T1 to the server via the standby ToR; the traffic goes through the tunnel.
  4. config save -y
  5. config reload -y
  6. Run traffic from T1 to the server via the standby ToR again: the routes are not created and the traffic is dropped.
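
The 24 shutdown commands in step 2 follow a regular pattern (Ethernet4 through Ethernet96 in steps of 4), so they can be generated rather than typed. The following is a minimal Python sketch under that assumption; the helper name `mux_shutdown_commands` is hypothetical, and the sketch only prints the commands (a dry run) instead of executing them on the switch.

```python
def mux_shutdown_commands(first=4, last=96, step=4):
    """Build one 'config interface shutdown' command per mux port.

    The port range (Ethernet4..Ethernet96, step 4) is taken from the
    reproduction steps above; adjust it for a different port layout.
    """
    return [
        f"config interface shutdown Ethernet{n}"
        for n in range(first, last + 1, step)
    ]


commands = mux_shutdown_commands()
for cmd in commands:
    # Dry run: print the command instead of executing it.
    print(cmd)
```

On a real device the printed lines could be pasted into the shell or passed to whatever command runner the test bed already uses.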

Describe the results you received:

Routes are not created, traffic will be dropped.

After step 2 (shutting down all mux interfaces), we can see that tunnel routes are successfully created for both the mux server IPs and the SoC IPs:

root@r-tigon-20:/home/admin# cat /var/log/syslog | grep tunnel | grep 192.168
Aug  9 11:23:54.458646 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.2/32
Aug  9 11:23:54.462784 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.3/32
Aug  9 11:23:54.914870 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.4/32
Aug  9 11:23:54.918718 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.5/32
Aug  9 11:23:55.347541 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.6/32
Aug  9 11:23:55.351310 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.7/32
Aug  9 11:23:55.814059 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.8/32
Aug  9 11:23:55.818357 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.9/32
Aug  9 11:23:56.269375 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.10/32
Aug  9 11:23:56.273647 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.11/32
Aug  9 11:23:56.720473 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.12/32
Aug  9 11:23:56.724503 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.13/32
Aug  9 11:23:57.125093 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.14/32
Aug  9 11:23:57.129180 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.15/32
Aug  9 11:23:57.568422 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.16/32
Aug  9 11:23:57.572207 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.17/32
Aug  9 11:23:58.004881 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.18/32
Aug  9 11:23:58.008899 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.19/32
Aug  9 11:23:58.443860 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.20/32
Aug  9 11:23:58.448513 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.21/32
Aug  9 11:23:58.886701 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.22/32
Aug  9 11:23:58.890894 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.23/32
Aug  9 11:23:59.335582 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.24/32
Aug  9 11:23:59.339779 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.25/32
Aug  9 11:23:59.810644 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.26/32
Aug  9 11:23:59.814903 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.27/32
Aug  9 11:24:00.247835 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.28/32
Aug  9 11:24:00.251766 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.29/32
Aug  9 11:24:00.676497 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.30/32
Aug  9 11:24:00.680971 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.31/32
Aug  9 11:24:01.126619 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.32/32
Aug  9 11:24:01.130838 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.33/32
Aug  9 11:24:01.550122 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.34/32
Aug  9 11:24:01.554325 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.35/32
Aug  9 11:24:02.017823 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.36/32
Aug  9 11:24:02.022573 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.37/32
Aug  9 11:24:02.484290 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.38/32
Aug  9 11:24:02.489093 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.39/32
Aug  9 11:24:02.970491 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.40/32
Aug  9 11:24:02.974817 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.41/32
Aug  9 11:24:03.415059 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.42/32
Aug  9 11:24:03.418942 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.43/32
Aug  9 11:24:03.870395 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.44/32
Aug  9 11:24:03.874342 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.45/32
Aug  9 11:24:04.346708 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.46/32
Aug  9 11:24:04.350529 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.47/32
Aug  9 11:24:04.815721 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.48/32
Aug  9 11:24:04.820715 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.49/32

But after step 5 (config reload), no routes are created for the mux IPs (192.168.0.2, 192.168.0.4, 192.168.0.6, ... 192.168.0.48):

Aug  9 11:28:17.015511 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.11/32
Aug  9 11:28:17.019899 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.13/32
Aug  9 11:28:17.021329 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.15/32
Aug  9 11:28:17.022447 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.17/32
Aug  9 11:28:17.023535 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.19/32
Aug  9 11:28:17.024576 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.21/32
Aug  9 11:28:17.025628 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.23/32
Aug  9 11:28:17.026665 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.25/32
Aug  9 11:28:17.027745 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.27/32
Aug  9 11:28:17.028906 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.3/32
Aug  9 11:28:17.029990 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.31/32
Aug  9 11:28:17.031294 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.33/32
Aug  9 11:28:17.032432 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.39/32
Aug  9 11:28:17.033535 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.49/32
Aug  9 11:28:17.034632 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.5/32
Aug  9 11:28:17.035707 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.7/32
Aug  9 11:28:17.423890 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.43/32
Aug  9 11:28:18.016580 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.37/32
Aug  9 11:28:20.702798 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.29/32
Aug  9 11:28:20.748575 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.35/32
Aug  9 11:28:20.798468 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.41/32
Aug  9 11:28:20.847867 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.47/32
Aug  9 11:28:27.676586 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.45/32
Aug  9 11:28:27.717420 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.9/32
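
Comparing the two log excerpts by eye is error-prone, so the missing routes can be computed instead. The sketch below is an illustration, not part of the original test: it extracts the IPs from `create_route: Created tunnel route to ...` lines (the message format shown above) and reports which expected IPs have no route. The function names and the idea of passing in an expected list are assumptions.

```python
import re
from ipaddress import ip_address

# Matches the orchagent message format shown in the syslog excerpts above.
ROUTE_RE = re.compile(
    r"create_route: Created tunnel route to (\d+\.\d+\.\d+\.\d+)/32"
)


def created_routes(lines):
    """Return the set of IPs for which a tunnel route creation was logged."""
    found = set()
    for line in lines:
        match = ROUTE_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


def missing_routes(lines, expected):
    """Return the expected IPs with no logged tunnel route, in numeric order."""
    return sorted(set(expected) - created_routes(lines), key=ip_address)


# Two lines copied from the post-reload excerpt above.
sample = [
    "Aug  9 11:28:17.028906 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.3/32",
    "Aug  9 11:28:17.034632 r-tigon-20 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.5/32",
]
expected = [f"192.168.0.{i}" for i in range(2, 6)]
print(missing_routes(sample, expected))
```

When run against a full syslog, the input should first be restricted to lines logged after the config reload; otherwise routes created before the reload would hide the missing ones.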

Describe the results you expected:

All required tunnel routes (192.168.0.2, 192.168.0.4, 192.168.0.6, ... 192.168.0.48) should be created after config reload, even if all mux ports are in the admin down state.

Output of show version:

SONiC Software Version: SONiC.202211.66-1e1974709_Internal


ayurkiv-nvda changed the title from "[Dual-ToR] The tunnel route of the standby tor cannot be restored after config reload when the mux ports are admin down" to "[Dual-ToR] The tunnel route of the standby ToR cannot be restored after config reload when the mux ports are admin DOWN" on Aug 9, 2023

yxieca commented Aug 14, 2023

@lolyu can you help assess if this is an issue you addressed lately?


lolyu commented Aug 22, 2023

Hi @ayurkiv-nvda, this is the same as #11924; the issue occurs because the fixes are not included in the 202211 branch.


ayurkiv-nvda commented Sep 18, 2023

I can confirm that we had a similar issue for Active-Standby on 202205, and it is fixed now.
But for Active-Active, it still reproduces on 202205 (which is really interesting)


lolyu commented Sep 28, 2023

Thanks @ayurkiv-nvda, I could reproduce this on the 202205 branch, on both active-active and active-standby.

If all ports under a VLAN are admin down, the VLAN device is not in the running state, so the route to the VLAN subnet is flushed from the ASIC. As a result, downstream traffic cannot be trapped to the kernel, and no further tunnel routes are created.
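
To check the first link of that chain on a device, one option is to read the VLAN device's operational state from sysfs. The sketch below assumes that the Linux `operstate` attribute under `/sys/class/net/<device>/` is a reasonable proxy for the "running" state mentioned above; the device name `Vlan1000` and the helper name are hypothetical and not taken from this issue.

```python
from pathlib import Path


def vlan_is_running(name, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports the device's operstate as 'up'.

    Any other value (for example 'down' or 'lowerlayerdown'), or a missing
    device, is treated as not running.
    """
    state_file = Path(sysfs_root) / name / "operstate"
    try:
        return state_file.read_text().strip() == "up"
    except OSError:
        # Device does not exist or the attribute is not readable.
        return False


# 'Vlan1000' is an assumed device name; substitute the VLAN under test.
print(vlan_is_running("Vlan1000"))
```

If this returns False while all member ports are admin down, that is consistent with the explanation above, though it does not by itself confirm that the subnet route was removed from the ASIC.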

ayurkiv-nvda commented

Hello @lolyu
Do we have some ETA for when this bug will be fixed?


lolyu commented Dec 14, 2023

When all the member ports are down, the VLAN route is removed from the ASIC, so downstream traffic cannot be trapped to the kernel. The kernel therefore never tries to learn the neighbor -> no FAILED neighbor -> no tunnel route. I will discuss with the team whether this can be improved further.
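
Since the tunnel routes depend on FAILED neighbor entries, listing those entries is a quick way to see where the chain stops. The sketch below parses text in the format printed by `ip neigh show`, where the neighbor state is the last field on each line. The sample output and the device name `Vlan1000` are illustrative assumptions, not data from this issue.

```python
def failed_neighbors(ip_neigh_output):
    """Return the IPs whose neighbor entry is in the FAILED state."""
    failed = []
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        # The first field is the IP; the last field is the neighbor state.
        if fields and fields[-1] == "FAILED":
            failed.append(fields[0])
    return failed


# Illustrative sample only; not captured from the device in this issue.
sample = """\
192.168.0.2 dev Vlan1000 FAILED
192.168.0.3 dev Vlan1000 lladdr 02:00:00:00:00:03 REACHABLE
192.168.0.4 dev Vlan1000 FAILED
"""
print(failed_neighbors(sample))
```

An empty result after the reload, for IPs that should be reachable only through the tunnel, would match the explanation that the kernel never attempts to learn those neighbors.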
