[Functional] [dhcp6relay] Relay Reply was sent on a different Vlan from where it was initiated #13077

Open
vivekrnv opened this issue Dec 15, 2022 · 6 comments
Labels
Help Wanted 🆘 Triaged this issue has been triaged

Comments

@vivekrnv
Contributor

vivekrnv commented Dec 15, 2022

Description

Happens when the client is a member of multiple Vlans, i.e. it is connected over a trunk port.

Steps to reproduce the issue:

topology : enp4s0f0 (host) <-> Ethernet0 (DUT)

  1. Set up multiple subinterfaces
root@host:~# sudo ip link add link enp4s0f0 name enp4s0f0.690 type vlan id 690
root@host:~# sudo ip link set enp4s0f0.690 up
root@host:~# sudo ip link add link enp4s0f0 name enp4s0f0.691 type vlan id 691
root@host:~# sudo ip link set enp4s0f0.691 up

  2. Set up the DUT
root@dut:/home/admin# config vlan add 690
root@dut:/home/admin# config vlan add 691
root@dut:/home/admin# config vlan member add 690 Ethernet0
root@dut:/home/admin# config vlan member add 691 Ethernet0
root@dut:/home/admin# sudo config interface ip add Vlan690 6900:1::1/64
root@dut:/home/admin# sudo config interface ip add Vlan691 6910::1/64
  3. Set up the DHCP server, add the dhcp_relay config, and restart the dhcp_relay service

  4. dhclient -6 enp66s0f1.690 -v

Describe the results you received:

The reply was received on Vlan691, although the Solicit was sent on Vlan690:

root@host:~# tcpdump -i enp4s0f0 -e -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp66s0f1, link-type EN10MB (Ethernet), capture size 262144 bytes

12:45:36.548662 0c:42:a1:5a:60:01 > 33:33:00:01:00:02, ethertype 802.1Q (0x8100), length 150: vlan 690, p 0, ethertype IPv6, fe80::e42:a1ff:fe5a:6001.546 > ff02::1:2.547: dhcp6 solicit
12:45:36.550990 1c:34:da:a1:96:80 > 0c:42:a1:5a:60:01, ethertype 802.1Q (0x8100), length 150: vlan 691, p 0, ethertype IPv6, fe80::1e34:daff:fea1:9680.547 > fe80::e42:a1ff:fe5a:6001.546: dhcp6 advertise
12:45:36.551048 0c:42:a1:5a:60:01 > 1c:34:da:a1:96:80, ethertype 802.1Q (0x8100), length 198: vlan 691, p 0, ethertype IPv6, fe80::e42:a1ff:fe5a:6001 > fe80::1e34:daff:fea1:9680: ICMP6, destination unreachable, unreachable port, fe80::e42:a1ff:fe5a:6001 udp port 546, length 140

Describe the results you expected:

The reply should be received on the same Vlan the request was sent on.

Triage:

This can be traced back to sonic-net/sonic-dhcp-relay#27. That change assumes a 1<->1 mapping between member interfaces and Vlans, and therein lies the problem.

Potential Fix:

As described in the PR, the dhcp6relay client socket receives three copies of every DHCPv6 packet sent by the client (one on the member iface, one on the bridge, and one on the Vlan). The code only operates on the copy received on the member iface (understandable, since this is required to check whether the iface is the standby leg of a DualTor) and ignores the other two.

The agent maintains a 1<->1 mapping between ifaces and Vlans to infer which Vlan the packet was received on, and this is what causes the problem: an iface that is a member of several Vlans can map to only one of them.
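To make the failure mode concrete, here is a minimal sketch of a one-to-one member-to-Vlan table. It is only an illustration: the function and variable names are hypothetical and are not taken from the dhcp6relay code, and it assumes the table is filled in the order the Vlan members are configured.

```python
# Hypothetical illustration of a one-to-one member -> Vlan table.
# Names here are made up for the sketch; they are not from dhcp6relay.


def build_member_to_vlan(vlan_members):
    """Build a member-interface -> Vlan map from (vlan, member) pairs.

    A dict holds only one Vlan per member, so a later pair silently
    overwrites an earlier one for the same member.
    """
    member_to_vlan = {}
    for vlan, member in vlan_members:
        member_to_vlan[member] = vlan
    return member_to_vlan


# Ethernet0 is a member of both Vlans, as in the reproduction steps.
VLAN_MEMBERS = [("Vlan690", "Ethernet0"), ("Vlan691", "Ethernet0")]
MEMBER_TO_VLAN = build_member_to_vlan(VLAN_MEMBERS)

# A Solicit that arrived with tag 690 on Ethernet0 is attributed to
# whichever Vlan was stored last for Ethernet0.
print(MEMBER_TO_VLAN["Ethernet0"])
```

With both Vlans configured on Ethernet0, the lookup can only ever return one of them, regardless of which tag the client used.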

  1. Check whether the Vlan tag can be inferred from the packet data. (I don't think it's possible since, afaik, the Vlan tag gets stripped.)
  2. This could be a possible solution:
    Drop the copy received on the bridge.
    Hash the packet contents when the if_index belongs to either a Vlan or a member iface. Save the hash <-> (vlan, member) mapping in a buffer and use it to infer the vlan <-> iface mapping. (Only do this for DualTor; in normal cases we don't need the vlan <-> iface mapping, so proceed with relaying the client msg after seeing it on the Vlan netdev.)
    Instead of hashing the entire packet, the possibility of using the xid can be checked.
   root@host:~# dhclient -6 enp4s0f0.691 -v
      
  root@dut: sudo tcpdump -n -i Ethernet0 'udp dst port 547' -v
  23:49:30.968753 IP6 (flowlabel 0x348c2, hlim 1, next-header UDP (17) payload length: 64) fe80::7efe:90ff:fe12:22ec.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=32b4db (client-ID hwaddr/time type 1 time 724423475 7cfe901222ed) (option-request DNS-server DNS-search-list Client-FQDN SNTP-servers) (elapsed-time 0) (IA_NA IAID:2417107692 T1:3600 T2:5400))
  
  root@dut: sudo tcpdump -n -i Vlan691 'udp dst port 547' -v
  23:49:30.968753 IP6 (flowlabel 0x348c2, hlim 1, next-header UDP (17) payload length: 64) fe80::7efe:90ff:fe12:22ec.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=32b4db (client-ID hwaddr/time type 1 time 724423475 7cfe901222ed) (option-request DNS-server DNS-search-list Client-FQDN SNTP-servers) (elapsed-time 0) (IA_NA IAID:2417107692 T1:3600 T2:5400))
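The xid-based variant of option 2 could be sketched roughly as below. This is an assumption-laden illustration, not the dhcp6relay implementation: `CopyCorrelator`, `observe`, and the `"Bridge"` interface name are hypothetical. The only protocol fact relied on is that a DHCPv6 client message starts with a 1-byte msg-type followed by a 3-byte transaction id. The client address is part of the key because the xid is only 24 bits and is chosen by the client.

```python
from collections import defaultdict


def parse_dhcpv6_header(payload: bytes):
    """Return (msg_type, xid) from the first 4 bytes of a client message."""
    if len(payload) < 4:
        raise ValueError("DHCPv6 message too short")
    return payload[0], int.from_bytes(payload[1:4], "big")


class CopyCorrelator:
    """Groups the copies of one client message by (client address, xid)."""

    def __init__(self, vlan_ifaces, bridge_iface="Bridge"):
        self.vlan_ifaces = set(vlan_ifaces)
        self.bridge_iface = bridge_iface
        # key -> {"vlan": iface name, "member": iface name}
        self.seen = defaultdict(dict)

    def observe(self, iface, client_addr, payload):
        """Record one copy; return (vlan, member) once both are known."""
        if iface == self.bridge_iface:
            return None  # drop the copy received on the bridge
        _, xid = parse_dhcpv6_header(payload)
        key = (client_addr, xid)
        role = "vlan" if iface in self.vlan_ifaces else "member"
        self.seen[key][role] = iface
        entry = self.seen[key]
        if "vlan" in entry and "member" in entry:
            del self.seen[key]  # both copies matched, free the slot
            return entry["vlan"], entry["member"]
        return None


# msg-type 1 (Solicit) with xid 0x32b4db, as in the captures above.
SOLICIT = bytes.fromhex("0132b4db")
CLIENT = "fe80::7efe:90ff:fe12:22ec"
correlator = CopyCorrelator(vlan_ifaces=["Vlan690", "Vlan691"])
first = correlator.observe("Ethernet0", CLIENT, SOLICIT)
bridge = correlator.observe("Bridge", CLIENT, SOLICIT)
match = correlator.observe("Vlan691", CLIENT, SOLICIT)
print(match)
```

A real buffer would also need to expire entries whose second copy never arrives; that is left out here.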
@vivekrnv
Contributor Author

@kellyyeh, @jcaiMR, @yxieca PFA

@jcaiMR
Contributor

jcaiMR commented Dec 16, 2022

When a single interface is added to multiple VLAN groups, it is an untagged interface in the first VLAN group and is marked as a tagged interface in the second and third VLAN groups. So the solution may look like this:

  1. Still use the current filter socket logic and only care about the Ethernet interface (ignore bridge and Vlan).
  2. The interface-to-VLAN mapping table only cares about the untagged-interface-to-VLAN map.
  3. For a tagged interface, we get the VLAN info from the VLAN header.
  4. If a packet doesn't have a VLAN header and no VLAN can be found in the interface-to-VLAN mapping table, we drop the packet.
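The four steps above can be sketched as a single lookup function. This is a minimal sketch that assumes the 802.1Q header is still present in the frame bytes handed to the relay, which is exactly the point that still needs testing. `resolve_vlan` and `untagged_vlan_by_iface` are hypothetical names.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID of an 802.1Q tag


def resolve_vlan(frame: bytes, iface: str, untagged_vlan_by_iface: dict):
    """Return the VLAN ID for a received Ethernet frame, or None to drop it."""
    if len(frame) < 14:
        return None  # shorter than an Ethernet header
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETH_P_8021Q:
        if len(frame) < 18:
            return None  # truncated tag
        (tci,) = struct.unpack_from("!H", frame, 14)
        return tci & 0x0FFF  # low 12 bits of the TCI are the VLAN ID
    # No VLAN header: fall back to the untagged-interface map (None = drop).
    return untagged_vlan_by_iface.get(iface)


# MAC addresses taken from the Solicit in the host capture.
MACS = bytes.fromhex("333300010002") + bytes.fromhex("0c42a15a6001")
TAGGED = MACS + struct.pack("!HHH", ETH_P_8021Q, 690, 0x86DD)
UNTAGGED = MACS + struct.pack("!H", 0x86DD)

print(resolve_vlan(TAGGED, "Ethernet0", {}))
print(resolve_vlan(UNTAGGED, "Ethernet0", {"Ethernet0": 690}))
print(resolve_vlan(UNTAGGED, "Ethernet4", {"Ethernet0": 690}))
```

The tagged frame resolves from its header alone, the untagged frame resolves through the map, and an untagged frame on an unmapped interface is dropped.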

@vivekrnv
Contributor Author

I don't understand,

When a single interface is added to multiple VLAN groups, it is an untagged interface in the first VLAN group and is marked as a tagged interface in the second and third VLAN groups. So the solution may look like this:

Can't the iface be tagged in all VLAN groups?

  1. Still use the current filter socket logic and only care about the Ethernet interface (ignore bridge and Vlan).
  2. The interface-to-VLAN mapping table only cares about the untagged-interface-to-VLAN map.
  3. For a tagged interface, we get the VLAN info from the VLAN header.

Afaik, the kernel strips off the VLAN tag. If we can get this info from the packet, then this should solve the problem.

  4. If a packet doesn't have a VLAN header and no VLAN can be found in the interface-to-VLAN mapping table, we drop the packet.

@jcaiMR
Contributor

jcaiMR commented Dec 16, 2022

Yes, it can. What I mentioned here is mainly about the mixed case; the point is that the interface-to-VLAN map is only for untagged interfaces.

As for the kernel stripping off the VLAN tag: when the packet reaches the raw socket on the VLAN member interface, I think the VLAN header has not been removed yet.
But this needs some testing to confirm.
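One thing such a test could also check, assuming the relay reads from a Linux AF_PACKET socket: packet(7) documents a `PACKET_AUXDATA` socket option that delivers a `struct tpacket_auxdata` control message with each packet, including a `tp_vlan_tci` field. If the tag turns out to be absent from the frame bytes, it may still be recoverable from there. The sketch below only parses the bytes of such a control message; enabling the option and calling `recvmsg` are not shown, and `vlan_id_from_auxdata` is a hypothetical helper name.

```python
import struct

# struct tpacket_auxdata from <linux/if_packet.h>, native byte order:
# tp_status, tp_len, tp_snaplen (u32); tp_mac, tp_net, tp_vlan_tci,
# tp_vlan_tpid (u16).
AUXDATA = struct.Struct("=IIIHHHH")
TP_STATUS_VLAN_VALID = 1 << 4


def vlan_id_from_auxdata(cmsg_data: bytes):
    """Return the VLAN ID carried in a PACKET_AUXDATA message, or None."""
    if len(cmsg_data) < AUXDATA.size:
        return None
    status, _len, _snaplen, _mac, _net, vlan_tci, _tpid = AUXDATA.unpack_from(cmsg_data)
    if not status & TP_STATUS_VLAN_VALID:
        return None  # the kernel did not report a tag for this packet
    return vlan_tci & 0x0FFF  # low 12 bits of the TCI are the VLAN ID


# Synthetic control-message payload for a 150-byte frame tagged with VLAN 690.
SAMPLE = AUXDATA.pack(TP_STATUS_VLAN_VALID, 150, 150, 0, 14, 690, 0x8100)
print(vlan_id_from_auxdata(SAMPLE))
```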

@gechiang
Collaborator

gechiang commented Jan 4, 2023

originator agreed on the explanation and can close it now...

@gechiang gechiang closed this as completed Jan 4, 2023
@vivekrnv vivekrnv reopened this Feb 2, 2023
@vivekrnv
Contributor Author

originator agreed on the explanation and can close it now...

It's still an issue, just not a use case that is currently used. I think it's better to keep this open for documentation.

@prgeor prgeor added Triaged this issue has been triaged Help Wanted 🆘 labels Mar 1, 2023
4 participants