Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPv6 link-local address enhancements #625

Merged
merged 4 commits into from
Jul 30, 2021

Conversation

kirankella
Copy link
Contributor

This document covers the design details for the following enhancements:

  1. Enable learning ipv6 link-local neighbors.
  2. Allow adding IPv6 routes with link-local nexthops.
  3. BGP unnumbered neighbors (using IPv6 LL nexthop) will work with the support for IPv6 LL neighbors.
  4. Config knob to enable/disable IPv6 link-local address in the absence of IPv6 global address.

7. Support to trap control packets destined to IPv6 link-local address to CPU.
8. Support filtering of packets with IPv6 link-local source or destination addresses. These packets must not be routed to other interfaces. This implies utilities like trace route are not applicable for IPv6 link-local addresses. Also, ping to link-local address is only applicable for directly connected networks.
9. Support BGP peering using unnumbered interface configuration. In this configuration, the IPv6 link-local address of the interface is used and the remote peer IPv6 link-local address is dynamically discovered to establish adjacency.
10. IPv6 mode is disabled by default on an interface.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, this can be changed to enabled as default and use some profile to disable


`sysctl -w net.ipv6.conf.default.disable_ipv6=1`

Since the Linux kernel auto-generates the IPv6 link-local address per interface, netlink events for IPv6 address addition and deletion are handled by the IntfMgr. All netlink messages other than RTM_NEWADDR, RTM_DELADDR are ignored. It also ignores all addresses other than IPv6 link-local addresses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently intfmgror other manager does not listen on events from kernel but handled by corresponding *syncd. Originally there was an intfsyncd but was later removed. We may want to revisit this section on how to handle netlink events for link-local address.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We initially considered using intfsyncd. But that can result in multiple producers for the INTF:PREFIX table (intfmgrd for configured addresses and intfsyncd for learned addresses).
To avoid any race conditions and inconsistent behavior (like for ex., if VRF interface unbind when INTF and INTF:PREFIX tables are deleted by intfmgrd and if there are IPv6 LL address updates from intfsyncd at the same time), we enhanced the intfmgrd to push the netlink address updates into the APP_DB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should re-introduce intfsyncd as before. However this approach of intfmgrd listening on netlink doesn't lgtm. Do you have any other proposal?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prsunny As long as we have an interface table e.g INTF_TABLE: in App-DB, creating a router interface in intfsorch for that interface in HW should be fine..right? why do we need to worry about handling multiple IPv6 LLAs with reference count.


```

# 4 Flow Diagrams
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this section with the flows, especially on LL IP2ME and /10 route installation and deletion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


"INTERFACE": {
"Ethernet24": {
"ipv6_use_link_local_only": "disable"
Copy link
Collaborator

@venkatmahalingam venkatmahalingam Jun 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we throw an error if user tries to configure global Ipv6 address when ipv6_use_link_local_only = "enable" for L3 interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ipv6_use_link_local_only config mode is in effect only in the absence of configured IPv6 global addresses on the interface. In the presence of global addresses, the link-local is enabled implicitly anyways.

Copy link
Collaborator

@venkatmahalingam venkatmahalingam Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we introduce 'ipv6_enable' field, it is implicitly enabled upon first global IPv6 configuration. it is aligned with what we exactly do to kernel in the cfgmgr.

By default Ipv6 is disabled on the interface, after few Ipv6 address configuration, if we need to go to the default state i.e ipv6 disable on interface, do we have any field? ipv6_use_link_local_only - disable should not disable the Ipv6 functionality on the interface or do we need to assume in the cfgmgr that when no IPv6 address present on the interface, disable the IPv6 functionality? IMO, we should have the field name based on exactly what we do in the backend to avoid any confusion.

F - PBR, f - OpenFabric,
> - selected route, * - FIB route # - not installed in hardware

B 192.168.0.0/24 [20/0] via fe80::5054:ff:fe03:6175, Ethernet0, 00:00:06
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since FRR/zebra is already pushing 169.254.0.1 IPv4 link local NH for IPv4 route to kernel, let's use the same for HW programming flow (fpmsyncd) as well, we're already listening neighbor events from kernel, the existing code should work for this case as well, please share the issues you have encountered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unnumbered routes needs to be sent to fpm with special attribute (e.g. onlink) to restrict next-hop resolution to be confined to the link on which route is getting programmed. Zebra doesn't pass any such attribute, and route orch doesn't have such ability to handle that today. Also, same neighbor entry 169.254.0.1 will appear on multiple interfaces, and this is also something not expected and requires special handling in sonic. Note that not supporting rfc 5549 next-hops natively is a limitation in Linux kernel and what zebra does is a hack.

Keeping all of the above into consideration, zebra is patched to send ipv6 next-hop along with the route. This approach is much cleaner and avoids any special neigh/next-hop handling in neigh/route orch.

Copy link
Collaborator

@venkatmahalingam venkatmahalingam Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hasan - Agree that this solution will work, trying to understand the advantage of deviating the current kernel & FPM flow behaviors, I'm not aware of any FRR user other than SONiC already deviating like this by introducing a change in the FRR module, I believe the change in FRR will stay as PATCH and we wont be able to merge with FRR opensource code i.e somone in SONiC has to maintain the SONiC specific FRR changes (Any discussion with FRR community already happened on this change?)

The unnumbered routes needs to be sent to fpm with special attribute (e.g. onlink) to restrict next-hop resolution to be confined to the link on which route is getting programmed.

How's this handled in kernel, I believe we dont expect the NH resolution in kernel either, if FRR restricts it by adding the static Ipv4 link-local neighbor(169.254.0.1), cant we follow the same in FPM flow as well.

Also, same neighbor entry 169.254.0.1 will appear on multiple interfaces, and this is also something not expected and requires special handling in sonic.

This is not different from any IPv6 LLA neighbor i.e same LLA can be present on multiple interfaces, if we use "SAI_NEIGHBOR_ENTRY_ATTR_NO_HOST_ROUTE" already for IPv6 LLA nbr, why cant we use the same for IPv4 LLA neighbor?

Copy link
Contributor

@prsunny prsunny Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add this details to the HLD section ? @hasan-brcm, can we address this concern from Venkat as part of this feature?

@SuvenduMozumdar
Copy link

SuvenduMozumdar commented Jun 11, 2020

Hi, I am from Keysight. I was going through the test case coverage section of the design document,
I see that for BGP unnumbered interface, only one test case is there that checks BGP session establishment over auto-configured link-local ipv6 address. But looking at the deployment scenario, in my opinion, it should also check for whether BGP routers are able to advertise IPv4 routes with link-local ipv6 address as the next hop. So I propose following tests to cover the deployment scenario:

  1. Check a BGP router running over auto-configured link-local address can advertise IPv4 routes with IPv6 link local addresses as next-hop.
  2. Check a BGP router running over auto-configured link-local address can receive and install IPv4 routes with IPv6 link local addresses as next-hop.
  3. Check if Link Local NH (Next Hop) is added along with global NH when BGP+ peers share same subnet (RFC2545).

Let me know whats your opinion about including these functional tests in the design document.

@kirankella
Copy link
Contributor Author

Hi, I am from Keysight. I was going through the test case coverage section of the design document,
I see that for BGP unnumbered interface, only one test case is there that checks BGP session establishment over auto-configured link-local ipv6 address. But looking at the deployment scenario, in my opinion, it should also check for whether BGP routers are able to advertise IPv4 routes with link-local ipv6 address as the next hop. So I propose following tests to cover the deployment scenario:

  1. Check a BGP router running over auto-configured link-local address can advertise IPv4 routes with IPv6 link local addresses as next-hop.
  2. Check a BGP router running over auto-configured link-local address can receive and install IPv4 routes with IPv6 link local addresses as next-hop.
  3. Check if Link Local NH (Next Hop) is added along with global NH when BGP+ peers share same subnet (RFC2545).

Let me know whats your opinion about including these functional tests in the design document.

Updated the test cases section with 1 and 2. 3 is more specific to BGP functionality.

@prsunny
Copy link
Contributor

prsunny commented Nov 17, 2020

WG discussion points:

  1. Prefer the default to be IPv6 enabled as it is in the current behavior. Propose to have a device metadata field to specify global IPv6 mode to enable/disable from kernel perspective. BCM to check
  2. RIF creation is based on configuration (Global V6 addr or "use_link_local_only" config)
  3. IP2ME /128 link-local stays as it is currently within the RouteOrch and VrfOrch init.
  4. No processing of netlink events in intfmgrd.

@AkhileshSamineni
Copy link
Contributor

WG discussion points:

  1. Prefer the default to be IPv6 enabled as it is in the current behavior. Propose to have a device metadata field to specify global IPv6 mode to enable/disable from kernel perspective. BCM to check
  2. RIF creation is based on configuration (Global V6 addr or "use_link_local_only" config)
  3. IP2ME /128 link-local stays as it is currently within the RouteOrch and VrfOrch init.
  4. No processing of netlink events in intfmgrd.

@prsunny Updated the HLD as per the new design.

rlhui pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jul 20, 2021
As per HLD - sonic-net/SONiC#625

FRR Patches:

0009-Link-local-scope-was-not-set-while-binding-socket-for-bgp-ipv6-link-local-neighbors.patch
Files modified : bgpd_network.c and bgpd/bgp_zebra.c
Fix for : Link local scope was not set while binding socket with local address causing socket errors for bgp ipv6 link local neighbors.

0010-VRF-interface-lookup-was-still-done-in-the-default-vrf.patch
Files modified : staticd/static_zebra.c
Fix for : VRF interface lookup was still done in the default-vrf which was causing the interface lookup to fail. Due to this static-route pointing to link-local was not getting installed.

0011-Changes-to-send-ipv6-link-local-address-as-nexthop-to-fpmsyncd.patch
Files modified : zebra/zebra_fpm_netlink.c
Fix for : Made changes to send ipv6 address as nexthop to fpmsyncd.

Depends on:
sonic-net/sonic-utilities#1159
sonic-net/sonic-swss#1463

Signed-off-by: Akhilesh Samineni [email protected]
@prsunny prsunny merged commit c83677c into sonic-net:master Jul 30, 2021
judyjoseph pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Aug 7, 2021
As per HLD - sonic-net/SONiC#625

FRR Patches:

0009-Link-local-scope-was-not-set-while-binding-socket-for-bgp-ipv6-link-local-neighbors.patch
Files modified : bgpd_network.c and bgpd/bgp_zebra.c
Fix for : Link local scope was not set while binding socket with local address causing socket errors for bgp ipv6 link local neighbors.

0010-VRF-interface-lookup-was-still-done-in-the-default-vrf.patch
Files modified : staticd/static_zebra.c
Fix for : VRF interface lookup was still done in the default-vrf which was causing the interface lookup to fail. Due to this static-route pointing to link-local was not getting installed.

0011-Changes-to-send-ipv6-link-local-address-as-nexthop-to-fpmsyncd.patch
Files modified : zebra/zebra_fpm_netlink.c
Fix for : Made changes to send ipv6 address as nexthop to fpmsyncd.

Depends on:
sonic-net/sonic-utilities#1159
sonic-net/sonic-swss#1463

Signed-off-by: Akhilesh Samineni [email protected]
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
As per HLD - sonic-net/SONiC#625

FRR Patches:

0009-Link-local-scope-was-not-set-while-binding-socket-for-bgp-ipv6-link-local-neighbors.patch
Files modified : bgpd_network.c and bgpd/bgp_zebra.c
Fix for : Link local scope was not set while binding socket with local address causing socket errors for bgp ipv6 link local neighbors.

0010-VRF-interface-lookup-was-still-done-in-the-default-vrf.patch
Files modified : staticd/static_zebra.c
Fix for : VRF interface lookup was still done in the default-vrf which was causing the interface lookup to fail. Due to this static-route pointing to link-local was not getting installed.

0011-Changes-to-send-ipv6-link-local-address-as-nexthop-to-fpmsyncd.patch
Files modified : zebra/zebra_fpm_netlink.c
Fix for : Made changes to send ipv6 address as nexthop to fpmsyncd.

Depends on:
sonic-net/sonic-utilities#1159
sonic-net/sonic-swss#1463

Signed-off-by: Akhilesh Samineni [email protected]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants