fpmsyncd NextHop Group Enhancement High Level Design Document

Revision

| Rev | Date | Author | Change Description |
|-----|------|--------|--------------------|
| 0.1 | Jul 14, 2023 | Kanji Nakano, Kentaro Ebisawa, Hitoshi Irino (NTT) | Initial version |
| 0.2 | Jul 30, 2023 | Kentaro Ebisawa (NTT) | Remove description about VRF, which is not necessary for NHG. Add High Level Architecture diagram. Add notes related to libnl and Routing WG. Fix typos and improve explanations. |
| 0.3 | Sep 18, 2023 | Kentaro Ebisawa (NTT) | Update based on discussion at Routing WG on Sep 14th (Scope, Warmboot/Fastboot, CONFIG_DB) |
| 0.4 | Sep 24, 2023 | Kentaro Ebisawa (NTT) | Add feature enable/disable design and CLI. Update test plan. |
| 0.5 | Nov 10, 2023 | Kanji Nakano (NTT) | Update feature enable/disable design and CLI. Update test plan. |

Scope

This document details the design and implementation of the "fpmsyncd extension" related to NextHop Group behavior in SONiC. The goal of this "fpmsyncd extension" is to integrate NextHop Group (NHG) functionality into SONiC by having fpmsyncd write NextHop Group entries to APPL_DB.

  • Scope of this change is to extend fpmsyncd to handle RTM_NEWNEXTHOP and RTM_DELNEXTHOP messages from FPM.
  • There will be no change to SWSS/Orchagent.
  • This change is backward compatible. Upgrade from a SONiC version that does not support this feature does not change the user's expected behavior as this feature is disabled by default.

Overview

SONiC supports programming routes using the NextHop Group feature through the NextHop Group table in APPL_DB. The idea is to manage the NextHop Groups used by the route table separately, and have each route entry simply reference the NextHop Group it uses. Since at scale many routes share the same NextHop Groups, this requires much less data per route, and so makes building, transmitting and parsing per-route information more efficient.
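To make the per-route occupancy argument concrete, the following minimal Python sketch compares the data written for many routes with inline nexthops against routes that reference one shared NextHop Group. The field names follow the APPL_DB examples in this document; the `fields_bytes` helper, the size metric and the route count are hypothetical and chosen only for illustration.

```python
# Illustrative only: compares per-route payload with and without a shared
# NextHop Group. Sizes are a rough character count, not a Redis memory figure.

def fields_bytes(fields):
    """Rough payload size: total length of all keys and values."""
    return sum(len(k) + len(v) for k, v in fields.items())

# Without NHG: every route repeats the full nexthop list.
inline_route = {
    "nexthop": "10.0.0.1,10.0.0.3",
    "ifname": "Ethernet0,Ethernet4",
    "weight": "1,1",
}

# With NHG: the nexthop list is stored once, and each route only references it.
nhg_entry = dict(inline_route)           # one NEXTHOP_GROUP_TABLE entry
route_ref = {"nexthop_group": "ID127"}   # per-route ROUTE_TABLE fields

num_routes = 1000  # hypothetical scale

inline_total = num_routes * fields_bytes(inline_route)
nhg_total = fields_bytes(nhg_entry) + num_routes * fields_bytes(route_ref)

print("inline:", inline_total, "with NHG:", nhg_total)
```

The saving grows with the number of routes that share a group and with the number of members per group.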

The current version of fpmsyncd does not handle the NextHop Group netlink messages sent by the zebra process via FPM using the dplane_fpm_nl module. This implementation modifies fpmsyncd to handle RTM_NEWNEXTHOP and RTM_DELNEXTHOP events and write the resulting NextHop Group information to the database. In addition, fpmsyncd is modified to use the NextHop Group ID (nexthop_group) when programming a route into ROUTE_TABLE if RTA_NH_ID is included in the RTM_NEWROUTE message from zebra via FPM.

NHG IDs and members are managed by FRR. fpmsyncd uses the NHG ID provided in the FPM message from FRR (zebra). Thus, the decision whether to update the members of an existing NHG or to create an NHG with a new ID during a topology change is made by FRR.

Example use cases for this feature are BGP PIC and recursive route handling. Design discussion of BGP PIC has started in the SONiC Routing WG; recursive route support will be discussed afterwards. See the 09072023 Routing WG Meeting minutes for further information about the BGP PIC discussion.

Requirements

Fpmsyncd extension requires:

  • fpmsyncd to handle RTM_NEWNEXTHOP and RTM_DELNEXTHOP events from zebra via dplane_fpm_nl
  • fpmsyncd to SET/DEL routes to APPL_DB: ROUTE_TABLE using nexthop_group
  • fpmsyncd to SET/DEL NextHop Group entry to APPL_DB: NEXTHOP_GROUP_TABLE

This feature must be disabled by default.

  • When this feature is disabled, behavior will be the same as before introducing this feature.
    • i.e. NEXTHOP_GROUP_TABLE entry will not be created and nexthop_group will not be used in ROUTE_TABLE entry in APPL_DB.
  • See section Configuration and management for details on how this feature is disabled/enabled.

Architecture Design

This design modifies fpmsyncd to use the new APPL_DB tables.

The current fpmsyncd handles only RTM_NEWROUTE and RTM_DELROUTE, writing all route information for each route prefix to ROUTE_TABLE in Redis DB (redis-server). When the zebra process is initialized using the old fpm module, RTM_NEWROUTE is sent with at least the destination address, gateway, and interface ID attributes. For a multipath route, RTM_NEWROUTE is sent with a list of gateways and interface IDs.

This Fpmsyncd extension will modify fpmsyncd to handle RTM_NEWNEXTHOP and RTM_DELNEXTHOP as below.

Figure: Fpmsyncd NHG High Level Architecture


  • FRR configuration
    • (1) config zebra to use dplane_fpm_nl instead of fpm module (this is default since 202305 release)
    • (2) set fpm use-next-hop-groups option (this is disabled by default and enabled via CONFIG_DB)
  • fpmsyncd enhancement
    • (3) Handle RTM_NEWNEXTHOP fpm message from zebra
    • (4) and create NEXTHOP_GROUP_TABLE entry

High-Level Design

Current fpmsyncd processing flow (for reference)

For example, if the following route is configured:

B>*10.1.1.4/32 [20/0] via 10.0.0.1, Ethernet0, 00:00:08
  *                   via 10.0.0.3, Ethernet4, 00:00:08

it will generate the following APPL_DB entries:

admin@sonic:~$ sonic-db-cli APPL_DB hgetall "ROUTE_TABLE:10.1.1.4/32"
{'nexthop': '10.0.0.1,10.0.0.3', 'ifname': 'Ethernet0,Ethernet4', 'weight': '1,1'}
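
The mapping from a multipath route to this single ROUTE_TABLE entry can be sketched in Python as below. This is a simplified illustration, not the fpmsyncd code: the function name and the tuple layout are hypothetical, and interface names are assumed to be already resolved from interface IDs.

```python
def legacy_route_fields(nexthops):
    """Build ROUTE_TABLE fields for a route without NextHop Groups.

    nexthops: list of (gateway, ifname, weight) tuples taken from one
    RTM_NEWROUTE message (hypothetical, simplified representation).
    """
    return {
        "nexthop": ",".join(gw for gw, _, _ in nexthops),
        "ifname": ",".join(ifname for _, ifname, _ in nexthops),
        "weight": ",".join(str(weight) for _, _, weight in nexthops),
    }


fields = legacy_route_fields([
    ("10.0.0.1", "Ethernet0", 1),
    ("10.0.0.3", "Ethernet4", 1),
])
print(fields)
```

Every field is a comma-separated list whose positions correspond to each other, so the full nexthop list is repeated in every route entry.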

The flow below shows how zebra, fpmsyncd and redis-server interact when using the fpm plugin without NextHop Group:

Figure: Flow diagram without NextHop Group


Proposed fpmsyncd processing flow using NextHop Group

To support the nexthop group, fpmsyncd was modified to handle the new events RTM_NEWNEXTHOP and RTM_DELNEXTHOP. fpmsyncd now has a new logic to associate routes to NextHop Groups.

The flow for the new NextHop Group feature is shown below:

Figure: Flow diagram new nexthop group feature


Value SET/DEL to APPL_DB

After enabling use-next-hop-groups in dplane_fpm_nl plugin, zebra will send RTM_NEWNEXTHOP to fpmsyncd when a new route is added.

RTM_NEWNEXTHOP is sent with 2 different attribute groups as shown in the table below:

| Event | Attributes | Description |
|-------|------------|-------------|
| RTM_NEWNEXTHOP | NHA_ID | NextHop Group ID |
| | NHA_GATEWAY | Gateway address |
| | NHA_OIF | The interface ID |
| RTM_NEWNEXTHOP | NHA_ID | NextHop Group ID |
| | NHA_GROUP | A list of NextHop Group IDs with their respective weights |

After sending the RTM_NEWNEXTHOP events, zebra sends the RTM_NEWROUTE to fpmsyncd with NextHop Group ID as shown in the table below:

| Event | Attributes | Description |
|-------|------------|-------------|
| RTM_NEWROUTE | RTA_DST | Route prefix address |
| | RTA_NH_ID | NextHop Group ID |

Example of entries in APPL_DB

For example, the following route configuration will generate the events shown in the table below:

admin@sonic:~$ show ip route
B>*10.1.1.4/32 [20/0] via 10.0.0.1, Ethernet0, 00:00:08
  *                   via 10.0.0.3, Ethernet4, 00:00:08
admin@sonic:~$ sonic-db-cli APPL_DB keys \* | grep NEXT
NEXTHOP_GROUP_TABLE:ID127

admin@sonic:~$ sonic-db-cli APPL_DB HGETALL NEXTHOP_GROUP_TABLE:ID127
{'nexthop': '10.0.0.1,10.0.0.3', 'ifname': 'Ethernet0,Ethernet4', 'weight': '1,1'}

admin@sonic:~$ sonic-db-cli APPL_DB keys \* | grep ROUTE
ROUTE_TABLE:10.1.1.4

admin@sonic:~$ sonic-db-cli APPL_DB HGETALL ROUTE_TABLE:10.1.1.4
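{'nexthop_group': 'ID127', 'protocol': 'bgp'}

| Seq | Event | Attributes | Value |
|-----|-------|------------|-------|
| 1 | RTM_NEWNEXTHOP | NHA_ID | ID125 |
| | | NHA_GATEWAY | 10.0.0.1 |
| | | NHA_OIF | 22 |
| 2 | RTM_NEWNEXTHOP | NHA_ID | ID126 |
| | | NHA_GATEWAY | 10.0.0.3 |
| | | NHA_OIF | 23 |
| 3 | RTM_NEWNEXTHOP | NHA_ID | ID127 |
| | | NHA_GROUP | [{125,1},{126,1}] |
| 4 | RTM_NEWROUTE | RTA_DST | 10.1.1.4 |
| | | RTA_NH_ID | ID127 |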

A short description of fpmsyncd logic flow:

  • When receiving the RTM_NEWNEXTHOP events in sequences 1, 2 and 3, fpmsyncd saves the information in an internal list to be used when necessary.
  • When fpmsyncd receives the RTM_NEWROUTE in sequence 4, it writes the NextHop Group with ID 127 to NEXTHOP_GROUP_TABLE, using the gateway and interface information from the NextHop events with IDs 125 and 126.
  • Then fpmsyncd creates a new route entry in ROUTE_TABLE with a nexthop_group field with value ID127.
  • If fpmsyncd subsequently receives another RTM_NEWROUTE that references the same NextHop Group (RTA_NH_ID ID127), it creates only a new route entry in ROUTE_TABLE with nexthop_group set to ID127, and no new NextHop Group entry. (Note: the NextHop Group entry was already created when fpmsyncd received the event in sequence 4.)
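
The logic flow above can be sketched as a small Python model. It is a toy illustration under stated assumptions, not the actual fpmsyncd implementation (which is C++): the class and method names are hypothetical, plain dictionaries stand in for the APPL_DB tables, interface names are passed in directly instead of being resolved from NHA_OIF, and the protocol field is omitted. The inline handling of a single nexthop follows the single-nexthop test case later in this document.

```python
class NhgSketch:
    """Toy model of the RTM_NEWNEXTHOP / RTM_NEWROUTE handling."""

    def __init__(self):
        self.cache = {}        # NHA_ID -> cached nexthop or group info
        self.nhg_table = {}    # stands in for APPL_DB NEXTHOP_GROUP_TABLE
        self.route_table = {}  # stands in for APPL_DB ROUTE_TABLE

    def on_new_nexthop(self, nh_id, gateway=None, ifname=None, group=None):
        # RTM_NEWNEXTHOP: only cache the information; nothing is written yet.
        self.cache[nh_id] = {"gateway": gateway, "ifname": ifname, "group": group}

    def on_new_route(self, prefix, nh_id):
        # RTM_NEWROUTE carrying RTA_NH_ID.
        entry = self.cache[nh_id]
        if entry["group"] is None:
            # Single nexthop: no group entry, nexthop is written inline.
            self.route_table[prefix] = {
                "nexthop": entry["gateway"],
                "ifname": entry["ifname"],
            }
            return
        key = f"ID{nh_id}"
        if key not in self.nhg_table:
            # First route using this group: write the group entry once.
            members = [(self.cache[m], w) for m, w in entry["group"]]
            self.nhg_table[key] = {
                "nexthop": ",".join(nh["gateway"] for nh, _ in members),
                "ifname": ",".join(nh["ifname"] for nh, _ in members),
                "weight": ",".join(str(w) for _, w in members),
            }
        self.route_table[prefix] = {"nexthop_group": key}


# Replay the four events from the table above.
sketch = NhgSketch()
sketch.on_new_nexthop(125, gateway="10.0.0.1", ifname="Ethernet0")
sketch.on_new_nexthop(126, gateway="10.0.0.3", ifname="Ethernet4")
sketch.on_new_nexthop(127, group=[(125, 1), (126, 1)])
sketch.on_new_route("10.1.1.4", 127)
print(sketch.nhg_table)
print(sketch.route_table)
```

A second route referencing ID127 would add only a ROUTE_TABLE entry, because the group key is already present in the group table.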

Example of entries in ASIC_DB

The ASIC_DB entries are not changed by this enhancement. Therefore, even after this enhancement, table entries will be created for ROUTE_ENTRY, NEXT_HOP_GROUP, NEXT_HOP_GROUP_MEMBER, and NEXT_HOP, as shown in the example below.

Figure: Example of ASIC_DB entry


SAI API

No changes are being made in SAI. The end result of what gets programmed via SAI will be the same as current implementation when manually adding NEXTHOP_GROUP_TABLE entries to APPL_DB.

Configuration and management

This NextHop Group feature is enabled/disabled by config option of zebra (BGP container): [no] fpm use-next-hop-groups

  • To disable this feature (default): configure no fpm use-next-hop-groups
  • To enable this feature: configure fpm use-next-hop-groups

On FRR, one can configure this zebra option via vtysh (zebra CLI) or zebra.conf (zebra startup config).

In SONiC, we will use CONFIG_DB data to enable/disable this option, to be consistent with other SONiC features. We will also use config_db.json to preserve the config across system reboots.

Users (SONiC admin) are expected to use only SONiC CLI or edit config_db.json file to enable/disable this feature, and should not edit zebra.conf directly.

This configuration is backward compatible. Upgrading from a SONiC version that does not support this feature does not change the user's expected behavior, as this flag is disabled by default (i.e. it is disabled if the nexthop_group attribute does not exist in the DEVICE_METADATA|localhost entry in CONFIG_DB).

This setting can NOT be enabled or disabled at runtime. A system reboot is required after enabling/disabling this feature to ensure that route entries using this NHG feature and route entries not using it do not coexist in APPL_DB.

Configuration data flow

The diagram shows how zebra.conf is generated from CONFIG_DB data.

Figure: Configuration data flow


  • CONFIG_DB entry is created via CLI or data stored in config_db.json file
  • sonic-cfggen will generate zebra.conf based on template file named zebra.conf.j2
  • FRR will use zebra.conf during startup to apply config stored in the file

This flow is an existing framework and is not specific to this feature.

The modification made for this feature is in zebra.conf.j2, which now generates config with [no] fpm use-next-hop-groups based on the DEVICE_METADATA|localhost entry in CONFIG_DB.

As shown in the diff below, the template generates the config according to the following logic.

  • If DEVICE_METADATA|localhost is present in CONFIG_DB but there is no "nexthop_group" attribute => disabled
  • If DEVICE_METADATA|localhost is present in CONFIG_DB and "nexthop_group" attribute is "enabled" => enabled
> zebra.conf.j2

 {% endblock banner %}
 !
 {% block fpm %}
+{% if ( ('localhost' in DEVICE_METADATA) and ('nexthop_group' in  DEVICE_METADATA['localhost']) and
+        (DEVICE_METADATA['localhost']['nexthop_group'] == 'enabled') ) %}
+fpm use-next-hop-groups
+{% else %}
 ! Uses the old known FPM behavior of including next hop information in the route (e.g. RTM_NEWROUTE) messages
 no fpm use-next-hop-groups
+{% endif %}
 !
 fpm address 127.0.0.1
 {% endblock fpm %}
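
For readers less familiar with Jinja2, the condition in the template above can be restated as a small Python function. This is only an equivalent sketch of the template logic; the function name is hypothetical and `device_metadata` stands for the DEVICE_METADATA table as a plain dictionary.

```python
def fpm_nhg_config_line(device_metadata):
    """Return the zebra.conf line the template would emit.

    device_metadata: the CONFIG_DB DEVICE_METADATA table as a dict,
    e.g. {"localhost": {"nexthop_group": "enabled"}}.
    """
    localhost = device_metadata.get("localhost", {})
    # Enabled only when the attribute exists and equals "enabled";
    # a missing entry, a missing attribute or any other value disables it.
    if localhost.get("nexthop_group") == "enabled":
        return "fpm use-next-hop-groups"
    return "no fpm use-next-hop-groups"


print(fpm_nhg_config_line({"localhost": {"nexthop_group": "enabled"}}))
print(fpm_nhg_config_line({"localhost": {"hostname": "sonic"}}))
```

Note that any value other than "enabled", including a misspelled one, results in the feature being disabled.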

CLI/YANG model Enhancements

The output of 'show ip route' and 'show ipv6 route' will remain unchanged - the CLI code will resolve the NextHop Group ID referenced in the ROUTE_TABLE to display the next hops for the routes.

To enable/disable this feature, two new CLI (Klish) commands will be introduced.

  • Enable: feature next-hop-group enable
  • Disable: no feature next-hop-group

The CONFIG_DB entry is created (enable) or removed (disable) by entering the above CLI commands.

This setting is read at boot time during FRR startup, so a reboot is required once it is changed and saved to the startup configuration. After the config is changed via the CLI (Klish via RESTCONF), the user must run sudo config save -y so that the configuration is saved in config_db.json and takes effect after system restart.

Below is an example of using these CLI commands to enable/disable the feature.

Enable

admin@sonic:~$ redis-cli -n 4 hget "DEVICE_METADATA|localhost" nexthop_group
(nil)

admin@sonic:~$ sonic-cli

sonic# configure terminal

sonic(config)#
  end        Exit to EXEC mode
  exit       Exit from current mode
  feature    Configure additional feature
  interface  Select an interface
  ip         Global IP configuration subcommands
  mclag      domain
  no         To delete / disable commands in config mode

sonic(config)# feature
  next-hop-group  Next-hop Groups feature

sonic(config)# feature next-hop-group
  enable  Enable Next-hop Groups feature

sonic(config)# feature next-hop-group enable

admin@sonic:~$ redis-cli -n 4 hget "DEVICE_METADATA|localhost" nexthop_group
"enabled"

Disable

sonic(config)# no
  feature  Disable additional feature
  ip       Global IP configuration subcommands
  mclag    domain

sonic(config)# no feature
  next-hop-group  Disable Next-hop Groups feature

sonic(config)# no feature next-hop-group

admin@sonic:~$ redis-cli -n 4 hget "DEVICE_METADATA|localhost" nexthop_group
(nil)

Implementation:

  • New CLI actioner sonic-cli-feature.py will be added for this CLI command.
  • The CLI command will be defined in a new cli-xml file: /CLI/clitree/cli-xml/sonic-feature.xml

When actioner sonic-cli-feature.py is called from the Klish framework, it will call RESTCONF to create / remove the CONFIG_DB entry.

  • enable: $SONIC_CLI_ROOT/sonic-cli-feature.py configure_sonic_nexthop_groups 1
  • disable: $SONIC_CLI_ROOT/sonic-cli-feature.py configure_sonic_nexthop_groups 0
  • RESTCONF URI called from sonic-cli-feature.py: /restconf/data/sonic-feature:sonic-feature

No new model is introduced. The feature uses the pre-existing sonic-device_metadata.yang model, present in the source code at https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-device_metadata.yang

module: sonic-device_metadata
  +--rw sonic-device_metadata
     +--rw DEVICE_METADATA
        +--rw localhost
           +--rw hwsku?                           stypes:hwsku
           +--rw default_bgp_status?              enumeration
           +--rw docker_routing_config_mode?      string
           +--rw hostname?                        stypes:hostname
           +--rw platform?                        string
           +--rw mac?                             yang:mac-address
           +--rw default_pfcwd_status?            enumeration
           +--rw bgp_asn?                         inet:as-number
           +--rw deployment_id?                   uint32
           +--rw type?                            string
           +--rw buffer_model?                    string
           +--rw frr_mgmt_framework_config?       boolean
           +--rw synchronous_mode?                enumeration
           +--rw yang_config_validation?          stypes:mode-status
           +--rw cloudtype?                       string
           +--rw region?                          string
           +--rw sub_role?                        string
           +--rw downstream_subrole?              string
           +--rw resource_type?                   string
           +--rw cluster?                         string
           +--rw subtype?                         string
           +--rw peer_switch?                     stypes:hostname
           +--rw storage_device?                  boolean
           +--rw asic_name?                       string
           +--rw switch_id?                       uint16
           +--rw switch_type?                     string
           +--rw max_cores?                       uint8
           +--rw dhcp_server?                     stypes:admin_mode
           +--rw bgp_adv_lo_prefix_as_128?        boolean
           +--rw suppress-fib-pending?            enumeration
           +--rw rack_mgmt_map?                   string
           +--rw timezone?                        stypes:timezone-name-type
           +--rw create_only_config_db_buffers?   boolean
           +--rw nexthop_group?                   enumeration

Config DB Enhancements

This feature should be disabled/enabled using the existing CONFIG_DB DEVICE_METADATA Table. The key name will be DEVICE_METADATA|localhost with nexthop_group attribute.

Configuration schema in ABNF format:

; DEVICE_METADATA table
key = DEVICE_METADATA|localhost         ; DEVICE_METADATA configuration table
nexthop_group = "enabled" / "disabled"  ; Globally enable/disable next-hop group feature,
                                        ; by default this flag is disabled

Sample of CONFIG DB snippet given below:

    "DEVICE_METADATA": {
        "localhost": {
            "bgp_asn": "65100",
            "buffer_model": "traditional",
            "default_bgp_status": "up",
            "default_pfcwd_status": "disable",
            "hostname": "sonic",
            "hwsku": "Force10-S6000",
            "mac": "50:00:00:0f:00:00",
            "nexthop_group": "enabled",
            "platform": "x86_64-kvm_x86_64-r0",
            "timezone": "UTC",
            "type": "LeafRouter"
        }
    },

Warmboot and Fastboot Design Impact

  • When the feature is disabled, there should be no impact to Warmboot and Fastboot.
  • When the feature is enabled, neither warmboot nor fastboot will be supported.

When the feature is enabled, NHG IDs are managed by FRR and will change after a restart of FRR-related processes or the BGP container. We need a way either to let FRR preserve the IDs or to correlate the NHGs, their IDs and their members before and after the restart.
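
As a rough illustration of what "correlating" NHGs across a restart could mean, the Python sketch below matches old and new IDs by comparing member sets. This is not a proposed design and is not part of this HLD's implementation; the function name and the data layout are hypothetical, and issues such as two groups with identical members are ignored.

```python
def correlate_nhg_ids(before, after):
    """Map each pre-restart NHG ID to the post-restart ID with the same members.

    before / after: dict of NHG ID -> iterable of (gateway, ifname, weight).
    Returns a dict old_id -> new_id, with None where no match exists.
    """
    # Index the post-restart groups by their (order-independent) member set.
    by_members = {frozenset(members): nhg_id for nhg_id, members in after.items()}
    return {
        old_id: by_members.get(frozenset(members))
        for old_id, members in before.items()
    }


before = {94: [("10.0.0.1", "Ethernet0", 1), ("10.0.0.3", "Ethernet4", 1)]}
after = {
    201: [("10.0.0.3", "Ethernet4", 1), ("10.0.0.1", "Ethernet0", 1)],
    202: [("10.0.0.5", "Ethernet8", 1)],
}
print(correlate_nhg_ids(before, after))
```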

We will continue discussion on how we could support Warmboot/Fastboot for future enhancements.

Testing Requirements/Design

One can use redis-cli command to check entries in CONFIG_DB.

admin@sonic:/etc/sonic$ redis-cli -n 4 hget "DEVICE_METADATA|localhost" nexthop_group
    "enabled"

Unit Test cases

Config test cases (feature enable/disable)

Confirm the feature is disabled by default.

  1. Boot SONiC with default config (clean install)
  2. Check there is no nexthop_group attribute in the DEVICE_METADATA|localhost entry in CONFIG_DB
  3. Log into BGP container. Check /etc/sonic/frr/zebra.conf has config no fpm use-next-hop-groups

CONFIG_DB entry add/del via Klish CLI

  1. From CLI, enter feature next-hop-group enable
  2. Confirm the DEVICE_METADATA|localhost entry in CONFIG_DB has attr nexthop_group=enabled
  3. From CLI, enter no feature next-hop-group
  4. Confirm the nexthop_group attr no longer exists in the DEVICE_METADATA|localhost entry in CONFIG_DB

zebra.conf option based on CONFIG_DB entry (disable)

  1. Confirm the nexthop_group attr does not exist in the DEVICE_METADATA|localhost entry in CONFIG_DB
  2. Reboot system
  3. Confirm /etc/sonic/frr/zebra.conf has config no fpm use-next-hop-groups

zebra.conf option based on CONFIG_DB entry (enable)

  1. Confirm DEVICE_METADATA|localhost entry with attr nexthop_group=enabled exists in CONFIG_DB
  2. Reboot system
  3. Confirm /etc/sonic/frr/zebra.conf has config fpm use-next-hop-groups
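
The zebra.conf checks in the test cases above can be automated with a small helper such as the Python sketch below. The function name is hypothetical. The point it illustrates is that a plain substring search is not sufficient, because the disabled form "no fpm use-next-hop-groups" contains the enabled form; the sketch therefore compares whole lines and assumes the last matching line wins.

```python
def zebra_conf_nhg_enabled(conf_text):
    """Return True if zebra.conf text enables fpm use-next-hop-groups."""
    enabled = False
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if line == "fpm use-next-hop-groups":
            enabled = True
        elif line == "no fpm use-next-hop-groups":
            enabled = False
    return enabled


sample_disabled = "!\nno fpm use-next-hop-groups\n!\nfpm address 127.0.0.1\n"
sample_enabled = "!\nfpm use-next-hop-groups\n!\nfpm address 127.0.0.1\n"
print(zebra_conf_nhg_enabled(sample_disabled), zebra_conf_nhg_enabled(sample_enabled))
```

In a test, the text would be read from /etc/sonic/frr/zebra.conf inside the BGP container.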

System Test cases

Multiple NextHops

For routes with multiple nexthops, ensure that a NextHop Group is created.

Add route

  1. Create a static or BGP route with 2 or more ECMP nexthops (which causes zebra to send RTM_NEWNEXTHOP)
  2. Confirm APPL_DB entries are created as expected

Sample APPL_DB output after adding the route:

admin@sonic:~$ show ip route
B>*10.1.1.2/32 [20/0] via 10.0.0.1, Ethernet0, 00:00:14
  *                   via 10.0.0.3, Ethernet4, 00:00:14

admin@sonic:~$ sonic-db-cli APPL_DB keys \* | grep NEXT
NEXTHOP_GROUP_TABLE:ID94

admin@sonic:~$ sonic-db-cli APPL_DB HGETALL NEXTHOP_GROUP_TABLE:ID94
{'nexthop': '10.0.0.1,10.0.0.3', 'ifname': 'Ethernet0,Ethernet4', 'weight': '1,1'}

admin@sonic:~$ sonic-db-cli APPL_DB keys \* | grep ROUTE
ROUTE_TABLE:10.1.1.2

admin@sonic:~$ sonic-db-cli APPL_DB HGETALL ROUTE_TABLE:10.1.1.2
{'nexthop_group': 'ID94', 'protocol': 'bgp'}

Del route

  1. Delete all nexthops except one.
  2. Confirm APPL_DB entries are deleted as expected

Sample APPL_DB output after deleting the nexthops:

admin@sonic:~$ show ip route
B>*10.1.1.2/32 [20/0] via 10.0.0.3, Ethernet4, 00:00:14

admin@sonic:~$ sonic-db-cli APPL_DB keys \* | grep NEXT

admin@sonic:~$ sonic-db-cli APPL_DB keys \* | grep ROUTE
ROUTE_TABLE:10.1.1.2

admin@sonic:~$ sonic-db-cli APPL_DB HGETALL ROUTE_TABLE:10.1.1.2
{'nexthop': '10.0.0.3', 'ifname': 'Ethernet4', 'protocol': 'bgp'}

Single NextHop

For Single NextHop, ensure next hop group is not created.

Add route

  1. Create static or BGP routes with a single nexthop each
  2. Confirm APPL_DB entries are created as expected

Sample APPL_DB output after adding the routes:

admin@sonic:~$ show ip route
B>*10.1.1.3/32 [20/0] via 10.0.0.1, Ethernet0, 00:00:34
B>*10.1.1.4/32 [20/0] via 10.0.0.3, Ethernet4, 00:00:34

admin@sonic:~$ sonic-db-cli APPL_DB keys \* | grep NEXT

admin@sonic:~$ sonic-db-cli APPL_DB keys \* | grep ROUTE
ROUTE_TABLE:10.1.1.3
ROUTE_TABLE:10.1.1.4

admin@sonic:~$ sonic-db-cli APPL_DB HGETALL ROUTE_TABLE:10.1.1.3
{'nexthop': '10.0.0.1', 'ifname': 'Ethernet0', 'protocol': 'bgp'}

admin@sonic:~$ sonic-db-cli APPL_DB HGETALL ROUTE_TABLE:10.1.1.4
{'nexthop': '10.0.0.3', 'ifname': 'Ethernet4', 'protocol': 'bgp'}

Del route

  1. Delete a static or BGP route created in the previous test
  2. Confirm APPL_DB entries are deleted as expected

Sample APPL_DB output after deleting the route:

admin@sonic:~$ show ip route
B>*10.1.1.4/32 [20/0] via 10.0.0.3, Ethernet4, 00:09:30

admin@sonic:~$ sonic-db-cli APPL_DB keys \* | grep NEXT

admin@sonic:~$ sonic-db-cli APPL_DB keys \* | grep ROUTE
ROUTE_TABLE:10.1.1.4

admin@sonic:~$ sonic-db-cli APPL_DB HGETALL ROUTE_TABLE:10.1.1.4
{'nexthop': '10.0.0.3', 'ifname': 'Ethernet4', 'protocol': 'bgp'}
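
The expectations in the system test cases above can be summarized as a small checker. The Python sketch below is a hypothetical test helper, not part of any existing SONiC test framework: it takes the ROUTE_TABLE fields of one route and the set of NEXTHOP_GROUP_TABLE keys (as plain Python values already read from APPL_DB) and returns a list of problems.

```python
def check_route_entry(route_fields, nhg_keys, multipath):
    """Return a list of problems found for one ROUTE_TABLE entry.

    route_fields: dict of fields read from ROUTE_TABLE:<prefix>.
    nhg_keys: set of group names present in NEXTHOP_GROUP_TABLE, e.g. {"ID94"}.
    multipath: True if the route is expected to have 2 or more nexthops.
    """
    problems = []
    group = route_fields.get("nexthop_group")
    if multipath:
        if group is None:
            problems.append("route does not reference a NextHop Group")
        elif group not in nhg_keys:
            problems.append(f"NEXTHOP_GROUP_TABLE:{group} is missing")
    else:
        if group is not None:
            problems.append("single-nexthop route references a NextHop Group")
        if "nexthop" not in route_fields:
            problems.append("route has no inline nexthop")
    return problems


# Values taken from the sample outputs in this section.
multi = {"nexthop_group": "ID94", "protocol": "bgp"}
single = {"nexthop": "10.0.0.3", "ifname": "Ethernet4", "protocol": "bgp"}
print(check_route_entry(multi, {"ID94"}, multipath=True))
print(check_route_entry(single, set(), multipath=False))
```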

Open/Action items - if any

libnl compatibility with upstream

To add this feature, we have extended libnl to support NextHop Group. (i.e. nh_id, RTM_NEWNEXTHOP etc.)

However, there is a proposal, libnl: PR#332, to support NextHop Group in upstream libnl. We should review this PR (and any other related patches, if found) so that the difference from the upstream code is minimal.

Further performance improvements

The extension to fpmsyncd described in this HLD only changes how fpmsyncd handles RTM_NEWNEXTHOP and RTM_DELNEXTHOP.

Further study is required for more fundamental improvements, e.g. how zebra handles NextHop Groups at scale, the communication channel between zebra and fpmsyncd, and improvements in FRR such as BGP PIC support.

Refer to the meeting minutes of the SONiC Routing Working Group for discussions related to future improvements. For the discussion specific to this HLD, check the 07132023 Meeting Minutes.

Backward compatibility with current NHG creation logic (Fine-grain NHG, Ordered NHG/ECMP)

This feature is disabled by default and is thus backward compatible: it does not impact the current NHG creation logic in SWSS/Orchagent.

When enabled, NHG ID and member management will be handled by FRR, and the current NHG creation logic in SWSS/Orchagent will not be used, i.e. the behavior will be the same as the current behavior when manually adding an entry to APPL_DB: NEXTHOP_GROUP_TABLE.

nexthop_compat_mode Kernel option

With regard to NextHop Groups, the Linux kernel runs in compatibility mode by default, sending netlink messages using both the old route format without RTA_NH_ID and the new format using RTA_NH_ID.

There is a sysctl option, net.ipv4.nexthop_compat_mode, which is on by default but provides the ability to turn off compatibility mode, allowing the system to send route updates only in the new format, which could potentially improve performance.

This option is not changed as part of this HLD to avoid unexpected impact on the existing behavior.

One should carefully study the impact before changing this option.
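
When studying this option, it can help to first check its current value. The Python sketch below reads the sysctl through its procfs path; the function name is hypothetical, and the path is the standard procfs mapping of net.ipv4.nexthop_compat_mode, which may be absent on kernels that do not provide the option.

```python
from pathlib import Path

# procfs location of the net.ipv4.nexthop_compat_mode sysctl
COMPAT_MODE_PATH = "/proc/sys/net/ipv4/nexthop_compat_mode"


def nexthop_compat_mode(path=COMPAT_MODE_PATH):
    """Return True if compatibility mode is on, False if off,
    or None if the sysctl cannot be read (e.g. not present)."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        return None
```

Calling `nexthop_compat_mode()` on the switch only reads the value and does not change it.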

Warmboot/Fastboot support

Currently this feature does not work with Warmboot/Fastboot. We will continue discussion on how we could support Warmboot/Fastboot for future enhancements.

No support for setting config enable/disable on runtime

This feature can NOT be enabled or disabled at runtime. A reboot is required after enabling/disabling this feature to ensure that route entries using this NHG feature and route entries not using it do not coexist in APPL_DB.

Source of APPL_DB entry related to NHG

The expectation today is that there is only one source, FRR or some other routing container, that modifies NHG-related entries in APPL_DB.

If there is any use case for more than one source, then the design of the APPL_DB schema and the related logic need to be studied. For example, we might need an additional attr/entity to distinguish the source of the NHG/NH entry.

Note that this is not specific to the NHG feature but is a typical limitation when more than one entity modifies the same APPL_DB entry.