Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fpmsyncd Next Hop Table Enhancement HLD #1425

Merged
merged 8 commits into from
Nov 16, 2023

Conversation

ntt-omw
Copy link

@ntt-omw ntt-omw commented Jul 13, 2023

Fpmsyncd Next Hop Table Enhancement HLD


repo PR title status
sonic-buildimage sonic-net/sonic-buildimage#16762 PR_Merged
sonic-swss sonic-net/sonic-swss#2919 PR_Open
sonic-mgmt-common sonic-net/sonic-mgmt-common#107 PR_Open
sonic-mgmt-framework sonic-net/sonic-mgmt-framework#122 PR_Open

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jul 13, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

-->

There should be no impact to Warmboot and Fastboot Design Impact.
The change is how fpmsyncd handle message from zebra.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the nexthop group id stable? this could have impact to warm reboot, can you confirm?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comment. I will check and update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion during the last Routing WG meeting, I have added description that Warmboot and Fastboot is NOT supported when this feature is enabled. (It works same as today if disabled which is default config)

We can disable `net.ipv4.nexthop_compat_mode` (set to 0) for performance optimization.

For backward comptibility, we can keep `fpm use-next-hop-groups` disabled by default in FRR configuration.
This way, zebra would not send `RTM_NEWNEXTHOP` and `RTM_DELNEXTHOP` to fpmsyncd and entry created in `APPL_DB` should be identical to when this NHG feature did not exist.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need sonic config db to enable and disable this feature, and this new feature should be disable by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Yes, we intend to make this disabled by default.
We are also studying how to make this configurable using config db.
I will update the HLD and make it clear in the doc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion during the last Routing WG meeting, I have added requirement to disable/enable this via CONFIG_DB in the Configuration and management section.
Will update more details like data models after further investigating how this should be implemented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated Configuration and management section with description of how this feature will be disabled(default) and enabled using CONFIG_DB and config_db.json. One can directly edit config_db.json file or use Klish CLI to enable/disable this feature. Also, corresponding test case are added.

-->

This design modifies `fpmsyncd` to use the new `APPL_DB` tables.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed during the review, it is more about move of logic of NHG creation from SWSS (Route/NHG OA) to Zebra. What is the main motivation for this move ? How such features as Dynamically Ordered ECMP (used by MSFT) with per NHGm's Sequence number work with this move ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the main motivation for this move ?

Motivation is to reduce NHG/ROUTE translation in zebra.
Also, supporting NHG in fpmsyncd as a first step to improve update performance (reducing DB change) by zebra to update a NHG shared among routes instead of updating all ECMP routes in ROUTE entry. (e.g., when NHG member was reduced from n to n-1)

How such features as Dynamically Ordered ECMP (used by MSFT) with per NHGm's Sequence number work with this move ?

My understanding is this would not impact people using SWSS to create NHG if the feature is disabled (by default) since fpm will not send RTM_NEWNEXTHOP and thus fpmsyncd will set entry in CONFIG_DB using ROUTE entry as today.
Do you expect anyone to use Dynamically Ordered ECMP / per NHGm's Sequence number while enabling NHG in fpmsyncd in near future? (While I agree that it would be nice to have both available in long term but could be a different PR in future)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use case would be BGP PIC, and recursive routes handling. BGP PIC has started in design discussion in routing WG. Recursive routes support would be discussed after.

https://lists.sonicfoundation.dev/g/sonic-wg-routing/wiki/34786

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated "Scope" for clarification and added PIC example in "Overview" section.

- When `fpmsyncd` receive the `RTM_NEWROUTE` on sequence 4, the process will write the NextHop Group with ID 118 to the `NEXTHOP_GROUP_TABLE` using the information of gateway and interface from the NextHop Group events with IDs 116 and 117.
- Then `fpmsyncd` will create a new route entry to `ROUTE_TABLE` with a `nexthop_group` field with value `ID118`.
- When `fpmsyncd` receives the last `RTM_NEWROUTE` on sequence 5, the process will create a new route entry (but no NextHop Group entry) in `ROUTE_TABLE` with `nexthop_group` field with value `ID118`. (Note: This NextHop Group entry was created when the `fpmsyncd` received the event sequence 4.)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Zebra/FRR re-create the NHG (delete old one + create new one) and update all routes with new ID on change of NHG (e.g. due to failed link to NHGm) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if there are one or more route referring to the NHG with members before the change.
(In this case we will have an old NHG with n members referenced from some routes and a newly created NHG n-x members referenced from other routes)

I think it's No if there are no routes referencing to the NHG with members before the change.
But I'm not 100% confident so let me check again and update the result here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would depend on NHG type (BGP vs IGP), the type of triggered events and how the trigger events get handled in zebra.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added explanation that NHG ID and members are managed by FRR in "Overview" section for clarification.


- FRR configuration
- (1) config zebra to use `dplane_fpm_nl` instead of `fpm` module (this is default since 202305 release)
- (2) set `fpm use-nexthop-groups` option (this is disabled by default and enabled via `CONFIG_DB`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it assume the use of Unified Management Framework ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this SONiC_Design_Doc_Unified_FRR_Mgmt_Interface.md the Unified Management Framework you have asked?
If yes, we are using sonic-cfggen and zebra.conf.j2 template to generate zebra config file based on CONFIG_DB entry.

@nakano-omw
Copy link
Contributor

I have PR the source code


### Configuration and management

This NextHop Group feature is enabled/disabled by config option of zebra (BGP container): `[no] fpm use-next-hop-groups`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. @nakano-omw is working on moving enabled/disabled config under DEVICE_METADATA.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated both HLD and Source Code to config use DEVICE_METADATA.

@eddieruan-alibaba
Copy link
Contributor

@ebiken-ntt do you plan to add NHG update handling when local link is down? i.e. take some benefits from using NHG. What's your plan on that?

@ebiken-ntt
Copy link
Contributor

@ebiken-ntt do you plan to add NHG update handling when local link is down? i.e. take some benefits from using NHG. What's your plan on that?

No, at least not for the 202311. We expect FRR to notify any NHG update via dplane_fpm_nl including ones due to local link down. But we can discuss for future release if there are any idea.

@nakano-omw
Copy link
Contributor

Hi @zhangyanzhao @eddieruan-alibaba
All the comments of this PR has been addressed. It has been reviewed in Routing WG. Could we merge the PR ?

@eddieruan-alibaba
Copy link
Contributor

eddieruan-alibaba commented Nov 16, 2023

I have approved this HLD. You could merge it. We will continue this path further to PIC and recursive routes convergence support.

@nakano-omw
Copy link
Contributor

@eddieruan-alibaba I can't merge because I don't have enough permissions. Would you be able to grant me the permission?

@nakano-omw
Copy link
Contributor

@eddieruan-alibaba can you run merge ?

@eddieruan-alibaba eddieruan-alibaba merged commit 1d57663 into sonic-net:master Nov 16, 2023
@dgsudharsan
Copy link
Contributor

dgsudharsan commented May 21, 2024

@eddieruan-alibaba @lguohan @StormLiangMS @ntt-omw Due to recent performance issues found in fib suppression feature we plan to disable the feature in 202405. The dplane_fpm_nl is tied to fib suppression feature and would not be available if fib suppression feature is removed.
As this feature is dependent on dplane_fpm_nl module, unless we have fib suppression in 202405 this feature will not work.

@nakano-omw
Copy link
Contributor

@dgsudharsan Could you please elaborate on the performance issue? Is it a FRR issue?

@dgsudharsan
Copy link
Contributor

@dgsudharsan Could you please elaborate on the performance issue? Is it a FRR issue?

Yes. Its an FRR issue. If there are large number of routes and if link flap happens resulting in multiple withdraw and advertisement of the routes, zebra's processing queue will be built up resulting in zebra consuming memory in GBs. This memory will not get freed to the OS (due to low level libc implementation) although its freed by zebra at the end. So systems with lower memory might get impacted and would eventually run out of memory.

@eddieruan-alibaba
Copy link
Contributor

@dgsudharsan do we have an issue opened for this issue? We could discuss this issue in routing WG. Alibaba saw this issue as well. I could ask someone from my team to discuss our workaround in FRR.

@eddieruan-alibaba
Copy link
Contributor

@nakano-omw where are you on the work which @hasan-brcm and NTT worked on to further enhancing NHG handling?

@eddieruan-alibaba
Copy link
Contributor

@dgsudharsan But why fib suppression feature is removed. This issue is independent to this feature, right?

@dgsudharsan
Copy link
Contributor

@eddieruan-alibaba We removed it in 202305 and 202311 sonic-net/sonic-buildimage#17660 . With this feature since there is a delay zebra queue builds up.
However for 202405 I am not still sure if we do full revert or just disable fib suppression and keep dplane_fpm_nl. @stepanblyschak can you please clarify?

@nakano-omw
Copy link
Contributor

nakano-omw commented May 21, 2024

@nakano-omw where are you on the work which @hasan-brcm and NTT worked on to further enhancing NHG handling?

The NHG support we are developing will not work if dplane_fpm_nl is disabled because it is required.

@stepanblyschak
Copy link
Contributor

stepanblyschak commented May 21, 2024

@nakano-omw @dgsudharsan I plan to keep dplane_fpm_nl for 202405

@zice312963205
Copy link

@dgsudharsan Could you please elaborate on the performance issue? Is it a FRR issue?

Yes. Its an FRR issue. If there are large number of routes and if link flap happens resulting in multiple withdraw and advertisement of the routes, zebra's processing queue will be built up resulting in zebra consuming memory in GBs. This memory will not get freed to the OS (due to low level libc implementation) although its freed by zebra at the end. So systems with lower memory might get impacted and would eventually run out of memory.

I have submitted a description of the issue to the FRR community: FRRouting/frr#15016
the system runs out of memory due to excessive temporary memory consumption by the dplane. And I have attempted to fix this issue in the PR below: FRRouting/frr#16067

@zhangyanzhao
Copy link
Collaborator

review is on-going, keep it in 202405 release for now.

@dgsudharsan
Copy link
Contributor

@nakano-omw @lguohan @eddieruan-alibaba Stepan did some experiments and confirmed that there is no memory issue with dplane_fpm_nl without fib suppression and the reason for memory increase is due to fib suppression feature itself. We will not revert dplane_fpm_nl but disable fib suppression feature alone.

@zhangyanzhao
Copy link
Collaborator

code PRs are not approved yet, move to backlog

@zhangyanzhao
Copy link
Collaborator

code prs are not merged yet, move to backlog

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: MovedToBacklog
Status: MovedToBacklog
Development

Successfully merging this pull request may close these issues.