[chassis-packet]: internal bfd sessions bringup delays during config reload/reboot #17180
Comments
Hi @rlhui, @abdosi, please assign this to me for now. May we plan to discuss this in the upcoming chassis meeting? Thanks @vperumal, @rajendrat, for your visibility.
Why does the BFD session need to be the fastest to come up, before BGP? We may not want that, right? BFD is for monitoring/resiliency and is not necessarily needed in normal cases, unlike BGP, which is critical.
What functional issue are we seeing because of this delay?
A priority queue is already present in swss.
Dual ToR has a mechanism to delay BGP session bring-up using an FRR configuration.
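For reference, a minimal sketch of the kind of FRR configuration that can hold BGP sessions down at startup. This is an illustration only; the exact dual ToR mechanism is not shown in this thread, and the ASN and neighbor address below are made up:

```
! Illustrative sketch only: the exact FRR knob used on dual ToR is not
! shown in this thread; the ASN and neighbor address are hypothetical.
router bgp 65100
 ! Keep the internal peer administratively down at startup ...
 neighbor 10.0.0.1 shutdown
!
! ... then re-enable it later (e.g. once the BFD sessions are up):
! vtysh -c 'configure terminal' -c 'router bgp 65100' -c 'no neighbor 10.0.0.1 shutdown'
```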
Thanks @arlakshm, I will check on that.
Hi @rlhui, as such no functional impact has been observed, but the overall bring-up of all BGP paths gets delayed.
Hi @arlakshm, I tried the following config, but it did not help much.
@arlakshm please include this in the sonic-common-infra subgroup as a high-priority problem to solve, thanks.
I created a PR to fix #19569. The issue in orchagent is that a massive number of low-priority events may block a high-priority event. More detail can be found in issue #19569.
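To illustrate the direction of that fix (a conceptual C++ sketch only, not the code from the PR or from sonic-net/sonic-swss#3328): instead of strict FIFO handling, pending high-priority events such as BFD state changes can be drained ahead of the bulk of low-priority route events.

```cpp
#include <deque>
#include <iostream>
#include <string>

// Conceptual sketch only -- not the actual orchagent change. The idea is to
// pick the highest-priority pending event each iteration instead of strict
// FIFO, so a late-arriving BFD event is not stuck behind a route flood.
struct Event { int priority; std::string name; };   // lower value = higher priority

int main()
{
    std::deque<Event> pending;
    for (int i = 0; i < 5; ++i)
        pending.push_back({10, "route_event"});      // low-priority flood
    pending.push_back({0, "bfd_session_state_up"});  // high priority, arrives last

    while (!pending.empty())
    {
        // Select the highest-priority (lowest value) pending event.
        auto best = pending.begin();
        for (auto it = pending.begin(); it != pending.end(); ++it)
            if (it->priority < best->priority)
                best = it;
        std::cout << "processing " << best->name << '\n';
        pending.erase(best);
    }
    return 0;
}
```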
Thanks @liuh-80, I am validating this fix.
This change LGTM. I see a lot of improvement in port and BFD notification handling.
@anamehra is this issue gone or still there? |
Fixed by sonic-net/sonic-swss#3328.
Description
On a packet chassis, LC-to-LC connectivity via the fabric uses iBGP sessions. Internal BFD over the fabric interfaces is used for fault detection on these iBGP sessions. On a scale setup with 3 or more fabric cards, orchagent takes more time to process the BFD session-up notifications from SAI during config reload or reboot. The delay occurs because the same notification queue is used for BFD notifications and BGP route-learning notifications. During bgp/swss docker start, the BFD and BGP configuration is applied together. As soon as a few BFD sessions come up, iBGP sessions start establishing, which also starts a flood of route-learning notifications for orchagent. When new BFD session-up notifications are sent by SAI during this time, their processing gets delayed.
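For illustration, a minimal C++ sketch (not actual orchagent or SAI code; the event names are made up) of why a single shared FIFO delays the BFD notifications:

```cpp
#include <iostream>
#include <queue>
#include <string>

// Minimal illustration only -- not actual orchagent/SAI code. A single FIFO
// shared by all notifications means a BFD session-up event that arrives
// behind a burst of route events is processed only after the whole burst.
int main()
{
    std::queue<std::string> notifications;        // one shared queue

    for (int i = 0; i < 100000; ++i)
        notifications.push("route_event");        // flood from iBGP route learning
    notifications.push("bfd_session_state_up");   // arrives after the flood

    std::size_t processed = 0;
    while (!notifications.empty())
    {
        ++processed;
        // Strict FIFO: every queued route event is handled before the
        // BFD session-up notification, regardless of urgency.
        if (notifications.front() == "bfd_session_state_up")
            std::cout << "BFD session-up handled as event #" << processed << '\n';
        notifications.pop();
    }
    return 0;
}
```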
On a scale setup with 5 FCs, we observe that it may take up to 12 minutes from docker start for orchagent to process all BFD session-up messages.
If BGP sessions are kept in a down state during the first ~3 minutes of docker bring-up, the BFD session-up messages are handled on time. If BGP is started after that, session bring-up and route learning happen properly.
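One way to approximate that workaround from the CLI (a sketch only; the thread does not state the exact commands or timer used, and the 180-second delay is just an example value):

```
# Sketch of the workaround described above, assuming the standard SONiC CLI.
config bgp shutdown all     # keep BGP neighbors down while BFD sessions come up
sleep 180                   # roughly the first ~3 minutes after docker start
config bgp startup all      # then let the iBGP sessions establish
```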
This GitHub issue is to find and implement a better way of handling the BFD and BGP sessions on chassis-packet.
Steps to reproduce the issue:
Describe the results you received:
Describe the results you expected:
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):