[BGP] starting BGP service after swss #12381

Merged (1 commit) on Oct 13, 2022


yxieca (Contributor) commented Oct 12, 2022

Why I did it

The BGP service has always started after interface-config. However, we recently discovered an issue where some BGP sessions fail to establish because the BGP daemon cannot read the interface IP.

This issue was clearly observed after upgrading to FRR 8.2.2. See more details in #12380.

How I did it

Delaying BGP startup until after swss appears to work around this issue.

Note, however, that this delay might affect warm reboot timing and other timing sequences.

This workaround reduces the probability of hitting the issue by close to 100X. However, testing shows that it is not bulletproof. A proper FRR fix is still preferable, after which this change should be reverted.

How to verify it

Issue config reload repeatedly and check the BGP session status after each reload.
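
The verification loop described above could be automated roughly as follows. This is only a sketch under stated assumptions, not the script used for this PR: the helper names, the settle time, and the use of `vtysh -c "show bgp summary json"` (whose per-peer `state` field is assumed to follow FRR's JSON summary layout) are illustrative choices.

```python
import json
import subprocess
import time
from typing import List


def non_established_peers(summary_json: str) -> List[str]:
    """Return the peers whose state is not 'Established'.

    Assumes the layout of FRR's `show bgp summary json`: a dict of address
    families, each holding a "peers" dict keyed by neighbor address.
    """
    summary = json.loads(summary_json)
    bad = set()
    for family in summary.values():
        if not isinstance(family, dict):
            continue
        for peer, info in family.get("peers", {}).items():
            if info.get("state") != "Established":
                bad.add(peer)
    return sorted(bad)


def reload_and_check(iterations: int, settle_seconds: float = 180.0) -> int:
    """Run config reload repeatedly; return how many rounds left a peer down.

    Hypothetical helper: settle_seconds is an assumed value, tune per platform.
    """
    failures = 0
    for _ in range(iterations):
        subprocess.run(["sudo", "config", "reload", "-y"], check=True)
        time.sleep(settle_seconds)
        out = subprocess.run(
            ["vtysh", "-c", "show bgp summary json"],
            check=True, capture_output=True, text=True,
        ).stdout
        if non_established_peers(out):
            failures += 1
    return failures
```

Calling `reload_and_check(100)` on a test device and comparing the failure count with and without the change would give a rough estimate of how much the workaround helps.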

Signed-off-by: Ying Xie [email protected]

MSFT ADO: 24163872

BGP service has always been starting after interface-config.
However, recently we discovered an issue where some BGP sessions
are unable to establish due to BGP daemon not able to read the
interface IP.

This issue was only observed after upgrading to FRR 8.2.2.

Delaying starting BGP seems to be a workaround for this issue.

However, caution is that this delay might impact warm reboot timing
and other timing sequences.

Signed-off-by: Ying Xie <[email protected]>
@lguohan lguohan merged commit bc684fe into sonic-net:master Oct 13, 2022
@yxieca yxieca deleted the bgp_service branch October 13, 2022 16:33
yxieca added a commit that referenced this pull request Oct 13, 2022
@mssonicbld
Collaborator

@yxieca PR conflicts with 202012 branch

@yxieca
Contributor Author

yxieca commented Jun 5, 2023

Cherry-pick was done by PR #15312

yxieca pushed a commit that referenced this pull request Jun 8, 2023
Cherrypick #12381 into 202012

Reverts #15312

Work item tracking
Microsoft ADO (number only): 24163872
StormLiangMS pushed a commit that referenced this pull request Aug 15, 2024
Why I did it
PR #12381 made bgp start after swss.

With that change, bgp starts after swss but still before interface initialization completes, as the syslog below shows:

Jun 12 04:53:59.768546 bjw-can-7050qx-1 NOTICE root: Starting swss service...
...
Jun 12 04:54:12.725418 bjw-can-7050qx-1 NOTICE admin: Starting bgp service...
...
Jun 12 04:54:43.036682 bjw-can-7050qx-1 NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet0 oper state set from down to up
Jun 12 04:54:43.191143 bjw-can-7050qx-1 NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet4 oper state set from down to up
Jun 12 04:54:43.207343 bjw-can-7050qx-1 NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet12 oper state set from down to up
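
To make the ordering concrete, the gaps between these events can be computed from the timestamps. The snippet below is a small illustrative sketch; the helper name `parse_ts` and the placeholder year are assumptions, since the syslog lines carry no year.

```python
from datetime import datetime


def parse_ts(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS.ffffff' stamp of a syslog line."""
    stamp = " ".join(line.split()[:3])
    # The log has no year; 2000 is an arbitrary placeholder that only
    # matters for parsing, not for the differences computed below.
    return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S.%f")


swss_start = parse_ts("Jun 12 04:53:59.768546 bjw-can-7050qx-1 NOTICE root: Starting swss service...")
bgp_start = parse_ts("Jun 12 04:54:12.725418 bjw-can-7050qx-1 NOTICE admin: Starting bgp service...")
port_up = parse_ts("Jun 12 04:54:43.036682 bjw-can-7050qx-1 NOTICE swss#orchagent: :- updatePortOperStatus: Port Ethernet0 oper state set from down to up")

swss_to_bgp = (bgp_start - swss_start).total_seconds()
bgp_to_port_up = (port_up - bgp_start).total_seconds()

print(f"{swss_to_bgp:.3f}")     # 12.957
print(f"{bgp_to_port_up:.3f}")  # 30.311
```

In this capture, bgp starts about 13 seconds after swss but roughly 30 seconds before the first port comes up.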

Work item tracking
Microsoft ADO (number only): 26557087

How I did it
Check the interface status before starting bgp.
The wait times out after about 60s and outputs a warning message if the interface is still down.
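
The wait-with-timeout described above can be sketched as a simple polling loop. This is not the code from the commit: the function name, the `is_ready` callback, the poll interval, and the warning text are all hypothetical, and only the roughly 60s timeout and the "It took ... seconds" log wording come from this PR.

```python
import time
from typing import Callable


def wait_for_interfaces(
    is_ready: Callable[[], bool],
    timeout: float = 60.0,        # about 60s, per the description above
    poll_interval: float = 1.0,   # assumed value
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> bool:
    """Poll is_ready() until it returns True or the timeout expires.

    Returns True if the interface became ready, False on timeout.
    What the caller does after a timeout is left open here.
    """
    start = clock()
    while True:
        if is_ready():
            log(f"It took {clock() - start:f} seconds for interface to become ready")
            return True
        if clock() - start >= timeout:
            log(f"WARNING: interface still down after {timeout:g} seconds")
            return False
        sleep(poll_interval)
```

The `clock`, `sleep`, and `log` parameters exist only so the loop can be exercised without real delays; a start script would pass a real readiness check as `is_ready` and use the defaults for the rest.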

How to verify it
Build a debug image, boot it, and check the syslog and the bgp process.

syslog:1098:Jun 3 03:10:30.338071 str-a7060cx-acs-10 INFO bgp#root: [bgpd] It took 0.498398 seconds for interface to become ready
matiAlfaro pushed a commit to Marvell-switching/sonic-buildimage that referenced this pull request Aug 21, 2024

5 participants