evpn: fix evpn losing type-2 routes #2804

Closed
Tuetuopay

When fixing the EVPN MAC mobility complexity, the way destinations are indexed in the routing table changed from RD+ETAG+MAC+IP to only RD+MAC. This is incorrect per the BGP EVPN RFC. It works in most cases because, when an IP is present, virtually all EVPN implementations announce two paths: one with the IP and one without. Route announcements are therefore balanced and cause no issues.
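
To illustrate the indexing problem, here is a minimal sketch of how the two key shapes behave. It does not use GoBGP's actual types or its `tableKey` code: `macIPRoute`, `shortKey`, `fullKey` and `countDestinations` are hypothetical names made up for this example, and the route table is reduced to a plain map.

```go
package main

import "fmt"

// macIPRoute is a hypothetical, simplified stand-in for an EVPN type-2
// (MAC/IP advertisement) route. It is not GoBGP's actual type.
type macIPRoute struct {
	RD   string
	ETag uint32
	MAC  string
	IP   string // empty when the route carries no overlay IP
}

// shortKey indexes a route by RD+MAC only.
func shortKey(r macIPRoute) string {
	return r.RD + "|" + r.MAC
}

// fullKey indexes a route by RD+ETAG+MAC+IP.
func fullKey(r macIPRoute) string {
	return fmt.Sprintf("%s|%d|%s|%s", r.RD, r.ETag, r.MAC, r.IP)
}

// countDestinations returns how many distinct table entries the given
// routes occupy when indexed with the given key function.
func countDestinations(routes []macIPRoute, key func(macIPRoute) string) int {
	table := make(map[string]macIPRoute)
	for _, r := range routes {
		table[key(r)] = r
	}
	return len(table)
}

func main() {
	// The same MAC announced twice: once without and once with an IP.
	routes := []macIPRoute{
		{RD: "65000:100", ETag: 0, MAC: "aa:bb:cc:dd:ee:ff"},
		{RD: "65000:100", ETag: 0, MAC: "aa:bb:cc:dd:ee:ff", IP: "10.0.0.1"},
	}
	fmt.Println(countDestinations(routes, shortKey)) // prints 1
	fmt.Println(countDestinations(routes, fullKey))  // prints 2
}
```

With the RD+MAC key, the two announcements share a single destination entry, whereas the full key keeps them as separate destinations.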

Issues arise when GoBGP is connected to multiple peers announcing the same routes (read: route reflectors), at a high rate, with lots of routes (hundreds of thousands), and when multiple paths exist for the same MAC (e.g. with and without an overlay IP address). The issue does not appear if any of these four conditions is false.

Under these conditions, processing becomes racy and, over time, some routes go missing due to concurrent updates. Such missing routes have been observed in a production setup with:

  • hundreds of thousands of routes
  • tens of updates per second
  • four route reflectors

With this setup, we ended up with a handful of routes missing (usually 10 to 20) after a few days of runtime.

This commit reverts the custom tableKey implementation introduced previously and goes back to using the plain String view of the prefix. Note that this is suboptimal performance-wise, but it is correct.

Fixes: c393f43 ("evpn: fix quadratic evpn mac-mobility handling")

Sorry for introducing this bug in the first place.

@fujita

fujita commented May 21, 2024

pushed, thanks.

@fujita fujita closed this May 21, 2024