Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bgpcfgd]: Dynamic BBR support #5626

Merged
merged 4 commits into from
Oct 22, 2020

Conversation

pavel-shirshov
Copy link
Contributor

@pavel-shirshov pavel-shirshov commented Oct 14, 2020

- Why I did it
To introduce dynamic support of BBR functionality into bgpcfgd.
BBR is adding `neighbor PEER_GROUP allowas-in 1' for all BGP peer-groups which points to T0
Now we can add and remove this configuration based on CONFIG_DB entry

- How I did it
I introduced a new CONFIG_DB entry:

  • table name: "BGP_BBR"
  • key value: "all". Currently only "all" is supported, which means that all peer-groups which points to T0s will be updated
  • data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled"

Initially, when bgpcfgd starts, it reads initial BBR status values from the constants.yml. Then you can control BBR status by changing "BGP_BBR" table in the CONFIG_DB (see examples below).

bgpcfgd knows what peer-groups to change fron constants.yml. The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd got a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value).

- How to verify it
Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6:

admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1

Then we apply following configuration to the db:

admin@str-s6100-acs-1:~$ cat disable.json                
{
        "BGP_BBR": {
            "all": {
                "status": "disabled"
            }
        }
}


admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w 

The log output are:

Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))'
Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'.
Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.

Check FRR configuraiton and see that no allowas parameters are there:

admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' 
admin@str-s6100-acs-1:~$

Then we apply enabling configuration back:

admin@str-s6100-acs-1:~$ cat enable.json 
{
        "BGP_BBR": {
            "all": {
                "status": "enabled"
            }
        }
}

admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w 

The log output:

Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))'
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'.
Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.

Check FRR configuraiton and see that the BBR configuration is back:

admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1

*** The test coverage ***
Below is the test coverage

---------- coverage: platform linux2, python 2.7.12-final-0 ----------
Name                             Stmts   Miss  Cover
----------------------------------------------------
bgpcfgd/__init__.py                  0      0   100%
bgpcfgd/__main__.py                  3      3     0%
bgpcfgd/config.py                   78     41    47%
bgpcfgd/directory.py                63     34    46%
bgpcfgd/log.py                      15      3    80%
bgpcfgd/main.py                     51     51     0%
bgpcfgd/manager.py                  41     23    44%
bgpcfgd/managers_allow_list.py     385     21    95%
bgpcfgd/managers_bbr.py             76      0   100%
bgpcfgd/managers_bgp.py            193    193     0%
bgpcfgd/managers_db.py               9      9     0%
bgpcfgd/managers_intf.py            33     33     0%
bgpcfgd/managers_setsrc.py          45     45     0%
bgpcfgd/runner.py                   39     39     0%
bgpcfgd/template.py                 64     11    83%
bgpcfgd/utils.py                    32     24    25%
bgpcfgd/vars.py                      1      0   100%
----------------------------------------------------
TOTAL                             1128    530    53%

- Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@pavel-shirshov
Copy link
Contributor Author

I see the test issue. I'll fix it ASAP.

@lguohan
Copy link
Collaborator

lguohan commented Oct 14, 2020

can you fix the test?

@lguohan
Copy link
Collaborator

lguohan commented Oct 14, 2020

have you check the bgp update timer? are those prefixes withdrawn immediately?

@lguohan
Copy link
Collaborator

lguohan commented Oct 20, 2020

does enable/disable bbr have traffic impact?

rv = self.cfg_mgr.push_list(cmds)
if not rv:
log_crit("BBRMgr::can't apply configuration")
return True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always return True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. False means save it in the queue, and give it back to me next time,
True means - remove it completely.
In this case I don't see any reason the change could be applied without any issue in the future


def __restart_peers(self):
""" Restart peer-groups which support BBR """
for peer_group in sorted(self.bbr_enabled_pgs.keys()):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why sort here? any reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise my test doesn't know what to expect (or I can use conversion to sets) and compare sets. I'm going to remove it after the batching change

m.enabled = is_enabled
prepare_config_return_value = [
["vtysh", "-c", "clear bgp peer-group PEER_V4 soft in"],
["vtysh", "-c", "clear bgp peer-group PEER_V6 soft in"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the description, when you enable/disable bbr, you only modify the peer group configuration for the peer group for T0. However, here you seems restart PEER_V4/V6 which is the peer group for T0 as well. Do we need to restart all peer groups, or only T0 peer groups?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the public we have PEER_V4 and PEER_V6, They both work for T2, T1, and T0 peers. So here I'll restart them as T0. There is no way to distinct T1 from T0 (no neighbor meta in the public enabled by requests from community). But internal configs has PEERs which a designed to be used only for T0. In our internal case we restart only required peer groups.

if 'bbr' in self.constants['bgp'] and \
'enabled' in self.constants['bgp']['bbr'] and \
self.constants['bgp']['bbr']['enabled']:
self.bbr_enabled_pgs = self.__read_pgs()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"BBR is adding `neighbor PEER_GROUP allowas-in 1' for all BGP peer-groups which points to T0"

can you point me code where you identify if a peer group points to T0 or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I use file constants

' address-family ipv4',
' no neighbor PEER_V4 allowas-in 1',
' address-family ipv6',
' no neighbor PEER_V4 allowas-in 1',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little bit confused here, why do we have PEER_V4 in address-family ipv6?

Copy link
Contributor Author

@pavel-shirshov pavel-shirshov Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for misleading here, I added this just for a test:
As you can see from the source data, PEER_V4 works for both ipv4 and ipv6

                "PEER_V4": ["ipv4", "ipv6"],
                "PEER_V6": ["ipv6"],

PEER_V4:
- ipv4
PEER_V6:
- ipv6
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use device metadata role to identify if a peer is a tor or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I don't have neighbor metadata in public. That was disabled by multiple requests from the community. Without this change I can implement the change in internal, but not in the public. But we want to test the feature on public too.
  2. I need a way to define, what address families are used by each peer-group for BBR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this adds additional maintaining cost. it is better to check if neighbor metadata is available or not, if it is available, then use it. otherwise, look for bbr definition. do you agree?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree here, sorry

  1. We need to define mapping between address families and peer-groups. We can't extract this information from the minigraph, it is available only in templates.
  2. Currently bgpcfgd doesn't know anything about the device neighbor metadata, also as T0, T1, T2 and so on. That was requested explicitly by community. So I add an option to depend on the neighbor metadata presence, but I enable that option only for internal image. All other operations with T0, T1 and so on is done inside of the internal templates.
  3. We need to test this functionality using the public image, so we need a way to define this without neighbor meta
  4. Current approach allows us to have T0 peer-groups, which doesn't support BBR.
  5. My goal is to make templates downloadable from external source, like we download acl rules. I think of having a tar archive with templates + metadata file in the archive describing the metadata of the templates.

@pavel-shirshov
Copy link
Contributor Author

@lguohan No traffic impact, I use 'soft' reset to update RIB

@pavel-shirshov pavel-shirshov merged commit c94f93f into sonic-net:master Oct 22, 2020
abdosi pushed a commit that referenced this pull request Oct 30, 2020
**- Why I did it**
To introduce dynamic support of BBR functionality into bgpcfgd.
BBR is adding  `neighbor PEER_GROUP allowas-in 1' for all BGP peer-groups which points to T0
Now we can add and remove this configuration based on CONFIG_DB entry 

**- How I did it**
I introduced a new CONFIG_DB entry:
 - table name: "BGP_BBR"
 - key value: "all". Currently only "all" is supported, which means that all peer-groups which points to T0s will be updated
 - data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled"

Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing "BGP_BBR" table in the CONFIG_DB (see examples below).

bgpcfgd knows what peer-groups to change fron [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd got a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value).

**- How to verify it**
Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1
```

Then we apply following configuration to the db:
```
admin@str-s6100-acs-1:~$ cat disable.json                
{
        "BGP_BBR": {
            "all": {
                "status": "disabled"
            }
        }
}


admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w 
```
The log output are:
```
Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))'
Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'.
Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```

Check FRR configuraiton and see that no allowas parameters are there:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' 
admin@str-s6100-acs-1:~$
```

Then we apply enabling configuration back:
```
admin@str-s6100-acs-1:~$ cat enable.json 
{
        "BGP_BBR": {
            "all": {
                "status": "enabled"
            }
        }
}

admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w 
```
The log output:
```
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))'
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'.
Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```


Check FRR configuraiton and see that the BBR configuration is back:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1
```

*** The test coverage ***
Below is the test coverage
```
---------- coverage: platform linux2, python 2.7.12-final-0 ----------
Name                             Stmts   Miss  Cover
----------------------------------------------------
bgpcfgd/__init__.py                  0      0   100%
bgpcfgd/__main__.py                  3      3     0%
bgpcfgd/config.py                   78     41    47%
bgpcfgd/directory.py                63     34    46%
bgpcfgd/log.py                      15      3    80%
bgpcfgd/main.py                     51     51     0%
bgpcfgd/manager.py                  41     23    44%
bgpcfgd/managers_allow_list.py     385     21    95%
bgpcfgd/managers_bbr.py             76      0   100%
bgpcfgd/managers_bgp.py            193    193     0%
bgpcfgd/managers_db.py               9      9     0%
bgpcfgd/managers_intf.py            33     33     0%
bgpcfgd/managers_setsrc.py          45     45     0%
bgpcfgd/runner.py                   39     39     0%
bgpcfgd/template.py                 64     11    83%
bgpcfgd/utils.py                    32     24    25%
bgpcfgd/vars.py                      1      0   100%
----------------------------------------------------
TOTAL                             1128    530    53%
```

**- Which release branch to backport (provide reason below if selected)**

- [ ] 201811
- [x] 201911
- [x] 202006
wangxin added a commit to sonic-net/sonic-mgmt that referenced this pull request Nov 25, 2020
What is the motivation for this PR?
The BGP Bounce Back Routing (BBR) feature has been introduced into SONiC.
This change is to implement a test script for testing this BGP BBR feature.

How did you do it?
This test requires T1 type topology. Added 3 test cases:

Test Case1
* Enable BBR, announce a route to the first T0 VM. The route has DUT asn in aspath.
* Verify that the route is accepted by DUT and propagated to other VMs.

Test Case2
* Enable BBR, announce a route to the first T0 VM. The route has two DUT asn in aspath.
* Verify that the route is not accepted by DUT and not propagated to other VMs.

Test Case3
* Disable BBR, announce a route to the first T0 VM. The route has DUT asn in aspath.
* Verify that the route is not accepted by DUT and not propagated to other VMs.
* Enable BBR again after test.

How did you verify/test it?
Test run the script on vs setup running t1-lag topology and 201911 image.

Any platform specific information?
No

Supported testbed topology if it's a new test case?
t1 type topology

Documentation
sonic-net/sonic-buildimage#5626

Signed-off-by: Xin Wang <[email protected]>
santhosh-kt pushed a commit to santhosh-kt/sonic-buildimage that referenced this pull request Feb 25, 2021
**- Why I did it**
To introduce dynamic support of BBR functionality into bgpcfgd.
BBR is adding  `neighbor PEER_GROUP allowas-in 1' for all BGP peer-groups which points to T0
Now we can add and remove this configuration based on CONFIG_DB entry 

**- How I did it**
I introduced a new CONFIG_DB entry:
 - table name: "BGP_BBR"
 - key value: "all". Currently only "all" is supported, which means that all peer-groups which points to T0s will be updated
 - data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled"

Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing "BGP_BBR" table in the CONFIG_DB (see examples below).

bgpcfgd knows what peer-groups to change fron [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd got a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value).

**- How to verify it**
Initially, when we start SONiC FRR has BBR enabled for PEER_V4 and PEER_V6:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1
```

Then we apply following configuration to the db:
```
admin@str-s6100-acs-1:~$ cat disable.json                
{
        "BGP_BBR": {
            "all": {
                "status": "disabled"
            }
        }
}


admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w 
```
The log output are:
```
Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))'
Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'.
Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```

Check FRR configuraiton and see that no allowas parameters are there:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' 
admin@str-s6100-acs-1:~$
```

Then we apply enabling configuration back:
```
admin@str-s6100-acs-1:~$ cat enable.json 
{
        "BGP_BBR": {
            "all": {
                "status": "enabled"
            }
        }
}

admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w 
```
The log output:
```
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))'
Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'.
Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'.
Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'.
```


Check FRR configuraiton and see that the BBR configuration is back:
```
admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas'
  neighbor PEER_V4 allowas-in 1
  neighbor PEER_V6 allowas-in 1
```

*** The test coverage ***
Below is the test coverage
```
---------- coverage: platform linux2, python 2.7.12-final-0 ----------
Name                             Stmts   Miss  Cover
----------------------------------------------------
bgpcfgd/__init__.py                  0      0   100%
bgpcfgd/__main__.py                  3      3     0%
bgpcfgd/config.py                   78     41    47%
bgpcfgd/directory.py                63     34    46%
bgpcfgd/log.py                      15      3    80%
bgpcfgd/main.py                     51     51     0%
bgpcfgd/manager.py                  41     23    44%
bgpcfgd/managers_allow_list.py     385     21    95%
bgpcfgd/managers_bbr.py             76      0   100%
bgpcfgd/managers_bgp.py            193    193     0%
bgpcfgd/managers_db.py               9      9     0%
bgpcfgd/managers_intf.py            33     33     0%
bgpcfgd/managers_setsrc.py          45     45     0%
bgpcfgd/runner.py                   39     39     0%
bgpcfgd/template.py                 64     11    83%
bgpcfgd/utils.py                    32     24    25%
bgpcfgd/vars.py                      1      0   100%
----------------------------------------------------
TOTAL                             1128    530    53%
```

**- Which release branch to backport (provide reason below if selected)**

- [ ] 201811
- [x] 201911
- [x] 202006
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants