Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a check for ensuring mirror session ACLs are programmed to ASIC #3333

Merged
merged 5 commits into from
Jun 3, 2024

Conversation

ryanzhu706
Copy link
Contributor

@ryanzhu706 ryanzhu706 commented May 21, 2024

Description
Add a check for ensuring mirror session ACLs are programmed to ASIC

What is the issue?
This fix is to address an issue where an ACL is added to CONFIG_DB, but before it could be programmed to ASIC, Orchagent is paused.
This leads to APPLY_VIEW failure when base image OA could not process this ACL entry and target image's OA still creates it.
The issue has an image fix available at sonic-net/sonic-sairedis#1240
This issue is very rare, and has been caught by upgrade path tests only once in thousands of iterations.

What is this fix?
A new logic is added to check if mirror session ACLs for arp and nd are added to ASIC..
ACLs are looked into ASIC_DB and matched using SAI_ACL_ENTRY_ATTR_PRIORITY attribute.
SAI_ACL_ENTRY_ATTR_PRIORITY for arp ACL is 8888 and for nd is 8887
If one of the ACLs is found missing then warmboot is aborted.

Tested:

### arp/test_wr_arp.py result:
=========================================================================== warnings summary ===========================================================================
../../../../usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236
  /usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
    "class": algorithms.Blowfish,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
---------------------------------------------- generated xml file: /var/src/sonic-mgmt-int/tests/logs/arp/test_wr_arp.xml ----------------------------------------------
------------------------------------------------------------------------ live log sessionfinish ------------------------------------------------------------------------
19:08:59 __init__.pytest_terminal_summary         L0067 INFO   | Can not get Allure report URL. Please check logs
=============================================================== 2 passed, 1 warning in 858.02s (0:14:18) ===============================================================
INFO:root:Can not get Allure report URL. Please check logs

### manually warm-reboot result:
Fri May 31 08:38:28 PM UTC 2024 Starting warm-reboot
Fri May 31 08:38:31 PM UTC 2024 Saving counters folder before warmboot...
Fri May 31 08:38:42 PM UTC 2024 Loading kernel without secure boot
Fri May 31 08:38:44 PM UTC 2024 Setting up control plane assistant: 10.1.2.3 ...
Fri May 31 08:38:48 PM UTC 2024 Checking if mirror session ACLs (arp, nd) programmed to ASIC successfully
Fri May 31 08:38:48 PM UTC 2024 Mirror session ACLs (arp, nd) programmed to ASIC successfully

ERROR: There are port channels/peer devices that failed the probe: ['PortChannel102', 'PortChannel101', 'PortChannel103', 'PortChannel104']
Fri May 31 08:38:53 PM UTC 2024 Warning: Retry count feature support unknown for one or more neighbor devices; assuming that it's not available
Fri May 31 08:38:53 PM UTC 2024 Pausing orchagent ...

Copy link

linux-foundation-easycla bot commented May 21, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

scripts/fast-reboot Show resolved Hide resolved
scripts/fast-reboot Outdated Show resolved Hide resolved
scripts/fast-reboot Outdated Show resolved Hide resolved
scripts/.fast-reboot.swp Outdated Show resolved Hide resolved
scripts/fast-reboot Outdated Show resolved Hide resolved
scripts/fast-reboot Outdated Show resolved Hide resolved
@vaibhavhd vaibhavhd marked this pull request as ready for review May 30, 2024 00:00
@ryanzhu706 ryanzhu706 requested review from saiarcot895 and yxieca May 30, 2024 18:06
vaibhavhd
vaibhavhd previously approved these changes May 31, 2024
Copy link
Contributor

@vaibhavhd vaibhavhd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test the change and update the description before PR merge

@saiarcot895
Copy link
Contributor

You'll also need to fix up the email addresses in the commits (or squash the commits) for EasyCLA to pass.

@vaibhavhd vaibhavhd merged commit 676ebe4 into sonic-net:master Jun 3, 2024
7 checks passed
arfeigin pushed a commit to arfeigin/sonic-utilities that referenced this pull request Jun 16, 2024
…onic-net#3333)

Description
Add a check for ensuring mirror session ACLs are programmed to ASIC

What is the issue?
This fix is to address an issue where an ACL is added to CONFIG_DB, but before it could be programmed to ASIC, Orchagent is paused.
This leads to APPLY_VIEW failure when base image OA could not process this ACL entry and target image's OA still creates it.
The issue has an image fix available at sonic-net/sonic-sairedis#1240
This issue is very rare, and has been caught by upgrade path tests only once in thousands of iterations.

What is this fix?
A new logic is added to check if mirror session ACLs for arp and nd are added to ASIC..
ACLs are looked into ASIC_DB and matched using SAI_ACL_ENTRY_ATTR_PRIORITY attribute.
SAI_ACL_ENTRY_ATTR_PRIORITY for arp ACL is 8888 and for nd is 8887
If one of the ACLs is found missing then warmboot is aborted.

Tested on physical testbed running 202311 and master
@ayurkiv-nvda
Copy link
Contributor

ayurkiv-nvda commented Sep 17, 2024

Hello @vaibhavhd
We found that in while loop we will fail after 1 iteration if ACL entries were not found in redis
I suggest continuing retry cycle.
Please take a look at suggested solution:

#3548

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants