[master] swss encounters error after clear_stats fail with SAI_STATUS_NOT_SUPPORTED #9261

Closed
vaibhavhd opened this issue Nov 15, 2021 · 6 comments

Comments

@vaibhavhd
Contributor

Description

SWSS remains in a bad state after a cold reboot (or config reload).
From the logs: clear_stats calls fail with SAI_STATUS_NOT_SUPPORTED: Unknown or unsupported stat type 20.

Ultimately, swss stays in a bad state, with TIMEOUT responses:
Encountered failure in get operation, SAI API: SAI_API_QUEUE, status: SAI_STATUS_FAILURE

This issue is hit on multiple platforms; dx010 and 7260 are the ones I checked.

Steps to reproduce the issue:

  1. Install the latest master image, then cold reboot into the new image.
  2. Check logs for errors.

Describe the results you received:

clear_stats failure: syslog and sairedis combined:

2021-11-15.11:49:53.803944|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000005d7|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=
Nov 15 11:49:53.935212 str2-dx010-acs-7 ERR syncd#syncd: [none] SAI_API_BUFFER:brcm_sai_clear_buffer_pool_stats:1923 Unknown or unsupported stat type 20.
2021-11-15.11:49:53.938617|Q|clear_stats|SAI_STATUS_NOT_SUPPORTED
2021-11-15.11:49:53.938767|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000005d8|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=
Nov 15 11:49:53.938928 str2-dx010-acs-7 NOTICE swss#orchagent: :- generateBufferPoolWatermarkCounterIdList: Clear watermark failed on egress_lossless_pool, rv: SAI_STATUS_NOT_SUPPORTED
2021-11-15.11:51:02.532152|Q|clear_stats|SAI_STATUS_FAILURE
...
...
...
Nov 15 11:50:24.383980 str2-dx010-acs-7 ERR syncd#syncd: :- threadFunction: time span WD exceeded 30443 ms for clear_stats:SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000005d8
Nov 15 11:50:24.384313 str2-dx010-acs-7 ERR syncd#syncd: :- logEventData: op: clear_stats, key: SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000005d8
Nov 15 11:50:24.384580 str2-dx010-acs-7 ERR syncd#syncd: :- logEventData: fv: SAI_BUFFER_POOL_STAT_WATERMARK_BYTES: 
Nov 15 11:50:24.384832 str2-dx010-acs-7 ERR syncd#syncd: :- logEventData: fv: SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES: 
..
...
Nov 15 11:53:02.930165 str2-dx010-acs-7 ERR swss#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse
Nov 15 11:53:02.930684 str2-dx010-acs-7 ERR swss#orchagent: :- wait: failed to get response for getresponse
Nov 15 11:53:02.931254 str2-dx010-acs-7 ERR swss#orchagent: :- getQueueTypeAndIndex: Failed to get queue type and index for queue 5910974510924590 rv:-1
Nov 15 11:53:02.931813 str2-dx010-acs-7 ERR swss#orchagent: :- handleSaiGetStatus: Encountered failure in get operation, SAI API: SAI_API_QUEUE, status: SAI_STATUS_FAILURE
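
The sairedis lines above can be paired up to see which clear_stats request stalled. Below is a minimal sketch, assuming (from the excerpt only) that a `q` record is a request, that the next `Q` record is its response, and that the timestamp layout is as shown in the recording. The function name `pair_clear_stats` is hypothetical and not part of any SONiC tool.

```python
from datetime import datetime

# Timestamp layout as it appears in the sairedis lines quoted in this issue.
TS_FORMAT = "%Y-%m-%d.%H:%M:%S.%f"

def pair_clear_stats(lines):
    """Pair each 'q|clear_stats' request with the next 'Q|clear_stats' response.

    Returns a list of (object_key, status, elapsed_seconds) tuples.
    """
    results = []
    pending = None  # (timestamp, object key) of the request awaiting a response
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) < 4 or fields[2] != "clear_stats":
            continue  # not a clear_stats record (e.g. a syslog line)
        ts = datetime.strptime(fields[0], TS_FORMAT)
        if fields[1] == "q":
            pending = (ts, fields[3])
        elif fields[1] == "Q" and pending is not None:
            sent, key = pending
            results.append((key, fields[3], (ts - sent).total_seconds()))
            pending = None
    return results

# The four sairedis records quoted in the report above.
sample = [
    "2021-11-15.11:49:53.803944|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000005d7|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=",
    "2021-11-15.11:49:53.938617|Q|clear_stats|SAI_STATUS_NOT_SUPPORTED",
    "2021-11-15.11:49:53.938767|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000005d8|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=",
    "2021-11-15.11:51:02.532152|Q|clear_stats|SAI_STATUS_FAILURE",
]

for key, status, seconds in pair_clear_stats(sample):
    print(f"{key} -> {status} after {seconds:.3f}s")
```

On these records the first pool returns SAI_STATUS_NOT_SUPPORTED within a fraction of a second, while the second request (oid:0x180000000005d8) only comes back with SAI_STATUS_FAILURE more than a minute later, which matches the watchdog message for that object.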

Describe the results you expected:

Output of show version:

OS version:

SONiC Software Version: SONiC.master.51044-dirty-20211114.174651
Distribution: Debian 11.1
Kernel: 5.10.0-8-2-amd64
Build commit: df12ac5ab
Build date: Sun Nov 14 18:07:24 UTC 2021
Built by: AzDevOps@sonic-build-workers-000W6O

SAI version:

# docker exec -it syncd dpkg -s libsaibcm | head
Package: libsaibcm
Status: install ok installed
Priority: extra
Section: libs
Installed-Size: 264272
Maintainer: Guohan Lu <[email protected]>
Architecture: amd64
Source: saibcm
Version: 6.0.0.10-1
Provides: libsai

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@vaibhavhd
Contributor Author

This is a Broadcom-only issue. CSP pending.

@gechiang gechiang self-assigned this Nov 20, 2021
@gechiang
Collaborator

@vaibhavhd Other than DX010 and 7260, are there other platforms on which you have noticed this same issue?
This info will help narrow down the issue with BRCM support.

@gechiang
Collaborator

Filed BRCM case CS00012219613: [6.0] threadFunction: time span WD exceeded 30335 ms for clear_stats:SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000005d8

@bingwang-ms
Contributor

This issue also causes the main loop in orchdaemon to hang for a long time, so entries in CONFIG_DB are not consumed.
As a result, almost all config commands have no effect.
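
The effect described in this comment can be illustrated with a toy model of a single-threaded loop. This is only a sketch and not orchagent code: the 30443 ms cost is taken from the watchdog log in the report, while the task names, the 5 ms costs, and the 1000 ms deadline are made-up values for illustration.

```python
from collections import deque

def run_loop(tasks, deadline_ms):
    """Process (name, cost_ms) tasks in order on one simulated thread.

    Returns the names of tasks that could only start after deadline_ms,
    plus the total simulated time spent.
    """
    clock_ms = 0
    late = []
    queue = deque(tasks)
    while queue:
        name, cost_ms = queue.popleft()
        if clock_ms > deadline_ms:
            late.append(name)
        # A synchronous call blocks the whole loop for its full duration.
        clock_ms += cost_ms
    return late, clock_ms

tasks = [
    ("clear_stats on buffer pool", 30443),  # duration reported by the syncd watchdog
    ("CONFIG_DB entry A", 5),               # hypothetical cost
    ("CONFIG_DB entry B", 5),               # hypothetical cost
]
late, total_ms = run_loop(tasks, deadline_ms=1000)
print(late, total_ms)
```

In this model both CONFIG_DB entries start only after the stalled call returns, which is the same shape as config commands appearing to have no effect while the loop is blocked.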

@gechiang
Collaborator

@bingwang-ms and @vaibhavhd Please wait for the next master image that contains BRCM SAI 6.0.0.13 to re-evaluate this issue.
This new SAI contains the necessary fix for this issue. Without it, none of the BRCM-based platforms can run a master image that is based on SAI 6.0. That PR was merged just a few hours ago.
If you still see issues with a master image that is based on SAI 6.0.0.13, please alert me so I can take another look.
Thanks!

@vaibhavhd
Contributor Author

This issue is not seen on the latest master images:

2022-01-31.17:48:22.014202|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000014d5|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=
2022-01-31.17:48:22.291775|Q|clear_stats|SAI_STATUS_NOT_SUPPORTED
2022-01-31.17:48:22.291891|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000014d6|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=
2022-01-31.17:48:22.293145|Q|clear_stats|SAI_STATUS_NOT_SUPPORTED
2022-01-31.17:48:22.293245|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000014d7|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=
2022-01-31.17:48:22.297398|Q|clear_stats|SAI_STATUS_SUCCESS
2022-01-31.18:32:13.382904|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000014d5|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=
2022-01-31.18:32:13.384068|Q|clear_stats|SAI_STATUS_NOT_SUPPORTED
2022-01-31.18:32:13.384124|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000014d6|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=
2022-01-31.18:32:13.384714|Q|clear_stats|SAI_STATUS_NOT_SUPPORTED
2022-01-31.18:32:13.384757|q|clear_stats|SAI_OBJECT_TYPE_BUFFER_POOL:oid:0x180000000014d7|SAI_BUFFER_POOL_STAT_WATERMARK_BYTES=|SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES=
2022-01-31.18:32:13.385567|Q|clear_stats|SAI_STATUS_SUCCESS
# docker exec -it syncd dpkg -s libsaibcm | head
Package: libsaibcm
Status: install ok installed
Priority: extra
Section: libs
Installed-Size: 309859
Maintainer: Guohan Lu <[email protected]>
Architecture: amd64
Source: saibcm
Version: 6.0.0.13
Provides: libsai
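
To check whether a given image already carries the fixed SAI, the `Version:` field of the `dpkg -s libsaibcm` output can be compared against 6.0.0.13, the version named earlier in this thread. The sketch below is a simplified check: the helper names are hypothetical, and it compares only the dotted numeric part, ignoring any Debian revision suffix such as `-1`, so it is not a full Debian version comparison.

```python
def parse_dpkg_version(dpkg_output):
    """Return the value of the 'Version:' field from `dpkg -s` output, or None."""
    for line in dpkg_output.splitlines():
        if line.startswith("Version:"):
            return line.split(":", 1)[1].strip()
    return None

def version_tuple(version):
    """Turn '6.0.0.10-1' into (6, 0, 0, 10); the Debian revision is dropped."""
    upstream = version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))

def has_fix(dpkg_output, minimum="6.0.0.13"):
    """True if the installed libsaibcm version is at least `minimum`."""
    version = parse_dpkg_version(dpkg_output)
    return version is not None and version_tuple(version) >= version_tuple(minimum)

# Version fields copied from the two dpkg outputs quoted in this issue.
old_image = "Package: libsaibcm\nStatus: install ok installed\nVersion: 6.0.0.10-1\n"
new_image = "Package: libsaibcm\nStatus: install ok installed\nVersion: 6.0.0.13\n"

print(has_fix(old_image), has_fix(new_image))
```

Comparing tuples of integers avoids the string-comparison trap where "6.0.0.9" would sort after "6.0.0.13".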
