
[sflow] System crashes once sFlow is enabled on a switch with 200G+ interfaces #6793

Open
Hedgehog-Guru opened this issue Feb 16, 2021 · 8 comments
Labels
Triaged this issue has been triaged

Comments

@Hedgehog-Guru

Description

If the switch has interfaces running at 200G or above, the system crashes after sFlow is enabled.

Steps to reproduce the issue:

  1. Enable the sflow feature:
config feature state sflow enabled
  2. Make sure at least one interface is oper-up and has a speed of 200G or above:
config interface speed Ethernet24 200000
show interfaces status Ethernet24
  Interface        Lanes    Speed    MTU    FEC    Alias    Vlan    Oper    Admin             Type    Asym PFC
-----------  -----------  -------  -----  -----  -------  ------  ------  -------  ---------------  ----------
 Ethernet24  24,25,26,27     200G   9100    N/A     etp7  routed    down       up  QSFP28 or later         N/A
  3. Enable sFlow:
config sflow enable
  4. Check system health, for example by running "pgrep orchagent".

Describe the results you received:

System crashed

Describe the results you expected:

The system keeps running stably.

Output of show version:

SONiC Software Version: SONiC.SONIC.202012.10-d26a4af_Internal
Distribution: Debian 10.7
Kernel: 4.19.0-9-2-amd64
Build commit: d26a4aff
Build date: Thu Feb  4 15:28:36 UTC 2021
Built by: sw-r2d2-bot@r-build-sonic-ci02

Platform: x86_64-mlnx_msn3700-r0
HwSKU: ACS-MSN3700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1852X03965
Uptime: 17:46:35 up 12 min,  1 user,  load average: 0.07, 0.84, 0.72

Additional information you deem important (e.g. issue happens only occasionally):

[sonic_dump_qa-anconda-test10_20210216_173758.tar.gz](https://github.com/Azure/sonic-buildimage/files/5989890/sonic_dump_qa-anconda-test10_20210216_173758.tar.gz)

@prsunny
Contributor

prsunny commented Feb 16, 2021

@padmanarayana , @dgsudharsan , could you please take a look and suggest next steps?

@anshuv-mfst

Issue Triage 2/17: Dell team to provide input on the issue, thanks!

@anshuv-mfst anshuv-mfst added the Triaged this issue has been triaged label Feb 17, 2021
@liat-grozovik
Collaborator

@padmanarayana kind reminder

@padmanarayana
Contributor

@liat-grozovik: the dump is from an internal build. Nevertheless, it is very likely that 200G fails because there is no 200G entry in either https://github.com/Azure/sonic-swss/blob/288fb40d8ff4ec825645c2fbab1e79f50881a9f2/cfgmgr/sflowmgr.cpp#L13 or https://github.com/Azure/sonic-swss/blob/288fb40d8ff4ec825645c2fbab1e79f50881a9f2/cfgmgr/sflowmgr.h#L14. We'll check and get back.
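To illustrate the suspected failure mode, here is a minimal Python sketch. It is not the actual C++ in sflowmgr; `SPEED_RATE_MAP` and `default_sample_rate` are hypothetical names, and the table values assume a default rate of speed / 10 (consistent with the 20000 default later chosen for 200G). A lookup that falls back to an empty string on a miss silently returns "" for a 200G port, and per the fix in this thread, an empty string programmed into APP-DB is what leads to the orchagent failure.

```python
# Hypothetical port-speed (Mb/s) -> default sFlow sampling-rate table.
# Values assume rate = speed / 10. "200000" is deliberately absent to
# mirror the missing 200G entry suspected above.
SPEED_RATE_MAP = {
    "10000": "1000",
    "25000": "2500",
    "40000": "4000",
    "50000": "5000",
    "100000": "10000",
}


def default_sample_rate(speed: str) -> str:
    """Return the default sampling rate for a port speed.

    Falling back to "" on a miss means any speed missing from the
    table silently produces an empty sampling rate.
    """
    return SPEED_RATE_MAP.get(speed, "")


if __name__ == "__main__":
    for speed in ("100000", "200000"):
        print(f"{speed} -> {default_sample_rate(speed)!r}")
```

Running it prints a valid rate for 100G and an empty string for 200G.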

@GarrickHe
Contributor

@Hedgehog-Guru - We don't have a 200G interface. Could we provide a patch for you to build and re-test on your end?

Thanks,
Garrick

@liat-grozovik
Collaborator

liat-grozovik commented Mar 8, 2021 via email

@vadymhlushko-mlnx
Contributor

@GarrickHe kind reminder: are there any updates?

liat-grozovik pushed a commit to sonic-net/sonic-swss that referenced this issue Mar 29, 2021
- What I did
Added a 200G entry to the speed-rate map. Also handled the case in which an empty string is programmed into APP-DB, which leads to an orchagent failure.
The default sampling rate for 200G is set to 20000.

- Why I did it
Fix for Issue: sonic-net/sonic-buildimage#6793

- How I verified it
Ran the sflow community test under sonic-mgmt.

Co-authored-by: Vivek Reddy Karri <[email protected]>
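The two changes in this commit can be sketched as follows. This is a minimal Python illustration, not the actual C++ patch: the 200G to 20000 entry comes from the commit message, while the other table values (assumed to be speed / 10), the names `SPEED_RATE_MAP` and `sample_rate_fields`, and the `sample_rate` field name are assumptions. The guard returns `None` for an unlisted speed so the caller skips the APP-DB write instead of programming an empty string.

```python
from typing import Dict, Optional

# Hypothetical speed (Mb/s) -> default sampling-rate table with the new
# 200G entry. Only the 200G value comes from the commit message; the
# others assume rate = speed / 10.
SPEED_RATE_MAP: Dict[str, str] = {
    "10000": "1000",
    "25000": "2500",
    "40000": "4000",
    "50000": "5000",
    "100000": "10000",
    "200000": "20000",  # 200G entry added by the fix
}


def sample_rate_fields(speed: str) -> Optional[Dict[str, str]]:
    """Build the (assumed) APP-DB fields for a port, or None to skip it.

    An unknown speed no longer yields an empty sampling rate; the caller
    is expected to skip the write when None is returned.
    """
    rate = SPEED_RATE_MAP.get(speed)
    if not rate:
        return None
    return {"sample_rate": rate}


if __name__ == "__main__":
    for speed in ("200000", "800000"):  # "800000" stands in for any unlisted speed
        fields = sample_rate_fields(speed)
        if fields is None:
            print(f"{speed}: no default rate, skipping APP-DB write")
        else:
            print(f"{speed}: programming {fields}")
```

Guarding at the point of the write covers any future speed that is missing from the table, not just 200G.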
daall pushed a commit to sonic-net/sonic-swss that referenced this issue Apr 1, 2021
@vivekrnv
Contributor

This issue can be closed.

raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this issue Oct 5, 2021