Incorrect rate values in counters_db rates table populated through rates lua script #8392

Closed
dgsudharsan opened this issue Aug 9, 2021 · 1 comment

Comments

@dgsudharsan
Collaborator

dgsudharsan commented Aug 9, 2021

Description

The RIF rate and port rate calculations have a couple of issues.

  1. The counterpoll command specifies the polling interval in milliseconds, while the Lua script expects the delta (time difference) argument in seconds. Converting should therefore divide by 1000, but syncd multiplies by 1000 instead, which I think is wrong. (The rates would come out on the order of 10^6 times too small.)
    https://github.com/Azure/sonic-sairedis/blob/master/syncd/FlexCounter.cpp#L1600

  2. Based on the HLD, the rates are calculated from the difference between the old and new counter values. However, in the Lua script the _last counter values are never updated after initialization, which I also think is wrong. To calculate a rate, we should take the counter values before and after the delta and divide their difference by the delta.
    https://github.com/Azure/sonic-swss/blob/4f1d726d4cbf8a283b22cd5f612cf03ca21a27b3/orchagent/rif_rates.lua
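
The two points above can be illustrated with a minimal Python sketch. This is not the syncd or rif_rates.lua code: the function names, the 1000 ms interval, and the counter samples are all hypothetical, chosen only to show the effect of multiplying instead of dividing and of a baseline that is never refreshed.

```python
def delta_seconds(interval_ms: float) -> float:
    """Convert a polling interval in milliseconds to seconds."""
    return interval_ms / 1000.0


def delta_as_described(interval_ms: float) -> float:
    """The conversion the issue describes: multiplying instead of dividing."""
    return interval_ms * 1000.0


def rates(samples, delta, refresh_last=True):
    """Per-interval rates from cumulative counter samples.

    With refresh_last=False the baseline stays at the first sample,
    mimicking a _last value that is never updated.
    """
    last = samples[0]
    out = []
    for cur in samples[1:]:
        out.append((cur - last) / delta)
        if refresh_last:
            last = cur
    return out


interval_ms = 1000                 # hypothetical polling interval
samples = [0, 5000, 10000, 15000]  # hypothetical packet counter, steady 5000 pps

good = rates(samples, delta_seconds(interval_ms))
wrong_units = rates(samples, delta_as_described(interval_ms))
stale = rates(samples, delta_seconds(interval_ms), refresh_last=False)

print(good)         # [5000.0, 5000.0, 5000.0]
print(wrong_units)  # [0.005, 0.005, 0.005]
print(stale)        # [5000.0, 10000.0, 15000.0]
```

In this toy setup the unit mix-up shrinks every rate by a factor of 10^6, and the stale baseline makes the computed value drift away from the true 5000 pps with every poll. How the real script combines these effects (for example, any smoothing it applies) is not modelled here.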

Steps to reproduce the issue:

  1. Enable RIF counters through counterpoll
  2. Check counters DB for RIF rates in RATES table

Describe the results you received:

I compared the CLI output with the Lua script output while a server sent a flood ping at 5000 pps. The CLI with a period (-p) calculates the rates accurately.

root@r-bulldog-03:~# show interfaces  counters rif -p 5
The rates are calculated within 5 seconds period
     IFACE    RX_OK       RX_BPS     RX_PPS    RX_ERR    TX_OK    TX_BPS    TX_PPS    TX_ERR
----------  -------  -----------  ---------  --------  -------  --------  --------  --------
Ethernet0        0     0.00 B/s     0.00/s         0        0  0.00 B/s    0.00/s         0
Ethernet28   29,894  501.05 KB/s  5964.85/s         1        0  0.00 B/s    0.00/s         0

LUA rates

127.0.0.1:6379[2]> hgetall "RATES:oid:0x6000000000517"
1) "SAI_ROUTER_INTERFACE_STAT_IN_OCTETS_last"
2) "17856552"
3) "SAI_ROUTER_INTERFACE_STAT_IN_PACKETS_last"
4) "212578"
5) "SAI_ROUTER_INTERFACE_STAT_OUT_OCTETS_last"
6) "0"
7) "SAI_ROUTER_INTERFACE_STAT_OUT_PACKETS_last"
8) "0"
9) "RX_BPS"
10) "67.774654124880783"
11) "RX_PPS"
12) "0.80684145386762807"
13) "TX_BPS"
14) "0"
15) "TX_PPS"
16) "0"
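
As a quick arithmetic check on the two outputs above, the following sketch uses only the values shown for Ethernet28 and for RATES:oid:0x6000000000517, assuming (as the report implies) that both describe the same RIF. The variable names are illustrative and not taken from any SONiC code.

```python
# Values copied from the CLI output (Ethernet28) and the RATES hash above.
cli_rx_pps = 5964.85
lua_rx_bps = 67.774654124880783
lua_rx_pps = 0.80684145386762807
in_octets_last = 17856552
in_packets_last = 212578

# Average packet size implied by each pair of numbers.
bytes_per_packet_rates = lua_rx_bps / lua_rx_pps
bytes_per_packet_counters = in_octets_last / in_packets_last

# How far the Lua packet rate falls short of the CLI packet rate.
shortfall = cli_rx_pps / lua_rx_pps

print(round(bytes_per_packet_rates, 2))
print(round(bytes_per_packet_counters, 2))
print(round(shortfall))
```

Both pairs imply roughly 84 bytes per packet, so RX_BPS and RX_PPS are consistent with each other. What is off is their common scale, which is several thousand times below the CLI figure. That pattern fits a problem in the time or baseline handling rather than in the counters themselves, although this check alone does not prove the cause.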

Describe the results you expected:

Rates should be comparable to the rate of packets sent.

Output of show version:

SONiC Software Version: SONiC.master.168-8a48be9b7_Internal
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: 8a48be9b7
Build date: Mon Jul 26 20:54:20 UTC 2021
Built by: sw-r2d2-bot@r-build-sonic-ci02-242

Platform: x86_64-mlnx_msn3420-r0
HwSKU: ACS-MSN3420
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2019X13878
Model Number: MSN3420-CB2FO
Hardware Revision: A1
Uptime: 17:55:59 up  3:09,  1 user,  load average: 0.23, 0.23, 0.35

Docker images:
REPOSITORY                                 TAG                             IMAGE ID            SIZE
harbor.mellanox.com/sonic-p4/p4-sampling   0.2.0                           d00b8650205c        745MB
docker-dhcp-relay                          latest                          eca8bac3e2d7        420MB
docker-syncd-mlnx                          latest                          dc8034f91dd8        963MB
docker-syncd-mlnx                          master.168-8a48be9b7_Internal   dc8034f91dd8        963MB
docker-sflow                               latest                          d8dce2f2a223        425MB
docker-sflow                               master.168-8a48be9b7_Internal   d8dce2f2a223        425MB
docker-snmp                                latest                          e1a20eb3e413        454MB
docker-snmp                                master.168-8a48be9b7_Internal   e1a20eb3e413        454MB
docker-fpm-frr                             latest                          cb4a6788026c        443MB
docker-fpm-frr                             master.168-8a48be9b7_Internal   cb4a6788026c        443MB
docker-teamd                               latest                          12aab6e73048        425MB
docker-teamd                               master.168-8a48be9b7_Internal   12aab6e73048        425MB
docker-platform-monitor                    latest                          45a45d29a112        740MB
docker-platform-monitor                    master.168-8a48be9b7_Internal   45a45d29a112        740MB
docker-router-advertiser                   latest                          0cc63e524e68        413MB
docker-router-advertiser                   master.168-8a48be9b7_Internal   0cc63e524e68        413MB
docker-lldp                                latest                          a9b9f625a902        453MB
docker-lldp                                master.168-8a48be9b7_Internal   a9b9f625a902        453MB
docker-database                            latest                          725e585d4761        413MB
docker-database                            master.168-8a48be9b7_Internal   725e585d4761        413MB
docker-orchagent                           latest                          e8b718a59001        443MB
docker-orchagent                           master.168-8a48be9b7_Internal   e8b718a59001        443MB
docker-nat                                 latest                          d38dfb77ae3c        427MB
docker-nat                                 master.168-8a48be9b7_Internal   d38dfb77ae3c        427MB
docker-macsec                              latest                          f57528b827f1        428MB
docker-macsec                              master.168-8a48be9b7_Internal   f57528b827f1        428MB
docker-sonic-telemetry                     latest                          f2dfbb5cf85e        501MB
docker-sonic-telemetry                     master.168-8a48be9b7_Internal   f2dfbb5cf85e        501MB
docker-sonic-mgmt-framework                latest                          49d507e169b0        569MB
docker-sonic-mgmt-framework                master.168-8a48be9b7_Internal   49d507e169b0        569MB
docker-sonic-restapi                       latest                          981f766bb707        357MB
docker-sonic-restapi                       master.168-8a48be9b7_Internal   981f766bb707        357MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@dgsudharsan
Collaborator Author

@abdosi FYI.
@noaOrMlnx Please tag this issue to the fix.

qiluo-msft pushed a commit to sonic-net/sonic-sairedis that referenced this issue Sep 2, 2021
#878)

According to https://github.com/Azure/SONiC/blob/ec6d35dd2c28491bfade19cfee990fe612f1e5e9/doc/rates-and-utilization/Rates_and_utilization_HLD.md, counterpoll command gives polling interval in milliseconds. 
So when converting it to seconds for the Lua script, it should be divided by 1000. However, syncd multiplies it by 1000.

Removed the multiplication and moved the conversion into the Lua script - sonic-net/sonic-swss#1855
Fixed issue - sonic-net/sonic-buildimage#8392
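
One way to read the change described above is that the interval now reaches the script unchanged, in milliseconds, and the script scales the result to a per-second rate itself. The Python sketch below illustrates that reading only. The function name and the sample numbers are hypothetical, and it is not the actual rif_rates.lua code.

```python
def rate_per_second(counter_now: int, counter_last: int, interval_ms: float) -> float:
    """Counter difference scaled to a per-second rate, given an interval in ms."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return (counter_now - counter_last) / interval_ms * 1000.0


# Hypothetical example: 25,000 packets counted over a 5000 ms interval.
print(rate_per_second(25_000, 0, 5000))  # 5000.0
```

Doing the conversion in one place, next to the division, makes it harder for the caller and the script to disagree about the unit.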
qiluo-msft pushed a commit to sonic-net/sonic-swss that referenced this issue Sep 3, 2021
… MS (#1855)

**What I did**
Update the rif_rates/lua script to multiply by 1000 instead of FlexCounter class.
related also to this PR - sonic-net/sonic-sairedis#878

Fix issue - sonic-net/sonic-buildimage#8392
**Why I did it**
times were not calculated properly.
**How I verified it**
check that the output of cli and redis is close enough:
judyjoseph pushed a commit to sonic-net/sonic-swss that referenced this issue Sep 14, 2021
… MS (#1855)

**What I did**
Update the rif_rates/lua script to multiply by 1000 instead of FlexCounter class.
related also to this PR - sonic-net/sonic-sairedis#878

Fix issue - sonic-net/sonic-buildimage#8392
**Why I did it**
times were not calculated properly.
**How I verified it**
check that the output of cli and redis is close enough:
judyjoseph pushed a commit to sonic-net/sonic-sairedis that referenced this issue Oct 5, 2021
#878)

According to https://github.com/Azure/SONiC/blob/ec6d35dd2c28491bfade19cfee990fe612f1e5e9/doc/rates-and-utilization/Rates_and_utilization_HLD.md, counterpoll command gives polling interval in milliseconds. 
So when converting it to seconds for the Lua script, it should be divided by 1000. However, syncd multiplies it by 1000.

Removed the multiplication and moved the conversion into the Lua script - sonic-net/sonic-swss#1855
Fixed issue - sonic-net/sonic-buildimage#8392
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this issue Oct 5, 2021
… MS (sonic-net#1855)

**What I did**
Update the rif_rates/lua script to multiply by 1000 instead of FlexCounter class.
related also to this PR - sonic-net/sonic-sairedis#878

Fix issue - sonic-net/sonic-buildimage#8392
**Why I did it**
times were not calculated properly.
**How I verified it**
check that the output of cli and redis is close enough:
pettershao-ragilenetworks pushed a commit to pettershao-ragilenetworks/sonic-sairedis that referenced this issue Nov 18, 2022
sonic-net#878)

According to https://github.com/Azure/SONiC/blob/ec6d35dd2c28491bfade19cfee990fe612f1e5e9/doc/rates-and-utilization/Rates_and_utilization_HLD.md, counterpoll command gives polling interval in milliseconds. 
So when converting it to seconds for the Lua script, it should be divided by 1000. However, syncd multiplies it by 1000.

Removed the multiplication and moved the conversion into the Lua script - sonic-net/sonic-swss#1855
Fixed issue - sonic-net/sonic-buildimage#8392