
sonic-CLI 'show priority-group drop counters' crashes with key-error #19779

Open
amitpawar12 opened this issue Aug 2, 2024 · 14 comments
Labels
NOKIA Triaged this issue has been triaged


@amitpawar12

Description

The SONiC CLI command "show priority-group drop counters" crashes with a KeyError, especially after the counters have been cleared.

@saksarav-nokia

Steps to reproduce the issue:

  1. Clear the priority-group drop counters:
admin@ixre-egl-board73:/tmp/cache/pg-drop/0$ sudo ip netns exec asic0 sonic-clear priority-group drop counters 
Cleared PG drop counter

  2. Run the CLI command to query the priority-group drop counters:
admin@ixre-egl-board73:/tmp/cache/pg-drop/0$ sudo ip netns exec asic0 show priority-group drop counters 
Traceback (most recent call last):
  File "/usr/local/bin/pg-drop", line 272, in <module>
    main()
  File "/usr/local/bin/pg-drop", line 265, in main
    pgdropstat.print_all_stat(COUNTER_TABLE_PREFIX, "pg_drop" )
  File "/usr/local/bin/pg-drop", line 174, in print_all_stat
    data = self.get_counters(table_prefix, type["obj_map"][port], type["idx_func"], type["counter_name"])
  File "/usr/local/bin/pg-drop", line 154, in get_counters
    old_collected_data = port_drop_ckpt.get(name,{})[full_table_id] if len(port_drop_ckpt) > 0 else 0
KeyError: 'COUNTERS:oid:0x1a0000000001ac'

  3. Tried the CLI on a different line card, and it works fine there on both ASICs, even after clearing and checking again. However, the following sequence reproduces the issue:
admin@ixre-egl-board74:~$ sudo sonic-clear priority-group drop counters
COUNTERS_PORT_NAME_MAP is empty!

admin@ixre-egl-board74:~$ sudo ip netns exec asic0 show priority-group drop counters
Ingress PG dropped packets:
         Port    PG0    PG1    PG2    PG3    PG4    PG5    PG6    PG7
-------------  -----  -----  -----  -----  -----  -----  -----  -----
    Ethernet0      0      0      0      0      0      0      0      0
    Ethernet8      0      0      0      0      0      0      0      0
   Ethernet16      0      0      0      0      0      0      0      0
   Ethernet24      0      0      0      0      0      0      0      0
   Ethernet32      0      0      0      0      0      0      0      0
   Ethernet40      0      0      0      0      0      0      0      0
   Ethernet48      0      0      0      0      0      0      0      0
   Ethernet56      0      0      0      0      0      0      0      0
   Ethernet64      0      0      0      0      0      0      0      0
   Ethernet72      0      0      0      0      0      0      0      0
   Ethernet80      0      0      0      0      0      0      0      0
   Ethernet88      0      0      0      0      0      0      0      0
   Ethernet96      0      0      0      0      0      0      0      0
  Ethernet104      0      0      0      0      0      0      0      0
  Ethernet112      0      0      0      0      0      0      0      0
  Ethernet120      0      0      0      0      0      0      0      0
  Ethernet128      0      0      0      0      0      0      0      0
  Ethernet136      0      0      0      0      0      0      0      0
 Ethernet-IB0      0      0      0      0      0      0      0      0
Ethernet-Rec0      0      0      0      0      0      0      0      0
admin@ixre-egl-board74:~$ sudo ip netns exec asic1 show priority-group drop counters
Traceback (most recent call last):
  File "/usr/local/bin/pg-drop", line 272, in <module>
    main()
  File "/usr/local/bin/pg-drop", line 265, in main
    pgdropstat.print_all_stat(COUNTER_TABLE_PREFIX, "pg_drop" )
  File "/usr/local/bin/pg-drop", line 174, in print_all_stat
    data = self.get_counters(table_prefix, type["obj_map"][port], type["idx_func"], type["counter_name"])
  File "/usr/local/bin/pg-drop", line 154, in get_counters
    old_collected_data = port_drop_ckpt.get(name,{})[full_table_id] if len(port_drop_ckpt) > 0 else 0
KeyError: 'COUNTERS:oid:0x11a0000000001ac'

Describe the results you received:

  1. If the show CLI works at the ASIC level, clearing the counters without specifying an ASIC should not be supported. Clearing the counters without an ASIC seems to be what causes this issue.
  2. The CLI crashes when queried within an ASIC namespace.
  3. The workaround is to delete the counter file in /tmp/cache/pg-drop/0; the CLI then works, but does not return the latest values.

Describe the results you expected:

  1. The correct drop counters are returned every time.


@saksarav-nokia
Contributor

I will debug.

@zjswhhh
Contributor

zjswhhh commented Aug 14, 2024

@saksarav-nokia - please provide an update. Thanks!

@zjswhhh zjswhhh added Triaged this issue has been triaged NOKIA labels Aug 14, 2024
@saksarav-nokia
Contributor

This seems to be a fundamental issue when the user runs sonic-clear and then show for these commands on a multi-ASIC switch. If you run sonic-clear for asic0, then for asic1, and then run show for asic0 (or sonic-clear for asic1, then for asic0, and then show for asic1), you will see this issue. The reason is that the sonic-clear command creates the cache file in /tmp/cache/pg-drop each time it runs, so the second clear overwrites the file written by the first one (asic0's or asic1's, depending on the order in which you run sonic-clear). When you then run show for asic0, it reads the file created for asic1 and looks up asic0's PG OID, which does not exist in that file, so the script fails with the KeyError.
Since this is a very basic issue that applies to all the clear commands on multi-ASIC systems, the fix needs to be discussed with the SONiC community.
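The overwrite sequence above can be reproduced in miniature. This is a simplified sketch, not the actual pg-drop script. It assumes the checkpoint is a JSON file keyed first by counter name and then by counter-table ID, mirroring the lookup shown in the traceback. The counter name PG_DROPPED_PACKETS and the helpers clear() and show() are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical counter name; the real script uses its own stat names.
COUNTER = "PG_DROPPED_PACKETS"


def clear(cache_file, counters):
    """Checkpoint the current counters, as a simplified sonic-clear would."""
    with open(cache_file, "w") as f:
        json.dump({COUNTER: counters}, f)


def show(cache_file, full_table_id, current):
    """Return the counter value relative to the last checkpoint."""
    with open(cache_file) as f:
        ckpt = json.load(f)
    # Same lookup pattern as in the traceback: KeyError if the OID is absent.
    old = ckpt.get(COUNTER, {})[full_table_id] if len(ckpt) > 0 else 0
    return current - old


# Toy snapshots; the OID strings are the ones seen in the tracebacks.
ASIC0_OID = "COUNTERS:oid:0x1a0000000001ac"
ASIC1_OID = "COUNTERS:oid:0x11a0000000001ac"

# A single cache file shared by every namespace.
cache = os.path.join(tempfile.mkdtemp(), "pg-drop")
clear(cache, {ASIC0_OID: 5})  # sonic-clear in asic0
clear(cache, {ASIC1_OID: 7})  # sonic-clear in asic1 overwrites asic0's checkpoint
try:
    show(cache, ASIC0_OID, 9)  # show in asic0
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'COUNTERS:oid:0x1a0000000001ac'
```

Because both namespaces write to the same file, whichever namespace was cleared last determines which OIDs the checkpoint contains.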

@saksarav-nokia
Contributor

@arlakshm @judyjoseph @vmittal-msft for visibility

@saksarav-nokia
Copy link
Contributor

If we add multi-ASIC support to the sonic-clear command for PG drop counters, the cache file can be created with a namespace prefix, and the show command can then read the corresponding history from the cache.
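A minimal sketch of this proposal, assuming a JSON checkpoint keyed by counter name and then by counter-table ID. The per-namespace file naming (for example asic0-pg-drop), the "default" fallback when no namespace is given, and the helper names are hypothetical, not taken from sonic-utilities.

```python
import json
import os
import tempfile

COUNTER = "PG_DROPPED_PACKETS"  # hypothetical counter name


def cache_path(cache_dir, namespace):
    """One checkpoint file per namespace (hypothetical naming scheme)."""
    return os.path.join(cache_dir, f"{namespace or 'default'}-pg-drop")


def clear(cache_dir, namespace, counters):
    """Checkpoint one namespace's counters without touching the others."""
    with open(cache_path(cache_dir, namespace), "w") as f:
        json.dump({COUNTER: counters}, f)


def show(cache_dir, namespace, full_table_id, current):
    """Return the counter relative to this namespace's own checkpoint, if any."""
    path = cache_path(cache_dir, namespace)
    if not os.path.isfile(path):
        return current  # never cleared in this namespace
    with open(path) as f:
        ckpt = json.load(f)
    # Extra guard in this sketch: a port missing from the checkpoint counts as never cleared.
    return current - ckpt.get(COUNTER, {}).get(full_table_id, 0)


cache_dir = tempfile.mkdtemp()
clear(cache_dir, "asic0", {"COUNTERS:oid:0x1a0000000001ac": 5})
clear(cache_dir, "asic1", {"COUNTERS:oid:0x11a0000000001ac": 7})
print(show(cache_dir, "asic0", "COUNTERS:oid:0x1a0000000001ac", 9))  # 4
```

Here clearing asic1 no longer overwrites asic0's checkpoint, so show for asic0 finds its own OID regardless of the order in which the namespaces were cleared.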

@vmittal-msft
Contributor

@kenneth-arista can you or your team please check this? It is related to multi-ASIC support for QoS commands.

@kenneth-arista
Contributor

We are looking into it and will update soon.

@kenneth-arista
Contributor

The solution is to not use ip netns exec before running CLI commands related to "priority-group drop counters", because native multi-ASIC support was recently added to this family of commands. Instead, use the built-in -n argument. See sonic-net/sonic-utilities#3058 for further details.

The reason is that ip netns exec ... confines the command to a single Linux network namespace, which conflicts with the default behavior of the multi_asic decorator used to add multi-ASIC support to existing commands. I believe folks have historically used ip netns exec as a hack to work around older commands that were not multi-ASIC aware, but we are putting effort into enhancing all QoS commands to support multi-ASIC natively. Tracking issue: #15148.
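To illustrate the idea, here is a hypothetical sketch of a namespace-aware command that takes the target ASIC from an explicit -n/--namespace option rather than from the network namespace it was launched in. It uses Python's argparse to stay self-contained; the real show and sonic-clear commands are built differently (through the multi_asic decorator mentioned above), and the hard-coded namespace list is an assumption.

```python
import argparse

# Assumed namespace list; a real system would discover these at runtime.
ALL_NAMESPACES = ["asic0", "asic1"]


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="pg-drop-sketch")
    parser.add_argument("-n", "--namespace", choices=ALL_NAMESPACES, default=None,
                        help="ASIC namespace to show; all namespaces if omitted")
    return parser.parse_args(argv)


def target_namespaces(args):
    """Pick namespaces from the CLI option instead of the caller's netns."""
    return [args.namespace] if args.namespace else list(ALL_NAMESPACES)


print(target_namespaces(parse_args(["-n", "asic0"])))  # ['asic0']
print(target_namespaces(parse_args([])))               # ['asic0', 'asic1']
```

In this design the command itself, not a shell wrapper, decides which namespaces it visits, so it can also select the matching per-namespace state.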

@rlhui rlhui assigned kenneth-arista and unassigned vmittal-msft Aug 21, 2024
@amitpawar12
Author

@kenneth-arista

Tested with a 202205 build and saw that the "show priority-group" CLI has no namespace support. Please let me know if I am missing something here.

Logs:

===========================================
-- No -n option with priority-group CLI
===========================================
admin@ixre-egl-board71:~$ sudo show priority-group -n asic0 drop counters
Usage: show priority-group [OPTIONS] COMMAND [ARGS]...
Try "show priority-group -h" for help.

Error: no such option: -n

admin@ixre-egl-board71:~$ sudo show priority-group -n asic0 drop -h
Usage: show priority-group [OPTIONS] COMMAND [ARGS]...
Try "show priority-group -h" for help.

Error: no such option: -n

===========================================
-- Help has NO info about the namespace option.
===========================================

admin@ixre-egl-board71:~$ sudo show priority-group -h
Usage: show priority-group [OPTIONS] COMMAND [ARGS]...

  Show details of the PGs

Options:
  -?, -h, --help  Show this message and exit.

Commands:
  drop                  Show priority-group
  persistent-watermark  Show priority-group persistent WM
  watermark             Show priority-group user WM

===========================================
-- Help after the 'drop' keyword also does not have any info about namespace.
===========================================
admin@ixre-egl-board71:~$ sudo show priority-group drop -h
Usage: show priority-group drop [OPTIONS] COMMAND [ARGS]...

  Show priority-group

Options:
  -?, -h, --help  Show this message and exit.

Commands:
  counters  Show dropped packets for priority-group

===========================================
-- Trying the namespace option at the end of the CLI
===========================================

admin@ixre-egl-board71:~$ sudo show priority-group drop counters --namespace asic0
Usage: show priority-group drop counters [OPTIONS]
Try "show priority-group drop counters -h" for help.

Error: no such option: --namespace

admin@ixre-egl-board71:~$ sudo show priority-group drop counters -n asic0
Usage: show priority-group drop counters [OPTIONS]
Try "show priority-group drop counters -h" for help.

Error: no such option: -n

===========================================
-- Trying the CLI without sudo
===========================================

admin@ixre-egl-board71:~$ show priority-group drop counters -n asic0
Usage: show priority-group drop counters [OPTIONS]
Try "show priority-group drop counters -h" for help.

Error: no such option: -n
admin@ixre-egl-board71:~$ show priority-group drop counters --namespace asic0
Usage: show priority-group drop counters [OPTIONS]
Try "show priority-group drop counters -h" for help.

Error: no such option: --namespace


==========================================
-- Clearing the counters
==========================================
admin@ixre-egl-board71:~$ sonic-clear priority-group drop counters -h
Usage: sonic-clear priority-group drop counters [OPTIONS]

  Clear priority-group dropped packets counter

Options:
  -h, -?, --help  Show this message and exit.
admin@ixre-egl-board71:~$ sudo sonic-clear priority-group drop counters -h
Usage: sonic-clear priority-group drop counters [OPTIONS]

  Clear priority-group dropped packets counter

Options:
  -?, -h, --help  Show this message and exit.

@kenneth-arista
Contributor

kenneth-arista commented Sep 4, 2024

202205 does not support the multi-ASIC version of this command. sonic-net/sonic-utilities#3058 has been merged to master; please test there.

@kenneth-arista kenneth-arista moved this to Done in SONiC Chassis Sep 4, 2024
@kenneth-arista
Contributor

Please close this issue as I don't have permission to do so.

@kenneth-arista kenneth-arista removed their assignment Sep 18, 2024
@amitpawar12
Author

@vmittal-msft, @kenneth-arista - I see that the following pull request is tagged for 202405 but does not seem to be merged yet:
sonic-net/sonic-utilities#3058

I tried this on 202405 and am still seeing the issue.

@vmittal-msft vmittal-msft self-assigned this Dec 5, 2024
@vmittal-msft
Contributor

@amitpawar12 @saksarav-nokia please retry, as the fix is now part of 202405. Please check without any "ip netns exec" prefix and see whether the standard SONiC commands can both clear and show the counters.

@kenneth-arista
Copy link
Contributor

@vmittal-msft although the change was merged into the 202405 branch of sonic-utilities, sonic-buildimage still needs a submodule update to pick it up.

Projects
Archived in project
Development

No branches or pull requests

5 participants