-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding global-timeout, individual command timeout, log files collection #1249
Conversation
retest default please |
retest this please |
1 similar comment
retest this please |
Thanks for enhancing the tech support collection. One suggestion I have is next time PRs like this can be split into multiple PRs with each PR having a specific fix or enhancement. This makes it easier to revert a PR when there's an issue, also the PR looks cleaner and easier on the reviewer. |
Have you raised or going to raise a PR for tech support unit tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to break this into multiple PRs, up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- As most of the review comments are regarding the multi_asic, I would like to keep all this in single PR.
- I have done manual testing
- I am adding some tests in sonic-mgmt for the changes
Thanks for making the changes. Looks good. Since you added new features, in case you are still working on writing unit tests for the new features, would like to see the output. For example.. the output of "global timeout" case and the "command timeout" case. I'm sure you've tested it, just 'copy/paste' the output. Regarding "I am adding some tests in sonic-mgmt for the changes" -- have you raised a PR or are working on it. Adding @qiluo-msft for another pair of eyes. Rest looks good to me. |
retest this please |
1 similar comment
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Please wait for one more approval.
@lguohan - Can this one merge now pls? |
local filepath="${LOGDIR}/$filename" | ||
local do_gzip=${3:-false} | ||
local tarpath="${BASE}/dump/$filename" | ||
local timeout_cmd="timeout --foreground ${TIMEOUT_MIN}m" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeout [](start = 23, length = 7)
bcmcmd
support timeout -t
. Could you use that instead of Could you use that and maybe together with timeout
command?timeout
command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qiluo-msft
We can't predict the behaviour of 'bcmcmd' in case syncd is crashed/stuck/in bad shape. The change you are suggesting might require very extensive testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'timeout' supersedes the '-t' option of 'bcmcmd'. Is the concern regarding 'bcmcmd' may end up in a bad state like leaving the socket open?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bcmcmd
is not well tested with signal termination. I am not sure.
In reply to: 550679331 [](ancestors = 550679331,546269345)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qiluo-msft
"Could you use that and maybe together with timeout command" --- The bcmcmd has a default timeout of 30 sec. If no argument is given using -t option, '-t 30' is automatically taken. So the outside 'timeout' command is being used together with bcmcmd timeout option.
root@sonic:/home/admin# bcmcmd
USAGE: bcmcmd [-f <sun_path>] -v <cmd>
-v verbose mode
-f domain socket filename, default /var/run/sswsyncd/sswsyncd.socket
-t timeout in seconds, default 30
RETURN VALUE:
0 success
5 socket io error
22 invalid argument
62 timeout
root@sonic:/home/admin#
scripts/generate_dump
Outdated
|
||
handle_signal() | ||
{ | ||
echo "Generate Dump received interrupt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echo [](start = 4, length = 4)
echo to stderr? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
scripts/generate_dump
Outdated
handle_signal() | ||
{ | ||
echo "Generate Dump received interrupt" | ||
$RM $V -rf $TARDIR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$RM $V [](start = 4, length = 6)
These variables should be defined before this function. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if [[ ( "$NUM_ASICS" > 1 ) ]]; then | ||
for (( i=0; i<$NUM_ASICS; i++ )) | ||
do | ||
local cmd="bcmcmd -n $i $1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bcmcmd -n [](start = 23, length = 9)
bcmcmd
does not have -n
option. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change was requested by @smaheshm.
bcmcmd has been modified to include the ASIC number. sonic-net/sonic-buildimage#4926
scripts/generate_dump
Outdated
@@ -135,7 +233,7 @@ get_vtysh_namespace() { | |||
else | |||
ns=" -n ${asic_id}" | |||
fi | |||
echo "$ns" | |||
#echo "$ns" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#echo "$ns" [](start = 4, length = 11)
The function stdout is changed. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the comment
save_vtysh "show bfd peers json" "frr.bfd.peers.json" | ||
save_vtysh "show bfd peers counters json" "frr.bfd.peers.counters.json" | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra blank line #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
scripts/generate_dump
Outdated
@@ -335,13 +512,19 @@ save_proc() { | |||
############################################################################### | |||
# Dumps all fields and values from given Redis DB. | |||
# Arguments: | |||
# DB id: id of DB for sonic-db-cli |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DB id [](start = 3, length = 5)
The option has nothing to do with DB id. Why add? Could you remove? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the comments appropriately
retest this please |
2 similar comments
retest this please |
retest this please |
@qiluo-msft - can this proceed to approve/merge now? All checks are passing and I believe review comments have been addressed. |
Pls merge |
…on (sonic-net#1249) and some other enhancements to techsupport **- What I did** Following is the brief description of the changes, - Adding a ‘--silent’ option to ‘show techsupport’ command. Various tar/untar, addition and removal logs appear on the console by default. This option would disable above logs. - Adding global and per-command timeouts. This would provide more user control on ‘show techsupport’ CLI. - Adding time profiling information for the commands in techsupport. Time profiling information would be part of the tarball and helps to analyse the time consumption per command. - Sometimes ‘syncd’ docker is down and bcmshell is unavailable. In such cases all the bcmcmd commands would timeout and result in tremendous increase in the total techsupport collection time. We provided an option to skip rest of the bcmcmd commands once one command times out. - Added ‘show services’, ‘show reboot-cause’ and various BGP, BFD, bcm shell and other commands - Optimised the /var/log files collection. If the number of files are large in /var/log folder, it takes a long time to add each individually to the tarball. If the folder is tar'ed at once, the time taken reduces significantly. - Following error was observed while tar'ing softlinks inside .etc folder. ** Tar append operation failed. Aborting for safety. ** This issue was due to softlinks present at /etc folder where the destination file is absent. Fixed this issue by deleting such softlinks before adding them to the tarball. **- How I did it** - Added new options to the CLICK command 'show techsupport' - Modified the 'generate_dump' script to accomodate other changes **- How to verify it** Here are some outputs, root@sonic:/home/admin# show techsupport --silent Techsupport is running with silent option. This command might take a long time. HW Mgmt dump script /usr/bin/hw-management-generate-dump.sh does not exist /var/dump/sonic_dump_sonic_20201117_161246.tar.gz root@sonic:/home/admin# root@sonic:~# show techsupport -h Usage: show techsupport [OPTIONS] Gather information for troubleshooting Options: --since TEXT Collect logs and core files since given date -g, --global-timeout INTEGER Global timeout for techsupport in minutes. Default 30 mins -c, --cmd-timeout INTEGER Command timeout for techsupport in minutes. Default 5 mins --verbose Enable verbose output --silent Run techsupport in silent mode -?, -h, --help Show this message and exit. root@sonic:~# **- Previous command output (if the output of a command-line utility has changed)** - Previous command "show techsupport" works as is **- New command output (if the output of a command-line utility has changed)**
and some other enhancements to techsupport
- What I did
Following is the brief description of the changes,
** Tar append operation failed. Aborting for safety. **
This issue was due to softlinks present at /etc folder where the destination file is absent. Fixed this issue by deleting such softlinks before adding them to the tarball.
- How I did it
- How to verify it
Here are some outputs,
root@sonic:/home/admin# show techsupport --silent
Techsupport is running with silent option. This command might take a long time.
HW Mgmt dump script /usr/bin/hw-management-generate-dump.sh does not exist
/var/dump/sonic_dump_sonic_20201117_161246.tar.gz
root@sonic:/home/admin#
root@sonic:~# show techsupport -h
Usage: show techsupport [OPTIONS]
Gather information for troubleshooting
Options:
--since TEXT Collect logs and core files since given date
-g, --global-timeout INTEGER Global timeout for techsupport in minutes.
Default 30 mins
-c, --cmd-timeout INTEGER Command timeout for techsupport in minutes.
Default 5 mins
--verbose Enable verbose output
--silent Run techsupport in silent mode
-?, -h, --help Show this message and exit.
root@sonic:~#
- Previous command output (if the output of a command-line utility has changed)
- New command output (if the output of a command-line utility has changed)