[techsupport] Removed interactive option for docker commands and Improved Error Reporting #1723
Signed-off-by: Vivek Reddy Karri <[email protected]>
…o ts_error_handle
scripts/generate_dump
local start_t=$(date +%s%3N)
local end_t=0
local cmd="$1"
local filename=$2
local filepath="${LOGDIR}/$filename"
local do_gzip=${3:-false}
local save_stderr=${4:-true}
local save_stderr=${4:-$SAVE_STDERR}
The only use case for stderr=False that I can see is the sonic-mgmt auto-techsupport test.
If this is set to false, any intermediate stderr output is redirected to the stderr of the techsupport script, and the test can be enhanced to capture that stderr. This gives us a single location to view the errors reported by any of the intermediate steps.
> The only use case for stderr=False that I can see is the sonic-mgmt auto-techsupport test.

Could you give a link to the usage in the sonic-mgmt repo?
No, it isn't there yet, but we plan to update the test once this gets merged.
local start_t=$(date +%s%3N)
local end_t=0
local docker=$1
local filename=$2
local dstpath=$3
local timeout_cmd="timeout --foreground ${TIMEOUT_MIN}m"
local touch_cmd="sudo docker exec -i ${docker} touch ${filename}"
But why do we need -i for a touch command? In fact, I've looked at the other docker exec commands used in this script. All of them just write to stdout, so I don't see why -i has to be retained.
Let me know what you think about this.
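The reviewer's point is that these docker exec calls only write to stdout, so they need no interactive flags at all. For comparison, here is a minimal sketch of a hypothetical alternative (not what this PR does): add -it only when the script's stdin really is a terminal. Docker rejects -i combined with -t when the client's stdin is not a TTY, which is the "input device is not a TTY" failure seen under sshpass. The helper name build_docker_exec and the container name syncd are illustrative assumptions; the helper only prints the command line, so it runs without Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of generate_dump: prints the docker exec
# command line it would run. Docker refuses -i together with -t when the
# client's stdin is not a terminal, so -it is added only for a real TTY.
build_docker_exec() {
    local docker="$1"; shift
    local -a args=(sudo docker exec)
    if [ -t 0 ]; then
        args+=(-it)      # interactive use: stdin is a terminal
    fi
    args+=("$docker" "$@")
    printf '%s\n' "${args[*]}"
}

# Non-interactive case (as under sshpass): stdin is /dev/null, so no -it.
build_docker_exec syncd touch /tmp/sdk-dump </dev/null
```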
8f2cc58
From what I understand, wouldn't that require adding an extra argument to every save_cmd invocation across the script? That would be fine if the argument were something like filename, which might change between invocations.
But since the scope of this flag is global, I think it makes sense to use it as the parameter's default value. Let me know if you think otherwise.
I understand your point. Would it make more sense to use only the global variable and remove $4?
The stderr argument was first introduced in #1335.
For some reason, it was set to false even in normal execution (probably because the author only wanted to collect stdout). I did not want to disturb that, so I retained the argument and, when it is not provided, read the value from the global variable.
Could you discuss this with @stepanblyschak, who is at the same company, to understand the original intention? To me, using a global variable as a default value is weird.
Confirmed with @stepanblyschak that the local variable $4 can be removed. I'll remove it and keep only the global variable.
DONE
@qiluo-msft, can you please review?
This PR could not be cleanly cherry-picked to 202012. Please submit another PR.
Raised a separate PR: #1833
This PR could not be cleanly cherry-picked to 202106. Please submit another PR.
What I did: Fix sonic-net/sonic-buildimage#8850. The issue was introduced by #1723, #1833, and #1843 (pending merge). The error_handler is a great idea, but the bash script needs to be error-free first. How I did it: Fix the bash script errors. How to verify it: Run the show techsupport test. Signed-off-by: Ying Xie <[email protected]>
@vivekreddynv @dgsudharsan The new behavior is that the script fails immediately if any command returns an error. In reality, however, this is not convenient. If there is an image bug, a command fails, or the script itself has a bug, we still need to collect as much data as possible. Could you improve this and let the subsequent commands continue running?
Hi @qiluo-msft, the script runs in its entirety (it creates the archive with all the dumps) even if some of the intermediate steps fail, and at the end it exits with rc=1. But I understand the issue seen in #1844: when a timeout happens, the script returns a non-zero exit code and bypasses the "Command timedout" error log. I believe that is fixed in the PR. Is exiting with a non-zero code not okay in the command-timed-out case?
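A minimal sketch of the behavior described in this reply, assuming GNU coreutils timeout (which exits with status 124 when the time limit is reached): each step runs under a time limit, a timeout is logged explicitly, the failure is remembered in RETURN_CODE, and the remaining steps still run. The run_step helper and its log message are hypothetical illustrations, not the actual generate_dump code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: log timeouts, remember failures, keep collecting.
RETURN_CODE=0

run_step() {
    local limit="$1"; shift
    timeout --foreground "$limit" "$@"
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        # GNU timeout reports an expired time limit with status 124
        echo "Command timed out: $*" >&2
    fi
    if [ "$rc" -ne 0 ]; then
        RETURN_CODE=1    # record the failure instead of exiting here
    fi
    return 0
}

run_step 0.2 sleep 2                      # hits the time limit
run_step 5 echo "next step still runs"    # collection continues
echo "final rc: $RETURN_CODE"             # a real script would exit with this
```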
@yxieca to check |
…#4409) What is the motivation for this PR? The show techsupport command test case is failing. The change came from sonic-net/sonic-utilities#1723. How did you do it? Updated the command format. How did you verify/test it? Ran the show_techsupport test. Signed-off-by: Ying Xie [email protected]
…#1844) What I did: Fix sonic-net/sonic-buildimage#8850. The issue was introduced by sonic-net#1723, sonic-net#1833, and sonic-net#1843 (pending merge). The error_handler is a great idea, but the bash script needs to be error-free first. How I did it: Fix the bash script errors. How to verify it: Run the show techsupport test. Signed-off-by: Ying Xie <[email protected]>
…oved Error Reporting (#1843) What I did: Fix sonic-net/sonic-buildimage#8850. The issue was introduced by #1723, #1833, and #1843 (pending merge). The error_handler is a great idea, but the bash script needs to be error-free first. How I did it: Fix the bash script errors. How to verify it: Run the show techsupport test. Signed-off-by: Ying Xie <[email protected]>
#### What I did
This PR includes some fixes that were missed while manually porting the error-reporting PR onto 202012 (#1833), i.e., removing the -it option from the docker exec commands. To understand why the -it option was removed, refer to #1723.

This also includes another fix that removes -d from the show ip interfaces command, which fails otherwise.

**Note:** The -d option for "show ip interfaces" works on master and 202106, but not on 202012, so this change is specific to 202012.

Master:
```
admin@sonic-master-imge:~$ show ip interfaces -d all
Interface        Master    IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
---------------  --------  -------------------  ------------  --------------  -------------
Loopback0                  10.1.0.32/32         up/up         N/A             N/A
PortChannel0001            10.0.0.56/31         up/up         ARISTA01T1      10.0.0.57
PortChannel0002            10.0.0.58/31         up/up         ARISTA02T1      10.0.0.59
PortChannel0003            10.0.0.60/31         up/up         ARISTA03T1      10.0.0.61
PortChannel0004            10.0.0.62/31         up/up         ARISTA04T1      10.0.0.63
Vlan1000                   192.168.0.1/21       up/up         N/A             N/A
docker0                    240.127.1.1/24       up/down       N/A             N/A
eth0                       10.75.206.180/24     up/up         N/A             N/A
lo                         127.0.0.1/16         up/up         N/A             N/A
```

202012:
```
admin@sonic-202012-image:~$ show ip interfaces -d all
Usage: show ip interfaces [OPTIONS]
Try "show ip interfaces -h" for help.
Error: no such option: -d
```

#### How I did it

#### How to verify it
- Run show tech-support and check the return status. It should be zero. (At least, it was on the Mellanox platform; I couldn't check the platform-specific functions.)
- Run the "show techsupport" test.
- What I did: This PR includes some fixes that were missed in #1723, i.e., removing the -t option from the docker exec commands. To understand why the -it option was removed, refer to #1723. Also, show techsupport now exits with $RETURN_CODE only when the --redirect-stderr option is used. Signed-off-by: Vivek Reddy Karri <[email protected]>
This commit could not be cleanly cherry-picked to 202012. Please submit another PR. |
…oved Error Reporting (sonic-net#1833) PR sonic-net#1723 cannot be cherry-picked directly to 202012, so a separate PR was raised.
Why I did it
Recently, a bug related to saisdkdump was seen, which showed up specifically when `show techsupport` was invoked. Although it was fixed, the sonic-mgmt test failed to catch it beforehand. This highlighted a few shortcomings of the `generate_dump` script. This PR addresses those, along with a few additional issues that were found; each fix is explained in the next section.
What I did
1) Remove the interactive option (-i) from the docker invocation commands
This is why the bug was not caught previously. When techsupport was invoked remotely (e.g., using sshpass), the `docker exec -it <docker> <cmd>` command would fail with "the input device is not a TTY". Hence, the -i option was removed.
2) Change the return code
Currently, the script doesn't return a non-zero exit code for most of the intermediate steps, even when they fail, which makes validation hard.
To handle this, a helper function and a trap command are used.
The helper sets the global variable RETURN_CODE when it is called, and that RETURN_CODE is returned when the `generate_dump` process exits. You may notice that the trap is set in multiple functions instead of once at the top of the script. This is because, by default, bash does not pass an ERR trap down to shell functions (only when errtrace, `set -E`, is enabled), so each function requires an explicit trap command.
When a command fails, this logic appends a corresponding log line to stderr, for example:
ERR: RC:-1 observed on line 729
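The trap-based approach can be sketched roughly as follows. This is a minimal illustration, not the actual generate_dump code: handle_error, collect_a and collect_b are hypothetical names, and only the RETURN_CODE variable and the shape of the log line come from the description above. Enabling errtrace once with `set -E` would be an alternative to repeating the trap in every function.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: record failures in RETURN_CODE without stopping.
RETURN_CODE=0

handle_error() {
    local rc=$1 line=$2
    echo "ERR: RC:${rc} observed on line ${line}" >&2
    RETURN_CODE=1
}

collect_a() {
    # ERR traps are not inherited by shell functions unless errtrace
    # (set -E) is enabled, so each function installs its own trap.
    trap 'handle_error $? $LINENO' ERR
    false                          # a failing intermediate step
    echo "collect_a continued"
}

collect_b() {
    trap 'handle_error $? $LINENO' ERR
    echo "collect_b ran"
}

collect_a
collect_b
echo "exiting with ${RETURN_CODE}"   # a real script would: exit "$RETURN_CODE"
```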
3) Improve error reporting for the save_cmd function
Currently, any error written to stderr by the intermediate calls is redirected to the same location as stdout, which is usually the file under the dump/ directory. This is perfectly fine, but the sonic-mgmt test only parses the text seen on stdout.
So, a new option (-r) is added to the `generate_dump` script, and subsequently to `show techsupport`, to redirect any intermediate errors to generate_dump's stderr. With this option enabled, these errors can be captured on stderr.
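A minimal sketch of how an opt-in -r flag could drive a global SAVE_STDERR setting that a save_cmd-style helper consults. SAVE_STDERR and the -r flag appear in this PR; parse_args and run_and_save are hypothetical stand-ins, and the real generate_dump argument parsing handles many more options.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: -r keeps stderr out of the dump files so that
# errors surface on the script's own stderr.
SAVE_STDERR=true    # default: stderr is saved alongside stdout

parse_args() {
    local opt
    OPTIND=1
    while getopts "r" opt; do
        case "$opt" in
            r) SAVE_STDERR=false ;;
            *) return 1 ;;
        esac
    done
}

run_and_save() {
    local cmd="$1" filepath="$2"
    if $SAVE_STDERR; then
        eval "$cmd" >"$filepath" 2>&1   # both streams go into the dump file
    else
        eval "$cmd" >"$filepath"        # only stdout is saved
    fi
}

parse_args -r
out_file=$(mktemp)
errors=$(run_and_save "echo data; echo 'boom' >&2" "$out_file" 2>&1)
echo "saved: $(cat "$out_file")"
echo "stderr: $errors"
```

With this split, a caller such as a test harness can read the collected errors from stderr while the dump files keep only the command output.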
4) Minor error in the sdk-dump collection logic handled
save_file is now called only for the files found in sdk_dump_path, not for directories. The reason is that `find /tmp/sdk-dumps` returns ["/tmp/sdk-dumps"] even if the directory is empty; in the next step, save_file was applied to the directory itself, which caused the error. The change described above handles this.
5) Minor error in the custom plugins logic handled
Added a condition to check whether the directory exists before proceeding.
Otherwise, the find command might fail saying
NOTE: The last two issues were found because of the newly added error-reporting logic.
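Both of the last two fixes can be illustrated with a short sketch. It assumes one possible implementation (`find -type f` to skip directories, and a `[ -d ]` check before searching); save_file here is a hypothetical stand-in that only records the paths it receives, and the function names are illustrative rather than taken from generate_dump.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of fixes 4 and 5.
saved=()
save_file() { saved+=("$1"); }     # stand-in: just record the path

collect_sdk_dumps() {
    local sdk_dump_path="$1" f
    # Without -type f, find also prints the directory itself, so an empty
    # directory would be handed to save_file.
    while IFS= read -r f; do
        save_file "$f"
    done < <(find "$sdk_dump_path" -type f)
}

collect_plugins() {
    local plugins_dir="$1"
    # Skip quietly when the directory is missing instead of letting find fail.
    if [ ! -d "$plugins_dir" ]; then
        return 0
    fi
    find "$plugins_dir" -type f
}

empty_dir=$(mktemp -d)
collect_sdk_dumps "$empty_dir"
echo "files saved from empty dir: ${#saved[@]}"
collect_plugins /nonexistent/plugins; echo "plugins rc: $?"
```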
How I did it
How to verify it
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)