From 1f3da955b96162ceed2bcd31b52a63cbad23bed4 Mon Sep 17 00:00:00 2001
From: Hua Liu <58683130+liuh-80@users.noreply.github.com>
Date: Fri, 21 Apr 2023 14:10:01 +0800
Subject: [PATCH] [S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert. (#14402) (#14755)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert.

This is a cherry-pick of PR: https://github.com/sonic-net/sonic-buildimage/pull/14402

#### Why I did it
On S6100, systemd sometimes fails to restart the serial-getty service automatically, so a monit unit checks the serial-getty service status and restarts it.

However, this monit check reports false alerts, because in most cases when serial-getty is not running, systemd restarts it successfully.

To avoid the false alerts, improve the monitor to wait and re-check before reporting a failure.

Steps to reproduce this issue:
1. Log in to the device via the console and keep the connection open.
2. Log in to the device via SSH and check serial-getty@ttyS1.service; it is running.
3. Run 'monit reload' from the SSH connection.
4. Check syslog 1 minute later; there will be a false alert: "'serial-getty' process is not running"

##### Work item tracking
- Microsoft ADO: 17424426

#### How I did it
Add a check-getty.sh script that re-checks later when the getty service is not running, and update the monit unit to check the serial-getty service status with this script to avoid false alerts.

#### How to verify it
All unit tests pass.
Manually verified that the fix works correctly:
```
admin@***:~$ sudo systemctl stop serial-getty@ttyS1.service
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
1
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
     Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
     Active: inactive (dead) since Tue 2023-03-28 07:15:21 UTC; 1min 13s ago
admin@***:~$ sudo /usr/local/bin/check-getty.sh
admin@***:~$ echo $?
0
admin@***:~$ sudo systemctl status serial-getty@ttyS1.service
● serial-getty@ttyS1.service - Serial Getty on ttyS1
     Loaded: loaded (/lib/systemd/system/serial-getty@.service; enabled-runtime; vendor preset: enabled)
```

syslog:
```
Mar 28 07:10:37.597458 *** INFO systemd[1]: serial-getty@ttyS1.service: Succeeded.
Mar 28 07:12:43.010550 *** ERR monit[593]: 'serial-getty' status failed (1) -- no output
Mar 28 07:12:43.010744 *** INFO monit[593]: 'serial-getty' trying to restart
Mar 28 07:12:43.010846 *** INFO monit[593]: 'serial-getty' stop: '/bin/systemctl stop serial-getty@ttyS1.service'
Mar 28 07:12:43.132172 *** INFO monit[593]: 'serial-getty' start: '/bin/systemctl start serial-getty@ttyS1.service'
Mar 28 07:13:43.286276 *** INFO monit[593]: 'serial-getty' status succeeded (0) -- no output
```

#### Tested branch (Please provide the tested image version)
- [x] 20201231.77

#### Description for the changelog
[S6100] Improve S6100 serial-getty monitor.
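
The wait-and-re-check approach described under "How I did it" can be sketched as a small reusable function. This is only an illustration under stated assumptions: `retry_check` and its parameters are hypothetical names that do not appear in the patch, which hard-codes 5 attempts, a 1 second delay, and the `systemctl is-active` check.

```shell
#!/bin/bash
# Hypothetical helper, not part of the patch: run a check command up to
# max_tries times, sleeping delay seconds after every failed attempt.
retry_check() {
    local max_tries=$1 delay=$2
    shift 2
    local attempt=0
    while [ "$attempt" -lt "$max_tries" ]; do
        attempt=$((attempt + 1))
        if "$@"; then
            return 0    # check passed, nothing to report
        fi
        sleep "$delay"  # give systemd time to restart the unit
    done
    return 1            # still failing after all attempts
}

# Equivalent of check-getty.sh (left commented out so the sketch runs anywhere):
# retry_check 5 1 /bin/systemctl --quiet is-active serial-getty@ttyS1.service
```

Like the script in the patch, the sketch also sleeps after the final failed attempt, so a persistent failure is reported to monit roughly `max_tries * delay` seconds after the check starts.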
---
 .../debian/platform-modules-s6100.install       |  1 +
 .../s6100/scripts/check-getty.sh                | 17 +++++++++++++++++
 .../s6100/scripts/s6100_serial_getty_monitor    |  3 ++-
 3 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100755 platform/broadcom/sonic-platform-modules-dell/s6100/scripts/check-getty.sh

diff --git a/platform/broadcom/sonic-platform-modules-dell/debian/platform-modules-s6100.install b/platform/broadcom/sonic-platform-modules-dell/debian/platform-modules-s6100.install
index 512e0529455e..5bae3b607c53 100644
--- a/platform/broadcom/sonic-platform-modules-dell/debian/platform-modules-s6100.install
+++ b/platform/broadcom/sonic-platform-modules-dell/debian/platform-modules-s6100.install
@@ -14,6 +14,7 @@ s6100/scripts/soft-reboot_plugin usr/share/sonic/device/x86_64-dell_s6100_c2538-
 s6100/scripts/ssd-fw-upgrade usr/share/sonic/device/x86_64-dell_s6100_c2538-r0
 s6100/scripts/override.conf /etc/systemd/system/systemd-reboot.service.d
 s6100/scripts/s6100_serial_getty_monitor etc/monit/conf.d
+s6100/scripts/check-getty.sh usr/local/bin
 common/dell_lpc_mon.sh usr/local/bin
 s6100/scripts/s6100_ssd_mon.sh usr/local/bin
 s6100/scripts/s6100_ssd_upgrade_status.sh usr/local/bin
diff --git a/platform/broadcom/sonic-platform-modules-dell/s6100/scripts/check-getty.sh b/platform/broadcom/sonic-platform-modules-dell/s6100/scripts/check-getty.sh
new file mode 100755
index 000000000000..9c6412eddf0b
--- /dev/null
+++ b/platform/broadcom/sonic-platform-modules-dell/s6100/scripts/check-getty.sh
@@ -0,0 +1,17 @@
+#!/bin/bash
+
+RETRY=0
+while [ $RETRY -lt 5 ]; do
+    let RETRY=$RETRY+1
+
+    /bin/systemctl --quiet is-active serial-getty@ttyS1.service
+    status=$?
+    if [ $status == 0 ]; then
+        exit 0
+    fi
+
+    # When serial-getty is not running, recheck later, because systemd will restart serial-getty automatically.
+    sleep 1
+done
+
+exit 1
\ No newline at end of file
diff --git a/platform/broadcom/sonic-platform-modules-dell/s6100/scripts/s6100_serial_getty_monitor b/platform/broadcom/sonic-platform-modules-dell/s6100/scripts/s6100_serial_getty_monitor
index 1b5d0c90db37..f57ae3679016 100644
--- a/platform/broadcom/sonic-platform-modules-dell/s6100/scripts/s6100_serial_getty_monitor
+++ b/platform/broadcom/sonic-platform-modules-dell/s6100/scripts/s6100_serial_getty_monitor
@@ -1,4 +1,5 @@
 #Dell S6100 serial getty monitor
-check process serial-getty matching "ttyS"
+check program serial-getty with path /usr/local/bin/check-getty.sh
     start program = "/bin/systemctl start serial-getty@ttyS1.service"
     stop program = "/bin/systemctl stop serial-getty@ttyS1.service"
+if status != 0 then restart
\ No newline at end of file