Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[202012] [S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert. #14755

Merged

Conversation

liuh-80
Copy link
Contributor

@liuh-80 liuh-80 commented Apr 20, 2023

[S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert.

This is cherry-pick PR for: #14402

Why I did it

On S6100, the serial-getty service some time can't auto-restart by systemd. So there is a monit unit to check serial-getty service status and restart it.

However, this monit will report false alert, because in most case when serial-getty not running, systemd can restart it successfully.

To avoid the false alert, improve the monitor to wait and re-check.

Steps to reproduce this issue:

  1. User login to device via console, and keep the connection.
  2. User login to device via SSH, check the [email protected] service, it's running.
  3. Run 'monit reload' from SSH connection.
  4. Check syslog 1 minutes later, there will be false alert: ' 'serial-getty' process is not running'
Work item tracking
  • Microsoft ADO :17424426

How I did it

Add check-getty.sh script to recheck again later when getty service not running.
And update monit unit to check serial-getty service status with this script to avoid false alert.

How to verify it

Pass all UT.
Manually check fixed code work correctly:

admin@***:~$ sudo systemctl stop  [email protected]
admin@***:~$ sudo /usr/local/bin/check-getty.sh 
admin@***:~$ echo $?
1
admin@***:~$ sudo systemctl status [email protected][email protected] - Serial Getty on ttyS1
     Loaded: loaded (/lib/systemd/system/[email protected]; enabled-runtime; vendor preset: enabled)
     Active: inactive (dead) since Tue 2023-03-28 07:15:21 UTC; 1min 13s ago

admin@***:~$ sudo /usr/local/bin/check-getty.sh 
admin@***:~$ echo $?
0
admin@***:~$ sudo systemctl status [email protected][email protected] - Serial Getty on ttyS1
     Loaded: loaded (/lib/systemd/system/[email protected]; enabled-runtime; vendor preset: enabled)

syslog:

Mar 28 07:10:37.597458 *** INFO systemd[1]: [email protected]: Succeeded.
Mar 28 07:12:43.010550 *** ERR monit[593]: 'serial-getty' status failed (1) -- no output
Mar 28 07:12:43.010744 *** INFO monit[593]: 'serial-getty' trying to restart
Mar 28 07:12:43.010846 *** INFO monit[593]: 'serial-getty' stop: '/bin/systemctl stop [email protected]'
Mar 28 07:12:43.132172 *** INFO monit[593]: 'serial-getty' start: '/bin/systemctl start [email protected]'
Mar 28 07:13:43.286276 *** INFO monit[593]: 'serial-getty' status succeeded (0) -- no output

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211

Tested branch (Please provide the tested image version)

  • 20201231.77

Description for the changelog

[S6100] Improve S6100 serial-getty monitor.

Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

…tty not running to avoid false alert. (sonic-net#14402)

[S6100] Improve S6100 serial-getty monitor, wait and re-check when getty not running to avoid false alert.

On S6100, the serial-getty service some time can't auto-restart by systemd. So there is a monit unit to check serial-getty service status and restart it.

However, this monit will report false alert, because in most case when serial-getty not running, systemd can restart it successfully.

To avoid the false alert, improve the monitor to wait and re-check.

Steps to reproduce this issue:
1. User login to device via console, and keep the connection.
2. User login to device via SSH, check the [email protected] service, it's running.
3. Run 'monit reload' from SSH connection.
4. Check syslog 1 minutes later, there will be false alert: ' 'serial-getty' process is not running'

Add check-getty.sh script to recheck again later when getty service not running.
And update monit unit to check serial-getty service status with this script to avoid false alert.

Pass all UT.
Manually check fixed code work correctly:

```
admin@***:~$ sudo systemctl stop  [email protected]
admin@***:~$ sudo /usr/local/bin/check-getty.sh 
admin@***:~$ echo $?
1
admin@***:~$ sudo systemctl status [email protected][email protected] - Serial Getty on ttyS1
     Loaded: loaded (/lib/systemd/system/[email protected]; enabled-runtime; vendor preset: enabled)
     Active: inactive (dead) since Tue 2023-03-28 07:15:21 UTC; 1min 13s ago

admin@***:~$ sudo /usr/local/bin/check-getty.sh 
admin@***:~$ echo $?
0
admin@***:~$ sudo systemctl status [email protected][email protected] - Serial Getty on ttyS1
     Loaded: loaded (/lib/systemd/system/[email protected]; enabled-runtime; vendor preset: enabled)
```

syslog:
```
Mar 28 07:10:37.597458 *** INFO systemd[1]: [email protected]: Succeeded.
Mar 28 07:12:43.010550 *** ERR monit[593]: 'serial-getty' status failed (1) -- no output
Mar 28 07:12:43.010744 *** INFO monit[593]: 'serial-getty' trying to restart
Mar 28 07:12:43.010846 *** INFO monit[593]: 'serial-getty' stop: '/bin/systemctl stop [email protected]'
Mar 28 07:12:43.132172 *** INFO monit[593]: 'serial-getty' start: '/bin/systemctl start [email protected]'
Mar 28 07:13:43.286276 *** INFO monit[593]: 'serial-getty' status succeeded (0) -- no output
```

[S6100] Improve S6100 serial-getty monitor.
@qiluo-msft qiluo-msft merged commit 1f3da95 into sonic-net:202012 Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants