[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. #5720

Merged: abdosi merged 6 commits on Nov 1, 2020


@abdosi abdosi commented Oct 26, 2020

Why/How I did it:

  1. Make sure the first error syslog message is triggered according to the FAULT TOLERANCE condition.

  2. Added support for the repeat clause with the alert action. It triggers
    periodic syslog error messages for as long as the error persists.

  3. Updated the monit conf files with repeat every x cycles for the alert action.

For example:

Make sure monit honors the clause below when generating the error syslog:
if status != 0 for x cycles then alert repeat every y cycles
With this clause, the error syslog is generated after x cycles and then
every y cycles for as long as the error persists.
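The timing described above can be sketched as a small simulation. This is a simplified model of the described behavior, not monit's implementation: the function name `alert_cycles` is hypothetical, and it assumes that "for x cycles" means x consecutive failing cycles and that a successful cycle resets the count.

```python
def alert_cycles(statuses, x, y):
    """Return the 1-based cycle numbers at which an ERR syslog line is expected.

    statuses: exit status of the checked program, one entry per monit cycle.
    x: consecutive failing cycles required before the first alert.
    y: repeat interval, in cycles, while the failure persists.

    Assumption (not taken from the monit source): a status of 0 resets
    the failure streak.
    """
    alerts = []
    failing = 0  # consecutive failing cycles seen so far
    for cycle, status in enumerate(statuses, start=1):
        if status == 0:
            failing = 0  # recovery resets the streak
            continue
        failing += 1
        # First alert after x failures, then one every y further failures.
        if failing == x or (failing > x and (failing - x) % y == 0):
            alerts.append(cycle)
    return alerts


# Nine failing cycles with "for 3 cycle then alert repeat every 3 cycles".
print(alert_cycles([255] * 9, x=3, y=3))  # [3, 6, 9]
```

Under this model, a persistent failure with x = 3 and y = 3 produces an alert on every third cycle.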

How I verified it:

a) Tested with the following check:

check program routeCheck with path "/usr/bin/route_check.py"
    every 1 cycles
    if status != 0 for 3 cycle then alert repeat every 3 cycles

Oct 26 18:12:54.773398 ERR monit[480]: 'routeCheck' status failed (255) -- no output
Oct 26 18:15:54.940595 ERR monit[480]: 'routeCheck' status failed (255) -- no output
Oct 26 18:18:55.091670 ERR monit[480]: 'routeCheck' status failed (255) -- no output

b) Verified that monit status is fine.

c) Verified that for processes that are not running, the ERR message is logged periodically after the first failure.
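As a quick sanity check on the three ERR lines from step a), the spacing between them can be computed from the timestamps. The sketch below assumes the year 2020 (syslog omits it; the PR is dated Oct 2020) and an English locale for the month abbreviation. If the monit cycle is 60 seconds, which this PR does not state, a repeat every 3 cycles should give gaps of roughly 180 seconds.

```python
from datetime import datetime

# The three syslog lines quoted in step a).
LOG_LINES = [
    "Oct 26 18:12:54.773398 ERR monit[480]: 'routeCheck' status failed (255) -- no output",
    "Oct 26 18:15:54.940595 ERR monit[480]: 'routeCheck' status failed (255) -- no output",
    "Oct 26 18:18:55.091670 ERR monit[480]: 'routeCheck' status failed (255) -- no output",
]

ASSUMED_YEAR = "2020"  # assumption: not present in the log lines themselves


def alert_intervals(lines):
    """Return the gaps, in seconds, between consecutive log timestamps."""
    stamps = []
    for line in lines:
        # The first three whitespace-separated fields are month, day, time.
        month, day, clock = line.split()[:3]
        stamps.append(
            datetime.strptime(
                f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f"
            )
        )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


intervals = alert_intervals(LOG_LINES)
print(intervals)  # two gaps, each close to 180 seconds
```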

Signed-off-by: Abhishek Dosi <[email protected]>

@jleveque jleveque left a comment


@abdosi: When I glanced over the Monit source last week, I envisioned adding the alert logging to the handle_alert() function in alert.c. However, I hadn't thought about how to get the message to repeat. Just curious why you chose the _handleEvent() function.


abdosi commented Oct 26, 2020

> @abdosi: When I glanced over the Monit source last week, I envisioned adding the alert logging to the handle_alert() function in alert.c. However, I hadn't thought about how to get the message to repeat. Just curious why you chose the _handleEvent() function.

@jleveque The change is modeled on this commit, which was done for the exec action. I also thought it would need fewer changes in the patch.
https://bitbucket.org/tildeslash/monit/src/3367b41ba4cf2af4cadb9dc58d9d6c4770b41dda/src/event.c#lines-317

Signed-off-by: Abhishek Dosi <[email protected]>

abdosi commented Oct 29, 2020

@jleveque and @yozhao101 I have also updated the monit files in this PR. Please review.

Signed-off-by: Abhishek Dosi <[email protected]>

abdosi commented Oct 30, 2020

Retest this please


abdosi commented Oct 30, 2020

retest baseimage please


abdosi commented Oct 30, 2020

retest vsimage please


abdosi commented Oct 31, 2020

retest buildimage please


abdosi commented Oct 31, 2020

retest vsimage please


abdosi commented Oct 31, 2020

retest baseimage please

@abdosi abdosi merged commit dddf969 into sonic-net:master Nov 1, 2020
@abdosi abdosi deleted the monit_enhancement branch November 1, 2020 00:29
abdosi added a commit that referenced this pull request Nov 1, 2020
…onit alert action when status is failed. (#5720)

Why/How I did:

Make sure first error syslog is triggered based on FAULT TOLERANCE condition.

Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent

Updated the monit conf files with repeat every x cycles for the alert action
lguohan added a commit that referenced this pull request Dec 9, 2020
…onit alert action when status is failed. (#5720)


Signed-off-by: Guohan Lu <[email protected]>
santhosh-kt pushed a commit to santhosh-kt/sonic-buildimage that referenced this pull request Feb 25, 2021
…onit alert action when status is failed. (sonic-net#5720)
