Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Monit] Unmonitor the processes in containers which are disabled. #5153

Merged
merged 13 commits into from
Sep 25, 2020
Merged

[Monit] Unmonitor the processes in containers which are disabled. #5153

merged 13 commits into from
Sep 25, 2020

Conversation

yozhao101
Copy link
Contributor

- Why I did it
We want to let Monit to unmonitor the processes in containers which are disabled in FEATURE table such that
Monit will not generate false alerting messages into the syslog.

- How I did it
Monit will periodically run a script which accepts three parameters: <container_name>, <process_name> and
<process_cmdline>. This script will first check whether the container is disabled in the FEATURE table or not.
If it is disabled, Monit will skip monitoring the processes. Otherwise, this script will leverage psutil library to inspect
the process tree in host to look for the processes. If the process is not found, then an alerting message will be written
into syslog.

- How to verify it
We can change the state field of a container in FEATURE table from enabled to disabled and then kill a critical
process in it to see whether Monit can generate the alerting message in syslog or not. The message format in syslog is:
<process_name> is not running.

- Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@yozhao101 yozhao101 requested review from jleveque and lguohan August 11, 2020 23:16
@abdosi
Copy link
Contributor

abdosi commented Sep 4, 2020

@yozhao101 @jleveque

While looking into fix for Issue##5292 I came across this PR. I am wondering it will be easier we can use 'group' concept of monit. We can control monitor/unmonitor of this group from hostcfgd same as how we start/unmask and stop/mask the service. I verified locally and looks good.

Updated Telemtry monit file with group

check process telemetry matching "/usr/sbin/telemetry"
if does not exist for 5 times within 5 cycles then alert
group telemetry.service

check process dialout_client matching "/usr/sbin/dialout_client_cli"
if does not exist for 5 times within 5 cycles then alert
group telemetry.service

Now when Feature is disabled we can execute sudo monit -g telemetry.service unmonitor from hostcfgd.
And when Feature is enabled we can execute sudo monit -g telemetry.service monitor from hostcfgd.

Any concern in this approach ?

@jleveque
Copy link
Contributor

jleveque commented Sep 8, 2020

@abdosi: Thank you for this suggestion. I think it is a feasable solution. The only concern I have is we would then spread the need for knowledge of Monit to other components (in this case, hostcfgd). I would prefer to encapsulate and abstract this knowledge as much as possible, but I wouldn't say it's a dealbreaker.

I think we can go ahead with this PR for the time being, and @yozhao101 can investigate the 'group' approach that you propose as a future cleanup/enhancement.

@abdosi
Copy link
Contributor

abdosi commented Sep 8, 2020

@abdosi: Thank you for this suggestion. I think it is a feasable solution. The only concern I have is we would then spread the need for knowledge of Monit to other components (in this case, hostcfgd). I would prefer to encapsulate and abstract this knowledge as much as possible, but I wouldn't say it's a dealbreaker.

I think we can go ahead with this PR for the time being, and @yozhao101 can investigate the 'group' approach that you propose as a future cleanup/enhancement.

@jleveque Thanks. I felt since hostcfgd already has awareness with Feature Table and Services Enable/Disable monit going there makes it more logical.

@yozhao101
Copy link
Contributor Author

yozhao101 commented Sep 8, 2020

@abdosi: Thank you for this suggestion. I think it is a feasable solution. The only concern I have is we would then spread the need for knowledge of Monit to other components (in this case, hostcfgd). I would prefer to encapsulate and abstract this knowledge as much as possible, but I wouldn't say it's a dealbreaker.

I think we can go ahead with this PR for the time being, and @yozhao101 can investigate the 'group' approach that you propose as a future cleanup/enhancement.

@jleveque Thanks so much for helping me answering this question. I will investigate the group approach and give the feedback.

Copy link
Contributor

@jleveque jleveque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check builds are failing and there are conflicts. Please address.

@yozhao101
Copy link
Contributor Author

Check builds are failing and there are conflicts. Please address.

Fixed.

@jleveque
Copy link
Contributor

Retest baseimage please

@jleveque
Copy link
Contributor

Retest vsimage please

1 similar comment
@jleveque
Copy link
Contributor

Retest vsimage please

@jleveque
Copy link
Contributor

@yozhao101: Will this PR cherry-pick cleanly to 201911, or will you need to open a separate PR?

@yozhao101
Copy link
Contributor Author

@yozhao101: Will this PR cherry-pick cleanly to 201911, or will you need to open a separate PR?

I will try cherry-pick this PR to 201911.

Signed-off-by: Yong Zhao <[email protected]>
@yozhao101
Copy link
Contributor Author

Retest vsimage please

2 similar comments
@yozhao101
Copy link
Contributor Author

Retest vsimage please

@yozhao101
Copy link
Contributor Author

Retest vsimage please

@jleveque jleveque merged commit 13cec4c into sonic-net:master Sep 25, 2020
jleveque pushed a commit that referenced this pull request Sep 25, 2020
We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that
Monit will not generate false alerting messages into the syslog.

- Backport of #5153 to the 201911 branch

Signed-off-by: Yong Zhao <[email protected]>
lguohan pushed a commit that referenced this pull request Dec 9, 2020
)

We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that
Monit will not generate false alerting messages into the syslog.

Signed-off-by: Yong Zhao <[email protected]>
santhosh-kt pushed a commit to santhosh-kt/sonic-buildimage that referenced this pull request Feb 25, 2021
…nic-net#5153)

We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that
Monit will not generate false alerting messages into the syslog.

Signed-off-by: Yong Zhao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants