[Monit] Unmonitor the processes in containers which are disabled. #5153
Signed-off-by: Yong Zhao <[email protected]>
While looking into a fix for issue #5292, I came across this PR. I am wondering whether it would be easier to use Monit's 'group' concept. We could control monitoring and unmonitoring of the group from hostcfgd, the same way we start/unmask and stop/mask the service. I verified this locally and it looks good. I updated the telemetry Monit file with a group: `check process telemetry matching "/usr/sbin/telemetry"` and `check process dialout_client matching "/usr/sbin/dialout_client_cli"`. Now, when the feature is disabled, we can execute `sudo monit -g telemetry.service unmonitor` from hostcfgd. Any concerns with this approach?
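As a rough illustration of the group idea in the comment above, the following Python sketch shows how a component such as hostcfgd might map a feature's state to a `monit -g` call. The function names `monit_group_command` and `apply_feature_state` are hypothetical, and the `<feature>.service` group naming is an assumption taken from the telemetry example; this is not the actual hostcfgd code.

```python
import subprocess


def monit_group_command(feature_name, state):
    """Build the Monit command for a feature's process group.

    Assumption: the Monit group is named '<feature>.service', as in the
    telemetry example from the comment. Both helper names are hypothetical.
    """
    action = "monitor" if state == "enabled" else "unmonitor"
    return ["sudo", "monit", "-g", "{}.service".format(feature_name), action]


def apply_feature_state(feature_name, state, run=subprocess.check_call):
    """Run the Monit command for the given feature state.

    `run` is injectable so the logic can be exercised without Monit installed.
    """
    cmd = monit_group_command(feature_name, state)
    run(cmd)
    return cmd
```

For a disabled telemetry feature, `monit_group_command("telemetry", "disabled")` produces the same command quoted in the comment.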
@abdosi: Thank you for this suggestion. I think it is a feasible solution. The only concern I have is that we would then spread the need for knowledge of Monit to other components (in this case, hostcfgd). I would prefer to encapsulate and abstract this knowledge as much as possible, but I wouldn't say it's a dealbreaker. I think we can go ahead with this PR for the time being, and @yozhao101 can investigate the 'group' approach that you propose as a future cleanup/enhancement.
@jleveque Thanks. I felt that since hostcfgd is already aware of the FEATURE table and of enabling/disabling services, putting the Monit handling there is more logical.
@jleveque Thanks so much for helping me answer this question. I will investigate the |
files. Signed-off-by: Yong Zhao <[email protected]>
host under /etc/monit/conf.d in docker-teamd.mk. Signed-off-by: Yong Zhao <[email protected]>
Signed-off-by: Yong Zhao <[email protected]>
platform/barefoot/docker-syncd-bfn/base_image_files/monit_syncd
platform/mellanox/docker-syncd-mlnx/base_image_files/monit_syncd
…to match the syncd process. Signed-off-by: Yong Zhao <[email protected]>
process_checker. Signed-off-by: Yong Zhao <[email protected]>
Signed-off-by: Yong Zhao <[email protected]>
instead of process name in syslog. Signed-off-by: Yong Zhao <[email protected]>
Signed-off-by: Yong Zhao <[email protected]>
Check builds are failing and there are conflicts. Please address.
Signed-off-by: Yong Zhao <[email protected]>
Fixed. |
Retest baseimage please |
Retest vsimage please |
@yozhao101: Will this PR cherry-pick cleanly to 201911, or will you need to open a separate PR? |
I will try to cherry-pick this PR to 201911.
process_checker. Signed-off-by: Yong Zhao <[email protected]>
Signed-off-by: Yong Zhao <[email protected]>
Retest vsimage please |
We want to let Monit to unmonitor the processes in containers which are disabled in `FEATURE` table such that Monit will not generate false alerting messages into the syslog. - Backport of #5153 to the 201911 branch Signed-off-by: Yong Zhao <[email protected]>
- Why I did it
We want Monit to unmonitor the processes in containers which are disabled in the `FEATURE` table, so that Monit will not generate false alert messages in the syslog.
- How I did it
Monit will periodically run a script which accepts three parameters: `<container_name>`, `<process_name>` and `<process_cmdline>`. The script first checks whether the container is disabled in the `FEATURE` table. If it is disabled, Monit skips monitoring the processes. Otherwise, the script uses the psutil library to inspect the process tree on the host and look for the processes. If a process is not found, an alert message is written to syslog.
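The flow described above could be sketched roughly as follows. This is a minimal illustration and not the script added by this PR: the helper names are hypothetical, the `FEATURE` table is stood in for by a plain dict (how the real script reads it is not shown here), and matching the command line by substring is an assumption.

```python
import syslog


def is_feature_disabled(feature_table, container_name):
    """Return True when the container's `state` field is 'disabled'.

    `feature_table` is a plain dict standing in for the FEATURE table.
    """
    entry = feature_table.get(container_name, {})
    return entry.get("state") == "disabled"


def running_cmdlines():
    """Yield the command line of every process visible on the host."""
    import psutil  # third-party; imported lazily so the pure logic stays testable

    for proc in psutil.process_iter(["cmdline"]):
        cmdline = proc.info["cmdline"]
        if cmdline:  # skip processes with no readable command line
            yield " ".join(cmdline)


def check_process(feature_table, container_name, process_name, process_cmdline,
                  cmdlines=None, log=syslog.syslog):
    """Return 0 if monitoring is skipped or the process is found, else 1."""
    if is_feature_disabled(feature_table, container_name):
        return 0  # container disabled: nothing to alert on
    if cmdlines is None:
        cmdlines = running_cmdlines()
    # Assumption: a substring match on the full command line is sufficient.
    if any(process_cmdline in line for line in cmdlines):
        return 0
    log(syslog.LOG_ERR, "{} is not running.".format(process_name))
    return 1
```

Passing `cmdlines` and `log` explicitly keeps the decision logic testable without psutil or a syslog daemon; a Monit check would rely on the non-zero return value to flag the failure.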
- How to verify it
We can change the `state` field of a container in the `FEATURE` table from `enabled` to `disabled` and then kill a critical process in it to see whether Monit generates the alert message in syslog or not. The message format in syslog is: `<process_name> is not running.`
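To make the check repeatable, a small helper along these lines could scan syslog lines for the alert message. The function name `find_alerts` is hypothetical, and the log location used in the note below is an assumption about the host.

```python
def find_alerts(lines, process_name):
    """Return the log lines containing '<process_name> is not running.'."""
    expected = "{} is not running.".format(process_name)
    return [line.rstrip("\n") for line in lines if expected in line]
```

Assuming the log lives at `/var/log/syslog`, the helper can be fed an open file handle, for example `find_alerts(open("/var/log/syslog"), "telemetry")`. With the container disabled, the result should stay empty after the process is killed.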
- Which release branch to backport (provide reason below if selected)
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)