Remove disabled and not loaded services before calling reset-failed and restart services #2266
…nd restarting services
@@ -601,6 +617,7 @@ def _reset_failed_services():
'telemetry'
]

_remove_invalid_services(services_to_reset)
No, is there a case in which we would want to?
I am thinking of the following timeline:
- a service becomes unhealthy and is stuck in the failed state
- the user disables it in the FEATURE table
- the user runs config reload
The expectation is that the "failed" status is reset, so the service behaves like a fresh one.
You said "No". Do you mean it is technically impossible?
If the user disables a service in the FEATURE table and reloads config.db, hostcfgd masks and disables the feature, so the service is no longer loaded or active.
If we then try to reset-failed the service, the reset-failed command exits with the error "service not loaded".
The fix is to skip masked services: we do not run reset-failed or restart on them, because those commands would return errors.
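The guard described above can be sketched as follows. This is a minimal, hypothetical sketch, not the PR's code: `build_service_commands` and the `is_loaded` predicate are illustrative names, and the predicate stands in for a real check such as reading the unit's systemd LoadState. It returns the `systemctl` argument lists instead of running them, so a masked unit (for example telemetry after hostcfgd masks it) never reaches reset-failed or restart.

```python
from typing import Callable, List


def build_service_commands(
    services_to_reset: List[str],
    services_to_restart: List[str],
    is_loaded: Callable[[str], bool],
) -> List[List[str]]:
    """Build systemctl commands, skipping units that are not loaded (e.g. masked).

    Hypothetical helper: `is_loaded` is assumed to report whether systemd
    currently has the unit loaded.
    """
    commands: List[List[str]] = []
    # Reset the failed state first, so a unit that is stuck in "failed"
    # (for example after hitting its start limit) can be restarted.
    for service in services_to_reset:
        if is_loaded(service):
            commands.append(["systemctl", "reset-failed", service])
    # Then restart only the units that systemd still has loaded.
    for service in services_to_restart:
        if is_loaded(service):
            commands.append(["systemctl", "restart", service])
    return commands
```

Returning argument lists rather than calling `subprocess` directly keeps the filtering logic testable without a live systemd; a caller would pass each list to `subprocess.run`.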
if service in services_to_restart:
    services_to_restart.remove(service)

_remove_invalid_services(services_to_restart)
Will add a sonic-mgmt test case.
What about |
What I did
Added logic to remove disabled and not-loaded services before calling reset-failed/restart on services. Certain services, such as telemetry, can go down and become disabled, which would cause load_minigraph to fail when it resets failed services. Services that are not loaded or are disabled should not affect resetting or starting other services.
How I did it
Added logic to remove services that are disabled or not loaded from the list of services for each specific operation, such as reset-failed or restart.
How to verify it
Manual testing: bring down a service such as telemetry, either by masking it or with "config feature state telemetry disabled"; load_minigraph should not be affected.
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)