Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove disabled and not loaded services before calling reset-failed and restart services #2266

Merged
merged 3 commits into from
Jul 13, 2022

Conversation

zbud-msft
Copy link
Contributor

@zbud-msft zbud-msft commented Jul 11, 2022

What I did

Added logic to remove disabled and not loaded services before calling reset-failed/restart services. Certain services like telemetry can go down and become disabled, which would cause load_minigraph to fail when resetting failed services. Services that are not loaded or disabled should not impact reset or start of other services.

How I did it

Added logic to remove services that are disabled or not loaded from the group of listed services for that specific operation. such as resetting failed or restart.

How to verify it

Manual testing. Bring down a service such as telemetry via mask or config feature state telemetry disabled, and it should not impact load_minigraph

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

@@ -601,6 +617,7 @@ def _reset_failed_services():
'telemetry'
]

_remove_invalid_services(services_to_reset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_remove_invalid_services

If a service is disabled in FEATURE table, we can still "reset failed", right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, is there a case in which we would want to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking of below timeline:

  1. a service became unhealthy and trapped in a failed status
  2. user decided to disabled it in FEATURE table
  3. user config reload

The expectation should be the "failed" status be "reset", like a fresh new service.

You state "No". Do you mean technically impossible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If user disables service in feature table and reloads config.db, hostcfgd will mask and disable the feature. Meaning the service is no longer loaded or active.

If we try to reset-failed the service, reset-failed command will exit with error "service not loaded".

The fix is that if service is masked we do not want to run reset-failed or restart commands as those will return errors.

if service in services_to_restart:
services_to_restart.remove(service)

_remove_invalid_services(services_to_restart)
Copy link
Contributor

@qiluo-msft qiluo-msft Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_remove_invalid_services

Please add a testcase which fails old code and pass new code. It could be unit test or sonic-mgmt testcase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add sonic-mgmt test case

@qiluo-msft
Copy link
Contributor

qiluo-msft commented Jul 12, 2022

What about older or newer branches? especially on 202012?

@zbud-msft zbud-msft merged commit 62b7b56 into sonic-net:201911 Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants