Fixed determine/process reboot-cause service dependency #17406

Merged


anamehra
Contributor

@anamehra anamehra commented Dec 4, 2023

Signed-off-by: anamehra [email protected]

Why I did it

Fixes #16990 for 202305/202205 branch

Note: This PR targets 202305 and 202205. For master, a separate PR will be raised that uses the unit setting (Upholds=) available with the systemd version in Debian bookworm to restart these services after a dependency failure.

  1. The determine-reboot-cause and process-reboot-cause services do not start if the database service fails to start on its first attempt. Even if the database service succeeds on a later attempt, these reboot-cause services are not started.

  2. The process-reboot-cause service also does not restart when the docker or database service restarts, which leads to an empty reboot-cause history.

  3. deploy-mg from sonic-mgmt also triggers a docker service restart, which causes the issue described in item 2 above. The docker restart also restarts determine-reboot-cause, which creates an additional reboot-cause file in the history and modifies the last reboot cause.

This PR fixes these issues by starting both services once the dependency is met after an earlier dependency failure, restarting both services when the database service restarts, and preventing duplicate processing of the last reboot reason.

Work item tracking
  • Microsoft ADO 25892856

How I did it

  1. Modified the systemd unit files so that the determine-reboot-cause and process-reboot-cause services are restarted when the database service restarts.
  2. On a restart, the determine-reboot-cause service must not create a new reboot-cause entry in the database. Added a check that distinguishes the first start from a restart and skips the entry in the restart case.
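
The first-start-versus-restart check in step 2 could look roughly like the sketch below. This is not the code from this PR: the marker file name, the `state_dir` parameter, and the `record_entry` callback are hypothetical and exist only for illustration. The assumption is that a marker kept on a volatile filesystem (for example under `/run`) disappears on reboot, so finding it means the service was restarted within the same boot.

```python
import os
import tempfile

# Hypothetical marker name; the actual PR may detect a restart differently.
MARKER_NAME = "determine-reboot-cause.started"


def is_first_start(state_dir):
    """Return True on the first start since boot, False on a restart.

    Assumes state_dir lives on a volatile filesystem (e.g. tmpfs), so the
    marker is cleared by a reboot.
    """
    marker = os.path.join(state_dir, MARKER_NAME)
    try:
        # O_EXCL makes creation atomic: it fails if the marker already exists.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def handle_start(state_dir, record_entry):
    """Call record_entry() only on the first start; skip it on restarts."""
    if is_first_start(state_dir):
        record_entry()
        return True
    return False


if __name__ == "__main__":
    entries = []
    with tempfile.TemporaryDirectory() as d:
        handle_start(d, lambda: entries.append("reboot-cause"))  # first start
        handle_start(d, lambda: entries.append("reboot-cause"))  # restart
    print(len(entries))
```

Creating the marker with `O_CREAT | O_EXCL` folds the existence check and the creation into one step, so two near-simultaneous starts cannot both conclude they are the first.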

How to verify it

On a single-ASIC pizza box:

  1. Install the image and check the reboot-cause history.
  2. Restart the database service and verify that the determine-reboot-cause and process-reboot-cause services also restart. Verify that reboot-cause shows correct data and that no new entry is created for the restart.
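
One way to script the "services also restart" check in step 2 is to compare a systemd timestamp sampled before and after the database restart. The sketch below is an assumption about how such a check might be written, not part of this PR. It reads the `ActiveEnterTimestampMonotonic` property from `systemctl show`; this only works for units that actually reach the active state, so another property may be needed for units that exit immediately. The function names are hypothetical.

```python
import subprocess


def parse_monotonic(show_output, prop="ActiveEnterTimestampMonotonic"):
    """Extract the integer value from a '<prop>=<usec>' line."""
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == prop:
            return int(value.strip())
    raise ValueError(prop + " not found in systemctl output")


def active_enter_usec(unit, prop="ActiveEnterTimestampMonotonic"):
    """Ask systemd when the unit last entered the active state (usec)."""
    out = subprocess.run(
        ["systemctl", "show", unit, "--property=" + prop],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_monotonic(out, prop)


def restarted(before_usec, after_usec):
    """The unit restarted if it entered 'active' later than the first sample."""
    return after_usec > before_usec
```

A test would call `active_enter_usec("determine-reboot-cause.service")` before restarting the database service, call it again afterwards, and pass both samples to `restarted`.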

On Chassis:

  1. Install the image and check the reboot-cause history.
  2. Restart the database service and verify that the determine-reboot-cause and process-reboot-cause services also restart. Verify that reboot-cause shows correct data and that no new entry is created for the restart.
  3. Reboot the LC. On the Supervisor, stop the database-chassis service.
    Let the database service on the LC fail the first time. determine-reboot-cause and process-reboot-cause will fail to start due to the dependency failure.
    Start database-chassis on the Supervisor. The database service on the LC should now start successfully.
    Verify that determine-reboot-cause and process-reboot-cause also start.
    Verify the show reboot-cause history output.
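
Step 3 involves waiting for the dependent services to come up once the database service recovers. A minimal polling sketch for that wait is shown below. It is illustrative only: `wait_until` and `unit_is_active` are hypothetical helper names, and the timeout and interval values are arbitrary. `systemctl is-active` reports `active` only while a unit is up, so a unit that runs once and exits would need a different check, such as inspecting the reboot-cause history.

```python
import subprocess
import time


def unit_is_active(unit):
    """True if `systemctl is-active` reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", unit], capture_output=True, text=True
    )
    return result.stdout.strip() == "active"


def wait_until(check, timeout_s=120.0, interval_s=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout_s elapses.

    Returns True if check() succeeded, False on timeout. The sleep and
    clock parameters are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until(lambda: unit_is_active("determine-reboot-cause.service"))` could be called on the LC after database-chassis is started on the Supervisor.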

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

@anamehra
Contributor Author

anamehra commented Dec 5, 2023

Hi @prgeor , @gechiang , for your review. Thanks

@prgeor
Contributor

prgeor commented Dec 5, 2023

@StormLiangMS can you merge this for 202305

@prgeor
Contributor

prgeor commented Dec 5, 2023

@yxieca please help cherry pick to 202205

@yxieca
Contributor

yxieca commented Dec 6, 2023

@anamehra please help raise a ticket for 202205 branch directly. I don't think the automation will cherry-pick PR from feature branch to feature branch.

@anamehra
Contributor Author

anamehra commented Dec 8, 2023

> @anamehra please help raise a ticket for 202205 branch directly. I don't think the automation will cherry-pick PR from feature branch to feature branch.

Please review the following PR for 202205: #17462

@gechiang
Collaborator

@StormLiangMS Please help review/merge this to 202305.
This is a bug fix for all platforms.

@StormLiangMS StormLiangMS merged commit 4595db4 into sonic-net:202305 Dec 17, 2023
18 checks passed
@anamehra
Contributor Author

@abdosi , @rlhui , may we have this in 202405? Thanks

@gechiang
Collaborator

@anamehra, it looks like the automation cannot cleanly cherry-pick this PR into the 202405 branch. Can you please submit a separate PR directly against the 202405 branch and link this PR to it?

@anamehra
Contributor Author

> @anamehra, it looks like the automation cannot cleanly cherry-pick this PR into the 202405 branch. Can you please submit a separate PR directly against the 202405 branch and link this PR to it?

Hi @gechiang, done!

@gechiang
Collaborator

Hi @anamehra, I just realized that this PR (#17406) was raised against the 202305 branch, not the "master" branch.
Ideally you should raise this PR on the master branch and then request backports to the older branches.
Can I ask you to convert your 202405-specific PR (sonic-net/sonic-host-services#132) to the master branch instead?
Then, on the master-based PR, we can request the backport to 202405 again. I think it should cherry-pick cleanly, since master and 202405 are still very close at this moment.
If we don't fix this, you will end up having to catch up whenever a new release branch is created...
Thanks!

@anamehra
Contributor Author

> Hi @anamehra, I just realized that this PR (#17406) was raised against the 202305 branch, not the "master" branch. Ideally you should raise this PR on the master branch and then request backports to the older branches. Can I ask you to convert your 202405-specific PR (sonic-net/sonic-host-services#132) to the master branch instead? Then, on the master-based PR, we can request the backport to 202405 again. I think it should cherry-pick cleanly, since master and 202405 are still very close at this moment. If we don't fix this, you will end up having to catch up whenever a new release branch is created... Thanks!

Hi @gechiang, I raised this for 202405 only, following a discussion in the community forum. There is a PR with the 'Upholds' fix, and I am waiting on responses to my queries on it before I raise a PR on master. My original PR on master was initially rejected in favor of using Upholds.
@abdosi, @prgeor, if you agree, I can raise this PR (#17406) on master. Thanks

mssonicbld added a commit that referenced this pull request Jul 12, 2024
…utomatically (#19415)

#### Why I did it
src/sonic-host-services
```
* 02d9b55 - (HEAD -> master, origin/master, origin/HEAD) Added support to render template format of `delayed` flag on Feature Table. (#135) (28 hours ago) [abdosi]
* 60fdfea - Fixed determine/process reboot-cause service dependency (#17406) (#132) (13 days ago) [anamehra]
```
#### How I did it
#### How to verify it
#### Description for the changelog
mssonicbld added a commit that referenced this pull request Jul 23, 2024
…utomatically (#19551)

#### Why I did it
src/sonic-host-services
```
* aea0bef - (HEAD -> 202405, origin/202405) Ignore sonic_platform package fileNotFoundError on non-chassis vs platforms (#133) (#140) (4 minutes ago) [mssonicbld]
* 0e7e4d5 - Added support to render template format of `delayed` flag on Feature Table. (#135) (#137) (11 days ago) [mssonicbld]
* 235c2a4 - Fixed determine/process reboot-cause service dependency (#17406) (#132) (13 days ago) [anamehra]
```
#### How I did it
#### How to verify it
#### Description for the changelog