Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NAS-131326 / 25.04 / Simplify HA upgrade logic + NAS-131325: system.reboot.info and failover.reboot.info methods and events #10744

Merged
merged 26 commits into from
Nov 15, 2024

Conversation

bvasilenko
Copy link
Contributor

@bvasilenko bvasilenko commented Sep 24, 2024

Changes:

This PR contains changes for 2 tickets

  1. NAS-131325:system.reboot.info and failover.reboot.info methods and events

    here's what's implemented for this ticket:

    1. Calling new endpoints system.reboot.info and failover.reboot.info based on whether system is HA licensed,
    2. Showing Reboot Info non-blocking notification in the top right corner:
      (if there are items in the reboot_required_reasons field, the following dialog should appear)
      image
  2. NAS-131326: Simplify HA upgrade logic

    here's what's implemented for this ticket:

    1. usages of failover.upgrade_pending and failover.upgrade_finish methods and events were removed, they are not necessary anymore
    2. webUI calls with failover.upgrade method and waits for middleware to upgrade both local and remote controllers, then reboots the remote controller and wait it to come online, then webUI will display a reboot reason (see pt "1" in the description of this PR)
    3. UI only calls failover.become_passive when the user is ready to reboot the local controller, and thus the upgrade will be finished.

Testing:

  1. NAS-131325:system.reboot.info and failover.reboot.info methods and events

    To test the system.reboot.info and failover.reboot.info methods and events, follow these steps:

    Login into the HA-enabled server, for example at M40 (10.220.16.82)

    Expected result: When logging into the HA system, a call to failover.reboot.info must be made, and then a subscription to failover.reboot.info. For non-HA - the same thing, only system.reboot.info.

    • If the call response or subscription contains items in reboot_required_reasons, a dialog is shown with a proposal to restart the corresponding node.

    • However, if the failover.disabled.reasons response contains reasons other than MISMATCH_VERSIONS, LOC_FIPS_REBOOT_REQ or REM_FIPS_REBOOT_REQ, the dialog is not shown.

  2. NAS-131326: Simplify HA upgrade logic

    To test the failover.upgrade job, follow these steps:

    1. Login into the HA-enabled server, for example at M40 (10.220.16.82)
    2. Go to System > Update
    3. Choose Manual Update , reject a prompt to make a backup, and choose the .update file
    4. In a new browser tab go to Dashboard page (or any other page)

    Now you can see the Reboot Info notification, shown in pt "1" of the description of this PR.

Downstream

Affects Reasoning
Documentation

@bugclerk bugclerk changed the title Simplify HA upgrade logic + NAS-131325: system.reboot.info and failover.reboot.info methods and events NAS-131326 / 25.04 / Simplify HA upgrade logic + NAS-131325: system.reboot.info and failover.reboot.info methods and events Sep 24, 2024
Copy link

codecov bot commented Sep 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.94%. Comparing base (8766977) to head (506bccb).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10744      +/-   ##
==========================================
+ Coverage   81.80%   81.94%   +0.14%     
==========================================
  Files        1626     1627       +1     
  Lines       57324    57296      -28     
  Branches     5920     5923       +3     
==========================================
+ Hits        46892    46954      +62     
+ Misses      10432    10342      -90     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bvasilenko bvasilenko marked this pull request as ready for review September 30, 2024 20:37
@bvasilenko bvasilenko requested a review from a team as a code owner September 30, 2024 20:37
@bvasilenko bvasilenko requested review from RehanY147 and removed request for a team September 30, 2024 20:37
Copy link
Contributor

@RehanY147 RehanY147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks OK. I am getting the notification but a few points should be addressed

  • I expected the dialog to be centered and follow the styles of webui's typical dialogs. However, it was stuck to the top right corner and looks weird.
  • I expected there to be a "Reboot" button on the dialog itself that I can click and reboot right away. There's no action buttons on the notification dialog.
  • The title can say "Reboot required" instead of "Reboot Info"
image

@@ -39,8 +39,19 @@

<ix-truecommand-button></ix-truecommand-button>

@if (hasRebootRequiredReasons) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make hasRebootRequiredReasons a signal.

export const rebootInfoLoaded = createAction(
'[Reboot Info API] Reboot Info Loaded',
props<{
thisNodeInfo: SystemRebootInfo | null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These keys should have reboot keyword in there somewhere. They are not node-info. They are specifically node-reboot-info. So, just calling them thisNodeInfo and otherNodeInfo can cause confusion in the future.

(state) => state,
);

export const selectThisNodeInfo = createSelector(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well. Let's add reboot keyword in the variable name so it's clear it's only the node specific reboot info.

Copy link
Contributor

@RehanY147 RehanY147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notification doesn't come up at all. See attached video.

Screen.Recording.2024-10-04.at.12.04.17.AM.mov

Copy link
Contributor

@RehanY147 RehanY147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this when I click "Reboot Local" button after checking the confirm checkbox. And the system is not rebooted.
image

Copy link
Contributor

@RehanY147 RehanY147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor

@RehanY147 RehanY147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build is failing

Copy link
Contributor

@RehanY147 RehanY147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am waiting for a system to test the HA part of this ticket. In the meantime, we should add a "Cancel" type button on the dialog as well.

It looks good for a simple system. Need an HA system to test the HA part of this ticket.

@bvasilenko
Copy link
Contributor Author

@RehanY147 middleware issue has been resolved, now the dialog is displayed

Copy link
Contributor

@RehanY147 RehanY147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see reboot and failover dialogs pop up at once. Which is confusing. Only one should appear. This is on an HA system.

Screen.Recording.2024-10-30.at.5.50.34.PM.mov

Copy link
Contributor

@RehanY147 RehanY147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now no dialog shows up when I enter the command to create a reason for reboot

Screen.Recording.2024-11-01.at.3.43.35.PM.mov

Copy link
Contributor

@RehanY147 RehanY147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running into some issue with the jenkins vms. Waiting for that to be resolved as this PR also needs to be tested on HA system

@RehanY147 RehanY147 self-requested a review November 6, 2024 23:17
Copy link
Contributor

@RehanY147 RehanY147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged with master because I was getting issues with the 2fa calls. Once merged, I can add a reason and the dialog shows up. But once I refresh the page, I see the following error in the browser console and the reboot dialog doesn't show up. I am not sure if this is because of some other merged commit or if it is because of this PR. It looks like it is a related problem. This is an HA system.
image

@bvasilenko
Copy link
Contributor Author

@RehanY147 Now everything works as expected!

@bvasilenko bvasilenko enabled auto-merge (squash) November 15, 2024 04:58
@bvasilenko bvasilenko merged commit 35ff82e into master Nov 15, 2024
11 checks passed
@bvasilenko bvasilenko deleted the NAS-131326 branch November 15, 2024 15:37
@bugclerk
Copy link
Contributor

This PR has been merged and conversations have been locked.
If you would like to discuss more about this issue please use our forums or raise a Jira ticket.

@truenas truenas locked as resolved and limited conversation to collaborators Nov 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants