-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove "Checking for updates" stage from updater #528
Conversation
launcher/sdw_updater_gui/Updater.py
Outdated
sdlog.info("dom0 update successful") | ||
return UpdateStatus.REBOOT_REQUIRED | ||
|
||
if output.find("No packages downloaded") != -1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emkll This is the most significant business logic change, I think. The return code from qubes-dom0-update does not appear to tell us reliably whether or not updates were applied, and we need to know in order to trigger the reboot logic. So I decided to look at the command output. Based on my review of the updater logic in /usr/lib/qubes/qubes-download-dom0-updates.sh
in sys-firewall
, my understanding is that this string should be present whenever a dom0
update was successful but yielded no results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@conorsch suggested maybe running the --check-only
stage here after all so we can use its return code to determine the reboot requirement. In time
runs that only adds a few seconds as the index is now sufficiently up-to-date for it to use it on the subsequent run. I'll leave it as is for now for your initial review, let me know what you think, and if you have other ideas for how to detect the reboot requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to work based on my local testing, though theres also other test there, specifically "Nothing to do.", "Complete!" and the "No packages downloaded" you've identified. If upgrades are available, we can parse the transaction summary (checks if upgrades were performed)
However, I agree that running the command with --check-only
prior to running the upgrade will make it much simpler to detect a non-zero error code than parsing output. Given the somewhat long run times, @conorsch 's approach is probably best here (we can recycle _check_updates_dom0
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll switch to using a --check-only
run for dom0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done in 0e5d172.
e75c7c3
to
436bbf6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eloquence did a first pass with time measurements and some basic functional testing, will return for another functional pass based on your test plan, left some comments inline as well
Current iteration (3 updates)
- 14 minutes for each
Old iteration (3 passes)
- 5, 5 and 6 minutes to check
- 15, 5, and 4 minutes to upgrade (15 minutes was when all VMs needed an upgrade)
- 20, 10 and 10 minutes total time
launcher/sdw_updater_gui/Updater.py
Outdated
sdlog.info("dom0 update successful") | ||
return UpdateStatus.REBOOT_REQUIRED | ||
|
||
if output.find("No packages downloaded") != -1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to work based on my local testing, though theres also other test there, specifically "Nothing to do.", "Complete!" and the "No packages downloaded" you've identified. If upgrades are available, we can parse the transaction summary (checks if upgrades were performed)
However, I agree that running the command with --check-only
prior to running the upgrade will make it much simpler to detect a non-zero error code than parsing output. Given the somewhat long run times, @conorsch 's approach is probably best here (we can recycle _check_updates_dom0
)
"<p>To keep your Workstation safe, daily software updates are required.</p> " | ||
"<p>This typically takes between 5 and 30 minutes. You cannot use the SecureDrop " | ||
"Client or any of is VMs while the updater is running.</p>" | ||
"<p><span style='color:#E62354;'><b>Interrupting software updates may break " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think "break the workstation" is somewhat hyperbolic here, especially when this is written in red. It will "break" the apt cache on the templates, in worst case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are either of the following scenarios likely to require admin intervention to recover from:
- closing the updater window at any point
- shutting down the computer?
A system shutdown will tell all processes to terminate. Won't that potentially interrupt a package install, requiring manual intervention to recover?
Anything that requires admin intervention to resolve before the journalist can access SecureDrop again can IMO be fairly described as "breakage" from the journalist's POV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emkll I did nudge Erik towards simplifying text into very black-and-white terms, here. Trying to keep admins from being over-burdened as user shepherds, felt important in that.
Also, if there's too much text—like, to be "accurate" vs too broad-of-strokes, users will see "wall of text" and not read any of it. UI text is a tough balance, in that regard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't block here but the word "broken" sounds quite negative to me, through it's true that this message is meant to be dissuasive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current wording in master
is
Any interruption in this process may break Workstation components
The proposed change here is:
Interrupting software updates may break the Workstation
Mainly for brevity and simplicity. Happy to consider alternatives as well. I think whether the word "break" makes sense here mainly depends on the impact in a worst case scenario -- e.g., if the user closes the laptop lid while a package is being installed or a critical dom0 salt sate is being applied.
Given the significant testing burden of this change, we've agreed to defer this until the next sprint (and probably after the 0.3.0-rpm release) for now, after the current (4/22-5/6) sprint. |
We've agreed this is still likely a desirable change, but our focus right now is to get SD 1.3.0 + Client/Proxy/Workstation releases with limited copy/paste support & other fixes already merged, out the door, and prep for the fedora-31 transition. Strong candidate for next sprint (after 5/6-5/20). |
1d937db
to
220f66e
Compare
(Rebased and squashed.) |
This is ready for a pass by a new reviewer (Mickael is out for a couple of weeks); remaining open comments are about wording choices in the UI, happy to kick those around more. |
I followed the test plan, with the exception that I modified the Fedora 31 template instead of Fedora 30. The updates were performed properly. I did encounter #498 once. I think the layout changes in It could well just be how I've modified my system, since @emkll didn't see any problem with it, and the screenshots obviously indicate it worked properly on your machine. At the moment I don't have a machine on which I can install Qubes from scratch to verify. Scenario 1:
|
Awesome, thanks for the test @rmol . If you're able to repro it consistently, could you take a screenshot that illustrates the clipping problem (tomorrow is fine)? :) |
(Test plan updated to |
Wow, that's .. horrifying. I'm curious how current |
Yeah that's pretty terrible too, so I won't feel too bad, but I agree it should never get into such a pathological state if at all possible :). Once you're on a fresh install, it'd be great to have STR for reproducing this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the updater on a fresh installation of Qubes 4.0.3 and the dialogs are completely readable, with no clipping of text or button labels. Since the visual horror was limited to my regular development system, I'm approving.
That's great news @rmol. If you have time, I'd appreciate any steps you can provide for getting into this broken state, as we should really make sure (probably in a separate PR) that our |
Sure, let me see if I can ruin another Qubes machine. 🙂 |
Thanks, will investigate :) |
- Shorten messaging, tweak text per UX discussions - Make important text red - Add a headline element, so we can render it above the progress bar - Reduce height
This avoids brittle output parsing, and should not have significant performance impact due to caching.
0e5d172
to
0b7bcc5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eloquence these changes look good to me. Tested using the default DPI settings in Qubes, looks good to me.
- Verified the UI changes are properly reflected in the python code (after running
pyuic
andblack
) - Verified the test coverage is 100%
- Visual review
Followed the test plan:
Scenario 1: fedora-31
case
- Downgrade a package in
fedora-31
and reboot the VM to force an actual upgrade to applied (e.g.,sudo dnf downgrade zlib
). - Run
/opt/securedrop/launcher/sdw-launcher.py --skip-delta 0
. This forces an updater run. -
- Observe that you are able to perform a successful update run with no new regressions (you may still encounter [0.2.3-rpm] libxenlight failed to create new domain
sd-log
#498).
- Observe that you are able to perform a successful update run with no new regressions (you may still encounter [0.2.3-rpm] libxenlight failed to create new domain
-
- [ x Observe that you are able to launch the client, without a reboot warning.
-
- Observe that
sdw-update-status
andsdw-last-updated
contain the expected state (both should include the current timestamp, andsdw-update-status
should contain status0
for success)
- Observe that
Scenario 2: dom0
case
- Downgrade a package in
dom0
(see instructions below). - Run
/opt/securedrop/launcher/sdw-launcher.py --skip-delta 0
. -
- Observe that you are able to perform a successful update run with no new regressions (you may still encounter [0.2.3-rpm] libxenlight failed to create new domain
sd-log
#498).
- Observe that you are able to perform a successful update run with no new regressions (you may still encounter [0.2.3-rpm] libxenlight failed to create new domain
-
- Observe that you are prompted to reboot after the update. Keep the window open for now.
-
- Observe that
sdw-update-status
andsdw-last-updated
contain the expected state (both should include the current timestamp, andsdw-update-status
should contain status2
for "reboot required").
- Observe that
-
- Running the preflight updater again still indicates that a reboot is required
- Click the reboot button and wait for the system to reboot.
- Log back in and run
/opt/securedrop/launcher/sdw-launcher.py --skip-delta 0
. - After updates, observe that you are able to run the client without a reboot.
Docs are sufficiently vague to not require any changes: https://workstation.securedrop.org/en/latest/admin/securing_workstation.html#apply-updates-when-prompted, but we should flag these changes to pilot orgs once they are released.
Approving but not immediately merging in case @rmol has any further comments.
"<p>To keep your Workstation safe, daily software updates are required.</p> " | ||
"<p>This typically takes between 5 and 30 minutes. You cannot use the SecureDrop " | ||
"Client or any of is VMs while the updater is running.</p>" | ||
"<p><span style='color:#E62354;'><b>Interrupting software updates may break " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't block here but the word "broken" sounds quite negative to me, through it's true that this message is meant to be dissuasive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good! Ran through a few iterations of the new logic. On the f31 testing, I consistently saw update times of ~15-20m per run. For the dom0 downgrade test, times were roughly 20m, then reboot, then 15m on a forced update run next time.
As a sidenote, not one did I encounter any VM start errors. Pleased with these changes going in, particularly with the verbose testing reports we have above!
Resolves #478
Status
Ready for review
Description
Test plan
Preparatory steps
(All testing should be done with an existing Qubes install at least at
0.3.0-rpm
- dev, prod or staging should make no difference)sdw-last-updated
andsdw-update-status
in~/.securedrop_launcher
/opt/securedrop/launcher
and/srv/salt/launcher
(if only the/opt
copy is overwritten, the updater itself will replace it on the next run).sd-log
#498 - you may encounter it repeatedly while testing this PR, so any additional observations to resolve that issue are appreciated.Scenario 1:
fedora-31
casefedora-31
and reboot the VM to force an actual upgrade to applied (e.g.,sudo dnf downgrade zlib
)./opt/securedrop/launcher/sdw-launcher.py --skip-delta 0
. This forces an updater run.sd-log
#498).sdw-update-status
andsdw-last-updated
contain the expected state (both should include the current timestamp, andsdw-update-status
should contain status0
for success)Scenario 2:
dom0
casedom0
(see instructions below)./opt/securedrop/launcher/sdw-launcher.py --skip-delta 0
.sd-log
#498).sdw-update-status
andsdw-last-updated
contain the expected state (both should include the current timestamp, andsdw-update-status
should contain status2
for "reboot required")./opt/securedrop/launcher/sdw-launcher.py --skip-delta 0
.Downgrading a package in
dom0
The method I used for downgrading (all commands require
sudo
or root shell):dnf history
indom0
to view recently updated packages.dnf history info <id>
.qubes-dom0-update $package-$version
, e.g.qubes-dom0-update python3-qubesimgconverter-4.0.27
That's it -- this doc claims you also have to run
dnf downgrade
but it seemed to do that automatically for me. Suggestions for a simpler process welcome so we can add them to our testing docs.Notes on performance
Note that when comparing performance with the current update process, it's important to note that the "applying updates" is never skipped after "checking updates" runs. That's because we always assume updates are required for
fedora-31
: https://github.com/freedomofpress/securedrop-workstation/blob/master/launcher/sdw_updater_gui/Updater.py#L116So an actual performance comparison is between "checking for updates + applying updates for 1...n VMs" vs. "always applying updates for all VMs".
Screenshots
Initial dialog
After clicking "Start Updates"
Updates complete, no reboot needed
Updates complete, reboot required (
dom0
updates)(Not including final string tweaks in 9c7fd82)