Automatically update dom0 and VM configs over time #172
Conversation
Force-pushed from d43b057 to 9a5535f.
Using the "placeholder" top file strategy identified by @marmarek to trigger automatic VM boots when Salt tasks target the VMs; otherwise, Salt reports "SKIPPED" for the powered-off VMs. Rather than manually booting the VMs each time we want to provision them, then powering them off again, let's let Salt handle that. The `securedrop-update` script can be run interactively by Admins, and is also configured to run once daily via cron, to ensure that updates are applied on a regular schedule.
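A rough illustration of the idea, with hypothetical state names (the PR's actual top file and state names may differ):

```sh
# Enable a top file whose "placeholder" state targets the SecureDrop VMs,
# giving Salt a reason to boot them instead of reporting SKIPPED.
# "sd-workstation" is an illustrative name, not necessarily the PR's.
sudo qubesctl top.enable sd-workstation
# With the top enabled, highstate boots the targeted VMs and applies state:
sudo qubesctl --all state.highstate
```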
We intend to package these dom0-specific config items into an RPM, but for now we'll continue to use Salt to copy the files around via the Makefile. Note that the `sd-dom0-files.sls` filename implies the list is comprehensive, but in fact there are dom0-specific configs scattered through the other SLS files, mostly VM specifications and RPC policy grants.
Factored in some advice received during pre-review. For now we're taking an iterative approach to automating the updates. Currently we want, in order:

1. All dom0 RPMs up to date
2. All TemplateVMs up to date with packages (either RPMs or debs)

What's not yet implemented is a strategy to automatically enforce the VM state regularly. That'll likely be a `qubesctl state.highstate` command, but we're punting for now to simplify testing of this already significant change.
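In shell terms, the intended ordering looks roughly like this (commands as used elsewhere in this thread; the script's exact wiring may differ):

```sh
sudo qubes-dom0-update -y               # 1. bring dom0 RPMs up to date first
sudo qubesctl --templates pkg.upgrade   # 2. then update TemplateVM packages
# Not yet implemented: regular enforcement of VM state, likely via:
#   sudo qubesctl --all state.highstate
```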
Having run the make target over 48h ago, I can confirm the cron logic works as expected, and all templates are updated roughly daily. Here are a few comments. I think the first one (updating the AppVMs), as well as the inline comment, should be addressed by this PR. The rest are mostly observations, and might warrant follow-up tickets.
Updating AppVMs (`state.highstate` across the board)

To benefit from template updates, AppVMs must be rebooted (and the associated templates shut down; more on that later). Since rebooting arbitrary AppVMs has the potential to disrupt end-user workflows, I see two options:

- For package updates, a notification (e.g. `notify-send`, as in this PR) repeated every 5 minutes or so requesting that the user reboot the workstation (heavy-handed, but will restart all AppVMs, including `sys-net`, `sys-firewall` and `sys-usb`, without fail); see the sketch after this list.
- A small PyQt widget or dock widget running in dom0 that would tell users updates are required, reboot the VMs that need updating (templates, then AppVMs), and apply any configuration changes. (I have not figured out an obvious way to get this information via CLI; more docs research is needed.) I've looked into the Journalist updater code, and unfortunately the QtWidget class it uses appears to be PyQt5-only; my local dom0 only has PyQt4. This will be required to apply configuration changes on the AppVMs.
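A minimal sketch of the first option, assuming a 5-minute reminder interval and that the loop runs as the unprivileged GUI user (not code from this PR):

```sh
# Remind the user every 5 minutes until they reboot the workstation.
while true; do
    notify-send "SecureDrop" "Updates installed. Please reboot the workstation."
    sleep 300
done
```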
Updating the configuration of the AppVMs likely requires more concrete details on how we plan to ship the configuration logic.
Resource usage
With at least 9 templates (debian, fedora, sd-workstation-template, sd-svs, sd-svs-disp, sd-journalist, sd-whonix, whonix-gw, whonix-ws, with the last 4 updating over Tor) on a default workstation install, updates will take a significant amount of time, and there is a chance that not all templates will be updated within that period.

For example, my last update run (I had updated the day before) took just over 30 minutes to update 14 templates. Further user testing is required, but a concurrency of 2 seems sensible in the workstation scenario.
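If concurrency does need tuning, qubesctl exposes a knob for it; a hedged example using the value suggested above:

```sh
# Assumes qubesctl's --max-concurrency flag; caps parallel template updates at 2.
sudo qubesctl --max-concurrency 2 --templates state.highstate
```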
Qubes errors while running updates

During the update process, I occasionally experienced the same `QubesPropertyAccessError` errors that occur during provisioning, if the Qubes-Manager app is open while performing the upgrades.

I would also occasionally see errors such as "cannot connect to qrexec agent for 60 seconds", which happens when I've exhausted CPU/IO. This also causes the update task on the template to report `FAIL` instead of `OK`, and appears to leave the template not updated.
Developer environment
I assume some of us will use the same machine for development and testing, and by design the cron job will almost certainly run at the worst possible time and slow down the workstation considerably. Perhaps in the future it would be nice to have a config option to set concurrency.
dom0/securedrop-update (outdated)

```
# Running `notify-send` as root doesn't work, must be normal user.
# Setting 30s expire time (in ms) since it's a long-running cmd.
su user -c "notify-send \
```
Perhaps due to the configuration of my Qubes machine, my user in dom0 is not `user`.
Ouch, that's a good flag. We still need to drop privileges in here, or we could dig more into the `notify-send` settings. Off the cuff, inspecting `/home/` for a single dirname should give us the name of the (single) custom user. Make sense?
We can safely assume that the normal user configured at install time has uid 1000; so: `id -nu 1000`. Then we `su` to that user to run the `notify-send` commands.
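Putting both suggestions together, a sketch (the variable name is mine; the 30s expiry in ms comes from the script's comment above):

```sh
# Resolve the install-time user (uid 1000), then drop privileges, since
# notify-send doesn't work as root; -t sets the expiry time in ms.
GUI_USER="$(id -nu 1000)"
su "$GUI_USER" -c "notify-send -t 30000 'SecureDrop' 'Updates in progress...'"
```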
Discussed review changes with @emkll; to summarize, next steps for this PR are:
The GUI notification recommending a reboot has UX implications, but it's better to warn than to reboot the AppVMs while they may be in use. After this PR is stable and merged, we can ping @ninavizz (and @eloquence) for assistance evaluating the UX impact and recommended improvements. For the …
Force-pushed from d1f9cfe to 532a0ae.
Tackling requested changes during review: * supports custom dom0 usernames * omits --templates on pkg upgrade to include dom0 * uses state.highstate to enforce VM config * notify about reboot request (so updates are applied) We'll want to clean up the reboot recommendation once we have more UX feedback. For now, it's enough to notify that updates aren't actually in effect (due to AppVMs not having been restarted).
All outstanding items addressed. Please have another look, @emkll! Would love your thoughts as well, @joshuathayer.
dom0/securedrop-update (outdated)

```diff
-securedrop-update-feedback "SecureDrop: Updating application..."
-qubesctl --templates \
+securedrop-update-feedback "Updating application..."
+qubesctl \
```
Oh, sorry, I think I wasn't clear. My previous comment was about the ordering only, not which actions are performed. If you want to update templates, you still need `--templates`. If you want to apply configuration to other VMs (non-templates), then you need `--all`.
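In other words (a sketch of the distinction, not the script's exact lines):

```sh
sudo qubesctl --templates pkg.upgrade   # update packages inside TemplateVMs
sudo qubesctl --all state.highstate     # apply configuration to all VMs
```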
That would explain the behavior I'm seeing locally, @marmarek; many thanks for your guidance here!
- remove comments that were already addressed
- restore dom0 package updates
- perform update package action only in templates
Note that we can, without too much difficulty, migrate PyQt5 -> PyQt4 so that we can reuse that GUI application.
It seems like the flake8 container/rules have changed, mostly around indentation. The W605 ignore is for the invalid escape sequence '\s' in test_gpg.py:16.
Force-pushed from 64bed2b to b4106c9.
I've addressed the feedback with the actions performed in …
Thanks for handling the flake8 fixes, @emkll! Changes LGTM. Let's give @joshuathayer a chance to test prior to merging.
Regarding UX for alerting users about the need to update: as another option, we could probably add a notification to the client application UI, via a qubes-rpc job from dom0 to sd-svs which would twiddle some state in the securedrop-client DB. In terms of review: I've been running the update script now for at least 2 hours. I've not observed any errors yet, but that seems significantly longer than other people's experience, eh?
```
# but we *first* want the freshest RPMs from dom0, *then* we'll want to
# update the VMs themselves.
securedrop-update-feedback "SecureDrop: Updating dom0 configuration..."
sudo qubes-dom0-update -y
```
Adding the `--clean` option here will refresh the dnf cache, which might be useful in some cases. I just had an issue where qubes-dom0-update was complaining about an unsigned package, due to me attempting to download an older Whonix template in an effort to reproduce #122 (comment).
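For example (a sketch; `-y` as already used in the script above):

```sh
# --clean refreshes the dnf cache so stale metadata is discarded before updating.
sudo qubes-dom0-update --clean -y
```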
Agreed, probably worth adding here, lest we forget to circle back—feel free to append, @emkll.
dom0/securedrop-update (outdated)

```
# `qubesctl pkg.upgrade` will automatically update dom0 packages, as well,
# but we *first* want the freshest RPMs from dom0, *then* we'll want to
# update the VMs themselves.
securedrop-update-feedback "SecureDrop: Updating dom0 configuration..."
```
Minor nit: "SecureDrop:" is added in `securedrop-update-feedback()`, so it isn't needed in the message here.
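A hypothetical sketch of what `securedrop-update-feedback()` might look like, consistent with this nit and the earlier privilege-dropping discussion (not the actual function from the PR):

```sh
securedrop-update-feedback() {
    # The "SecureDrop:" prefix is added here, so callers pass only the message.
    su "$(id -nu 1000)" -c "notify-send -t 30000 'SecureDrop: ${1}'"
}
```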
I eventually manually killed my first run of the update script, which seemed to be hung after running for 3+ hours. Rerunning it succeeded in just a few minutes.
I don't have a good report of where or how it hung, sadly. But the newly updated system works well for me, and if the process worked well for others, I'm happy to blame some non-SD situation on my machine for the problem I had. So, lgtm!
I've seen this happen only once in about a dozen runs—I suspect it's the Whonix VM updates, since those updates often crawl. Let's keep an eye on it over time. Since updates happen in the background, we can count on the updates being installed in a timely fashion—but if Whonix updates require hours, that may require a special fix, e.g. preventing proxying of Whonix updates over Tor.
This happens most of the time in the …
Some packages are not updating for me after running the updater; this seems to be limited to the packages served by the qubes repo. Can anyone reproduce this on a template that was newly updated by the …
Pointed out by @joshuathayer during review; the "SecureDrop:" prefix was redundant, since it's added by the display function.
Fantastic catch, @emkll! I was indeed able to reproduce your results. I've pushed up two commits, one with tests to guard against regressions (using the …). Not yet fully confident in these changes, because they're still running on my machine, and it appears it's going to take a while. Note also that the test logic checking for packages being up to date assumes the AppVMs have been rebooted; that will require a fresh cycle (or rebuild) of the VMs after updates are actually installed. As it stands, I haven't yet observed the tests passing; I'll try running them when the upgrade logic finishes.
Force-pushed from 5b23b92 to 4c75e27.
During review, @emkll caught that not all apt packages were updated as expected. These tests are a bit aggressive, and will fail if the AppVMs haven't been rebooted recently. That's a bit annoying, but I'd rather accept that friction than have a regression in the automatic upgrade logic.
Without `dist_upgrade=true`, pkg.upgrade wasn't forcing all packages to their latest versions. This approach works well on Debian-based VMs, as all the SecureDrop Workstation components currently are, but there's a significant drawback: it silently fails on Fedora-based VMs, stating that the "--dist_upgrade" option is not valid for dnf. You must pass `--show-output` in order to observe the dnf failures; without it, the tasks are reported as "OK".

Tried to use the "pkg.uptodate" Salt module rather than "pkg.upgrade", but the Qubes VMs reported that module wasn't available. The "dist_upgrade" option isn't explicitly documented [0], but presumably gets inherited via Salt magic from the aptpkg.upgrade module [1].

Adding `--skip-dom0` since we already upgraded dom0 packages via a previous step (qubes-dom0-update).

[0] https://docs.saltstack.com/en/2017.7/ref/states/all/salt.states.pkg.html#salt.states.pkg.uptodate
[1] https://docs.saltstack.com/en/2017.7/ref/modules/all/salt.modules.aptpkg.html#salt.modules.aptpkg.upgrade
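A hedged sketch of the resulting update step, assembled from the flags named in the commit message above (the script's exact invocation may differ):

```sh
# dist_upgrade forces all packages to their latest versions on apt-based VMs;
# on dnf-based VMs it fails, and only --show-output surfaces that failure.
sudo qubesctl --templates --skip-dom0 --show-output \
    pkg.upgrade dist_upgrade=True
```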
One more tweak, squashed into prior commits. Tests are passing for me. Take it for a spin, @emkll. Details in the commit messages, but it looks like the new approach only works against Debian VMs. That's fine for merge, but something we should be aware of.
Thanks for your patience, @conorsch! This does indeed fix the issue described above for Debian-based templates -- all packages are now correctly updated by the `securedrop-update` script/cron. I will create a follow-up ticket to track whether the script correctly updates Fedora-based templates.
Overview
Installs a new script in dom0 called `securedrop-update`, and symlinks it into cron.daily, so it runs regularly without manual activation. The script will:

- apply dom0 updates
- update packages in the TemplateVMs
- enforce VM configuration via `state.highstate`
- notify the user that a reboot is required for updates to take effect

Closes #24.
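A sketch of the install step described above (paths are assumptions, not taken from the PR):

```sh
# Install the updater into dom0's PATH and hook it into the daily cron run.
sudo install -m 755 securedrop-update /usr/bin/securedrop-update
sudo ln -sf /usr/bin/securedrop-update /etc/cron.daily/securedrop-update
```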
Screenshots
Here's what the tool looks like in action:
Metrics
The script takes ~10m to run. (If you haven't updated your TemplateVMs in a while, it could take longer, ~30m, the first time you run it!) It will take less time if you don't currently have the Workstation VMs configured (e.g. if you've recently run `make clean`).

Next steps
As with other dom0-related configs, we'll need to package these as an RPM (#171) eventually.