
Automatically update dom0 and VM configs over time #172

Merged (9 commits) on Oct 31, 2018

Conversation

conorsch
Contributor

Overview

Installs a new script to dom0 called securedrop-update, and symlinks it into cron.daily, so it runs regularly without manual activation. The script will:

  • pull in all new dom0 updates
  • start each TemplateVM and update its packages internally (including apt-test-qubes deb packages)
  • display clunky feedback about the update process with a pretty SD logo

Closes #24.
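To summarize the flow, here's a condensed sketch (not the literal script; the exact commands evolve during review below):

# 1. Pull in all new dom0 updates
sudo qubes-dom0-update -y
# 2. Boot each TemplateVM via Salt and update its packages internally
sudo qubesctl --templates state.highstate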

Screenshots

Here's what the tool looks like in action:

[screenshot: sd-update-2]

Metrics

The script takes ~10m to run. (If you haven't updated your TemplateVMs in a while, it could take longer, ~30m, the first time you run it!) It will take less time if you don't currently have the Workstation VMs configured (e.g. if you've recently run make clean).

[user@dom0 securedrop-workstation]$ time sudo securedrop-update

[...snip...]

local:
    ----------
whonix-gw-14: OK
fedora-28: OK
openvpn: OK
sd-svs-disp-template: OK
signal-template: OK
debian-9: OK
whonix-ws-14: OK
sd-svs-template: OK
sd-journalist-template: OK
sd-whonix-template: OK
sd-workstation-template: OK
securedrop-workstation: OK

real	10m3.111s
user	0m37.405s
sys	1m1.834s

Next steps

As with other dom0-related configs, we'll need to package these as an RPM (#171) eventually.

@conorsch conorsch force-pushed the 24-automatically-upgrade-vms branch from d43b057 to 9a5535f on October 19, 2018 17:54
Conor Schaefer added 3 commits October 19, 2018 10:54
Using the "placeholder" top file strategy identified by @marmarek, in
order to trigger automatic VM boots when Salt tasks target the VMs.
Otherwise, Salt will report "SKIPPED" on the powered off VMs. Rather
than manually boot the VMs each time we want to provision them, then
power them off again, let's let Salt handle that.
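For context, enabling a custom top file in dom0 looks roughly like this (the top file name here is hypothetical):

# Enable the placeholder top file, then apply state; per the strategy above,
# Salt boots any powered-off VMs it targets instead of reporting "SKIPPED".
sudo qubesctl top.enable securedrop-workstation
sudo qubesctl --all state.highstate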

The 'securedrop-update' script can be run interactively by Admins, and
is also configured to run once daily via cron, to ensure that updates
are applied on a rather regular schedule.
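As a sketch, the cron hookup might look like this (paths assumed, not taken from the diff):

# Install the updater into dom0 and symlink it into cron.daily
sudo install -m 755 securedrop-update /usr/bin/securedrop-update
sudo ln -sf /usr/bin/securedrop-update /etc/cron.daily/securedrop-update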
We intend to package these dom0-specific config items into an RPM, but
for now we'll continue to use Salt to copy the files around via the
Makefile.

Note that the `sd-dom0-files.sls` filename implies the list is
comprehensive, but in fact there are dom0-specific configs scattered
through the other SLS files, mostly VM specifications and RPC policy
grants.
Factored in some advice received during pre-review. For now we're taking
an iterative approach to automating the updates. Currently we want,
in order:

  1. All dom0 RPMs up to date
  2. All TemplateVMs up to date with packages (either RPMs or debs)

What's not yet implemented is a strategy to automatically enforce the VM
state regularly. That'll likely be a `qubesctl state.highstate` command,
but punting for now to simplify testing of this already significant
change.
@conorsch
Contributor Author

@emkll Does the update-tor command resolve #148? If so, might be best to submit as a follow-up PR... I gather the appropriate testing workflow for resolution there would be to uninstall the Whonix templates from dom0, then re-apply. Agree?

Contributor

@emkll emkll left a comment

After having run the make target over 48h ago, I can confirm the cron logic works as expected, and all templates are updated roughly daily. Here are a few comments. I think the first one (update of the AppVMs) as well as the inline comment should be addressed by this PR. The rest are mostly observations, and might warrant follow-up tickets.

Updating AppVMs (state.highstate across the board)

To benefit from template updates, AppVMs must be rebooted (and their associated templates shut down, more on that later). Given that rebooting arbitrary AppVMs has the potential to disrupt end-user workflows, I see two options:

  1. For package updates, a notification (e.g. notify-send, like in this PR) repeated every 5 minutes or so, requesting that the user reboot the workstation (heavy-handed, but will restart all AppVMs, including sys-net, sys-firewall, and sys-usb, without fail); see the sketch below.
  2. A small PyQt widget or dock widget running in dom0 that would tell users updates are required, reboot the VMs that need updating (templates, then AppVMs), and apply any configuration change. (I have not figured out an obvious way to get this information via the CLI; more docs research is needed.) I've looked into the Journalist updater code, and unfortunately the QtWidget class used there appears to be PyQt5-only; my local dom0 only has PyQt4. This will be required to apply configuration changes on the AppVMs.

Updating the configuration of the AppVMs likely requires more concrete details on how we plan on shipping the configuration logic.
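For option 1, a deliberately naive sketch (message text and interval are illustrative only):

# Nag the user every 5 minutes until they reboot the workstation
while true; do
    su "$(id -nu 1000)" -c "notify-send -t 30000 'SecureDrop' 'Updates installed; please reboot the workstation.'"
    sleep 300
done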

Resource usage

With at least 9 templates: debian, fedora, sd-workstation-template, sd-svs, sd-svs-disp, sd-journalist, sd-whonix, whonix-gw, whonix-ws (with the last 4 updating over Tor) on a default workstation install, this will take a significant amount of time, and there is a chance that not all templates will be updated within a single daily run.

For example, my last update run (I had updated the day before) took just over 30 minutes to update 14 templates. Further user testing is required, but a concurrency of 2 seems sensible in the workstation scenario.

Qubes errors while running

During the update process, I occasionally experienced the same QubesPropertyAccessError errors that occur during provisioning, if the Qubes-Manager app is open while performing the upgrades.

While updating, I would occasionally experience errors such as "cannot connect to qrexec agent for 60 seconds", which happens when CPU/IO is exhausted. It also causes the update task on the template to FAIL instead of OK, and the template appears not to be updated.

Developer environment

I assume some will use the same machine for development and testing, and by design the cron job will almost certainly run at the worst possible time and slow down the workstation considerably. Perhaps in the future it would be nice to have a config option to set concurrency.
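If qubesctl's concurrency cap is available on the target Qubes release, that option might be as simple as the following (hedged sketch; flag availability is an assumption):

# Limit Salt to two concurrent VM updates to keep the workstation responsive
sudo qubesctl --max-concurrency 2 --templates state.highstate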


# Running `notify-send` as root doesn't work, must be normal user.
# Setting 30s expire time (in ms) since it's a long-running cmd.
su user -c "notify-send \
Contributor

Perhaps due to the configuration of my Qubes machine, my user in dom0 is not `user`.

Contributor Author

Ouch, that's a good flag. We still need to drop privileges in here; or we could dig more into the notify-send settings. Off the cuff, inspecting /home/ for a single dirname should give us the name of the (single) custom user. Make sense?

Contributor Author

We can safely assume that the normal user configured at install time has uid 1000; so:

id -nu 1000

Then we su to that user to run the notify-send commands.
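A minimal sketch of that approach (message text is illustrative):

# Resolve the default user's name rather than hardcoding "user"
GUI_USER="$(id -nu 1000)"
# notify-send must run as the desktop user, not root; 30s expiry (in ms)
su "$GUI_USER" -c "notify-send -t 30000 'SecureDrop: Updating workstation...'"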

@conorsch
Contributor Author

conorsch commented Oct 24, 2018

Discussed review changes with @emkll; to summarize, next steps for this PR are:

  • implement custom "user" handling for notify-send command
  • use state.highstate to enforce VM settings (not just packages inside VMs)
  • drop the --templates flags and the qubes-dom0-update cmd (which, as @marmarek helpfully pointed out, is superfluous here)
  • add a GUI notification recommending reboot of the full workstation after updates are applied.

The GUI notification recommending a reboot has UX implications; but it's better to warn than reboot the AppVMs while they may be in use. After this PR is stable and merged, we can ping @ninavizz (and @eloquence) for assistance evaluating the UX impact and recommended improvements.

For the update-tor fix, we'll handle that separately, in a discrete PR targeting #148.

@conorsch conorsch force-pushed the 24-automatically-upgrade-vms branch from d1f9cfe to 532a0ae on October 25, 2018 00:14
Tackling requested changes during review:

  * supports custom dom0 usernames
  * omits --templates on pkg upgrade to include dom0
  * uses state.highstate to enforce VM config
  * notify about reboot request (so updates are applied)

We'll want to clean up the reboot recommendation once we have more UX
feedback. For now, it's enough to notify that updates aren't actually in
effect (due to AppVMs not having been restarted).
@conorsch
Contributor Author

All outstanding items addressed. Please have another look, @emkll! Would love your thoughts as well, @joshuathayer .

securedrop-update-feedback "SecureDrop: Updating application..."
qubesctl --templates \
securedrop-update-feedback "Updating application..."
qubesctl \


Oh, sorry, I think I wasn't clear. My previous comment was about the ordering only, not which actions are performed. If you want to update templates, you still need --templates. If you want to apply configuration to other vms (non-templates), then you need --all.
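In other words, the two invocations are distinct (a sketch, assuming state.highstate as the Salt command):

# To update the templates themselves:
sudo qubesctl --templates state.highstate
# To apply configuration to the other (non-template) VMs:
sudo qubesctl --all state.highstate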

Contributor Author

That would explain the behavior I'm seeing locally, @marmarek; many thanks for your guidance here!

@conorsch conorsch added the WIP label Oct 25, 2018
- remove comments that were already addressed
- restore dom0 package updates
- perform update package action only in templates
@redshiftzero
Contributor

I've looked into the Journalist updater code, and unfortunately the QtWidget class used by that appears to be PyQt5-only, my local dom0 only has PyQt4

Note that we can, without too much difficulty, migrate from PyQt5 to PyQt4 so that we can reuse that GUI application.

It seems like the flake8 container/rules have changed, mostly
indentation.

The W605 ignore is for the invalid escape sequence '\s' in test_gpg.py:16
@emkll emkll force-pushed the 24-automatically-upgrade-vms branch from 64bed2b to b4106c9 on October 25, 2018 16:30
@emkll
Contributor

emkll commented Oct 25, 2018

I've addressed the feedback with the actions performed in securedrop-update (update dom0, templates, and AppVMs), and made some changes due to (new) flake8 failures. This now looks good to merge from my perspective.
Since I was the last one to push to this branch, would either @joshuathayer or @conorsch mind taking one last look at this PR?

@conorsch conorsch removed the WIP label Oct 25, 2018
@conorsch
Contributor Author

Thanks for handling the flake8 fixes, @emkll! Changes LGTM. Let's give @joshuathayer a chance to test prior to merging.

@joshuathayer
Contributor

Regarding UX for alerting users about the need to update: as another option, we could probably add some notification to the client application UI, via a qubes-rpc job from dom0 to sd-svs which would twiddle some state in the securedrop-client DB.

In terms of review... I've been running the update script now and it's been running at least 2 hours. I've not observed any errors yet but that seems significantly longer than other people's experience, eh?

# but we *first* want the freshest RPMs from dom0, *then* we'll want to
# update the VMs themselves.
securedrop-update-feedback "SecureDrop: Updating dom0 configuration..."
sudo qubes-dom0-update -y
Contributor

adding the --clean option here will refresh the dnf cache, which might be useful in some cases.

I just had an issue where qubes-dom0-update was complaining of an unsigned package, due to me attempting to download an older whonix template in an effort to reproduce #122 (comment)
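That tweak would be a one-liner (assuming the same -y invocation as above):

# --clean flushes cached dnf metadata before fetching updates
sudo qubes-dom0-update --clean -y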

Contributor Author

Agreed, probably worth adding here, lest we forget to circle back—feel free to append, @emkll.

# `qubesctl pkg.upgrade` will automatically update dom0 packages, as well,
# but we *first* want the freshest RPMs from dom0, *then* we'll want to
# update the VMs themselves.
securedrop-update-feedback "SecureDrop: Updating dom0 configuration..."
Contributor

Minor nit: "SecureDrop:" is added in securedrop-update-feedback(), so isn't needed in the message here.

Contributor

@joshuathayer joshuathayer left a comment

I eventually manually killed my first run of the update script, which seemed to be hung after running for 3+ hours. Rerunning it succeeded in just a few minutes.

I don't have a good report of where or how it hung, sadly. But the newly updated system works well for me, and if the process worked well for others, I'm happy to blame some non-SD situation on my machine for the problem I had. So, lgtm!

@conorsch
Contributor Author

I eventually manually killed my first run of the update script, which seemed to be hung after running for 3+ hours.

I've seen this happen only once in about a dozen runs—I suspect it's the Whonix VM updates, since those updates often crawl. Let's keep an eye on it over time. Since updates happen in the background, we can count on the updates being installed in a timely fashion—but if Whonix updates require hours, that may require a special fix, e.g. preventing proxying of Whonix updates over Tor.

@kushaldas
Contributor

I eventually manually killed my first run of the update script, which seemed to be hung after running for 3+ hours. Rerunning it succeeded in just a few minutes.

This happens most of the time in the Whonix templates for me; apt-get update even gets stuck fetching files (I tried running it manually).

@emkll
Contributor

emkll commented Oct 30, 2018

Some packages are not updating for me after running the updater; it seems to be limited to the packages served by the Qubes repo. Can anyone reproduce this on a template that was newly updated by the securedrop-update script?

$ sudo apt list --upgradable 
Listing... Done
libvchan-xen/unknown 4.0.5-1+deb9u1 amd64 [upgradable from: 4.0.3-1+deb9u1]
libxen-4.8/unknown 2001:4.8.4-5+deb9u1 amd64 [upgradable from: 2001:4.8.4-2+deb9u1]
qubes-gui-agent/unknown 4.0.17-1+deb9u1 amd64 [upgradable from: 4.0.16-1+deb9u1]
xen-utils-common/unknown 2001:4.8.4-5+deb9u1 all [upgradable from: 2001:4.8.4-2+deb9u1]
xserver-xorg-input-qubes/unknown 4.0.17-1+deb9u1 amd64 [upgradable from: 4.0.16-1+deb9u1]
xserver-xorg-video-dummyqbs/unknown 4.0.17-1+deb9u1 amd64 [upgradable from: 4.0.16-1+deb9u1]

Pointed out by @joshuathayer during review; the "SecureDrop:" prefix was
redundant, since it's added by the display function.
@conorsch
Contributor Author

Fantastic catch, @emkll ! I was indeed able to reproduce your results. I've pushed up two commits, one with tests to guard against regressions (using the apt list command you provided), and another with a patch to the updater logic. Please pull down the changes and try again.

Not yet fully confident in these changes because they're still running on my machine—and it appears it's going to take a while. Note also that the test logic checking for packages being up to date assumes the AppVMs have been rebooted; that'll require a fresh cycle (or rebuild) of the VMs after updates are actually installed. As it stands, I haven't yet observed the tests passing—will try running them when the upgrade logic finishes.
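For reference, the general shape of such a check, run inside a freshly rebooted VM, could be (hypothetical, not the literal test code):

# Fail if any upgradable packages remain after the updater has run
test -z "$(apt list --upgradable 2>/dev/null | grep -v '^Listing')"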

@conorsch conorsch force-pushed the 24-automatically-upgrade-vms branch 2 times, most recently from 5b23b92 to 4c75e27 on October 31, 2018 04:39
Conor Schaefer added 2 commits October 30, 2018 21:39
During review, @emkll caught that not all apt packages were updated as
expected. These tests are a bit aggressive, and will fail if the AppVMs
haven't been rebooted recently. That's a bit annoying, but I'd rather
accept that friction than have a regression in the automatic upgrade
logic.
Without `dist_upgrade=true`, the pkg.upgrade wasn't forcing all packages
to their latest versions. This approach works well on Debian-based VMs,
as all the SecureDrop Workstation components currently are, but there's
a significant drawback: it silently fails on Fedora-based VMs, stating
that the "--dist_upgrade" option is not valid for dnf. You must pass
`--show-output` in order to observe the dnf failures; without it, the
tasks are reported as "OK".

Tried to use the "pkg.uptodate" Salt module rather than "pkg.upgrade",
but the Qubes VMs reported that module wasn't available.

The "dist_upgrade" option isn't explicitly documented [0], but presumably
gets inherited via Salt magic from the aptpkg.upgrade module [1].

Adding `--skip-dom0` since we already upgraded dom0 packages via a
previous step (qubes-dom0-update).

[0] https://docs.saltstack.com/en/2017.7/ref/states/all/salt.states.pkg.html#salt.states.pkg.uptodate
[1] https://docs.saltstack.com/en/2017.7/ref/modules/all/salt.modules.aptpkg.html#salt.modules.aptpkg.upgrade
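Putting the commit message's flags together, the resulting invocation is roughly (a sketch; the exact argument passing in the script may differ):

# Upgrade packages in all TemplateVMs, surfacing per-VM output, skipping dom0
sudo qubesctl --show-output --skip-dom0 --templates pkg.upgrade dist_upgrade=true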
@conorsch
Contributor Author

One more tweak, squashed into prior commits. Tests are passing for me. Take it for a spin, @emkll.

Details in the commit messages, but it looks like the new approach only works against Debian VMs. That's fine for merge, but something we should be aware of.

Contributor

@emkll emkll left a comment

Thanks for your patience, @conorsch ! This does indeed fix the issue described above for Debian-based templates -- all packages are now correctly updated by the securedrop-update script/cron. I will create a follow-up ticket to track whether the script correctly updates Fedora-based templates.
