
LVM performance, supported services listing, misc fixes and tests #238

Merged
merged 33 commits into QubesOS:master from devel-20181015
Oct 29, 2018

Conversation

marmarek
Member

Major changes:

  • make the lvm storage module asynchronous - no longer block the whole
    qubesd during domain start/stop
  • add a mechanism for a VM to advertise which services it supports
  • add a per-VM shutdown timeout property

Besides this, a few minor fixes and a lot of improvements to the system tests.

Fixes QubesOS/qubes-issues#4014
Fixes QubesOS/qubes-issues#1696
Fixes QubesOS/qubes-issues#4283
Fixes QubesOS/qubes-issues#2829
Fixes QubesOS/qubes-issues#4402
Fixes QubesOS/qubes-issues#3438
Fixes QubesOS/qubes-issues#1570
Fixes QubesOS/qubes-issues#4228

Make PyCharm understand what those mixin objects are for.
Remove unused imports and unused variables, and add some more docstrings.
Network tests create their own temporary netvm. Keep it disconnected from
the real netvm, so it does not interfere with the tests.
Searching by window class is needed in many tests; letting wait_for_window
search by class, not only by name, allows reducing code duplication.
While at it, improve it further:
 - avoid actively polling for the window - use `xdotool search --sync`
   instead
 - return the id of the window found
 - add wait_for_window_coro() for use where a coroutine is needed
 - when waiting for a window to disappear, resolve its window id once and
   wait for that particular window to disappear (avoiding xdotool race
   conditions on window enumeration)

Besides reducing code duplication, this also moves the handling of various
xdotool imperfections into one place.
Replace manual `xdotool search` calls with wait_for_window(), where
compatible.
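
For illustration, a minimal sketch of such a helper (the signature and
names are approximations, not the actual test-suite code):

```python
import subprocess

def wait_for_window(name, search_by='--name', timeout=30):
    """Wait for a matching window and return its window id.

    `xdotool search --sync` blocks until a match exists, so no manual
    polling loop is needed.
    """
    try:
        output = subprocess.check_output(
            ['xdotool', 'search', '--sync', '--onlyvisible',
             search_by, name],
            timeout=timeout)
    except subprocess.TimeoutExpired:
        raise AssertionError('window {!r} did not appear'.format(name))
    # xdotool may print several ids - take the first match
    return int(output.splitlines()[0])
```

To wait for a window to disappear without racing the enumeration, the id
returned here can be checked directly instead of re-running the search.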
Make sure events are sent to the specific window found with xdotool search,
not to whichever window has the focus. In the case of Whonix, the focused
window may be the first-connection wizard or a whonixcheck report.
A 1GB image easily exceeds the space available on openQA instances. Use
100MB instead.
The R3 firewall format had a limitation of ~40 rules per VM. Do not
generate compat rules (which could hit that limitation) when the new
format, free of that limitation, is supported.

Fixes QubesOS/qubes-issues#1570
Fixes QubesOS/qubes-issues#4228
It can be a leftover from a previous failed attempt. Don't crash on it;
replace it instead.

QubesOS/qubes-issues#3438
Make it clear that volume creation fails because the volume needs to be in
the same pool as its parent. This message is shown in the context of
`qvm-create -p root=MyPool`, for example, and the previous message didn't
make sense at all.

Fixes QubesOS/qubes-issues#3438
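
The shape of the clearer check, as a hypothetical stand-alone helper (the
names and exception type are illustrative, not the actual storage code):

```python
def assert_same_pool(volume, parent):
    """A volume based on another must live in the same storage pool
    as its parent; say so explicitly instead of failing obscurely."""
    if volume.pool != parent.pool:
        raise ValueError(
            'Cannot create volume {} in pool {}: it must be in the same '
            'pool as its parent {} (pool {})'.format(
                volume.vid, volume.pool, parent.vid, parent.pool))
```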
@codecov

codecov bot commented Oct 23, 2018

Codecov Report

Merging #238 into master will increase coverage by 0.28%.
The diff coverage is 74.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #238      +/-   ##
==========================================
+ Coverage   58.09%   58.37%   +0.28%     
==========================================
  Files          57       57              
  Lines        9275     9379     +104     
==========================================
+ Hits         5388     5475      +87     
- Misses       3887     3904      +17
Flag         Coverage Δ
#unittests   58.37% <74.28%> (+0.28%) ⬆️

Impacted Files                 Coverage Δ
qubes/api/internal.py          27.27% <0%> (ø) ⬆️
qubes/vm/dispvm.py             67.56% <0%> (ø) ⬆️
qubes/api/admin.py             91.57% <100%> (ø) ⬆️
qubes/ext/windows.py           78.12% <100%> (ø) ⬆️
qubes/app.py                   74.13% <100%> (+0.03%) ⬆️
qubes/vm/qubesvm.py            41.36% <23.8%> (-0.15%) ⬇️
qubes/storage/__init__.py      56.27% <41.66%> (-0.31%) ⬇️
qubes/ext/r3compatibility.py   69.89% <50%> (-0.44%) ⬇️
qubes/exc.py                   88.23% <75%> (-1%) ⬇️
qubes/ext/services.py          88.09% <83.33%> (+29.76%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fc3b286...42061cb. Read the comment docs.

@marmarek marmarek force-pushed the devel-20181015 branch 2 times, most recently from 389e71f to e4f26b7 on October 23, 2018 13:17
Support 'supported-service.*' feature requests coming from VMs. Set
such features directly (allowing only the value '1') and remove any not
reported in a given call. This way, uninstalling the package providing a
given service will automatically remove the related 'supported-service...'
feature.

Fixes QubesOS/qubes-issues#4402
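
A simplified sketch of that sync logic (not the actual
qubes/ext/services.py code; the function name is illustrative):

```python
def sync_supported_services(vm, untrusted_features):
    """Keep 'supported-service.*' features in sync with what the VM
    reported in this features-request call."""
    reported = set()
    for untrusted_name, untrusted_value in untrusted_features.items():
        if not untrusted_name.startswith('supported-service.'):
            continue
        if untrusted_value == '1':   # the only accepted value
            reported.add(untrusted_name)
            vm.features[untrusted_name] = True
    # drop features for services no longer reported, e.g. after the
    # package providing the service was uninstalled
    for name in list(vm.features):
        if name.startswith('supported-service.') and name not in reported:
            del vm.features[name]
```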
On some storage pools this operation can also be time consuming - for
example, it may require creating a temporary volume - and volume.create()
can already be a coroutine.
This is also a requirement for making the common code used by
start()/create() etc. a coroutine; otherwise neither of them can be one,
and they will block other operations.

Related to QubesOS/qubes-issues#4283
Both vm.create_on_disk() and vm.start() are coroutines. Tests in this
class didn't run them, so they basically didn't test anything.

Wrap coroutine calls with self.loop.run_until_complete().

Additionally, don't fail if the LVM pool is named differently.
In that case the test is rather silly, as it probably uses the same pool
for source and destination (an operation already tested elsewhere). But
that isn't a reason to fail the test.
LVM operations can take a significant amount of time. This is especially
visible when stopping a VM (`vm.storage.stop()`) - during that time the
whole qubesd freezes for about 2 seconds.

Fix this by making all the ThinVolume methods coroutines (where
supported). Each public coroutine is also wrapped with locking on
volume._lock to avoid concurrency-related problems.
This also requires changing internal helper functions to coroutines.
There are two functions that still need to be called from non-coroutine
call sites:
 - init_cache/reset_cache (initial cache fill, ThinPool.setup())
 - qubes_lvm (ThinVolume.export())

So those two functions need to exist in two variants. Extract their
common code into separate functions to reduce duplication.

Fixes QubesOS/qubes-issues#4283
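
The shape of the change, heavily simplified (this is not the real
ThinVolume code; the method body is a placeholder):

```python
import asyncio

class ThinVolumeSketch:
    def __init__(self):
        # serializes public operations on this volume
        self._lock = asyncio.Lock()

    async def stop(self):
        async with self._lock:
            # run the blocking lvm command in an executor, so the qubesd
            # event loop keeps serving other requests in the meantime
            loop = asyncio.get_event_loop()
            await loop.run_in_executor(None, self._commit_blocking)

    def _commit_blocking(self):
        pass  # stands in for the actual `lvm lvremove`/`lvrename` calls
```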
@marmarek marmarek force-pushed the devel-20181015 branch 3 times, most recently from c37f144 to 23f46e9 on October 25, 2018 20:27
vm.shutdown(wait=True) waited indefinitely for the shutdown, which made it
useless without boilerplate handling the timeout. Since the timeout may
depend on the operating system inside, add a per-VM property for it, with
value inheritance from the template and then from the global
default_shutdown_timeout property.

When the timeout is reached, the method raises an exception - whether to
kill the VM or not is left to the caller.

Fixes QubesOS/qubes-issues#1696
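
A sketch of the new contract (simplified; wait_for_shutdown() is a
hypothetical stand-in for the internal waiting logic, and the real code
raises a qubes-specific exception rather than asyncio.TimeoutError):

```python
import asyncio

async def wait_for_shutdown(vm):
    while vm.is_running():
        await asyncio.sleep(0.25)

async def shutdown_and_wait(vm):
    await vm.shutdown()   # request a clean shutdown
    try:
        # shutdown_timeout is the new per-VM property, inherited from the
        # template and then from the global default_shutdown_timeout
        await asyncio.wait_for(wait_for_shutdown(vm),
                               timeout=vm.shutdown_timeout)
    except asyncio.TimeoutError:
        raise  # the caller decides whether to vm.kill() now
```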
First unregister the domain from the collection, and only then call
remove_from_disk(). Removing it from the collection prevents further calls
from being made on it. And if anything else still keeps a reference to it
(for example as a netvm), abort the operation.

Additionally, this makes it unnecessary to take the startup lock when
cleaning it up in tests.
Cleaning up after domain shutdown (the domain-stopped and domain-shutdown
events) relies on libvirt events, which may be unreliable in some cases
(events may be processed with some delay, or if libvirt was restarted in
the meantime, may not arrive at all). So, instead of only ensuring proper
ordering between shutdown cleanup and the next startup, also trigger the
cleanup whenever we know for sure the domain isn't running:
 - at vm.kill() - after libvirt confirms the domain was destroyed
 - at vm.shutdown(wait=True) - after a successful shutdown
 - at vm.remove_from_disk() - after ensuring it isn't running, but just
   before actually removing it

This fixes various race conditions:
 - qvm-kill && qvm-remove: the removal could happen before the shutdown
   cleanup was done, leaving the storage driver confused
 - qvm-shutdown --wait && qvm-clone: the clone could happen before the new
   content was committed to the original volume, producing a copy of the
   previous VM state
(and probably more)

Previously this wasn't such a big issue in the default configuration,
because the LVM driver was fully synchronous, effectively blocking the
whole qubesd for the time the cleanup took.

To avoid code duplication, factor out an _ensure_shutdown_handled function
calling the actual cleanup (and possibly canceling a cleanup scheduled from
a libvirt event). Note that the "Duplicated stopped event from libvirt
received!" warning may now happen in normal circumstances, not only because
of a bug.
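
A simplified sketch of that idempotent-cleanup pattern (the flag name is
illustrative; the real _ensure_shutdown_handled also cancels a cleanup
scheduled from the libvirt event):

```python
async def ensure_shutdown_handled(vm):
    """Run the post-shutdown cleanup exactly once, no matter how many
    triggers (kill, shutdown, remove_from_disk, libvirt event) fire."""
    if getattr(vm, '_shutdown_handled', False):
        vm.log.warning('Duplicated stopped event from libvirt received!')
        return
    vm._shutdown_handled = True
    await vm.fire_event_async('domain-stopped')
    await vm.fire_event_async('domain-shutdown')
```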

It is very important that post-shutdown cleanup happens while the domain is
not running. To ensure that, take the startup_lock and, under it, 1) make
sure the domain is halted and only then 2) execute the cleanup. This isn't
necessary when removing it from disk, because at that point it is already
removed from the collection, which also prevents other calls to it (see
also the "vm/dispvm: fix DispVM cleanup" commit).
In fact, taking the startup_lock in the remove_from_disk function would
cause a deadlock in the DispVM auto-cleanup code:
 - vm.kill (or another trigger for the cleanup)
   - vm.startup_lock acquire   <====
     - vm._ensure_shutdown_handled
       - domain-shutdown event
         - vm._auto_cleanup (in the DispVM class)
           - vm.remove_from_disk
             - cannot take vm.startup_lock again
Clean up VMs in template reverse topological order, not network order.
Network can be set to None to break the dependency, but template can't.
Before changing a netvm, kill the VMs first (kill doesn't check the network
dependency), so the netvm change will not trigger side effects (a runtime
change, which could fail).

This fixes cleanup for tests creating custom templates - previously the
order was undefined, and if a template was removed before its child VMs,
the removal failed. All the relevant files were removed later anyway, but
it led to Python object leaks.
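
An illustrative helper for that ordering (not the actual test cleanup
code): a depth-first pass that lists every VM before the template it is
based on.

```python
def template_cleanup_order(vms):
    """Return *vms* ordered so each VM precedes its template."""
    order = []
    seen = set()

    def visit(vm):
        if vm in seen:
            return
        seen.add(vm)
        # children first: they must be removed before their template
        for other in vms:
            if getattr(other, 'template', None) is vm:
                visit(other)
        order.append(vm)

    for vm in vms:
        visit(vm)
    return order
```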
socat has only one variant, so there is only one command line syntax to
handle. It's also installed by default in Qubes VMs.
Yet another place where object references were leaked.
If a domain is set to autostart, the qubes-vm@ systemd service is used to
start it at boot. Clean up the service when the domain is removed, and
similarly enable the service when a domain is created with autostart=True
already set.

Fixes QubesOS/qubes-issues#4014
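
A minimal sketch of keeping the unit in sync with the property (simplified;
the real code hooks the domain create/remove events, and the function name
is illustrative):

```python
import subprocess

def sync_autostart_unit(vm_name, autostart):
    """Enable or disable the qubes-vm@ unit for this domain."""
    action = 'enable' if autostart else 'disable'
    subprocess.check_call(
        ['systemctl', action, 'qubes-vm@{}.service'.format(vm_name)])
```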
Use something included in coreutils, which is installed everywhere.
It's weird to set it for Windows but not for Linux.
Same as other vm-related errors.
This helps QubesTestCase.cleanup_traceback() clean up the VM reference.
A clean VM shutdown may time out if it is initiated before the startup has
fully completed, so make sure the full startup completes first.
The Nose loader does not provide loader.loadTestsFromTestCase(); use
loader.loadTestsFromNames() instead.
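
For illustration, the substitution works with both loaders (the module
name below is an example, not necessarily the suite being loaded here):

```python
import unittest

loader = unittest.TestLoader()  # nose's loader exposes the same method
suite = loader.loadTestsFromNames(['qubes.tests.integ.basic'])
```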
The first boot of a whonix-ws based VM takes an extended period of time,
because a lot of files need to be copied to the private volume. This takes
even more time when verbose logging through the console is enabled. Extend
the timeout to account for that.
…hown

Try to collect more details about why the test failed. This will help only
if qvm-open-in-dvm exits early. On the other hand, if it hangs, or the
remote side fails to find the right editor (which results in a GUI error
message), this change will not provide any more details.
@marmarek marmarek merged commit 42061cb into QubesOS:master Oct 29, 2018
@marmarek marmarek deleted the devel-20181015 branch December 8, 2018 12:01