
qrexec sometimes leaks disposable VMs #9081

Open
DemiMarie opened this issue Apr 2, 2024 · 4 comments
Labels
affects-4.2 (This issue affects Qubes OS 4.2.)
C: core
diagnosed (Technical diagnosis has been performed; see issue comments.)
P: default (Priority: default. Default priority for new issues, to be replaced given sufficient information.)
regression (A bug in which a supported feature that worked in a prior Qubes OS release has stopped working.)

Comments

@DemiMarie

Qubes OS release

R4.2, but I suspect this has been a problem since before R4.1.

Brief summary

Various error paths in qrexec-client call exit() without cleaning up disposable VMs. This could be fixed with atexit(), but that approach is incompatible with QubesOS/qubes-core-qrexec#136.
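
To make the fix concrete, here is a minimal sketch of the atexit()-based approach in C; `dispvm_name` and `destroy_dispvm()` are hypothetical stand-ins, not the actual qrexec-client symbols:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical global recording the disposable VM created by this process. */
static char dispvm_name[64];

/* Hypothetical cleanup helper; the real code would ask qubesd to
 * kill/remove the disposable. */
static void destroy_dispvm(void)
{
    if (dispvm_name[0] == '\0')
        return;
    fprintf(stderr, "cleaning up disposable %s\n", dispvm_name);
    /* ... actual cleanup call would go here ... */
    dispvm_name[0] = '\0';
}

int main(void)
{
    /* ... create the disposable, then record its name ... */
    strncpy(dispvm_name, "disp1234", sizeof(dispvm_name) - 1);

    /* Registered once; runs on every exit() path, including the error
     * paths that currently just call exit() and leak the VM. */
    if (atexit(destroy_dispvm) != 0) {
        destroy_dispvm();
        return 1;
    }

    /* ... rest of qrexec-client; exit(1) anywhere now triggers cleanup ... */
    exit(0);
}
```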

Steps to reproduce

Not sure. I found this problem by source code inspection. I suspect it can be triggered by causing a vchan I/O error. In the case of dom0-initiated calls, this could also leak Xenstore entries.

Expected behavior

Everything is cleaned up.

Actual behavior

Not everything is cleaned up; the disposable VM can be left behind.

DemiMarie added the T: bug, C: core, and P: default labels on Apr 2, 2024
@marmarek
Member

marmarek commented Apr 2, 2024

> but I suspect this has been a problem since before R4.1.

Unlikely: when qrexec-client was called from the Python code, the disposable cleanup was handled there (with try/finally).

> In the case of dom0-initiated calls

There is no such thing as automatic DispVM creation (and thus cleanup) by qrexec-daemon for dom0-initiated calls. If nothing else, dom0 doesn't have its own qrexec-daemon.

As for the actual issue, one option is to restructure the code to not use exit() for error handling, but that could be complicated. An alternative would be to disarm the earlier-registered atexit() callbacks in the child by registering (just after fork()) one that calls _exit(). According to the documentation, this should work correctly:

Functions so registered are called in the reverse order
of their registration; no arguments are passed.
...
If one of the registered functions calls _exit(2), then any remaining
functions are not invoked, and the other process termination steps
performed by exit(3) are not performed.
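
A minimal C sketch of that disarming trick, assuming a cleanup handler was already registered before fork() (handler names are illustrative, not the real qrexec-client symbols):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Earlier-registered handler that cleans up the disposable VM. */
static void cleanup_dispvm(void)
{
    fprintf(stderr, "pid %d: cleaning up disposable VM\n", getpid());
}

/* Registered in the child right after fork(). atexit handlers run in
 * reverse registration order, so this one runs first and its _exit()
 * skips cleanup_dispvm() in the child. Note: a real implementation would
 * have to stash the intended exit status in a global, since atexit
 * handlers don't receive it. */
static void disarm_cleanup(void)
{
    _exit(0);
}

int main(void)
{
    atexit(cleanup_dispvm);

    pid_t child = fork();
    if (child < 0) {
        perror("fork");
        return 1;
    }
    if (child == 0) {
        atexit(disarm_cleanup);
        /* ... child work; any exit() here no longer runs cleanup_dispvm() ... */
        exit(0);
    }

    waitpid(child, NULL, 0);
    return 0;   /* the parent's exit still runs cleanup_dispvm() */
}
```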

@DemiMarie
Author

>> but I suspect this has been a problem since before R4.1.

> Unlikely: when qrexec-client was called from the Python code, the disposable cleanup was handled there (with try/finally).

Indeed, so this is an R4.2 regression.

>> In the case of dom0-initiated calls

> There is no such thing as automatic DispVM creation (and thus cleanup) by qrexec-daemon for dom0-initiated calls. If nothing else, dom0 doesn't have its own qrexec-daemon.

I thought that Python code that calls qrexec-client might rely on the -k option, but it does not appear to.

> As for the actual issue, one option is to restructure the code to not use exit() for error handling, but that could be complicated.

I can try this and see how complex it is. Some of the calls to

> An alternative would be to disarm the earlier-registered atexit() callbacks in the child by registering (just after fork()) one that calls _exit(). According to the documentation, this should work correctly:

> Functions so registered are called in the reverse order
> of their registration; no arguments are passed.
> ...
> If one of the registered functions calls _exit(2), then any remaining
> functions are not invoked, and the other process termination steps
> performed by exit(3) are not performed.

This is indeed an option, provided that qubesd_call() never calls exit(), directly or indirectly, even in error conditions: calling exit() more than once (explicitly or implicitly) is undefined behavior.

A better solution would be to change the qubesd API: have the caller of admin.vm.CreateDisposable optionally pass a file descriptor (via SCM_RIGHTS) and, if one is sent, automatically destroy the VM when the file descriptor is readable. That’s a more complex patch, though.
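
For background only (the next comment explains why a plain qrexec call cannot carry this), a minimal sketch of passing a file descriptor over an AF_UNIX socket with SCM_RIGHTS; the socketpair/pipe setup is purely illustrative and none of this reflects the actual qubesd API:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd_to_send)
{
    char dummy = 'F';                       /* SCM_RIGHTS needs at least one data byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    union {                                 /* properly aligned control buffer */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    memset(&control, 0, sizeof(control));

    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control.buf,
        .msg_controllen = sizeof(control.buf),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

int main(void)
{
    int sp[2], pipefd[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp) < 0 || pipe(pipefd) < 0) {
        perror("setup");
        return 1;
    }
    /* The receiver could then watch the passed fd and destroy the VM on EOF. */
    if (send_fd(sp[0], pipefd[0]) < 0) {
        perror("send_fd");
        return 1;
    }
    return 0;
}
```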

@marmarek
Member

marmarek commented Apr 2, 2024

> A better solution would be to change the qubesd API: have the caller of admin.vm.CreateDisposable optionally pass a file descriptor (via SCM_RIGHTS) and

Nope, qrexec calls (which the admin API conceptually is) do not have this capability. Even if it would technically be possible for a call made from dom0 (as it is implemented right now), let's not abuse the protocol.
But there is another option: create a new call (admin.vm.CreateDisposableAndWait?) that, instead of ignoring and closing stdin, keeps it open; when the remote side closes it (EOF), the DispVM gets killed. It would still be inconsistent with all the other Admin API calls, which generally (receive the data and) close stdin before doing the action, but at least it would be within the capabilities of the qrexec call concept.
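
A rough sketch of the kill-on-EOF behavior, in C for illustration only (qubesd itself is Python, and kill_dispvm() is a made-up placeholder): block until the caller's end of the connection reaches EOF, then tear the disposable down.

```c
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical cleanup hook; the real service would ask qubesd to
 * kill and remove the disposable VM. */
static void kill_dispvm(const char *name)
{
    fprintf(stderr, "killing disposable %s\n", name);
}

/* Block until `fd` (the call's stdin) reports EOF or hangup, then
 * tear the disposable down. Any payload received is ignored. */
static void wait_for_eof_then_kill(int fd, const char *dispvm)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[256];

    for (;;) {
        if (poll(&pfd, 1, -1) < 0)
            break;                      /* poll error: be safe and clean up */
        if (pfd.revents & (POLLIN | POLLHUP)) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n <= 0)
                break;                  /* EOF (or error): remote side is gone */
        }
    }
    kill_dispvm(dispvm);
}

int main(void)
{
    wait_for_eof_then_kill(STDIN_FILENO, "disp1234");
    return 0;
}
```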

@DemiMarie
Author

>> A better solution would be to change the qubesd API: have the caller of admin.vm.CreateDisposable optionally pass a file descriptor (via SCM_RIGHTS) and

> Nope, qrexec calls (which the admin API conceptually is) do not have this capability. Even if it would technically be possible for a call made from dom0 (as it is implemented right now), let's not abuse the protocol. But there is another option: create a new call (admin.vm.CreateDisposableAndWait?) that, instead of ignoring and closing stdin, keeps it open; when the remote side closes it (EOF), the DispVM gets killed. It would still be inconsistent with all the other Admin API calls, which generally (receive the data and) close stdin before doing the action, but at least it would be within the capabilities of the qrexec call concept.

That’s a much better (and cleaner) solution, thank you. It’s still potentially awkward and has issues if qubesd gets restarted, but at least it’s not a disgusting, dom0-only hack.

Want me to work on this now, or should I focus on other tasks instead? A review on QubesOS/qubes-core-qrexec#138 would be appreciated. It fixes #9036 and #9073 as well as some test suite bugs, and it’s a prerequisite for #9037.

andrewdavidwong added the diagnosed, regression, and affects-4.2 labels on Apr 2, 2024