dispVM halts prematurely with custom .desktop file involving qrexec-client-vm #3318
Comments
Just to be sure - have you checked the script isn't simply crashing? For example it should be […]
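The concrete check marmarek suggested was lost in scraping; one generic way to rule out a plain crash (the script path and sample file below are invented for illustration) is to run the handler by hand and inspect its exit status:

```sh
# Run the xdg handler manually inside the VM; a nonzero exit status or a
# traceback on stderr would mean the script crashes on its own, independent
# of qrexec. Both names here are made up.
/home/user/bin/handle-custom-ext.py sample.custom-ext
echo "handler exit status: $?"
```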
@marmarek, yes, I've run this on a non-dispVM (the "base" VM the dispVM is based on) and it runs without problem. And right, […]
I've come up with a minimal example of this behavior. It seems to be triggered when making […]. In this example we're going to make an RPC call from […] (a hedged reconstruction of the full setup follows this comment). On […]:

Add the following to […]:

And also run that command in your running […]. On personal, create […]:

Although it's not strictly necessary for this demo, to […]:

and also run that command on your running […]. On […]:

And allow work to be a dispvm template:

Now we're all set up. On […]:

You should see the expected output:

OK, that's what we expect. Now run the same thing on a disposable VM using […]:

The call never returns, and the disposable VM that was created has crashed. I believe that if […]
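Every code block in the comment above was stripped in scraping, so here is a hedged reconstruction of what the setup plausibly looked like. The VM names personal and work come from the comment itself; the service names (test.Outer, test.Inner), the file contents, and the exact policy lines are assumptions, not the original commands.

```sh
# On work (the future DispVM template): a service whose side effect is its
# own qrexec call back to personal - the nested-call pattern described above.
sudo tee /etc/qubes-rpc/test.Outer >/dev/null <<'EOF'
#!/bin/sh
echo "outer: start"
qrexec-client-vm personal test.Inner
echo "outer: done"
EOF
sudo chmod +x /etc/qubes-rpc/test.Outer

# On personal: the inner service that the outer one calls back to.
sudo tee /etc/qubes-rpc/test.Inner >/dev/null <<'EOF'
#!/bin/sh
echo "inner: hello"
EOF
sudo chmod +x /etc/qubes-rpc/test.Inner

# In dom0: policy for both services, and let work act as a DispVM template.
echo 'personal $dispvm:work allow' | sudo tee    /etc/qubes-rpc/policy/test.Outer
echo 'personal work allow'         | sudo tee -a /etc/qubes-rpc/policy/test.Outer
echo '$anyvm personal allow'       | sudo tee    /etc/qubes-rpc/policy/test.Inner
qvm-prefs work template_for_dispvms True

# On personal: against the plain work VM this returns the expected output...
qrexec-client-vm work test.Outer

# ...but against a fresh DispVM the call never returns, and the DispVM crashes.
qrexec-client-vm '$dispvm:work' test.Outer
```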
… port (unconfirmed theory) Only one side of the connection should report MSG_CONNECTION_TERMINATED - the one where vchan port was allocated. This is especially important for VM-VM connection, because when two such connections are established between the same domains, in opposite directions - both will share the same vchan port number, so qrexec-daemon will get confused which connection was terminated. QubesOS/qubes-issues#3318
I think this is about qrexec connection cleanup. When you execute two qrexec connections between the same domains (and those are the only connections), but in opposite directions, they will be assigned the same vchan port number. This is normally ok, because the connection direction differs (a different VM acts as the server). But the connection cleanup code got confused and reports the wrong connection as being closed. The only code using that report in practice is the cleanup of a DispVM. I've pushed a preliminary attempt to fix this, but it doesn't work yet.
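To make the described collision concrete, here is a hedged illustration; test.Sleep is an assumed service that just runs sleep 30, and the first-free port assignment is taken from a later comment in this thread.

```sh
# Two simultaneous qrexec connections between the same pair of qubes, in
# opposite directions. With no other qrexec traffic, each side's daemon hands
# out its first free port number, so both connections end up tracked under
# the same port.

# In work:
qrexec-client-vm personal test.Sleep &   # connection 1: work -> personal

# In personal, while connection 1 is still open:
qrexec-client-vm work test.Sleep         # connection 2: personal -> work

# When connection 1 closes, bookkeeping keyed on the port number alone can
# just as well conclude that connection 2 terminated - the confusion
# described above.
```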
@marmarek thanks for your prompt attention! I suspected something along those lines, but had no idea where to start looking... I'll be interested to take a close look at your patch. Is there documentation anywhere about setting up a development environment for building and testing these core libraries? I'd rather be submitting patches than reporting bugs ;)
See here: https://www.qubes-os.org/doc/development-workflow/ As for the bug - the dom0 counterpart is here: https://github.com/QubesOS/qubes-core-admin-linux/blob/master/qrexec/qrexec-daemon.c#L387-L517 and https://github.com/QubesOS/qubes-core-admin-linux/blob/master/qrexec/qrexec-client.c#L713-L805. Unfortunately the logic around vchan port allocation is quite convoluted (the second link), mostly because dom0 is special - it does not have its own qrexec-agent/qrexec-daemon pair. BTW if you want to install a qrexec service in one VM only, you can use […]
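The tip at the end of that comment was cut off. What it most plausibly refers to (an assumption on my part) is the per-VM service directory: qrexec checks /usr/local/etc/qubes-rpc before /etc/qubes-rpc, and /usr/local lives on the VM's private volume, so a service placed there exists in that one VM only rather than in every VM based on the template.

```sh
# Install a qrexec service in a single VM only, not in its template;
# test.Inner is the hypothetical service name from the sketches above.
sudo mkdir -p /usr/local/etc/qubes-rpc
sudo tee /usr/local/etc/qubes-rpc/test.Inner >/dev/null <<'EOF'
#!/bin/sh
echo "inner: hello"
EOF
sudo chmod +x /usr/local/etc/qubes-rpc/test.Inner
```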
Thanks so much, I'll spend some time with that tomorrow.
@andrewdavidwong The test scripts provided by @joshuathayer now seem to work as expected, and output is the same between 'target' and 'dispVM:target'.
Are you on R4.1? I'm using R4.0 current-testing, and the one-liner test command from #4273 (which is a duplicate of this bug) still crashes the DispVM. |
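The literal one-liner from #4273 isn't quoted anywhere in this thread, so the following is only a guess at its general shape, reusing the hypothetical service names from above:

```sh
# Same idea as the repro, compressed: start a DispVM whose payload
# immediately makes its own qrexec call back out, then watch whether the
# DispVM survives until the command returns.
qvm-run --dispvm=work --pass-io 'qrexec-client-vm personal test.Inner'
```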
For me this one-liner test works fine. |
On Fri, Aug 02, 2019 at 07:31:07AM -0700, Rusty Bird wrote:
> @unman:
> > The test scripts provided by @joshuathayer now seem to work as expected, and output is same between 'target' and 'dispVM:target'
> Are you on R4.1? I'm using R4.0 current-testing, and the one-liner test command from #4273 (which is a duplicate of this bug) still crashes the DispVM after a few seconds.

4.0 current-testing
Hmm, strange. I have qubes-core-agent-4.0.47 in the Fedora 30 based sourcevm (and qubes-core-dom0-linux-4.0.18 in dom0). Its […]. I run […]
Ok, I can confirm it's still broken. I think it depends on something during sourcevm startup. After some sourcevm reboots it works reliably, and after others it's broken.
Just a shot in the dark, but it seems plausible if there's indeed some VM startup thing: My computer is pretty slow, and the bug always triggers. So maybe putting your (presumably faster) system under high CPU and/or I/O load during e.g. sourcevm startup would also more consistently reproduce it for you.
Yes, it's likely. I guess it's related to the connection identifier / vchan port number, which are assigned on a first-free basis. See #3318 (comment)
I think I can confirm what happens in this case. I added a debug print to qrexec-daemon whenever it allocated a port. Then I reproduced the issue:
Here is the daemon log from source VM (with identifiers replaced):
And here is one from the disp VM:
Looks like there are two connections over port 513:
Then, the second […]
Initially I thought this was a different kind of collision: the "allocated port" is sometimes allocated for a vchan client (in the case of […]). I think a good move would be to prevent any kind of collision between the two daemons. In other words:
A simple scheme to do that would be: […] (a sketch of one possibility follows this comment).
I ran a proof-of-concept and it seems to solve this specific bug. With the current protocol, I think it's cleaner than tracking where a given connection came from, who was running a server, etc.
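The scheme itself was lost in scraping. Here is a sketch of one parity-split scheme that fits the description above, as an assumption rather than a transcription of the merged patch: each daemon only allocates ports of one parity for a given remote domain, chosen by comparing domain IDs, so the two daemons of a pair can never hand out the same number.

```sh
# Toy model of a parity-split vchan port allocator (assumed scheme, not
# necessarily what the actual patch does). A flat file stands in for the
# daemons' in-memory bookkeeping of allocated ports.
ALLOCATED=$(mktemp)

port_in_use() { grep -qx "$1" "$ALLOCATED"; }

allocate_port() {
    # $1 = own domain ID, $2 = remote domain ID
    local port=512 parity
    # The daemon in the lower-numbered domain hands out only even ports, its
    # peer only odd ones, so opposite-direction connections between the same
    # two domains can never collide on a port number.
    if [ "$1" -lt "$2" ]; then parity=0; else parity=1; fi
    [ $((port % 2)) -eq "$parity" ] || port=$((port + 1))
    while port_in_use "$port"; do port=$((port + 2)); done
    echo "$port" >> "$ALLOCATED"
    echo "$port"
}

allocate_port 5 7   # daemon in domain 5, remote 7: prints 512
allocate_port 7 5   # daemon in domain 7, remote 5: prints 513 - no clash
```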
Should prevent the situations where a wrong vchan connection is terminated, which causes e.g. premature termination of a DispVM. See longer discussion in QubesOS/qubes-issues#3318.
@marmarek: Have you considered cherry-picking this patch for R4.0? It applies cleanly in qubes-core-admin-linux (after adjusting the file path).
Should prevent the situations where a wrong vchan connection is terminated, which causes e.g. premature termination of a DispVM. See longer discussion in QubesOS/qubes-issues#3318.
Backported, included in QubesOS/updates-status#1787
Qubes OS version:
4.0rc2 (dom0 and templates up to date with testing repos)

Affected TemplateVMs:
At least fedora-25
Steps to reproduce the behavior:
I have a situation where I'm trying to open a particular file format in a disposable VM, from a (Whonix-based) AppVM, eg: […]

In my work VM I've configured xdg to associate .custom-ext to a python script. That script processes the given file, and as a side effect there's a call to qrexec-client-vm, eg: […]
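The handler script itself wasn't captured. As a stand-in, here is a minimal shell version of the pattern being described; the real script was python, and every name below is invented:

```sh
#!/bin/sh
# Hypothetical xdg handler for .custom-ext files: do some processing, and as
# a side effect hand derived data to another qube over qrexec. personal and
# test.Inner are the same made-up names used in the sketches above.
process_custom_ext() { cat "$1"; }   # placeholder for the real processing
process_custom_ext "$1" | qrexec-client-vm personal test.Inner
```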
Expected behavior:
I'd expect: […] work VM to start […]

Actual behavior:
My script seems to run as far as the call to qrexec (which completes successfully), and then the entire VM gets shut down. While the machine is shutting down, my script is still running: judging by writes to STDERR, it manages to emit a varying number of lines before the machine is down.
General notes:
Removing the Popen call from my script allows it to run and complete normally. Running the command on a non-disposable VM allows the script to run and complete normally, eg: […] works great.
I suspected the issue may have had something to do with my script's STDIN/OUT/ERR being passed to the spawned child, but the following didn't change behavior at all: […]
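The exact variant tried was lost; in the original python it would have been a Popen call with redirected streams. The shell equivalent of the experiment, with the hypothetical names from above:

```sh
# Fully detach the qrexec child's standard streams so nothing is inherited
# from the handler script - the shell analogue of
# Popen(..., stdin=DEVNULL, stdout=DEVNULL, stderr=DEVNULL) in python.
qrexec-client-vm personal test.Inner </dev/null >/dev/null 2>&1
```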
For debugging, I tried making other calls which involve a fork() in my processing script, and those calls seemed to complete without error and without causing the VM to halt. Only the call to qrexec-client-vm caused the machine to halt.

Also for debugging I tried not using a pipe for stdin (instead using the same /dev/null filehandle), and that also didn't change matters.

I'm stumped. I'm not sure this is a Qubes bug so much as me not understanding something, but I do expect the behavior I noted above. Thanks for taking a look!
Related issues: