VMs not starting - libxenlight failed to create new-domain in 4.0rc-1 #3125

Closed
platschi opened this issue Sep 27, 2017 · 32 comments
Labels
C: Xen P: major Priority: major. Between "default" and "critical" in severity. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@platschi

platschi commented Sep 27, 2017

Qubes OS version (e.g., R3.2):

R4.0-rc1

Affected TemplateVMs (e.g., fedora-23, if applicable):

all, except sys-net


Steps to reproduce the behavior:

Installed fresh R4.0-rc1 iso. Qubes / all VMs worked as expected. Then updated to current-testing with sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Expected behavior:

Domains / ServiceVMs and Templates start normally.

Actual behavior:

When using the GUI app menu to start any VM, nothing happens. The Qubes tray icon only shows that sys-net is running.

In a terminal, I receive this error (with any VM; for example, when trying to start sys-firewall):

[platschi@dom0 qubes]$ qvm-start sys-firewall
Start failed: internal error: libxenlight failed to create new domain 'sys-firewall'

General notes:

In /var/log/qubes.log:

Starting sys-firewall
QubesException("Start failed: internal error: libxenlight failed to create new domain 'sys-firewall'",) while calling src=b'dom0' meth=b'admin.vm.Start' dest=b'sys-firewall' arg=b'' len(untrusted_payload)=0

Related issues:

@marmarek
Member

See /var/log/libvirt/libxl/libxl-driver.log, but I guess you don't have VT-x enabled in BIOS.

@0spinboson

0spinboson commented Sep 27, 2017

I'm also getting this since installing the latest set of (libvirt) updates, without rebooting. Downgrading "solved" the issue.

@marmarek
Member

OK, I think I've found the problem: the xen package was updated, but xen-hvm-stubdom-linux, which carries the required changes, was not. The latter was just uploaded to the testing repository; try now.

@0spinboson

Yes, it appears to be behaving properly again now.

@platschi
Author

Exactly, the updated xen-hvm-stubdom-linux package solved it. Thanks a lot!

@P4z

P4z commented Nov 25, 2017

I have just experienced this problem with R4.0 rc2 on Dell XPS L702X.

Worth noting:

  • the installer complained about issues with IOMMU (i7-2630QM)
  • the installer complained about a missing root.img for vm-templates/fedora-25
  • dom0 could not start sys-net, complaining about libxenlight

I'm back on R3.2 with no such issues. To me this is a critical issue because I cannot even install R4.

@andrewdavidwong andrewdavidwong added T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. C: Xen labels Nov 25, 2017
@andrewdavidwong andrewdavidwong added this to the Release 4.0 milestone Nov 25, 2017
@reconmaster

Reproduced this on x230 & t530 (i5s) with the sys-net qube using 4.0 rc3. I'll be following up with the newest rcs to verify.

@hast0011

hast0011 commented Dec 15, 2017

This can also be seen on a Dell Precision T5500 using 4.0 rc3, with sys-firewall.

@marmarek
Member

Note that the default setup of Qubes 4.0 requires IOMMU (aka VT-d). If the installer says you don't have it, it will be a problem. If your hardware should support IOMMU/VT-d in theory, check whether it is enabled in the BIOS.

@hast0011

Back after reboot :-) It was enabled. The installer didn't show any message about it, or at least I didn't see any.

@marmarek
Member

Check /var/log/libvirt/libxl/libxl-driver.log in dom0; you should find a more detailed error message there.

@reconmaster

I'll be giving both the t530 and x230 a shot with coreboot+seabios sometime over the holidays. Other users' advice on these models has mentioned firmware options not available in the default Lenovo firmware (17/09/27th, I believe). I'll report back as soon as I get an outcome for both variables.

@na--

na-- commented Jan 3, 2018

When I run qvm-start -v test, the following error appears:

Start failed: internal error: libxenlight failed to create new domain 'test', see /var/log/libvirt/libxl/libxl-driver.log for details

This is the content of /var/log/libvirt/libxl/libxl-driver.log:

2018-01-03 09:07:59.561+0000: libxl: libxl.c:422:libxl__domain_rename: domain with name "test-dm" already exists.
2018-01-03 09:07:59.562+0000: libxl: libxl_dm.c:2076:stubdom_pvqemu_cb: error connecting nics devices: Function not implemented
2018-01-03 09:07:59.562+0000: libxl: libxl_create.c:1542:domcreate_devmodel_started: device model did not start: -6

I'm using R4 and I updated everything to the latest versions yesterday (I even enabled the qubes-dom0-current-testing repo in dom0).
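
For anyone collecting these logs, here is a minimal Python sketch that splits a libxl-driver.log line into its parts. The line layout is inferred only from the excerpts quoted in this thread, and the names LibxlLogEntry and parse_libxl_line are made up for illustration; they are not part of libxl or of any Qubes tool.

```python
import re
from typing import NamedTuple, Optional


class LibxlLogEntry(NamedTuple):
    """One parsed log line (hypothetical helper type)."""
    timestamp: str
    source_file: str
    line_number: int
    function: str
    message: str


# Layout assumed from the quoted excerpts:
# "<date> <time>+0000: libxl: <file>:<line>:<function>: <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\+\d{4}): "
    r"libxl: (?P<file>[^:]+):(?P<line>\d+):(?P<function>[^:]+): "
    r"(?P<message>.*)$"
)


def parse_libxl_line(line: str) -> Optional[LibxlLogEntry]:
    """Return the parsed entry, or None if the line has another layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return LibxlLogEntry(
        timestamp=match.group("timestamp"),
        source_file=match.group("file"),
        line_number=int(match.group("line")),
        function=match.group("function"),
        message=match.group("message"),
    )


entry = parse_libxl_line(
    "2018-01-03 09:07:59.562+0000: libxl: libxl_create.c:1542:"
    "domcreate_devmodel_started: device model did not start: -6"
)
```

Sorting entries by function name this way makes it easier to see which stage of domain creation failed when comparing reports from different machines.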

@marmarek
Member

marmarek commented Jan 9, 2018

domain with name "test-dm" already exists - some leftover from a previous failed startup? Try removing it with sudo xl destroy test-dm
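
The leftover check suggested above can be sketched in Python by scanning the text printed by xl list. This is only an illustration: the sample output below is invented, the function name find_leftover_domains is hypothetical, and the rule "a name ending in -dm with no matching VM is a leftover" is an assumption based on the test / test-dm pair in this thread.

```python
from typing import List

# Invented sample in the usual `xl list` column layout; not taken from a real system.
SAMPLE_XL_LIST = """
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  4096     4     r-----     100.0
sys-net                                      1   400     2     -b----      20.0
work                                         5  2000     2     -b----      12.0
work-dm                                      6   144     1     -b----       3.0
test-dm                                      7   144     1     -b----       1.0
(null)                                       8     0     1     --p--d       0.0
"""


def find_leftover_domains(xl_list_output: str) -> List[str]:
    """Return names that look like leftovers from a failed start.

    Assumption: "<vm>-dm" without a running "<vm>" is a stale stubdomain,
    and "(null)" entries are domains that were not fully destroyed.
    """
    names = []
    # Skip the header row, then take the first column of each line.
    for line in xl_list_output.strip().splitlines()[1:]:
        fields = line.split()
        if fields:
            names.append(fields[0])

    known = set(names)
    leftovers = []
    for name in names:
        if name == "(null)":
            leftovers.append(name)
        elif name.endswith("-dm") and name[: -len("-dm")] not in known:
            leftovers.append(name)
    return leftovers


leftovers = find_leftover_domains(SAMPLE_XL_LIST)
```

In the sample, work-dm is left alone because work is running, while test-dm and the (null) entry are flagged for a closer look.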

@na--

na-- commented Jan 9, 2018

I tried that, but it didn't help. Trying to start the VM after destroying test-dm failed with the same error as before. Also, I'm not sure that test-dm was correctly destroyed: xl list showed some null VMs afterwards.
Edit: I'm unable to reproduce the issue since I restarted recently, but I'll try to be more thorough in diagnosing it when it happens next.

@marmarek
Member

@na-- did it happen again? If not, I'd assume that one of the updates has fixed it (directly or indirectly).

@na--

na-- commented Jan 28, 2018

It hasn't happened again, so it may be fixed. I'll write here if it occurs again.

@andrewdavidwong
Member

@platschi, @0spinboson, @P4z, @reconmaster, @hast0011:
Please let us know whether you're still affected by this issue.

@hast0011

hast0011 commented Jan 29, 2018

I still have trouble getting the Internet working on this machine; see the other issue, #3349. Therefore I can't install any updates to see whether this issue disappears. I've decided to wait for the rc4 ISO and then test again.

@hast0011

hast0011 commented Feb 8, 2018

Installing Q4 rc4 solved the problem for me.

@P4z

P4z commented Mar 30, 2018

Sorry about the late answer.

Unfortunately, Q4 rc4 makes no difference here 8-(
I am about to try the final release now.

@andrewdavidwong andrewdavidwong removed this from the Release 4.0 milestone Mar 31, 2018
@andrewdavidwong andrewdavidwong added this to the Release 4.0 updates milestone Mar 31, 2018
@Nicolasverune

Hi there!
I don't know whether it would be better to open a new issue, so I decided to post my problem here.

I have the same issue, returning "libxenlight failed to create new domain 'sys-net'".
I tried to reinstall, but nothing changed.
I did activate virtualization in the BIOS, but there is no IOMMU option.

I don't know how to upgrade 'xen-hvm-stubdom-linux'.
I'm working with Qubes R4.0.1.

Here is what /var/log/libvirt/libxl/libxl-driver.log returns:

2019-02-02 11:12:48.699+0000: libxl: libxl_pci.c:1235:libxl__device_pci_add: PCI device 000:03:00.0 cannot be assigned - no IOMMU?
2019-02-02 11:12:48.699+0000: libxl: libxl_pci.c:1338:libxl__add_pcidevs: libxl_device_pci_add failed: -1
2019-02-02 11:12:48.699+0000: libxl: libxl_create.c:1512:domcreate_attach_devices: unable to add pci devices
2019-02-02 11:12:48.783+0000: libxl: libxl_linux.c:155:libxl__loopdev_cleanup: unable to release device /dev/loop0: No such device adress

Many thanks. (@marmarek @andrewdavidwong )

@Nicolasverune

OK, I tried installing Qubes 3.1.2 instead of Q4, and it works!
So one of the updates must not be compatible with my setup.
I'll try to reinstall with the next version, but for now I'll use 3.1.2.
Hope this can help someone!

@marmarek
Member

marmarek commented Feb 3, 2019

PCI device 000:03:00.0 cannot be assigned - no IOMMU?

This is clearly a lack of IOMMU. The installer should also warn about it. If you don't have such an option in the BIOS (sometimes named "I/O virtualization"), then it looks like your hardware is incompatible with Qubes 4.0. See this FAQ entry and others below.

@Nicolasverune

I knew there wasn't IOMMU, and the installer did warn me, but as I did activate virtualization (I'll check tomorrow for the exact name), I thought it was enough.
I did check for my hardware, but couldn't find it. I'll add it to the list.

Thanks

@dancegit

I have IOMMU set to true in the boot. I'm running Qubes 4.0 (R4.0) and it works 'sometimes'. I am using pureboot (heads+coreboot) and it does not work, but when I was using only coreboot it worked fine all the time. Perhaps I need to wait for Purism to upgrade their coreboot build for pureboot or something like that.

@sourceXORapprentice

Experienced this today but it went away on its own, Qubes r4.0 (updated yesterday, xen_version 4.8.5-7.fc25, Linux 4.19.43-1.pvops.qubes.x86_64).

On boot I attempted to use Qubes Manager to switch the sys-firewall from Fedora 29 to a fresh Fedora 30 template. Starting the sys-firewall with only sys-net & sys-usb running produced the same error and it failed to start:
"Start failed: internal error: libxenlight failed to create new domain 'sys-firewall'"
Tried starting it a second time, same message, no sys-firewall.
Tried starting it a third time, same message, no sys-firewall. Started a sys-vpn connected to sys-net and loaded personal to come here. I've been using Qubes 4.0 for months, so I'm assuming it's not a BIOS setting.
Tried starting it a fourth time while whispering "please-please-please-work" and it started no problem. Shutdown the sys-firewall and started a fifth time and connected my personal VM to it now, still works fine now.

Log:
libxl-driver.log

@andrewdavidwong andrewdavidwong added the P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. label Jul 21, 2019
@oliversturm

I have just encountered this (or a similar) issue. My system has been running 8 days and I've started and stopped any number of vms. Just now I ran qvm-open-in-dvm to view a document and the error came up: Domain disp2648 has failed to start: internal error: libxenlight failed to create new domain 'disp2648' - I've never seen that before and I trust it will be gone if I restart (but I'm not doing that right now, I'm in the middle of things).

Here are the lines from libxl-driver.log that correspond to the startup attempt. I decided to post this because the log info looked different from what I saw above.

2019-11-07 14:36:59.051+0000: libxl: libxl_device.c:1093:device_backend_callback: unable to add device with path /local/domain/5/backend/vif/24/0
2019-11-07 14:36:59.051+0000: libxl: libxl_create.c:1512:domcreate_attach_devices: unable to add nic devices
2019-11-07 14:37:09.121+0000: libxl: libxl_device.c:1093:device_backend_callback: unable to remove device with path /local/domain/5/backend/vif/24/0
2019-11-07 14:37:09.132+0000: libxl: libxl.c:1669:devices_destroy_cb: libxl__devices_destroy failed for 24
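
The device path in these lines follows the usual xenstore backend layout, /local/domain/<backend domid>/backend/<device type>/<frontend domid>/<device id>. As a rough sketch (the names BackendPath and parse_backend_path are hypothetical, not an existing API), the pieces can be pulled out in Python like this:

```python
import re
from typing import NamedTuple, Optional


class BackendPath(NamedTuple):
    """Fields of a xenstore backend device path (hypothetical helper type)."""
    backend_domid: int
    device_type: str
    frontend_domid: int
    device_id: int


BACKEND_PATH = re.compile(r"/local/domain/(\d+)/backend/([^/\s]+)/(\d+)/(\d+)")


def parse_backend_path(text: str) -> Optional[BackendPath]:
    """Find the first backend path in a log message and split it up."""
    match = BACKEND_PATH.search(text)
    if match is None:
        return None
    return BackendPath(
        backend_domid=int(match.group(1)),
        device_type=match.group(2),
        frontend_domid=int(match.group(3)),
        device_id=int(match.group(4)),
    )


path = parse_backend_path(
    "unable to add device with path /local/domain/5/backend/vif/24/0"
)
```

Read this way, the log above says that the network backend in domain 5 could not set up vif 0 for domain 24, which is the same domain ID named in the final libxl__devices_destroy line.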

@redshiftzero

We're seeing this intermittently in SecureDrop land: freedomofpress/securedrop-workstation#498

libxl-driver.log provided by @rocodes here

@andrewdavidwong andrewdavidwong added P: major Priority: major. Between "default" and "critical" in severity. and removed P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. labels Mar 25, 2020
@eloquence

At least in our case this seems to directly correlate with qmemman failing, see follow-up comments here.

@marmarek
Member

Extract from the above libxl-driver.log:

2020-03-25 14:05:19.582+0000: libxl: libxl_dm.c:2098:stubdom_xswait_cb: Stubdom 57 for 56 startup: startup timed out

This seems to be the issue. Can you check the console log of the corresponding stubdom (/var/log/xen/console/guest-<VMNAME>-dm.log)?
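
A small Python sketch of that lookup, assuming the message wording stays as quoted above: it pulls the two domain IDs out of the timeout message and builds the console log path from a VM name. Both function names are hypothetical, and the VM name in the example is only a placeholder since the thread does not say which VM domain 56 was.

```python
import re
from typing import Optional, Tuple

STUBDOM_TIMEOUT = re.compile(
    r"Stubdom (\d+) for (\d+) startup: startup timed out"
)


def parse_stubdom_timeout(message: str) -> Optional[Tuple[int, int]]:
    """Return (stubdom_domid, guest_domid) if the message is a stubdom timeout."""
    match = STUBDOM_TIMEOUT.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def stubdom_console_log(vm_name: str) -> str:
    """Build the stubdom console log path following the pattern quoted above."""
    return f"/var/log/xen/console/guest-{vm_name}-dm.log"


ids = parse_stubdom_timeout(
    "2020-03-25 14:05:19.582+0000: libxl: libxl_dm.c:2098:stubdom_xswait_cb: "
    "Stubdom 57 for 56 startup: startup timed out"
)
# "sys-firewall" is a placeholder VM name, not taken from the log above.
log_path = stubdom_console_log("sys-firewall")
```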

@marmarek
Member

marmarek commented Nov 9, 2020

There were at least 3 completely different bugs discussed here already (wrong xen-hvm-stubdom-linux package version, hardware lacking IOMMU, and a startup race condition). Since all of them are either fixed or tracked elsewhere, I'm closing this already quite confusing issue.
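
Since the same top-level error hides several causes, a first triage step is to group libxl-driver.log lines by the message fragments quoted in this thread. The Python sketch below is only a rough aid under that assumption: the labels and the function name classify_libxl_lines are invented here, and the thread shows no log signature for the package version mismatch, so that case is not covered.

```python
import re
from typing import Dict, Iterable, List

# Message fragments copied from log excerpts in this thread; labels are informal.
SIGNATURES = [
    ("missing IOMMU", re.compile(r"cannot be assigned - no IOMMU\?")),
    ("stubdom startup timed out", re.compile(r"startup: startup timed out")),
    ("domain name already exists", re.compile(r'domain with name "[^"]+" already exists')),
]


def classify_libxl_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines under the first signature label they match."""
    found: Dict[str, List[str]] = {}
    for line in lines:
        for label, pattern in SIGNATURES:
            if pattern.search(line):
                found.setdefault(label, []).append(line.strip())
                break
    return found


sample_lines = [
    "2019-02-02 11:12:48.699+0000: libxl: libxl_pci.c:1235:libxl__device_pci_add: "
    "PCI device 000:03:00.0 cannot be assigned - no IOMMU?",
    "2019-02-02 11:12:48.699+0000: libxl: libxl_create.c:1512:domcreate_attach_devices: "
    "unable to add pci devices",
]
groups = classify_libxl_lines(sample_lines)
```

Lines that match no signature are simply left out, so an empty result means the log needs to be read by hand rather than that nothing is wrong.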

@marmarek marmarek closed this as completed Nov 9, 2020