
Data Disks get unavailable when VM is shut down #7490

Open
VincentHermes opened this issue May 3, 2023 · 7 comments · May be fixed by #9823

Comments

@VincentHermes

ISSUE TYPE
Bug Report
COMPONENT NAME
Disk Controller
CLOUDSTACK VERSION
4.16
OS / ENVIRONMENT
KVM, Windows, SCSI rootDiskController

SUMMARY

Adding more than 6 disks to a VM results in a second SCSI controller being created. The type of that controller depends on whether the disk is attached while the VM is running, or the VM is started while already having more than 6 disks. If disks are attached on the fly, everything works fine. If the VM is stopped and then started while already having more than 6 disks, the second controller is of a type that breaks Windows Server 2022 (and likely others; still testing).

STEPS TO REPRODUCE

Normal Disk Setting in XML
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/storpool-byid/nbmn.b.xxxx' index='0'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>abcdefghijklmnop42</serial>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>

! Note the alias name in this config.
Every other disk up to the 6th is configured the same way; the alias name iterates up to "scsi0-0-0-5".

7th Disk Setting in XML if attached live (VM not stopped)
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/storpool-byid/nbmn.b.xxxx' index='7'/>
      <backingStore/>
      <target dev='sdg' bus='scsi'/>
      <serial>abcdefghijklmnop42</serial>
      <alias name='scsi1-0-0-0'/>
      <address type='drive' controller='1' bus='0' target='0' unit='0'/>

! Note the alias name in this config: it is now "scsi1-0-0-0", which is okay; it has three zeroes for some reason, and in the OS the controller is recognized as a "Red Hat VirtIO SCSI controller". All disks work correctly in the OS this way.

7th Disk Setting in XML if the VM has been stopped and then started again (the XML gets recreated)
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/storpool-byid/nbmn.b.xxxx' index='7'/>
      <backingStore/>
      <target dev='sdg' bus='scsi'/>
      <serial>abcdefghijklmnop42</serial>
      <alias name='scsi1-0-0'/>
      <address type='drive' controller='1' bus='0' target='0' unit='0'/>

! Note the alias name in this config: it is now "scsi1-0-0", so it is missing a zero, and the device becomes a different type of controller. The Red Hat driver no longer works; the only driver that can be installed for this device is the VMware PVSCSI driver, which still leaves the attached disks unavailable and breaks the Windows boot (BSOD), even though the root disk is on the other controller. In this case you need to remove disks until only 6 are left and then start the VM. If you attach a disk again, it shows up as a new "unknown device" again.

I wonder what happens if the VM uses virtio instead of SCSI as the rootDiskController. I am checking that out.

EXPECTED RESULTS
At least keep the controller type the same.
ACTUAL RESULTS
Customers brick their VMs after a single stop, because disks go missing.
@weizhouapache
Member

moved to 4.18.2.0

@weizhouapache
Member

I was not able to reproduce the issue.

When adding 6 data disks (7 in total including the root disk), the XML contains the following:

# virsh dumpxml i-2-9-VM |grep scsi
      <target dev='sda' bus='scsi'/>
      <alias name='scsi0-0-0-0'/>
      <target dev='sdb' bus='scsi'/>
      <alias name='scsi0-0-0-1'/>
      <target dev='sdc' bus='scsi'/>
      <alias name='scsi0-0-0-2'/>
      <target dev='sde' bus='scsi'/>
      <alias name='scsi0-0-0-4'/>
      <target dev='sdf' bus='scsi'/>
      <alias name='scsi0-0-0-5'/>
      <target dev='sdg' bus='scsi'/>
      <alias name='scsi0-0-0-6'/>
      <target dev='sdh' bus='scsi'/>
      <alias name='scsi1-0-0-0'/>

When stopping/starting the VM, the output is the same (only the indexes of the data disks differ).

I tested on Rocky Linux 8; below is the host information.

# virsh version
Compiled against library: libvirt 8.0.0
Using library: libvirt 8.0.0
Using API: QEMU 8.0.0
Running hypervisor: QEMU 6.2.0


# cat /etc/os-release 
NAME="Rocky Linux"
VERSION="8.4 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel fedora"
VERSION_ID="8.4"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.4 (Green Obsidian)"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:rocky:rocky:8.4:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky Linux"
ROCKY_SUPPORT_PRODUCT_VERSION="8"

@VincentHermes
can you share the information of your KVM host?

@tcp-dw

tcp-dw commented Oct 9, 2024

In my case, with the OS type set to "Other PV Virtio-SCSI" on the Windows VM, after adding 8 data disks (9 including the root disk) I end up with the following domain XML:
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='io_uring' discard='unmap'/>
      <source dev='/dev/storpool-byid/zkr.b.t4yc' index='10'>
        <privateData>
          <nodenames>
            <nodename type='storage' name='libvirt-10-storage'/>
            <nodename type='format' name='libvirt-10-format'/>
          </nodenames>
        </privateData>
      </source>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>3a07b614d1704f50be96</serial>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
      <privateData>
        <qom name='scsi0-0-0-0'/>
      </privateData>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='io_uring' discard='unmap'/>
      <source dev='/dev/storpool-byid/zkr.b.t4yp' index='9'>
        <privateData>
          <nodenames>
            <nodename type='storage' name='libvirt-9-storage'/>
            <nodename type='format' name='libvirt-9-format'/>
          </nodenames>
        </privateData>
      </source>
      <backingStore/>
      <target dev='sdb' bus='scsi'/>
      <serial>9fe7994d35f044b5a37a</serial>
      <alias name='scsi0-0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
      <privateData>
        <qom name='scsi0-0-0-1'/>
      </privateData>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='io_uring' discard='unmap'/>
      <source dev='/dev/storpool-byid/zkr.b.t4yq' index='8'>
        <privateData>
          <nodenames>
            <nodename type='storage' name='libvirt-8-storage'/>
            <nodename type='format' name='libvirt-8-format'/>
          </nodenames>
        </privateData>
      </source>
      <backingStore/>
      <target dev='sdc' bus='scsi'/>
      <serial>5f9c8bba843046338646</serial>
      <alias name='scsi0-0-0-2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
      <privateData>
        <qom name='scsi0-0-0-2'/>
      </privateData>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='io_uring' discard='unmap'/>
      <source dev='/dev/storpool-byid/zkr.b.t4yx' index='7'>
        <privateData>
          <nodenames>
            <nodename type='storage' name='libvirt-7-storage'/>
            <nodename type='format' name='libvirt-7-format'/>
          </nodenames>
        </privateData>
      </source>
      <backingStore/>
      <target dev='sde' bus='scsi'/>
      <serial>0f7dee19248e41f69782</serial>
      <alias name='scsi0-0-0-4'/>
      <address type='drive' controller='0' bus='0' target='0' unit='4'/>
      <privateData>
        <qom name='scsi0-0-0-4'/>
      </privateData>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='io_uring' discard='unmap'/>
      <source dev='/dev/storpool-byid/zkr.b.t4yo' index='6'>
        <privateData>
          <nodenames>
            <nodename type='storage' name='libvirt-6-storage'/>
            <nodename type='format' name='libvirt-6-format'/>
          </nodenames>
        </privateData>
      </source>
      <backingStore/>
      <target dev='sdf' bus='scsi'/>
      <serial>6c0820647f4b4daca6b8</serial>
      <alias name='scsi0-0-0-5'/>
      <address type='drive' controller='0' bus='0' target='0' unit='5'/>
      <privateData>
        <qom name='scsi0-0-0-5'/>
      </privateData>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='io_uring' discard='unmap'/>
      <source dev='/dev/storpool-byid/zkr.b.t4yt' index='5'>
        <privateData>
          <nodenames>
            <nodename type='storage' name='libvirt-5-storage'/>
            <nodename type='format' name='libvirt-5-format'/>
          </nodenames>
        </privateData>
      </source>
      <backingStore/>
      <target dev='sdg' bus='scsi'/>
      <serial>0c03146a946f4f989e16</serial>
      <alias name='scsi0-0-0-6'/>
      <address type='drive' controller='0' bus='0' target='0' unit='6'/>
      <privateData>
        <qom name='scsi0-0-0-6'/>
      </privateData>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='io_uring' discard='unmap'/>
      <source dev='/dev/storpool-byid/zkr.b.t4y1' index='4'>
        <privateData>
          <nodenames>
            <nodename type='storage' name='libvirt-4-storage'/>
            <nodename type='format' name='libvirt-4-format'/>
          </nodenames>
        </privateData>
      </source>
      <backingStore/>
      <target dev='sdh' bus='scsi'/>
      <serial>709bd8ed34a743dd81d5</serial>
      <alias name='scsi1-0-0'/>
      <address type='drive' controller='1' bus='0' target='0' unit='0'/>
      <privateData>
        <qom name='scsi1-0-0'/>
      </privateData>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='io_uring' discard='unmap'/>
      <source dev='/dev/storpool-byid/zkr.b.t4yu' index='3'>
        <privateData>
          <nodenames>
            <nodename type='storage' name='libvirt-3-storage'/>
            <nodename type='format' name='libvirt-3-format'/>
          </nodenames>
        </privateData>
      </source>
      <backingStore/>
      <target dev='sdi' bus='scsi'/>
      <serial>287f17c5425841f0a127</serial>
      <alias name='scsi1-0-1'/>
      <address type='drive' controller='1' bus='0' target='0' unit='1'/>
      <privateData>
        <qom name='scsi1-0-1'/>
      </privateData>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='io_uring' discard='unmap'/>
      <source dev='/dev/storpool-byid/zkr.b.t4yw' index='2'>
        <privateData>
          <nodenames>
            <nodename type='storage' name='libvirt-2-storage'/>
            <nodename type='format' name='libvirt-2-format'/>
          </nodenames>
        </privateData>
      </source>
      <backingStore/>
      <target dev='sdj' bus='scsi'/>
      <serial>cdd42a01e7454e89b788</serial>
      <alias name='scsi1-0-2'/>
      <address type='drive' controller='1' bus='0' target='0' unit='2'/>
      <privateData>
        <qom name='scsi1-0-2'/>
      </privateData>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu'/>
      <target dev='sdd' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-3'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
      <privateData>
        <qom name='sata0-0-3'/>
      </privateData>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver queues='2'/>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='scsi' index='1' model='lsilogic'>
      <alias name='scsi1'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x01' function='0x0'/>
    </controller>

Windows detects the first 6 disks attached to virtio-scsi controller #0 but not the rest, because they are attached to lsilogic controller #1.
A possible workaround might be to use an agent transformation hook to change that behavior, but it would be more proper to fix it in the code.
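To illustrate the transformation-hook idea, here is a minimal, hypothetical sketch (this is not CloudStack's actual hook API, and the class/method names are made up): it parses the domain XML and forces every SCSI controller's model to 'virtio-scsi' before the domain is defined.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ControllerRewrite {

    // Hypothetical transform: set model='virtio-scsi' on every <controller type='scsi'>,
    // so libvirt's default of 'lsilogic' for auto-added controllers never survives.
    public static String rewrite(String domainXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(domainXml.getBytes(StandardCharsets.UTF_8)));
        NodeList controllers = doc.getElementsByTagName("controller");
        for (int i = 0; i < controllers.getLength(); i++) {
            Element c = (Element) controllers.item(i);
            if ("scsi".equals(c.getAttribute("type"))) {
                c.setAttribute("model", "virtio-scsi");
            }
        }
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<devices>"
                + "<controller type='scsi' index='0' model='virtio-scsi'/>"
                + "<controller type='scsi' index='1' model='lsilogic'/>"
                + "</devices>";
        String fixed = rewrite(xml);
        // After the rewrite no SCSI controller should still be lsilogic.
        System.out.println(fixed.contains("lsilogic") ? "still broken" : "all scsi controllers are virtio-scsi");
    }
}
```

A real hook would of course only do this when the guest OS type supports virtio-scsi (e.g. "Other PV Virtio-SCSI"), since older guests may rely on the lsilogic default.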

@VincentHermes
Author

VincentHermes commented Oct 9, 2024

@tcp-dw Thank you for reproducing. It seems that scsi1-0-0 and scsi1-0-1 on the "lsilogic" controller in your case is the same problem I hit: Windows sees only the 6 disks on the first controller. I found an issue in the source code where the XML is created when I inspected this 16 months ago, but honestly I don't remember the details and I don't know Java. We set a limit of 6 disks for Windows VMs in our custom UI and left it at that.

@tcp-dw

tcp-dw commented Oct 9, 2024

I did further testing, and it seems that a new controller of type lsilogic is created for every additional 6 disks, while the first controller's model is always 'virtio-scsi'.
In that case it would be a very easy fix: set the subsequent controllers' model to 'virtio-scsi' instead of leaving the default model of 'lsilogic', given that the OS type is "Other PV Virtio-SCSI", which presents the newer virtio-scsi controller to the VM.

@weizhouapache
Member

thanks @tcp-dw
I was able to reproduce the issue

  • create vm
  • attach 6 disks. it looks good
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver queues='1'/>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </controller>
    <controller type='scsi' index='1' model='virtio-scsi'>
      <alias name='scsi1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </controller>
  • stop and start the vm
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver queues='1'/>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </controller>
...
    <controller type='scsi' index='1' model='lsilogic'>
      <alias name='scsi1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>

It seems to be a bug: the controller index of all virtio-scsi disks is hardcoded to 0 ((short)0):

return new SCSIDef((short)0, 0, 0, 9, 0, vcpus, isIothreadsEnabled);
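The general idea of a fix can be sketched as follows. This is a hypothetical illustration, not the actual PR #9823 code; it assumes (as the dumps above suggest) that each virtio-scsi controller hosts 7 units (0-6), so the controller index should be derived from the device id rather than hardcoded:

```java
public class ScsiControllerIndex {
    // Assumption from the dumps above: units 0-6 per controller, i.e. 7 devices each.
    static final int UNITS_PER_CONTROLLER = 7;

    // Controller index the disk with this device id should land on (hypothetical helper).
    public static short controllerIndex(int deviceId) {
        return (short) (deviceId / UNITS_PER_CONTROLLER);
    }

    // Unit number within that controller.
    public static int unit(int deviceId) {
        return deviceId % UNITS_PER_CONTROLLER;
    }

    public static void main(String[] args) {
        // Disk 0 -> controller 0, unit 0; disk 7 -> controller 1, unit 0.
        System.out.println(controllerIndex(0) + " " + unit(0));  // prints "0 0"
        System.out.println(controllerIndex(7) + " " + unit(7));  // prints "1 0"
    }
}
```

With per-disk controller indexes like this, the management side would also need to emit a SCSIDef with model virtio-scsi for each controller index in use, so libvirt never falls back to its lsilogic default for controller 1 and above.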

@weizhouapache
Member

@VincentHermes @tcp-dw
I have created a PR #9823 for this issue.

If you know how to build the project, you can port it to 4.18/4.19/main and test it.
