-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
migration: add capability to bypass the shared memory #2
migration: add capability to bypass the shared memory #2
Conversation
migration/migration.c
Outdated
MigrationState *s; | ||
|
||
/* it is not workable with postcopy yet. */ | ||
if (migrate_postcopy_ram()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing braces
migration/ram.c
Outdated
@@ -772,6 +772,10 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, | |||
unsigned long *bitmap = rb->bmap; | |||
unsigned long next; | |||
|
|||
/* when this ramblock is requested bypassing */ | |||
if (!bitmap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing brances
ram_state_reset(*rsp); | ||
|
||
return 0; | ||
} | ||
|
||
static void ram_list_init_bitmaps(void) | ||
static void ram_list_init_bitmaps(RAMState *rs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if rs
is NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is possible, the original code would panic at https://github.com/kata-containers/qemu/pull/2/files#diff-7a5fd992f4559f6907ed9fd0a89cd159R839
@@ -842,7 +846,9 @@ static void migration_bitmap_sync(RAMState *rs) | |||
qemu_mutex_lock(&rs->bitmap_mutex); | |||
rcu_read_lock(); | |||
RAMBLOCK_FOREACH(block) { | |||
migration_bitmap_sync_range(rs, block, 0, block->used_length); | |||
if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though migrate_bypass_shared_memory
was enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even when migrate_bypass_shared_memory
is set, ram can be set private instead of being shared. The only case we can skip syncing is migrate_bypass_shared_memory && ram_is_shared
.
cc @laijs |
1) What's this When the migration capability 'bypass-shared-memory' is set, the shared memory will be bypassed when migration. It is the key feature to enable several excellent features for the qemu, such as qemu-local-migration, qemu-live-update, extremely-fast-save-restore, vm-template, vm-fast-live-clone, yet-another-post-copy-migration, etc.. The philosophy behind this key feature, including the resulting advanced key features, is that a part of the memory management is separated out from the qemu, and let the other toolkits such as libvirt, kata-containers (https://github.com/kata-containers) runv(https://github.com/hyperhq/runv/) or some multiple cooperative qemu commands directly access to it, manage it, provide features on it. 2) Status in real world The hyperhq(http://hyper.sh http://hypercontainer.io/) introduced the feature vm-template(vm-fast-live-clone) to the hyper container for several years, it works perfect. (see hyperhq/runv#297). The feature vm-template makes the containers(VMs) can be started in 130ms and save 80M memory for every container(VM). So that the hyper containers are fast and high-density as normal containers. kata-containers project (https://github.com/kata-containers) which was launched by hyper, intel and friends and which descended from runv (and clear-container) should have this feature enabled. Unfortunately, due to the code confliction between runv&cc, this feature was temporary disabled and it is being brought back by hyper and intel team. 3) How to use and bring up advanced features. In current qemu command line, shared memory has to be configured via memory-object. a) feature: qemu-local-migration, qemu-live-update Set the mem-path on the tmpfs and set share=on for it when start the vm. example: -object \ memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \ -numa node,nodeid=0,cpus=0-7,memdev=mem when you want to migrate the vm locally (after fixed a security bug of the qemu-binary, or other reason), you can start a new qemu with the same command line and -incoming, then you can migrate the vm from the old qemu to the new qemu with the migration capability 'bypass-shared-memory' set. The migration will migrate the device-state *ONLY*, the memory is the origin memory backed by tmpfs file. b) feature: extremely-fast-save-restore the same above, but the mem-path is on the persistent file system. c) feature: vm-template, vm-fast-live-clone the template vm is started as 1), and paused when the guest reaches the template point(example: the guest app is ready), then the template vm is saved. (the qemu process of the template can be killed now, because we need only the memory and the device state files (in tmpfs)). Then we can launch one or multiple VMs base on the template vm states, the new VMs are started without the “share=on”, all the new VMs share the initial memory from the memory file, they save a lot of memory. all the new VMs start from the template point, the guest app can go to work quickly. The new VM booted from template vm can’t become template again, if you need this unusual chained-template feature, you can write a cloneable-tmpfs kernel module for it. The libvirt toolkit can’t manage vm-template currently, in the hyperhq/runv, we use qemu wrapper script to do it. I hope someone add “libvrit managed template” feature to libvirt. d) feature: yet-another-post-copy-migration It is a possible feature, no toolkit can do it well now. Using nbd server/client on the memory file is reluctantly Ok but inconvenient. A special feature for tmpfs might be needed to fully complete this feature. No one need yet another post copy migration method, but it is possible when some crazy man need it. Cc: Samuel Ortiz <[email protected]> Cc: Sebastien Boeuf <[email protected]> Cc: James O. D. Hunt <[email protected]> Cc: Xu Wang <[email protected]> Cc: Peng Tao <[email protected]> Cc: Xiao Guangrong <[email protected]> Cc: Xiao Guangrong <[email protected]> Signed-off-by: Lai Jiangshan <[email protected]>
6bfb0a7
to
8ba3a0f
Compare
@devimc PR updated. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Apart from the performance improvement, is there a test to add to ensure this is DTRT? lgtm. /cc @anthonyzxu, @sboeuf, @markdryan, @rbradford. |
@jodh-intel Any suggestions on how to add tests for it here? It seems there is no CI for the qemu repo. Do we build qemu-lite from source in CI? If so, I can add some tests for vm factory in the tests repo and make it depend on this PR. |
@bergwolf - no, we don't any more: https://github.com/kata-containers/tests/blob/master/.ci/install_qemu.sh#L15. Not a blocker and it doesn't even need to be in this repo but it would be good to be able to assert we're getting the expected behaviour. |
@jodh-intel In that case, I would suggest merging this PR first and then I can start adding tests for vm factory in the tests repo that would rely on this patch. OTOH, the patch itself has been in production use on https://hyper.sh for more than two years so I'd say it's quite stable. WDYT? |
lgtm I'll wait to see if we can get one more review on this PR today but your plan sounds good to me. btw @chavafg - I think we should try to setup a CI for qemu at some point as we're modifying the code and it would be best if we could find issues with qemu here rather than when we test it in combination with other system elements. /cc @grahamwhaley, @sboeuf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Q. (not sure I saw mention) - are we trying/intending to push this upstream? That would be our preference of course :-)
I'm fine with the merge now and tests added in the other repo.
Also, yes, if we are going to carry qemu patches (and patching this repo is very rare for us normally), then we should probably enable the CI here - it is going to be very low cost I think.
@grahamwhaley Yes, this is being pushed to QEMU upstream. |
After f771c54 it is possible to select device and head which to take screendump from. And even though we check if provided head number falls within range, it may still happen that the console has no surface yet leading to SIGSEGV: qemu.git $ ./x86_64-softmmu/qemu-system-x86_64 \ -qmp stdio \ -device virtio-vga,id=video0,max_outputs=4 {"execute":"qmp_capabilities"} {"execute":"screendump", "arguments":{"filename":"/tmp/screen.ppm", "device":"video0", "head":1}} Segmentation fault #0 0x00005628249dda88 in ppm_save (filename=0x56282826cbc0 "/tmp/screen.ppm", ds=0x0, errp=0x7fff52a6fae0) at ui/console.c:304 #1 0x00005628249ddd9b in qmp_screendump (filename=0x56282826cbc0 "/tmp/screen.ppm", has_device=true, device=0x5628276902d0 "video0", has_head=true, head=1, errp=0x7fff52a6fae0) at ui/console.c:375 #2 0x00005628247740df in qmp_marshal_screendump (args=0x562828265e00, ret=0x7fff52a6fb68, errp=0x7fff52a6fb60) at qapi/qapi-commands-ui.c:110 Here, @ds from frame #0 (or @surface from frame #1) is dereferenced at the very beginning of ppm_save(). And because it's NULL crash happens. Signed-off-by: Michal Privoznik <[email protected]> Reviewed-by: Thomas Huth <[email protected]> Message-id: cb05bb1909daa6ba62145c0194aafa05a14ed3d1.1526569138.git.mprivozn@redhat.com Signed-off-by: Gerd Hoffmann <[email protected]> (cherry picked from commit 08d9864) Signed-off-by: Michael Roth <[email protected]>
One multifd channel will shutdown all the other multifd's IOChannel when it fails to receive an IOChannel. In this senario, if some multifds had not received its IOChannel yet, it would try to shutdown its IOChannel which could cause nullptr access at qio_channel_shutdown. Here is the coredump stack: #0 object_get_class (obj=obj@entry=0x0) at qom/object.c:908 kata-containers#1 0x00005563fdbb8f4a in qio_channel_shutdown (ioc=0x0, how=QIO_CHANNEL_SHUTDOWN_BOTH, errp=0x0) at io/channel.c:355 kata-containers#2 0x00005563fd7b4c5f in multifd_recv_terminate_threads (err=<optimized out>) at migration/ram.c:1280 kata-containers#3 0x00005563fd7bc019 in multifd_recv_new_channel (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce00) at migration/ram.c:1478 kata-containers#4 0x00005563fda82177 in migration_ioc_process_incoming (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce30) at migration/migration.c:605 kata-containers#5 0x00005563fda8567d in migration_channel_process_incoming (ioc=0x556400255610) at migration/channel.c:44 kata-containers#6 0x00005563fda83ee0 in socket_accept_incoming_migration (listener=0x5563fff6b920, cioc=0x556400255610, opaque=<optimized out>) at migration/socket.c:166 kata-containers#7 0x00005563fdbc25cd in qio_net_listener_channel_func (ioc=<optimized out>, condition=<optimized out>, opaque=<optimized out>) at io/net-listener.c:54 kata-containers#8 0x00007f895b6fe9a9 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 kata-containers#9 0x00005563fdc18136 in glib_pollfds_poll () at util/main-loop.c:218 kata-containers#10 0x00005563fdc181b5 in os_host_main_loop_wait (timeout=1000000000) at util/main-loop.c:241 qemu#11 0x00005563fdc183a2 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:517 qemu#12 0x00005563fd8edb37 in main_loop () at vl.c:1791 qemu#13 0x00005563fd74fd45 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4473 To fix it up, let's check p->c before calling qio_channel_shutdown. Signed-off-by: Jiahui Cen <[email protected]> Signed-off-by: Ying Fang <[email protected]> Reviewed-by: Juan Quintela <[email protected]> Signed-off-by: Juan Quintela <[email protected]>
…threads One multifd will lock all the other multifds' IOChannel mutex to inform them to quit by setting p->quit or shutting down p->c. In this senario, if some multifds had already been terminated and multifd_load_cleanup/multifd_save_cleanup had destroyed their mutex, it could cause destroyed mutex access when trying lock their mutex. Here is the coredump stack: #0 0x00007f81a2794437 in raise () from /usr/lib64/libc.so.6 kata-containers#1 0x00007f81a2795b28 in abort () from /usr/lib64/libc.so.6 kata-containers#2 0x00007f81a278d1b6 in __assert_fail_base () from /usr/lib64/libc.so.6 kata-containers#3 0x00007f81a278d262 in __assert_fail () from /usr/lib64/libc.so.6 kata-containers#4 0x000055eb1bfadbd3 in qemu_mutex_lock_impl (mutex=0x55eb1e2d1988, file=<optimized out>, line=<optimized out>) at util/qemu-thread-posix.c:64 kata-containers#5 0x000055eb1bb4564a in multifd_send_terminate_threads (err=<optimized out>) at migration/ram.c:1015 kata-containers#6 0x000055eb1bb4bb7f in multifd_send_thread (opaque=0x55eb1e2d19f8) at migration/ram.c:1171 kata-containers#7 0x000055eb1bfad628 in qemu_thread_start (args=0x55eb1e170450) at util/qemu-thread-posix.c:502 kata-containers#8 0x00007f81a2b36df5 in start_thread () from /usr/lib64/libpthread.so.0 kata-containers#9 0x00007f81a286048d in clone () from /usr/lib64/libc.so.6 To fix it up, let's destroy the mutex after all the other multifd threads had been terminated. Signed-off-by: Jiahui Cen <[email protected]> Signed-off-by: Ying Fang <[email protected]> Reviewed-by: Juan Quintela <[email protected]> Signed-off-by: Juan Quintela <[email protected]>
v->vq forgot to cleanup in virtio_9p_device_unrealize, the memory leak stack is as follow: Direct leak of 14336 byte(s) in 2 object(s) allocated from: #0 0x7f819ae43970 (/lib64/libasan.so.5+0xef970) ??:? kata-containers#1 0x7f819872f49d (/lib64/libglib-2.0.so.0+0x5249d) ??:? kata-containers#2 0x55a3a58da624 (./x86_64-softmmu/qemu-system-x86_64+0x2c14624) /mnt/sdb/qemu/hw/virtio/virtio.c:2327 kata-containers#3 0x55a3a571bac7 (./x86_64-softmmu/qemu-system-x86_64+0x2a55ac7) /mnt/sdb/qemu/hw/9pfs/virtio-9p-device.c:209 kata-containers#4 0x55a3a58e7bc6 (./x86_64-softmmu/qemu-system-x86_64+0x2c21bc6) /mnt/sdb/qemu/hw/virtio/virtio.c:3504 kata-containers#5 0x55a3a5ebfb37 (./x86_64-softmmu/qemu-system-x86_64+0x31f9b37) /mnt/sdb/qemu/hw/core/qdev.c:876 Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Christian Schoenebeck <[email protected]> Acked-by: Greg Kurz <[email protected]>
This patch fix memleaks when attaching/detaching virtio-scsi device, the memory leak stack is as follow: Direct leak of 21504 byte(s) in 3 object(s) allocated from: #0 0x7f491f2f2970 (/lib64/libasan.so.5+0xef970) ??:? kata-containers#1 0x7f491e94649d (/lib64/libglib-2.0.so.0+0x5249d) ??:? kata-containers#2 0x564d0f3919fa (./x86_64-softmmu/qemu-system-x86_64+0x2c3e9fa) /mnt/sdb/qemu/hw/virtio/virtio.c:2333 kata-containers#3 0x564d0f2eca55 (./x86_64-softmmu/qemu-system-x86_64+0x2b99a55) /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:912 kata-containers#4 0x564d0f2ece7b (./x86_64-softmmu/qemu-system-x86_64+0x2b99e7b) /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:924 kata-containers#5 0x564d0f39ee47 (./x86_64-softmmu/qemu-system-x86_64+0x2c4be47) /mnt/sdb/qemu/hw/virtio/virtio.c:3531 kata-containers#6 0x564d0f980224 (./x86_64-softmmu/qemu-system-x86_64+0x322d224) /mnt/sdb/qemu/hw/core/qdev.c:865 Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]>
Receive/transmit/event vqs forgot to cleanup in vhost_vsock_unrealize. This patch save receive/transmit vq pointer in realize() and cleanup vqs through those vq pointers in unrealize(). The leak stack is as follow: Direct leak of 21504 byte(s) in 3 object(s) allocated from: #0 0x7f86a1356970 (/lib64/libasan.so.5+0xef970) ??:? kata-containers#1 0x7f86a09aa49d (/lib64/libglib-2.0.so.0+0x5249d) ??:? kata-containers#2 0x5604852f85ca (./x86_64-softmmu/qemu-system-x86_64+0x2c3e5ca) /mnt/sdb/qemu/hw/virtio/virtio.c:2333 kata-containers#3 0x560485356208 (./x86_64-softmmu/qemu-system-x86_64+0x2c9c208) /mnt/sdb/qemu/hw/virtio/vhost-vsock.c:339 kata-containers#4 0x560485305a17 (./x86_64-softmmu/qemu-system-x86_64+0x2c4ba17) /mnt/sdb/qemu/hw/virtio/virtio.c:3531 kata-containers#5 0x5604858e6b65 (./x86_64-softmmu/qemu-system-x86_64+0x322cb65) /mnt/sdb/qemu/hw/core/qdev.c:865 kata-containers#6 0x5604861e6c41 (./x86_64-softmmu/qemu-system-x86_64+0x3b2cc41) /mnt/sdb/qemu/qom/object.c:2102 Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Message-Id: <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
When adding new devices implementing QOM interfaces, we might forgot to add the Kconfig dependency that pulls the required objects in when building. Since QOM dependencies are resolved at runtime, we don't get any link-time failures, and QEMU aborts while starting: $ qemu ... Segmentation fault (core dumped) (gdb) bt #0 0x00007ff6e96b1e35 in raise () from /lib64/libc.so.6 kata-containers#1 0x00007ff6e969c895 in abort () from /lib64/libc.so.6 kata-containers#2 0x00005572bc5051cf in type_initialize (ti=0x5572be6f1200) at qom/object.c:323 kata-containers#3 0x00005572bc505074 in type_initialize (ti=0x5572be6f1800) at qom/object.c:301 kata-containers#4 0x00005572bc505074 in type_initialize (ti=0x5572be6e48e0) at qom/object.c:301 kata-containers#5 0x00005572bc506939 in object_class_by_name (typename=0x5572bc56109a) at qom/object.c:959 kata-containers#6 0x00005572bc503dd5 in cpu_class_by_name (typename=0x5572bc56109a, cpu_model=0x5572be6d9930) at hw/core/cpu.c:286 Since the caller has access to the qdev parent/interface names, we can simply display them to avoid starting a debugger: $ qemu ... qemu: missing interface 'fancy-if' for object 'fancy-dev' Aborted (core dumped) This commit is similar to e02bdf1 ("Display more helpful message when an object type is missing"). Signed-off-by: Philippe Mathieu-Daudé <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
This patch fix memleaks when attaching/detaching virtio-scsi device, the memory leak stack is as follow: Direct leak of 21504 byte(s) in 3 object(s) allocated from: #0 0x7f491f2f2970 (/lib64/libasan.so.5+0xef970) ??:? kata-containers#1 0x7f491e94649d (/lib64/libglib-2.0.so.0+0x5249d) ??:? kata-containers#2 0x564d0f3919fa (./x86_64-softmmu/qemu-system-x86_64+0x2c3e9fa) /mnt/sdb/qemu/hw/virtio/virtio.c:2333 kata-containers#3 0x564d0f2eca55 (./x86_64-softmmu/qemu-system-x86_64+0x2b99a55) /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:912 kata-containers#4 0x564d0f2ece7b (./x86_64-softmmu/qemu-system-x86_64+0x2b99e7b) /mnt/sdb/qemu/hw/scsi/virtio-scsi.c:924 kata-containers#5 0x564d0f39ee47 (./x86_64-softmmu/qemu-system-x86_64+0x2c4be47) /mnt/sdb/qemu/hw/virtio/virtio.c:3531 kata-containers#6 0x564d0f980224 (./x86_64-softmmu/qemu-system-x86_64+0x322d224) /mnt/sdb/qemu/hw/core/qdev.c:865 Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
All paths that lead to bdrv_backup_top_drop(), except for the call from backup_clean(), imply that the BDS AioContext has already been acquired, so doing it there too can potentially lead to QEMU hanging on AIO_WAIT_WHILE(). An easy way to trigger this situation is by issuing a two actions transaction, with a proper and a bogus blockdev-backup, so the second one will trigger a rollback. This will trigger a hang with an stack trace like this one: #0 0x00007fb680c75016 in __GI_ppoll (fds=0x55e74580f7c0, nfds=1, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39 kata-containers#1 0x000055e743386e09 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77 kata-containers#2 0x000055e743386e09 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:336 kata-containers#3 0x000055e743388dc4 in aio_poll (ctx=0x55e7458925d0, blocking=blocking@entry=true) at util/aio-posix.c:669 kata-containers#4 0x000055e743305dea in bdrv_flush (bs=bs@entry=0x55e74593c0d0) at block/io.c:2878 kata-containers#5 0x000055e7432be58e in bdrv_close (bs=0x55e74593c0d0) at block.c:4017 kata-containers#6 0x000055e7432be58e in bdrv_delete (bs=<optimized out>) at block.c:4262 kata-containers#7 0x000055e7432be58e in bdrv_unref (bs=bs@entry=0x55e74593c0d0) at block.c:5644 kata-containers#8 0x000055e743316b9b in bdrv_backup_top_drop (bs=bs@entry=0x55e74593c0d0) at block/backup-top.c:273 kata-containers#9 0x000055e74331461f in backup_job_create (job_id=0x0, bs=bs@entry=0x55e7458d5820, target=target@entry=0x55e74589f640, speed=0, sync_mode=MIRROR_SYNC_MODE_FULL, sync_bitmap=sync_bitmap@entry=0x0, bitmap_mode=BITMAP_SYNC_MODE_ON_SUCCESS, compress=false, filter_node_name=0x0, on_source_error=BLOCKDEV_ON_ERROR_REPORT, on_target_error=BLOCKDEV_ON_ERROR_REPORT, creation_flags=0, cb=0x0, opaque=0x0, txn=0x0, errp=0x7ffddfd1efb0) at block/backup.c:478 kata-containers#10 0x000055e74315bc52 in do_backup_common (backup=backup@entry=0x55e746c066d0, bs=bs@entry=0x55e7458d5820, target_bs=target_bs@entry=0x55e74589f640, aio_context=aio_context@entry=0x55e7458a91e0, txn=txn@entry=0x0, errp=errp@entry=0x7ffddfd1efb0) at blockdev.c:3580 qemu#11 0x000055e74315c37c in do_blockdev_backup (backup=backup@entry=0x55e746c066d0, txn=0x0, errp=errp@entry=0x7ffddfd1efb0) at /usr/src/debug/qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64/./qapi/qapi-types-block-core.h:1492 qemu#12 0x000055e74315c449 in blockdev_backup_prepare (common=0x55e746a8de90, errp=0x7ffddfd1f018) at blockdev.c:1885 qemu#13 0x000055e743160152 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=0x55e7467fe2c0, errp=errp@entry=0x7ffddfd1f088) at blockdev.c:2340 qemu#14 0x000055e743287ff5 in qmp_marshal_transaction (args=<optimized out>, ret=<optimized out>, errp=0x7ffddfd1f0f8) at qapi/qapi-commands-transaction.c:44 qemu#15 0x000055e74333de6c in do_qmp_dispatch (errp=0x7ffddfd1f0f0, allow_oob=<optimized out>, request=<optimized out>, cmds=0x55e743c28d60 <qmp_commands>) at qapi/qmp-dispatch.c:132 qemu#16 0x000055e74333de6c in qmp_dispatch (cmds=0x55e743c28d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175 qemu#17 0x000055e74325c061 in monitor_qmp_dispatch (mon=0x55e745908030, req=<optimized out>) at monitor/qmp.c:145 qemu#18 0x000055e74325c6fa in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234 qemu#19 0x000055e743385866 in aio_bh_call (bh=0x55e745807ae0) at util/async.c:117 qemu#20 0x000055e743385866 in aio_bh_poll (ctx=ctx@entry=0x55e7458067a0) at util/async.c:117 qemu#21 0x000055e743388c54 in aio_dispatch (ctx=0x55e7458067a0) at util/aio-posix.c:459 qemu#22 0x000055e743385742 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260 qemu#23 0x00007fb68543e67d in g_main_dispatch (context=0x55e745893a40) at gmain.c:3176 qemu#24 0x00007fb68543e67d in g_main_context_dispatch (context=context@entry=0x55e745893a40) at gmain.c:3829 qemu#25 0x000055e743387d08 in glib_pollfds_poll () at util/main-loop.c:219 qemu#26 0x000055e743387d08 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 qemu#27 0x000055e743387d08 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518 qemu#28 0x000055e74316a3c1 in main_loop () at vl.c:1828 qemu#29 0x000055e743016a72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504 Fix this by not acquiring the AioContext there, and ensuring all paths leading to it have it already acquired (backup_clean()). RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782111 Signed-off-by: Sergio Lopez <[email protected]> Signed-off-by: Kevin Wolf <[email protected]>
Dirty map addition and removal functions are not acquiring to BDS AioContext, while they may call to code that expects it to be acquired. This may trigger a crash with a stack trace like this one: #0 0x00007f0ef146370f in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 kata-containers#1 0x00007f0ef144db25 in __GI_abort () at abort.c:79 kata-containers#2 0x0000565022294dce in error_exit (err=<optimized out>, msg=msg@entry=0x56502243a730 <__func__.16350> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36 kata-containers#3 0x00005650222950ba in qemu_mutex_unlock_impl (mutex=mutex@entry=0x5650244b0240, file=file@entry=0x565022439adf "util/async.c", line=line@entry=526) at util/qemu-thread-posix.c:108 kata-containers#4 0x0000565022290029 in aio_context_release (ctx=ctx@entry=0x5650244b01e0) at util/async.c:526 kata-containers#5 0x000056502221cd08 in bdrv_can_store_new_dirty_bitmap (bs=bs@entry=0x5650244dc820, name=name@entry=0x56502481d360 "bitmap1", granularity=granularity@entry=65536, errp=errp@entry=0x7fff22831718) at block/dirty-bitmap.c:542 kata-containers#6 0x000056502206ae53 in qmp_block_dirty_bitmap_add (errp=0x7fff22831718, disabled=false, has_disabled=<optimized out>, persistent=<optimized out>, has_persistent=true, granularity=65536, has_granularity=<optimized out>, name=0x56502481d360 "bitmap1", node=<optimized out>) at blockdev.c:2894 kata-containers#7 0x000056502206ae53 in qmp_block_dirty_bitmap_add (node=<optimized out>, name=0x56502481d360 "bitmap1", has_granularity=<optimized out>, granularity=<optimized out>, has_persistent=true, persistent=<optimized out>, has_disabled=false, disabled=false, errp=0x7fff22831718) at blockdev.c:2856 kata-containers#8 0x00005650221847a3 in qmp_marshal_block_dirty_bitmap_add (args=<optimized out>, ret=<optimized out>, errp=0x7fff22831798) at qapi/qapi-commands-block-core.c:651 kata-containers#9 0x0000565022247e6c in do_qmp_dispatch (errp=0x7fff22831790, allow_oob=<optimized out>, request=<optimized out>, cmds=0x565022b32d60 <qmp_commands>) at qapi/qmp-dispatch.c:132 kata-containers#10 0x0000565022247e6c in qmp_dispatch (cmds=0x565022b32d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175 qemu#11 0x0000565022166061 in monitor_qmp_dispatch (mon=0x56502450faa0, req=<optimized out>) at monitor/qmp.c:145 qemu#12 0x00005650221666fa in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234 qemu#13 0x000056502228f866 in aio_bh_call (bh=0x56502440eae0) at util/async.c:117 qemu#14 0x000056502228f866 in aio_bh_poll (ctx=ctx@entry=0x56502440d7a0) at util/async.c:117 qemu#15 0x0000565022292c54 in aio_dispatch (ctx=0x56502440d7a0) at util/aio-posix.c:459 qemu#16 0x000056502228f742 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260 qemu#17 0x00007f0ef5ce667d in g_main_dispatch (context=0x56502449aa40) at gmain.c:3176 qemu#18 0x00007f0ef5ce667d in g_main_context_dispatch (context=context@entry=0x56502449aa40) at gmain.c:3829 qemu#19 0x0000565022291d08 in glib_pollfds_poll () at util/main-loop.c:219 qemu#20 0x0000565022291d08 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 qemu#21 0x0000565022291d08 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518 qemu#22 0x00005650220743c1 in main_loop () at vl.c:1828 qemu#23 0x0000565021f20a72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504 Fix this by acquiring the AioContext at qmp_block_dirty_bitmap_add() and qmp_block_dirty_bitmap_add(). RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782175 Signed-off-by: Sergio Lopez <[email protected]> Signed-off-by: Kevin Wolf <[email protected]>
external_snapshot_abort() calls to bdrv_set_backing_hd(), which returns state->old_bs to the main AioContext, as it's intended to be used then the BDS is going to be released. As that's not the case when aborting an external snapshot, return it to the AioContext it was before the call. This issue can be triggered by issuing a transaction with two actions, a proper blockdev-snapshot-sync and a bogus one, so the second will trigger a transaction abort. This results in a crash with an stack trace like this one: #0 0x00007fa1048b28df in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 kata-containers#1 0x00007fa10489ccf5 in __GI_abort () at abort.c:79 kata-containers#2 0x00007fa10489cbc9 in __assert_fail_base (fmt=0x7fa104a03300 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5572240b44d8 "bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs)", file=0x557224014d30 "block.c", line=2240, function=<optimized out>) at assert.c:92 kata-containers#3 0x00007fa1048aae96 in __GI___assert_fail (assertion=assertion@entry=0x5572240b44d8 "bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs)", file=file@entry=0x557224014d30 "block.c", line=line@entry=2240, function=function@entry=0x5572240b5d60 <__PRETTY_FUNCTION__.31620> "bdrv_replace_child_noperm") at assert.c:101 kata-containers#4 0x0000557223e631f8 in bdrv_replace_child_noperm (child=0x557225b9c980, new_bs=new_bs@entry=0x557225c42e40) at block.c:2240 kata-containers#5 0x0000557223e68be7 in bdrv_replace_node (from=0x557226951a60, to=0x557225c42e40, errp=0x5572247d6138 <error_abort>) at block.c:4196 kata-containers#6 0x0000557223d069c4 in external_snapshot_abort (common=0x557225d7e170) at blockdev.c:1731 kata-containers#7 0x0000557223d069c4 in external_snapshot_abort (common=0x557225d7e170) at blockdev.c:1717 kata-containers#8 0x0000557223d09013 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=0x557225cc7d70, errp=errp@entry=0x7ffe704c0c98) at blockdev.c:2360 kata-containers#9 0x0000557223e32085 in qmp_marshal_transaction (args=<optimized out>, ret=<optimized out>, errp=0x7ffe704c0d08) at qapi/qapi-commands-transaction.c:44 kata-containers#10 0x0000557223ee798c in do_qmp_dispatch (errp=0x7ffe704c0d00, allow_oob=<optimized out>, request=<optimized out>, cmds=0x5572247d3cc0 <qmp_commands>) at qapi/qmp-dispatch.c:132 qemu#11 0x0000557223ee798c in qmp_dispatch (cmds=0x5572247d3cc0 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175 qemu#12 0x0000557223e06141 in monitor_qmp_dispatch (mon=0x557225c69ff0, req=<optimized out>) at monitor/qmp.c:120 qemu#13 0x0000557223e0678a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:209 qemu#14 0x0000557223f2f366 in aio_bh_call (bh=0x557225b9dc60) at util/async.c:117 qemu#15 0x0000557223f2f366 in aio_bh_poll (ctx=ctx@entry=0x557225b9c840) at util/async.c:117 qemu#16 0x0000557223f32754 in aio_dispatch (ctx=0x557225b9c840) at util/aio-posix.c:459 qemu#17 0x0000557223f2f242 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260 qemu#18 0x00007fa10913467d in g_main_dispatch (context=0x557225c28e80) at gmain.c:3176 qemu#19 0x00007fa10913467d in g_main_context_dispatch (context=context@entry=0x557225c28e80) at gmain.c:3829 qemu#20 0x0000557223f31808 in glib_pollfds_poll () at util/main-loop.c:219 qemu#21 0x0000557223f31808 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 qemu#22 0x0000557223f31808 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518 qemu#23 0x0000557223d13201 in main_loop () at vl.c:1828 qemu#24 0x0000557223bbfb82 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504 RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1779036 Signed-off-by: Sergio Lopez <[email protected]> Signed-off-by: Kevin Wolf <[email protected]>
If the multifd_send_threads is not created when migration is failed, multifd_save_cleanup would be called twice. In this senario, the multifd_send_state is accessed after it has been released, the result is that the source VM is crashing down. Here is the coredump stack: Program received signal SIGSEGV, Segmentation fault. 0x00005629333a78ef in multifd_send_terminate_threads (err=err@entry=0x0) at migration/ram.c:1012 1012 MultiFDSendParams *p = &multifd_send_state->params[i]; #0 0x00005629333a78ef in multifd_send_terminate_threads (err=err@entry=0x0) at migration/ram.c:1012 kata-containers#1 0x00005629333ab8a9 in multifd_save_cleanup () at migration/ram.c:1028 kata-containers#2 0x00005629333abaea in multifd_new_send_channel_async (task=0x562935450e70, opaque=<optimized out>) at migration/ram.c:1202 kata-containers#3 0x000056293373a562 in qio_task_complete (task=task@entry=0x562935450e70) at io/task.c:196 kata-containers#4 0x000056293373a6e0 in qio_task_thread_result (opaque=0x562935450e70) at io/task.c:111 kata-containers#5 0x00007f475d4d75a7 in g_idle_dispatch () from /usr/lib64/libglib-2.0.so.0 kata-containers#6 0x00007f475d4da9a9 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 kata-containers#7 0x0000562933785b33 in glib_pollfds_poll () at util/main-loop.c:219 kata-containers#8 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 kata-containers#9 main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518 kata-containers#10 0x00005629334c5acf in main_loop () at vl.c:1810 qemu#11 0x000056293334d7bb in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4471 If the multifd_send_threads is not created when migration is failed. In this senario, we don't call multifd_save_cleanup in multifd_new_send_channel_async. Signed-off-by: Zhimin Feng <[email protected]> Reviewed-by: Juan Quintela <[email protected]> Signed-off-by: Juan Quintela <[email protected]>
It's not a big deal, but 'check qtest-ppc/ppc64' runs fail if sanitizers is enabled. The memory leak stack is as follow: Direct leak of 128 byte(s) in 4 object(s) allocated from: #0 0x7f11756f5970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) kata-containers#1 0x7f1174f2549d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) kata-containers#2 0x556af05aa7da in mm_fw_cfg_init /mnt/sdb/qemu/tests/libqos/fw_cfg.c:119 kata-containers#3 0x556af059f4f5 in read_boot_order_pmac /mnt/sdb/qemu/tests/boot-order-test.c:137 kata-containers#4 0x556af059efe2 in test_a_boot_order /mnt/sdb/qemu/tests/boot-order-test.c:47 kata-containers#5 0x556af059f2c0 in test_boot_orders /mnt/sdb/qemu/tests/boot-order-test.c:59 kata-containers#6 0x556af059f52d in test_pmac_oldworld_boot_order /mnt/sdb/qemu/tests/boot-order-test.c:152 kata-containers#7 0x7f1174f46cb9 (/lib64/libglib-2.0.so.0+0x73cb9) kata-containers#8 0x7f1174f46b73 (/lib64/libglib-2.0.so.0+0x73b73) kata-containers#9 0x7f1174f46b73 (/lib64/libglib-2.0.so.0+0x73b73) kata-containers#10 0x7f1174f46f71 in g_test_run_suite (/lib64/libglib-2.0.so.0+0x73f71) qemu#11 0x7f1174f46f94 in g_test_run (/lib64/libglib-2.0.so.0+0x73f94) Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Message-Id: <[email protected]> Reviewed-by: Thomas Huth <[email protected]> Signed-off-by: Thomas Huth <[email protected]>
When remove dup_fd in monitor_fdset_dup_fd_find_remove function, we need to free mon_fdset_fd_dup. ASAN shows memory leak stack: Direct leak of 96 byte(s) in 3 object(s) allocated from: #0 0xfffd37b033b3 in __interceptor_calloc (/lib64/libasan.so.4+0xd33b3) kata-containers#1 0xfffd375c71cb in g_malloc0 (/lib64/libglib-2.0.so.0+0x571cb) kata-containers#2 0xaaae25bf1c17 in monitor_fdset_dup_fd_add /qemu/monitor/misc.c:1724 kata-containers#3 0xaaae265cfd8f in qemu_open /qemu/util/osdep.c:315 kata-containers#4 0xaaae264e2b2b in qmp_chardev_open_file_source /qemu/chardev/char-fd.c:122 kata-containers#5 0xaaae264e47cf in qmp_chardev_open_file /qemu/chardev/char-file.c:81 kata-containers#6 0xaaae264e118b in qemu_char_open /qemu/chardev/char.c:237 kata-containers#7 0xaaae264e118b in qemu_chardev_new /qemu/chardev/char.c:964 kata-containers#8 0xaaae264e1543 in qemu_chr_new_from_opts /qemu/chardev/char.c:680 kata-containers#9 0xaaae25e12e0f in chardev_init_func /qemu/vl.c:2083 kata-containers#10 0xaaae26603823 in qemu_opts_foreach /qemu/util/qemu-option.c:1170 qemu#11 0xaaae258c9787 in main /qemu/vl.c:4089 qemu#12 0xfffd35b80b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f) qemu#13 0xaaae258d7b63 (/qemu/build/aarch64-softmmu/qemu-system-aarch64+0x8b7b63) Reported-by: Euler Robot <[email protected]> Signed-off-by: Chen Qun <[email protected]> Reviewed-by: Marc-André Lureau <[email protected]> Message-Id: <[email protected]> Signed-off-by: Laurent Vivier <[email protected]>
If we call the qmp 'query-block' while qemu is working on 'block-commit', it will cause memleaks, the memory leak stack is as follow: Indirect leak of 12360 byte(s) in 3 object(s) allocated from: #0 0x7f80f0b6d970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) kata-containers#1 0x7f80ee86049d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) kata-containers#2 0x55ea95b5bb67 in qdict_new /mnt/sdb/qemu-4.2.0-rc0/qobject/qdict.c:29 kata-containers#3 0x55ea956cd043 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6427 kata-containers#4 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399 kata-containers#5 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399 kata-containers#6 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399 kata-containers#7 0x55ea958818ea in bdrv_block_device_info /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:56 kata-containers#8 0x55ea958879de in bdrv_query_info /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:392 kata-containers#9 0x55ea9588b58f in qmp_query_block /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:578 kata-containers#10 0x55ea95567392 in qmp_marshal_query_block qapi/qapi-commands-block-core.c:95 Indirect leak of 4120 byte(s) in 1 object(s) allocated from: #0 0x7f80f0b6d970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) kata-containers#1 0x7f80ee86049d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) kata-containers#2 0x55ea95b5bb67 in qdict_new /mnt/sdb/qemu-4.2.0-rc0/qobject/qdict.c:29 kata-containers#3 0x55ea956cd043 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6427 kata-containers#4 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399 kata-containers#5 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399 kata-containers#6 0x55ea9569f301 in bdrv_backing_attach /mnt/sdb/qemu-4.2.0-rc0/block.c:1064 kata-containers#7 0x55ea956a99dd in bdrv_replace_child_noperm /mnt/sdb/qemu-4.2.0-rc0/block.c:2283 kata-containers#8 0x55ea956b9b53 in bdrv_replace_node /mnt/sdb/qemu-4.2.0-rc0/block.c:4196 kata-containers#9 0x55ea956b9e49 in bdrv_append /mnt/sdb/qemu-4.2.0-rc0/block.c:4236 kata-containers#10 0x55ea958c3472 in commit_start /mnt/sdb/qemu-4.2.0-rc0/block/commit.c:306 qemu#11 0x55ea94b68ab0 in qmp_block_commit /mnt/sdb/qemu-4.2.0-rc0/blockdev.c:3459 qemu#12 0x55ea9556a7a7 in qmp_marshal_block_commit qapi/qapi-commands-block-core.c:407 Fixes: bb808d5 Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Message-id: [email protected] Signed-off-by: Max Reitz <[email protected]>
gtk_widget_get_window() returns NULL if the widget's window is not realized, and QEMU crashes. Example under gtk 3.22.30 (mate 1.20.1): qemu-system-x86_64: Gdk: gdk_window_get_origin: assertion 'GDK_IS_WINDOW (window)' failed (gdb) bt #0 0x00007ffff496cf70 in gdk_window_get_origin () from /usr/lib64/libgdk-3.so.0 kata-containers#1 0x00007ffff49582a0 in gdk_display_get_monitor_at_window () from /usr/lib64/libgdk-3.so.0 kata-containers#2 0x0000555555bb73e2 in gd_refresh_rate_millihz (window=0x5555579d6280) at ui/gtk.c:1973 kata-containers#3 gd_vc_gfx_init (view_menu=0x5555579f0590, group=0x0, idx=0, con=<optimized out>, vc=0x5555579d4a90, s=0x5555579d49f0) at ui/gtk.c:2048 kata-containers#4 gd_create_menu_view (s=0x5555579d49f0) at ui/gtk.c:2149 kata-containers#5 gd_create_menus (s=0x5555579d49f0) at ui/gtk.c:2188 kata-containers#6 gtk_display_init (ds=<optimized out>, opts=0x55555661ed80 <dpy>) at ui/gtk.c:2256 kata-containers#7 0x000055555583d5a0 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4358 Fixes: c4c0092 and 28b58f1 (display/gtk: get proper refreshrate) Reported-by: Jan Kiszka <[email protected]> Signed-off-by: Philippe Mathieu-Daudé <[email protected]> Tested-by: Jan Kiszka <[email protected]> Message-id: [email protected] Signed-off-by: Gerd Hoffmann <[email protected]>
It's a mismatch between g_strsplit and g_free, it will cause a memory leak as follow: [root@localhost]# ./aarch64-softmmu/qemu-system-aarch64 -accel help Accelerators supported in QEMU binary: tcg kvm ================================================================= ==1207900==ERROR: LeakSanitizer: detected memory leaks Direct leak of 8 byte(s) in 2 object(s) allocated from: #0 0xfffd700231cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb) kata-containers#1 0xfffd6ec57163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163) kata-containers#2 0xfffd6ec724d7 in g_strndup (/lib64/libglib-2.0.so.0+0x724d7) kata-containers#3 0xfffd6ec73d3f in g_strsplit (/lib64/libglib-2.0.so.0+0x73d3f) kata-containers#4 0xaaab66be5077 in main /mnt/sdc/qemu-master/qemu-4.2.0-rc0/vl.c:3517 kata-containers#5 0xfffd6e140b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f) kata-containers#6 0xaaab66bf0f53 (./build/aarch64-softmmu/qemu-system-aarch64+0x8a0f53) Direct leak of 2 byte(s) in 2 object(s) allocated from: #0 0xfffd700231cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb) kata-containers#1 0xfffd6ec57163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163) kata-containers#2 0xfffd6ec7243b in g_strdup (/lib64/libglib-2.0.so.0+0x7243b) kata-containers#3 0xfffd6ec73e6f in g_strsplit (/lib64/libglib-2.0.so.0+0x73e6f) kata-containers#4 0xaaab66be5077 in main /mnt/sdc/qemu-master/qemu-4.2.0-rc0/vl.c:3517 kata-containers#5 0xfffd6e140b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f) kata-containers#6 0xaaab66bf0f53 (./build/aarch64-softmmu/qemu-system-aarch64+0x8a0f53) Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
It's easy to reproduce as follow: virsh qemu-monitor-command vm1 --pretty '{"execute": "device-list-properties", "arguments":{"typename":"exynos4210.uart"}}' ASAN shows memory leak stack: kata-containers#1 0xfffd896d71cb in g_malloc0 (/lib64/libglib-2.0.so.0+0x571cb) kata-containers#2 0xaaad270beee3 in timer_new_full /qemu/include/qemu/timer.h:530 kata-containers#3 0xaaad270beee3 in timer_new /qemu/include/qemu/timer.h:551 kata-containers#4 0xaaad270beee3 in timer_new_ns /qemu/include/qemu/timer.h:569 kata-containers#5 0xaaad270beee3 in exynos4210_uart_init /qemu/hw/char/exynos4210_uart.c:677 kata-containers#6 0xaaad275c8f4f in object_initialize_with_type /qemu/qom/object.c:516 kata-containers#7 0xaaad275c91bb in object_new_with_type /qemu/qom/object.c:684 kata-containers#8 0xaaad2755df2f in qmp_device_list_properties /qemu/qom/qom-qmp-cmds.c:152 Reported-by: Euler Robot <[email protected]> Signed-off-by: Chen Qun <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Message-id: [email protected] Signed-off-by: Peter Maydell <[email protected]>
Coverity reports: *** CID 1419387: Memory - illegal accesses (OVERRUN) /hw/hppa/dino.c: 267 in dino_chip_read_with_attrs() 261 val = s->ilr & s->imr & s->icr; 262 break; 263 case DINO_TOC_ADDR: 264 val = s->toc_addr; 265 break; 266 case DINO_GMASK ... DINO_TLTIM: >>> CID 1419387: Memory - illegal accesses (OVERRUN) >>> Overrunning array "s->reg800" of 12 4-byte elements at element index 12 (byte offset 48) using index "(addr - 2048UL) / 4UL" (which evaluates to 12). 267 val = s->reg800[(addr - DINO_GMASK) / 4]; 268 if (addr == DINO_PAMR) { 269 val &= ~0x01; /* LSB is hardwired to 0 */ 270 } 271 if (addr == DINO_MLTIM) { 272 val &= ~0x07; /* 3 LSB are hardwired to 0 */ *** CID 1419393: Memory - corruptions (OVERRUN) /hw/hppa/dino.c: 363 in dino_chip_write_with_attrs() 357 /* These registers are read-only. */ 358 break; 359 360 case DINO_GMASK ... DINO_TLTIM: 361 i = (addr - DINO_GMASK) / 4; 362 val &= reg800_keep_bits[i]; >>> CID 1419393: Memory - corruptions (OVERRUN) >>> Overrunning array "s->reg800" of 12 4-byte elements at element index 12 (byte offset 48) using index "i" (which evaluates to 12). 363 s->reg800[i] = val; 364 break; 365 366 default: 367 /* Controlled by dino_chip_mem_valid above. */ 368 g_assert_not_reached(); *** CID 1419394: Memory - illegal accesses (OVERRUN) /hw/hppa/dino.c: 362 in dino_chip_write_with_attrs() 356 case DINO_IRR1: 357 /* These registers are read-only. */ 358 break; 359 360 case DINO_GMASK ... DINO_TLTIM: 361 i = (addr - DINO_GMASK) / 4; >>> CID 1419394: Memory - illegal accesses (OVERRUN) >>> Overrunning array "reg800_keep_bits" of 12 4-byte elements at element index 12 (byte offset 48) using index "i" (which evaluates to 12). 362 val &= reg800_keep_bits[i]; 363 s->reg800[i] = val; 364 break; 365 366 default: 367 /* Controlled by dino_chip_mem_valid above. */ Indeed the array should contain 13 entries, the undocumented register 0x82c is missing. Fix by increasing the array size and adding the missing register. CID 1419387 can be verified with: $ echo x 0xfff80830 | hppa-softmmu/qemu-system-hppa -S -monitor stdio -display none QEMU 4.2.50 monitor - type 'help' for more information (qemu) x 0xfff80830 qemu/hw/hppa/dino.c:267:15: runtime error: index 12 out of bounds for type 'uint32_t [12]' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /home/phil/source/qemu/hw/hppa/dino.c:267:15 in 00000000fff80830: 0x00000000 and CID 1419393/1419394 with: $ echo writeb 0xfff80830 0x69 \ | hppa-softmmu/qemu-system-hppa -S -accel qtest -qtest stdio -display none [I 1581634452.654113] OPENED [R +4.105415] writeb 0xfff80830 0x69 qemu/hw/hppa/dino.c:362:16: runtime error: index 12 out of bounds for type 'const uint32_t [12]' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior qemu/hw/hppa/dino.c:362:16 in ================================================================= ==29607==ERROR: AddressSanitizer: global-buffer-overflow on address 0x5577dae32f30 at pc 0x5577d93f2463 bp 0x7ffd97ea11b0 sp 0x7ffd97ea11a8 READ of size 4 at 0x5577dae32f30 thread T0 #0 0x5577d93f2462 in dino_chip_write_with_attrs qemu/hw/hppa/dino.c:362:16 kata-containers#1 0x5577d9025664 in memory_region_write_with_attrs_accessor qemu/memory.c:503:12 kata-containers#2 0x5577d9024920 in access_with_adjusted_size qemu/memory.c:539:18 kata-containers#3 0x5577d9023608 in memory_region_dispatch_write qemu/memory.c:1482:13 kata-containers#4 0x5577d8e3177a in flatview_write_continue qemu/exec.c:3166:23 kata-containers#5 0x5577d8e20357 in flatview_write qemu/exec.c:3206:14 kata-containers#6 0x5577d8e1fef4 in address_space_write qemu/exec.c:3296:18 kata-containers#7 0x5577d8e20693 in address_space_rw qemu/exec.c:3306:16 kata-containers#8 0x5577d9011595 in qtest_process_command qemu/qtest.c:432:13 kata-containers#9 0x5577d900d19f in qtest_process_inbuf qemu/qtest.c:705:9 kata-containers#10 0x5577d900ca22 in qtest_read qemu/qtest.c:717:5 qemu#11 0x5577da8c4254 in qemu_chr_be_write_impl qemu/chardev/char.c:183:9 qemu#12 0x5577da8c430c in qemu_chr_be_write qemu/chardev/char.c:195:9 qemu#13 0x5577da8cf587 in fd_chr_read qemu/chardev/char-fd.c:68:9 qemu#14 0x5577da9836cd in qio_channel_fd_source_dispatch qemu/io/channel-watch.c:84:12 qemu#15 0x7faf44509ecc in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x4fecc) qemu#16 0x5577dab75f96 in glib_pollfds_poll qemu/util/main-loop.c:219:9 qemu#17 0x5577dab74797 in os_host_main_loop_wait qemu/util/main-loop.c:242:5 qemu#18 0x5577dab7435a in main_loop_wait qemu/util/main-loop.c:518:11 qemu#19 0x5577d9514eb3 in main_loop qemu/vl.c:1682:9 qemu#20 0x5577d950699d in main qemu/vl.c:4450:5 qemu#21 0x7faf41a87f42 in __libc_start_main (/lib64/libc.so.6+0x23f42) qemu#22 0x5577d8cd4d4d in _start (qemu/build/sanitizer/hppa-softmmu/qemu-system-hppa+0x1256d4d) 0x5577dae32f30 is located 0 bytes to the right of global variable 'reg800_keep_bits' defined in 'qemu/hw/hppa/dino.c:87:23' (0x5577dae32f00) of size 48 SUMMARY: AddressSanitizer: global-buffer-overflow qemu/hw/hppa/dino.c:362:16 in dino_chip_write_with_attrs Shadow bytes around the buggy address: 0x0aaf7b5be590: 00 f9 f9 f9 f9 f9 f9 f9 00 02 f9 f9 f9 f9 f9 f9 0x0aaf7b5be5a0: 07 f9 f9 f9 f9 f9 f9 f9 07 f9 f9 f9 f9 f9 f9 f9 0x0aaf7b5be5b0: 07 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00 0x0aaf7b5be5c0: 00 00 00 02 f9 f9 f9 f9 00 00 00 00 00 00 00 00 0x0aaf7b5be5d0: 00 00 00 00 00 00 00 00 00 00 00 03 f9 f9 f9 f9 =>0x0aaf7b5be5e0: 00 00 00 00 00 00[f9]f9 f9 f9 f9 f9 00 00 00 00 0x0aaf7b5be5f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0aaf7b5be600: 00 00 01 f9 f9 f9 f9 f9 00 00 00 00 07 f9 f9 f9 0x0aaf7b5be610: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00 0x0aaf7b5be620: 00 00 00 05 f9 f9 f9 f9 00 00 00 00 07 f9 f9 f9 0x0aaf7b5be630: f9 f9 f9 f9 00 00 f9 f9 f9 f9 f9 f9 07 f9 f9 f9 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==29607==ABORTING Fixes: Covertiy CID 1419387 / 1419393 / 1419394 (commit 1809259) Acked-by: Helge Deller <[email protected]> Signed-off-by: Philippe Mathieu-Daudé <[email protected]> Message-Id: <[email protected]> Signed-off-by: Richard Henderson <[email protected]>
'fdt' forgot to clean both e500 and pnv when we call 'system_reset' on ppc, this patch fix it. The leak stacks are as follow: Direct leak of 4194304 byte(s) in 4 object(s) allocated from: #0 0x7fafe37dd970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) kata-containers#1 0x7fafe2e3149d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) kata-containers#2 0x561876f7f80d in create_device_tree /mnt/sdb/qemu-new/qemu/device_tree.c:40 kata-containers#3 0x561876b7ac29 in ppce500_load_device_tree /mnt/sdb/qemu-new/qemu/hw/ppc/e500.c:364 kata-containers#4 0x561876b7f437 in ppce500_reset_device_tree /mnt/sdb/qemu-new/qemu/hw/ppc/e500.c:617 kata-containers#5 0x56187718b1ae in qemu_devices_reset /mnt/sdb/qemu-new/qemu/hw/core/reset.c:69 kata-containers#6 0x561876f6938d in qemu_system_reset /mnt/sdb/qemu-new/qemu/vl.c:1412 kata-containers#7 0x561876f6a25b in main_loop_should_exit /mnt/sdb/qemu-new/qemu/vl.c:1645 kata-containers#8 0x561876f6a398 in main_loop /mnt/sdb/qemu-new/qemu/vl.c:1679 kata-containers#9 0x561876f7da8e in main /mnt/sdb/qemu-new/qemu/vl.c:4438 kata-containers#10 0x7fafde16b812 in __libc_start_main ../csu/libc-start.c:308 qemu#11 0x5618765c055d in _start (/mnt/sdb/qemu-new/qemu/build/ppc64-softmmu/qemu-system-ppc64+0x2b1555d) Direct leak of 1048576 byte(s) in 1 object(s) allocated from: #0 0x7fc0a6f1b970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) kata-containers#1 0x7fc0a656f49d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) kata-containers#2 0x55eb05acd2ca in pnv_dt_create /mnt/sdb/qemu-new/qemu/hw/ppc/pnv.c:507 kata-containers#3 0x55eb05ace5bf in pnv_reset /mnt/sdb/qemu-new/qemu/hw/ppc/pnv.c:578 kata-containers#4 0x55eb05f2f395 in qemu_system_reset /mnt/sdb/qemu-new/qemu/vl.c:1410 kata-containers#5 0x55eb05f43850 in main /mnt/sdb/qemu-new/qemu/vl.c:4403 kata-containers#6 0x7fc0a18a9812 in __libc_start_main ../csu/libc-start.c:308 kata-containers#7 0x55eb0558655d in _start (/mnt/sdb/qemu-new/qemu/build/ppc64-softmmu/qemu-system-ppc64+0x2b1555d) Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Message-Id: <[email protected]> Reviewed-by: Greg Kurz <[email protected]> Signed-off-by: David Gibson <[email protected]>
Similar to other virtio device(https://patchwork.kernel.org/patch/11399237/), virtio queues forgot to delete in unrealize, and aslo error path in realize, this patch fix these memleaks, the leak stack is as follow: Direct leak of 57344 byte(s) in 1 object(s) allocated from: #0 0x7f15784fb970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) kata-containers#1 0x7f157790849d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) kata-containers#2 0x55587a1bf859 in virtio_add_queue /mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio.c:2333 kata-containers#3 0x55587a2071d5 in vuf_device_realize /mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/vhost-user-fs.c:212 kata-containers#4 0x55587a1ae360 in virtio_device_realize /mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio.c:3531 kata-containers#5 0x55587a63fb7b in device_set_realized /mnt/sdb/qemu-new/qemu_test/qemu/hw/core/qdev.c:891 kata-containers#6 0x55587acf03f5 in property_set_bool /mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:2238 kata-containers#7 0x55587acfce0d in object_property_set_qobject /mnt/sdb/qemu-new/qemu_test/qemu/qom/qom-qobject.c:26 kata-containers#8 0x55587acf5c8c in object_property_set_bool /mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:1390 kata-containers#9 0x55587a8e22a2 in pci_qdev_realize /mnt/sdb/qemu-new/qemu_test/qemu/hw/pci/pci.c:2095 kata-containers#10 0x55587a63fb7b in device_set_realized /mnt/sdb/qemu-new/qemu_test/qemu/hw/core/qdev.c:891 qemu#11 0x55587acf03f5 in property_set_bool /mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:2238 qemu#12 0x55587acfce0d in object_property_set_qobject /mnt/sdb/qemu-new/qemu_test/qemu/qom/qom-qobject.c:26 qemu#13 0x55587acf5c8c in object_property_set_bool /mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:1390 qemu#14 0x55587a496d65 in qdev_device_add /mnt/sdb/qemu-new/qemu_test/qemu/qdev-monitor.c:679 Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Stefan Hajnoczi <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
Similar to other virtio-deivces, ctrl_vq forgot to delete in virtio_crypto_device_unrealize, this patch fix it. This device has aleardy maintained vq pointers. Thus, we use the new virtio_delete_queue function directly to do the cleanup. The leak stack: Direct leak of 10752 byte(s) in 3 object(s) allocated from: #0 0x7f4c024b1970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) kata-containers#1 0x7f4c018be49d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) kata-containers#2 0x55a2f8017279 in virtio_add_queue /mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio.c:2333 kata-containers#3 0x55a2f8057035 in virtio_crypto_device_realize /mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio-crypto.c:814 kata-containers#4 0x55a2f8005d80 in virtio_device_realize /mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio.c:3531 kata-containers#5 0x55a2f8497d1b in device_set_realized /mnt/sdb/qemu-new/qemu_test/qemu/hw/core/qdev.c:891 kata-containers#6 0x55a2f8b48595 in property_set_bool /mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:2238 kata-containers#7 0x55a2f8b54fad in object_property_set_qobject /mnt/sdb/qemu-new/qemu_test/qemu/qom/qom-qobject.c:26 kata-containers#8 0x55a2f8b4de2c in object_property_set_bool /mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:1390 kata-containers#9 0x55a2f80609c9 in virtio_crypto_pci_realize /mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio-crypto-pci.c:58 Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Cc: "Gonglei (Arei)" <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
virtio queues forgot to delete in unrealize, and aslo error path in realize, this patch fix these memleaks, the leak stack is as follow: Direct leak of 114688 byte(s) in 16 object(s) allocated from: #0 0x7f24024fdbf0 in calloc (/lib64/libasan.so.3+0xcabf0) kata-containers#1 0x7f2401642015 in g_malloc0 (/lib64/libglib-2.0.so.0+0x50015) kata-containers#2 0x55ad175a6447 in virtio_add_queue /mnt/sdb/qemu/hw/virtio/virtio.c:2327 kata-containers#3 0x55ad17570cf9 in vhost_user_blk_device_realize /mnt/sdb/qemu/hw/block/vhost-user-blk.c:419 kata-containers#4 0x55ad175a3707 in virtio_device_realize /mnt/sdb/qemu/hw/virtio/virtio.c:3509 kata-containers#5 0x55ad176ad0d1 in device_set_realized /mnt/sdb/qemu/hw/core/qdev.c:876 kata-containers#6 0x55ad1781ff9d in property_set_bool /mnt/sdb/qemu/qom/object.c:2080 kata-containers#7 0x55ad178245ae in object_property_set_qobject /mnt/sdb/qemu/qom/qom-qobject.c:26 kata-containers#8 0x55ad17821eb4 in object_property_set_bool /mnt/sdb/qemu/qom/object.c:1338 kata-containers#9 0x55ad177aeed7 in virtio_pci_realize /mnt/sdb/qemu/hw/virtio/virtio-pci.c:1801 Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Message-Id: <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
In currently implementation there will be a memory leak when nbd_client_connect() returns error status. Here is an easy way to reproduce: 1. run qemu-iotests as follow and check the result with asan: ./check -raw 143 Following is the asan output backtrack: Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f629688a560 in calloc (/usr/lib64/libasan.so.3+0xc7560) kata-containers#1 0x7f6295e7e015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015) kata-containers#2 0x56281dab4642 in qobject_input_start_struct /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295 kata-containers#3 0x56281dab1a04 in visit_start_struct /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49 kata-containers#4 0x56281dad1827 in visit_type_SocketAddress qapi/qapi-visit-sockets.c:386 kata-containers#5 0x56281da8062f in nbd_config /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716 kata-containers#6 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829 kata-containers#7 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Direct leak of 15 byte(s) in 1 object(s) allocated from: #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0) kata-containers#1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd) kata-containers#2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace) kata-containers#3 0x56281da804ac in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1834 kata-containers#4 0x56281da804ac in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Indirect leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0) kata-containers#1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd) kata-containers#2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace) kata-containers#3 0x56281dab41a3 in qobject_input_type_str_keyval /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:536 kata-containers#4 0x56281dab2ee9 in visit_type_str /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:297 kata-containers#5 0x56281dad0fa1 in visit_type_UnixSocketAddress_members qapi/qapi-visit-sockets.c:141 kata-containers#6 0x56281dad17b6 in visit_type_SocketAddress_members qapi/qapi-visit-sockets.c:366 kata-containers#7 0x56281dad186a in visit_type_SocketAddress qapi/qapi-visit-sockets.c:393 kata-containers#8 0x56281da8062f in nbd_config /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716 kata-containers#9 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829 kata-containers#10 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Fixes: 8f071c9 Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Reviewed-by: Vladimir Sementsov-Ogievskiy <[email protected]> Cc: qemu-stable <[email protected]> Cc: Vladimir Sementsov-Ogievskiy <[email protected]> Message-Id: <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: Eric Blake <[email protected]>
The virtqueue code sets up MemoryRegionCaches to access the virtqueue guest RAM data structures. The code currently assumes that VRingMemoryRegionCaches is initialized before device emulation code accesses the virtqueue. An assertion will fail in vring_get_region_caches() when this is not true. Device fuzzing found a case where this assumption is false (see below). Virtqueue guest RAM addresses can also be changed from a vCPU thread while an IOThread is accessing the virtqueue. This breaks the same assumption but this time the caches could become invalid partway through the virtqueue code. The code fetches the caches RCU pointer multiple times so we will need to validate the pointer every time it is fetched. Add checks each time we call vring_get_region_caches() and treat invalid caches as a nop: memory stores are ignored and memory reads return 0. The fuzz test failure is as follows: $ qemu -M pc -device virtio-blk-pci,id=drv0,drive=drive0,addr=4.0 \ -drive if=none,id=drive0,file=null-co://,format=raw,auto-read-only=off \ -drive if=none,id=drive1,file=null-co://,file.read-zeroes=on,format=raw \ -display none \ -qtest stdio endianness outl 0xcf8 0x80002020 outl 0xcfc 0xe0000000 outl 0xcf8 0x80002004 outw 0xcfc 0x7 write 0xe0000000 0x24 0x00ffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffab5cffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffabffffffab0000000001 inb 0x4 writew 0xe000001c 0x1 write 0xe0000014 0x1 0x0d The following error message is produced: qemu-system-x86_64: /home/stefanha/qemu/hw/virtio/virtio.c:286: vring_get_region_caches: Assertion `caches != NULL' failed. The backtrace looks like this: #0 0x00007ffff5520625 in raise () at /lib64/libc.so.6 kata-containers#1 0x00007ffff55098d9 in abort () at /lib64/libc.so.6 kata-containers#2 0x00007ffff55097a9 in _nl_load_domain.cold () at /lib64/libc.so.6 kata-containers#3 0x00007ffff5518a66 in annobin_assert.c_end () at /lib64/libc.so.6 kata-containers#4 0x00005555559073da in vring_get_region_caches (vq=<optimized out>) at qemu/hw/virtio/virtio.c:286 kata-containers#5 vring_get_region_caches (vq=<optimized out>) at qemu/hw/virtio/virtio.c:283 kata-containers#6 0x000055555590818d in vring_used_flags_set_bit (mask=1, vq=0x5555575ceea0) at qemu/hw/virtio/virtio.c:398 kata-containers#7 virtio_queue_split_set_notification (enable=0, vq=0x5555575ceea0) at qemu/hw/virtio/virtio.c:398 kata-containers#8 virtio_queue_set_notification (vq=vq@entry=0x5555575ceea0, enable=enable@entry=0) at qemu/hw/virtio/virtio.c:451 kata-containers#9 0x0000555555908512 in virtio_queue_set_notification (vq=vq@entry=0x5555575ceea0, enable=enable@entry=0) at qemu/hw/virtio/virtio.c:444 kata-containers#10 0x00005555558c697a in virtio_blk_handle_vq (s=0x5555575c57e0, vq=0x5555575ceea0) at qemu/hw/block/virtio-blk.c:775 qemu#11 0x0000555555907836 in virtio_queue_notify_aio_vq (vq=0x5555575ceea0) at qemu/hw/virtio/virtio.c:2244 qemu#12 0x0000555555cb5dd7 in aio_dispatch_handlers (ctx=ctx@entry=0x55555671a420) at util/aio-posix.c:429 qemu#13 0x0000555555cb67a8 in aio_dispatch (ctx=0x55555671a420) at util/aio-posix.c:460 qemu#14 0x0000555555cb307e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260 qemu#15 0x00007ffff7bbc510 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 qemu#16 0x0000555555cb5848 in glib_pollfds_poll () at util/main-loop.c:219 qemu#17 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 qemu#18 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518 qemu#19 0x00005555559b20c9 in main_loop () at vl.c:1683 qemu#20 0x0000555555838115 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4441 Reported-by: Alexander Bulekov <[email protected]> Cc: Michael Tsirkin <[email protected]> Cc: Cornelia Huck <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: [email protected] Signed-off-by: Stefan Hajnoczi <[email protected]> Message-Id: <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
'list' forgot to free at the end of dump_vmstate_json_to_file(), although it's called only once, but seems like a clean code. Fix the leak as follow: Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x7fb946abd768 in __interceptor_malloc (/lib64/libasan.so.5+0xef768) kata-containers#1 0x7fb945eca445 in g_malloc (/lib64/libglib-2.0.so.0+0x52445) kata-containers#2 0x7fb945ee2066 in g_slice_alloc (/lib64/libglib-2.0.so.0+0x6a066) kata-containers#3 0x7fb945ee3139 in g_slist_prepend (/lib64/libglib-2.0.so.0+0x6b139) kata-containers#4 0x5585db591581 in object_class_get_list_tramp /mnt/sdb/qemu-new/qemu/qom/object.c:1084 kata-containers#5 0x5585db590f66 in object_class_foreach_tramp /mnt/sdb/qemu-new/qemu/qom/object.c:1028 kata-containers#6 0x7fb945eb35f7 in g_hash_table_foreach (/lib64/libglib-2.0.so.0+0x3b5f7) kata-containers#7 0x5585db59110c in object_class_foreach /mnt/sdb/qemu-new/qemu/qom/object.c:1038 kata-containers#8 0x5585db5916b6 in object_class_get_list /mnt/sdb/qemu-new/qemu/qom/object.c:1092 kata-containers#9 0x5585db335ca0 in dump_vmstate_json_to_file /mnt/sdb/qemu-new/qemu/migration/savevm.c:638 kata-containers#10 0x5585daa5bcbf in main /mnt/sdb/qemu-new/qemu/vl.c:4420 qemu#11 0x7fb941204812 in __libc_start_main ../csu/libc-start.c:308 qemu#12 0x5585da29420d in _start (/mnt/sdb/qemu-new/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x27f020d) Indirect leak of 7472 byte(s) in 467 object(s) allocated from: #0 0x7fb946abd768 in __interceptor_malloc (/lib64/libasan.so.5+0xef768) kata-containers#1 0x7fb945eca445 in g_malloc (/lib64/libglib-2.0.so.0+0x52445) kata-containers#2 0x7fb945ee2066 in g_slice_alloc (/lib64/libglib-2.0.so.0+0x6a066) kata-containers#3 0x7fb945ee3139 in g_slist_prepend (/lib64/libglib-2.0.so.0+0x6b139) kata-containers#4 0x5585db591581 in object_class_get_list_tramp /mnt/sdb/qemu-new/qemu/qom/object.c:1084 kata-containers#5 0x5585db590f66 in object_class_foreach_tramp /mnt/sdb/qemu-new/qemu/qom/object.c:1028 kata-containers#6 0x7fb945eb35f7 in g_hash_table_foreach (/lib64/libglib-2.0.so.0+0x3b5f7) kata-containers#7 0x5585db59110c in object_class_foreach /mnt/sdb/qemu-new/qemu/qom/object.c:1038 kata-containers#8 0x5585db5916b6 in object_class_get_list /mnt/sdb/qemu-new/qemu/qom/object.c:1092 kata-containers#9 0x5585db335ca0 in dump_vmstate_json_to_file /mnt/sdb/qemu-new/qemu/migration/savevm.c:638 kata-containers#10 0x5585daa5bcbf in main /mnt/sdb/qemu-new/qemu/vl.c:4420 qemu#11 0x7fb941204812 in __libc_start_main ../csu/libc-start.c:308 qemu#12 0x5585da29420d in _start (/mnt/sdb/qemu-new/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x27f020d) Reported-by: Euler Robot <[email protected]> Signed-off-by: Pan Nengyuan <[email protected]> Reviewed-by: Juan Quintela <[email protected]> Reviewed-by: Dr. David Alan Gilbert <[email protected]> Signed-off-by: Juan Quintela <[email protected]>
The ARCH LBR feature is part of intel PMU, so its detection cpuid bit CPUID.0x7[0].edx bit19 shuold be cleared yet when vPMU is disabled(e.g. -cpu host,pmu=off). Otherwise guest kernel fails to initialize the IA32_XSS_MSR and panic finally in this situation, due to it assumes the LBR xsave state is always supported if CPUID.7[0].edx bit19 is set. [ 0.000000] Command line: root=/dev/vda2 nokaslr console=hvc0 console=ttyS0 earlyprintk=ttyS0 ignore_loglevel l1tf=off [ 0.000000] x86/split lock detection: #DB: warning on user-space bus_locks [ 0.000000] unchecked MSR access error: WRMSR to 0xda0 (tried to write 0x0000000000008400) at rIP: 0xffffffff810dd0c4 (native_write_msr+0x4/0x20) [ 0.000000] Call Trace: [ 0.000000] ? fpu__init_cpu_xstate+0x88/0xa0 [ 0.000000] ? fpu__init_system_xstate+0x14c/0x59a [ 0.000000] ? fpu__init_system+0x107/0x12c [ 0.000000] ? early_cpu_init+0x32e/0x358 [ 0.000000] ? setup_arch+0x43/0xc28 [ 0.000000] ? start_kernel+0x58/0x608 [ 0.000000] ? secondary_startup_64_no_verify+0xd1/0xdb [ 0.000000] Call Trace: [ 0.000000] ? wrmsrl.constprop.10+0x15/0x20 [ 0.000000] ? fpu__init_system_xstate+0x1d0/0x59a [ 0.000000] ? fpu__init_system+0x107/0x12c [ 0.000000] ? early_cpu_init+0x32e/0x358 [ 0.000000] ? setup_arch+0x43/0xc28 [ 0.000000] ? start_kernel+0x58/0x608 [ 0.000000] ? secondary_startup_64_no_verify+0xd1/0xdb ... [ 0.166232] rcu: Hierarchical SRCU implementation. [ 0.166232] smp: Bringing up secondary CPUs ... [ 0.166232] x86: Booting SMP configuration: [ 0.166232] .... node #0, CPUs: #1 [ 0.003785] Call Trace: [ 0.003785] fpu__init_cpu_xstate+0x86/0xa0 [ 0.003785] cpu_init+0x174/0x1d0 [ 0.003785] start_secondary+0xb/0xf0 [ 0.003785] secondary_startup_64_no_verify+0xd1/0xdb [ 0.007785] kvm-clock: cpu 1, msr 3c01041, secondary cpu clock [ 0.167991] kvm-guest: setup async PF for cpu 1 [ 0.167991] #2 [ 0.003785] Call Trace: [ 0.003785] fpu__init_cpu_xstate+0x86/0xa0 [ 0.003785] cpu_init+0x174/0x1d0 [ 0.003785] start_secondary+0xb/0xf0 [ 0.003785] secondary_startup_64_no_verify+0xd1/0xdb [ 0.007785] kvm-clock: cpu 2, msr 3c01081, secondary cpu clock [ 0.171703] kvm-guest: setup async PF for cpu 2 [ 0.171703] #3 ... [ 0.835832] x86/mm: Checked W+X mappings: passed, no W+X pages found. [ 0.836395] Run /sbin/init as init process [ 0.836768] with arguments: [ 0.837041] /sbin/init [ 0.837293] nokaslr [ 0.837519] with environment: [ 0.837809] HOME=/ [ 0.838028] TERM=linux [ 0.849223] ------------[ cut here ]------------ [ 0.849645] Bad FPU state detected at __restore_fpregs_from_fpstate+0x2a/0x40, reinitializing FPU registers. [ 0.849656] WARNING: CPU: 1 PID: 1 at arch/x86/mm/extable.c:66 ex_handler_fprestore+0x4e/0x60 [ 0.851345] CPU: 1 PID: 1 Comm: init Not tainted 5.15.0-rc6spr-guest-kernel-gbe3d4853f499-dirty #386 [ 0.852184] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 0.853205] RIP: 0010:ex_handler_fprestore+0x4e/0x60 [ 0.853660] Code: b5 67 82 81 e6 ff 04 00 00 e8 ee 0d fd ff b8 01 00 00 00 c3 48 89 c6 48 c7 c7 60 7c 57 82 c6 05 95 23 04 02 01 e8 a5 46 ab 00 <0f> 0b eb c7 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 80 3d 74 23 [ 0.855268] RSP: 0000:ffa0000000013d90 EFLAGS: 00010086 [ 0.855714] RAX: 0000000000000060 RBX: ffa0000000013e58 RCX: 0000000000000001 [ 0.856322] RDX: 0000000080000001 RSI: 00000000ffffdfff RDI: 00000000ffffffff [ 0.856966] RBP: 000000000000000d R08: 0000000000000000 R09: c0000000ffffdfff [ 0.857605] R10: ffa0000000013bd8 R11: ffa0000000013bd0 R12: 0000000000000000 [ 0.858231] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 0.858869] FS: 0000000000000000(0000) GS:ff11000237880000(0000) knlGS:0000000000000000 [ 0.859553] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.860072] CR2: 00007ffd583659c9 CR3: 000000010106d003 CR4: 0000000000771ee0 [ 0.860737] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 0.861396] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 0.862058] PKRU: 55555554 [ 0.862315] Call Trace: [ 0.862548] fixup_exception+0x3c/0x50 [ 0.862901] exc_general_protection+0x9f/0x2a0 [ 0.863311] asm_exc_general_protection+0x1e/0x30 [ 0.863745] RIP: 0010:__restore_fpregs_from_fpstate+0x2a/0x40 [ 0.864251] Code: 48 83 ec 08 48 89 3c 24 eb 0a 0f 1f 00 db e2 0f 77 db 04 24 0f 1f 44 00 00 48 89 f2 48 8b 3c 24 89 f0 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 c3 48 8b 04 24 48 0f ae 08 48 83 c4 08 c3 0f 1f 40 00 [ 0.865884] RSP: 0000:ffa0000000013f00 EFLAGS: 00010046 [ 0.866370] RAX: 00000000000004e7 RBX: ff11000100218000 RCX: ffffffff8259d00b [ 0.867004] RDX: 0000000000000000 RSI: 00000000000004e7 RDI: ff11000100219340 [ 0.867630] RBP: ff11000100219300 R08: ff11000100fa0000 R09: 0000000000000000 [ 0.868244] R10: 0000000000000246 R11: 0000000000000002 R12: 0000000000000001 [ 0.868863] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 0.869476] switch_fpu_return+0x4c/0xd0 [ 0.869837] exit_to_user_mode_prepare+0x177/0x190 [ 0.870258] syscall_exit_to_user_mode+0x18/0x30 [ 0.870664] ? rest_init+0xd0/0xd0 [ 0.870979] ret_from_fork+0x15/0x30 [ 0.871288] RIP: 0033:0x7f557a495100 [ 0.871597] Code: Unable to access opcode bytes at RIP 0x7f557a4950d6. [ 0.872154] RSP: 002b:00007ffd58365830 EFLAGS: 00000200 ORIG_RAX: 000000000000003b [ 0.872788] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 0.873400] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 0.874009] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 0.874614] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 0.875219] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 0.875846] ---[ end trace 55091adb0bf51996 ]--- ... [ 1.221965] RIP: 0010:bsearch+0x0/0x80 [ 1.221966] Code: ff ff ff ff 48 0f bd d7 29 d0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 48 0f bc c7 c3 cc cc cc cc cc cc cc cc cc cc <41> 57 41 56 41 55 41 54 55 53 48 83 ec 08 48 85 d2 48 89 3c 24 74 [ 1.221966] RSP: 0000:ffa0000000010000 EFLAGS: 00010092 [ 1.221967] RAX: aaaaaaaaaaaaaaab RBX: ffa00000000100d8 RCX: 000000000000000c [ 1.221967] RDX: 00000000000004b5 RSI: ffffffff826c38f0 RDI: ffa0000000010008 [ 1.221968] RBP: 000000000000000d R08: ffffffff81b7f6b0 R09: 0000000000000000 [ 1.221968] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 1.221969] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1.221969] FS: 0000000000000000(0000) GS:ff11000237880000(0000) knlGS:0000000000000000 [ 1.221970] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.221970] CR2: 00007f557a4950d6 CR3: 000000010106d003 CR4: 0000000000771ee0 [ 1.221972] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.221973] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 1.221973] PKRU: 55555554 [ 1.221974] Kernel panic - not syncing: Fatal exception in interrupt [ 1.222165] Kernel Offset: disabled [ 1.232592] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Signed-off-by: Yuan Yao <[email protected]>
Fix warnings such: disas/nanomips.c:3251:64: warning: format specifies type 'char *' but the argument has type 'int64' (aka 'long long') [-Wformat] return img_format("CACHE 0x%" PRIx64 ", %s(%s)", op_value, s_value, rs); ~~ ^~~~~~~ %lld To avoid crashes such (kernel from commit f375ad6): $ qemu-system-mipsel -cpu I7200 -d in_asm -kernel generic_nano32r6el_page4k ... ---------------- IN: __bzero 0x805c6084: 20c4 6950 ADDU r13, a0, a2 0x805c6088: 9089 ADDIU a0, 1 Process 70261 stopped * thread #6, stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff0) frame #0: 0x00000001bfe38864 libsystem_platform.dylib`_platform_strlen + 4 libsystem_platform.dylib`: -> 0x1bfe38864 <+4>: ldr q0, [x1] 0x1bfe38868 <+8>: adr x3, #-0xc8 ; ___lldb_unnamed_symbol314 0x1bfe3886c <+12>: ldr q2, [x3], #0x10 0x1bfe38870 <+16>: and x2, x0, #0xf Target 0: (qemu-system-mipsel) stopped. (lldb) bt * thread #6, stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff0) * frame #0: 0x00000001bfe38864 libsystem_platform.dylib`_platform_strlen + 4 frame #1: 0x00000001bfce76a0 libsystem_c.dylib`__vfprintf + 4544 frame #2: 0x00000001bfd158b4 libsystem_c.dylib`_vasprintf + 280 frame #3: 0x0000000101c22fb0 libglib-2.0.0.dylib`g_vasprintf + 28 frame #4: 0x0000000101bfb7d8 libglib-2.0.0.dylib`g_strdup_vprintf + 32 frame #5: 0x000000010000fb70 qemu-system-mipsel`img_format(format=<unavailable>) at nanomips.c:103:14 [opt] frame #6: 0x0000000100018868 qemu-system-mipsel`SB_S9_(instruction=<unavailable>, info=<unavailable>) at nanomips.c:12616:12 [opt] frame #7: 0x000000010000f90c qemu-system-mipsel`print_insn_nanomips at nanomips.c:589:28 [opt] Fixes: 4066c15 ("disas/nanomips: Remove IMMEDIATE functions") Reported-by: Stefan Weil <[email protected]> Reviewed-by: Stefan Weil <[email protected]> Signed-off-by: Philippe Mathieu-Daudé <[email protected]> Message-Id: <[email protected]>
According to Chapter "CPUID Virtualization" in TDX module spec, CPUID bits of TD can be classified into 6 types: ------------------------------------------------------------------------ 1 | As configured | configurable by VMM, independent of native value; ------------------------------------------------------------------------ 2 | As configured | configurable by VMM if the bit is supported natively (if native) | Otherwise it equals as native(0). ------------------------------------------------------------------------ 3 | Fixed | fixed to 0/1 ------------------------------------------------------------------------ 4 | Native | reflect the native value ------------------------------------------------------------------------ 5 | Calculated | calculated by TDX module. ------------------------------------------------------------------------ 6 | Inducing #VE | get #VE exception ------------------------------------------------------------------------ Note: 1. All the configurable XFAM related features and TD attributes related features fall into type #2. And fixed0/1 bits of XFAM and TD attributes fall into type #3. 2. For CPUID leaves not listed in "CPUID virtualization Overview" table in TDX module spec, TDX module injects #VE to TDs when those are queried. For this case, TDs can request CPUID emulation from VMM via TDVMCALL and the values are fully controlled by VMM. Due to TDX module has its own virtualization policy on CPUID bits, it leads to what reported via KVM_GET_SUPPORTED_CPUID diverges from the supported CPUID bits for TDs. In order to keep a consistent CPUID configuration between VMM and TDs. Adjust supported CPUID for TDs based on TDX restrictions. Currently only focus on the CPUID leaves recognized by QEMU's feature_word_info[] that are indexed by a FeatureWord. Introduce a TDX CPUID lookup table, which maintains 1 entry for each FeatureWord. Each entry has below fields: - tdx_fixed0/1: The bits that are fixed as 0/1; - vmm_fixup: The bits that are configurable from the view of TDX module. But they requires emulation of VMM when they are configured as enabled. For those, they are not supported if VMM doesn't report them as supported. So they need be fixed up by checking if VMM supports them. - inducing_ve: TD gets #VE when querying this CPUID leaf. The result is totally configurable by VMM. - supported_on_ve: It's valid only when @inducing_ve is true. It represents the maximum feature set supported that be emulated for TDs. By applying TDX CPUID lookup table and TDX capabilities reported from TDX module, the supported CPUID for TDs can be obtained from following steps: - get the base of VMM supported feature set; - if the leaf is not a FeatureWord just return VMM's value without modification; - if the leaf is an inducing_ve type, applying supported_on_ve mask and return; - include all native bits, it covers type #2, #4, and parts of type #1. (it also includes some unsupported bits. The following step will correct it.) - apply fixed0/1 to it (it covers #3, and rectifies the previous step); - add configurable bits (it covers the other part of type #1); - fix the ones in vmm_fixup; - filter the one has valid .supported field; (Calculated type is ignored since it's determined at runtime). Co-developed-by: Chenyi Qiang <[email protected]> Signed-off-by: Chenyi Qiang <[email protected]> Signed-off-by: Xiaoyao Li <[email protected]>
According to Chapter "CPUID Virtualization" in TDX module spec, CPUID bits of TD can be classified into 6 types: ------------------------------------------------------------------------ 1 | As configured | configurable by VMM, independent of native value; ------------------------------------------------------------------------ 2 | As configured | configurable by VMM if the bit is supported natively (if native) | Otherwise it equals as native(0). ------------------------------------------------------------------------ 3 | Fixed | fixed to 0/1 ------------------------------------------------------------------------ 4 | Native | reflect the native value ------------------------------------------------------------------------ 5 | Calculated | calculated by TDX module. ------------------------------------------------------------------------ 6 | Inducing #VE | get #VE exception ------------------------------------------------------------------------ Note: 1. All the configurable XFAM related features and TD attributes related features fall into type #2. And fixed0/1 bits of XFAM and TD attributes fall into type #3. 2. For CPUID leaves not listed in "CPUID virtualization Overview" table in TDX module spec, TDX module injects #VE to TDs when those are queried. For this case, TDs can request CPUID emulation from VMM via TDVMCALL and the values are fully controlled by VMM. Due to TDX module has its own virtualization policy on CPUID bits, it leads to what reported via KVM_GET_SUPPORTED_CPUID diverges from the supported CPUID bits for TDs. In order to keep a consistent CPUID configuration between VMM and TDs. Adjust supported CPUID for TDs based on TDX restrictions. Currently only focus on the CPUID leaves recognized by QEMU's feature_word_info[] that are indexed by a FeatureWord. Introduce a TDX CPUID lookup table, which maintains 1 entry for each FeatureWord. Each entry has below fields: - tdx_fixed0/1: The bits that are fixed as 0/1; - vmm_fixup: The bits that are configurable from the view of TDX module. But they requires emulation of VMM when they are configured as enabled. For those, they are not supported if VMM doesn't report them as supported. So they need be fixed up by checking if VMM supports them. - inducing_ve: TD gets #VE when querying this CPUID leaf. The result is totally configurable by VMM. - supported_on_ve: It's valid only when @inducing_ve is true. It represents the maximum feature set supported that be emulated for TDs. By applying TDX CPUID lookup table and TDX capabilities reported from TDX module, the supported CPUID for TDs can be obtained from following steps: - get the base of VMM supported feature set; - if the leaf is not a FeatureWord just return VMM's value without modification; - if the leaf is an inducing_ve type, applying supported_on_ve mask and return; - include all native bits, it covers type #2, #4, and parts of type #1. (it also includes some unsupported bits. The following step will correct it.) - apply fixed0/1 to it (it covers #3, and rectifies the previous step); - add configurable bits (it covers the other part of type #1); - fix the ones in vmm_fixup; - filter the one has valid .supported field; (Calculated type is ignored since it's determined at runtime). Co-developed-by: Chenyi Qiang <[email protected]> Signed-off-by: Chenyi Qiang <[email protected]> Signed-off-by: Xiaoyao Li <[email protected]>
When the migration capability 'bypass-shared-memory'
is set, the shared memory will be bypassed when migration.
It is the key feature to enable several excellent features for
the qemu, such as qemu-local-migration, qemu-live-update,
extremely-fast-save-restore, vm-template, vm-fast-live-clone,
yet-another-post-copy-migration, etc..
The philosophy behind this key feature, including the resulting
advanced key features, is that a part of the memory management
is separated out from the qemu, and let the other toolkits
such as libvirt, kata-containers (https://github.com/kata-containers)
runv(https://github.com/hyperhq/runv/) or some multiple cooperative
qemu commands directly access to it, manage it, provide features on it.
The hyperhq(http://hyper.sh http://hypercontainer.io/)
introduced the feature vm-template(vm-fast-live-clone)
to the hyper container for several years, it works perfect.
(see hyperhq/runv#297).
The feature vm-template makes the containers(VMs) can
be started in 130ms and save 80M memory for every
container(VM). So that the hyper containers are fast
and high-density as normal containers.
kata-containers project (https://github.com/kata-containers)
which was launched by hyper, intel and friends and which descended
from runv (and clear-container) should have this feature enabled.
Unfortunately, due to the code confliction between runv&cc,
this feature was temporary disabled and it is being brought
back by hyper and intel team.
In current qemu command line, shared memory has
to be configured via memory-object.
a) feature: qemu-local-migration, qemu-live-update
Set the mem-path on the tmpfs and set share=on for it when
start the vm. example:
-object
memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on
-numa node,nodeid=0,cpus=0-7,memdev=mem
when you want to migrate the vm locally (after fixed a security bug
of the qemu-binary, or other reason), you can start a new qemu with
the same command line and -incoming, then you can migrate the
vm from the old qemu to the new qemu with the migration capability
'bypass-shared-memory' set. The migration will migrate the device-state
ONLY, the memory is the origin memory backed by tmpfs file.
b) feature: extremely-fast-save-restore
the same above, but the mem-path is on the persistent file system.
c) feature: vm-template, vm-fast-live-clone
the template vm is started as 1), and paused when the guest reaches
the template point(example: the guest app is ready), then the template
vm is saved. (the qemu process of the template can be killed now, because
we need only the memory and the device state files (in tmpfs)).
Then we can launch one or multiple VMs base on the template vm states,
the new VMs are started without the “share=on”, all the new VMs share
the initial memory from the memory file, they save a lot of memory.
all the new VMs start from the template point, the guest app can go to
work quickly.
The new VM booted from template vm can’t become template again,
if you need this unusual chained-template feature, you can write
a cloneable-tmpfs kernel module for it.
The libvirt toolkit can’t manage vm-template currently, in the
hyperhq/runv, we use qemu wrapper script to do it. I hope someone add
“libvrit managed template” feature to libvirt.
d) feature: yet-another-post-copy-migration
It is a possible feature, no toolkit can do it well now.
Using nbd server/client on the memory file is reluctantly Ok but
inconvenient. A special feature for tmpfs might be needed to
fully complete this feature.
No one need yet another post copy migration method,
but it is possible when some crazy man need it.
Cc: Samuel Ortiz [email protected]
Cc: Sebastien Boeuf [email protected]
Cc: James O. D. Hunt [email protected]
Cc: Xu Wang [email protected]
Cc: Peng Tao [email protected]
Cc: Xiao Guangrong [email protected]
Cc: Xiao Guangrong [email protected]
Signed-off-by: Lai Jiangshan [email protected]