Very slow framebuffer with hyperv_fb on recent Windows hosts, especially in Gen2 VM #655
Comments
Forgot to mention: the Windows 10 version is 18850.1000 and the VM was created as a Hyper-V Gen 2 VM. |
I checked the code of hyperv_fb.c and hv/vmbus_drv.c in the kernel, and could not find issues there on my own. |
Thanks for reporting the issue! Which Linux distribution are you using and are you using the built-in kernel in the distribution ? Is there a .iso from the distribution vendor's website? We'd like to create a VM from the .iso and try to reproduce the issue. Can you please test Gen-1 VM? |
I'm using Manjaro with GNOME, and iso here.
# press Ctrl+Alt+F3 to switch to a new tty and log in
sudo pacman -Sy
sudo pacman -S xf86-video-fbdev # install the fbdev package
# there's no need to restart gdm (and you should not)
# just switch back to tty1 using Ctrl+Alt+F1 and continue |
A Gen-1 VM does not seem to suffer from the issue in Xorg; it uses VESA (vesafb) instead of FBDEV. |
I can reproduce the exact symptoms on the latest Hyper-V build. It looks like something on the host side recently started causing this issue. BTW, the issue cannot be reproduced on an old Windows Server RS2 host. For now, please blacklist the hyperv_fb driver to work around the issue. I'm going to report this issue to the Hyper-V team, but I'm afraid it cannot be resolved soon. |
Any update on this topic? I observe it on the latest Windows 10 Pro (host) with Ubuntu (guest, in fact every version), but only when more than 1 vCPU is enabled for the VM. It affects both Gen 1 and Gen 2 VMs. RDC is not an option for me, and using the standard framebuffers is also a little problematic, so it would be good to see some progress here. |
@marcinwiacek: Can you please share your host version? On the host, please press Win+R and then run "winver.exe", and you should see something like "Version XXXX (OS Build XXXXX.XXX)". The meaning of the version numbers is explained here: https://en.wikipedia.org/wiki/Windows_10_version_history . The slowness was introduced by recent host versions (we know RS1 and RS2 are good, and RS5 and newer are bad). The Hyper-V team has been working on this, but so far a thorough fix is still not available. At the same time, we (the Linux team) are trying to mitigate the slowness by implementing on-demand framebuffer updates. We have some internal patches and are testing them. We have not finalized the patches yet, and the performance improvement may not be very big on recent hosts until the Hyper-V team fixes the hosts. We'll keep the link updated once we make more progress. |
10.0.18362.207. I was thinking about RDC (but, like I said, it is not a very good option) plus using vesafb / uvesafb or any other FB (but no luck with this). If you know any workaround, I will be more than happy to test it. Like I said, the only options are a small resolution or a single CPU core. |
10.0.18362 is 19H1, which has the slowness issue, as I mentioned. It looks like you're saying the FB is not slow when the VM is configured with only 1 virtual CPU? We don't see this. In our tests, the FB in an SMP VM (i.e. more than 1 vCPU) is as slow as that in a 1-vCPU VM when the VM runs on "recent" host builds, including RS5 and 19H1. If you do need a GUI environment in a Linux VM, I suggest you run a VNC server in the VM (which is fast, as it's based on TCP, not Hyper-V VMBus), e.g. https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-vnc-on-ubuntu-18-04. You need a VNC client (e.g. VNC Viewer) to connect to the server. |
I confirm: a machine with one vCPU has hyperv_fb working fast; two or more vCPUs make it slow. Can it be connected with some Spectre/prediction patches? And why do other FBs always work fast? I also suggest a small test: please create a VM with hyperv_fb in an Ubuntu guest under an older host (a Windows version not affected by the bug), then migrate the host to the latest version. I had a VM (unfortunately lost) which was working fine for me in this scenario (I'm about 90% sure now). VNC: if all else fails, I will have to use it (thx). |
I guess Spectre/prediction patches are not related here. I'm not sure what you mean by "other FB work always fast". If you can give the detailed instructions to test different FBs and how you measure "fast", I'll try to reproduce it. Unluckily I don't have a host with exactly the same host build version, and I don't have a host that can upgrade from RS1 (or RS2) to RS5(or 19H1), but I think my build should produce the same result as yours, when we test the access speed of the framebuffers. However, I can not reproduce your symptoms and I can not understand your symptoms (e.g. the FB is not slow with 1 CPU). In my tests on recent buggy hosts, in a Gen1 VM, the legacy FB and Hyper-V synthetic FB are both slow; in a Gen2 VM, the legacy UEFI FB is fast, but the Hyper-V synthetic FB is slow. |
blacklist the hyperv_fb and you would fallback to efifb (if UEFI enabled in guest), then you might not suffer from the performance issue.
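A minimal sketch of the blacklisting step, assuming a distribution that reads module options from /etc/modprobe.d. The file name `blacklist-hyperv_fb.conf` and the helper name `blacklist_hyperv_fb` are arbitrary choices for illustration, not something from this thread. If the module is loaded from the initramfs, the initramfs probably has to be regenerated too, and that command is distribution-specific.

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe blacklist entry for hyperv_fb.
# The optional first argument is the target directory
# (default: /etc/modprobe.d, which needs root to write to).
blacklist_hyperv_fb() {
    dir="${1:-/etc/modprobe.d}"
    printf 'blacklist hyperv_fb\n' > "$dir/blacklist-hyperv_fb.conf"
}

# Usage (as root), then rebuild the initramfs and reboot:
#   blacklist_hyperv_fb
#   update-initramfs -u      # Debian/Ubuntu
#   mkinitcpio -P            # Arch/Manjaro
```

After the reboot, "dmesg | grep hyperv_fb" should output nothing if the driver is no longer loaded.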
|
thx, I have checked vesafb, uvesafb and efifb, and all of them are fast, but I don't have a custom resolution (or at least a big one). hyperv_fb works fine with 1 CPU only (with more it is slow). xrandr doesn't work. Honestly speaking, I don't understand why it's so difficult: is it possible to create a custom BIOS/UEFI for guests with a very big max resolution or many big custom VESA modes? I guess it would help. For example 1900x900, 1920x1020, 1440x900, etc. Just predefine them please, and it would be good to be able to set them using "vga=mode" in the kernel options. |
Ubuntu 19.04 guest, many CPUs, hyperv_fb, no integration services in VM settings, Gen 2 VM, processor compatibility checked in VM settings, default NUMA, secure boot, no dynamic memory: works fine.
Critical: checking the option in "Hardware\Processor\Compatibility".
Mission complete. |
As I mentioned, the FB's performance can be quite different, depending on the configuration: I really don't think the FB performance should be affected by:
When we say the FB is slow or fast, we'd like to measure how fast/slow it is with some tool.
wget 'https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/CREDITS?h=v5.0' -O credits.txt
In this way, we can know exactly how fast/slow the FB is in a given VM when we try different scenarios.
I'm setting up a Gen1 Ubuntu 19.04 VM on a 18362.175 host, and will report some numbers later. |
On a recent host (host OS build: 18362.175), I installed a Gen-2 Ubuntu 19.04 VM (Desktop version). The CPU is "Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz". The test "time cat credits.txt" in a text mode terminal takes 28 seconds (slow!). Note: here, in Xorg GUI mode or in a text mode terminal, the same Hyper-V synthetic framebuffer device is used. That's why we can use the test "time cat credits.txt" to measure the FB device's performance.
Next, after I blacklist the hyperv_fb driver, the Hyper-V synthetic framebuffer is not used ("dmesg | grep hyperv_fb" outputs nothing), and only the legacy UEFI FB device is used ("dmesg" contains "efifb: mode is 1024x768x32, linelength=4096, pages=1"). I did the test "time cat credits.txt" again and now it only takes 1.3 seconds (fast). If I change to 1 CPU and enable "Hardware\Processor\Compatibility\Migration to a physical computer with a different processor version", the result is still about 1.3 seconds.
This is what I meant by saying "in a Gen2 VM, the legacy UEFI FB is fast, but the Hyper-V synthetic FB is slow", and I don't think the number of virtual CPUs or enabling the "Hardware\Processor\Compatibility\Migration..." option should make a difference. If you're seeing something different, we'd like to have the details, just as I provided. Note: here I don't test a Gen-1 VM. In a Gen1 VM, the legacy PCI FB device and the Hyper-V synthetic FB device are both slow. |
I also did the "time cat credits.txt" test in a Gen1 Ubuntu 19.04 VM on the same host (host OS build: 18362.175). By default, with the Hyper-V FB device, the test also takes 28 seconds; if I blacklist the hyperv_fb driver, the test takes 21 seconds. Again, the number of virtual CPUs (1 vs. 4) or enabling the "Hardware\Processor\Compatibility\Migration..." option makes NO difference. |
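The "time cat credits.txt" measurement used in the comments above can be wrapped in a small helper. This is only a sketch: the name `fb_cat_bench` is hypothetical, and it relies on `date +%s`, so the resolution is whole seconds (enough to tell 28 s from about 1 s, not for fine comparisons). It must be run on the tty or terminal being measured, with stdout not redirected, otherwise the framebuffer is not exercised.

```shell
#!/bin/sh
# Hypothetical helper: print a file to the current terminal and report the
# elapsed wall-clock time (whole seconds) on stderr, so the timing line is
# kept separate from the file contents.
fb_cat_bench() {
    file="$1"
    start=$(date +%s)
    cat "$file"
    end=$(date +%s)
    echo "fb_cat_bench: $((end - start)) s for $file" >&2
}

# Usage on the tty under test:
#   fb_cat_bench credits.txt
```

Running "dmesg | grep -E 'hyperv_fb|efifb'" beforehand shows which framebuffer driver the number belongs to.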
6 cores, disabled compatibility option, enabled hyperv_fb: 1.55.048 (almost 2 minutes!)
1 core, disabled compatibility option, enabled hyperv_fb: 0.4.522
6 cores, enabled compatibility option, enabled hyperv_fb: 0.3.337 |
@marcinwiacek Thanks for sharing the perf numbers! I suppose your host version is 10.0.18362.207, and the VM here is a Gen2 VM? Can you share your VM's "cat /proc/cpuinfo"? Your test #1 vs. #3: it looks like "disabled compatibility option" makes the FB extremely slow. |
1 core, enabled compatibility option, enabled hyperv_fb: 0.5.940
Numbers make sense:
Compatibility is additionally resolving hyperv_fb problem. Intel i7-6820HQ, you're right about versions |
It looks like your theory cannot explain why both "1 core, enabled compatibility option" and "1 core, disabled compatibility option" are fast:
Unluckily I cannot repro the same symptom with my host (18362.175, which should be very similar to yours) and CPU (i7-7600U, which is a little newer). I'll keep an eye on this symptom and try to repro it if I can find a HW/SW setup that's more similar to yours. |
Excluding the bug which we're tracking, everything looks very sensible. It looks like the bug is in multi-CPU support, and the compatibility option disables something that causes the problem. I won't be surprised if the code for detecting CPU features is buggy somewhere.
6th gen vs 7th gen: they can be very different (different microcode, different graphics card and therefore drivers, etc.) |
I have the same issue, on AMD.
|
I am seeing this same issue on Server 2019 with a SLES 12 Gen 1 virtual machine. It's very interesting that if I have a GNOME terminal open full screen and run top, Xorg runs at 99% and the frame buffering weirdness happens, but if I shrink that terminal down to less than 25% of the screen, Xorg drops to less than 10% CPU and the buffering stops. This is on the latest 2019 build with SLES 12 patched up to current. |
@jaredheath : this should be the known host issue I mentioned previously. The Hyper-V team has not fixed the issue for Server 2019, but we (the Linux team) made 2 patches to the Hyper-V framebuffer driver in the Linux VM so the issue can be effectively mitigated: [v4] video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host: https://patchwork.kernel.org/patch/11132483/ [PATCH,v6] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver: https://patchwork.kernel.org/patch/11149671/ The patches will be in the mainline Linux kernel git repository soon, and will eventually propagate into new versions of various Linux distributions (which could take months or longer). If you'd like to use the 2 patches now, you need to build & install a kernel with the 2 patches applied. |
yeah, that won't be an option in our environment. I guess we have to defer deployment. Time to alert management. Thanks for the reply. |
Is blacklisting hyperv_fb an option? #655 (comment) |
@jaredheath : As @jimbo1qaz reminded, blacklisting the hyperv_fb driver may be an option to you, especially when you use a Generation 2 VM. |
The issue is still there even in mainline.
For me the issue is present for both Gen1 and Gen2 VMs.
Blacklisting Rolling a guest VM back to |
IMO v5.5.8 and v5.6-rc5 should have the same framebuffer performance, because "git diff v5.5.8 v5.6-rc5 -- drivers/video/fbdev/hyperv_fb.c" returns nothing, except for a 2-line patch that is only used in the VM hibernation scenario. For a Gen2 VM, yes, the Hyper-V synthetic framebuffer is still slow, even with Wei's 3 recent hyperv_fb patches. We'll ping the Hyper-V team for the (n+1)'th time... The workaround is to blacklist hyperv_fb and use the efifb framebuffer, which is typically fast enough. For a Gen1 VM, the Hyper-V synthetic framebuffer is also still slow, but the slowness can be effectively mitigated by Wei's third patch "video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.": please remember to add a proper kernel parameter for "cma=", e.g. cma=130m (see the changelog of the patch).
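As an illustration of where the "cma=" parameter might go on a GRUB-based guest, here is a fragment of /etc/default/grub. The "quiet splash" part is an assumption (it is the usual Ubuntu default), so keep whatever the file already contains and only append cma=130m; the right size for a given resolution is described in the changelog of the patch, not here.

```shell
# /etc/default/grub (fragment). "quiet splash" is assumed, not taken from
# the thread; append cma=130m to the existing value instead of replacing it.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash cma=130m"
```

On Ubuntu the change typically takes effect after "sudo update-grub" and a reboot, and "cat /proc/cmdline" should then show cma=130m.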
I can not understand. v5.2.13 should also suffer from the slow Hyper-V synthetic framebuffer issue. v5.2.13 doesn't have the 3 patches from Wei, so the slowness can not be mitigated by the cma=130m hack for Gen-1. I would let @whu2014 provide his insights. |
I'm sorry for the confusion. I did not specify that I tested this rollback only on Gen2 VMs. Corrected my previous comment. |
I had the same issue on my hyper-v 2019 server and checking the "Processor compatibility" box helps |
I still can't understand why checking the "Processor compatibility" box would make a difference and I can't reproduce the same symptom with my test VMs. :-( @shubell: can you please share your host version (run "winver.exe" on the host), your Linux kernel version (run "uname -a") ? Is your Linux VM a Generation 1 VM or Generation 2? |
Well, the issue started when I migrated from Hyper-V 2016 to 2019 (version 1809, OS build 17763.1282). I have 3 Ubuntu 16.04 LTS Gen1 VMs ( 4.15.0-112-generic #113~16.04.1-Ubuntu ) and the same issue was on all of them. Compatibility fixes the FB issue on all. It could also be an issue with some CPUs. Mine is 2x Xeon Gold 5217. |
We observed that the problem with the "slow framebuffer" can trigger in one
of the 2 flavors, depending on your hardware/CPU:
1. The framebuffer is so slow, that scrolling is a slide show and the whole
system is completely unusable. This happens on most of the PCs/laptops. The
"processor compatibility" checkbox brings a huge performance improvement in
such a case and reduces the problem to (2)
2. The framebuffer is slow (noticeably slower than on Windows versions
where the bug was not there), but much faster than in (1) and the system is
usable in general, even though scrolling is not as smooth as it should be.
This happens on some PCs/laptop out-of-the-box, on other machines one needs
to activate the "processor compatibility" checkbox
We also observed that Wei's Hyper-V patches don't bring that much
improvement in scenario (2) for us. They do make the VM a bit more
responsive, but one also observes more screen tearing in such a case. So
the overall user experience is a bit worse than without the patches applied.
|
I'm very surprised. On Jul 6, 2019 and earlier I had given a concrete Windows version and a concrete CPU model. It looks like the MS team is checking this with other CPUs... and this is most probably why reproduction fails ("processor compatibility" seems to do the trick, and I understand that the issue is connected with some specific CPU features). And now some very open questions: how many $ are required to find/borrow/buy a CPU with the same features? And why does it take more than a year? And you know what? I even found the answer in 5 minutes: a used laptop with this CPU costs ca. 1300 USD (and this is really an edge case). PS. I have not been using this product for a long time; I just moved to other solutions and saw the GH notification today. |
Thanks shubell and mikov for sharing more info! Hi marcinwiacek, I'm as frustrated as you on this bug... I know there must be a Hyper-V bug, and I have pinged Hyper-V team many times... Here I was just trying to collect more information, which can help to resolve the issue thoroughly. I'll ping Hyper-V team again. |
Hehe, I was just kinda surprised that no one before noticed that the
"compatibility feature" helped.
|
To be honest, I don't know if this is your initiative (because you're an engineer at heart and want to fix it) or some task from your manager. I know that lack of support makes people look for alternatives (and they don't return when those are better). They need solutions, not explanations or excuses. From my side, it's not important what you will do (and I'm far from being frustrated). Google, Intel, Microsoft... People can use many other things today. Good luck! PS. Right now this story is more than one year old, and this bug is still young when you compare it to some bug from Google (where I was somehow involved and which was submitted on Mar 16, 2012). |
Today I happened to find a server with the same host version "version 1809 OS build 17763.1282", so I did some quick tests with a newly-created Generation-1 Ubuntu 16.04 VM (4.15.0-112-generic) and the CPU type is:
cpu family : 6
I think I can reproduce shubell's observation: the VM's Xorg desktop window is not very responsive, e.g. after I right-click the desktop, the context menu pops up after nearly 1 second. Later, after I check "Hardware\Processor\Compatibility\Migration to a physical computer with a different processor version", the Xorg desktop becomes much more responsive, but I can still perceive that it's not 100% normal as it's supposed to be. I just reported the finding to the Hyper-V team. BTW, I'm from the Linux team and we have no control over Hyper-V. According to my tests, there is indeed a workaround if you can use a Generation-2 VM: somehow the legacy EFI FB driver is not affected, so in a Generation-2 VM, we can blacklist the hyperv_fb driver and use the efifb driver, which is fast. |
This will also require a change of a boot manager (e.g. rEFInd) as setting a custom resolution for |
So you are running a slow VM and some fast VMs with older Linux kernels on the same host at the same time, and hence you think the slow VM's kernel (which is newer) introduces the slowness? If so, can you please share the host version info (please run "winver.exe" on the host) and the kernel version info of the slow VM and the fast VMs (please run "uname -a")? Also please clarify whether it's a Gen1 or Gen2 VM, whether you check "Hardware\Processor\Compatibility\Migration to a physical computer with a different processor version", and, when you feel the slowness, whether you are using a text mode tty terminal or an Xorg window. I'm pretty sure this slowness is introduced by the host, not the guest, so I'm asking for the above info just in case the guest somehow makes the slowness worse in recent Linux kernels. |
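The guest-side details requested above can be gathered in one go. The following is a rough sketch: the function name `collect_fb_info` and the section headings are made up for illustration. The host version ("winver.exe"), the VM generation and the compatibility checkbox state still have to be reported separately from the host side.

```shell
#!/bin/sh
# Hypothetical helper: collect the guest-side information asked for in the
# thread (kernel version, CPU model, framebuffer devices and driver messages).
collect_fb_info() {
    echo "== kernel =="
    uname -a
    echo "== cpu =="
    grep -m1 'model name' /proc/cpuinfo 2>/dev/null || echo "(no model name found)"
    echo "== framebuffer devices =="
    cat /proc/fb 2>/dev/null || echo "(no /proc/fb)"
    echo "== framebuffer driver messages =="
    # dmesg may be restricted to root on some systems.
    dmesg 2>/dev/null | grep -E 'hyperv_fb|efifb|vesafb' \
        || echo "(none found, or dmesg needs root)"
}

# Usage:
#   collect_fb_info > fb-info.txt
```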
@pavel : Thanks for the detailed report! We'll take a look at the difference between 5.3.8 and 5.8.10. |
I did some tests today in a similar environment with the same host build version, the same VM kernel versions, the same resolution 1920x1080, and I don't check "Migrate to physical computer with a different processor version". The environmental differences are: I'm using a Gen2 Ubuntu 20.04 (rather than Arch Linux) VM and I built the 5.3.8 and 5.8.10 kernels from the upstream stable kernel git repo. IMO these differences should not matter. In my test, with v5.8.10 the framebuffer is faster than with v5.3.8: 1) in the Xorg GUI environment it looks like the framebuffer is only slightly faster in v5.8.10 than in v5.3.8; 2) in the case of a text mode terminal (I used tty3 in my test), I run "wget 'https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/CREDITS?h=v5.9-rc7' -O test.txt
@pavel: Can you also do the "cat" test with the same test.txt file against v5.3.8 vs. v5.8.10? With both kernels, the mouse movement is basically normal to me, and I don't experience any noticeable sluggishness. Before the Hyper-V team fixes the slow framebuffer, for a Gen2 VM, the only easy workaround is to blacklist the hyperv_fb driver and use the legacy UEFI framebuffer (which happens to be fast) -- I understand the drawback is that it seems impossible (?) to use a larger resolution. Another possible workaround is to use a VNC server (we need to run a VNC viewer to connect to the VM via the network) or xrdp (This is integrated with Hyper-V Manager: see https://docs.microsoft.com/en-us/virtualization/community/team-blog/2018/20180228-sneak-peek-taking-a-spin-with-enhanced-linux-vms and microsoft/linux-vm-tools#106 (comment). I'm not sure how easy it is to make this work for Arch Linux. The links I shared are mainly for Ubuntu.) |
Thanks, @pavel ! So your result is the same as mine. Now I understood your earlier description:
vm1 actually is not so "normal" as it takes too long (about 40 seconds) to print the text file to tty (vm2 only needs about 3 seconds). vm2 actually has faster rendering than vm1, though the contents of vm2's screen become unrecognizable in tty. Not sure why in Xorg the mouse movement and animation are not sluggish to me -- it looks there is a little slowness, but it's not noticeable to me. PS, I hate to say this, but we still have no update from Hyper-V team about the slow framebuffer issue. :-( |
Got a 3rd VM up.
No issues in either Xorg or tty. 😕 |
I am experiencing similar issues with Debian 10.2 - Looking forward to hearing from the hyper-v team. |
It turns out to be a Linux bug that only happens when the VM runs on recent Hyper-V since sometime in 2018. I just posted a patch here: https://lkml.org/lkml/2020/11/17/2222 . Hopefully the fix will be in v5.10 and will be integrated into various Linux distros. |
BTW, in a Gen-1 VM on recent Hyper-V since 2018, the legacy VRAM is also mapped uncacheable by default, so I can also perceive the slowness before the Hyper-V synthetic framebuffer driver "hyperv_fb" loads. To work around that slowness, we can use this kernel parameter "video=vesafb:mtrr:3", which tells the legacy framebuffer driver "vesafb" to map the legacy VRAM cacheable. |
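After adding a parameter such as "video=vesafb:mtrr:3" to the boot loader configuration, it can be worth confirming that the running kernel actually received it. A small sketch follows; the helper name `has_kernel_param` is hypothetical, and it only checks the command line, not whether vesafb honoured the option.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given parameter appears as a whole
# word in a kernel command line. The second argument is optional and
# defaults to the running kernel's /proc/cmdline.
has_kernel_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_kernel_param 'video=vesafb:mtrr:3' && echo "parameter is active"
```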
Hi dcui, If I need to set the screen resolution to 1920x1080, are there any other parameters needed in the above video statement ? |
@tjleary75 To set the resolution to 1600x1200 in my Ubuntu 20.04.1 VM, I have the below 2 lines in my /etc/default/grub: |
No issues in my updated |
Before, on Ubuntu Server console, "time cat 4mbfile.txt" would be 4+ minutes. Now, after Somewhat related to this, found out the hard way that using Set-VMVideo is a must to get higher resolutions, wasted considerable time testing different video:vesa/uvesa options and wondering why nothing has any effect. |
FYI: For Ubuntu 20.04, as I just checked, the latest linux-azure kernel Ubuntu-azure-5.4.0-1039.41 (Jan 18) still does not have the fix, but the generic 5.4 kernel Ubuntu-5.4.0-66.74 and the HWE kernel Ubuntu-hwe-5.8-5.8.0-44.50_20.04.1 already have the fix. |
I'm using manjaro gnome3. With linux version 4.19.
The hyperv_fb is much slower than default efifb. (you could see the buffer is rendered line by line when scrolling with hyperv_fb, but could hardly notice when using )
the mode of hyperv_fb is U:1152x864p-0 (8192kB)
after blacklisting hyperv_fb, it uses the hardware EFI VGA (according to /var/log/Xorg.0.log)
the mode of efifb is U:1024x768p-75 (3072kB)
I'm not sure what's wrong with hyperv_fb, and the situation gets even worse when setting the resolution with "video=hyperv_fb:1920x1080". I haven't found a way to set the mode of efifb to 1920x1080, so I haven't tested that.