This repository has been archived by the owner on Sep 28, 2024. It is now read-only.

Very slow framebuffer with hyperv_fb on recent Windows hosts, especially in Gen2 VM #655

Open
clouds56 opened this issue Mar 10, 2019 · 70 comments

@clouds56

I'm using Manjaro GNOME 3 with Linux kernel 4.19.
hyperv_fb is much slower than the default efifb. (With hyperv_fb you can see the buffer being rendered line by line when scrolling, but this is hardly noticeable with efifb.)
The mode of hyperv_fb is U:1152x864p-0 (8192kB).
After blacklisting hyperv_fb, Xorg uses the EFI VGA framebuffer (according to /var/log/Xorg.0.log).
The mode of efifb is U:1024x768p-75 (3072kB).

I'm not sure what's wrong with hyperv_fb, and the situation gets even worse when I set the resolution with "video=hyperv_fb:1920x1080". I haven't found a way to set the efifb mode to 1920x1080, so I haven't tested that.
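The two mode strings above report quite different memory sizes. A quick way to see what a single frame actually needs at each resolution is width x height x bytes per pixel. This is a minimal sketch assuming 32 bits per pixel (the depth efifb reports later in this thread, "1024x768x32, linelength=4096"); the helper names are hypothetical.

```python
# Bytes needed for one full frame: width * height * (bits per pixel / 8).
# Assumption: 32 bits per pixel, matching the efifb mode line quoted in this thread.

def fb_bytes(width: int, height: int, bpp: int = 32) -> int:
    """Size in bytes of a single frame at the given mode."""
    return width * height * (bpp // 8)


def fb_kib(width: int, height: int, bpp: int = 32) -> int:
    """Size in KiB (1024 bytes) of a single frame at the given mode."""
    return fb_bytes(width, height, bpp) // 1024


for w, h in [(1024, 768), (1152, 864), (1920, 1080)]:
    print(f"{w}x{h}: {fb_kib(w, h)} KiB per frame")
```

It prints 3072, 3888 and 8100 KiB. The efifb figure matches the 3072kB above exactly, while hyperv_fb's 8192kB is larger than the 3888 KiB a 1152x864 frame needs, so that number probably describes the memory the driver reserved rather than the visible mode (an inference, not something stated in the thread). A 1920x1080 frame is about twice the size of a 1152x864 one, which is consistent with, but does not prove, the slowdown getting worse at that resolution.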

@clouds56
Author

Forgot to mention: the Windows 10 version is 18850.1000, and the VM was created as a Hyper-V Gen 2 VM.

@clouds56
Author

I checked hyperv_fb.c and hv/vmbus_drv.c in the kernel, and could not find any issues there on my own.
Could it be caused by a change in some part of Hyper-V (maybe related to the RemoteFX deprecation)?

@dcui
Contributor

dcui commented Mar 11, 2019

Thanks for reporting the issue! Which Linux distribution are you using, and are you using the distribution's built-in kernel? Is there an .iso on the distribution vendor's website? We'd like to create a VM from the .iso and try to reproduce the issue.

Can you please test Gen-1 VM?

@clouds56
Author

clouds56 commented Mar 11, 2019

I'm using Manjaro with GNOME, and the ISO is here.
When using a Gen-2 VM, please follow these instructions before installing:

# press Ctrl+Alt+F3 to switch to a new tty and login
sudo pacman -Sy
sudo pacman -S xf86-video-fbdev # install the fbdev package
# there's no need to restart gdm (and you should not)
# just switch back to tty1 using Ctrl+Alt+F1 and continue

@clouds56
Author

The Gen-1 VM does not seem to suffer from the issue in Xorg; it uses VESA (vesafb) instead of FBDEV.
When I switch to a tty in the Gen-1 VM (which switches to hyperv_fb), scrolling in less is still slow.

@dcui
Contributor

dcui commented Mar 12, 2019

I can reproduce the exact symptoms on the latest Hyper-V build. It looks like something that recently changed on the host side causes this issue. BTW, the issue does not reproduce on an old Windows Server RS2 host.

For now, please blacklist the hyperv_fb driver to work around the issue.

I'm going to report this issue to the Hyper-V team, but I'm afraid it may not be resolved soon.

@dcui dcui changed the title Performance issue in linux 4.19 Very slow framebuffer with hyperv_fb on recent Windows hosts, especially in Gen2 VM Mar 12, 2019
@marcinwiacek

marcinwiacek commented Jul 2, 2019

Any update on this topic? I observe it on the latest Windows 10 Pro (host) with Ubuntu (guest, in fact every version), but only when I enable more than 1 vCPU for the VM. Gen 1 and Gen 2 VMs.

RDC is not an option for me, and using the standard framebuffers is a little problematic; it would be good to see some progress here.

@dcui
Contributor

dcui commented Jul 3, 2019

@marcinwiacek: Can you please share your host version? On the host, please press Win+R and then run "winver.exe", and you should see something like "Version XXXX (OS Build XXXXX.XXX)" . The meaning of the version numbers is explained here: https://en.wikipedia.org/wiki/Windows_10_version_history .

The slowness is introduced by recent host versions (we know RS1 and RS2 are good, and RS5 and newer are bad). Hyper-V team has been working on this, but so far a thorough fix is still not available.

At the same time, we (the Linux team) are trying to mitigate the slowness by implementing on-demand framebuffer updates. We have some internal patches that we are testing. They are not finalized yet, and until the Hyper-V team fixes the hosts, the performance improvement on recent hosts may not be very big.

We'll keep this thread updated as we make more progress.

@marcinwiacek

marcinwiacek commented Jul 3, 2019

10.0.18362.207

I was thinking about RDC (but, as I said, it's not a very good option) plus vesafb / uvesafb or any other FB (but no luck with this). If you know any workaround, I will be more than happy to test it; as I said, the only options are a small resolution or a single CPU core.

@dcui
Contributor

dcui commented Jul 3, 2019

10.0.18362 is 19H1, which has the slowness issue, as I mentioned.

It looks like you're saying the FB is not slow when the VM is configured with only 1 virtual CPU? We don't see this. In our tests, the FB in an SMP VM (i.e. more than 1 vCPU) is as slow as that in a 1-vCPU VM when the VM runs on "recent" host builds, including RS5 and 19H1.

If you do need a GUI environment in a Linux VM, I suggest running a VNC server in the VM (which is fast, as it's based on TCP, not Hyper-V VMBus), e.g. https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-vnc-on-ubuntu-18-04. You need a VNC client (e.g. VNC Viewer) to connect to the server.

@marcinwiacek

marcinwiacek commented Jul 3, 2019

I confirm: a machine with one vCPU has hyperv_fb working fast; two or more vCPUs make it slow. Could it be connected with some Spectre/prediction patches? And why do the other FBs always work fast?

I also suggest a small test: please create a VM with hyperv_fb in the Ubuntu guest on an older host (a Windows version not affected by the bug), then upgrade the host to the latest version. I had a VM (unfortunately lost) that was working fine for me in this scenario (I'm about 90% sure now).

VNC: if all else fails, I will have to use it (thanks).

@dcui
Contributor

dcui commented Jul 4, 2019

I guess Spectre/prediction patches are not related here.

I'm not sure what you mean by "other FB work always fast". If you can give the detailed instructions to test different FBs and how you measure "fast", I'll try to reproduce it.

Unfortunately I don't have a host with exactly the same build version, and I don't have a host that can be upgraded from RS1 (or RS2) to RS5 (or 19H1), but I think my build should produce the same result as yours when we test the access speed of the framebuffers. However, I cannot reproduce your symptoms, and I don't understand them (e.g. the FB not being slow with 1 CPU).

In my tests on recent buggy hosts, in a Gen1 VM, the legacy FB and Hyper-V synthetic FB are both slow; in a Gen2 VM, the legacy UEFI FB is fast, but the Hyper-V synthetic FB is slow.

@clouds56
Author

clouds56 commented Jul 4, 2019

Blacklist hyperv_fb and you will fall back to efifb (if UEFI is enabled in the guest); then you might not suffer from the performance issue.

# /etc/modprobe.d/blacklist.conf
blacklist hyperv_fb
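Before rebooting, it can help to confirm that the file really blacklists the module. A minimal sketch, assuming only plain "blacklist <module>" lines matter (other modprobe.d directives such as options or install are ignored here); the function name is hypothetical.

```python
def blacklisted_modules(config_text: str) -> set:
    """Return module names named by "blacklist <module>" lines in a modprobe.d file.

    Simplified sketch: other directives (options, install, alias, ...) are ignored.
    """
    names = set()
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "blacklist":
            # modprobe treats '-' and '_' in module names as equivalent
            names.add(parts[1].replace("-", "_"))
    return names


CONFIG = """\
# /etc/modprobe.d/blacklist.conf
blacklist hyperv_fb
"""

print("hyperv_fb" in blacklisted_modules(CONFIG))  # True
```

If hyperv_fb is packed into the initramfs, the initramfs may also need to be regenerated before the blacklist takes effect at boot (e.g. `mkinitcpio -P` on Arch/Manjaro or `update-initramfs -u` on Ubuntu).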

@marcinwiacek

marcinwiacek commented Jul 4, 2019

Thanks. I have checked vesafb, uvesafb and efifb, and all of them are fast, but I can't use a custom resolution (or at least a big one). hyperv_fb works fine with 1 CPU only (with more it is slow). xrandr doesn't work.

Honestly speaking, I don't understand why it's so difficult. Is it possible to create a custom BIOS/UEFI for guests with a very big max resolution, or with many big custom VESA modes? I guess it would help.

For example 1900x900, 1920x1020, 1440x900, etc.

Just predefine them, please; it would also be good to be able to select them using "vga=mode" in the kernel options.

@marcinwiacek

marcinwiacek commented Jul 4, 2019

Ubuntu 19.04 guest, many CPUs, hyperv_fb, no integration services in VM settings, VM Gen 2, processor compatibility checked in VM settings, default NUMA, Secure Boot, no dynamic memory: works fine.

Critical: checking the option in "hardware\processor\compatibility".

Mission complete.

@dcui
Contributor

dcui commented Jul 5, 2019

As I mentioned, the FB's performance can be quite different, depending on the configuration:
Gen1 VM vs. Gen2 VM?
Legacy FBs (PCI FB or UEFI FB) or Hyper-V synthetic FB (hyperv_fb)?
Old good hosts vs. new buggy hosts?
How big is the resolution of the legacy FB ("dmesg | grep fb" should contain the info) or of the Hyper-V synthetic FB (the default 1152x864 or a different one)?

I really don't think the FB performance should be affected by:

  1. the number of VM CPUs;
  2. whether we enable or disable the Integration Services;
  3. whether we check the option in "hardware\processor\compatibility".

When we say the FB is slow or fast, we'd like to know how fast/slow it is by some tool.
I usually do a simple test: in the VM, press Ctrl+Alt+F3 to enter the text mode terminal, and run:

wget 'https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/CREDITS?h=v5.0' -O credits.txt
time cat credits.txt

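When the same measurement has to be repeated across many configurations (vCPU count, legacy FB vs. hyperv_fb, compatibility option), it can be scripted. A minimal sketch of the same `time cat` test, assuming `cat` is on PATH and the script is started from the text-mode console being measured; the helper names are hypothetical.

```python
import subprocess
import time


def time_cat(path: str, stdout=None) -> float:
    """Run `cat path` and return the elapsed wall-clock time in seconds.

    With stdout=None the child inherits this process's stdout, so when this is
    run from a text-mode console (e.g. tty3) the text is really drawn by the
    framebuffer console, which is what the test is meant to measure.
    """
    start = time.perf_counter()
    subprocess.run(["cat", path], stdout=stdout, check=True)
    return time.perf_counter() - start


def median_time(path: str, runs: int = 3) -> float:
    """Repeat the measurement and return the middle value (use an odd `runs`)."""
    samples = sorted(time_cat(path) for _ in range(runs))
    return samples[len(samples) // 2]
```

For example, `median_time("credits.txt")` from tty3 corresponds to the `real` figure of `time cat credits.txt`. Redirecting stdout elsewhere (e.g. to `subprocess.DEVNULL`) would bypass the framebuffer and measure nothing useful.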
In this way, we can know exactly how fast or slow the FB is in a given VM when we try different scenarios: 1 CPU vs. more, legacy FB vs. Hyper-V synthetic FB, etc.

I'm setting up a Gen1 Ubuntu 19.04 VM on a 18362.175 host, and will report some numbers later.

@dcui
Contributor

dcui commented Jul 5, 2019

On a recent host (host OS build: 18362.175), I installed a Gen-2 Ubuntu 19.04 VM (Desktop version).

The CPU is "Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz".
The VM has 4 virtual CPUs and by default it uses Hyper-V FB device ("dmesg" shows "hyperv_fb: Screen resolution: 1152x864, Color depth: 3").

The test "time cat credits.txt" with text mode terminal takes 28 seconds (slow!).
When I configure 1 virtual CPU and/or enable the option "Hardware\Processor\Compatibility\Migration to a physical computer with a different processor version", the test still takes 28 seconds -- no difference at all.

Note: here, in Xorg GUI mode or text mode terminal, the same Hyper-V synthetic framebuffer device is used. That's why we can use the test "time cat credits.txt" to measure the FB device's performance.

Next, after I blacklist the hyperv_fb driver, Hyper-V synthetic framebuffer is not used ("dmesg | grep hyperv_fb" outputs nothing), and only the legacy UEFI FB device is used ("dmesg" contains "efifb: mode is 1024x768x32, linelength=4096, pages=1"). I did the test "time cat credits.txt" again and now it only takes 1.3 seconds (fast). If I change to 1 CPU and enable "Hardware\Processor\Compatibility\Migration to a physical computer with a different processor version", the result is still about 1.3 seconds.
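Which framebuffer driver is active can be read from the dmesg lines quoted above. A minimal sketch that scans dmesg text for those two lines and checks that efifb's linelength equals width x bytes per pixel; the regular expressions assume the exact wording quoted in this thread, which may differ between kernel versions, and the sample timestamp is made up.

```python
import re

# Patterns for the two dmesg lines quoted in this thread (wording assumed stable).
EFIFB_RE = re.compile(r"efifb: mode is (\d+)x(\d+)x(\d+), linelength=(\d+)")
HYPERV_FB_RE = re.compile(r"hyperv_fb: Screen resolution: (\d+)x(\d+)")


def framebuffer_info(dmesg_text: str) -> dict:
    """Return {driver: mode details} for the efifb / hyperv_fb lines found."""
    info = {}
    for line in dmesg_text.splitlines():
        m = EFIFB_RE.search(line)
        if m:
            width, height, bpp, linelength = map(int, m.groups())
            info["efifb"] = {"width": width, "height": height,
                             "bpp": bpp, "linelength": linelength}
        m = HYPERV_FB_RE.search(line)
        if m:
            width, height = map(int, m.groups())
            info["hyperv_fb"] = {"width": width, "height": height}
    return info


SAMPLE = "[    0.512345] efifb: mode is 1024x768x32, linelength=4096, pages=1\n"
fbs = framebuffer_info(SAMPLE)
print(sorted(fbs))  # ['efifb']
# One scanline is width * bytes per pixel: 1024 * 32 / 8 = 4096
print(fbs["efifb"]["linelength"] == fbs["efifb"]["width"] * fbs["efifb"]["bpp"] // 8)  # True
```

On a live system the input could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, although some distributions restrict dmesg to root.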

This is what I meant by saying "in a Gen2 VM, the legacy UEFI FB is fast, but the Hyper-V synthetic FB is slow.", and I don't think the number of virtual CPUs or enabling the "Hardware\Processor\Compatibility\Migration..." option should make a difference. If you're seeing something different, we'd like to have the details, just as I provided.

Note: here I don't test Gen-1 VM. In a Gen1 VM, the legacy PCI FB device and Hyper-V synthetic FB device are both slow.

@dcui
Contributor

dcui commented Jul 5, 2019

I also did the "time cat credits.txt" test in a Gen1 Ubuntu 19.04 VM on the same host (host OS build: 18362.175). By default, with the Hyper-V FB device, the test also takes 28 seconds; if I blacklist the hyperv_fb driver, the test takes 21 seconds.

Again, the number of virtual CPUs (1 vs. 4) or enabling the "Hardware\Processor\Compatibility\Migration..." option makes NO difference.

@marcinwiacek

marcinwiacek commented Jul 6, 2019

time cat my_big_file

6 cores, compatibility option disabled, hyperv_fb enabled:

real 1m55.048s (almost 2 minutes!)
user 0m0.004s
sys 0m0.322s

1 core, compatibility option disabled, hyperv_fb enabled:

real 0m4.522s
user 0m0.000s
sys 0m0.126s

6 cores, compatibility option enabled, hyperv_fb enabled:

real 0m3.337s
user 0m0.000s
sys 0m0.313s
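To compare these runs directly, the durations can be converted to seconds and expressed relative to the fastest run. A minimal sketch, assuming the first figure of each run above is bash's `real` time (1m55.048s, i.e. "almost 2 minutes", for the slowest one); the helper name is hypothetical.

```python
import re

# bash's `time` keyword prints durations such as "1m55.048s".
_DURATION_RE = re.compile(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def to_seconds(text: str) -> float:
    """Convert a bash `time` duration like '1m55.048s' to seconds."""
    m = _DURATION_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(m.group(1) or 0)
    return minutes * 60 + float(m.group(2))


# "real" times of the three runs above, read as bash `time` output (assumption).
runs = {
    "6 vCPUs, compatibility off": "1m55.048s",
    "1 vCPU, compatibility off": "0m4.522s",
    "6 vCPUs, compatibility on": "0m3.337s",
}
fastest = min(to_seconds(t) for t in runs.values())
for label, real in runs.items():
    secs = to_seconds(real)
    print(f"{label}: {secs:.1f} s ({secs / fastest:.1f}x the fastest run)")
```

The slow configuration comes out more than 30 times slower than the fastest one, which is why this report stands out against dcui's measurements, where vCPU count and the compatibility option made no difference.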

@dcui
Contributor

dcui commented Jul 6, 2019

@marcinwiacek Thanks for sharing the perf numbers! I suppose your host version is 10.0.18362.207, and the VM here is a Gen2 VM? Can you share your VM's "cat /proc/cpuinfo"?

Your test #1 vs. #3: it looks "disabled compatibility option" would make the FB extremely slow.
Your test #1 vs. #2: #2 is not slow despite "disabled compatibility option", and it looks using 1 CPU dramatically makes the FB a lot faster?
What about "1 core, enabled compatibility option, enabled hyperv_fb"?

@marcinwiacek

marcinwiacek commented Jul 6, 2019

1 core, compatibility option enabled, hyperv_fb enabled:

real 0m5.940s
user 0m0.000s
sys 0m0.151s

The numbers make sense:

  • more cores = higher speed (of course the task is not CPU-intensive, so the difference is not huge)
  • compatibility option = lower speed

The compatibility option additionally resolves the hyperv_fb problem.

CPU: Intel i7-6820HQ. You're right about the versions.

@dcui
Contributor

dcui commented Jul 6, 2019

It looks like your theory cannot explain why both "1 core, enabled compatibility option" and "1 core, disabled compatibility option" are fast.

Unfortunately I cannot reproduce the same symptom with my host (18362.175, which should be very similar to yours) and CPU (i7-7600U, which is a little newer). I'll keep an eye on this symptom and try to reproduce it if I can find a HW/SW setup that's more similar to yours.

@marcinwiacek

marcinwiacek commented Jul 6, 2019

> It looks your theory can not explain why both "1 core, enabled compatibility option" and "1 core, disabled compatibility option" are fast:

Apart from the bug we're tracking, everything looks very sensible.

It looks like the bug is in multi-CPU support, and the compatibility option disables something that triggers the problem.

I wouldn't be surprised if the code for detecting CPU features is buggy somewhere.

> CPU (i7-7600U, which is a little newer)

https://ark.intel.com/content/www/us/en/ark/products/88970/intel-core-i7-6820hq-processor-8m-cache-up-to-3-60-ghz.html

https://ark.intel.com/content/www/us/en/ark/products/97466/intel-core-i7-7600u-processor-4m-cache-up-to-3-90-ghz.html

6th gen vs. 7th gen: they can be very different (different microcode, different integrated graphics and therefore different drivers, etc.)

@nyanpasu64

I have the same issue, on AMD.

  • Windows 10 Education 1903:
  • "manjaro-xfce-18.0.4-stable-x86_64.iso" (same issue on "archlinux-2019.08.01-x86_64.iso")
  • lscpu states "AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx" (not Intel CPU), with only 1 thread/core/socket allocated to the VM.

@jaredheath

I am seeing the same issue on Server 2019 with a SLES 12 Gen 1 VM. It's very interesting that if I have a GNOME terminal open full screen and run top, Xorg runs at 99% and the framebuffer weirdness happens... but if I shrink that terminal to less than 25% of the screen, Xorg drops to less than 10% CPU and the buffering stops.

This is on the latest 2019 build with SLES 12 patched up to current.

@dcui
Contributor

dcui commented Sep 27, 2019

@jaredheath : this should be the known host issue I mentioned previously. Hyper-V team has not fixed the issue for Server 2019, but we (the Linux team) made 2 patches to the hyper-v framebuffer driver in Linux VM so the issue can be effectively mitigated:

[v4] video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host: https://patchwork.kernel.org/patch/11132483/

[PATCH,v6] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver: https://patchwork.kernel.org/patch/11149671/

The patches will be in the mainline Linux kernel git repository soon, and will eventually propagate into new versions of various Linux distributions (which could take months or longer).

If you'd like to use the 2 patches now, you need to build & install a kernel with the 2 patches applied.
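The deferred-IO patch is only linked here. In general terms, fbdev deferred IO records which pages of framebuffer memory were written and hands them to the driver in a batch after a short delay, so the driver can send one update instead of one per write. The sketch below models only that bookkeeping step, turning a batch of dirty page indices into a range of scanlines. It is an illustration under assumed parameters (4 KiB pages, a 1152-pixel-wide 32-bpp framebuffer), not hyperv_fb's actual code, and the function name is hypothetical.

```python
def dirty_rows(dirty_pages, page_size=4096, line_length=4608):
    """Return the inclusive (first_row, last_row) covered by dirty framebuffer pages.

    Assumed defaults: 4 KiB pages and 1152 pixels * 4 bytes = 4608 bytes per
    scanline. Returns None if nothing is dirty.
    """
    if not dirty_pages:
        return None
    first_byte = min(dirty_pages) * page_size
    last_byte = (max(dirty_pages) + 1) * page_size - 1
    return first_byte // line_length, last_byte // line_length


# Many small writes during one delay interval collapse into a single update.
writes = [row * 4608 + col * 4 for row in range(10, 20) for col in range(0, 1152, 64)]
pages = {offset // 4096 for offset in writes}
print(f"{len(writes)} writes -> {len(pages)} dirty pages -> rows {dirty_rows(pages)}")
```

Because pages do not line up with scanlines, the reported range can be slightly wider than the rows actually written (here rows 10 to 19 are touched, but the range covers 9 to 20). That is the usual trade-off of page-granular tracking.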

@jaredheath

Yeah, that won't be an option in our environment. I guess we have to defer deployment. Time to alert management.

Thanks for the reply.

@nyanpasu64

Is blacklisting hyperv_fb an option? #655 (comment)

@dcui
Contributor

dcui commented Sep 27, 2019

@jaredheath : As @jimbo1qaz reminded, blacklisting the hyperv_fb driver may be an option to you, especially when you use a Generation 2 VM.

@pavel

pavel commented Mar 11, 2020

The issue is still there even in mainline.

Host: 10.0.18362.1
Guest kernel: 5.6.0-rc5
Guest distribution: Arch Linux

For me the issue is present for both Gen1 and Gen2 VMs.
Also, compared to 5.5.8, mainline 5.6.0-rc5 is even worse, as I can now see artifacts (when not using X11) for processes that continuously render text (e.g. pacman). With X11 the rendering is as slow as it was in 5.5.8:

  • noticeable mouse stuttering
  • noticeable UI rendering lag (e.g. noticeable animation stuttering when opening a new browser tab)

Blacklisting hyperv_fb does the trick but then you end up with efifb for which AFAIK the only way to set resolution is to use grub.

Rolling a guest VM back to 5.2.13 resolves the problem for me for Gen2 VMs.

@dcui
Contributor

dcui commented Mar 11, 2020

IMO v5.5.8 and v5.6-rc5 should have the same framebuffer performance, because "git diff v5.5.8 v5.6-rc5 -- drivers/video/fbdev/hyperv_fb.c" shows only a 2-line patch, which is only used in the VM hibernation scenario.

For a Gen2 VM, yes, the Hyper-V synthetic framebuffer is still slow, even with Wei's 3 recent hyperv_fb patches. We'll ping the Hyper-V team for the (n+1)th time... The workaround is to blacklist hyperv_fb and use the efifb framebuffer, which is typically fast enough.

For a Gen1 VM, the Hyper-V synthetic framebuffer is also still slow, but the slowness can be effectively mitigated by Wei's third patch, "video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.". Please remember to add a proper "cma=" kernel parameter, e.g. cma=130m (see the changelog of the patch).

> Rolling a guest VM back to 5.2.13 resolves the problem for me

I can't understand this. v5.2.13 should also suffer from the slow Hyper-V synthetic framebuffer issue. v5.2.13 doesn't have the 3 patches from Wei, so the slowness cannot be mitigated by the cma=130m hack for Gen-1.

I would let @whu2014 provide his insights.

@pavel

pavel commented Mar 11, 2020

> I can not understand. v5.2.13 should also suffer from the slow Hyper-V synthetic framebuffer issue. v5.2.13 doesn't have the 3 patches from Wei, so the slowness can not be mitigated by the cma=130m hack for Gen-1.

I'm sorry for the confusion. I did not specify that I tested this rollback only on Gen2 VMs. Corrected my previous comment.

@shubell

shubell commented Jul 27, 2020

I had the same issue on my Hyper-V 2019 server, and checking the "Processor compatibility" box helps.

@dcui
Contributor

dcui commented Jul 27, 2020

I still can't understand why checking the "Processor compatibility" box would make a difference and I can't reproduce the same symptom with my test VMs. :-(

@shubell: can you please share your host version (run "winver.exe" on the host), your Linux kernel version (run "uname -a") ? Is your Linux VM a Generation 1 VM or Generation 2?

@shubell

shubell commented Jul 28, 2020

Well, the issue started when I migrated from Hyper-V 2016 to 2019 (version 1809, OS build 17763.1282). I have 3 Ubuntu 16.04 LTS Gen1 VMs (4.15.0-112-generic #113~16.04.1-Ubuntu), and the same issue was on all of them. The compatibility option fixes the fb issue on all of them. It could also be an issue with some CPUs. Mine are 2x Xeon Gold 5217.

@mikov

mikov commented Jul 28, 2020 via email

@marcinwiacek

marcinwiacek commented Jul 28, 2020

I'm very surprised. On Jul 6, 2019 and earlier, I had given the exact Windows version and the exact CPU model. It looks like the MS team is testing with other CPUs... and this is most probably why reproduction hasn't succeeded ("processor compatibility" seems to do the trick, and I understand that the issue is connected with some specific CPU features).

And now some very open questions: how much money is required to find/borrow/buy a CPU with the same features? And why does it take more than a year?

And you know what? I found the answer in 5 minutes: a used laptop with this CPU costs ca. 1300 USD (and this is really an edge case).

PS. I haven't used this product for a while; I moved to other solutions and saw the GitHub notification today.

@dcui
Contributor

dcui commented Jul 28, 2020

Thanks shubell and mikov for sharing more info!

Hi marcinwiacek, I'm as frustrated as you on this bug... I know there must be a Hyper-V bug, and I have pinged Hyper-V team many times... Here I was just trying to collect more information, which can help to resolve the issue thoroughly. I'll ping Hyper-V team again.

@shubell

shubell commented Jul 28, 2020 via email

@marcinwiacek

> Thanks shubell and mikov for sharing more info!
>
> Hi marcinwiacek, I'm as frustrated as you on this bug... I know there must be a Hyper-V bug, and I have pinged Hyper-V team many times... Here I was just trying to collect more information, which can help to resolve the issue thoroughly. I'll ping Hyper-V team again.

To be honest, I don't know if this is your own initiative (because you're an engineer at heart and want to fix it) or a task from your manager.

I know that a lack of support makes people look for alternatives (and they don't come back when the alternatives are better). They need solutions, not explanations or excuses.

From my side, it doesn't matter what you do (and I'm far from frustrated). Google, Intel, Microsoft... People can use many other things today.

Good luck!

PS. This story is now more than one year old, and this bug is still young compared to a Google bug I was somehow involved in, which was submitted on Mar 16, 2012.

@dcui
Contributor

dcui commented Jul 28, 2020

Today I happened to find a server with the same host version "version 1809 OS build 17763.1282", so I did some quick tests with a newly created Generation-1 Ubuntu 16.04 VM (4.15.0-112-generic), and the CPU type is:

cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
stepping : 1

I think I can reproduce shubell's observation: the VM's Xorg desktop window is not very responsive, e.g. after I right-click the desktop, the context menu pops up only after nearly 1 second. Later, after I check "Hardware\Processor\Compatibility\Migration to a physical computer with a different processor version", the Xorg desktop becomes much more responsive, but I can still perceive that it's not 100% as normal as it's supposed to be. I just reported the finding to the Hyper-V team. BTW, I'm from the Linux team and we have no control over Hyper-V.

According to my tests, there is indeed a workaround if you can use Generation-2 VM: somehow the legacy EFI FB driver is not affected, so in a Generation-2 VM, we can blacklist the hyperv_fb driver and use the efifb driver, which is fast.

@pavel

pavel commented Sep 12, 2020

> so in a Generation-2 VM, we can blacklist the hyperv_fb driver and use the efifb driver, which is fast

This will also require a change of a boot manager (e.g. rEFInd) as setting a custom resolution for efifb is not as easy as it is for hyperv_fb.
What is the most concerning part for me is that I have several Gen 2 VMs running some older linux kernels and working in 1080p using hyperv_fb without any issues. It looks like this is a regression introduced somewhere in the hyperv_fb code.

@dcui
Contributor

dcui commented Sep 12, 2020

> What is the most concerning part for me is that I have several Gen 2 VMs running some older linux kernels and working in 1080p using hyperv_fb without any issues. It looks like this is a regression introduced somewhere in the hyperv_fb code.

So you are running a slow VM and some fast VMs with older Linux kernels on the same host at the same time, and hence you think the slow VM's kernel (which is newer) introduces the slowness?

If so, can you please share the host version info (please run "winver.exe" on the host) and the kernel version info of the slow VM and the fast VMs (please run "uname -a")? Also please clarify whether it's a Gen1 or Gen2 VM, whether you check "Hardware\Processor\Compatibility\Migration to a physical computer with a different processor version", and, when you notice the slowness, whether you are using a text-mode tty terminal or an Xorg window.

I'm pretty sure this slowness is introduced by the host, not the guest, so I'm asking for the above info just in case the guest somehow makes the slowness worse in recent Linux kernels.

@pavel

pavel commented Sep 26, 2020

Here's the setup:

Host: Version 1909 (OS Build 18363.1082)
Gen2 VM 1: Linux vm1 5.3.8-arch1-1
Gen2 VM 2: Linux vm2 5.8.10-arch1-1

vm1 operates completely normally in both tty and Xorg. vm2, however, has sluggish mouse movement and animations in Xorg (e.g. opening a new tab in the browser), and noticeably slower rendering than vm1 in tty.
This is how it looks for vm2 when I run find /:
[screenshot: vm2-slow-tty]
vm1 tty find / for comparison:
[screenshot: vm1-normal-tty]
Both VMs are on the same host. Both VMs have "Migrate to physical computer with a different processor version" unchecked.
Both VMs used the exact same OS installation script and have video=hyperv_fb:1920x1080 kernel parameter set at boot.

> So you are running a slow VM and some fast VMs with older Linux kernels on the same host at the same time, and hence you think the slow VM's kernel (which is newer) introduces the slowness?

Yes.

@dcui
Contributor

dcui commented Sep 26, 2020

@pavel : Thanks for the detailed report! We'll take a look at the difference between 5.3.8 and 5.8.10.

@dcui
Contributor

dcui commented Sep 29, 2020

I did some tests today in a similar environment with the same host build version, the same VM kernel versions, the same resolution 1920x1080, and I don't check "Migrate to physical computer with a different processor version". The environmental differences are: I'm using a Gen2 Ubuntu 20.04 (rather than Arch Linux) VM and I built the 5.3.8 and 5.8.10 kernel from the upstream stable kernel git repo. IMO these differences should not matter.

In my test, the framebuffer is faster with v5.8.10 than with v5.3.8: 1) in the Xorg GUI environment, the framebuffer looks only slightly faster in v5.8.10 than in v5.3.8; 2) in the text-mode terminal (I used tty3 in my test), I ran:

wget 'https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/CREDITS?h=v5.9-rc7' -O test.txt
time cat test.txt

The "cat" command takes 40+ seconds with v5.3.8, but only 3 seconds with v5.8.10. Note: with v5.8.10, the screen becomes very blurry (I'm not sure if we're able to improve this) while "cat" is printing the lines during those 3 seconds; with v5.3.8, "cat" takes much longer but the screen is basically not blurry.

@pavel: Can you also do the "cat" test with the same test.txt file against v5.3.8 vs. v5.8.10?

With both the kernels, the mouse movement is basically normal to me, and I don't experience any noticeable sluggishness.

Before the Hyper-V team fixes the slow framebuffer, for a Gen2 VM the only easy workaround is to blacklist the hyperv_fb driver and use the legacy UEFI framebuffer (which happens to be fast) -- I understand the drawback is that it seems impossible (?) to use a larger resolution.

Another possible workaround is to use a VNC server (we need to run a VNC viewer to connect to the VM via the network) or xrdp (this is integrated with Hyper-V Manager: see https://docs.microsoft.com/en-us/virtualization/community/team-blog/2018/20180228-sneak-peek-taking-a-spin-with-enhanced-linux-vms and microsoft/linux-vm-tools#106 (comment). I'm not sure how easy it is to make this work for Arch Linux; the links I shared are mainly for Ubuntu.)

@pavel

pavel commented Oct 2, 2020

Here're the results:

VM | time cat test.txt
--- | ---
Gen2 VM 1: Linux vm1 5.3.8-arch1-1 | 37.17 s
Gen2 VM 2: Linux vm2 5.8.10-arch1-1 | 2.7 s

vm1
[screenshot: time_cat_vm1]
vm2
[screenshot: time_cat_vm2]

@dcui
Contributor

dcui commented Oct 2, 2020

Thanks, @pavel ! So your result is the same as mine.

Now I understand your earlier description:

> Gen2 VM 1: Linux vm1 5.3.8-arch1-1
> Gen2 VM 2: Linux vm2 5.8.10-arch1-1
> vm1 operates completely normal both in tty and Xorg. vm2 however in Xorg has sluggish mouse movement and animations (e.g. opening a new tab in the browser), and in tty has noticeably slower rendering than vm1.

vm1 actually is not so "normal" as it takes too long (about 40 seconds) to print the text file to tty (vm2 only needs about 3 seconds). vm2 actually has faster rendering than vm1, though the contents of vm2's screen become unrecognizable in tty.

Not sure why in Xorg the mouse movement and animation are not sluggish to me -- it looks like there is a little slowness, but it's not noticeable to me.

PS, I hate to say this, but we still have no update from Hyper-V team about the slow framebuffer issue. :-(

@pavel

pavel commented Oct 2, 2020

Got a 3rd VM up.

Gen2 VM 3: Linux vm3 5.4.0-48-generic (Ubuntu 20.04.1 LTS)
video=hyperv_fb:1920x1080

No issues in either Xorg or tty. 😕

@tjleary75

I am experiencing similar issues with Debian 10.2 - Looking forward to hearing from the hyper-v team.

@dcui
Contributor

dcui commented Nov 18, 2020

It turns out to be a Linux bug that only happens when the VM runs on recent Hyper-V since sometime in 2018. I just posted a patch here: https://lkml.org/lkml/2020/11/17/2222 . Hopefully the fix will be in v5.10 and will be integrated into various Linux distros.
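If, as hoped here, the fix is in mainline v5.10 (the later reports of no issues on 5.10.4 and on 5.10.0-1008-oem are consistent with that), a quick check of the running kernel's base version can tell whether it is at least that new. A minimal sketch with hypothetical helper names. Caveat: distribution kernels backport fixes (the last comment in this thread notes that Ubuntu's generic 5.4.0-66 kernel already has it), so False only means "not based on mainline 5.10 or newer", not "missing the fix".

```python
import re


def mainline_version(release: str) -> tuple:
    """Parse the leading 'X.Y[.Z]' of a `uname -r` string into (X, Y, Z)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in m.groups())


def at_least(release: str, minimum=(5, 10, 0)) -> bool:
    """True if the release's mainline base version is >= minimum."""
    return mainline_version(release) >= tuple(minimum)


# Kernel releases mentioned in this thread.
for release in ["5.3.8-arch1-1", "5.8.10-arch1-1", "5.10.4-arch2-1"]:
    print(release, at_least(release))
```

On the VM itself, `platform.release()` returns the same string as `uname -r` and can be passed to `at_least`.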

@dcui
Contributor

dcui commented Nov 18, 2020

BTW, in a Gen-1 VM on recent Hyper-V since 2018, the legacy VRAM is also mapped uncacheable by default, so I can also perceive the slowness before the Hyper-V synthetic framebuffer driver "hyperv_fb" loads. To work around that slowness, we can use this kernel parameter "video=vesafb:mtrr:3", which tells the legacy framebuffer driver "vesafb" to map the legacy VRAM cacheable.
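To confirm after a reboot that a parameter such as video=vesafb:mtrr:3 (or the cma=130m mentioned earlier for Gen-1 VMs) actually reached the kernel, one can inspect /proc/cmdline. A minimal sketch with a hypothetical helper; it collects every value for a key, since video= may legitimately appear more than once, and it does not handle quoted values containing spaces.

```python
def param_values(cmdline: str, key: str) -> list:
    """Return every value given for `key=` on a kernel command line."""
    prefix = key + "="
    return [word[len(prefix):] for word in cmdline.split() if word.startswith(prefix)]


# Hypothetical example; on a running system use open("/proc/cmdline").read().
CMDLINE = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro quiet video=vesafb:mtrr:3"
print("vesafb:mtrr:3" in param_values(CMDLINE, "video"))  # True
```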

@tjleary75

tjleary75 commented Nov 23, 2020

> BTW, in a Gen-1 VM on recent Hyper-V since 2018, the legacy VRAM is also mapped uncacheable by default, so I can also perceive the slowness before the Hyper-V synthetic framebuffer driver "hyperv_fb" loads. To work around that slowness, we can use this kernel parameter "video=vesafb:mtrr:3", which tells the legacy framebuffer driver "vesafb" to map the legacy VRAM cacheable.

Hi dcui, If I need to set the screen resolution to 1920x1080, are there any other parameters needed in the above video statement ?

@dcui
Contributor

dcui commented Nov 23, 2020

@tjleary75
In a Gen-1 VM, the legacy VGA device emulated by Hyper-V does not support 1920x1080 -- the highest supported resolution is 1600x1200. You can verify this by the grub "vbeinfo" command: https://linuxconfig.org/how-to-increase-tty-console-resolution-on-ubuntu-18-04-server .

To set the resolution to 1600x1200 in my Ubuntu 20.04.1 VM, I have the below 2 lines in my /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="maybe-ubiquity video=vesafb:mtrr:3"
GRUB_GFXMODE=1600x1200
(Run "update-grub && reboot" to make it take effect)
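When several parameters need adding (e.g. video=vesafb:mtrr:3, or cma=130m on Gen-1), editing GRUB_CMDLINE_LINUX_DEFAULT by hand is error-prone. A minimal sketch of an idempotent edit of /etc/default/grub text, assuming the variable uses double quotes (single-quoted values are not handled); the function name is hypothetical.

```python
import re


def add_kernel_param(grub_text: str, param: str,
                     var: str = "GRUB_CMDLINE_LINUX_DEFAULT") -> str:
    """Return grub_text with `param` appended to var="...", unless already present.

    Only double-quoted assignments are handled; if the variable is missing,
    a new line is appended.
    """
    pattern = re.compile(rf'^({re.escape(var)}=")([^"]*)(")', re.MULTILINE)

    def append(match):
        words = match.group(2).split()
        if param not in words:
            words.append(param)
        return match.group(1) + " ".join(words) + match.group(3)

    new_text, count = pattern.subn(append, grub_text)
    if count == 0:
        base = grub_text.rstrip("\n")
        new_text = (base + "\n" if base else "") + f'{var}="{param}"\n'
    return new_text


before = 'GRUB_CMDLINE_LINUX_DEFAULT="maybe-ubiquity"\nGRUB_GFXMODE=1600x1200\n'
after = add_kernel_param(before, "video=vesafb:mtrr:3")
print(after)
```

Writing the result back to /etc/default/grub requires root, and, as noted above, "update-grub && reboot" is still needed for the change to take effect.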

@pavel

pavel commented Jan 4, 2021

No issues in my updated Gen2 VM 2: Linux vm2 5.10.4-arch2-1.

@kilves76

Before, on the Ubuntu Server console, "time cat 4mbfile.txt" would take 4+ minutes. Now, after
apt install linux-image-5.10.0-1008-oem ; update-grub
it is 10 seconds. On Ubuntu Desktop terminal it's only 0.4 seconds. Both using 1920x1080 resolution.
Thank you!

Somewhat related: I found out the hard way that using Set-VMVideo is a must to get higher resolutions. I wasted considerable time testing different video:vesa/uvesa options and wondering why nothing had any effect.
Set-VMVideo -VMName namehere -HorizontalResolution:1920 -VerticalResolution:1080 -ResolutionType Single

@dcui
Contributor

dcui commented Feb 16, 2021

FYI: For Ubuntu 20.04, as I just checked, the latest linux-azure kernel Ubuntu-azure-5.4.0-1039.41 (Jan 18) still does not have the fix, but the generic 5.4 kernel Ubuntu-5.4.0-66.74 and the HWE kernel Ubuntu-hwe-5.8-5.8.0-44.50_20.04.1 already have the fix.
