Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use drm_fb_helper's emulation of Linux fbdev to integrate with FreeBSD's vt(4) #243

Merged
merged 10 commits into from
Nov 26, 2023

Conversation

dumbbell
Copy link
Member

@dumbbell dumbbell commented Mar 17, 2023

This patch series replaces our current vt(4) integration layer with a new one relying entirely on drm_fb_helper.c.

Why

In Linux 5.17, the amdgpu driver relies entirely on drm_fb_helper to handle its framebuffer. Indeed, the framebuffer-specific code is removed.

Also in Linux 5.17, something changed in the i915 driver. I don't know exactly what, but the memory region passed to vt(4) and vt_fb isn't the one displayed on the screen. Therefore, all writes made by vt_fb leads to no changes on the screen and the computer looks frozen (that's not the case). However, when I vt-switch to another tty, the screen is refreshed to show the content of the tty active before the switch. Switching tty is the only time vt_fb calls into DRM. There is enough evidence that the i915 driver now assumes that the framebuffer is accessed like drm_fb_helper does it, i.e. like a Linux fbdev device.

The symptoms are the same when we enable PSR in the i915 driver (enable_psr=1). It's currently turned off by default on FreeBSD, unlike Linux.

Unfortunately, our vt(4) integration is way more basic than than Linux' fbdev and doesn't reproduce the behavior of an fbdev device.

How

The branch contains two major parts:

  1. It uncomments the code in drm_fb_helper.c and related code in DRM drivers. This makes sure that the code compiles, even though it is not used at that point.
  2. It introduces a new vt(4) backend called vt_drmfb, compiled in drm.ko. This new backend is responsible for bridging the vt(4) callbacks (upper layer) and the fbdev callbacks (lower layer).

This work relies on patches to freebsd-src. They are available in the linuxkpi-updates-for-drm branch in my freebsd-src fork. Patches were submitted for reviews to Phabricator:

TODO

  • Check how kldunload(8) will behave.
  • Fix the nested locking issue between vtbuf_lock() in vt_flush() and WQ_EXEC_LOCK() in schedule_work() in linuxkpi (sleepable lock acquired inside a non-sleepable lock section).
  • Add support for vt(4) bitblt_image callback mask argument; there is no equivalent on the fbdev side.

This work is required by the Linux 5.17 backport (#236).

Fixes #17.

@dumbbell dumbbell added bug Something isn't working enhancement New feature or request labels Mar 17, 2023
@dumbbell dumbbell requested a review from evadot March 17, 2023 15:45
@dumbbell dumbbell self-assigned this Mar 17, 2023
@dumbbell dumbbell force-pushed the fully-use-drm_fb_helper-on-freebsd branch from 7b69765 to a4719f3 Compare March 20, 2023 20:15
@rhelmot
Copy link

rhelmot commented Mar 21, 2023

As mentioned in #236, there seem to be some issues with locking. The linuxkpi code wants to take a sleep lock and perform allocations in several places, which does not work well with the vtdev lock and window spinlocks.

The first issue that comes up when you try running the new system with witness enabled is that vd->vd_curwindow->vb_lock is held during vt_flush, and drm_fb_helper_damage tries to add to the linuxkpi work queue in order to send the damage signal.

After bluntly patching that, the next issue is that vt_drmfb_postswitch locks vtdev. This triggers a huge number of warnings as it performs allocations and eventually a reversal when the vt_upgrade sysinit is called and tries to lock vtdev while holding the giant lock.

After patching that, the final issue is once again in vt_flush, this time with calling bitblt.

Backtraces for the above:

panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) linux_wq_exec @ /root/freebsd-src/sys/compat/linuxkpi/common/src/linux_work.c:105
cpuid = 0
time = 1679351346
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015c42ca60
vpanic() at vpanic+0x152/frame 0xfffffe015c42cab0
panic() at panic+0x43/frame 0xfffffe015c42cb10
witness_checkorder() at witness_checkorder+0xc5c/frame 0xfffffe015c42ccd0
__mtx_lock_flags() at __mtx_lock_flags+0x94/frame 0xfffffe015c42cd20
linux_queue_work_on() at linux_queue_work_on+0x9e/frame 0xfffffe015c42cd60
vt_drmfb_drawrect() at vt_drmfb_drawrect+0x5b/frame 0xfffffe015c42cd90
vt_flush() at vt_flush+0x2c4/frame 0xfffffe015c42cdf0
vt_timer() at vt_timer+0xe/frame 0xfffffe015c42ce10
softclock_call_cc() at softclock_call_cc+0x14b/frame 0xfffffe015c42cec0
softclock_thread() at softclock_thread+0xc6/frame 0xfffffe015c42cef0
fork_exit() at fork_exit+0x80/frame 0xfffffe015c42cf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015c42cf30
  Mar 21 02:47:49 daisy kernel: uma_zalloc_debug: zone "malloc-2048" with the following non-sleepable locks held:                                                                             
  Mar 21 02:47:49 daisy kernel: exclusive sleep mutex vtdev (vtdev) r = 0 (0xffffffff81aec5f8) locked @ /usr/home/audrey/proj/kernel/kernel/sys/dev/vt/vt_core.c:3057                         
  Mar 21 02:47:49 daisy kernel: stack backtrace:                                                                                                                                              
  Mar 21 02:47:49 daisy kernel: #0 0xffffffff80c5ff65 at witness_debugger+0x65                                                                                                                
  Mar 21 02:47:49 daisy kernel: #1 0xffffffff80c610b9 at witness_warn+0x3f9                                                                                                                   
  Mar 21 02:47:49 daisy kernel: #2 0xffffffff80f70f74 at uma_zalloc_debug+0x34                                                                                                                
  Mar 21 02:47:49 daisy kernel: #3 0xffffffff80f70a88 at uma_zalloc_arg+0x28                                                                                                                  
  Mar 21 02:47:49 daisy kernel: #4 0xffffffff80bc1bff at malloc+0x7f                                                                                                                          
  Mar 21 02:47:49 daisy kernel: #5 0xffffffff8532b8c0 at intel_atomic_state_alloc+0x20                                                                                                        
  Mar 21 02:47:49 daisy kernel: #6 0xffffffff854d93b4 at drm_client_modeset_commit_atomic+0x34                                                                                                
  Mar 21 02:47:49 daisy kernel: #7 0xffffffff854d95c4 at drm_client_modeset_commit_locked+0x74                                                                                                
  Mar 21 02:47:49 daisy kernel: #8 0xffffffff854d9751 at drm_client_modeset_commit+0x21                                                                                                       
  Mar 21 02:47:49 daisy kernel: #9 0xffffffff8551dd22 at __drm_fb_helper_restore_fbdev_mode_unlocked+0x82                                                                                     
  Mar 21 02:47:49 daisy kernel: #10 0xffffffff8551f554 at drm_fb_helper_set_par+0x54                                                                                                            Mar 21 02:47:49 daisy kernel: #11 0xffffffff8546d9f7 at intel_fbdev_set_par+0x17                                                                                                            
  Mar 21 02:47:49 daisy kernel: #12 0xffffffff8551d825 at vt_drmfb_init+0xe5                                                                                                                    Mar 21 02:47:49 daisy kernel: #13 0xffffffff80a358a8 at vt_replace_backend+0x118                                                                                                            
  Mar 21 02:47:49 daisy kernel: #14 0xffffffff80a3576b at vt_allocate+0x7b                                                                                                                    
  Mar 21 02:47:49 daisy kernel: #15 0xffffffff8551c900 at linux_register_framebuffer+0x150                                                                                                    
  Mar 21 02:47:49 daisy kernel: #16 0xffffffff8551ffd3 at __drm_fb_helper_initial_config_and_unlock+0x4d3                                                                                     
  Mar 21 02:47:49 daisy kernel: #17 0xffffffff8546cff0 at intel_fbdev_initial_config_async+0x20  
  Mar 21 02:47:49 daisy kernel: lock order reversal: (Giant after non-sleepable)                                                                                                              
  Mar 21 02:47:49 daisy kernel:  1st 0xffffffff81aec5f8 vtdev (vtdev, sleep mutex) @ /usr/home/audrey/proj/kernel/kernel/sys/dev/vt/vt_core.c:3057                                            
  Mar 21 02:47:49 daisy kernel:  2nd 0xffffffff81a02bc0 Giant (Giant, sleep mutex) @ /usr/home/audrey/proj/kernel/kernel/sys/compat/linuxkpi/common/src/linux_compat.c:2312                   
  Mar 21 02:47:49 daisy kernel: lock order Giant -> vtdev established at:                                                                                                                     
  Mar 21 02:47:49 daisy kernel: #0 0xffffffff80c5f26b at witness_checkorder+0x2fb                                                                                                             
  Mar 21 02:47:49 daisy kernel: #1 0xffffffff80bc8784 at __mtx_lock_flags+0x94                                                                                                                
  Mar 21 02:47:49 daisy kernel: #2 0xffffffff80a34e16 at vt_upgrade+0x386                                                                                                                     
  Mar 21 02:47:49 daisy kernel: #3 0xffffffff80b73a08 at mi_startup+0x1e8                                                                                                                     
  Mar 21 02:47:49 daisy kernel: #4 0xffffffff8038f023 at btext+0x23                                                                                                                           
  Mar 21 02:47:49 daisy kernel: lock order vtdev -> Giant attempted at:                                                                                                                       
  Mar 21 02:47:49 daisy kernel: #0 0xffffffff80c5fb23 at witness_checkorder+0xbb3                                                                                                             
  Mar 21 02:47:49 daisy kernel: #1 0xffffffff80bc8784 at __mtx_lock_flags+0x94                                                                                                                
  Mar 21 02:47:49 daisy kernel: #2 0xffffffff80e73693 at linux_wait_for_timeout_common+0x143                                                                                                  
  Mar 21 02:47:49 daisy kernel: #3 0xffffffff854cd110 at drm_atomic_helper_wait_for_flip_done+0x70                                                                                            
  Mar 21 02:47:49 daisy kernel: #4 0xffffffff8535729a at intel_atomic_commit_tail+0xdba                                                                                                       
  Mar 21 02:47:49 daisy kernel: #5 0xffffffff85351945 at intel_atomic_commit+0x345                                                                                                              Mar 21 02:47:49 daisy kernel: #6 0xffffffff854d94d8 at drm_client_modeset_commit_atomic+0x158                                                                                               
  Mar 21 02:47:49 daisy kernel: #7 0xffffffff854d95c4 at drm_client_modeset_commit_locked+0x74                                                                                                
  Mar 21 02:47:49 daisy kernel: #8 0xffffffff854d9751 at drm_client_modeset_commit+0x21                                                                                                       
  Mar 21 02:47:49 daisy kernel: #9 0xffffffff8551dd22 at __drm_fb_helper_restore_fbdev_mode_unlocked+0x82                                                                                     
  Mar 21 02:47:49 daisy kernel: #10 0xffffffff8551f554 at drm_fb_helper_set_par+0x54                                                                                                          
  Mar 21 02:47:49 daisy kernel: #11 0xffffffff8546d9f7 at intel_fbdev_set_par+0x17                                                                                                            
  Mar 21 02:47:49 daisy kernel: #12 0xffffffff8551d825 at vt_drmfb_init+0xe5                                                                                                                  
  Mar 21 02:47:49 daisy kernel: #13 0xffffffff80a358a8 at vt_replace_backend+0x118                                                                                                            
  Mar 21 02:47:49 daisy kernel: #14 0xffffffff80a3576b at vt_allocate+0x7b                                                                                                                    
  Mar 21 02:47:49 daisy kernel: #15 0xffffffff8551c900 at linux_register_framebuffer+0x150                                                                                                    
  Mar 21 02:47:49 daisy kernel: #16 0xffffffff8551ffd3 at __drm_fb_helper_initial_config_and_unlock+0x4d3                                                                                     
  Mar 21 02:47:49 daisy kernel: #17 0xffffffff8546cff0 at intel_fbdev_initial_config_async+0x20
  Mar 21 02:54:03 daisy kernel: panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) linux_wq_exec @ /usr/home/audrey/proj/kernel/kernel/sys/compat/linu  xkpi/common/src/linux_work.c:105                                                                                                                                                            
  Mar 21 02:54:03 daisy kernel: cpuid = 0                                                                                                                                                     
  Mar 21 02:54:03 daisy kernel: time = 1679392069                                                                                                                                             
  Mar 21 02:54:03 daisy kernel: KDB: stack backtrace:                                                                                                                                         
  Mar 21 02:54:03 daisy kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe010812b9b0                                                                                
  Mar 21 02:54:03 daisy kernel: vpanic() at vpanic+0x152/frame 0xfffffe010812ba00                                                                                                             
  Mar 21 02:54:03 daisy kernel: panic() at panic+0x43/frame 0xfffffe010812ba60                                                                                                                
  Mar 21 02:54:03 daisy kernel: witness_checkorder() at witness_checkorder+0xc5c/frame 0xfffffe010812bc20                                                                                     
  Mar 21 02:54:03 daisy kernel: __mtx_lock_flags() at __mtx_lock_flags+0x94/frame 0xfffffe010812bc70                                                                                          
  Mar 21 02:54:03 daisy kernel: linux_queue_work_on() at linux_queue_work_on+0x9e/frame 0xfffffe010812bcb0                                                                                    
  Mar 21 02:54:03 daisy kernel: vt_drmfb_bitblt_text() at vt_drmfb_bitblt_text+0x20a/frame 0xfffffe010812bd90                                                                                 
  Mar 21 02:54:03 daisy kernel: vt_flush() at vt_flush+0x433/frame 0xfffffe010812bdf0                                                                                                         
  Mar 21 02:54:03 daisy kernel: vt_timer() at vt_timer+0xe/frame 0xfffffe010812be10                                                                                                           
  Mar 21 02:54:03 daisy kernel: softclock_call_cc() at softclock_call_cc+0x14b/frame 0xfffffe010812bec0                                                                                       
  Mar 21 02:54:03 daisy kernel: softclock_thread() at softclock_thread+0xc6/frame 0xfffffe010812bef0                                                                                          
  Mar 21 02:54:03 daisy kernel: fork_exit() at fork_exit+0x80/frame 0xfffffe010812bf30                                                                                                        
  Mar 21 02:54:03 daisy kernel: fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe010812bf30                                                                                             
  Mar 21 02:54:03 daisy kernel: --- trap 0, rip = 0, rsp = 0, rbp = 0 --- 

Here's the patches I came up with. I have no idea what invariants exactly these locks are supposed to guard, so I cannot comment on their correctness :)

kernel:

diff --git a/sys/dev/vt/vt_core.c b/sys/dev/vt/vt_core.c
index 267dd7bf2489..80002b9c21f9 100644
--- a/sys/dev/vt/vt_core.c
+++ b/sys/dev/vt/vt_core.c
@@ -1406,17 +1406,23 @@ vt_flush(struct vt_device *vd)
                vd->vd_flags &= ~VDF_INVALID;
 
                a = teken_get_curattr(&vw->vw_terminal->tm_emulator);
+               /*
+                * The below calls may need to run a sleep lock in order to
+                * dispatch linuxkpi work
+                */
+               vtbuf_unlock(&vw->vw_buf);
                vt_set_border(vd, &vw->vw_draw_area, a->ta_bgcolor);
                vt_termrect(vd, vf, &tarea);
                if (vd->vd_driver->vd_invalidate_text)
                        vd->vd_driver->vd_invalidate_text(vd, &tarea);
                if (vt_draw_logo_cpus)
                        vtterm_draw_cpu_logos(vd);
+               vtbuf_lock(&vw->vw_buf);
        }
 
        if (tarea.tr_begin.tp_col < tarea.tr_end.tp_col) {
-               vd->vd_driver->vd_bitblt_text(vd, vw, &tarea);
                vtbuf_unlock(&vw->vw_buf);
+               vd->vd_driver->vd_bitblt_text(vd, vw, &tarea);
                return (1);
        }

drm-kmod:

diff --git a/drivers/gpu/drm/vt_drmfb.c b/drivers/gpu/drm/vt_drmfb.c
index 116121e14f..62ef106d0c 100644
--- a/drivers/gpu/drm/vt_drmfb.c
+++ b/drivers/gpu/drm/vt_drmfb.c
@@ -257,6 +257,7 @@ vt_drmfb_postswitch(struct vt_device *vd)
 {
        struct fb_info *fbio;
        struct linux_fb_info *info;
+       int locked = mtx_owned(&vd->vd_lock);
 
        fbio = vd->vd_softc;
        info = to_linux_fb_info(fbio);
@@ -266,8 +267,14 @@ vt_drmfb_postswitch(struct vt_device *vd)
        }
 
        if (!kdb_active && !KERNEL_PANICKED()) {
+               if (locked) {
+                       VT_UNLOCK(vd);
+               }
                linux_set_current(curthread);
                info->fbops->fb_set_par(info);
+               if (locked) {
+                       VT_LOCK(vd);
+               }
        } else {
 #ifdef DDB
                db_trace_self_depth(10);

@dumbbell
Copy link
Member Author

I'm not sure that releasing the window buffer can work in vt_flush(). I see two possible issues:

  1. While the buffer is actually being displayed, a running program can still prints characters on its stdout. However, the buffer was marked as "undirty" (i.e. its content was flushed to screen). Therefore, we might miss a change to the content which never reaches the screen.
  2. This increases the contention on the lock a lot which will slow down a program spamming its stdout.

What I'm experimenting with on my side is to change the lock type of the lock protecting the workqueue implementation in linuxkpi (a "default" mutex so far) to the same type as the one used in drm_fb_helper (a "spin" mutex). I'm not sure it is the perfect solution either though...

@rhelmot
Copy link

rhelmot commented Mar 22, 2023

Could you post the patch which does that so I can test it? I found some fairly intense graphical corruption switching to hikari while driving my patches so I suspect you may be right.

@dumbbell
Copy link
Member Author

@rhelmot: The computer crashes hard with my wip patch unfortunately.

I think the corruption you see in Hikari is unrelated. The vt(4) integration layer only drives the display of the console, the text terminals before starting an X.Org or Wayland session. Once under X.Org/Wayland, only the DRM drivers are used exclusively and there should be no corruption.

Note that with the 5.17 drivers on my Intel 12th gen GPU, Wayland (using Sway) is unusable for me (only garbage on the screen). It works with 5.15-lts however, so this corruption must be fixed at some point in 5.18+. I need to take a picture of the screen and share it to compare with what you see.

@dumbbell dumbbell force-pushed the fully-use-drm_fb_helper-on-freebsd branch from a4719f3 to 8d3c57e Compare March 30, 2023 20:12
@dumbbell dumbbell force-pushed the fully-use-drm_fb_helper-on-freebsd branch 2 times, most recently from 21ae6a8 to 82ffa79 Compare April 14, 2023 18:31
@Jose123456
Copy link

In case this helps: I have similar console problems in Void Linux with kernel 6.1 and the Nvidia proprietary driver. Switching to a new virtual console with ctrl-alt-fnum works, and scrolls normally until you get to the bottom of the screen. It then appears to freeze, but switching back and forth between consoles causes it to refresh.

@dumbbell dumbbell force-pushed the fully-use-drm_fb_helper-on-freebsd branch from 82ffa79 to 7908237 Compare August 5, 2023 20:22
@dumbbell
Copy link
Member Author

dumbbell commented Aug 5, 2023

Hello!

I've been away in July. I'm back home and slowly resume work on this vt(4) integration and the Linux 5.17 DRM drivers port.

After polishing things a bit today, including commit history and commit messages, I'm starting to use the 5.17 DRM drivers on a daily basis. This includes the new vt(4) integration. We'll see how stable it is or not.

Note that there is one thing left to work on: the support for masks in the bitblt operation. To display the mouse pointer in the console, FreeBSD uses a mask it passes to bitblt. Linux doesn't have such a thing. I think it supports an alpha channel but I'm not sure. Anyway, currently the mask is commented out in the new vt(4) integration which means the mouse pointer is never displayed for now.

@dumbbell dumbbell force-pushed the fully-use-drm_fb_helper-on-freebsd branch from 7908237 to 0cc6615 Compare August 6, 2023 09:17
@dumbbell
Copy link
Member Author

dumbbell commented Aug 6, 2023

I have added mouse pointer mask support this morning. The new vt(4) integration is complete and is now ready for wider testing. I admit I only tested it as part of the Linu 5.17 port so far.

Note that this new integration depends on changes committed to the linuxkpi-updates-for-drm branch in my freebsd-src fork.

Once I'm more confident with the quality, I will submit reviews to reviews.freebsd.org for the patchs to freebsd-src and update the pull request description with links to the reviews.

@dumbbell dumbbell marked this pull request as ready for review August 6, 2023 09:19
@dumbbell
Copy link
Member Author

@JustAnotherHumanBeing reported the following problem with the vt(4) integration in #236:

So far, I noticed only one problem after installing the linuxkpi-updates-for-drm kernel and the i915kms driver from the 5.17 branch. If I have the setting kern.vt.splash_cpu=1 in my loader.conf and the splash logos are still on the screen when it's time for i915kms to take over (it could be the case if kern.vt.splash_cpu_duration is set to the value that is not too small), then the logos will become invisible and the area of the screen that would otherwise have been occupied by the logos will become inaccessible by any tty-based app until the value of kern.vt.splash_cpu_duration has expired.

I just pushed some fixes to both freebsd-src and the update-to-v5.17 branch (which are meant for this pull request, just not had time to rebase yet). However, I didn't look at the issue reported above. I will look at it next.

@dumbbell dumbbell marked this pull request as draft August 11, 2023 07:52
@dumbbell
Copy link
Member Author

I found the problem with missing CPU logos. The nested loops in fb_fillrect() was incorrect and never drawn anything.

I'm still not satisfied with the overall patch though. Still doing some improvements before I push it again to GitHub.

@dumbbell dumbbell force-pushed the fully-use-drm_fb_helper-on-freebsd branch from 0cc6615 to 9a72b9b Compare August 12, 2023 12:27
@dumbbell
Copy link
Member Author

Ok, new commits pushed. Compared to last week, I changed the overall approach.

Initially, I changed vt(4) to support vd_pre_bitblt_text() and post_bitblt_text() callbacks to allow the driver to run code outside of the vt_buf lock. In the context of the integration with DRM, it allowed the new integration vt_drmfb to proceed with the display of the buffer after the console content was drawn to that buffer. That's the code that acquires default mutex and can't run inside a held spin mutex section.

This worked well. Unfortunately, it meant that the screen was actually refreshed once after the entire buffer was updated. With the old integration, the screen was refreshed after each displayed character. Because of that, showing a lot of content like cat text-file-with-thousands-of-lines.txt looked really sluggish, even though the overall time to display the content was the same. The reason was that the screen was refreshed less frequently and a lot of content changed at once.

The new approach is based on a flag set by the driver. If the driver sets this flag, the vt_flush() function in vt(4) uses its internal vd_drawn* arrays and a new vd_pos_to_flus array to dump everything that needs to be refreshed. It then unlocks vt_buf and calls the driver. Basically steps are split and executed in a slightly different order compared to vt_flush() with the flag set to false (the default).

This time, the driver can refresh the screen after each character and still run outside of the vt_buf-locked section.

When I compare the speed at which cat(1) can write that big text file to the console, the new integration is a couple percent faster. We are talking 1.35 seconds with the old integration vs. 1.30 seconds with the new one for a file with 500,000 lines on my Intel 12th gen laptop.

The problem reported by @JustAnotherHumanBeing is also fixed.

I'm going to mark this pull request as ready to review again.

@dumbbell dumbbell marked this pull request as ready for review August 12, 2023 14:24
@dumbbell dumbbell marked this pull request as draft August 20, 2023 08:53
@dumbbell dumbbell force-pushed the fully-use-drm_fb_helper-on-freebsd branch 2 times, most recently from a7b55ec to 499e1a0 Compare August 24, 2023 19:05
@dumbbell
Copy link
Member Author

dumbbell commented Aug 24, 2023

I believe the instabilities I saw with the amdgpu driver (both with master and update-to-v5.17 are now solved, thanks to the commit titled "vt(4) integration: Move fictitious range (un)register out of (un)register_framebuffer()".

I'm not marking this pull request as ready for review yet. With the change of GFP_KERNEL from M_WAITOK to M_NOWAIT, it broke the iwlwifi driver's suspend & resume. I'm working on fixing this.

@dumbbell dumbbell force-pushed the fully-use-drm_fb_helper-on-freebsd branch from 499e1a0 to 28337ab Compare August 26, 2023 20:27
@dumbbell
Copy link
Member Author

The iwlwifi driver's suspend & resume support is not yet implemented, so my change to GFP_KERNEL didn't make it any worse.

I pushed another change that make it possible to kldunload(8) the i915kms driver. I didn't try with the amdgpu yet. The screen remains black afterwards though.

@dumbbell dumbbell marked this pull request as ready for review September 2, 2023 12:18
@dumbbell
Copy link
Member Author

dumbbell commented Sep 2, 2023

I'm using the 5.17 update on a daily basis, which includes this new vt(4) integration. I believe it's ready for wider testing and review!

I didn't submit reviews for the patches to freebsd-src yet. Before I do that, I would love some feedback on the overall approach :-)

@clausecker
Copy link

Could this rewrite fix some issues I have with the nvidia driver and waking up from suspend to RAM (screen comes back up blank or all garbled, but if I start a new X server, that X server works (while previously started X servers do not)).

@dumbbell
Copy link
Member Author

I don't think it will fix anything related to the NVIDIA proprietary driver. I'm not sure it relies on DRM in the first place.

@t-zuehlsdorff
Copy link

I really would like to test on my Framework Gen 13 Laptop.

Is there any guide how to do this?

@dumbbell
Copy link
Member Author

dumbbell commented Oct 2, 2023

I really would like to test on my Framework Gen 13 Laptop.

Is there any guide how to do this?

You could look at #236, but I doubt Intel 13th gen was already supported in Linux 5.17.

@dumbbell dumbbell marked this pull request as draft October 20, 2023 15:43
@dumbbell
Copy link
Member Author

I'm marking this pull request as a draft again. There are regressions around the interaction with kdb/ddb.

I just discovered that drm_fb_helper exposes two dedicated functions as the fb_debug_enter and fb_debug_leave fbdev callbacks. I'm experimenting with them. This will require small changes to vt_core.c in base as well.

In the process, I stumble upon a problem I saw in the past (but didn't have a solution for): drm_fb_helper wants to use a callbacks structure which is NULL for the i915 driver. I wonder how this works on Linux because I can't find a place where the driver set it.

@dumbbell
Copy link
Member Author

The fb_debug_{enter,leave} lead is not the correct one. I was surprised the code seems to be missing from i915, but in fact, it's on purpose:

commit 9c79e0b1d096c06e4b012a32dddb04be0ce9716e
Author: Daniel Vetter <[email protected]>
Date:   Mon Apr 3 10:32:59 2017 +0200

    drm/fb-helper: Give up on kgdb for atomic drivers
    
    It just doesn't work. It probably stopped working way, way before that
    (e.g. i915 grabbed random mutexes all over in modeset code at least
    since gen6), but with atomic and all the ww_mutex stuff it's indeed
    hopeless.
    
    Remove ->mode_set_base_atomic from the 2 atomic drivers (i915 and
    nouveau) that still had one (both had dummy implementations already
    anyway), and shunt atomic drivers in the helpers debug_enter/leave
    functions.
(...)

See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9c79e0b1d096c06e4b012a32dddb04be0ce9716e.

[Why]
We may need that memory early enough that the async initialization may
not be ready yet.

[How]
We stop using a SYSINIT() function and perform the initialization from
`i915_init()` instead.
…ster_framebuffer()

[Why]
To correctly register the VRAM memory fictitious region, we relied on
the fact that the region address and size was defined in
`fb_info->apertures->ranges[0]`. The values were set by the driver when
they initialized their framebuffer.

Really, this was a by-product of the framebuffer initialization and
the region must be registered regardless of that. Likewise for the
accompagnying call to `vt_freeze_main_vd()`.

Starting with Linux 5.17, the `amdgpu` doesn't fill the
`fb_info->apertures->ranges[0]` values anymore anyway. It starts to use
the generic framebuffer code which doesn't use them.

Therefore, we need to have a FreeBSD-specific setup of that memory
region outside of the framebuffer initialization/vt(4) integration code.

[How]
A new pair of functions is introduced in `drm_os_freebsd.c` to register
and unregister the memory region, as well as calling
`vt_(un)freeze_main_vd()` accordingly.

The drivers need to call explicitly the new functions. We can't rely on
a side effect of the framebuffer initialization code.

As a bonus, the calls to vt_(un)freeze_main_vd()` are now symetrical
everywhere. Before this change, drivers called `vt_unfreeze_main_vd()`
at different places.
... except the code related specifically to sysrq and framebuffer
deferred I/O.

[Why]
In Linux 5.17, the amdgpu driver relies entirely on `drm_fb_helper` to
handle its framebuffer. Indeed, the framebuffer-specific code is
removed.

Also in Linux 5.17, something changed in the i915 driver. I don't know
exactly what, but the memory region passed to vt(4) and `vt_fb` isn't
the one displayed on the screen. Therefore, all writes made by `vt_fb`
leads to no changes on the screen and the computer looks frozen (that's
not the case). However, when I vt-switch to another tty, the screen is
refreshed to show the content of the tty active *before* the switch.
Switching tty is the only time `vt_fb` calls into DRM. There is enough
evidence that the i915 driver now assumes that the framebuffer is
accessed like `drm_fb_helper` does it, i.e. like a Linux fbdev device.

Unfortunately, our vt(4) integration is way way simpler than than Linux'
fbdev.

The goal of this commit and following ones is to change that vt(4)
integration layer to behave like fbdev.

[How]
The first step is uncomment everything in `drm_fb_helper` to make sure
that the API itself is ok.

`drm_fb_helper` depends on lower-level functions, especially
`fb_sys_{read,write}` and `fb_cfb_{read,write}` which are responsible
for all reads and writes to the allocated memory region for the
framebuffer. In this commit, they are simple stubs. They will be
implemented in a future commit.

As mentionned at the top, two areas are left commentted out:
* sysrq, a Linux specific mechanism to interact with the kernel at a
  very low level. There is something close in FreeBSD, but we don't need
  to worry about it in `drm_fb_helper.
* framebuffer deferred I/O, a framebuffer feature to flush the memory
  region asynchronously. We can't really implement that on FreeBSD and I
  believe we don't need to.
…r_framebuffer()`

[Why]
Before, it was scattered in several functions in `drm_fb_helper.c`,
enclosed in `#ifdef __FreeBSD__`. It made it difficult to follow how the
structure was initialized.

Now, everything is centralized in a FreeBSD-specific source file.

`register_framebuffer()` would redo the job for several struct fields
anyway.

[How]
Everything is derived from Linux' `struct fb_info`, like before.

Only a couple fields are initialized from `struct drm_fb_helper`. This
means the lower-level `register_framebuffer()` functions knows about the
upper DRM layer.
…elper`

[Why]
The goal is to make the framebuffer behavior closer to what happens on
Linux.

One major difference is that on FreeBSD, we used to pass the screen base
address to `vt_fb` during initialization, then `vt_fb` would write
directly to that memory region. This worked fine for years.
Unfortunately, this broke somehow with the backport of DRM from Linux
5.17. At that point, the screen was not refreshed anymore, even though
`vt_fb` wrote pixels to the correct memory region.

The only time the screen was refreshed, it was during a vt-switch. The
screen then displayed the content of the tty we just switched away, but
still, something happened on the screen. And that's the only operation
we called into `drm_fb_helper` from `vt_fb`.

On Linux, `fbcon` calls into `fbdev`, emulated by `drm_fb_helper` for
each access (read & write) to the framebuffer. It never reads or writes
to the memory region on its own.

[How]
Therefore we need a new integration layer to make sure vt(4) calls into
`drm_fb_helper` like Linux' fbcon does. We do this by adding a new vt(4)
backend called `vt_drmfb`, which replaces `vt_fb`.

`vt_drmfb` bridges between:
* DRM's `drm_fb_helper` emulation of Linux' fbdev below, and
* vt(4) console handling above

In other words, it maps fbdev callbacks to vt(4)'s backend expected
callbacks.

`vt_drmfb` takes care of converting the vt(4) callbacks arguments to
whatever is expected by the underlying fbdev emulation. Then, when it's
time to write to the actual framebuffer memory, the implementation of
`cfb_*` functions are derived from the `vt_fb` "setpixel" code.

`vt_drmfb` is maintained inside drm-kmod and is compiled inside drm(4).
It is not under `sys/dev/vt/hw`. The reason is that `vt_drmfb` needs to
know the `drm_fb_helper` API and must evolve with it.

`vt_drmfb` has another difference: it doesn't use a taskqueue(9) to call
into `drm_fb_helper as `vt_fb` did. It performs the calls synchronously.
`drm_fb_helper` already uses a Linux workqueue to make updates
asynchronous with the underlying driver. This simplifies the integration
a lot because the whole `fb_mode_task`-related machinery is removed from
`drm_os_freebsd.c` and `linux_fb.c`.

Because we don't need that taskqueue(9) anymore, this patch also gets
rid of the `struct vt_kms_softc` indirection. Now, the `struct
drm_fb_helper` pointed is stored directly into FreeBSD's `struct
fb_info->fb_priv`.
…f one is already installed

[Why]
Some DRM drivers consider a failure from `registry_framebuffer()` as a
fatal one.

Also in the future, we may want to support multiple framebuffers at the
same time, like Linux.
... instead of FreeBSD's `struct fb_info`.

[Why]
With the introduction of `vt_drmfb`, we try to use `struct
linux_fb_info` for every parameters and all callbacks. This is to make
sure we use the same values as Linux, just in case we forget to update
the corresponding FreeBSD's `struct fb_info`.
[Why]
At least in i915, it is NULL in my tests because the code behind it was
removed a long time ago. We didn't execute this function before the
introduction of vt_drmfb, so it didn't cause any problems.
[Why]
With vt_drmfb, I believe they are not needed anymore.
@dumbbell dumbbell force-pushed the fully-use-drm_fb_helper-on-freebsd branch from 28337ab to 63c6d03 Compare November 24, 2023 16:57
@dumbbell
Copy link
Member Author

dumbbell commented Nov 24, 2023

I think the new vt(4) integration is good enough for the time being w.r.t. panic handling. At least it's not worse than the current one in my testing.

We can try to improve it iteratively, but it shouldn't block 5.17 anymore IMHO. I'm marking this pull request as ready to review again.

@dumbbell dumbbell marked this pull request as ready for review November 24, 2023 17:06
@wulf7 wulf7 merged commit 0c5621b into freebsd:master Nov 26, 2023
1 check failed
@dumbbell dumbbell deleted the fully-use-drm_fb_helper-on-freebsd branch November 27, 2023 09:53
freebsd-git pushed a commit to freebsd/freebsd-src that referenced this pull request Dec 7, 2023
This change seems to break some drivers such as the mlx5*(4) drivers.

As kib@ says:
> According to the 'official' Linux kernel documentation, the GFP_KERNEL
> flag implies sleepable context.

It was introduced while working on the new vt(4)/DRM integration [1].
During this work, doing sleepable allocations broke vt(4) and the DRM
drivers. However, I made further improvements and some locking-related
fixed to the new integration without revisiting the need for it.

After more testing, the improvements to the integration mentionned above
seems to make the change to `GFP_KERNEL` unneeded now. I can thus
revert it to restore expectations of other drivers.

This reverts commit 14dcd40.

[1] freebsd/drm-kmod#243

Reviewed by:	kib
Approved by:	kib
Differential Revision:	https://reviews.freebsd.org/D42962
bsdjhb pushed a commit to bsdjhb/cheribsd that referenced this pull request Mar 5, 2024
This change seems to break some drivers such as the mlx5*(4) drivers.

As kib@ says:
> According to the 'official' Linux kernel documentation, the GFP_KERNEL
> flag implies sleepable context.

It was introduced while working on the new vt(4)/DRM integration [1].
During this work, doing sleepable allocations broke vt(4) and the DRM
drivers. However, I made further improvements and some locking-related
fixed to the new integration without revisiting the need for it.

After more testing, the improvements to the integration mentionned above
seems to make the change to `GFP_KERNEL` unneeded now. I can thus
revert it to restore expectations of other drivers.

This reverts commit 14dcd40.

[1] freebsd/drm-kmod#243

Reviewed by:	kib
Approved by:	kib
Differential Revision:	https://reviews.freebsd.org/D42962
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Console not refreshing
6 participants