crash in vc4 drm_atomic_helper_wait_for_vblank #66
Comments
I've also seen this periodically and don't have any good ideas yet. |
This looks like a timing issue. So the first question would be: does the vc4 driver only exceed the timeout by a certain amount of time, or is it possible that the driver would wait forever? I suggest adding a second wait_event_timeout() with a timeout of 1 second to drm_atomic_helper_wait_for_vblanks in case the first call fails. If the second wait_event_timeout() also fails, it's possible that the abort condition is never met in some situations. But first of all we need a better scenario to reproduce the issue. |
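To make the suggested two-stage wait concrete, here is a minimal userspace sketch of the decision logic only. It is not kernel code: `sim_wait_event_timeout()` and `classify_vblank_wait()` are hypothetical names, the 50 ms first timeout is an assumption about the helper at the time, and the 1000 ms second timeout is the value proposed above. The stand-in mimics the return convention of `wait_event_timeout()` (0 on timeout, otherwise the remaining time, at least 1).

```c
#include <assert.h>

/* Assumed timeouts: 50 ms for the existing wait (an assumption, not taken
 * from the thread), 1000 ms for the suggested second wait. */
#define FIRST_TIMEOUT_MS   50
#define SECOND_TIMEOUT_MS  1000

enum vblank_result { VBLANK_OK, VBLANK_LATE, VBLANK_NEVER };

/*
 * Hypothetical userspace stand-in for wait_event_timeout(): returns 0 if the
 * event has not happened when the timeout expires, otherwise the remaining
 * time (at least 1). event_ms < 0 models a vblank that never arrives.
 */
static long sim_wait_event_timeout(long start_ms, long event_ms, long timeout_ms)
{
	long deadline = start_ms + timeout_ms;
	long remaining;

	assert(timeout_ms > 0);
	if (event_ms < 0 || event_ms > deadline)
		return 0;
	remaining = deadline - (event_ms > start_ms ? event_ms : start_ms);
	return remaining > 0 ? remaining : 1;
}

/*
 * Two-stage wait: if the first (short) wait times out, wait again with the
 * long timeout to tell "vblank arrived late" from "vblank never arrived".
 */
static enum vblank_result classify_vblank_wait(long event_ms)
{
	if (sim_wait_event_timeout(0, event_ms, FIRST_TIMEOUT_MS))
		return VBLANK_OK;
	/* The second wait starts where the first one gave up. */
	if (sim_wait_event_timeout(FIRST_TIMEOUT_MS, event_ms, SECOND_TIMEOUT_MS))
		return VBLANK_LATE;
	return VBLANK_NEVER;
}
```

In this model, calling `classify_vblank_wait()` with the time at which the vblank arrives (or a negative value for "never") separates the two cases the question asks about. A `VBLANK_LATE` result would point at a too-short timeout, while `VBLANK_NEVER` would suggest the wait condition is never satisfied.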
I get the same output during boot with 4.10.0-next-20170303+ on a Raspberry Pi Zero. Used config: bcm2835_defconfig + CONFIG_CMA + CONFIG_DMA_CMA |
Okay, Michael's patch seems to fix this issue. |
I've queued this for various Fedora kernels, and it's in -next already but it might be worthwhile getting it tagged for stable@ |
Before sending to stable it would be necessary to identify the introducing commit, so we can add a fixes tag and name the relevant kernel versions. |
@lategoodbye not sure how that cursor patch would help, since the backtrace here is from a general setcrtc. |
Still seeing this in Fedora with 4.11 rc1. dmesg with drm.debug=0x1e attached. |
Still present with Fedora 4.11.0-0.rc6.git0.1.fc26.armv7h. Happens whenever vc4 is not blacklisted. |
@ryniker can you please provide details of how this is happening? It's hardware related, so a "me too" is useless. Details of the monitor, the connecting interface, etc. are useful, as is updated crash output for the appropriate kernel. |
@ryniker Could you please also provide the exact stacktrace and test the mentioned patch (which is only in linux-next)? |
There does seem to be a hardware issue here. My monitor is a Dell
P2415Qb connected by an HDMI cable to the Raspberry Pi. The monitor
wants to display 3840x2160, but of course it can accommodate lower
resolutions.
I tried with an older Viewsonic VX2235wm monitor that cannot support such
a high resolution. The kernel fault did not occur. To be more accurate,
this (issue #66) kernel fault did not occur; a different fault (not
related to the vc4 module) occurred at a later time.
The complete log for connection to the Dell monitor is available here:
http://ryniker.org/Fedora/arm/log_04
The first backtrace (it happens multiple times) is copied below.
The log for connection to the Viewsonic monitor is here:
http://ryniker.org/Fedora/arm/log_03
Peter, do you know if there is already a Fedora 26 build with this patch:
https://patchwork.kernel.org/patch/9589651/
I suspect answering this question requires koji-fu beyond my ken.
I did experience delays when I used the Viewsonic monitor that matched
what [email protected] described as fixed by this patch.
…________________________________________________________________________________
Mar 01 16:43:27 rpi3-2 kernel: ------------[ cut here ]------------
Mar 01 16:43:27 rpi3-2 kernel: WARNING: CPU: 0 PID: 476 at drivers/gpu/drm/drm_atomic_helper.c:1122 drm_atomic_helper_wait_for_vblanks+0xdc/0x1e8 [drm_kms_helper]
Mar 01 16:43:27 rpi3-2 kernel: [CRTC:65] vblank wait timed out
Mar 01 16:43:27 rpi3-2 kernel: Modules linked in: vc4(+) snd_soc_core ac97_bus snd_pcm_dmaengine snd_seq snd_seq_device joydev snd_pcm snd_timer brcmfmac snd soundcore brcmutil drm_kms_helper cfg80211 drm rfkill fb_sys_fops syscopyarea sysfillrect sysimgblt bcm2835_rng bcm2835_dma bcm2835_wdt leds_gpio nfsd auth_rpcgss nfs_acl lockd grace sunrpc hid_logitech_hidpp hid_logitech_dj smsc95xx usbnet mii mmc_block dwc2 udc_core sdhci_iproc sdhci_pltfm sdhci pwm_bcm2835 bcm2835 mmc_core i2c_bcm2835
Mar 01 16:43:27 rpi3-2 kernel: CPU: 0 PID: 476 Comm: systemd-udevd Not tainted 4.11.0-0.rc7.git0.1.fc26.armv7hl #1
Mar 01 16:43:27 rpi3-2 kernel: Hardware name: Generic DT based system
Mar 01 16:43:27 rpi3-2 kernel: [<c0311700>] (unwind_backtrace) from [<c030c430>] (show_stack+0x18/0x1c)
Mar 01 16:43:27 rpi3-2 kernel: [<c030c430>] (show_stack) from [<c06394a4>] (dump_stack+0x80/0xa0)
Mar 01 16:43:27 rpi3-2 kernel: [<c06394a4>] (dump_stack) from [<c034c8b4>] (__warn+0xe4/0x104)
Mar 01 16:43:27 rpi3-2 kernel: [<c034c8b4>] (__warn) from [<c034c910>] (warn_slowpath_fmt+0x3c/0x4c)
Mar 01 16:43:27 rpi3-2 kernel: [<c034c910>] (warn_slowpath_fmt) from [<bf46d3f0>] (drm_atomic_helper_wait_for_vblanks+0xdc/0x1e8 [drm_kms_helper])
Mar 01 16:43:27 rpi3-2 kernel: [<bf46d3f0>] (drm_atomic_helper_wait_for_vblanks [drm_kms_helper]) from [<bf6087c8>] (vc4_atomic_complete_commit+0x5c/0xb4 [vc4])
Mar 01 16:43:27 rpi3-2 kernel: [<bf6087c8>] (vc4_atomic_complete_commit [vc4]) from [<bf608a20>] (vc4_atomic_commit+0x200/0x210 [vc4])
Mar 01 16:43:27 rpi3-2 kernel: [<bf608a20>] (vc4_atomic_commit [vc4]) from [<bf470b88>] (restore_fbdev_mode+0x98/0x268 [drm_kms_helper])
Mar 01 16:43:27 rpi3-2 kernel: [<bf470b88>] (restore_fbdev_mode [drm_kms_helper]) from [<bf471dd0>] (drm_fb_helper_restore_fbdev_mode_unlocked+0x38/0x70 [drm_kms_helper])
Mar 01 16:43:27 rpi3-2 kernel: [<bf471dd0>] (drm_fb_helper_restore_fbdev_mode_unlocked [drm_kms_helper]) from [<bf471e5c>] (drm_fb_helper_set_par+0x54/0x64 [drm_kms_helper])
Mar 01 16:43:27 rpi3-2 kernel: [<bf471e5c>] (drm_fb_helper_set_par [drm_kms_helper]) from [<c06d43dc>] (fbcon_init+0x2c8/0x44c)
Mar 01 16:43:27 rpi3-2 kernel: [<c06d43dc>] (fbcon_init) from [<c074597c>] (visual_init+0xc4/0x114)
Mar 01 16:43:27 rpi3-2 kernel: [<c074597c>] (visual_init) from [<c07471b4>] (do_bind_con_driver+0x26c/0x2d8)
Mar 01 16:43:27 rpi3-2 kernel: [<c07471b4>] (do_bind_con_driver) from [<c0747590>] (do_take_over_console+0x16c/0x1a0)
Mar 01 16:43:27 rpi3-2 kernel: [<c0747590>] (do_take_over_console) from [<c06d45b8>] (do_fbcon_takeover+0x58/0xc0)
Mar 01 16:43:27 rpi3-2 kernel: [<c06d45b8>] (do_fbcon_takeover) from [<c036c420>] (notifier_call_chain+0x48/0x6c)
Mar 01 16:43:27 rpi3-2 kernel: [<c036c420>] (notifier_call_chain) from [<c036c874>] (__blocking_notifier_call_chain+0x48/0x60)
Mar 01 16:43:27 rpi3-2 kernel: [<c036c874>] (__blocking_notifier_call_chain) from [<c036c8a8>] (blocking_notifier_call_chain+0x1c/0x24)
Mar 01 16:43:27 rpi3-2 kernel: [<c036c8a8>] (blocking_notifier_call_chain) from [<c06dd8e8>] (register_framebuffer+0x230/0x274)
Mar 01 16:43:27 rpi3-2 kernel: [<c06dd8e8>] (register_framebuffer) from [<bf471b18>] (drm_fb_helper_initial_config+0x16c/0x344 [drm_kms_helper])
Mar 01 16:43:27 rpi3-2 kernel: [<bf471b18>] (drm_fb_helper_initial_config [drm_kms_helper]) from [<bf47235c>] (drm_fbdev_cma_init_with_funcs+0xb8/0xf4 [drm_kms_helper])
Mar 01 16:43:27 rpi3-2 kernel: [<bf47235c>] (drm_fbdev_cma_init_with_funcs [drm_kms_helper]) from [<bf608ae4>] (vc4_kms_load+0x90/0xb0 [vc4])
Mar 01 16:43:27 rpi3-2 kernel: [<bf608ae4>] (vc4_kms_load [vc4]) from [<bf604400>] (vc4_drm_bind+0xe4/0x128 [vc4])
Mar 01 16:43:27 rpi3-2 kernel: [<bf604400>] (vc4_drm_bind [vc4]) from [<c0780a00>] (try_to_bring_up_master+0x1ec/0x254)
Mar 01 16:43:27 rpi3-2 kernel: [<c0780a00>] (try_to_bring_up_master) from [<c0781044>] (component_master_add_with_match+0xbc/0xe8)
Mar 01 16:43:27 rpi3-2 kernel: [<c0781044>] (component_master_add_with_match) from [<bf6044b8>] (vc4_platform_drm_probe+0x74/0xb0 [vc4])
Mar 01 16:43:27 rpi3-2 kernel: [<bf6044b8>] (vc4_platform_drm_probe [vc4]) from [<c07883cc>] (platform_drv_probe+0x58/0xa4)
Mar 01 16:43:27 rpi3-2 kernel: [<c07883cc>] (platform_drv_probe) from [<c0786520>] (driver_probe_device+0x274/0x40c)
Mar 01 16:43:27 rpi3-2 kernel: [<c0786520>] (driver_probe_device) from [<c0786740>] (__driver_attach+0x88/0xf8)
Mar 01 16:43:27 rpi3-2 kernel: [<c0786740>] (__driver_attach) from [<c07845fc>] (bus_for_each_dev+0x84/0x94)
Mar 01 16:43:27 rpi3-2 kernel: [<c07845fc>] (bus_for_each_dev) from [<c0785968>] (bus_add_driver+0x1bc/0x23c)
Mar 01 16:43:27 rpi3-2 kernel: [<c0785968>] (bus_add_driver) from [<c0787330>] (driver_register+0xa8/0xe8)
Mar 01 16:43:27 rpi3-2 kernel: [<c0787330>] (driver_register) from [<c0301d64>] (do_one_initcall+0x12c/0x154)
Mar 01 16:43:27 rpi3-2 kernel: [<c0301d64>] (do_one_initcall) from [<c044e240>] (do_init_module+0x60/0x1d8)
Mar 01 16:43:27 rpi3-2 kernel: [<c044e240>] (do_init_module) from [<c03d74c4>] (load_module+0x2120/0x21bc)
Mar 01 16:43:27 rpi3-2 kernel: [<c03d74c4>] (load_module) from [<c03d779c>] (SyS_finit_module+0xb0/0xc4)
Mar 01 16:43:27 rpi3-2 kernel: [<c03d779c>] (SyS_finit_module) from [<c0307f3c>] (__sys_trace_return+0x0/0x10)
Mar 01 16:43:27 rpi3-2 kernel: ---[ end trace 8a938154436ca048 ]---
|
We've had the "drm: vc4: Don't wait for vblank when updating the cursor" patch since March 1st; it's helped in a small subset of cases. |
I'm seeing the crash with a fairly generic Samsung UE50F6500 1080 TV. Looking through the text file, there are 3-4 different backtraces that all end up in the drm_atomic_helper_wait_for_vblanks crash. It also seems somewhat random: I wrote out a second identical card and booted it, and it worked on this TV, same RPi3 etc. This image booted fine on a RPi3, then fine on a RPi2, and then crashed when put into the RPi3. After resetting it, it then booted fine. It seems very timing based. |
With a RPi3 on the Samsung UE50F6500 1080 TV it's pretty much 50/50 whether we crash or it works. The alignment though is terrible. |
Today I was able to reproduce this (or a similar) issue on my Raspberry Pi Zero with an HP ZR2440w and a 3-port HDMI switch. Scenario:
It never crashed the system, but the display stays black in this state. Unfortunately I wasn't able to reproduce it every time (maybe 3 of 12 cases), and the kernel is a little bit older: 4.11.0-rc5-next-20170404+. Here are the traces:
|
wait4(-2147483648, 0x20, 0, 0xdd0000) triggers:
UBSAN: Undefined behaviour in kernel/exit.c:1651:9
The related calltrace is as follows:
negation of -2147483648 cannot be represented in type 'int':
CPU: 9 PID: 16482 Comm: zj Tainted: G B ---- ------- 3.10.0-327.53.58.71.x86_64+ #66
Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011
Call Trace:
dump_stack+0x19/0x1b
ubsan_epilogue+0xd/0x50
__ubsan_handle_negate_overflow+0x109/0x14e
SyS_wait4+0x1cb/0x1e0
system_call_fastpath+0x16/0x1b
Exclude the overflow to avoid the UBSAN warning.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: zhongjiang <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Xishi Qiu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
@nullr0ute Is this issue still reproducible with Fedora 29? |
This one seems to pop up randomly every now and again. We can probably close it, but I've no idea whether it's definitely fixed. |
Seeing this crash with Fedora 25 on GNOME Workstation. The monitor is a Haier HL22KN1.
Kernel is the Fedora 4.8.1-301 kernel, which is vanilla with some upstream vc4/clock patches.
The vc4/clock patches on top of 4.8.4 are:
http://pkgs.fedoraproject.org/cgit/rpms/kernel.git/tree/bcm283x-vc4-fixes.patch?h=f25&id=08645910f608325614c3a3da1e517c51ee4b2b19