Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pi 4 - bcm2835-codec - Driver leaks buffers because frame completion might be too aggressive #3325

Open
nemosupremo opened this issue Nov 8, 2019 · 15 comments

Comments

@nemosupremo
Copy link

Describe the bug
I have a Pi 4 running WPE Webkit v0.22.5 on Raspbian Buster. I am playing videos, in a loop, on the HTML5 browser. WPE Webkit uses gstreamer internally which uses the v4l2 stack to play hardware accelerated video. Every time the video stops running, the driver leaks 2 buffers with the message bcm2835-codec: bcm2835_codec_stop_streaming: Timeout waiting for buffers to be returned - 178 outstanding. Eventually when there are ~6000 buffers outstanding, the device becomes unstable and refuses to play anymore video. Kernel timeouts also start to happen.

To reproduce
There is no straightforward way to reproduce the issue without building WPEWebkit / Cog / Weston, which takes hours. I tried to reproduce the issue with gst-play-1.0, but I wasn't able to reproduce it.

Expected behaviour
The buffers are not leaked.

Actual behaviour
Buffers are leaked

System

  • Which model of Raspberry Pi? Pi 4
  • Which OS and version (cat /etc/rpi-issue)?
Raspberry Pi reference 2019-09-26
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 80d486687ea77d31fc3fc13cf3a2f8b464e129be, stage2
  • Which firmware version (vcgencmd version)?
Nov  1 2019 14:39:41 
Copyright (c) 2012 Broadcom
version 2faac616a157856a1d205f3a9713a7130defa643 (clean) (release) (start)
  • Which kernel version (uname -a)? Linux raspberry 4.19.81-v7l+ #1276 SMP Fri Nov 1 14:23:48 GMT 2019 armv7l GNU/Linux

Logs

[12231.713321] WARNING*: CPU: 3 PID: 672 at drivers/media/common/videobuf2/videobuf2-core.c:1688 __vb2_queue_cancel+0x158/0x1c8 \[videobuf2_common]*

\[12231.713329] Modules linked in*: binfmt_misc tun ctr aes_arm_bs crypto_simd cryptd ccm bnep hci_uart btbcm serdev bluetooth ecdh_generic 8021q garp stp llc arc4 rt2800usb rt2800lib rt2x00usb rt2x00lib mac80211 crc_ccitt vc4 v3d drm_kms_helper gpu_sched brcmfmac drm brcmutil drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine sha256_generic syscopyarea sysfillrect sysimgblt fb_sys_fops raspberrypi_hwmon hwmon snd_bcm2835(C) cfg80211 snd_pcm snd_timer rfkill snd bcm2835_codec(C) v4l2_mem2mem bcm2835_v4l2(C) bcm2835_mmal_vchiq(C) videobuf2_dma_contig v4l2_common videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media vc_sm_cma(C) rpivid_mem uio_pdrv_genirq uio fixed ip_tables x_tables ipv6*

\[12231.713851] CPU*: 3 PID: 672 Comm: ositorWorkQueue Tainted: G WC 4.19.80-v7l+ #1275*

\[12231.713857] Hardware name*: BCM2835*

\[12231.713886] *\[<c0212d10>] (unwind_backtrace) from \[<c020d530>] (show_stack+0x20/0x24)*

\[12231.713902] *\[<c020d530>] (show_stack) from \[<c0980b60>] (dump_stack+0xd4/0x118)*

\[12231.713917] *\[<c0980b60>] (dump_stack) from \[<c0222314>] (__warn+0x104/0x11c)*

\[12231.713932] *\[<c0222314>] (__warn) from \[<c0222464>] (warn_slowpath_null+0x50/0x58)*

\[12231.713960] *\[<c0222464>] (warn_slowpath_null) from \[<bf232104>] (__vb2_queue_cancel+0x158/0x1c8 \[videobuf2_common])*

\[12231.714003] *\[<bf232104>] (__vb2_queue_cancel \[videobuf2_common]) from \[<bf232ac0>] (vb2_core_streamoff+0x44/0xb4 \[videobuf2_common])*

\[12231.714040] *\[<bf232ac0>] (vb2_core_streamoff \[videobuf2_common]) from \[<bf24395c>] (vb2_streamoff+0x40/0x60 \[videobuf2_v4l2])*

\[12231.714072] *\[<bf24395c>] (vb2_streamoff \[videobuf2_v4l2]) from \[<bf2b0c9c>] (v4l2_m2m_streamoff+0x3c/0xf4 \[v4l2_mem2mem])*

\[12231.714102] *\[<bf2b0c9c>] (v4l2_m2m_streamoff \[v4l2_mem2mem]) from \[<bf2b0d74>] (v4l2_m2m_ioctl_streamoff+0x20/0x24 \[v4l2_mem2mem])*

\[12231.714177] *\[<bf2b0d74>] (v4l2_m2m_ioctl_streamoff \[v4l2_mem2mem]) from \[<bf1a7e68>] (v4l_streamoff+0x28/0x2c \[videodev])*

\[12231.714295] *\[<bf1a7e68>] (v4l_streamoff \[videodev]) from \[<bf1aa7a0>] (__video_do_ioctl+0x268/0x528 \[videodev])*

\[12231.714408] *\[<bf1aa7a0>] (__video_do_ioctl \[videodev]) from \[<bf1ae42c>] (video_usercopy+0x218/0x600 \[videodev])*

\[12231.714518] *\[<bf1ae42c>] (video_usercopy \[videodev]) from \[<bf1ae834>] (video_ioctl2+0x20/0x24 \[videodev])*

\[12231.714627] *\[<bf1ae834>] (video_ioctl2 \[videodev]) from \[<bf1a6144>] (v4l2_ioctl+0x4c/0x60 \[videodev])*

\[12231.714693] *\[<bf1a6144>] (v4l2_ioctl \[videodev]) from \[<c03c4e20>] (do_vfs_ioctl+0xbc/0x804)*

\[12231.714711] *\[<c03c4e20>] (do_vfs_ioctl) from \[<c03c55ac>] (ksys_ioctl+0x44/0x6c)*

\[12231.714727] *\[<c03c55ac>] (ksys_ioctl) from \[<c03c55ec>] (sys_ioctl+0x18/0x1c)*

\[12231.714742] *\[<c03c55ec>] (sys_ioctl) from \[<c0201000>] (ret_fast_syscall+0x0/0x28)*

\[12231.714750] *Exception stack(0xd27a1fa8 to 0xd27a1ff0)*

\[12231.714760] 1fa0*: a8d23978 a4cc5000 00000016 40045613 a05d6ab8 b237d510*

\[12231.714771] 1fc0*: a8d23978 a4cc5000 a4cc57d0 00000036 a8d15e10 b2f89000 b2f8a92c b2e96f18*

\[12231.714779] 1fe0*: b2816c60 a64fe7bc a4c8d954 b237d51c*

\[12231.714788] *---\[ end trace c769d38881409aa5 ]---*

\[12231.714799] videobuf2_common*: driver bug: stop_streaming operation is leaving buf d59a6c09 in active state*

\[12231.714811] videobuf2_common*: driver bug: stop_streaming operation is leaving buf 3cf2bfe1 in active state*

\[12275.314073] bcm2835-codec bcm2835-codec: bcm2835_codec_stop_streaming: Timeout waiting for buffers to be returned - 178 outstanding

\[12275.314260] *------------\[ cut here ]------------*

Additional context
I was able to alleviate issue by increasing the wait_for_completion_timeout to 10 * HZ. I've been running the system for several days now and I no longer see those messages.

@moritzvieli
Copy link

We see the same error message when trying to seek with a fairly simple pipeline:
uridecodebin ! queue ! kmssink

The decoder v4l2h264dec seems to be the issue.

Please also see this thread on gstreamer-devel:
http://gstreamer-devel.966125.n4.nabble.com/Seeking-with-kmssink-does-not-work-td4692965.html

Seeking is currently not possible on the Raspberry Pi 4. Would be really cool to get some feedback here.

@nemosupremo
Copy link
Author

I recently had to re-apply this patch on a couple of devices I had upgraded. For anyone finding this, I'm building an image via pi-gen and here's what I'm doing (you can probably also use it a on a running device with Raspbian):

firmware=$(zgrep "firmware as of" \
 "/usr/share/doc/raspberrypi-kernel/changelog.Debian.gz" | \
 head -n1 | sed  -n 's|.* \([^ ]*\)$|\1|p')
kernel="$(curl -s -L "https://github.com/raspberrypi/firmware/raw/$firmware/extra/git_hash")"
# We only care about the Pi4 here
uname="$(curl -s -L "https://github.com/raspberrypi/firmware/raw/$firmware/extra/uname_string7l")"
KVER="$(echo ${uname} | grep -Po '\b(Linux version )\K(?<price>[^\ ]+)' | cat)"

mkdir -p /tmp/linux
cd /tmp/linux
git init
git remote add origin https://github.com/raspberrypi/linux.git
git fetch --depth 1 origin ${kernel}
git reset --hard FETCH_HEAD
KERNEL=kernel7l
ARCH=arm
make bcmrpi_defconfig prepare
sed -i 's/wait_for_completion_timeout(&ctx->frame_cmplt, HZ)/wait_for_completion_timeout(\&ctx->frame_cmplt, 10 * HZ)/g' drivers/staging/vc04_services/bcm2835-codec/bcm2835-v4l2-codec.c
cd drivers/staging/vc04_services/bcm2835-codec/
make -C /tmp/linux M=$(pwd) modules
cp bcm2835-codec.ko /lib/modules/${KVER}/kernel/drivers/staging/vc04_services/bcm2835-codec/bcm2835-codec.ko
mkdir -p /lib/modules/${KVER}/extra
cp bcm2835-codec.ko /lib/modules/${KVER}/extra/bcm2835-codec.ko

@moritzvieli
Copy link

@nemosupremo Wow! You're my hero! This fix works! We will add it to our pi-gen scripts as well. Thanks a lot for sharing! 👍

@6by9
Copy link
Contributor

6by9 commented Sep 20, 2020

OK, I'll bump the timeout up a bit. It shouldn't take that long to flush though, so something is a bit odd.

GStreamer 1.16 includes a couple of fixes to avoid allocating a stupid number of buffers (basically filling memory) after a seek.
Likewise #3790 fixed up the kernel driver to deal with this reallocating of a different number of buffers at streamon to the number reported in queue_setiup.

@ndufresne
Copy link
Contributor

GStreamer 1.16 includes a couple of fixes to avoid allocating a stupid number of buffers (basically filling memory) after a seek.

hmm, right, sorry about that, we didn't have codec with CREATE_BUFS support when 1.14 came out, and never notice that one.

@6by9
Copy link
Contributor

6by9 commented Sep 21, 2020

hmm, right, sorry about that, we didn't have codec with CREATE_BUFS support when 1.14 came out, and never notice that one.

No issue at all.
I had the benefit of having GStreamer (and FFmpeg) provide some level of support whilst developing the codec driver. With the state of the documentation when I started, had I had no userspace implementation I would have been struggling a lot more.

AFAIK What GStreamer was doing was valid within the V4L2 API spec, just unexpected. I expected the number of buffers specified in queue_setup to be the number passed back in start_streaming, but it wasn't in this case.

@hailfinger
Copy link
Contributor

I have a similar (maybe even identical) problem with latest GStreamer 1.19 and latest Kernel 5.10.17 in bug #4194 .

@jessesung
Copy link
Contributor

Hi,

I'm still able to reproduce the issue with the current 2-second timeout, and this looks like a firmware issue IMHO.

  1. bcm2835-v4lc-codec.c:bcm2835_codec_stop_streaming() waits on ctx->frame_cmplt, which in turn depends on bcm2835-v4lc-codec.c:op_buffer_cb()/ip_buffer_cb() to trigger the complete event.
  2. while we expect bcm2835-v4lc-codec.c:op_buffer_cb()/ib_buffer_cb() to be called by mmal-vchiq.c:buffer_work_cb(), the buffer_work_cb() doesn't get called although there are still buffers hanging around.
  3. Seems mmal-vchiq.c:buffer_work_cb() would only get called when there's a message delivered from the firmware.

Also tried with timeout as 10, 20, and even 30 seconds, and they don't really help with the issue.

Tested with some newer version firmware, the issue can still be reproduced:

  • Commit ac362357ef9
  • Commit 7208c3d557c
  • Commit afdb60cc1a8
  • Commit 1a468744941
  • Commit 8717598c7eb

@6by9
Copy link
Contributor

6by9 commented Sep 14, 2021

There is no leak when that WARN triggers. Videobuf2 cleans up all the buffers that are in the wrong state on behalf of the driver.
https://elixir.bootlin.com/linux/latest/source/drivers/media/common/videobuf2/videobuf2-core.c#L1950

@timemaster5
Copy link

Important to mention is the Raspberry Pi version.

I've been looking into this issue for a while and found out some people have resolved this either by upgrading firmware/kernel or by increasing the timeout in the codecs driver or increasing the overvoltage settings. And some other people have never resolved this issue in any of the mentioned ways.

The difference between them is that the problem seems to be resolvable on Rpi 4 but not on Rpi3+ or Compute Module 3+.

I am sure this is why this bug persisted opened for two years. It could even be a hardware issue, but when I use userland, all is fine. So I guess it is only some undocumented quirk of the BRCM chip.

@ibash
Copy link

ibash commented Dec 27, 2021

Is this the same error?

https://gist.github.com/ibash/bdbdfac2a1b46a406f681460e122aadc

On a pi4:

$ uname -a
Linux duet3 5.10.63-v7l+ #1496 SMP Wed Dec 1 15:58:56 GMT 2021 armv7l GNU/Linux
$ cat /etc/rpi-issue
Raspberry Pi reference 2021-07-12
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, d59f5a5313d15919e7a438ca3a1bfa2689effcd5, stage-dsf

vcgencmd version is unresponsive

@6by9
Copy link
Contributor

6by9 commented Dec 28, 2021

#4765 should deal with the leak, but not merged yet.

@6by9
Copy link
Contributor

6by9 commented Dec 28, 2021

@ibash It's the same error being reported, but for different reasons.
bcm2835-v4l2-0: port_disable failed, error -62 implies something more significant went wrong on the firmware side, and with the camera stack rather than the codec.
Best to open a new issue, but please include information on what format you were streaming with, which camera module, and which variant of Pi.

@timemaster5
Copy link

Hi @ibash and @6by9 , thank's for reporting. My understanding is that it is the same error. It just doesn't matter where the stream comes from, whether it was from the camera or from the file. From hw codecs perspective it is the same workload.

I want to try the patch mentioned above. It seems interesting.

@pelwell
Copy link
Contributor

pelwell commented Dec 28, 2021

@6by9 Happy to merge #4765 before feedback from John if you are confident in it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants