
[BUG] Calling switch_mode_and_capture_file repeatedly eventually causes OSError: [Errno 12] Cannot allocate memory #1130

Open
Snelso91 opened this issue Oct 12, 2024 · 18 comments

Comments

@Snelso91

Describe the bug
I have a Python program called bambu_rpi_webcam.py that runs an MJPEG stream 24/7 in video mode and occasionally switches to still mode to take a full-resolution still for a timelapse, whenever a signal is received from Home Assistant via MQTT.
The MJPEG server functionality is copied from the example code mjpeg_server_2.py.

To alternate between streaming MJPEG and taking high-resolution stills, I have two configs set up:
A lower resolution (1480, 1080) 30fps video config:

video_config = picam2.create_video_configuration(
    main={"size": video_resolution},
    controls={"AfMode": controls.AfModeEnum.Manual,
              "LensPosition": lens_position})

A high resolution (3552, 2592) stills config:

still_config = picam2.create_still_configuration(
    main={"size": still_resolution},
    controls={"AfMode": controls.AfModeEnum.Manual,
              "LensPosition": lens_position,
              "NoiseReductionMode": controls.draft.NoiseReductionModeEnum.HighQuality})

To switch quickly from the MJPEG mode to the stills mode, I use the following function, which is called by a callback when the MQTT signal is received:

def take_frame():
    start_time = time.time() 
    picam2.stop_encoder(encoder)

    filename = "{}{:0>4}.jpg".format(frames_path, layer_num)
    print("frame filename = \"{}\"".format(filename))
    picam2.switch_mode_and_capture_file(still_config, filename)

    print("take frame elapsed time = {:.2f}".format(time.time() - start_time))
    picam2.start_encoder(encoder)

So you can see the core of this function is to:

  1. Stop the MJPEG encoder
  2. Call switch_mode_and_capture_file to quickly switch to the stills config, take the image and then switch back
  3. Re-start the MJPEG encoder again.

This function works completely fine the first few times it is called. However, after approx. 5 hours of running the program, and about 267 calls to this function from the MQTT handler, the program inevitably locks up silently with the error:
OSError: [Errno 12] Cannot allocate memory
The full traceback below shows that the error is raised when switch_mode_and_capture_file is called.

To try to fix this error, I followed advice from #1102 and updated my OS using:

sudo apt update
sudo apt full-upgrade

I also updated picamera2 and others using:

sudo apt upgrade

However, neither of these changes got rid of the problem.

In #1102, I also saw a mention of a potential solution of commenting out this line in /lib/udev/rules.d/60-dma-heap.rules:
SUBSYSTEM=="dma_heap", KERNEL=="linux,cma", SYMLINK+="dma_heap/vidbuf_cached", OPTIONS+="link_priority=-50"

And in #1125, I saw a mention of using a "persistent allocator" to solve the problem.

Since these were two different solutions for two different bugs, I wasn't sure whether either was the correct fix for my exact issue.
In addition, I saw a comment #1102 (comment) saying that #1102 was specific to the Pi 5, and that a new bug report should be opened for devices that aren't the Pi 5.
For these two reasons I have opened this new issue.

To Reproduce
I have found it hard to reproduce the behaviour on demand with a single test script. I tried this test Python script, but even with 10,000 iterations I could not force the bug to show up:
test_stability.py

Whereas if I let the original bambu_rpi_webcam.py program run for 5 hours, only 267 iterations are enough to trigger the bug.

Expected behaviour
I was expecting switch_mode_and_capture_file to be able to switch back and forth between video and still modes indefinitely without crashing due to a memory leak.
Clearly the problem is not a general lack of memory, otherwise it would not work even the first time; the issue only appears after many iterations of switch_mode_and_capture_file.

Console Output, Screenshots
The traceback when the error occurs is as follows:

Oct 11 21:53:31 p1s-cam python[410]: Exception in thread Thread-2 (_thread_main):
Oct 11 21:53:31 p1s-cam python[410]: Traceback (most recent call last):
Oct 11 21:53:31 p1s-cam python[410]:   File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
Oct 11 21:53:31 p1s-cam python[410]:     self.run()
Oct 11 21:53:31 p1s-cam python[410]:   File "/usr/lib/python3.11/threading.py", line 975, in run
Oct 11 21:53:31 p1s-cam python[410]:     self._target(*self._args, **self._kwargs)
Oct 11 21:53:31 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/paho/mqtt/client.py", line 3591, in _thread_main
Oct 11 21:53:31 p1s-cam python[410]:     self.loop_forever(retry_first_connection=True)
Oct 11 21:53:31 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/paho/mqtt/client.py", line 1756, in loop_forever
Oct 11 21:53:31 p1s-cam python[410]:     rc = self._loop(timeout)
Oct 11 21:53:31 p1s-cam python[410]:          ^^^^^^^^^^^^^^^^^^^
Oct 11 21:53:31 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/paho/mqtt/client.py", line 1164, in _loop
Oct 11 21:53:31 p1s-cam python[410]:     rc = self.loop_read()
Oct 11 21:53:31 p1s-cam python[410]:          ^^^^^^^^^^^^^^^^
Oct 11 21:53:31 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/paho/mqtt/client.py", line 1556, in loop_read
Oct 11 21:53:31 p1s-cam python[410]:     rc = self._packet_read()
Oct 11 21:53:31 p1s-cam python[410]:          ^^^^^^^^^^^^^^^^^^^
Oct 11 21:53:31 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/paho/mqtt/client.py", line 2439, in _packet_read
Oct 11 21:53:31 p1s-cam python[410]:     rc = self._packet_handle()
Oct 11 21:53:31 p1s-cam python[410]:          ^^^^^^^^^^^^^^^^^^^^^
Oct 11 21:53:31 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/paho/mqtt/client.py", line 3033, in _packet_handle
Oct 11 21:53:31 p1s-cam python[410]:     return self._handle_publish()
Oct 11 21:53:31 p1s-cam python[410]:            ^^^^^^^^^^^^^^^^^^^^^^
Oct 11 21:53:31 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/paho/mqtt/client.py", line 3327, in _handle_publish
Oct 11 21:53:32 p1s-cam python[410]:     self._handle_on_message(message)
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/paho/mqtt/client.py", line 3570, in _handle_on_message
Oct 11 21:53:32 p1s-cam python[410]:     on_message(self, self._userdata, message)
Oct 11 21:53:32 p1s-cam python[410]:   File "/home/user/bambu_rpi_webcam/bambu_rpi_webcam.py", line 207, in my_callback
Oct 11 21:53:32 p1s-cam python[410]:     take_frame()
Oct 11 21:53:32 p1s-cam python[410]:   File "/home/user/bambu_rpi_webcam/bambu_rpi_webcam.py", line 143, in take_frame
Oct 11 21:53:32 p1s-cam python[410]:     picam2.switch_mode_and_capture_file(still_config, filename)
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1458, in switch_mode_and_capture_file
Oct 11 21:53:32 p1s-cam python[410]:     return self.dispatch_functions(functions, wait, signal_function, immediate=True)
Oct 11 21:53:32 p1s-cam python[410]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1354, in dispatch_functions
Oct 11 21:53:32 p1s-cam python[410]:     return job.get_result(timeout=timeout) if wait else job
Oct 11 21:53:32 p1s-cam python[410]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/picamera2/job.py", line 79, in get_result
Oct 11 21:53:32 p1s-cam python[410]:     return self._future.result(timeout=timeout)
Oct 11 21:53:32 p1s-cam python[410]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3.11/concurrent/futures/_base.py", line 456, in result
Oct 11 21:53:32 p1s-cam python[410]:     return self.__get_result()
Oct 11 21:53:32 p1s-cam python[410]:            ^^^^^^^^^^^^^^^^^^^
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
Oct 11 21:53:32 p1s-cam python[410]:     raise self._exception
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/picamera2/job.py", line 48, in execute
Oct 11 21:53:32 p1s-cam python[410]:     done, result = self._functions[0]()
Oct 11 21:53:32 p1s-cam python[410]:                    ^^^^^^^^^^^^^^^^^^^^
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1451, in capture_and_switch_back_
Oct 11 21:53:32 p1s-cam python[410]:     self.switch_mode_(preview_config)
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1421, in switch_mode_
Oct 11 21:53:32 p1s-cam python[410]:     self.configure_(camera_config)
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/picamera2/picamera2.py", line 1109, in configure_
Oct 11 21:53:32 p1s-cam python[410]:     self.allocator.allocate(libcamera_config, camera_config.get("use_case"))
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/picamera2/allocators/dmaallocator.py", line 43, in allocate
Oct 11 21:53:32 p1s-cam python[410]:     fd = self.dmaHeap.alloc(f"picamera2-{i}", stream_config.frame_size)
Oct 11 21:53:32 p1s-cam python[410]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 11 21:53:32 p1s-cam python[410]:   File "/usr/lib/python3/dist-packages/picamera2/dma_heap.py", line 98, in alloc
Oct 11 21:53:32 p1s-cam python[410]:     ret = fcntl.ioctl(self.__dmaHeapHandle.get(), DMA_HEAP_IOCTL_ALLOC, alloc)
Oct 11 21:53:32 p1s-cam python[410]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 11 21:53:32 p1s-cam python[410]: OSError: [Errno 12] Cannot allocate memory

Below is the full journalctl log of the program up until the point I noticed it was no longer working because of the memory error:
journalctl.log

Hardware:
Raspberry Pi Zero 2 W
Raspberry Pi Camera Module V3 Wide

Additional context
OS: Raspberry Pi OS Bookworm
OS version (result of uname -a):
Linux p1s-cam 6.6.51+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64 GNU/Linux

picamera2 version (result of dpkg -s python3-picamera2 | grep Version):
0.3.22-2

As per #1102:
Result of ls -l /dev/dma_heap:

total 0
crw-rw---- 1 root video 253, 1 Oct 11 16:53 linux,cma
crw-rw---- 1 root video 253, 0 Oct 11 16:53 system
lrwxrwxrwx 1 root root       9 Oct 11 16:53 vidbuf_cached -> linux,cma

Contents of /lib/udev/rules.d/60-dma-heap.rules:

SUBSYSTEM=="dma_heap", GROUP="video", MODE="0660"
SUBSYSTEM=="dma_heap", KERNEL=="system", PROGRAM="/bin/grep -q \"^Revision\s*:\s*[ 123][0-9a-fA-F][0-9a-fA-F]4[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]$$\" /proc/cpuinfo", SYMLINK+="dma_heap/vidbuf_cached"
SUBSYSTEM=="dma_heap", KERNEL=="linux,cma", SYMLINK+="dma_heap/vidbuf_cached", OPTIONS+="link_priority=-50"
@davidplowman
Collaborator

I wonder if this is the same problem as #1125.

You could try the workaround mentioned there (using the PersistentAllocator), which may be feasible if you have just a couple of configurations that you are switching between. Of course, I realise that on a Pi Zero 2 you may be rather memory constrained.
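For illustration, the idea behind the persistent allocator can be sketched generically. The class below is a hypothetical stand-in, not the real picamera2 API: it caches buffers per use case ("video", "still") so that repeated mode switches reuse the same allocations instead of freeing and reallocating from the DMA heap every time.

```python
# Hypothetical sketch of the persistent-allocation idea: buffers are cached
# per use case and reused on every switch, so repeated switches cannot
# fragment or exhaust the underlying heap. Not the real picamera2 classes.

class PersistentAllocatorSketch:
    def __init__(self, alloc_fn):
        self._alloc_fn = alloc_fn      # stand-in for the real DMA-heap alloc
        self._cache = {}               # use_case -> list of buffers

    def allocate(self, use_case, sizes):
        cached = self._cache.get(use_case)
        if cached is not None and [len(b) for b in cached] == sizes:
            return cached              # reuse: no new allocation performed
        buffers = [self._alloc_fn(size) for size in sizes]
        self._cache[use_case] = buffers
        return buffers

alloc_count = 0
def fake_dma_alloc(size):
    """Stand-in for the DMA_HEAP_IOCTL_ALLOC ioctl in dma_heap.py."""
    global alloc_count
    alloc_count += 1
    return bytearray(size)

allocator = PersistentAllocatorSketch(fake_dma_alloc)
for _ in range(100):                   # 100 simulated mode switches...
    allocator.allocate("video", [1480 * 1080 * 3])
    allocator.allocate("still", [3552 * 2592 * 3])
print(alloc_count)                     # only 2 real allocations ever happen
```

With the default allocator the equivalent loop would perform 200 allocations, which is why leaks or fragmentation in the heap path only show up after many iterations.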

@Snelso91
Author

@davidplowman thanks, I'll try that soon, although I was curious whether commenting out the line in /lib/udev/rules.d/60-dma-heap.rules would be better for fixing the root cause of the memory leak, as mentioned in #1102?

Since you mentioned in #1125 that PersistentAllocator would be more appropriate for switching back and forth between 2 configs, I had a related performance question:

If my goal is to switch from the video config to the still config, capture a JPEG, and then switch back to the video config, is switch_mode_and_capture_file the quickest way to do that?
By quickest I specifically mean the total time it takes to capture the raw image data (not how long the JPEG takes to be written), since once the image data is in RAM the image is "locked in" and can't be changed further by things moving in the camera's FOV.

For example, if the pipeline of switch_mode_and_capture_file is something like Raw Image Data -> Memory -> JPEG Encoder -> JPEG file, then I would expect that that probably would be the quickest method since the raw image data is captured first without any delay.

But if it is something like Raw Image Data -> JPEG Encoder -> JPEG file, then the time to capture the frame data depends on the slow JPEG encoder instead of a faster operation like simply storing the image data in memory?

I'm asking because there is currently a delay in the chain: the printer sends a signal to capture a timelapse frame, the HA server receives it and sends an MQTT packet, and finally the Pi receives the packet and takes some time to switch modes and capture the frame.

So if I could optimise the last part, the time it takes to capture the image data (not the JPEG), that would reduce the total time the printer needs to stay parked (since the printer can't receive feedback from the Pi, it just waits in the park area for a fixed amount of time for the signal to propagate and the Pi to take an image).

@davidplowman
Collaborator

On anything other than a Pi 5, the vidbuf_cached heap must point to the linux,cma heap.

If you're going to do a mode switch for the capture, then switch_mode_and_capture_file is probably best. Using the PersistentAllocator might improve it slightly, because you don't have to deallocate and reallocate buffers all the time.

Note that stopping the camera and switching to a different mode is relatively time-consuming; normally I would expect a few hundred milliseconds. The other option would be to run permanently in the capture mode - if you can run at 10 or 15fps then that's a much lower latency. The catch is that you might not have enough memory. Running with 2 buffers might be enough for occasional captures (you'd have to try it), otherwise you'd need 3. If you don't need a preview then you could use 24-bit RGB instead of 32-bit to save some space. You might even be able to use YUV420 (even less memory), though you'd probably need a software conversion to RGB for saving as JPEG (OpenCV has a routine for that).
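To put rough numbers on that memory trade-off, here is the simple per-buffer arithmetic at the capture resolution (this ignores row-stride alignment, which pads real buffers slightly):

```python
# Approximate per-frame buffer sizes at the still-capture resolution.
# Real buffers are slightly larger due to row-stride alignment.
w, h = 3552, 2592

xrgb8888 = w * h * 4          # 32-bit formats (the preview/video default)
rgb888   = w * h * 3          # 24-bit RGB (BGR888/RGB888)
yuv420   = w * h * 3 // 2     # planar YUV420: full-res Y + quarter-res U and V

for name, size in [("XRGB8888", xrgb8888),
                   ("RGB888", rgb888),
                   ("YUV420", yuv420)]:
    print(f"{name}: {size / 1e6:.1f} MB per buffer")
```

So each extra still buffer costs roughly 37 MB in 32-bit RGB, 28 MB in 24-bit RGB, and only 14 MB in YUV420, which matters a lot on a 512 MB Pi Zero 2.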

@Snelso91
Author

Snelso91 commented Oct 14, 2024

@davidplowman ok thanks, I'll leave the switch_mode_and_capture_file as-is for now and start using PersistentAllocator.
The modified code hasn't had an issue when using the PersistentAllocator yet, but I've only run it for a few hours so I'll have to wait a bit longer to say definitively whether the workaround worked.

Regarding switching speed, a while ago I did time how long it took to run:

picam2.stop_encoder(encoder)
picam2.switch_mode_and_capture_file(still_config, filename)

And it usually came out around 1.5 secs (with some variance) before using PersistentAllocator, but I know that this includes the time to create the jpg image from the raw data.

So are you implying that the time to run stop_encoder(), switch modes, and capture the image into RAM is usually less than 1 sec?
In other words, if the difference between capturing an image to RAM after stopping the encoder and switching config, versus staying in the high-res config and capturing straight to RAM, is under 0.5 sec, then it would probably be fine to keep the dual video and stills configs in order to get the higher-framerate video.
This is because the total delay for the whole signal path is currently on the order of 5.15 secs (the vast majority comes from the printer being slow to send MQTT messages to HA, as far as I can tell), so shaving 0.5 secs off that might be worth less than losing smooth video.

Also, the dual-config approach would let me use features such as HQ noise reduction for the still images while keeping normal fast denoising for the video mode, whereas with a single config I would probably have to use fast denoising to get a decent video framerate, at the expense of denoising quality for the stills?

Regarding memory, I currently don't explicitly set the image format in either config, but I assumed that, because I never configure or request a preview, it would automatically select either RGB24 or YUV?

If not, then I suppose setting RGB24 for both would be best, or would setting YUV for the video mode be better?
Obviously YUV uses less memory than RGB24, but from what I understand you throw away chroma data to achieve that, whereas going from RGB32 to RGB24 loses no information, since an image from a camera doesn't contain alpha data, right?

Although maybe for MJPEG, using YUV makes no quality difference, since the MJPEG stream isn't the best quality to begin with (some blockiness) when using the H/W encoder?

And regarding the actual image format for RGB24, does it make any difference for the still jpeg config, or the video MJPEG config whether I select BGR888, or RGB888?
If it doesn't make a difference, is there a "normal" or default selection that I should choose out of those 2 options?

@davidplowman
Collaborator

> @davidplowman ok thanks, I'll leave the switch_mode_and_capture_file as-is for now and start using PersistentAllocator. The modified code hasn't had an issue when using the PersistentAllocator yet, but I've only run it for a few hours so I'll have to wait a bit longer to say definitively whether the workaround worked.

> Regarding switching speed, a while ago I did time how long it took to run:
>
> picam2.stop_encoder(encoder)
> picam2.switch_mode_and_capture_file(still_config, filename)
>
> And it usually came out around 1.5 secs (with some variance) before using PersistentAllocator, but I know that this includes the time to create the jpg image from the raw data.
>
> So are you implying that the time to run stop_encoder(), switch modes, and capture the image into RAM is usually less than 1 sec? In other words, if the difference in time between capturing an image to RAM with stopping the encoder and switching config, vs staying in the high res config and capturing an image to RAM is <0.5 sec, then I think it would probably be fine leaving it with the dual video and stills configs in order to get the higher framerate video. This is because the total delay I'm looking at for the whole signal path is on the order of 5.15 secs currently (the vast majority comes from the printer being slow to send MQTT messages to HA as far as I can tell), so taking 0.5 secs off that might be worth less than losing smooth video.

It might be worth timing some of this stuff for yourself. switch_mode_and_capture_file obviously stops, reconfigures and starts the camera twice, whilst saving a JPEG in the middle. You could write out this code explicitly, to see how the bits behave. Maybe like this:

picam2.stop()
picam2.configure(still_mode)
picam2.start()
req = picam2.capture_request()
req.save('main', "test.jpg")
req.release()
picam2.stop()
picam2.configure(video_mode)
picam2.start()

That's pretty much what switch_mode_and_capture_file does; I don't think the above should behave very differently.

> Also, doing it this way (dual configs) would allow me to use some features such as HQ noise reduction for the still images, while still using the normal fast denoising for the video mode, whereas if you only used a single config you would probably have to use fast denoising in order to get a decent framerate for the video, at the expense of the denoising quality for the stills?

That's true. On Pis other than Pi 5s, doing the higher quality (and slower) stills denoise may reduce the framerate. Note that you can change the denoise setting while the camera is running, but the catch will be knowing on which frame the change has actually taken effect.

> Regarding memory, currently I don't explicitly set the format of the image in either config, but I assumed that because I don't configure or call for a preview at any point then it would automatically select either RGB24 or YUV?
>
> If not, then I suppose setting RGB24 for both would be best, or would setting YUV for the video mode be better? Obviously YUV would use less memory than RGB24, but from what I understand you are throwing away chroma data in order to do so, whereas in the case of RGB32->RGB24 you aren't losing any info since an image from a camera doesn't contain alpha data right?

Preview/video modes select 32-bit ARGB formats by default, but only because they're usually easier to display. If you're not displaying them, 24-bit RGB would be more efficient. YUV420 is even better, and the H.264 video encoder will accept it directly.

Unfortunately Python mostly doesn't have much support for YUV420, so you'd have to convert it to 24-bit RGB for saving as a JPEG. OpenCV has a function cv2.cvtColor that will do this, like so:

    rgb = cv2.cvtColor(yuv420, cv2.COLOR_YUV420p2RGB)
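For reference, that conversion can also be written out with plain NumPy. This is a sketch assuming full-range BT.601 coefficients and I420 planar layout (Y plane followed by quarter-size U and V planes); cv2's exact coefficients and rounding may differ slightly.

```python
import numpy as np

def yuv420_to_rgb(yuv, w, h):
    """Convert a flat I420 buffer (w*h Y bytes, then w*h//4 U, then w*h//4 V)
    to an RGB array. Assumes full-range BT.601; real converters vary slightly."""
    yuv = yuv.reshape(h * 3 // 2, w)
    y = yuv[:h].astype(np.float32)
    u = yuv[h:h + h // 4].reshape(h // 2, w // 2).astype(np.float32) - 128
    v = yuv[h + h // 4:].reshape(h // 2, w // 2).astype(np.float32) - 128
    # upsample the chroma planes to full resolution (nearest neighbour)
    u = u.repeat(2, axis=0).repeat(2, axis=1)
    v = v.repeat(2, axis=0).repeat(2, axis=1)
    r = y + 1.402 * v
    g = y - 0.344 * u - 0.714 * v
    b = y + 1.772 * u
    return np.clip(np.dstack([r, g, b]), 0, 255).astype(np.uint8)

# A uniform test frame: Y=200, U=V=128 (zero chroma) should give grey (200,200,200).
w, h = 8, 8
frame = np.concatenate([np.full(w * h, 200, np.uint8),
                        np.full(w * h // 2, 128, np.uint8)])
print(yuv420_to_rgb(frame, w, h)[0, 0])   # -> [200 200 200]
```

In practice cv2.cvtColor is much faster than this pure-NumPy version, so it remains the sensible choice on a Pi.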

> Although maybe for MJPEG, using YUV has no quality difference since the MJPEG stream isn't the best quality to begin with (some blockiness) when using the H/W encoder?

MJPEG should accept RGB and YUV420 image formats but, as you say, quality is worse for a similar bitrate compared to H.264.

> And regarding the actual image format for RGB24, does it make any difference for the still jpeg config, or the video MJPEG config whether I select BGR888, or RGB888? If it doesn't make a difference, is there a "normal" or default selection that I should choose out of those 2 options?

I don't think the choice of BGR or RGB makes any difference. My rule of thumb is that reds and blues invariably come out the wrong way round and you end up swapping colours until it's right. Always swap colours by requesting the other format, rather than doing software conversions.

@Snelso91
Author

Snelso91 commented Nov 18, 2024

@davidplowman thanks for your suggestions, and apologies for my late reply.
As a preface: since the PersistentAllocator seems to be a good workaround, should I close this issue as solved, or leave it open since the root cause has not been fixed yet?

Now regarding your reply:
I tried using YUV for the video config for the MJPEG stream, but it made the colours noticeably different from RGB24, so I decided to stick with RGB24. I guess I should have expected that, given that the point of YUV420 is to compress the colour data haha.

For BGR vs RGB, I tried both and it makes no difference for JPEG images, like you said, so I decided to use BGR888, since the default format (if I don't specify one) is the 32-bit version of that.


Now regarding the timings, I copied your code and replaced switch_mode_and_capture_file with it, and also inserted several variables to keep track of the execution time of each section of the code. The meaning of the various times are as follows:

time1 = The time to run stop_encoder as well as stopping the camera, switching to the still image config, and then starting the camera.
time2 = The previous time + the time to run req = picam2.capture_request()
time3 = The previous time + the time to run req.save('main', filename)
switch and capture time = the time to run only the code you provided that is equivalent to switch_mode_and_capture_file. This equals time3 minus the time to run stop_encoder.
full time = the time to run the entire take_frame function. This equals time3 plus the time to release the request, stop the camera, switch back to the video config, start the camera, and restart the encoder.
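The per-section timing above can be done with a small helper like this (a generic sketch using time.monotonic; the section names and sleep bodies are illustrative placeholders for the real camera calls):

```python
import time
from contextlib import contextmanager

section_max = {}   # worst-case duration seen per section so far

@contextmanager
def timed(name):
    """Time a block and track the worst case, like the timeN / max figures above."""
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        section_max[name] = max(section_max.get(name, 0.0), elapsed)
        print(f"{name} = {elapsed:8.2f}, {name} max = {section_max[name]:.2f}")

# usage sketch (the sleeps stand in for stop_encoder/switch/capture/save):
with timed("time1"):
    time.sleep(0.01)   # stop encoder, switch to still config, start camera
with timed("time3"):
    time.sleep(0.02)   # capture request and save the JPEG
```

time.monotonic is preferable to time.time for this, since it can't jump if NTP adjusts the clock mid-test.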

Using some code in the 3D printer and a Pi Pico to translate the PWM signal, I sent a digital on/off signal to the GPIO of the Raspberry Pi, which started the take_frame function, waited for it to finish/return, and then went back to waiting for the next GPIO pulse. I repeated this 4000 times, printed the time variables each time a frame was taken, and kept track of the maximum value of each across the whole test. I found the point in the journalctl log where the maximum time occurred (3664.jpg); the log is as follows (with the previous and following frame logs above and below it for context):

Nov 18 19:56:37 p1s-cam python[92043]: GPIO input signal received
Nov 18 19:56:37 p1s-cam python[92043]: frame filename = "./frames/3663.jpg"
Nov 18 19:56:37 p1s-cam python[92043]: [72:50:14.483712581] [92059]  INFO Camera camera.cpp:1197 configuring streams: (0) 3552x2592-BGR888 (1) 4608x2592-SBG>
Nov 18 19:56:37 p1s-cam python[92043]: [72:50:14.500515112] [92048]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format>
Nov 18 19:56:38 p1s-cam python[92043]: [72:50:15.437990270] [92059]  INFO Camera camera.cpp:1197 configuring streams: (0) 1480x1080-BGR888 (1) 2304x1296-SBG>
Nov 18 19:56:38 p1s-cam python[92043]: [72:50:15.447620332] [92048]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format>
Nov 18 19:56:38 p1s-cam python[92043]: Layer Number = 3664
Nov 18 19:56:38 p1s-cam python[92043]: time1 =     0.10, time2 =     0.39, time3 =     0.99, switch and capture time =     1.02, full time =     1.07
Nov 18 19:56:38 p1s-cam python[92043]: time1 max = 0.55, time2 max = 0.83, time3 max = 5.35, switch and capture time max = 4.96, full time max = 5.44

Nov 18 19:56:41 p1s-cam python[92043]: GPIO input signal received
Nov 18 19:56:41 p1s-cam python[92043]: frame filename = "./frames/3664.jpg"
Nov 18 19:56:41 p1s-cam python[92043]: [72:50:18.771922135] [92059]  INFO Camera camera.cpp:1197 configuring streams: (0) 3552x2592-BGR888 (1) 4608x2592-SBG>
Nov 18 19:56:41 p1s-cam python[92043]: [72:50:18.783724636] [92048]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format>
Nov 18 19:56:51 p1s-cam python[92043]: [72:50:28.196571997] [92059]  INFO Camera camera.cpp:1197 configuring streams: (0) 1480x1080-BGR888 (1) 2304x1296-SBG>
Nov 18 19:56:51 p1s-cam python[92043]: [72:50:28.206903567] [92048]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format>
Nov 18 19:56:51 p1s-cam python[92043]: Layer Number = 3665
Nov 18 19:56:51 p1s-cam python[92043]: time1 =     0.47, time2 =     0.72, time3 =     9.83, switch and capture time =     9.48, full time =     9.92
Nov 18 19:56:51 p1s-cam python[92043]: time1 max = 0.55, time2 max = 0.83, time3 max = 9.83, switch and capture time max = 9.48, full time max = 9.92

Nov 18 19:56:53 p1s-cam python[92043]: GPIO input signal received
Nov 18 19:56:53 p1s-cam python[92043]: frame filename = "./frames/3665.jpg"
Nov 18 19:56:53 p1s-cam python[92043]: [72:50:30.498512884] [92059]  INFO Camera camera.cpp:1197 configuring streams: (0) 3552x2592-BGR888 (1) 4608x2592-SBG>
Nov 18 19:56:53 p1s-cam python[92043]: [72:50:30.509967835] [92048]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format>
Nov 18 19:56:54 p1s-cam python[92043]: [72:50:31.389616379] [92059]  INFO Camera camera.cpp:1197 configuring streams: (0) 1480x1080-BGR888 (1) 2304x1296-SBG>
Nov 18 19:56:54 p1s-cam python[92043]: [72:50:31.399856335] [92048]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format>
Nov 18 19:56:54 p1s-cam python[92043]: Layer Number = 3666
Nov 18 19:56:54 p1s-cam python[92043]: time1 =     0.47, time2 =     0.73, time3 =     1.30, switch and capture time =     0.94, full time =     1.39
Nov 18 19:56:54 p1s-cam python[92043]: time1 max = 0.55, time2 max = 0.83, time3 max = 9.83, switch and capture time max = 9.48, full time max = 9.92

So you can see that by far the biggest contributor to the total function runtime is the req.save('main', filename) command. Given that this just saves an image that has already been captured to a JPEG file, is it possible to speed up this worst-case req.save runtime of about 9.1 secs using multithreading or something else?
Alternatively, can I run the JPEG encoding/saving in the background in a separate thread, and let the rest of the function finish (releasing the request, stopping the camera, switching back to video mode, and starting the camera and encoder again)?

It is critical that the camera/take_frame function is ready to take another photo within about 4 seconds of being triggered, so the total execution time of take_frame before it returns must be under 4 secs.
Otherwise, the program will miss the next GPIO pulse, which can arrive as little as 4 seconds after the previous one, and will therefore miss taking a frame. I can confirm this has been happening, since the most recent test produced only 3991 frames when there should be 4001, almost certainly because req.save is taking too long.

If it helps, I wouldn't mind putting a copy of the frame into memory and then saving it to a JPEG in a background thread. Of course, this copy would need some kind of protection so that it isn't overwritten by the next iteration while it is still being saved.
Whether the solution uses requests or not, I don't mind; given that I have to stop the video encoder before switching to the still config anyway, I don't think I need to worry about the dropped frames that requests normally mitigate. I only care about getting the raw image as fast as possible (while keeping dual configs if possible), and then, as a secondary priority, encoding it to JPEG as fast as possible.

Obviously, if I use the copy-the-frame-into-memory idea and the JPEG saving takes 9 secs several times in a row, the Pi could run out of memory. But that shouldn't happen: the average time to encode the JPEG/save the request appears to be 1 sec or less, and it's just the occasional outliers of up to 9 secs that I want to deal with.

Also, I could drop the config switching entirely and just run the camera with the high-quality stills config (losing 30fps for the MJPEG stream), but given that the worst-case total time is about 10 seconds due to the req.save issue, I don't think saving about 0.3 secs by not switching configs would help here.


Separately, I also noticed something interesting in the log: right at the end of the test, while the Pi was zipping up all of the images (to make them quicker to copy off the Pi later), I got this error, which I've never seen before:

Nov 18 20:18:15 p1s-cam python[92043]: zipping images...
Nov 18 20:35:19 p1s-cam python[92043]: [73:28:56.356819564] [92048]  WARN V4L2 v4l2_videodevice.cpp:2095 /dev/video0[17:cap]: Dequeue timer of 1000000.00us >
Nov 18 20:35:19 p1s-cam python[92043]: [73:28:56.357025761] [92048] ERROR RPI pipeline_base.cpp:1364 Camera frontend has timed out!
Nov 18 20:35:20 p1s-cam python[92043]: [73:28:56.357101334] [92048] ERROR RPI pipeline_base.cpp:1365 Please check that your camera sensor connector is attac>
Nov 18 20:35:20 p1s-cam python[92043]: [73:28:56.357158469] [92048] ERROR RPI pipeline_base.cpp:1366 Alternatively, try another cable and/or sensor.
Nov 18 20:38:38 p1s-cam python[92043]: finished zipping images

Is it possible this could be causing the req.save issue, or is it just that I need to reseat the FPC cable?
The cable I'm using is very short so it shouldn't pick up much interference, but on the other hand there are some power and data wires running right next to the camera and its cable inside the printer, so could those be causing enough interference to cause this error?
I tried refreshing the camera stream since this error occurred, but the stream has completely shut down. When I look at htop I can see that my program is using basically no CPU, which is not right, because it usually sits at around 15% constantly since it is always encoding the MJPEG stream.

I should also note that when running htop, I noticed the swap usage on the Pi Zero 2 was quite high (about 190MB/200MB), but the actual RAM usage was still below 250MB/417MB in most cases, so this seemed like a false alarm?


Snelso91 commented Nov 19, 2024

Ok, so I did some further reading and it seems like this might be a solution to get what I described above:

def save_frame(buffer, config, metadata, filename):
    picam2.helpers.save(picam2.helpers.make_image(buffer, config), metadata, filename)

def take_frame():
    picam2.stop_encoder(encoder)
    buffers, metadata = picam2.switch_mode_and_capture_buffers(still_config)
    Thread(target=save_frame, args=(buffers[0], still_config["main"], metadata, filename)).start()
    picam2.start_encoder(encoder)

It's based on the capture_dng_and_jpeg_helpers.py example.
Also, I've omitted some parts of the take_frame function that weren't picam2-related functions for readability.

This new version of take_frame switches to the still config, captures the main buffer, and switches back (so that we don't have issues when we restart the encoder). All the work of converting the buffer to an image and saving that image to a JPG file is then offloaded to the save_frame function, which runs in a non-blocking way via a Python thread.


Unfortunately, when I tested it, the time to run just this line reached a maximum of 8.2 secs (and 3.7 secs on another iteration) within just the first few tests:

buffers, metadata = picam2.switch_mode_and_capture_buffers(still_config)

Which is strange because with the previous code, the limiting factor was saving the request data, not getting the request/buffer data itself...

Also, I just did some further testing using the normal GPIO test, instead of manually triggering take_frame a few times with an async callback function, and unfortunately it seems either there is a memory leak in this new code, or the threaded part is not returning quickly enough, so the threads are piling up in memory faster than they can be completed.
If it is the latter, that is strange, given that I am using the same test/interval between frames as before: that interval only caused a few frames to be missed previously, whereas with this new method it would suggest the interval is too fast for the average time it takes a thread to finish/return.

Please let me know if there is a faster or more error-free solution, as per my previous message.


davidplowman commented Nov 19, 2024

Hi, thanks for the update. A couple of comments...

Firstly, when you say "colours are different", do you mean they're actually different colours, or just that at the pixel level they show different levels of colour detail? The latter might be expected when using YUV420 (though to be honest, it's normally marginal to the point of being unnoticeable, except perhaps when there's nasty aliasing going on). The former is more surprising. I wonder if you should try setting the configuration's colour_space to libcamera.ColorSpace.Sycc(). If you don't specify one, create_video_configuration is probably giving you something slightly different, maybe even Rec709.

As regards the timing questions, I'm starting to get a bit lost in all the text. I don't know if it's possible for you to distil what you're doing into a simple ~20-line test case; then I could run it easily and check what's happening.

But broadly I think the approach sounds right. I'd have a main thread that starts and stops the encoder, and then another thread that does the big JPEG encodes. I'd either copy the image buffers and forward them, or forward the requests directly, I'm not sure. I'd be inclined to use a single thread there, so that you can wait for it to go idle before sending it more stuff.

Edited to add: we're just about to post an updated release which will fix the original memory problem. But I think in this kind of case, the PersistentAllocator is a better, more reliable, and probably faster solution, so I'd stick with it!
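One way to realise that "single worker thread you can wait on" idea, sketched with stdlib primitives only and no camera code (all names are hypothetical): a lone worker drains a Queue, and the producer can call join() to block until every submitted encode has finished before sending more.

```python
import queue
import threading

q = queue.Queue()
encoded = []

def encode_worker():
    # single worker: queued items are processed strictly one at a time
    while (item := q.get()) is not None:
        encoded.append(item)   # stand-in for the big JPEG encode
        q.task_done()
    q.task_done()              # account for the None sentinel too

threading.Thread(target=encode_worker, daemon=True).start()

for i in range(3):
    q.put(i)
q.join()        # returns only once the worker has finished everything queued so far
q.put(None)     # sentinel: stop the worker
```

Calling `q.join()` (or checking that the queue is empty) before the next capture is what "wait for it to go idle" amounts to here.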


Snelso91 commented Nov 19, 2024

@davidplowman thanks, I'll investigate the colour discrepancy later, and will probably keep using Persistent Allocator as you suggested.

Sorry for the overly verbose previous message.
To be more clear this time, I've attached a copy of the python code as a zip file, with all the stuff that isn't related to picamera2 removed (such as MQTT etc):
bambu_rpi_webcam.py

In the above code, take_frame() is the key function that is taking too long, and I call it every time a GPIO input goes high - see handle_GPIO_signal_forever().
But for the sake of testing you can of course call take_frame multiple times with some other function such as a for loop instead of having to wire up the GPIO to some other device to generate the input signal.

The core code of take_frame that I have an issue with is this:

start_time = time.time()
picam2.stop_encoder(encoder)

start_time_frame = time.time()

picam2.stop()
picam2.configure(still_config)
picam2.start()
time1 = time.time() - start_time

req = picam2.capture_request()
time2 = time.time() - start_time

req.save('main', filename)
time3 = time.time() - start_time

req.release()
picam2.stop()
picam2.configure(video_config)
picam2.start()

switch_and_capture_time = time.time() - start_time_frame

picam2.start_encoder(encoder)
full_time = time.time() - start_time

The issue I have (that I was talking about in my 1st reply yesterday) was that the time to run req.save('main', filename) can vary wildly from less than 1 second, up to a maximum of 9.1 seconds.

First, assume that a gpio signal pulse goes high every 4 seconds for 20ms.
Given that req.save blocks while it saves the JPEG file, the whole take_frame function will take just under 10 seconds in the worst case because of the req.save issue. Since it is blocking, it will miss the next GPIO pulses that happen at t=4s and t=8s, because take_frame will not return control to handle_GPIO_signal_forever until t=10s.

Therefore, my question is, how can I rewrite this code so that the saving of the jpeg is handled in a separate thread?

In other words, take_frame will return after stopping the encoder, switching modes, getting the image data, switching back, and starting the encoder. That list of things should, in theory, be fairly fast (less than 4 secs). The separate thread will then save the JPEG image to disk in the background, so that even if it ends up taking 9.1 secs to save, it will not block the next call of take_frame, and no images will be missed.


Snelso91 commented Nov 20, 2024

So I did some further testing with many different configurations, such as:

  1. Switch mode and capture a request, save it to file, and then release the request
  2. Switch mode and capture a buffer and then in a separate thread convert the buffer to an image and save it to a file
  3. Switch mode and capture file to a BytesIO buffer, and then save that buffer to disk/micro SD card in a separate thread. This makes sure that any slow writing to disk does not affect the total runtime of the take_frame function. I should note that specifically the time taken to write the image (approx. 2.3MB) to disk in this thread was always far less than 1 second, so I can conclude that IO speed is not the cause of my issues.
  4. Most recently, I just tried using a single config (instead of 2 separate ones), to remove the need to switch configs, and even with just 1 config, capturing a request, saving it to file and then releasing the request was still showing very high maximum runtimes for saving the request (8.6 secs).
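The measurement in point 3 can be sketched like this (the "encode" is a hypothetical stand-in, and the point is simply timing the in-memory encode separately from the disk write so slow SD-card IO can be ruled in or out):

```python
import io
import os
import tempfile
import time

payload = os.urandom(2_300_000)      # stand-in for ~2.3 MB of JPEG bytes

buf = io.BytesIO()
t0 = time.monotonic()
buf.write(payload)                   # stand-in for req.save('main', buf, format='jpeg')
encode_time = time.monotonic() - t0

path = os.path.join(tempfile.gettempdir(), "frame_timing_test.jpg")
t1 = time.monotonic()
with open(path, "wb") as f:          # timed separately: pure disk IO
    f.write(buf.getbuffer())
write_time = time.monotonic() - t1
```

If `write_time` stays well under a second while the overall capture path is slow, the bottleneck must be upstream of the disk, which is exactly the conclusion drawn above.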

In all of these cases, given that I have ruled out disk IO speeds being an issue (such as in case 3), the common problem seems to be the commands for getting data in the first place, such as capturing a buffer, capturing a file/image, or saving a request.

In other words, it seems that specifically the process of getting the camera data sometimes runs extremely slowly (max times vary from 8-30 secs depending on the test run or the config).
Originally I thought the process of converting the raw camera data into a PIL image/JPEG bytes to be written to disk might also be a cause of the slowdown, but given that the problem still occurs when just capturing a buffer, I think we can rule that out; it must instead be to do with getting the data in the first place.

Given that these commands are the culprits, I'm not sure how I can go about fixing this?
I've already tried simplifying the code as much as possible, but if the problem is simply in running req.save, capture_file, or capture_buffers, then I'm not sure what more I can do, because the runtime of those functions is outside of my control.

Also, I thought I should ask if it's best to move this conversation to a new issue, given that the problems that I'm asking about are now quite off-topic compared to my original post about the memory allocation problem?

@davidplowman

I took your code snippet above and turned it into a standalone example (below). I didn't have a Pi Zero 2W to hand, so I took a Pi 3B and set total_mem=512, which should behave the same. I put the latest 32-bit OS on it.

import time

from io import BytesIO
from threading import Thread
from queue import Queue

from picamera2 import Picamera2
from picamera2.allocators import PersistentAllocator
from picamera2.encoders import H264Encoder
from picamera2.outputs import FileOutput

def encode_thread(q):
    while (item := q.get()):
        filename, req = item
        data = BytesIO()
        t0 = time.time()
        req.save('main', data, format='jpeg')
        req.release()
        t1 = time.time()
        with open(filename, "wb") as f:
            f.write(data.getbuffer())
        print("jpeg", t1 - t0, "save", time.time() - t1)

picam2 = Picamera2()
picam2.allocator = PersistentAllocator()
half_res = (picam2.sensor_resolution[0] // 2, picam2.sensor_resolution[1] // 2)
video_config = picam2.create_video_configuration({'size': (640, 360)}, raw={'size': half_res})
picam2.configure(video_config)
still_config = picam2.create_still_configuration()
picam2.configure(still_config)

q = Queue()
thread = Thread(target=encode_thread, args=(q,))
thread.start()

start_time = None
for _ in range(10):
    encoder = H264Encoder()
    encoder.output = FileOutput("out.mp4")

    picam2.configure(video_config)
    picam2.start()

    picam2.start_encoder(encoder)
    if start_time:
        print("******* full time taken", time.time() - start_time)
    time.sleep(1)

    start_time = time.time()
    picam2.stop_encoder(encoder)
    print("Stop encoder", time.time() - start_time)

    picam2.stop()
    picam2.configure(still_config)
    picam2.start()

    req = picam2.capture_request()
    q.put(("test.jpg", req))

    picam2.stop()

q.put(None)
thread.join()

I found that mostly things were very well behaved. About 0.7s per JPEG, 0.01s to write the data, and the "full time" was easily under 1s.

But occasionally I found it going wrong, with everything taking much longer. I could only reproduce this in the first few runs after booting up. Also, interestingly, without reducing the memory to 512MB, it seemed to be OK.

Could you confirm what behaviour you see with the script above? I understood (this is such a long thread now!) that you're using a Pi Zero 2W. Can you confirm that you're using the latest OS? Are you using 32 or 64 bit?


Snelso91 commented Nov 21, 2024

@davidplowman for my OS, I'm running Raspberry Pi OS Bookworm Lite 64bit.
I'm using lite since I'm using the pi headless and only want to run a few programs on it, so I didn't see a need to install the full version as it would just use more RAM and CPU in the background.
I don't really have a particular reason why I chose 64 bit over 32 bit, other than I assumed it might be a bit better.

OS Details (results of uname -a):
Linux p1s-cam 6.6.51+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64 GNU/Linux
picamera2 version (result of dpkg -s python3-picamera2 | grep Version):
Version: 0.3.22-2

I took your code and added a few lines just to keep track of the max full time and to give me an alert if the full time went over 1 sec:

alert = False
max_full_time = 0
for _ in range(10):
    encoder = H264Encoder()
    encoder.output = FileOutput("out.mp4")

    picam2.configure(video_config)
    picam2.start()

    picam2.start_encoder(encoder)
    if start_time:
        full_time = time.time() - start_time
        max_full_time = max(full_time, max_full_time)
        print("******* full time taken", full_time)
        print("max {}".format(max_full_time))
        if alert:
            print("ALERTED PREVIOUSLY")
        if full_time > 1:
            alert = True
            print("FULL TIME ALERT")
    time.sleep(1)

However, it seems I didn't need them to tell me something was wrong, because after a few runs of the program I got a full-blown error and traceback which I've never seen before:

******* full time taken 0.9123744964599609
max 0.9123744964599609
Stop encoder 1.9047117233276367
[40:25:26.961712734] [5318]  INFO Camera camera.cpp:1197 configuring streams: (0) 4608x2592-BGR888 (1) 4608x2592-SBGGR10_CSI2P
[40:25:26.969010234] [5313]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 4608x2592-SBGGR10_1X10 - Selected unicam format: 4608x2592-pBAA
[40:25:27.506036050] [5318]  INFO Camera camera.cpp:1197 configuring streams: (0) 640x360-XBGR8888 (1) 2304x1296-SBGGR10_CSI2P
[40:25:27.517512271] [5313]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 2304x1296-SBGGR10_1X10 - Selected unicam format: 2304x1296-pBAA
jpeg 3.5903677940368652 save 0.05013895034790039

Exception in thread Thread-2 (encode_thread):
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
******* full time taken 2.5445058345794678
max 2.5445058345794678
FULL TIME ALERT
    self.run()
  File "/usr/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/scott_admin/bambu_rpi_webcam/github_test.py", line 17, in encode_thread
    req.save('main', data, format='jpeg')
  File "/usr/lib/python3/dist-packages/picamera2/request.py", line 173, in save
    return self.picam2.helpers.save(self.make_image(name), self.get_metadata(), file_output,
                                    ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/picamera2/request.py", line 165, in make_image
    return self.picam2.helpers.make_image(self.make_buffer(name), self.config[name], width, height)
                                          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/picamera2/request.py", line 149, in make_buffer
    with _MappedBuffer(self, name, write=False) as b:
  File "/usr/lib/python3/dist-packages/picamera2/request.py", line 30, in __enter__
    self.__mm = self.__sync.__enter__()
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/picamera2/allocators/dmaallocator.py", line 118, in __enter__
    raise RuntimeError("failed to find buffer in DmaSync")
RuntimeError: failed to find buffer in DmaSync

Stop encoder 0.40723443031311035

As you can see, the full runtime for the run with the error was 2.5 secs.
Before I got this error, the previous runs full times were 0.49-0.97 secs, stop encoder took 0.39-0.40 secs, jpeg took 0.59-0.63 secs, and save took 0.005-0.006 secs (it's quite dark in that room atm so the jpeg size is only 813 KB).

Does this imply that I need to switch to the 32 bit lite OS?
Or could it be that it's not possible to run this code consistently (i.e. with full times less than 1 sec) on a Pi Zero 2 W because of its RAM constraints?
For reference, here is htop when the Pi is idle and not running any programs other than system ones:
[htop screenshot]

If it's something that I can fix without having to change hardware then I'd be more than happy to try, such as switching OS or updating something.

@davidplowman

Thanks for the update. That's all very interesting.

I chose 32-bit because I think it's generally regarded as slightly easier on the memory, which might explain why you are perhaps seeing problems with greater frequency than I am. But even then, I have occasional problems on the 32-bit OS, most notably shortly after booting.

The code I posted is vulnerable to crashing when a long delay happens (as you saw, and which I sometimes see also). The solution, or at least the sticking plaster, would be signalling back to the main thread once it's safe to proceed. (I don't know if that kind of a workaround - where you occasionally miss a capture - is any good to you or not.)
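That signalling workaround could look something like this, with a stdlib Event as the "safe to proceed" flag and hypothetical stand-ins for the encode step (no camera code; captures would simply be skipped while the worker is busy):

```python
import threading

idle = threading.Event()
idle.set()                         # worker starts out idle
results = []

def encode_one(data):
    try:
        results.append(len(data))  # stand-in for the slow JPEG encode
    finally:
        idle.set()                 # signal the main thread: safe to proceed

submitted = 0
for frame in (b"a" * 10, b"b" * 20, b"c" * 30):
    if not idle.is_set():          # worker still busy: skip this capture
        continue
    idle.clear()
    t = threading.Thread(target=encode_one, args=(frame,))
    t.start()
    t.join()   # joined here only to keep the example deterministic
    submitted += 1
```

This is the "sticking plaster" trade-off described above: when an encode overruns, the next capture is dropped rather than queued, so memory and buffer lifetimes stay bounded.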

However, I do want to poke around a bit more and see if there are any clues as to what's happening. After the system boots, there are usually a few processes that might come along and "do something", so perhaps they're getting in the way. Another thing I'm curious about is whether increasing the amount of swap helps - it's easy enough to try. Using lots of swap on a micro SD card is usually not a great idea, but if it's transient and happens rarely, then maybe that's not so bad.

@davidplowman

Well, I've poked round a bit more but am not very much the wiser. I can't really see how it can be struggling with memory because it seems to stay under 250MB at all times, yet it does seem sensitive to memory pressure. I found this variation which does reduce the memory footprint slightly, and I think it does work a bit better.

import time

from threading import Thread
from queue import Queue

import cv2

from picamera2 import Picamera2, MappedArray
from picamera2.allocators import PersistentAllocator
from picamera2.encoders import H264Encoder
from picamera2.outputs import FileOutput

def encode_thread(q):
    while (item := q.get()):
        filename, req = item
        t0 = time.time()
        with MappedArray(req, "main") as m:
            cv2.imwrite(filename, m.array)
        req.release()
        print("jpeg and save", time.time() - t0)

picam2 = Picamera2()
picam2.allocator = PersistentAllocator()
half_res = (picam2.sensor_resolution[0] // 2, picam2.sensor_resolution[1] // 2)
video_config = picam2.create_video_configuration({'size': (640, 360), 'format': 'YUV420'}, raw={'size': half_res})
picam2.configure(video_config)
still_config = picam2.create_still_configuration()
picam2.configure(still_config)

q = Queue()
thread = Thread(target=encode_thread, args=(q,))
thread.start()

start_time = None
for _ in range(10):
    encoder = H264Encoder()
    encoder.output = FileOutput("out.mp4")

    picam2.configure(video_config)
    picam2.start()

    picam2.start_encoder(encoder)
    if start_time:
        print("******* full time taken", time.time() - start_time)
    time.sleep(1)

    start_time = time.time()
    picam2.stop_encoder(encoder)
    print("Stop encoder", time.time() - start_time)

    picam2.stop()
    picam2.configure(still_config)
    picam2.start()

    req = picam2.capture_request()
    q.put(("test.jpg", req))

    picam2.stop()

q.put(None)
thread.join()

It takes a little longer to get going because of the cv2 import (sudo apt install python3-opencv if you don't have it), but after that I think it behaves better.


Snelso91 commented Nov 22, 2024

@davidplowman thanks for the code and suggestions!

I tried the suggestion from your 1st message of increasing the swap amount, and I increased it to 1024MB but unfortunately that didn't help the original program and it still had a long max full time of 3.3 secs.
Also, I made sure to not run the program close to startup, as all the tests I've been doing have been while the pi has been switched on for hours/days already.

Regarding the newest code you posted, I found it did help a lot, and using your test program I observed a max full time of 1.86-1.99 secs after about 1000 iterations.
These times were after I had changed the video config to use full 1920x1080 resolution and BGR888 format instead of YUV (which increased memory usage, but made it more like my original program).
I also used high quality denoise for the stills config and set an optimised manual focus for both modes (which increased the JPEG size).
Lastly I also decreased swap size back to the default of 200MB, since it didn't make any difference for the 1st program. Also, htop showed it wasn't really being used, since like you said, the total RAM usage when running the 1st program was only about 250-270MB / 417MB, and with this new one the usage is only about 225MB.
So it seems the new code is more performant/optimised than the old code, as it is faster even with these more stressful conditions.

However, when I integrated this new code into my original program (as well as starting up another small program that I had shutdown to check whether it was causing the issues) and ran it with 1000 iterations in a for loop just like in your program, I unfortunately got the same traceback error from earlier:

[0:10:12.523934564] [712]  INFO Camera camera.cpp:1197 configuring streams: (0) 3552x2592-BGR888 (1) 4608x2592-SBGGR10_CSI2P
[0:10:13.039651399] [706]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 4608x2592-SBGGR10_1X10 - Selected unicam format: 4608x2592-pBAA
frame filename = "./frames/0299.jpg"
[0:10:13.341894464] [712]  INFO Camera camera.cpp:1197 configuring streams: (0) 1480x1080-BGR888 (1) 2304x1296-SBGGR10_CSI2P
[0:10:13.351748009] [706]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 2304x1296-SBGGR10_1X10 - Selected unicam format: 2304x1296-pBAA
Layer Number = 300
stop_encoder_time     0.421, switch_to_still_time     0.564, get_request_time     0.256, switch_to_video_time     0.050, start_encoder_time     0.029, full_time     1.321
stop_encoder_time_max 0.435, switch_to_still_time_max 0.564, get_request_time_max 0.284, switch_to_video_time_max 0.110, start_encoder_time_max 0.058, full_time_max 1.321
jpeg_time     2.479
jpeg_time_max 2.479

Exception in thread Thread-2 (save_frame_thread):
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/scott_admin/bambu_rpi_webcam/bambu_rpi_webcam.py", line 152, in save_frame_thread
    with MappedArray(req, "main") as m:
  File "/usr/lib/python3/dist-packages/picamera2/request.py", line 46, in __enter__
    b = self.__buffer.__enter__()
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/picamera2/request.py", line 30, in __enter__
    self.__mm = self.__sync.__enter__()
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/picamera2/allocators/dmaallocator.py", line 118, in __enter__
    raise RuntimeError("failed to find buffer in DmaSync")
RuntimeError: failed to find buffer in DmaSync

Do you know how I can fix this error?
I know you mentioned about the workaround/fix meaning I have to miss some captures, but I think for my use case with this program I really want to avoid missing any frames at all if possible.

I know you mentioned before that this crash can happen if the JPEG encode thread takes too long, but I don't quite understand why that should happen, or why it can't find a buffer.
For example, it looks like you set up a queue system so that each request is retrieved in the main thread and then sent to the render thread, which converts it into a JPEG and saves it. But given that there is a queue, shouldn't the processing time of the render thread not matter?
For example, let's say it is still rendering the previous JPEG when the next request is put into the queue. Why can't the render thread just finish rendering the current JPEG and then move on to rendering the next request in the queue until the queue is empty, which is what the code seems to suggest?

My rough guess is that because MappedArray directly accesses the buffer memory to get the raw frame data (instead of first making a copy of the buffer), once a new request happens in the main thread, the program loses access to the old memory that MappedArray was accessing and can't complete the encode, or maybe it can't start a new MappedArray for the next encode that is now in the queue?

That is just a complete guess though, so I would be grateful if you could let me know how I can fix this error, and if it is possible to do so while not missing any frames to be encoded?
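For what it's worth, that guess can be illustrated with a toy mock (entirely hypothetical, no Picamera2 API involved): a worker that holds only the request loses the pixel data once the request is released and its buffer is reused, whereas copying the data out with bytes(...) first decouples the worker from the buffer's lifetime.

```python
class MockRequest:
    """Toy stand-in for a camera request backed by reusable DMA memory."""
    def __init__(self, data):
        self._buf = bytearray(data)
        self.released = False

    def buffer(self):
        if self.released:
            # analogous to "failed to find buffer in DmaSync"
            raise RuntimeError("failed to find buffer")
        return self._buf

    def release(self):
        self.released = True
        self._buf = bytearray(len(self._buf))  # camera reuses the memory

req = MockRequest(b"frame-data")
snapshot = bytes(req.buffer())   # copy taken while the request is still held
req.release()

try:
    req.buffer()                 # a worker's late access would fail like this
    buffer_gone = False
except RuntimeError:
    buffer_gone = True
```

Whether this matches what DmaSync actually does internally is exactly the open question above; the mock only shows why copy-before-release sidesteps the whole class of problem.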

One workaround I tried was to make the encoding single-threaded instead of multi-threaded again, so the main function now blocks until imwrite has finished. The results were somewhat predictable: the max execution time of the whole function went up to 3.66 secs, which, if you look at the error I posted above, is about equal to the total time it would have taken single-threaded if you add up the relevant individual times (3.72 secs). So a single-threaded version gets rid of the error, but means increased blocking time.

For this use case, 3.66 secs blocking time is just about acceptable (the limit is approx 4 secs), but given that the average is about 0.8-1.2 secs, I would like to try and reduce this max value if possible, just in case it ends up going higher than 4 secs in the future because my testing was not thorough enough.
The only other option I can try in order to increase stability/decrease max time is to move to the 32 bit version of the lite OS, but as you said that may not do much so I'd appreciate any other suggestions you have, such as how to fix the multi threaded version.

@davidplowman

I'll look again next week, when I may be able to find some more knowledgeable Linux-y types. In the meantime, if you did have a spare SD card lying around, I'd encourage you to give the 32-bit OS a go. It is thought to be better, but it would be interesting to know, and I don't currently have any other ideas to try, certainly nothing as relatively straightforward!


Snelso91 commented Nov 25, 2024

@davidplowman I did some more testing over the weekend, and found out the following things:

  • The 32 bit lite OS seemed to have helped reduce the variance in function runtime, and reduce the average and max times.
  • I found a solution/workaround to the DMA error, which is that instead of having a single render thread with a queue to get new data into it, I instead spawn a new render thread for each call of the main capture thread.
    So it turns out it's nothing to do with using Mapped Array in particular, and instead is to do with using a queue.
    In theory this new way is less efficient since you would have some overhead from starting and stopping so many threads all the time, but it's better than getting an error.
  • I also did some testing comparing the different methods of getting and saving the image to file (switch_mode + Mapped Array using imwrite, switch_mode + capture_file, switch_mode + req.save, and finally switch_mode_and_capture_file), and Mapped Array was definitely the fastest so I will probably keep using that.
  • Unfortunately the program still has "flare ups" where occasionally for no discernible reason the program will go from approx. 1.5 second max times for take_frame (using multiple separate encode threads as above) and 1.3 second jpeg encode times to 4+ second times for either the main thread or the encode thread, and the program will miss frames.
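The thread-per-capture workaround from the second bullet can be sketched as follows (hypothetical names, no camera required): each capture hands its own copy of the data to a freshly spawned saver thread, so no shared queue or long-lived worker is involved.

```python
import threading

saved = []
lock = threading.Lock()

def save_frame(filename, data):
    # stand-in for MappedArray + cv2.imwrite in the real program
    with lock:
        saved.append((filename, len(data)))

threads = []
for i in range(3):
    frame = bytes([i]) * 100                       # stand-in for the copied buffer
    t = threading.Thread(target=save_frame,
                         args=(f"frame_{i:04d}.jpg", frame))
    t.start()
    threads.append(t)

for t in threads:
    t.join()   # in the real program the threads just finish in the background
```

The per-thread spawn overhead is the inefficiency noted above; the benefit is that each saver owns its data outright, so nothing ties a slow save to the lifetime of a queue entry.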

I should note that the testing for this was done using my main program rather than your isolated test program, because I needed it to take pictures when the GPIO pin went high.
It also runs some other code, such as pinging an IP address once per minute and sending/receiving MQTT messages.
In theory these extra processes could be causing the slowdown, but in practice when I did testing I did not observe much difference between your program and mine in terms of the average or max times, and the core code for the take_frame function is identical between the two.
I am also not running any other programs on the pi - it is effectively a fresh install of the 32 bit lite OS, other than installing the python libraries for my program and its systemd service to get it to run on boot/in the background all the time.

Given that I am still having issues even with the "optimal" code using Mapped Array and multiple threads for encoding, and using the optimal OS (32bit lite), do I need to give up on trying to use this program with a pi zero 2 due to its lack of RAM?
I can go and buy a Pi 3B+ or Pi 4 1GB (they're about the same price) if it will help solve this issue, since it seems like we've tried basically everything and it's still not working, and I really would like to fix this soon if possible in order to move on with the next tasks that it's blocking.


Now for the details to go with the above summary (you can skip these if needed):

DMA error fix & using multiple render threads
I tested your original single-encode-thread-with-queue version using req.save instead of MappedArray, but it still had the DMA error, so that's how I ruled out MappedArray being the cause of the error.
Also, the DMA error happened most often when the sleep time between subsequent take_frame executions was low.
I did some further testing and I could get the DMA error to show up after around 300 iterations if I set the delay between iterations to 0.4 secs or less rather than 1 sec.

After implementing the new method where I spawn a new render thread each time we take a picture, I was able to get this new version to work fine with no errors even with 0 delay/sleep between take_frame calls.
However, because you are starting a new take_frame call while the render thread of the previous call is still rendering, you do end up accumulating several render threads that are all running in the background while you continue to make new main-thread calls (and therefore new render threads).
From my testing, this didn't cause a memory overflow or even a significant increase in memory. Instead, the only noticeable effect of running take_frame back to back with no delay was that both take_frame and the render threads took longer to execute, which makes sense since, with the GIL, Python effectively runs only one thread at a time, so it has to time-share all these threads. So if you decrease the interval between calls to take_frame, the maximum time it takes to run/blocks for also increases due to the increased load.

Debugging continued slowdowns
I don't know exactly why the program still runs slowly occasionally and misses frames.
Sometimes the main thread runs slowly and full_time exceeds 4 seconds, which would explain a missed frame (though I still don't know why the picamera2 calls occasionally run slowly and spike take_frame's full_time/execution time).
However, at other times I have noticed in the journalctl log of the service that runs my python program that the runtime of the main thread/take_frame is below 1 second, yet it still misses frames, which seems contradictory. For example:

Nov 24 23:50:52 p1s-cam python[22218]: GPIO input signal received
Nov 24 23:50:52 p1s-cam python[22218]: [54:14:10.520781182] [22226]  INFO Camera camera.cpp:1197 configuring streams: (0) 3552x2592-RGB888 (1) 4608x2592-SBGGR10_CSI2P
Nov 24 23:50:52 p1s-cam python[22218]: [54:14:10.538388484] [22221]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 4608x2592-SBGGR10_1X10 - Selected unicam format: 4608x2592-pBAA
Nov 24 23:50:52 p1s-cam python[22218]: frame filename = "./frames/0018.jpg"
Nov 24 23:50:52 p1s-cam python[22218]: [54:14:10.877875334] [22226]  INFO Camera camera.cpp:1197 configuring streams: (0) 1480x1080-BGR888 (1) 2304x1296-SBGGR10_CSI2P
Nov 24 23:50:52 p1s-cam python[22218]: [54:14:10.894824825] [22221]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 2304x1296-SBGGR10_1X10 - Selected unicam format: 2304x1296-pBAA
Nov 24 23:50:52 p1s-cam python[22218]: Layer Number = 19
Nov 24 23:50:52 p1s-cam python[22218]: stop_encoder_time     0.032, switch_to_still_time     0.076, get_request_time     0.285, switch_to_video_time     0.075, start_encoder_time     0.034, full_time     0.503
Nov 24 23:50:52 p1s-cam python[22218]: stop_encoder_time_max 0.430, switch_to_still_time_max 0.250, get_request_time_max 0.291, switch_to_video_time_max 0.085, start_encoder_time_max 0.055, full_time_max 0.918
Nov 24 23:50:53 p1s-cam python[22218]: frame save finished: "./frames/0018.jpg"
Nov 24 23:50:53 p1s-cam python[22218]: jpeg_time     0.559
Nov 24 23:50:53 p1s-cam python[22218]: jpeg_time_max 0.881
Nov 24 23:50:56 p1s-cam python[22218]: GPIO input signal received
Nov 24 23:50:56 p1s-cam python[22218]: [54:14:14.757332884] [22226]  INFO Camera camera.cpp:1197 configuring streams: (0) 3552x2592-RGB888 (1) 4608x2592-SBGGR10_CSI2P
Nov 24 23:50:56 p1s-cam python[22218]: [54:14:14.776477422] [22221]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 4608x2592-SBGGR10_1X10 - Selected unicam format: 4608x2592-pBAA
Nov 24 23:50:56 p1s-cam python[22218]: frame filename = "./frames/0019.jpg"
Nov 24 23:50:56 p1s-cam python[22218]: [54:14:15.115237971] [22226]  INFO Camera camera.cpp:1197 configuring streams: (0) 1480x1080-BGR888 (1) 2304x1296-SBGGR10_CSI2P
Nov 24 23:50:56 p1s-cam python[22218]: [54:14:15.133188553] [22221]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 2304x1296-SBGGR10_1X10 - Selected unicam format: 2304x1296-pBAA
Nov 24 23:50:57 p1s-cam python[22218]: Layer Number = 20
Nov 24 23:50:57 p1s-cam python[22218]: stop_encoder_time     0.418, switch_to_still_time     0.075, get_request_time     0.283, switch_to_video_time     0.081, start_encoder_time     0.034, full_time     0.893
Nov 24 23:50:57 p1s-cam python[22218]: stop_encoder_time_max 0.430, switch_to_still_time_max 0.250, get_request_time_max 0.291, switch_to_video_time_max 0.085, start_encoder_time_max 0.055, full_time_max 0.918
Nov 24 23:50:57 p1s-cam python[22218]: frame save finished: "./frames/0019.jpg"
Nov 24 23:50:57 p1s-cam python[22218]: jpeg_time     0.986
Nov 24 23:50:57 p1s-cam python[22218]: jpeg_time_max 0.986

Nov 24 23:51:06 p1s-cam python[22218]: GPIO input signal received
Nov 24 23:51:06 p1s-cam python[22218]: [54:14:18.204636353] [22226]  INFO Camera camera.cpp:1197 configuring streams: (0) 3552x2592-RGB888 (1) 4608x2592-SBGGR10_CSI2P
Nov 24 23:51:06 p1s-cam python[22218]: [54:14:18.223546256] [22221]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 4608x2592-SBGGR10_1X10 - Selected unicam format: 4608x2592-pBAA
Nov 24 23:51:06 p1s-cam python[22218]: frame filename = "./frames/0020.jpg"
Nov 24 23:51:06 p1s-cam python[22218]: [54:14:18.563682739] [22226]  INFO Camera camera.cpp:1197 configuring streams: (0) 1480x1080-BGR888 (1) 2304x1296-SBGGR10_CSI2P
Nov 24 23:51:06 p1s-cam python[22218]: [54:14:18.577902601] [22221]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 2304x1296-SBGGR10_1X10 - Selected unicam format: 2304x1296-pBAA
Nov 24 23:51:06 p1s-cam python[22218]: Layer Number = 21
Nov 24 23:51:06 p1s-cam python[22218]: stop_encoder_time     0.032, switch_to_still_time     0.085, get_request_time     0.284, switch_to_video_time     0.068, start_encoder_time     0.033, full_time     0.502
Nov 24 23:51:06 p1s-cam python[22218]: stop_encoder_time_max 0.430, switch_to_still_time_max 0.250, get_request_time_max 0.291, switch_to_video_time_max 0.085, start_encoder_time_max 0.055, full_time_max 0.918
Nov 24 23:51:06 p1s-cam python[22218]: frame save finished: "./frames/0020.jpg"
Nov 24 23:51:06 p1s-cam python[22218]: jpeg_time     5.301
Nov 24 23:51:06 p1s-cam python[22218]: jpeg_time_max 5.301

For reference, a GPIO pulse was happening at an interval of about 3.8 seconds. You can see that in the top section, everything is working fine - both full_time_max and jpeg_time_max are low. However, the next GPIO signal received happens a whole 10 seconds after the previous one (23:51:06 vs 23:50:56), and there is a sudden spike of jpeg_time in the render thread to 5.3 secs.

My guess is that even though the jpeg encode happens in another thread, python doesn't have true parallelism under the GIL: could the jpeg thread be holding the CPU for several seconds at a stretch exactly when the main thread needs it?
That might explain why the main thread misses some inputs even though the previous take_frame iteration has already finished and it is waiting for a GPIO input as normal.
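For reference, CPython's scheduler only asks the thread holding the GIL to release it at a fixed interval, and a C extension call that keeps the GIL for its whole duration can't be preempted at all until it returns. The interval itself is inspectable and tunable:

```python
import sys

# CPython periodically asks the thread holding the GIL to release it.
print(sys.getswitchinterval())  # 0.005 by default (5 ms)

# A smaller interval preempts CPU-bound Python threads more often, at the
# cost of extra switching overhead. Note it does NOT help if the GIL is
# held inside one long-running C call (e.g. an encode that never drops it).
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())
```

So whether lowering the switch interval helps depends on whether the JPEG encoder releases the GIL while it works; if it doesn't, the main thread is stuck for the duration of the encode regardless.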

Or could it be something else in the linux system interfering with the python program or hogging the CPU for a few seconds?
When I stop the program and only the background processes are running, nothing is using much CPU, and the only thing using any significant RAM is the journald process (just under 3%).
Given that I run the python program as a systemd service, journald is constantly logging its output to a file for me to view later. Could this logging be interfering with the python program or using up significant CPU time?

Another example is this:

Nov 24 18:06:25 p1s-cam python[10144]: GPIO input signal received
Nov 24 18:06:32 p1s-cam python[10144]: [48:29:43.888212106] [10153]  INFO Camera camera.cpp:1197 configuring streams: (0) 3552x2592-RGB888 (1) 4608x2592-SBGGR10_CSI2P
Nov 24 18:06:32 p1s-cam python[10144]: [48:29:43.905554514] [10148]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 4608x2592-SBGGR10_1X10 - Selected unicam format: 4608x2592-pBAA
Nov 24 18:06:32 p1s-cam python[10144]: frame filename = "./frames/0555.jpg"
Nov 24 18:06:32 p1s-cam python[10144]: [48:29:44.243058112] [10153]  INFO Camera camera.cpp:1197 configuring streams: (0) 1480x1080-BGR888 (1) 2304x1296-SBGGR10_CSI2P
Nov 24 18:06:32 p1s-cam python[10144]: [48:29:44.257257715] [10148]  INFO RPI vc4.cpp:622 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 2304x1296-SBGGR10_1X10 - Selected unicam format: 2304x1296-pBAA
Nov 24 18:06:32 p1s-cam python[10144]: Layer Number = 556
Nov 24 18:06:32 p1s-cam python[10144]: stop_encoder_time     0.031, switch_to_still_time     0.081, get_request_time     0.284, switch_to_video_time     0.067, start_encoder_time     0.033, full_time     0.497
Nov 24 18:06:32 p1s-cam python[10144]: stop_encoder_time_max 0.436, switch_to_still_time_max 1.100, get_request_time_max 0.288, switch_to_video_time_max 0.455, start_encoder_time_max 0.058, full_time_max 1.514
Nov 24 18:06:32 p1s-cam python[10144]: frame save finished: "./frames/0555.jpg"
Nov 24 18:06:32 p1s-cam python[10144]: jpeg_time     3.851
Nov 24 18:06:32 p1s-cam python[10144]: jpeg_time_max 8.686

You can see that, strangely, despite python itself reporting the full time to run the function as 0.5 secs, the first INFO message from picam2 doesn't appear until 7 secs after the GPIO message in the journald logs, which is contradictory. This suggests journald itself may be lagging and not logging messages at the time they actually occurred, so maybe I should switch from print statements to python logging so that each message carries an accurate timestamp.
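As a sketch of that switch: the logging module stamps each record itself at emit time, so the timestamps stay accurate even if journald delivers the lines late. A minimal setup (logger name is just an example):

```python
import logging

# Each record gets its own millisecond timestamp when it is emitted,
# independent of when journald (or any downstream pipe) writes it out.
logging.basicConfig(
    format="%(asctime)s.%(msecs)03d %(threadName)s %(message)s",
    datefmt="%H:%M:%S",
    level=logging.INFO,
)
log = logging.getLogger("bambu_rpi_webcam")

log.info("GPIO input signal received")
log.info("jpeg_time     %.3f", 0.559)
```

Comparing these timestamps against journald's receive times would show directly whether journald is the one lagging.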

In the systemd unit file, I run the program with python -u to make sure the output is unbuffered. That way my print statements line up in time with when they were executed, rather than appearing as sudden chunks of text whenever python decides to flush its output buffer.
However, I have read that unbuffered mode can slow a program down if a lot of text is being written. My program writes about 12 lines/1594 characters per second at most (including all my print statements and the camera info written by picamera2); could that count as a large amount of text and cause journald to occasionally lag its logging and stall the whole system?

My final note is that last night I got multiple failures in a row even though I restarted the python program/systemd service each time, whereas today everything went fine on my first test. I'm not sure whether this means some process was running last night that isn't running now, such as journald doing a lot of logging cleanup in the background.
I considered restarting the pi last night to try to clear the problem, but if the problem can arise naturally after the pi has been on for a while and goes away on a restart, then restarting isn't really fixing the issue; it's just hiding it during my tests, only for it to fail later in the real world/"production".

@davidplowman (Collaborator)

Hi again. A few things...

  • Fixing that test code shouldn't be difficult if you wanted to. I think it only needs an event to be set after req.release() which gets waited on before picam2.configure(still_config).
  • I do want to look at this some more this week, though I can't 100% promise when.
  • The only advantage a Pi 3B+ has over a Pi 4 might be slightly lower power consumption, so far as I'm aware. Unless that's important, or you really hate micro-HDMI sockets, I think a Pi 4 is just faster and better in all other respects!
  • Again, can't 100% promise that 1GB model will resolve this, though it did seem to be the case last week. I'll try it again when I return to this topic!
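
The first point above might be sketched roughly like this (names made up, and the picamera2 calls replaced by a placeholder `FakeRequest`, so this only shows the handshake shape):

```python
import threading

class FakeRequest:
    # Placeholder for the picamera2 request object; only release() matters here.
    def __init__(self):
        self.released = False
    def release(self):
        self.released = True

def render_thread(req, filename, released_evt):
    # In the real test code: req.save("main", filename), then req.release().
    req.release()
    released_evt.set()  # signal that the buffers are back with the camera

released = threading.Event()
req = FakeRequest()
t = threading.Thread(target=render_thread, args=(req, "./frames/0001.jpg", released))
t.start()

# Main loop: block here, instead of calling picam2.configure(still_config)
# while the previous request is still held by the render thread.
released.wait()
released.clear()
t.join()
print(req.released)  # True
```

That way the reconfigure can never race the render thread for the old request's buffers.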
