Support for YUV #121

mat-hek · 2023-12-19T16:21:25Z

mat-hek
Dec 19, 2023

Hi there, thanks for a great library! I'd like to overlay a PNG image on top of a YUV image and get a YUV image. Does Image support YUV?

kipcole9 · 2023-12-19T16:49:45Z

kipcole9
Dec 19, 2023
Maintainer

Image is based upon vix by @akash-akya - and in turn it's libvips that provides the underlying imaging engine implementation.

According to the libvips color documentation, YUV isn't a supported colourspace unfortunately.

I suppose you could use Nx to do the transformation yourself, perhaps using these transforms but that would require some development by you.

Alternatively, you could leverage eVision by @cocoa-xu, which is based upon OpenCV. You could either use eVision itself (which, if your use case is as simple as you suggest, may be the easiest) or, if you have a use for Image in other ways, combine the two: there is good interop between image/vix/nx/evision, so it would be reasonably efficient to follow that path too.

If you attach a sample image in YUV colourspace, I'm willing to do some experimentation on an image/evision version of what you're looking for.

1 reply

mat-hek Dec 20, 2023
Author

If you attach a sample image in YUV colourspace, I'm willing to do some experimentation on an image/evision version of what you're looking for.

That's generous, I'd appreciate it! Here's the image: image.yuv.zip. It's YUV I420, 1920x1080. Here's what it looks like (it's a screenshot, so it's not perfectly accurate):

Just FYI, apart from this, I'll probably need to support I422 and I444 too - these are common in H264 video, which I'm dealing with.
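For readers unfamiliar with these layouts, here is a minimal sketch of how the three planes are sized and split for 8-bit planar I420, I422 and I444. The module and function names (`PlanarYUV`, `plane_sizes/3`, `frame_size/3`, `split/4`) are hypothetical and not part of Image or eVision, and the sketch assumes even width and height.

```elixir
defmodule PlanarYUV do
  # Hypothetical helper, not part of any library mentioned in this thread.
  # Returns {luma_plane_bytes, chroma_plane_bytes} for 8-bit planar data.
  # Assumes width and height are even.
  def plane_sizes(width, height, :i420), do: {width * height, div(width, 2) * div(height, 2)}
  def plane_sizes(width, height, :i422), do: {width * height, div(width, 2) * height}
  def plane_sizes(width, height, :i444), do: {width * height, width * height}

  # Total bytes in one frame: one Y plane plus the U and V planes.
  def frame_size(width, height, layout) do
    {y_size, c_size} = plane_sizes(width, height, layout)
    y_size + 2 * c_size
  end

  # Splits a raw frame into its {y, u, v} planes (Y first, then U, then V).
  def split(frame, width, height, layout) do
    {y_size, c_size} = plane_sizes(width, height, layout)
    <<y::binary-size(y_size), u::binary-size(c_size), v::binary-size(c_size)>> = frame
    {y, u, v}
  end
end
```

With the 1920x1080 I420 sample, `PlanarYUV.frame_size(1920, 1080, :i420)` gives the byte size the raw file should have, which is a quick sanity check before decoding.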

kipcole9 · 2023-12-24T15:49:37Z

kipcole9
Dec 24, 2023
Maintainer

Thanks @mat-hek. I'm a little confused on the image example. YUV is a colourspace, not a file format. I've tried treating the file as a .png, .jpg and .tiff but it doesn't seem to be any of those. Any chance you can add the example in one of those file formats (with the YUV colourspace of course)?

0 replies

kipcole9 · 2023-12-24T16:48:52Z

kipcole9
Dec 24, 2023
Maintainer

You can tell from my comment that this is not an area of expertise for me at all! I've been doing some further reading and so far my findings are:

If you can provide the Y, U and V bands then this example code looks like it can convert from YUV (I420) to RGB. Although that looks to be true for planar data only.
eVision can do frame capture from a video file or from a camera. You can use this via the Image.Video module. The functions will return an RGB formatted image.

Therefore thinking about your problem, the best I can come up with so far is:

Use Image.Video.image_from_video/2 to read a frame. This will return an RGB image.
Use Image.compose/2 to compose your overlay onto the frame image
(waves hands) convert back to YUV.

I wonder if using ffmpeg might be an easier fit for your use case? This example seems similar to your requirement?

0 replies

kipcole9 · 2023-12-24T16:53:35Z

kipcole9
Dec 24, 2023
Maintainer

There is an ffmpeg command line interface library at https://hexdocs.pm/ffmpex/readme.html that might help.
And lastly, the Membrane team is doing amazing work and they might have some good ideas?

PS: Not giving up on this, just not sure of the path forward.

0 replies

kipcole9 · 2023-12-24T18:13:44Z

kipcole9
Dec 24, 2023
Maintainer

OK, one more thought. I do think eVision will be the better tool for this (better than image). I think the workflow would be:

Read the image file, or capture it from a video file or camera. If you use Evision.VideoCapture.read/2 this will return an RGB frame image. Alternatively, read the raw data and use Evision.cvtColorTwoPlane/4 to convert it to RGB.
Open your overlay image and overlay it on the frame. I don't know the function that does that, but if you google opencv image overlay I think you'll be good.
Convert the resulting image to YUV using Evision.cvt_color/3.
Save the raw data somewhere and use ffmpeg to combine the data into your output video. Or save it to a video stream using the functions in Evision.VideoWriter

I have no doubt that I have oversimplified the details above but I think it's the right general direction.

For converting back to YUV there appear to be a number of different colour conversion codes that meet your requirements:

iex> Evision.Constant.cv_COLOR_RGB<tab>
cv_COLOR_RGB2BGR/0          cv_COLOR_RGB2BGR555/0       cv_COLOR_RGB2BGR565/0       cv_COLOR_RGB2BGRA/0
cv_COLOR_RGB2GRAY/0         cv_COLOR_RGB2HLS/0          cv_COLOR_RGB2HLS_FULL/0     cv_COLOR_RGB2HSV/0
cv_COLOR_RGB2HSV_FULL/0     cv_COLOR_RGB2Lab/0          cv_COLOR_RGB2Luv/0          cv_COLOR_RGB2RGBA/0
cv_COLOR_RGB2XYZ/0          cv_COLOR_RGB2YCrCb/0        cv_COLOR_RGB2YUV/0          cv_COLOR_RGB2YUV_I420/0
cv_COLOR_RGB2YUV_IYUV/0     cv_COLOR_RGB2YUV_YV12/0     cv_COLOR_RGBA2BGR/0         cv_COLOR_RGBA2BGR555/0
cv_COLOR_RGBA2BGR565/0      cv_COLOR_RGBA2BGRA/0        cv_COLOR_RGBA2GRAY/0        cv_COLOR_RGBA2mRGBA/0
cv_COLOR_RGBA2RGB/0         cv_COLOR_RGBA2YUV_I420/0    cv_COLOR_RGBA2YUV_IYUV/0    cv_COLOR_RGBA2YUV_YV12/0

If that's not enough, then using ffmpeg like the example at https://video.stackexchange.com/questions/12105/add-an-image-overlay-in-front-of-video-using-ffmpeg is probably the best go-to.

0 replies

mat-hek · 2024-01-04T15:48:19Z

mat-hek
Jan 4, 2024
Author

Thanks for this very comprehensive answer!

YUV is a colourspace, not a file format

Indeed it is, I should have made this clear. There is a simple file format called Y4M that holds plain YUV, but it's for video, so it can carry multiple YUV frames, and I don't think it's popular.
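To illustrate how little metadata Y4M carries: a stream starts with a single text line beginning with `YUV4MPEG2`, followed by space-separated tagged parameters (`W` width, `H` height, `F` frame rate as `num:den`, `C` chroma layout, and a few others), and each frame is then introduced by a `FRAME` line followed by the raw planes. Below is a minimal header-parsing sketch. The `Y4MHeader` module is hypothetical; it is not the API of any existing package, and it ignores parameters other than W, H, F and C.

```elixir
defmodule Y4MHeader do
  # Hypothetical sketch: parses only the stream header line of a Y4M file.
  # Returns {:ok, params, rest}, where rest is everything after the header line.
  def parse(data) do
    with [line, rest] <- :binary.split(data, "\n"),
         ["YUV4MPEG2" | params] <- String.split(line, " ", trim: true) do
      {:ok, Enum.reduce(params, %{}, &put_param/2), rest}
    else
      _ -> {:error, :invalid_header}
    end
  end

  defp put_param("W" <> w, acc), do: Map.put(acc, :width, String.to_integer(w))
  defp put_param("H" <> h, acc), do: Map.put(acc, :height, String.to_integer(h))
  defp put_param("C" <> c, acc), do: Map.put(acc, :chroma, c)

  defp put_param("F" <> f, acc) do
    [num, den] = String.split(f, ":")
    Map.put(acc, :frame_rate, {String.to_integer(num), String.to_integer(den)})
  end

  # Interlacing (I), aspect ratio (A) and extension (X) tags are ignored here.
  defp put_param(_other, acc), do: acc
end
```

A raw `.yuv` file like the sample above has none of this, which is why width, height and layout have to be supplied separately.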

I wonder if using ffmpeg might be an easier fit for your use case? This example seems similar to your requirement?

Indeed, FFmpeg can do this, but I think it's too heavy for the job, especially if I already have a stream of decoded YUV frames

And lastly, the Membrane team is doing amazing work and they might have some good ideas?

I'm afraid Membrane doesn't support this, because I'm trying to add it there now 😄

Read the image file, or capture it from a video file or camera. If you use Evision.VideoCapture.read/2 this will return an RGB frame image. Alternatively, read the raw data and use Evision.cvtColorTwoPlane/4 to convert it to RGB.

Here comes the first problem: how do I convert YUV to an OpenCV matrix? I cannot take it from the camera, and for cvtColorTwoPlane I'd need to manually separate the Y and UV planes. Also, it says it only supports I420, which may not be enough for me in the end. I'll ask in the eVision repo, but I suppose that this, or manually constructing the matrix, is my only option anyway.

Open your overlay image and overlay it on the frame. I don't know the function that does that, but if you google opencv image overlay I think you'll be good.

It seems doable with OpenCV, but a bit tricky, especially when the overlay image has some opacity in it. Would it be feasible to use Image to do that once I have RGB?

Convert the resulting image to YUV using Evision.cvt_color/3.

Makes sense

Save the raw data somewhere and use ffmpeg to combine the data into your output video. Or save it to a video stream using the functions in Evision.VideoWriter

I'm good once I have the output YUV. Then I'm going to use Membrane to stream further.

Thanks again, OFC I'll post here if I have something ;)

1 reply

kipcole9 Jan 4, 2024
Maintainer

Well it sounds like I need to go do some math then. Not my strong suit but I know it's possible to do (and probably quite efficient with Nx) so I'll keep going if you are willing to collaborate - given you're aiming to add this to Membrane (apologies for not recognising you up front!)

I don't really want to get into the file format but I think I can work out the colorspace conversion.

kipcole9 · 2024-01-04T23:37:26Z

kipcole9
Jan 4, 2024
Maintainer

I did some more googling and perhaps this Python lib encapsulates most of what you're after? It looks well written and should convert to Elixir/Nx reasonably well.

I'm ok to tackle an Elixir implementation if you think it meets your needs. I'll make it a new library image_yuv.

0 replies

kipcole9 · 2024-01-04T23:54:42Z

kipcole9
Jan 4, 2024
Maintainer

It seems doable with OpenCV, but a bit tricky, especially when the overlay image has some opacity in it. Would it be feasible to use Image to do that once I have RGB?

Yes, this is easy in Image when you have an RGB image, as well as having control over different blend modes for the overlay.

0 replies

mat-hek · 2024-01-05T17:14:55Z

mat-hek
Jan 5, 2024
Author

Ok, so it turns out that my friends from Video compositor had the same problem and they wrote the RGB <-> YUV 420 conversion by hand, so I rewrote it in Elixir. I suppose it's not very performant and I'm not sure I did everything right, but it seems to work! Here's the POC:

Mix.install([:vix, :image])

defmodule YUV420pToRGB do
  def convert(image, width) do
    y_size = trunc(byte_size(image) * 2 / 3)
    uv_size = trunc(byte_size(image) * 1 / 6)

    <<y_plane::binary-size(y_size), u_plane::binary-size(uv_size), v_plane::binary-size(uv_size)>> =
      image

    IO.iodata_to_binary(convert_rows(y_plane, u_plane, v_plane, width))
  end

  defp convert_rows(<<>>, <<>>, <<>>, _width) do
    []
  end

  defp convert_rows(y_plane, u_plane, v_plane, width) do
    uv_width = div(width, 2)
    <<y_row::binary-size(width), y_next_row::binary-size(width), y_plane::binary>> = y_plane
    <<u_row::binary-size(uv_width), u_plane::binary>> = u_plane
    <<v_row::binary-size(uv_width), v_plane::binary>> = v_plane

    [
      convert_pixels(y_row, u_row, v_row),
      convert_pixels(y_next_row, u_row, v_row) | convert_rows(y_plane, u_plane, v_plane, width)
    ]
  end

  defp convert_pixels(<<>>, <<>>, <<>>) do
    []
  end

  defp convert_pixels(<<y1, y2, y_row::binary>>, <<u, u_row::binary>>, <<v, v_row::binary>>) do
    [convert_pixel(y1, u, v), convert_pixel(y2, u, v) | convert_pixels(y_row, u_row, v_row)]
  end

  defp convert_pixel(y, u, v) do
    r = clamp(y + 1.40200 * (v - 128.0))
    g = clamp(y - 0.34414 * (u - 128.0) - 0.71414 * (v - 128.0))
    b = clamp(y + 1.77200 * (u - 128.0))
    <<r, g, b>>
  end

  defp clamp(number) when number > 255, do: 255
  defp clamp(number) when number < 0, do: 0
  defp clamp(number), do: round(number)
end

defmodule RGBToYUV444p do
  def convert(image) do
    do_convert(image, {[], [], []})
  end

  defp do_convert(<<>>, {y_plane, u_plane, v_plane}) do
    IO.iodata_to_binary([Enum.reverse(y_plane), Enum.reverse(u_plane), Enum.reverse(v_plane)])
  end

  defp do_convert(<<r, g, b, image::binary>>, {y_plane, u_plane, v_plane}) do
    y = clamp(0.299 * r + 0.587 * g + 0.114 * b)
    u = clamp(-0.168736 * r - 0.331264 * g + 0.5 * b + 128.0)
    v = clamp(0.5 * r + -0.418688 * g + -0.081312 * b + 128.0)
    do_convert(image, {[y | y_plane], [u | u_plane], [v | v_plane]})
  end

  defp clamp(number) when number > 255, do: 255
  defp clamp(number) when number < 0, do: 0
  defp clamp(number), do: round(number)
end

defmodule YUV444pToYUV420p do
  def convert(image, width) do
    plane_size = byte_size(image) |> div(3)

    <<y_plane::binary-size(plane_size), u_plane::binary-size(plane_size),
      v_plane::binary-size(plane_size)>> = image

    IO.iodata_to_binary([y_plane, convert_rows(u_plane, width), convert_rows(v_plane, width)])
  end

  defp convert_rows(<<>>, _width) do
    []
  end

  defp convert_rows(plane, width) do
    <<row::binary-size(width), next_row::binary-size(width), plane::binary>> = plane
    [convert_pixels(row, next_row) | convert_rows(plane, width)]
  end

  defp convert_pixels(<<>>, <<>>) do
    []
  end

  defp convert_pixels(<<a, b, row::binary>>, <<c, d, next_row::binary>>) do
    result = round((a + b + c + d) / 4)
    [<<result>> | convert_pixels(row, next_row)]
  end
end

yuv = File.read!("image.yuv")
rgb = YUV420pToRGB.convert(yuv, 1920)
{:ok, vix} = Vix.Vips.Image.new_from_binary(rgb, 1920, 1080, 3, :VIPS_FORMAT_UCHAR)
overlay = Image.open!("membrane.png")
composed = Image.compose!(vix, overlay)

{:ok, composed_srgb} = Vix.Vips.Image.write_to_binary(composed)
# It seems that Vips cannot convert to plain RGB
composed_rgb = for <<r, g, b, _a <- composed_srgb>>, do: <<r, g, b>>, into: <<>>

yuv444 = RGBToYUV444p.convert(composed_rgb)
output_yuv = YUV444pToYUV420p.convert(yuv444, 1920)

# Convert back to RGB to inspect the result
rgb = YUV420pToRGB.convert(output_yuv, 1920)
{:ok, vix} = Vix.Vips.Image.new_from_binary(rgb, 1920, 1080, 3, :VIPS_FORMAT_UCHAR)
Image.write!(vix, "output.jpeg")

Output:

I may need the support for yuv422 and yuv444 and maybe others - I'm not sure if there's something else actually used out there - so if you're willing to help, we can collaborate and make it image_yuv ;) The python library you mentioned seems right, but it seems it only supports yuv444.
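For reference, the per-pixel arithmetic in the POC above (`convert_pixel/3` and `do_convert/2`) can be read off its constants as the full-range BT.601 equations, the same ones JFIF uses. This assumes 8-bit samples with chroma centred on 128; each result is then clamped to 0..255 and rounded, as the `clamp/1` helpers do.

```latex
\begin{aligned}
Y &= 0.299\,R + 0.587\,G + 0.114\,B \\
U &= -0.168736\,R - 0.331264\,G + 0.5\,B + 128 \\
V &= 0.5\,R - 0.418688\,G - 0.081312\,B + 128 \\[4pt]
R &= Y + 1.402\,(V - 128) \\
G &= Y - 0.34414\,(U - 128) - 0.71414\,(V - 128) \\
B &= Y + 1.772\,(U - 128)
\end{aligned}
```

Video sources often use limited-range (16..235) samples or BT.709 coefficients instead, in which case these constants would not apply as written.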

0 replies

kipcole9 · 2024-01-06T05:19:49Z

kipcole9
Jan 6, 2024
Maintainer

That's very cool, thank you.

The python library you mentioned seems right, but it seems it only supports yuv444

It supports 4:4:0, 4:2:2 and 4:2:0 but colour conversions are done through 4:4:4 since that can be represented as a regular 2d matrix. And it supports them in planar mode and interleaved. Actually these formats specifically:

pixel_formats.register(YUV422P)
pixel_formats.register(YUV422P10LE)
pixel_formats.register(YUV422P10BE)
pixel_formats.register(YUV422P16LE)
pixel_formats.register(YUV422P16BE)
pixel_formats.register(YUV422P9LE)
pixel_formats.register(YUV422P9BE)
pixel_formats.register(YUV422P12LE)
pixel_formats.register(YUV422P12BE)
pixel_formats.register(YUV422P14LE)
pixel_formats.register(YUV422P14BE)

It also supports:

0 replies

kipcole9 · 2024-01-06T05:22:44Z

kipcole9
Jan 6, 2024
Maintainer

Net net I'm going to try and re-implement the Python package and use your code (and hopefully your engagement) to verify, test and improve.

1 reply

mat-hek Jan 6, 2024
Author

Awesome! I can help with implementation too if you need ;)

kipcole9 · 2024-01-07T10:19:17Z

kipcole9
Jan 7, 2024
Maintainer

I'm creating a repo called image_yuv that will also integrate with the rest of image. Here are some thoughts for comment (and I'll move this to the new repo when I initiate it). Your feedback and comments are very warmly welcome.

Integrate with `Image`

Image.open/2 will be able to open .y4m files. This format has just enough metadata to be able to identify the width, height, encoding and colourspace of the stream.
- It will be treated as a multi-page image (one per frame)
- Image.open/2 will convert each frame to 4:4:4 encoding and then to sRGB colourspace so that it integrates easily with the rest of Image
Image.write/3 will write image to .y4m files with options to specify the encoding, byte order and colourspace (primarily REC601, REC709, REC2020, REC2100)

Image.YUV

Image.YUV will have the functions to read and write .y4m files
Image.YUV will have a stream function to be able to stream frames from .y4m files. This will be important for memory management since we are dealing with raw image data.
Image.YUV will have functions to decode a raw YUV image. To do so it will need to be provided with the width, height, encoding and colourspace since these can't (easily?) be derived from the raw stream.
Image.YUV will have the functions to convert between colourspaces (REC601, REC709 etc)
Image.YUV will have functions to convert between encodings (4:4:4, 4:2:2 and 4:2:0 primarily) with byte order indication.
Image.YUV will have functions to encode a Vix.Vips.Image.t image (the image type in the image library).
Image.YUV will wherever possible, leverage Nx in order to maximise performance. That means that Nx will be a required dependency of Image.YUV. Today it is an optional dependency of image.

0 replies

kipcole9 · 2024-01-08T10:21:59Z

kipcole9
Jan 8, 2024
Maintainer

And ... small change in plan. I've been overcomplicating things. I'm just going to add an Image.YUV module to image. It's very easy (with libvips) to rescale between 4:4:4, 4:2:2 and 4:2:0 (others can be added as required), as long as the data is planar (which your example is).

I also have the matrices for converting between BT.601 (SD) BT.709 (HD) and BT.2020 (UHDTV) and RGB.
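As an illustration of where such matrices come from, the sketch below derives a full-range RGB to Y'CbCr matrix from the two luma coefficients Kr and Kb of each standard. The `YCbCrMatrix` module is hypothetical and is not the code in Image.YUV. It assumes full-range values with no 16..235 scaling, and the +128 chroma offset is left to the caller.

```elixir
defmodule YCbCrMatrix do
  # Hypothetical sketch, not the Image.YUV implementation.
  # {Kr, Kb} luma coefficients as published in BT.601, BT.709 and BT.2020.
  @coefficients %{
    bt601: {0.299, 0.114},
    bt709: {0.2126, 0.0722},
    bt2020: {0.2627, 0.0593}
  }

  # Returns the 3x3 matrix as rows [Y, Cb, Cr], each row holding [R, G, B] weights.
  #   Y  = Kr*R + Kg*G + Kb*B
  #   Cb = (B - Y) / (2 * (1 - Kb))
  #   Cr = (R - Y) / (2 * (1 - Kr))
  def rgb_to_ycbcr(standard) do
    {kr, kb} = Map.fetch!(@coefficients, standard)
    kg = 1.0 - kr - kb

    [
      [kr, kg, kb],
      [-kr / (2 * (1 - kb)), -kg / (2 * (1 - kb)), 0.5],
      [0.5, -kg / (2 * (1 - kr)), -kb / (2 * (1 - kr))]
    ]
  end
end
```

For `:bt601` this reproduces the constants in the hand-written `RGBToYUV444p` module earlier in the thread, which is a useful cross-check when comparing outputs.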

I expect to have a version you can test out on your Tuesday. libvips is doing all the heavy lifting, Nx isn't required and I think the performance will be acceptable (and better than your current implementation). I'm leveraging the hex package y4m for reading and writing .y4m files, which is a big help, but of course raw pixel data is also totally ok.

I've learned a lot already from this - thanks for the opportunity.

8 replies

mat-hek Jan 8, 2024
Author

MJPEG is just a series of JPEGs IIRC ;)

kipcole9 Jan 8, 2024
Maintainer

One comment on a quick scan is

The JPEG standard is based on the RGB color space as a starting point. It is converted to YCbCr for compression purposes.

Which means it will be libjpeg doing the conversion/compression and that won't be visible in the libvips API. But .... now that I've been through the learning curve I think a target of 150ms for the conversion to RGB for a YUV image should be reasonable (fingers crossed).

mat-hek Jan 8, 2024
Author

target of 150ms for the conversion to RGB for a YUV image should be reasonable

Hmm, it's way better than 1.5s, but on the other hand this code:

{:ok, vix} = Vix.Vips.Image.new_from_binary(rgb, 1920, 1080, 3, :VIPS_FORMAT_UCHAR)
composed = Image.compose!(vix, overlay)
{:ok, composed_srgb} = Vix.Vips.Image.write_to_binary(composed)

runs in just 5ms ;) Maybe it would be better to have the conversion as a NIF? If we fit in 15ms, we could process 60 FPS live video without parallelization, so it would be worth it IMO

kipcole9 Jan 8, 2024
Maintainer

The key performance element is the matrix math for the colour conversion. There's some very good SIMD optimisations in recent libvips, certainly in 8.15, so I'm also curious to see. All of the math is happening in the NIF - the performance constraints won't be the BEAM runtime.

mat-hek Jan 8, 2024
Author

Ok, keeping my fingers crossed then 🤞

kipcole9 · 2024-01-08T10:56:06Z

kipcole9
Jan 8, 2024
Maintainer

One other limitation (at least in the first iteration) will be that the data is 8 bits per channel. It shouldn't be too hard to adapt to 10 or 12 bit data. Is that something you're likely to need? I'm not confident the rest of image is very robust with higher bit depth images.

1 reply

mat-hek Jan 8, 2024
Author

I think I won't need that for now

kipcole9 · 2024-01-08T15:16:39Z

kipcole9
Jan 8, 2024
Maintainer

I've got a basic implementation up and running on my Mac M1 Max, 64GB. The results are not correct, but the computational effort isn't likely to change. This is using your 1920x1080 test image at the top of this thread. The results are quite promising:

Name                                              ips        average  deviation         median         99th %
Decoding and then converting YUV to RGB        1.62 K      616.68 μs    ±41.47%      592.25 μs     1854.32 μs
Converting YUV to RGB                          1.59 K      630.48 μs    ±40.37%      606.71 μs     1818.84 μs

Comparison:
Decoding and then converting YUV to RGB        1.62 K
Converting YUV to RGB                          1.59 K - 1.02x slower +13.80 μs

Memory usage statistics:

Name                                       Memory usage
Decoding and then converting YUV to RGB        11.92 KB
Converting YUV to RGB                          11.70 KB - 0.98x memory usage -0.22656 KB

Less than a millisecond and less than 12KB of memory is pretty good I think. The only thing is that the results are incorrect :-). Well it's late at night so I'll tackle it again in the morning.

0 replies

kipcole9 · 2024-01-08T15:38:37Z

kipcole9
Jan 8, 2024
Maintainer

The code is located at https://github.com/elixir-image/image/blob/yuv/lib/image/yuv.ex but it certainly isn't producing the expected result yet. I'm following the principles of libvips/libvips#2561 where the key function is Operation.recomb/2, which I certainly don't properly understand.

You can try it for yourself if you're curious by:

data = File.read!("/path/to/image.yuv")
decoded = Image.YUV.decode(data, 1920, 1080, :C420)
{:ok, rgb} = Image.YUV.to_rgb(decoded, 1920, 1080, :C420, :bt601)

Time for sleep.

0 replies

kipcole9 · 2024-01-08T22:39:14Z

kipcole9
Jan 8, 2024
Maintainer

The sleep helped. We are up and running for YUV image decoding. In the yuv branch, the functions Image.YUV.new_from_file/5 and Image.YUV.new_from_binary/5 are the key entry points. I have not bolted on the .y4m file support yet.

The performance looks pretty good (all due to libvips). This is with your 1920x1080 4:2:0 image (using BT601 colourspace).

Name                                            ips        average  deviation         median         99th %
Converting YUV binary to an RGB image        1.46 K      686.10 μs    ±30.97%      647.58 μs     1686.91 μs

Memory usage statistics:

Name                                     Memory usage
Converting YUV binary to an RGB image        14.34 KB

I'll work next on image encoding to YUV which should be pretty straightforward now. Feel free to try out the yuv branch if you're so inclined.

0 replies

kipcole9 · 2024-01-09T04:24:16Z

kipcole9
Jan 9, 2024
Maintainer

I've added Image.YUV.to_yuv/3, Image.YUV.write_to_file/4 and Image.YUV.write_to_binary/3. In basic testing, round tripping seems to work (that is, take your test .yuv file, convert to an RGB image, then convert back to YUV and then back again to RGB).

The performance of converting RGB to YUV is much slower than YUV to RGB and I'm not sure why. I'll look into it later today - it won't change the API.

For now I think this is API complete other than .y4m file and streaming support. Therefore I hope it can replace your hand-rolled code since this is quite a bit faster due to leveraging libvips. Comments, feedback, suggestions very welcome.

Here is the benchmark data:

Name                                                           ips        average  deviation         median         99th %
Converting YUV 4:2:0 binary in BT601 to an RGB image        1.37 K        0.73 ms    ±28.41%        0.71 ms        1.81 ms
Converting an RGB image to YUV 4:2:0 binary in BT601      0.0173 K       57.96 ms     ±2.67%       57.94 ms       62.81 ms

Comparison:
Converting YUV 4:2:0 binary in BT601 to an RGB image        1.37 K
Converting an RGB image to YUV 4:2:0 binary in BT601      0.0173 K - 79.36x slower +57.23 ms

0 replies

mat-hek · 2024-01-09T09:23:28Z

mat-hek
Jan 9, 2024
Author

The YUV->RGB->YUV conversion works well, thanks! I have some trouble trying to convert RGB->YUV after doing the composition:

overlay = Image.open!("membrane.png")
yuv = File.read!("image.yuv")
{:ok, rgb} = Image.YUV.new_from_binary(yuv, 1920, 1080, :C420)
composed = Image.compose!(rgb, overlay)
{:ok, output_yuv} = Image.YUV.write_to_binary(composed, :C420)

I'm getting {:error, "operation build: recomb: bands in must equal matrix width"}. It seems it comes from here. Do you have an idea what may be the reason?

0 replies

kipcole9 · 2024-01-09T09:29:01Z

kipcole9
Jan 9, 2024
Maintainer

That will be because your composed image has an alpha band and will need to be flattened first. The error messages aren't great I agree but they come straight from libvips.

Would you try an Image.flatten(composed) before the Image.YUV.write_to_binary/2 call?

I'm hesitant to flatten by default but perhaps I should since the code doesn't support YUV with alpha anyway. Thoughts?

1 reply

mat-hek Jan 9, 2024
Author

Works with Image.flatten, thanks ;)

I'm hesitant to flatten by default but perhaps I should since the code doesn't support YUV with alpha anyway. Thoughts?

Now I think this error actually makes sense ;) But I guess flattening by default or detecting that the image has alpha or matching on the libvips error and throwing a better error would improve UX

kipcole9 · 2024-01-09T09:38:50Z

kipcole9
Jan 9, 2024
Maintainer

OK, I've pushed a commit that flattens the image in Image.YUV.to_yuv. I think that should get you past the issue.

0 replies

kipcole9 · 2024-01-09T10:00:01Z

kipcole9
Jan 9, 2024
Maintainer

Handling libvips errors is a broader problem - fix it in one place and 100 more crop up. I'm still trying to find a better way to wrap those errors but I haven't got there yet.

0 replies

kipcole9 · 2024-01-09T10:01:33Z

kipcole9
Jan 9, 2024
Maintainer

I'll work on some tests and improve the docs but otherwise I think Image.YUV does what is advertised.

Is there anything else you need from Image for your work?

1 reply

mat-hek Jan 9, 2024
Author

Feature-wise everything works perfectly, thanks again! The performance is problematic though, because I can't parallelize it:

Benchee.run(%{
    convert: fn ->
      {:ok, rgb} = Image.YUV.new_from_binary(yuv, 1920, 1080, :C420)
      {:ok, yuv} = Image.YUV.write_to_binary(rgb, :C420)
    end,
    convert_many: fn ->
      Task.async_stream(1..4, fn _i ->
        {:ok, rgb} = Image.YUV.new_from_binary(yuv, 1920, 1080, :C420)
        {:ok, yuv} = Image.YUV.write_to_binary(rgb, :C420)
      end)
      |> Enum.to_list()
    end
})

Name                   ips        average  deviation         median         99th %
convert               6.19      161.45 ms     ±2.99%      160.85 ms      176.03 ms
convert_many          1.69      593.37 ms     ±1.05%      592.64 ms      602.58 ms

Comparison:
convert               6.19
convert_many          1.69 - 3.68x slower +431.92 ms

Do you have an idea why? I'm using an Intel i5 Macbook.

kipcole9 · 2024-01-09T13:59:11Z

kipcole9
Jan 9, 2024
Maintainer

That's possibly related to the fact that libvips is already multithreaded and therefore you are getting CPU contention. You might try different concurrency settings with Image.put_concurrency/1.

I'll also run your script on my system (default concurrency of 10) and see what result I get.

13 replies

kipcole9 Jan 9, 2024
Maintainer

Doing the whole thing in YUV 4:4:4 would work (I think) - with some reservations about flattening the alpha channel after composition. But it needs to be 4:4:4 since libvips sees everything as a matrix.

I don't think logo composition is going to be a constraint since libvips will only focus on the pixels that need changing. That's to be tested of course.

mat-hek Jan 9, 2024
Author

libvips will only focus on the pixels that need changing

It may be for the composition itself, but for the subsampling probably not, since the planes are extracted separately in new_scaled_plane/3

mat-hek Jan 9, 2024
Author

Doing the whole thing in YUV 4:4:4 would work (I think) - with some reservations about flattening the alpha channel after composition. But it needs to be 4:4:4 since libvips sees everything as a matrix.

Hmm, what if we operated on each plane separately?

kipcole9 Jan 9, 2024
Maintainer

Agreed - without solving the subsampling issue the rest is just speculation. I've posted in the Nx channel on the Elixir slack to see if I can get some help with translating the numpy code. I'm going to get some sleep now - I appreciate your patience with this.

kipcole9 Jan 9, 2024
Maintainer

Operate on each plane - sounds at least plausible. I don't think I have the smarts to work out how to scale the logo in a pixel perfect manner but I see where you're heading.

kipcole9 · 2024-01-09T14:37:41Z

kipcole9
Jan 9, 2024
Maintainer

I'm seeing only a 20% speed up too. I've tried a lot of different combinations with no change in concurrency. I am seeing CPU utilisation of over 90% (of the available 10 cores) though so I assume the issue is CPU exhaustion, not some kind of lock contention. I looked at CPU utilisation just running one conversion (no concurrency) and it used up to 6 cores so libvips is definitely threading (as expected).

The key performance constraint is the chroma subsampling (image resizing in libvips terms). I'll try a different sampling kernel and see if that makes a performance difference. Of course even if it does, it may have an effect on image quality (chroma at least).

2 replies

kipcole9 Jan 9, 2024
Maintainer

The python code to subsample 4:2:2 using numpy is:

B = A.copy() # 4:2:0

B[1::2, :] = B[::2, :]
# Vertically, every second element equals to element above itself.

B[:, 1::2] = B[:, ::2]
# Horizontally, every second element equals to the element on its left side.

I don't really know how to equate this to Nx functions - so if you have any ideas let me know? This should be substantially faster than the current implementation.
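Not an Nx answer, but as a sketch of what the numpy snippet above does: it copies every even row into the odd row below it and every even column into the odd column to its right. The same sample-and-hold can be written over a flat plane binary in plain Elixir. The `ChromaHold` module is hypothetical, assumes one byte per sample and even width and height, and makes no claim about speed relative to libvips or Nx.

```elixir
defmodule ChromaHold do
  # Hypothetical sketch of the numpy snippet's behaviour on a flat binary.
  # plane: width * height bytes, one byte per sample; width and height even.
  def hold(plane, width) do
    # Walk the plane two rows at a time, keeping only the first row of each pair.
    for <<row::binary-size(width), _next::binary-size(width) <- plane>>, into: <<>> do
      # Within the kept row, keep the first sample of each pair and repeat it.
      held = for <<a, _b <- row>>, into: <<>>, do: <<a, a>>
      # Emit the held row twice so the output has the same size as the input.
      held <> held
    end
  end
end
```

For a 4x2 plane `<<1, 2, 3, 4, 5, 6, 7, 8>>`, `ChromaHold.hold(plane, 4)` keeps samples 1 and 3 and repeats each over its 2x2 block.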

mat-hek Jan 9, 2024
Author

No experience with Nx, sorry. But yeah, scaling is definitely more complex than this

kipcole9 · 2024-01-10T01:53:27Z

kipcole9
Jan 10, 2024
Maintainer

I've managed to get the to_yuv function down from 58ms to 43ms on my machine. Still not fast enough for your needs I know. The improvement comes from using Vix.Vips.Operation.subsample/3.

Further testing shows the biggest performance hit is Vix.Vips.Image.write_to_binary/1 - it's about 11ms per call and there are three calls. I don't know how to avoid this since we ultimately need to serialise the three planes in order for a planar image.

I did some testing using Nx which did not improve performance. Since the bottleneck is write_to_binary/1, that's not surprising.

Still not giving up, I have some more things to try.

Basic conversion

Convert to YUV: 3.8ms
Write to binary (unencoded YUV image): 10.8ms
Total of convert to YUV and write to binary: 14.3ms

Full encode: 43ms

Convert to YUV: 4.3ms
3 x write to binary (each plane): 10.8 x 3 = 32.4ms
So far 36.7ms (unaccounted for 7ms)

0 replies

mat-hek · 2024-01-17T16:32:12Z

mat-hek
Jan 17, 2024
Author

So, for the record, here's what I ended up with so far. This script puts an overlay image over a video:

Mix.install([
  :membrane_h264_plugin,
  :membrane_h264_ffmpeg_plugin,
  :membrane_file_plugin,
  :membrane_hackney_plugin,
  :req,
  :image
])

defmodule Membrane.OverlayFilter do
  use Membrane.Filter

  alias Membrane.RawVideo
  alias Vix.Vips.Image, as: Vimage
  alias Vix.Vips.Operation

  def_input_pad :input, accepted_format: RawVideo
  def_output_pad :output, accepted_format: RawVideo

  def_options overlay_image: [spec: String.t()]

  @impl true
  def handle_init(_ctx, options) do
    overlay_planes = open_overlay(options.overlay_image)
    {[], %{overlay_planes: overlay_planes}}
  end

  @impl true
  def handle_buffer(:input, buffer, ctx, state) do
    %RawVideo{width: width, height: height} = ctx.pads.input.stream_format
    {overlay_y, overlay_u, overlay_v} = state.overlay_planes

    {image_y, image_u, image_v} = open_planes(buffer.payload, width, height)
    composed_y = compose_to_binary(image_y, overlay_y)
    composed_u = compose_to_binary(image_u, overlay_u)
    composed_v = compose_to_binary(image_v, overlay_v)
    composed = composed_y <> composed_u <> composed_v

    {[buffer: {:output, %{buffer | payload: composed}}], state}
  end

  defp open_overlay(path) do
    overlay = Image.open!(path)
    {:ok, overlay_yuv} = Image.YUV.write_to_binary(overlay, :C420)
    planes = open_planes(overlay_yuv, Image.width(overlay), Image.height(overlay))
    add_alpha(planes, overlay)
  end

  defp open_planes(yuv, width, height) do
    half_width = div(width, 2)
    half_height = div(height, 2)
    y_size = width * height
    uv_size = half_width * half_height
    <<y::binary-size(y_size), u::binary-size(uv_size), v::binary-size(uv_size)>> = yuv

    {:ok, y} = Vimage.new_from_binary(y, width, height, 1, :VIPS_FORMAT_UCHAR)
    {:ok, u} = Vimage.new_from_binary(u, half_width, half_height, 1, :VIPS_FORMAT_UCHAR)
    {:ok, v} = Vimage.new_from_binary(v, half_width, half_height, 1, :VIPS_FORMAT_UCHAR)
    {y, u, v}
  end

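  defp add_alpha(planes, image) do
    # Band 3 is the alpha channel; this assumes the overlay is RGBA.
    alpha = image[3]
    # The U and V planes are half size, so they need a half-size alpha.
    downsized_alpha = Operation.subsample!(alpha, 2, 2)

    {y, u, v} = planes

    y = Operation.bandjoin!([y, alpha])
    u = Operation.bandjoin!([u, downsized_alpha])
    v = Operation.bandjoin!([v, downsized_alpha])

    {y, u, v}
  end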

  defp compose_to_binary(image, overlay) do
    composed = Image.compose!(image, overlay, x: -1, y: 0)
    # Keep only band 0 (the plane data) when writing the result back out.
    {:ok, binary} = Vimage.write_to_binary(composed[0])
    binary
  end
end

defmodule Example do
  def run() do
    import Membrane.ChildrenSpec

    pipeline = Membrane.RCPipeline.start_link!()

    File.write!(
      "membrane.png",
      Req.get!("https://avatars.githubusercontent.com/u/25247695?s=200&v=4").body
    )

    Membrane.RCPipeline.exec_actions(pipeline,
      spec:
        child(%Membrane.Hackney.Source{
          location:
            "https://raw.githubusercontent.com/membraneframework/static/gh-pages/samples/big-buck-bunny/bun33s_720x480.h264",
          hackney_opts: [follow_redirects: true]
        })
        |> child(Membrane.H264.Parser)
        |> child(Membrane.H264.FFmpeg.Decoder)
        |> child(%Membrane.OverlayFilter{overlay_image: "membrane.png"})
        |> child(Membrane.H264.FFmpeg.Encoder)
        |> child(%Membrane.File.Sink{location: "output.h264"})
    )
  end
end

Example.run()

Process.sleep(:infinity)

Thanks again @kipcole9 and @akash-akya for your help!

0 replies

kipcole9 · 2024-01-17T23:39:42Z

kipcole9
Jan 17, 2024
Maintainer

Well done @mat-hek. I'm going to publish an update to image with the Image.YUV module but note that it's not suitable for real-time frame processing.

Given you aren't using anything specific to the Image library, perhaps you can just depend on vix itself? You would then change Image.compose!(image, overlay, x: -1, y: 0) to the equivalent Vix.Vips.Operation.composite2 call. You lose some of the convenience of the options handling but you also reduce your dependency list.

1 reply

mat-hek Jan 18, 2024
Author

I'm using Image.YUV to convert the overlay image to YUV ;) And apart from my particular use case, I want this to serve as an example of a user-friendly way of overlaying stuff on video, and Image is most suitable for this IMO

kipcole9 · 2024-01-17T23:49:17Z

kipcole9
Jan 17, 2024
Maintainer

One other small thing would be to test on an image that has an odd number of pixels on one or both dimensions. The integer division when calculating the size of the U and V planes might not work as expected. I think I've seen some other implementations that do the equivalent of div(width + 1, 2) to ensure the size rounds up for odd image dimensions.
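
To make the rounding concern concrete, here is a minimal sketch of the plane-size calculation for 8-bit I420 data. The `PlaneSizes` module and its `i420/2` function are hypothetical names used only for illustration; they are not part of Image, Vix or Membrane.

```elixir
defmodule PlaneSizes do
  # Hypothetical helper for illustration only.
  # Returns {y_size, chroma_size} in bytes for an 8-bit I420 (4:2:0) frame,
  # where chroma_size is the size of each of the U and V planes.
  def i420(width, height) when width > 0 and height > 0 do
    # Round up so that an odd dimension still gets a chroma sample
    # for its last column or row.
    chroma_width = div(width + 1, 2)
    chroma_height = div(height + 1, 2)
    {width * height, chroma_width * chroma_height}
  end
end

# Even dimensions: identical to plain div(width, 2) * div(height, 2).
PlaneSizes.i420(720, 480)
# => {345600, 86400}

# Odd dimensions: truncating division would give a chroma size of 2 * 1 = 2.
PlaneSizes.i420(5, 3)
# => {15, 6}
```

For even dimensions both approaches agree, so rounding up only changes behaviour in the odd-sized case.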

1 reply

mat-hek Jan 18, 2024
Author

Good point, however, it seems like an extremely rare use case. AFAIR FFmpeg doesn't even support videos with odd widths or heights. But if it's easy to fix, why not.

kipcole9 · 2024-01-18T00:47:55Z

kipcole9
Jan 18, 2024
Maintainer

I've published Image version 0.41.0 with the following changelog entry:

Enhancements

Adds Image.YUV module that provides functions to convert between YUV and RGB image data. Thanks very much to @mat-hek for the collaboration. This module makes it easier to work with video image data, which is typically YUV encoded. The module supports 4:4:4, 4:2:2 and 4:2:0 encoding in either of the BT601 or BT709 colorspaces.
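
As a rough illustration of why the choice between BT601 and BT709 matters, the sketch below computes the luma (Y) value of an RGB pixel using the standard luma weights from each recommendation. It assumes full-range 8-bit values, and the `Luma` module is a hypothetical helper, not how Image.YUV is implemented.

```elixir
defmodule Luma do
  # Standard luma weights {kr, kg, kb} from the BT.601 and BT.709 recommendations.
  @weights %{
    bt601: {0.299, 0.587, 0.114},
    bt709: {0.2126, 0.7152, 0.0722}
  }

  # Full-range luma for 8-bit R, G, B values, rounded to an integer.
  # Hypothetical helper for illustration; not Image.YUV's implementation.
  def y(colorspace, {r, g, b}) do
    {kr, kg, kb} = Map.fetch!(@weights, colorspace)
    round(kr * r + kg * g + kb * b)
  end
end

# The same pure red pixel produces a different Y value in each colorspace.
Luma.y(:bt601, {255, 0, 0})
# => 76
Luma.y(:bt709, {255, 0, 0})
# => 54
```

Decoding with a different colorspace from the one used for encoding would therefore shift the colours, which is why the colorspace needs to be known alongside the encoding.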

0 replies
