CMPTO_J2k Encoding / Decoding Latency #416
(Just for reference, this relates to GH-406, e.g. this comment.)
Hi, I've played with it a bit on a GeForce 1080 Ti; the baseline command is:
(The important thing is the ratelimit option; I've added it to get the latency when the encoder is not fully utilized.) Here are the results that I got:
(It seems to work in a similar fashion if R12L is used instead.) So my thoughts are the following:
So, at least according to my measurements, it seems like the latency scales rather linearly with the quality unless performance is the bottleneck. Can you confirm the conclusions? Namely, that for the 215 ms case you get ~5 fps and that/if
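To make the throughput side of that explicit, here is a minimal sketch of the arithmetic; the 215 ms and 23.98 fps figures are just the numbers from this thread, not new measurements:

```c
/* Rough arithmetic behind the "~5 fps at 215 ms" expectation: when the
 * per-frame encode time exceeds the frame period, the encoder itself
 * caps throughput.  Numbers are illustrative, taken from this thread. */
#include <stdio.h>

int main(void) {
    const double encode_ms = 215.0;        /* per-frame encode time from the table */
    const double stream_fps = 23.98;       /* nominal capture rate */
    const double frame_period_ms = 1000.0 / stream_fps;   /* ~41.7 ms */

    double max_fps = 1000.0 / encode_ms;   /* throughput ceiling of a single encode */
    printf("frame period: %.1f ms, encoder ceiling: %.2f fps\n",
           frame_period_ms, max_fps);
    if (encode_ms > frame_period_ms)
        printf("encoder is the bottleneck (%.0f ms > %.1f ms per frame)\n",
               encode_ms, frame_period_ms);
    return 0;
}
```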
Hi Martin, here are my results, based on video data captured via a Blackmagic I/O card. The signal captured is UHD 23.98 4:4:4 12-bit (R12L). I have switched systems at this point for my testing platform, as the GPU I was previously using died. I am now using a 1080 Ti for my testing, so I expect our results to be fairly similar.
We are seeing pretty similar results, from the looks of it. If I rate-limit to 3 with a quality setting of 1.0, the duration is around 62-65 ms, in line with the results for quality 1.0 and a ratelimit of 5.
My colleagues and I have been able to confirm a few things:
These changes can have negative effects on stream stability, so in the long term I hope to find an alternative way to reduce latency if possible. Do you know how much latency is introduced by the DeckLink capture / streaming protocol / DeckLink output process? My assumption was that we should expect 1 frame for each of those steps, in addition to the per-frame encoding time. I don't have any evidence to back up those numbers, though.
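For what it's worth, one way to start attributing latency per stage is to wall-clock each hop with a monotonic timer. A minimal sketch follows; `process_frame()` is a hypothetical placeholder for whichever stage (capture, encode, send, decode, output) is being timed, not an UltraGrid function:

```c
/* Time one pipeline stage with a monotonic clock.  process_frame() is a
 * hypothetical stand-in for the stage under test. */
#include <stdio.h>
#include <time.h>

static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* hypothetical stage under test: encode / decode / transmit one frame */
static void process_frame(void) { }

int main(void) {
    double t0 = now_ms();
    process_frame();
    double t1 = now_ms();
    printf("stage took %.2f ms\n", t1 - t0);
    return 0;
}
```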
Sorry if this seems rude, but a 1080 Ti is a rather old card. Would these issues be solved simply by using a newer-generation GPU?
Hey, not rude at all. I don't think the issue here is GPU power. What I'm finding is that even if I lower the quality of the stream, which reduces the per-frame encode duration, I don't necessarily see an improvement in latency. We are currently comparing against another service that we believe uses UltraGrid and Comprimato on the back end; they appear to be several frames faster (~3 frames) than the main branch of UltraGrid, and I'm trying to find where the additional latency is coming from. I could be off the mark here, and perhaps the Comprimato encode/decode process is not the issue.

I have several systems deployed using Ada 4000 GPUs, and while they can handle more complex, heavy-noise encodes/decodes and more simultaneous streams than the 1080 Ti can, I don't see an end-to-end latency improvement for streams that are not utilizing 100% of the GPU's capabilities. Assuming we can keep the per-frame encoding and decoding durations below 40 ms, that should only add 2 frames of latency end to end (excluding any issues with the signal chain upstream and downstream of the UltraGrid process). I am currently measuring 5-6 frames of latency. We were able to shave 2 frames off by reducing the pool size of the encoder and by modifying a buffer within UltraGrid that I believe is there to accommodate packet retransmission or reordering, but modifying that buffer is not ideal. Other perspectives are always welcome!
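As an illustration of why pool/buffer depth matters independently of encode speed, here is a toy FIFO model; it is purely illustrative and not UltraGrid's actual pool implementation. A stage that keeps N frames in flight releases each frame roughly N frame periods after it entered, no matter how fast the per-frame processing is:

```c
/* Toy model: a stage holding POOL_DEPTH frames in flight releases a frame
 * only after POOL_DEPTH newer frames have been pushed behind it, i.e. about
 * POOL_DEPTH frame periods later.  Not UltraGrid's real pool code. */
#include <stdio.h>

#define POOL_DEPTH 3   /* assumed pipeline/pool depth */

int main(void) {
    int fifo[POOL_DEPTH];
    int head = 0, filled = 0;

    for (int frame = 0; frame < 10; ++frame) {
        if (filled == POOL_DEPTH) {          /* FIFO full: oldest frame leaves */
            int released = fifo[head];
            printf("frame %d released while frame %d is captured "
                   "(%d frame periods of added latency)\n",
                   released, frame, frame - released);
        } else {
            ++filled;
        }
        fifo[head] = frame;                  /* newest frame takes the slot */
        head = (head + 1) % POOL_DEPTH;
    }
    return 0;
}
```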
Do you mean the playout buffer delay? It is in milliseconds, so 32 ms. But 32 is just the initial value; it is overridden by 1/fps later anyway.
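A minimal sketch of the described behaviour (not the actual UltraGrid source): the delay starts at the fixed 32 ms and is later replaced by one frame period once the stream's frame rate is known.

```c
/* Sketch of the described override, assuming a 23.98 fps stream;
 * not copied from the UltraGrid source. */
#include <stdio.h>

int main(void) {
    double playout_delay_ms = 32.0;          /* initial value */
    double fps = 23.98;                      /* known once the stream starts */

    if (fps > 0.0)
        playout_delay_ms = 1000.0 / fps;     /* overridden by 1/fps, ~41.7 ms */

    printf("effective playout delay: %.1f ms\n", playout_delay_ms);
    return 0;
}
```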
More or less so. The compression can vary, as you can see. We did some evaluation in the past, but it isn't up to date. At the time, the latency was mostly influenced by the latency of the display device, which has improved with newer devices, so the 4K Extreme values may be representative these days.
But you must also add the actual compression and decompression durations: provided that we assume a baseline latency of e.g. 3 frames, as on the linked page, you must add a frame time for compression and another for decompression. That would yield 5 frames, so your values seem legitimate to me. Having e.g. 1 frame of E2E latency (without compression/decompression, to yield 3 overall) doesn't sound realistic to me.
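Putting that arithmetic in one place (the 3-frame baseline is the assumption from the linked evaluation, plus one frame period each for compression and decompression; all values are illustrative):

```c
/* Expected end-to-end latency under the assumptions above:
 * 3-frame uncompressed baseline + 1 frame period to encode
 * + 1 frame period to decode, at a 23.98 fps stream. */
#include <stdio.h>

int main(void) {
    const double fps = 23.98;
    const double frame_ms = 1000.0 / fps;    /* ~41.7 ms per frame */

    double baseline_frames   = 3.0;          /* assumed: capture + transmission + display */
    double compress_frames   = 1.0;          /* one frame period to encode */
    double decompress_frames = 1.0;          /* one frame period to decode */

    double total_frames = baseline_frames + compress_frames + decompress_frames;
    printf("expected: %.0f frames (~%.0f ms at %.2f fps)\n",
           total_frames, total_frames * frame_ms, fps);
    return 0;
}
```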
Hi,
As suggested, I wanted to look into seeing if there is a way to reduce the latency of the CMPTO J2K codec. We have measured the latency by comparing the source signal on a reference monitor with burned-in timecode, side by side with a monitor of the same model displaying the output of an UltraGrid decoder.
We have observed a latency of approximately 4-6 frames when encoding an 8-10 bit 422/444 signal. The latency seems to increase by several frames when encoding 12 bit 444, to approximately 6-7 frames.
Below are screenshots of the reported video encoding times and the corresponding settings. I have noticed, as expected, that reducing the quality reduces the encoding time of each frame. I have not yet verified whether this results in a reduction in end-to-end latency, which I am hoping to test soon. Is this the behaviour I should expect to see?
UHD 444 10bit - Quality=1 - MCT Enabled - Tiles=1 - Pool=1
UHD 444 10bit - Quality=0.5 - MCT Enabled - Tiles=1 - Pool=1
UHD 444 12bit - Quality=1 - MCT Enabled - Tiles=1 - Pool=1
UHD 444 12bit - Quality=0.6 - MCT Enabled - Tiles=1 - Pool=1
UHD 444 12bit - Quality=0.5 - MCT Enabled - Tiles=1 - Pool=1
For our use case, we feel that this solution is right on the edge of being acceptable for the majority of our users, so even a reduction of 2-3 frames could greatly improve the experience and accuracy of the remote work being done.
Thanks in advance.