Extend EncodedVideoChunkMetadata for Spatial Scalability #756
@kalradivyanshu @fippo PTAL.
@aboba Looks good. So we just set L3T3 in the encoder, and each frame tells us which spatial and temporal layer it belongs to and which frames it depends on. Nothing changes in the decoder: we just make sure all dependencies are fed in before feeding in the frame, and it just works, right?
For a frame to be decodable, all its dependencies need to have been received and decoded without an error callback. From the conference server perspective, this means not only tracking what frames were sent to each participant, but also the transport status (whether the frame was completely received) and whether it was successfully decoded. Currently the underlying encoder API limits avenues available for repair to keyframe generation, retransmission and forward error correction. Alternate Long Term Reference (LTR) frames or layer refresh (LRR) are not yet supported.
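The decodability rule above can be sketched as a small tracker. This is a minimal illustration, not part of the PR: `FrameInfo`, `FrameStatus`, and `DecodabilityTracker` are hypothetical names, and the `frameId` / `dependencies` fields only mirror what this PR proposes for `SvcOutputMetadata`. A conference server would presumably keep one such tracker per participant.

```typescript
// Hypothetical shape mirroring the frameId/dependencies fields proposed in this PR.
interface FrameInfo {
  frameId: number;
  dependencies: number[];
}

// Lifecycle of a frame as seen for one participant (assumed states, for illustration).
type FrameStatus = "sent" | "received" | "decoded" | "failed";

class DecodabilityTracker {
  private status = new Map<number, FrameStatus>();

  // Record the latest known state of a frame (from transport or decoder feedback).
  markStatus(frameId: number, status: FrameStatus): void {
    this.status.set(frameId, status);
  }

  // A frame is decodable only if every dependency was decoded without error.
  // A frame with no dependencies (e.g. a keyframe) is always decodable.
  isDecodable(frame: FrameInfo): boolean {
    return frame.dependencies.every((id) => this.status.get(id) === "decoded");
  }
}
```

A frame whose dependency was merely sent or received, but not confirmed decoded, is treated as not decodable, which is the conservative reading of the comment above.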
index.src.html
Outdated
8. If |encoderConfig|.{{VideoEncoderConfig/alpha}} is set to `"keep"`:
8. If |encoderConfig|.{{VideoEncoderConfig/scalabilityMode}}
describes multiple [=spatial layers=]:
1. Let |svc| be a new {{SvcOutputMetadata}} instance.
This will override everything produced by step 8 for SVC modes that have both temporal and spatial layers, for example L2T2.
I think I have fixed this. PTAL.
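To illustrate the override concern raised in this thread, here is a rough sketch in which a single metadata object is created once and both layer ids are filled in, so the spatial step cannot discard what the temporal step produced. It is not the spec algorithm: `SvcMetadataSketch`, `parseLayers`, and `buildSvcMetadata` are hypothetical helpers, and the parser only handles `L<n>T<m>` style mode names, ignoring any suffix.

```typescript
// Hypothetical stand-in for the SvcOutputMetadata dictionary discussed in this PR.
interface SvcMetadataSketch {
  temporalLayerId?: number;
  spatialLayerId?: number;
}

// Extract layer counts from a mode name such as "L2T2" (suffixes are ignored).
function parseLayers(mode: string): { spatial: number; temporal: number } | null {
  const m = /^L(\d)T(\d)/.exec(mode);
  if (!m) return null;
  return { spatial: Number(m[1]), temporal: Number(m[2]) };
}

// Build one metadata object per chunk; each layer kind adds its own field.
function buildSvcMetadata(
  mode: string,
  temporalId: number,
  spatialId: number
): SvcMetadataSketch | undefined {
  const layers = parseLayers(mode);
  if (!layers) return undefined;
  if (layers.spatial <= 1 && layers.temporal <= 1) return undefined;
  const svc: SvcMetadataSketch = {}; // created once, never replaced
  if (layers.temporal > 1) svc.temporalLayerId = temporalId;
  if (layers.spatial > 1) svc.spatialLayerId = spatialId;
  return svc;
}
```

With this shape, a mode like L2T2 yields both ids on the same object, while L1T3 yields only a temporal id.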
index.src.html
Outdated
@@ -1704,6 +1717,9 @@
dictionary SvcOutputMetadata {
  unsigned long temporalLayerId;
  unsigned long spatialLayerId;
  unsigned long long frameId;
frameId and dependencies are never set
@Djuffin @tonyherre dependencies would be set to the sequence of frameId values that the encodedChunk depends on. But the bigger question is the behavior of frameId, which is discussed here: w3c/webrtc-encoded-transform#220
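As a sketch of how a receiver or server might use such a `dependencies` list, the helper below walks it transitively to find every frame needed to decode a given target. `ChunkDeps` and `requiredFrames` are hypothetical names, and the sketch assumes `frameId` values are unique within the stream, which is exactly the open question referenced above.

```typescript
// Hypothetical per-chunk record holding the fields proposed in this PR.
interface ChunkDeps {
  frameId: number;
  dependencies: number[];
}

// Return the ids of all frames (including the target) needed to decode `target`,
// following dependencies transitively. Unknown ids are included but not expanded.
function requiredFrames(target: number, frames: Map<number, ChunkDeps>): number[] {
  const needed = new Set<number>();
  const stack: number[] = [target];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (needed.has(id)) continue; // already visited
    needed.add(id);
    const frame = frames.get(id);
    if (frame) {
      for (const dep of frame.dependencies) stack.push(dep);
    }
  }
  return Array.from(needed).sort((a, b) => a - b);
}
```

For example, if frame 3 depends on frames 1 and 2, and both of those depend on keyframe 0, the helper reports that frames 0 through 3 are all required.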
@tonyherre PTAL
Thank you so much for this, @aboba. What is left in this PR to get it accepted into the spec?
@kalradivyanshu It has been noted that spatial scalability is not widely used today because it is not hardware accelerated and therefore creates power and thermal issues on mobile devices. As a result, applications are using spatial simulcast instead. Also, the current WebCodecs API does not support layer refresh, which means that if a spatial frame is lost, a base layer keyframe is required, rather than just creating a new spatial frame referencing a received base-layer frame (e.g. moving to a new Long-Term Reference). @Djuffin has argued that these problems need to be fixed before spatial scalability could become popular in WebCodecs, and therefore that it would make sense to focus on a new encoder API that can address the problems rather than just shipping a (potentially unusable) feature.
Oh ok. Couple of things:
While I agree with the issues @Djuffin raised, I honestly feel that since the new API is at least a year away, spatial scalability should be added, or at the very least something like switch frames should be added to make simulcast more usable. Without any of these, the only solution is to do simulcast with a keyframe request for every switch, which in turn adds a huge load on the encoder, the decoder, and the network. Thank you both for all your work!
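The keyframe-per-switch cost described above can be sketched as follows. This is only an illustration of the simulcast behaviour being discussed: `SimulcastSwitcher` and its `requestKeyFrame` callback are hypothetical, and how the request reaches the sender (for instance, the sender then encoding with the `keyFrame: true` option) is left out.

```typescript
// Chunk types as exposed by EncodedVideoChunk.type.
type ChunkType = "key" | "delta";

// Hypothetical forwarder state for one receiver choosing among simulcast streams.
class SimulcastSwitcher {
  private current: number;
  private pending: number | null = null;
  private requestKeyFrame: (stream: number) => void;

  constructor(initial: number, requestKeyFrame: (stream: number) => void) {
    this.current = initial;
    this.requestKeyFrame = requestKeyFrame;
  }

  // Ask to switch; every switch costs one keyframe request on the target stream.
  switchTo(stream: number): void {
    if (stream === this.current) return;
    this.pending = stream;
    this.requestKeyFrame(stream);
  }

  // Keep forwarding the old stream until a keyframe arrives on the new one.
  shouldForward(stream: number, type: ChunkType): boolean {
    if (this.pending !== null && stream === this.pending && type === "key") {
      this.current = stream;
      this.pending = null;
    }
    return stream === this.current;
  }
}
```

Delta frames on the target stream are dropped until its keyframe arrives, which is why frequent switching multiplies keyframe traffic.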
Fixes #619
Rebase and update of PR #654
Related: w3c/webrtc-encoded-transform#220
💥 Error: 400 Bad Request 💥
PR Preview failed to build. (Last tried on Jan 9, 2024, 10:27 PM UTC).
PR Preview relies on a number of web services to run. There seems to be an issue with the following one:
🚨 CSS Spec Preprocessor - CSS Spec Preprocessor is the web service used to build Bikeshed specs.
🔗 Related URL
If you don't have enough information above to solve the error by yourself (or to understand to which web service the error is related to, if any), please file an issue.