-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New VAAPI definition for multi-frame processing #112
Conversation
Multi-frame processing is an optimization for multi-stream transcoding scenario, that allows to combine several jobs from different parallel pipelines inside one GPU task execution to better reuse engines that can't be loaded fully in single frame case, as well as decrease CPU overhead for task submission. VAStatus vaCreateMFContext (VADisplay dpy, VAMFContextID *mf_context);Description:Entry point to create new Multi-Frame context interface encapsulating common for all streams memory objects and structures required for single GPU task submission from several VAContextID's. Arguments:VADisplay dpy - display adapter. Return value:VA_STATUS_SUCCESS - operation successful. VAStatus vaMFAddContext (VADisplay dpy, VAMFContextID mf_context, VAContextID context);Description:Provide ability to associate each context used for submission and common Multi-Frame context. Arguments:VADisplay dpy - display adapter. Return value:VA_STATUS_SUCCESS - operation successful, context was added. Limitations:Number of contexts per Multi-Frame context - should not be limited. VAStatus vaMFReleaseContext (VADisplay dpy, VAMFContextID mf_context, VAContextID context);Description:Provides ability to remove association between Multi-Frame context and particular encode context. This is not necessary, can be removed, as the same can be done through vaDestroyContext, however such support brings hidden from App, unclear logic in driver. Arguments:VADisplay dpy - display adapter. Return value:VA_STATUS_SUCCESS - operation successful, context was removed. VAStatus vaMFSubmit (VADisplay dpy, VAMFContextID mf_context, VAContextID *contexts, int num_contexts);Description:Provides ability to submit frames from multiple contexts streams for execution. Arguments:VADisplay dpy - display adapter. Return value:VA_STATUS_SUCCESS - operation successful, context was removed. Schematic code flow examples:1. multiple context before adding MFP supportSimplified scheme assuming all streams are running in the same thread, in the reality there are no big amount of use cases where it goes this path.
2. MFP flow exampleMFP is added in addition to current code and not breaking logic for contexts working not through MFP. In reality it will be more complicated as multiple contexts are working in different threads for multi-stream transcoding, so real vaMFSubmit require to sync all threads before task submission, for cases with transcoding from single source to multiple output it is relatively simple as source is synchronization point.
|
|
va/va.c
Outdated
CHECK_DISPLAY(dpy); | ||
ctx = CTX(dpy); | ||
|
||
vaStatus = ctx->vtable->vaCreateMFContext( ctx, mf_context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check ctx->vtable->vaCreateMFContext against NULL first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIxed
va/va.c
Outdated
CHECK_DISPLAY(dpy); | ||
ctx = CTX(dpy); | ||
|
||
vaStatus = ctx->vtable->vaMFAddContext( ctx, context, mf_context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check ctx->vtable->vaMFAddContext against NULL first and the similar check for other new APIs below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIxed
va/va.h
Outdated
/** | ||
* vaCreateMFContext - Create a multi-frame context | ||
* Multi-frame context allow to run tasks from different | ||
* contexts in single batch buffer for performance optimization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the underlying HW might not support batch buffer, could you use some common words?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIxed
va/va.h
Outdated
VAStatus vaMFAddContext ( | ||
VADisplay dpy, | ||
VAContextID context, | ||
VAMFContextID mfe_context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please replace mfe_context with mf_context in this commit, we should use the same name style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIxed
There are some trailing whitespace, could you please delete those trailing whitespaces when updating your patch?
|
c6ad65f
to
31ebb4e
Compare
Thanks Haihao, I fixed issues. Also updated descriptions to remove uncertainty for different feature support.
|
Just for curious, do you have a case that adds a decoder context and a encoder context in the same Multi-Frame context in practice? |
How does application do when vaMFAddContext returns an error,
Which one is right ? |
Added notes to vaAddContext error code description - can't continue work if VA_STATUS_ERROR_OPERATION_FAILED, can continue in other cases. |
"Just for curious, do you have a case that adds a decoder context and a encoder context in the same Multi-Frame context in practice?" |
@sreerenjb @lizhong1008 @xuguangxin Could you help to review the patches? How do you think if adding multi-frame processing in gstreamer-vaapi, FFmpeg and libyami? |
@artem-shaporenko Could you update the comment in va.h too when you update the description on github? |
@xhaihao Do you think it'll be useful to put full description into comments in va.h? |
"gstreamer-vaapi, FFmpeg and libyami" |
@artem-shaporenko We can use doxygen to generate the document for user in libva, so you should document how to use multi frame processing clearly in va.h. |
va/va.h
Outdated
* and other contexts passed this call, rejected context can continue work in stand-alone mode. | ||
* VA_STATUS_ERROR_UNSUPPORTED_PROFILE - Current context with Particular VAEntrypoint is supported | ||
* but VAProfile is not supported(so for example H264 encode or FEI is supported, but H265/Mpeg2 not | ||
* supported at a moment). Application can continue with current mf_context and other contexts passed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“H264 encode or FEI is supported”,is this a typo? should be "H264 encoder of FEI is supported" or "H264 encode without FEI is supported"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is a good idea to list which profile/entrypoint is not supported or not support in the API file. It means you must update this comments when driver behavior has been changed.
Why don't define a vaQueryXXX interface (just like vaQueryConfigEntrypoints) to access drive limitation and capacity? Before calling vaCreateMFContext and vaMFAddContext, we can query firstly, then we can know what feature is supported and avoid many inexplicable error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“H264 encode or FEI is supported”,is this a typo? should be "H264 encoder of FEI is supported" or "H264 encode without FEI is supported"?
No It's not a typo, it is just an example, I can change to exact entry point names so it is more clear
I don't think it is a good idea to list which profile/entrypoint is not supported or not support in the API file. It means you must update this comments when driver behavior has been changed.
One more time it is not listing of supported features, but examples(covered by 'for example').
I don't think it is a good idea to list which profile/entrypoint is not supported or not support in the API file. It means you must update this comments when driver behavior has been changed.
As by architecture and assumed design there are not issue to continue with single context mode without Multi-Frame, if Multi-Frame is not supported - why add excessive APIs?
inexplicable error
Errors are explicable as described.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"One more time it is not listing of supported features, but examples(covered by 'for example')"
Yes, I know they are examples, but I think they are examples of some driver (iHD driver) implementation status (Please correct me if I am wrong).
"so for example H264 encode or FEI is supported, but H265/Mpeg2 not supported at a moment",maybe h264/mpeg2 are supported one day, will it be necessary to change this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"As by architecture and assumed design there are not issue to continue with single context mode without Multi-Frame, if Multi-Frame is not supported - why add excessive APIs?"
Here the vaQueryXXX interface I mean to to get Multi-Frame-Processing (MFP) capacity not for singe context mode. It looks like there are many limitations for iHD driver implementation of MFP as your examples in the comments here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yes, current iHD driver doesn't support mpeg2 and H265, will it be better to remove example comment or change it to something else not related to current iHD driver support, like "HEVC supported, but Mpeg2 not supported" or put particular entry point name?
Limitations are shown by particular implementation, we can introduce query interface, but how does this help, how this function will be used from an application, what is benefit over current suggested path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I don't object adding some query function, but it is not necessarily required here as can work without it - can be added later as improvement. Posted some code examples below to show why I don't think query function will be really useful in general.
va/va.h
Outdated
* All contexts passed should be associated through vaMFAddContext | ||
* and call sequence Begin/Render/End performed. | ||
* This call is non-blocking. The client can start another | ||
* Begin/Render/End/vaMFSubmit sequence on a different render targets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As current definition of VAEndPicture, if VAEndPicture is called, the server should start processing all pending operations. Seems it will conflict with vaMFSubmit, because before vaMFSubmit, VAEndPicture has processed current frame. The mean it requires change current implementation of VAEndPicture, or VAEndPicture should not be called for the case of multi-frame. Am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't require change to vaEndPicture, but to context working with multiframe.
So context knows that it is working through multi-frame and doesn't perform submission during vaEndPicture
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"doesn't perform submission during vaEndPicture" seems conflict with the definition of VAEndPicture: " if VAEndPicture is called, the server should start processing all pending operations". Am I right? Any driver implementation should follow and make sure without any conflict with API definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it conflicts with definition of vaEndPicture, but the same time all using vaMFSubmit will be aware of this change to behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so maybe there are two ways to avoid this conflict: 1. change the definition of VAEndPicture, add one more comment to indicate vaEndPicture won't process the pending operations in the MFP cases. 2. In the MFP case, no need to call vaEndPicture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it enough to have comment for vaMFSubmit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posted example below why vaEndPicture behavior should be the way it described in vaMFSubmit
|
@lizhong1008 Yes we can meet and discuss.
Number of context added is not limited, so any number of context can be added.
It will come together with iHD driver opensource implementation and Media SDK implementation over it. Multi-Frame API not dependent on reference frame and anything else, depending on driver implementation it can support particular vaEntryPoint/Profiles/etc. Any issues with parameters for decode/vpp/encode either reference picture or anything else should be reported during vaBeginPicture/vaRenderPicture/vaEndPicture, vaMFSubmit only report issues associated with multi-frame submission. Such architecture allows to keep minimum changes for both driver and application. |
@fhvwy could you help to provide your input from FFmpeg side? |
Regarding query and vaEndPicture, here are simple schematic code examples(processing is shown as serial, in real case it most likely will be multi-thread, but general code flow is something like below):
1. multiple context before adding MFP support.VADisplay dpy; .... 2. MFP flow example with changes in bold, clearly visible - MFP is added in addition to current code and not breaking logic for contexts working not through MFPVADisplay dpy; VAContextID submitContexts[N]; //frame contexts - to submit, M - max number of frames to submit, number of overall contexts associated with Multi-Frame can be bigger than maximum number of frames to submit if(mfSupported) 3. Added Query function - just only MFContext logic, addion from(2) shown in bold. Clearly visible that any query function added will be just an addition above context management logic, can be useful for debug at some cases, but in general complicating code and requires additional support from driver.VADisplay dpy; VAContextID submitContexts[N]; //frame contexts - to submit, M - max number of frames to submit, number of overall contexts associated with Multi-Frame can be bigger than maximum number of frames to submit if(mfSupported) |
@xhaihao, @lizhong1008 , @sreerenjb, any other concern on API? |
@artem-shaporenko LGTM for current API definition, will review your pull request of sample code. |
@lizhong1008 What LGTM mean? |
Ok, got it - "looks good for me" |
@artem-shaporenko , yes it means "looks good to me" |
@xhaihao if there are no concerns can we merge this PR into v2.0-next so it can be used |
aeb5067
to
f192640
Compare
combined into single commit to avoid API change in different commits. |
va/va_backend.h
Outdated
VAContextID *contexts, | ||
int num_contexts | ||
); | ||
|
||
VAStatus (*vaCreateBuffer) ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the new callback function pointers after the existing callback functions and reduce the reserved bytes. otherwise it will break compatibility between libva and driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
f192640
to
e2d1956
Compare
va/va_backend.h
Outdated
/** \brief Reserved bytes for future use, must be zero */ | ||
unsigned long reserved[64]; | ||
unsigned long reserved[63]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new array should be an array of 60 unsigned long integers because you add 4 callback functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry. done, not 60, but 56 as long is 32 bit, but pointer is 64 bit for linux 64
I still have few more concerns. I would like to take a step back and think what we really trying to achieve here: IIUC, the intention is to utilize the extra hardware blocks more efficiently. On the other hand, if the driver really knows the hw capabilities, why shouldn't it possible to schedule tasks in a better way without having Of course the new apis bring the benefits of "more gpu tasks in single batchbuffer and decrease the cpu overload". My question is, how much performance difference Please correct me If I am wrong about the understanding of hw block utilization. |
@artem-shaporenko Can you clarify the behaviour of this API with respect to pipelining and dependencies between operations? To offer some specific examples, do the following two cases work: Internal dependency:
External dependency:
|
@sreerenjb you are right that performance improvement will depend on VME amount(and other factors depending on target HW). This is why we need to put more effort to implement proper query function that report enough data for different situation so application/middleware can properly manage multi-frame submission. In general talking about performance - anyway developer need to test how it works on different HW to make sure performance is in expected range. So the same is without multi-frame, you need to test how many parallel transcodings you need to do on different HW and don't have any report from driver about it. |
e2d1956
to
087a90b
Compare
In this case driver should return an error - VA_STATUS_ERROR_INVALID_CONTEXT, there should not be any dependencies between 2 tasks combined into one operation.
In this case behavior will be the same as in single frame mode - if frame order is right(VP - first, encoder - second) - will be encoded properly, otherwise output will be broken |
@xhaihao Can you please merge latest update for VTable changes into v2.0-next branch? |
va/va_backend.h
Outdated
/** \brief Reserved bytes for future use, must be zero */ | ||
unsigned long reserved[64]; | ||
unsigned long reserved[56]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You added 4 hook functions only, so the reserved bytes should be 60 * sizeof(unsigned long) now, not 56 * sizeof(unsigned long).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsigned long is 32 bit isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed that to be [60], but please make sure you won't get it broken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsigned long is 32bit on 32bit OS and 64bit on 64bit OS
va/va.c
Outdated
@@ -439,6 +442,7 @@ static VAStatus va_openDriver(VADisplay dpy, char *driver_name) | |||
CHECK_VTABLE(vaStatus, ctx, BeginPicture); | |||
CHECK_VTABLE(vaStatus, ctx, RenderPicture); | |||
CHECK_VTABLE(vaStatus, ctx, EndPicture); | |||
CHECK_VTABLE(vaStatus, ctx, MFSubmit); | |||
CHECK_VTABLE(vaStatus, ctx, SyncSurface); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new functions are not mandatory to implement, could you remove the checks for the new functions ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
va/va.c
Outdated
|
||
CHECK_DISPLAY(dpy); | ||
ctx = CTX(dpy); | ||
CHECK_VTABLE(vaStatus, ctx, CreateMFContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here vaStatus is set to VA_STATUS_ERROR_UNKNOWN if vaCreateMFContext is not implemented by the backend driver, however vaCreateMFContext is not a mandatory function, so please return VA_STATUS_ERROR_UNIMPLEMENTED instead of VA_STATUS_ERROR_UNKNOWN. The same change should be applied to other new functions
va/va.c
Outdated
@@ -1225,7 +1225,7 @@ VAStatus vaMFSubmit ( | |||
) | |||
{ | |||
VADriverContextP ctx; | |||
VAStatus vaStatus; | |||
VAStatus vaStatus = VA_STATUS_SUCCESS; | |||
|
|||
CHECK_DISPLAY(dpy); | |||
ctx = CTX(dpy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not needed if returning VA_STATUS_ERROR_UNIMPLEMENTED at once when ctx->vtable->vaCreateMFContext is NULL,
BTW Could you please to remove all trailing whitespaces in this patch, then squash the two commits into one commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
f724da3
to
aa3d902
Compare
CHECK_DISPLAY(dpy); | ||
ctx = CTX(dpy); | ||
if(ctx->vtable->vaCreateMFContext == NULL) | ||
vaStatus = VA_STATUS_ERROR_UNIMPLEMENTED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*mf_context may be a random value for this case, so va_TraceCreateMFContext shouldn't be called when ctx->vtable->vaCreateMFContext is a NULL pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, fixed
…, FEI Encode/ENC/Pre-ENC, and VPP in future. Signed-off-by: Artem Shaporenko [email protected]
aa3d902
to
bd52f82
Compare
New VAAPI definition for multi-frame processing applicable for different entry points.
Signed-off-by: Artem Shaporenko [email protected]