
New VAAPI definition for multi-frame processing #112

Merged
merged 2 commits into from
Nov 22, 2017

Conversation

artem-shaporenko
Contributor

@artem-shaporenko artem-shaporenko commented Sep 6, 2017

New VAAPI definition for multi-frame processing applicable for different entry points.

Signed-off-by: Artem Shaporenko [email protected]

@artem-shaporenko
Copy link
Contributor Author

artem-shaporenko commented Sep 6, 2017

Multi-frame processing is an optimization for multi-stream transcoding scenarios. It combines several jobs from different parallel pipelines into a single GPU task execution, to better utilize engines that cannot be fully loaded in the single-frame case, and to reduce the CPU overhead of task submission.

VAStatus vaCreateMFContext (VADisplay dpy, VAMFContextID *mf_context);

Description:

Entry point to create a new Multi-Frame context interface, encapsulating the memory objects and structures that are common to all streams and required for a single GPU task submission from several VAContextIDs.
Allocation: this call only creates an instance; it doesn't allocate any additional memory.
Support identification: the application can identify multi-frame feature support by its ability to create a multi-frame context. If the driver supports multi-frame, the call succeeds, mf_context != NULL and VAStatus = VA_STATUS_SUCCESS; otherwise, if multi-frame processing is not supported, the driver returns VA_STATUS_ERROR_UNIMPLEMENTED and mf_context = NULL.

Arguments:

VADisplay dpy - display adapter.
VAMFContextID *mf_context - Multi-Frame context encapsulating all associated contexts for multi-frame submission.

Return value:

VA_STATUS_SUCCESS - operation successful.
VA_STATUS_ERROR_UNIMPLEMENTED - no support for multi-frame.
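The support-identification rule above can be sketched as follows. The stub driver below is hypothetical (it models a driver without multi-frame support); the type names and status-code values mirror va.h:

```c
#include <stddef.h>

/* Minimal stand-in types; in a real build these come from <va/va.h>. */
typedef int VAStatus;
typedef void *VADisplay;
typedef unsigned int VAMFContextID;

#define VA_STATUS_SUCCESS             0x00000000
#define VA_STATUS_ERROR_UNIMPLEMENTED 0x00000014

/* Hypothetical stub modelling a driver WITHOUT multi-frame support:
 * per the definition above, it returns UNIMPLEMENTED and a NULL context. */
static VAStatus vaCreateMFContext(VADisplay dpy, VAMFContextID *mf_context)
{
    (void)dpy;
    *mf_context = 0;
    return VA_STATUS_ERROR_UNIMPLEMENTED;
}

/* Detection pattern: the only "query" needed is the create call itself.
 * On failure the application keeps the per-context submission path. */
static int mf_supported(VADisplay dpy, VAMFContextID *mf_context)
{
    return vaCreateMFContext(dpy, mf_context) == VA_STATUS_SUCCESS;
}
```

With a real driver the same call either succeeds (multi-frame available) or fails with VA_STATUS_ERROR_UNIMPLEMENTED, so no separate capability query is required for the basic yes/no decision.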

VAStatus vaMFAddContext (VADisplay dpy, VAMFContextID mf_context, VAContextID context);

Description:

Provides the ability to associate each context used for submission with the common Multi-Frame context.
Query: try to add a context to find out whether it is supported.
Allocation: this call allocates and/or reallocates all memory objects common to all streams associated with the particular Multi-Frame context.
All memory required for each individual context (pixel buffers, per-stream buffers such as PAK objects, motion vector buffers, predictor buffers, etc.) is allocated during the standard vaCreateContext call for each stream.
Runtime dependency: if the current implementation doesn't allow running different entry points/profiles together, the first context added sets the entry point/profile for the whole Multi-Frame context, and adding contexts with other entry points and profiles will be rejected.

Arguments:

VADisplay dpy - display adapter.
VAContextID context - context being associated with Multi-Frame context.
VAMFContextID mf_context - multi-frame context used to associate other contexts and for multi-frame submission.

Return value:

VA_STATUS_SUCCESS - operation successful, context was added.
VA_STATUS_ERROR_OPERATION_FAILED - something unexpected happened; the application has to close the current mf_context and associated contexts and start working with new ones.
VA_STATUS_ERROR_INVALID_CONTEXT - the ContextID is invalid. Depending on the implementation, the driver can support different entry points inside one multi-frame context, or can reject them and return this error; the error means the entry point and/or profile contradicts previously added ones. The application can continue with the current mf_context and the other contexts that passed this call; the rejected context can continue working in stand-alone mode or in another mf_context.
VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT - the particular VAContextID was created with a VAConfigID associated with a VAEntrypoint that is not supported; for example, VAEntrypointVLD is not supported (a decoder doesn't make sense for Multi-Frame operation), or VAEntrypointVideoProc and/or VAEntrypointFEI are not supported at the moment, etc. The application can continue with the current mf_context and the other contexts that passed this call; the rejected context can continue working in stand-alone mode.
VA_STATUS_ERROR_UNSUPPORTED_PROFILE - the context was created with a supported VAEntrypoint, but its VAProfile is not supported. The application can continue with the current mf_context and the other contexts that passed this call; the rejected context can continue working in stand-alone mode.

Limitations:

Number of contexts per Multi-Frame context - should not be limited.
Depending on the implementation, the driver can restrict mixing different VAEntrypoints or VAProfiles inside one multi-frame context, or limit which VAEntrypoint/VAProfile combinations are supported for multi-frame operation.
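The recovery rules above can be condensed into a small classifier. The enum and helper below are hypothetical illustrations (not part of the proposed API); the status-code values mirror va.h:

```c
/* Status codes as defined in <va/va.h>. */
typedef int VAStatus;
#define VA_STATUS_SUCCESS                      0x00000000
#define VA_STATUS_ERROR_OPERATION_FAILED       0x00000001
#define VA_STATUS_ERROR_INVALID_CONTEXT        0x00000005
#define VA_STATUS_ERROR_UNSUPPORTED_PROFILE    0x0000000c
#define VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT 0x0000000d

/* Hypothetical helper: maps a vaMFAddContext result to the action the
 * description above prescribes for the application. */
enum mf_action {
    MF_ADDED,      /* context participates in multi-frame submission   */
    MF_STANDALONE, /* rejected; context keeps working on its own       */
    MF_TEARDOWN    /* unrecoverable; close mf_context and its contexts */
};

static enum mf_action classify_add_result(VAStatus sts)
{
    switch (sts) {
    case VA_STATUS_SUCCESS:
        return MF_ADDED;
    case VA_STATUS_ERROR_INVALID_CONTEXT:        /* entrypoint/profile clash */
    case VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT: /* e.g. VAEntrypointVLD     */
    case VA_STATUS_ERROR_UNSUPPORTED_PROFILE:    /* entrypoint ok, profile not */
        return MF_STANDALONE;
    default: /* VA_STATUS_ERROR_OPERATION_FAILED or anything unexpected */
        return MF_TEARDOWN;
    }
}
```

The key design point is that only VA_STATUS_ERROR_OPERATION_FAILED is fatal for the whole mf_context; the three rejection codes leave both the mf_context and the rejected context usable.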

VAStatus vaMFReleaseContext (VADisplay dpy, VAMFContextID mf_context, VAContextID context);

Description:

Provides the ability to remove the association between the Multi-Frame context and a particular encode context. Strictly speaking this call is redundant, since the same effect could be achieved through vaDestroyContext; however, doing it implicitly there would hide the logic inside the driver and make it unclear to the application.

Arguments:

VADisplay dpy - display adapter.
VAContextID context - context being removed from Multi-Frame context.
VAMFContextID mf_context - Multi-Frame context used to associate context and for submission.

Return value:

VA_STATUS_SUCCESS - operation successful, context was removed.
VA_STATUS_ERROR_OPERATION_FAILED - something unexpected happened.
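A teardown sketch illustrating the explicit detach the description argues for. The stubs below are hypothetical stand-ins (the real signatures live in <va/va.h>), with a toy counter standing in for the driver's association state:

```c
typedef int VAStatus;
typedef void *VADisplay;
typedef unsigned int VAContextID;
typedef unsigned int VAMFContextID;
#define VA_STATUS_SUCCESS 0x00000000

/* Toy driver state: how many contexts are attached to the MF context. */
static int g_attached = 0;

/* Hypothetical stubs modelling the two calls involved in teardown. */
static VAStatus vaMFReleaseContext(VADisplay dpy, VAMFContextID mf_context,
                                   VAContextID context)
{
    (void)dpy; (void)mf_context; (void)context;
    g_attached--;              /* association removed explicitly */
    return VA_STATUS_SUCCESS;
}

static VAStatus vaDestroyContext(VADisplay dpy, VAContextID context)
{
    (void)dpy; (void)context;  /* per-stream resources freed here */
    return VA_STATUS_SUCCESS;
}

/* Detach from the MF context first, then destroy the context, so the
 * driver never has to break the association implicitly. */
static VAStatus teardown_stream(VADisplay dpy, VAMFContextID mf_context,
                                VAContextID context)
{
    VAStatus sts = vaMFReleaseContext(dpy, mf_context, context);
    if (sts != VA_STATUS_SUCCESS)
        return sts;
    return vaDestroyContext(dpy, context);
}
```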

VAStatus vaMFSubmit (VADisplay dpy, VAMFContextID mf_context, VAContextID *contexts, int num_contexts);

Description:

Provides the ability to submit frames from multiple contexts' streams for execution.

Arguments:

VADisplay dpy - display adapter.
VAMFContextID mf_context - Multi-Frame context used for submission.
VAContextID *contexts - contexts with frames ready for submission.
int num_contexts - number of contexts in the contexts array.

Return value:

VA_STATUS_SUCCESS - operation successful, frames from all listed contexts were submitted.
VA_STATUS_ERROR_INVALID_CONTEXT - mf_context or one of the contexts is invalid: either mf_context was not created, or one of the contexts was not associated with mf_context through vaMFAddContext.
VA_STATUS_ERROR_INVALID_PARAMETER - one of the contexts has not submitted its frame through the vaBeginPicture/vaRenderPicture/vaEndPicture call sequence.

Schematic code flow examples:

1. multiple context before adding MFP support

Simplified scheme assuming all streams run in the same thread; in reality few use cases follow this path.

VADisplay dpy;
VAContextID contexts[N];//all available contexts
VAStatus sts = VA_STATUS_SUCCESS;
....
//initialize display, etc.
.....
for( int i=0; i<N; i++)
{
    VAConfigID stream_config;
    ...
    //fill config
    ...
    sts = vaCreateConfig(dpy,...., &stream_config);
    CHECKSTS(sts);
    sts = vaCreateContext(dpy, stream_config, ...., &contexts[i]);
    CHECKSTS(sts);
}

....
// prepare streams ready to submit frames, fill contexts vector.
....
//runtime submission
for( int i=0; i<N; i++ )
{
    sts = vaBeginPicture(dpy, contexts[i], in_surf);
    CHECKSTS(sts);
    sts = vaRenderPicture(dpy, contexts[i], buffers, buffer_count);
    CHECKSTS(sts);
    sts = vaEndPicture(dpy, contexts[i]); // Without MFP, submission happens here
    CHECKSTS(sts);
}

2. MFP flow example

MFP is added on top of the current code and doesn't break the logic for contexts not working through MFP. In reality it will be more complicated, as the contexts run in different threads for multi-stream transcoding, so a real vaMFSubmit requires syncing all threads before task submission; for transcoding from a single source to multiple outputs it is relatively simple, as the source is the synchronization point.

VADisplay dpy;
VAMFContextID mf_context;
VAContextID contexts[N];//all available contexts
bool contexts_for_mf[N];
VAStatus sts = VA_STATUS_SUCCESS;
bool mfSupported;
....
//initialize display, etc.
...
sts = vaCreateMFContext(dpy, &mf_context);
mfSupported = (sts == VA_STATUS_SUCCESS);
...
for( int i=0; i<N; i++)
{
    VAConfigID stream_config;
    ...
    //fill config
    ...
    sts = vaCreateConfig(dpy,...., &stream_config);
    CHECKSTS(sts);
    sts = vaCreateContext(dpy, stream_config, ...., &contexts[i]);
    CHECKSTS(sts);
    sts = vaMFAddContext(dpy, mf_context, contexts[i]); // Add this context into the Multi-Frame context
    contexts_for_mf[i] = (sts == VA_STATUS_SUCCESS); // save context state - whether it is available for multi-frame or not.
}

VAContextID submitContexts[N]; // contexts with frames to submit; M is the number of frames to submit, and the number of contexts associated with the Multi-Frame context can be bigger than M
....
// prepare streams ready to submit frames, fill contexts vector.
....
//runtime
int M=0;
for( int i=0; i<N; i++ )
{
    sts = vaBeginPicture(dpy, contexts[i], in_surf);
    CHECKSTS(sts);
    sts = vaRenderPicture(dpy, contexts[i], buffers, buffer_count);
    CHECKSTS(sts);
    sts = vaEndPicture(dpy, contexts[i]); // Delay real submission until vaMFSubmit() if context is applicable
    CHECKSTS(sts);
    if(contexts_for_mf[i])
        submitContexts[M++] = contexts[i];
}

if(mfSupported)
{
    sts = vaMFSubmit(dpy, mf_context, submitContexts, M); // Submit through Multi-Frame context
}

@artem-shaporenko artem-shaporenko changed the title New VAAPI definition for multi-frame processing applicable for Encode… New VAAPI definition for multi-frame processing Sep 6, 2017
@xhaihao
Contributor

xhaihao commented Sep 8, 2017

  1. Could you please consider a more generic API? e.g. maybe a driver X can support multi-frame processing for decode.

  2. Could you update the descriptions for vaMFAddContext/vaMFReleaseContext/vaMFSubmit etc.? Currently these functions are limited to encoder contexts only, e.g.

Provide ability to associate each encoder context used for submission and common MF context
VAContextID context - encoder context being removed from MFE context.

  3. Can two VA contexts which have different profile/entrypoint pairs be added into the same multi-frame context? e.g. one context for VPP and one context for HEVC encoding. I think it is reasonable to require both the MF context and the VA context to have the same profile/entrypoint pair. How about adding parameters to pass the profile/entrypoint to the driver when calling vaCreateMFContext(), or a parameter to pass a VA config ID? That way the driver can return the right value based on the input profile/entrypoint pair or VA config when calling vaCreateMFContext(), vaMFAddContext() etc.

  4. Batch buffer, EU etc. are platform dependent. We are adding a generic API to libva, so could you update the comments with some common words?

va/va.c Outdated
CHECK_DISPLAY(dpy);
ctx = CTX(dpy);

vaStatus = ctx->vtable->vaCreateMFContext( ctx, mf_context);
Contributor

Please check ctx->vtable->vaCreateMFContext against NULL first.

Contributor Author

Fixed

va/va.c Outdated
CHECK_DISPLAY(dpy);
ctx = CTX(dpy);

vaStatus = ctx->vtable->vaMFAddContext( ctx, context, mf_context);
Contributor

Please check ctx->vtable->vaMFAddContext against NULL first and the similar check for other new APIs below

Contributor Author

Fixed

va/va.h Outdated
/**
* vaCreateMFContext - Create a multi-frame context
* Multi-frame context allow to run tasks from different
* contexts in single batch buffer for performance optimization.
Contributor

the underlying HW might not support batch buffer, could you use some common words?

Contributor Author

Fixed

va/va.h Outdated
VAStatus vaMFAddContext (
VADisplay dpy,
VAContextID context,
VAMFContextID mfe_context
Contributor

please replace mfe_context with mf_context in this commit, we should use the same name style.

Contributor Author

Fixed

@xhaihao
Contributor

xhaihao commented Sep 8, 2017

There is some trailing whitespace; could you please delete it when updating your patch?

.git/rebase-apply/patch:136: trailing whitespace.
* After association is done, current context will not submit
.git/rebase-apply/patch:170: trailing whitespace.
* Make the end of rendering for a pictures in contexts passed with submission.
.git/rebase-apply/patch:174: trailing whitespace.
* This call is non-blocking. The client can start another
.git/rebase-apply/patch:178: trailing whitespace.
* contexts: list of contexts submitting their tasks for multi-frame operation.
warning: 4 lines add whitespace errors.

@artem-shaporenko
Contributor Author

Thanks Haihao, I fixed the issues. I also updated the descriptions to remove uncertainty about support for the different features.
About API itself:

  1. It can support a decoder or anything else (any entry point) created through vaCreateContext, depending on the underlying driver/HW implementation.

  2. Fixed

  3. By design, yes - any entrypoint/profile can be combined, but this depends on the driver implementation: one implementation can require the same entrypoint/profile and reject contexts with different ones, while another can support different profiles and/or entry points. So it doesn't make sense to add entrypoint/profile to vaCreateMFContext.

  4. Fixed

@xhaihao
Contributor

xhaihao commented Sep 12, 2017

Just out of curiosity, do you have a practical case that adds a decoder context and an encoder context to the same Multi-Frame context?

@xhaihao
Contributor

xhaihao commented Sep 12, 2017

What should the application do when vaMFAddContext returns an error?

  1. Try to add the remaining contexts to the MF context,
    or
  2. Release the MF context at once,
    or
  3. Don't add the remaining contexts and submit the current MF context?

Which one is right?

@artem-shaporenko
Contributor Author

Added notes to the vaMFAddContext error code descriptions - the application can't continue working after VA_STATUS_ERROR_OPERATION_FAILED, and can continue in the other cases.

@artem-shaporenko
Contributor Author

artem-shaporenko commented Sep 12, 2017

"Just for curious, do you have a case that adds a decoder context and a encoder context in the same Multi-Frame context in practice?"
Not now

@xhaihao
Contributor

xhaihao commented Sep 13, 2017

@sreerenjb @lizhong1008 @xuguangxin Could you help review the patches? What do you think about adding multi-frame processing support to gstreamer-vaapi, FFmpeg and libyami?

@xhaihao
Contributor

xhaihao commented Sep 13, 2017

@artem-shaporenko Could you update the comment in va.h too when you update the description on github?

@artem-shaporenko
Contributor Author

@xhaihao Do you think it'll be useful to put full description into comments in va.h?

@artem-shaporenko
Contributor Author

"gstreamer-vaapi, FFmpeg and libyami"
FFmpeg will add support for sure (at least the MSDK path) - we are working on it.

@xhaihao
Contributor

xhaihao commented Sep 14, 2017

@artem-shaporenko We can use doxygen to generate the documentation for users of libva, so you should document clearly how to use multi-frame processing in va.h.

va/va.h Outdated
* and other contexts passed this call, rejected context can continue work in stand-alone mode.
* VA_STATUS_ERROR_UNSUPPORTED_PROFILE - Current context with Particular VAEntrypoint is supported
* but VAProfile is not supported(so for example H264 encode or FEI is supported, but H265/Mpeg2 not
* supported at a moment). Application can continue with current mf_context and other contexts passed
Contributor

“H264 encode or FEI is supported” - is this a typo? Should it be "H264 encoder or FEI is supported" or "H264 encode without FEI is supported"?

Contributor

I don't think it is a good idea to list which profiles/entrypoints are or are not supported in the API file; it means you must update these comments whenever the driver behavior changes.
Why not define a vaQueryXXX interface (just like vaQueryConfigEntrypoints) to access driver limitations and capabilities? Before calling vaCreateMFContext and vaMFAddContext, we could query first, so we know what is supported and avoid many inexplicable errors.

Contributor Author

@artem-shaporenko artem-shaporenko Sep 18, 2017

“H264 encode or FEI is supported”,is this a typo? should be "H264 encoder of FEI is supported" or "H264 encode without FEI is supported"?

No, it's not a typo - it is just an example. I can change it to exact entry point names to make it more clear.

I don't think it is a good idea to list which profile/entrypoint is not supported or not support in the API file. It means you must update this comments when driver behavior has been changed.

Once more, it is not a listing of supported features, but examples (covered by 'for example').

I don't think it is a good idea to list which profile/entrypoint is not supported or not support in the API file. It means you must update this comments when driver behavior has been changed.

As by the architecture and assumed design there is no issue continuing in single-context mode without Multi-Frame if Multi-Frame is not supported - why add excessive APIs?

inexplicable error

The errors are explicable, as described.

Contributor

"One more time it is not listing of supported features, but examples(covered by 'for example')"
Yes, I know they are examples, but I think they are examples of some driver's (the iHD driver's) implementation status (please correct me if I am wrong).
"so for example H264 encode or FEI is supported, but H265/Mpeg2 not supported at a moment" - maybe H265/MPEG-2 will be supported one day; will it be necessary to change this comment then?

Contributor

"As by architecture and assumed design there are not issue to continue with single context mode without Multi-Frame, if Multi-Frame is not supported - why add excessive APIs?"
Here the vaQueryXXX interface I mean is for getting Multi-Frame Processing (MFP) capabilities, not for single-context mode. It looks like there are many limitations in the iHD driver implementation of MFP, judging by your examples in the comments here.

Contributor Author

Ok, yes, the current iHD driver doesn't support MPEG-2 and H265. Would it be better to remove the example comment, or change it to something not related to current iHD driver support, like "HEVC supported, but MPEG-2 not supported", or put a particular entry point name?

The limitations are defined by the particular implementation. We can introduce a query interface, but how does this help? How would the function be used from an application, and what is the benefit over the currently suggested path?

Contributor Author

In general I don't object to adding some query function, but it is not strictly required here, as things can work without it - it can be added later as an improvement. I posted some code examples below to show why I don't think a query function will be really useful in general.

va/va.h Outdated
* All contexts passed should be associated through vaMFAddContext
* and call sequence Begin/Render/End performed.
* This call is non-blocking. The client can start another
* Begin/Render/End/vaMFSubmit sequence on a different render targets.
Contributor

As per the current definition of vaEndPicture, when vaEndPicture is called the server should start processing all pending operations. That seems to conflict with vaMFSubmit, because vaEndPicture would have processed the current frame before vaMFSubmit. This means either the current implementation of vaEndPicture must change, or vaEndPicture should not be called in the multi-frame case. Am I right?

Contributor Author

It doesn't require a change to vaEndPicture itself, but to contexts working with multi-frame:
such a context knows it is working through multi-frame and doesn't perform submission during vaEndPicture.

Contributor

"Doesn't perform submission during vaEndPicture" seems to conflict with the definition of vaEndPicture: "if vaEndPicture is called, the server should start processing all pending operations". Am I right? Any driver implementation should follow the API definition and make sure there is no conflict with it.

Contributor Author

Yes, it conflicts with the definition of vaEndPicture, but at the same time everyone using vaMFSubmit will be aware of this change in behavior.

Contributor

So maybe there are two ways to avoid this conflict: 1. change the definition of vaEndPicture - add one more comment to indicate vaEndPicture won't process pending operations in the MFP case; or 2. in the MFP case, don't call vaEndPicture.

Contributor Author

Isn't it enough to have comment for vaMFSubmit?

Contributor Author

I posted an example below of why the vaEndPicture behavior should be as described for vaMFSubmit.

@lizhong1008
Contributor

lizhong1008 commented Sep 18, 2017

@artem-shaporenko,

  1. Multi-frame processing is a good feature, but this pull request doesn't seem to describe it well. Let me take an example: "Number of contexts per Multi-Frame context - should not be limited." How do we know the maximum number of contexts?
    We have a libva sync meeting; do you think it is a good idea to invite you to present this feature? It should help us learn more details.

  2. There is a rule: if a new feature is introduced to VAAPI, there should be an open source driver which has already implemented it (and preferably an open source middleware that has verified it), otherwise it won't be merged.
    So, which multi-frame features have been verified by a driver and middleware - decoder, encoder, VPP, or the three combined? Only the encoder case is verified, right? It is risky for the other cases. Take the H264 decoder as an example: H264 has a maximum of 16 reference pictures - will it decode incorrectly if multiple frames are sent but some reference picture is missing? It is also risky for the multi-frame case combining encoder and VPP.

  3. FFmpeg has two paths: one is ffmpeg-qsv, which is the MSDK path you mentioned; the other is ffmpeg-vaapi, which calls libva directly - for that path, this API needs to be understood accurately.

@artem-shaporenko
Contributor Author

@lizhong1008 Yes we can meet and discuss.

Multi-frame processing is a good feature, but this pull request doesn't seem to describe it well. Let me take an example: "Number of contexts per Multi-Frame context - should not be limited." How do we know the maximum number of contexts?
We have a libva sync meeting; do you think it is a good idea to invite you to present this feature? It should help us learn more details.

The number of contexts added is not limited, so any number of contexts can be added.

There is a rule: if a new feature is introduced to VAAPI, there should be an open source driver which has already implemented it (and preferably an open source middleware that has verified it), otherwise it won't be merged.
So, which multi-frame features have been verified by a driver and middleware - decoder, encoder, VPP, or the three combined? Only the encoder case is verified, right? It is risky for the other cases. Take the H264 decoder as an example: H264 has a maximum of 16 reference pictures - will it decode incorrectly if multiple frames are sent but some reference picture is missing? It is also risky for the multi-frame case combining encoder and VPP.

It will come together with the iHD driver open source implementation and the Media SDK implementation on top of it.

The Multi-Frame API does not depend on reference frames or anything else; depending on the driver implementation it can support particular VAEntrypoints/profiles/etc. Any issues with parameters for decode/VPP/encode - whether reference pictures or anything else - should be reported during vaBeginPicture/vaRenderPicture/vaEndPicture; vaMFSubmit only reports issues associated with the multi-frame submission itself. This architecture keeps changes minimal for both the driver and the application.
Somehow I missed this explanation - will add it.

@xhaihao xhaihao requested a review from fhvwy September 19, 2017 01:23
@xhaihao
Contributor

xhaihao commented Sep 19, 2017

@fhvwy could you help to provide your input from FFmpeg side?

@artem-shaporenko
Contributor Author

Regarding the query function and vaEndPicture, here are simple schematic code examples (processing is shown as serial; in a real case it will most likely be multi-threaded, but the general code flow is something like below):

  1. Code for multiple contexts before adding MFP support.
  2. With MFP support added - the currently expected example.
  3. With a query function added - only the MFContext logic.

1. multiple contexts before adding MFP support.

VADisplay dpy;
VAContextID contexts[N];//all available contexts
VAStatus sts = VA_STATUS_SUCCESS;
....
//initialize display, etc.
.....
for( int i=0; i<N; i++)
{
    VAConfigID stream_config;
    ...
    //fill config
    ...
    sts = vaCreateConfig(dpy,...., &stream_config);
    CHECKSTS(sts);
    sts = vaCreateContext(dpy, stream_config, ...., &contexts[i]);
    CHECKSTS(sts);
}

....
// prepare streams ready to submit frames, fill contexts vector.
....
//runtime submission
for( int i=0; i<N; i++ )
{
    sts = vaBeginPicture(dpy, contexts[i], in_surf);
    CHECKSTS(sts);
    sts = vaRenderPicture(dpy, contexts[i], buffers, buffer_count);
    CHECKSTS(sts);
    sts = vaEndPicture(dpy, contexts[i]); // Without MFP, submission happens here
    CHECKSTS(sts);
}

2. MFP flow example (changes from (1) were shown in bold in the original comment) - MFP is added on top of the current code and doesn't break the logic for contexts not working through MFP

VADisplay dpy;
VAMFContextID mf_context;
VAContextID contexts[N];//all available contexts
bool added_contexts[N];
VAStatus sts = VA_STATUS_SUCCESS;
bool mfSupported;
....
//initialize display, etc.
...
sts = vaCreateMFContext(dpy, &mf_context);
mfSupported = (sts == VA_STATUS_SUCCESS);
.....

for( int i=0; i<N; i++)
{
    VAConfigID stream_config;
    ...
    //fill config
    ...
    sts = vaCreateConfig(dpy,...., &stream_config);
    CHECKSTS(sts);
    sts = vaCreateContext(dpy, stream_config, ...., &contexts[i]);
    CHECKSTS(sts);
    sts = vaMFAddContext(dpy, mf_context, contexts[i]); // Add this context into the Multi-Frame context
    added_contexts[i] = (sts == VA_STATUS_SUCCESS); // save context state - whether it is available for multi-frame or not.
}

VAContextID submitContexts[N]; // contexts with frames to submit; M is the number of frames to submit, and the number of contexts associated with the Multi-Frame context can be bigger than M
....
// prepare streams ready to submit frames, fill contexts vector.
....
//runtime
int M=0;
for( int i=0; i<N; i++ )
{
    sts = vaBeginPicture(dpy, contexts[i], in_surf);
    CHECKSTS(sts);
    sts = vaRenderPicture(dpy, contexts[i], buffers, buffer_count);
    CHECKSTS(sts);
    sts = vaEndPicture(dpy, contexts[i]); // Delay real submission until vaMFSubmit() if context is applicable
    CHECKSTS(sts);
    if(added_contexts[i])
        submitContexts[M++] = contexts[i];
}

if(mfSupported)
{
    sts = vaMFSubmit(dpy, mf_context, submitContexts, M); // Submit through Multi-Frame context
}

3. Query function added - only the MFContext logic; additions relative to (2) were shown in bold in the original comment. It is clearly visible that any query function is just an addition on top of the context management logic: it can be useful for debugging in some cases, but in general it complicates the code and requires additional support from the driver.

VADisplay dpy;
VAMFContextID mf_context;
VAContextID contexts[N];//all available contexts
bool added_contexts[N];
VAStatus sts = VA_STATUS_SUCCESS;
bool mfSupported;
MFQueryParam query_param;//some list of parameters to report
....
//initialize display, etc.
...
sts = vaQueryMF(dpy, &query_param);//some vaQueryXXX function.
...
//check multi-frame operation support based on query functionality
mfSupported = CheckFoo(query_param);

...
if(mfSupported)
{
    sts = vaCreateMFContext(dpy, &mf_context);
    mfSupported = mfSupported && (sts == VA_STATUS_SUCCESS);
}
.....
for( int i=0; i<N; i++)
{
    VAConfigID stream_config;
    ...
    //fill config
    ...
    sts = vaCreateConfig(dpy,...., &stream_config);
    CHECKSTS(sts);
    sts = vaCreateContext(dpy, stream_config, ...., &contexts[i]);
    CHECKSTS(sts);
    if(mfSupported)
    {
        sts = vaMFAddContext(dpy, mf_context, contexts[i]); // Add this context into the Multi-Frame context
        added_contexts[i] = (sts == VA_STATUS_SUCCESS); // save context state - whether it is available for multi-frame or not.
    }
}

VAContextID submitContexts[N]; // contexts with frames to submit; M is the number of frames to submit, and the number of contexts associated with the Multi-Frame context can be bigger than M
....
// prepare streams ready to submit frames, fill contexts vector.
....
//runtime
int M=0;
for( int i=0; i<N; i++ )
{
    sts = vaBeginPicture(dpy, contexts[i], in_surf);
    CHECKSTS(sts);
    sts = vaRenderPicture(dpy, contexts[i], buffers, buffer_count);
    CHECKSTS(sts);
    sts = vaEndPicture(dpy, contexts[i]); // Delay real submission until vaMFSubmit() if context is applicable
    CHECKSTS(sts);
    if(added_contexts[i])
        submitContexts[M++] = contexts[i];
}

if(mfSupported)
{
    sts = vaMFSubmit(dpy, mf_context, submitContexts, M); // Submit through Multi-Frame context
    CHECKSTS(sts);
}

@artem-shaporenko
Contributor Author

@xhaihao, @lizhong1008, @sreerenjb, any other concerns on the API?
The query function and a tool in va_utils will come in later pull requests.

@lizhong1008
Contributor

@artem-shaporenko LGTM for the current API definition; I will review your pull request with the sample code.

@artem-shaporenko
Contributor Author

@lizhong1008 What does LGTM mean?

@artem-shaporenko
Contributor Author

Ok, got it - "looks good for me"

@lizhong1008
Contributor

@artem-shaporenko , yes it means "looks good to me"

@artem-shaporenko
Contributor Author

artem-shaporenko commented Oct 12, 2017

@xhaihao if there are no concerns, can we merge this PR into v2.0-next so it can be used?

@artem-shaporenko
Contributor Author

Combined into a single commit to avoid spreading the API change across different commits.
@xhaihao please update the v2.0-next branch

va/va_backend.h Outdated
VAContextID *contexts,
int num_contexts
);

VAStatus (*vaCreateBuffer) (
Contributor

Please add the new callback function pointers after the existing callback functions and reduce the reserved bytes accordingly; otherwise it will break compatibility between libva and the driver.

Contributor Author

Done

va/va_backend.h Outdated
/** \brief Reserved bytes for future use, must be zero */
unsigned long reserved[64];
unsigned long reserved[63];
Contributor

The new array should be an array of 60 unsigned long integers because you added 4 callback functions.

Contributor Author

Yes, sorry, done. Not 60 but 56, since unsigned long is 32-bit while a pointer is 64-bit on 64-bit Linux.

@sreerenjb
Contributor

@artem-shaporenko @xhaihao

I still have a few more concerns.

I would like to take a step back and think about what we are really trying to achieve here: IIUC, the intention is to utilize the extra hardware blocks more efficiently.
Let's talk about this based on a simple use case: assume the platform has 4 VMEs and we are trying to achieve maximum efficiency when encoding 4 streams in parallel.
Middleware components have no way to know the capabilities of the actual hardware (how many units are available, the cost of hw thread instantiation, how the hw utilizes them, etc.).
Which means only the underlying driver can optimize based on HW capability. If that is the case, I am wondering what the real use case
of these APIs is. Middleware has no idea about the cases (combinations of operations) where hw can actually optimize parallel operations;
it can only perform some random job scheduling with the assumption that the hardware can perform better, isn't it?
Or do we need to assume that the hw can always perform better if we submit the jobs together? Then the next question is, how many jobs in a single go can give better results :)

On the other hand, if the driver really knows the hw capabilities, why shouldn't it be possible to schedule tasks in a better way without having
any specific APIs? Let's take the previous example itself. The first encode job submitted by user space runs on the first VME block and the rest of them are idle (assuming hw has not instantiated parallel threads for the other VMEs). Now if the user requests a second encode (either from the same process or a different process), the driver should internally be able to monitor the hardware
utilization and assign the second VME block to the new encode job.

Of course the new APIs bring the benefits of "more gpu tasks in a single batchbuffer and decreased cpu overhead". My question is how much performance difference
they can really bring compared with the idea of "the driver internally handles all resource utilization without new APIs, but with multiple batch buffers".

Please correct me if I am wrong about the understanding of hw block utilization.

@fhvwy
Contributor

fhvwy commented Oct 29, 2017

@artem-shaporenko Can you clarify the behaviour of this API with respect to pipelining and dependencies between operations? To offer some specific examples, do the following two cases work:

Internal dependency:

  • Create a scaler and an encoder context and add them to an MF context.
  • Submit frames to the scaler, submit the scaler output to the encoder (in the same way that pipelining currently works).
  • Submit the MF context.

External dependency:

  • Create two each of scaler and encoder contexts.
  • Create two MF contexts, add the two scalers to one of them and the two encoders to the other.
  • Submit frames to both scalers, submit the scaler outputs to the encoders.
  • Submit the two MF contexts, in either order.

@artem-shaporenko
Contributor Author

@sreerenjb you are right that the performance improvement will depend on the number of VMEs (and other factors depending on the target HW). This is why we need to put more effort into implementing a proper query function that reports enough data for different situations, so the application/middleware can properly manage multi-frame submission. In general, talking about performance: a developer needs to test how it works on different HW anyway, to make sure performance is in the expected range. The same holds without multi-frame: you need to test how many parallel transcodings you can do on different HW, and you don't get any report from the driver about that either.

@artem-shaporenko
Contributor Author

@fhvwy

Create a scaler and an encoder context and add them to an MF context.
Submit frames to the scaler, submit the scaler output to the encoder (in the same way that pipelining currently works).
Submit the MF context.

In this case the driver should return an error (VA_STATUS_ERROR_INVALID_CONTEXT); there must not be any dependencies between two tasks combined into one operation.

External dependency:
Create two each of scaler and encoder contexts.
Create two MF contexts, add the two scalers to one of them and the two encoders to the other.
Submit frames to both scalers, submit the scaler outputs to the encoders.
Submit the two MF contexts, in either order.

In this case the behavior will be the same as in single-frame mode: if the frame order is right (VP first, encoder second), the frames will be encoded properly; otherwise the output will be broken.

@artem-shaporenko
Contributor Author

@xhaihao Can you please merge the latest update for the VTable changes into the v2.0-next branch?

va/va_backend.h Outdated
/** \brief Reserved bytes for future use, must be zero */
unsigned long reserved[64];
unsigned long reserved[56];
Contributor

You added 4 hook functions only, so the reserved bytes should be 60 * sizeof(unsigned long) now, not 56 * sizeof(unsigned long).

Contributor Author

unsigned long is 32 bit isn't it?

Contributor Author

I changed that to be [60], but please make sure it won't get broken

Contributor

unsigned long is 32-bit on a 32-bit OS and 64-bit on a 64-bit OS

va/va.c Outdated
@@ -439,6 +442,7 @@ static VAStatus va_openDriver(VADisplay dpy, char *driver_name)
CHECK_VTABLE(vaStatus, ctx, BeginPicture);
CHECK_VTABLE(vaStatus, ctx, RenderPicture);
CHECK_VTABLE(vaStatus, ctx, EndPicture);
CHECK_VTABLE(vaStatus, ctx, MFSubmit);
CHECK_VTABLE(vaStatus, ctx, SyncSurface);
Contributor

The new functions are not mandatory to implement; could you remove the checks for the new functions?

Contributor Author

Done

va/va.c Outdated

CHECK_DISPLAY(dpy);
ctx = CTX(dpy);
CHECK_VTABLE(vaStatus, ctx, CreateMFContext);
Contributor

@xhaihao xhaihao Nov 14, 2017

Here vaStatus is set to VA_STATUS_ERROR_UNKNOWN if vaCreateMFContext is not implemented by the backend driver. However, vaCreateMFContext is not a mandatory function, so please return VA_STATUS_ERROR_UNIMPLEMENTED instead of VA_STATUS_ERROR_UNKNOWN. The same change should be applied to the other new functions.

va/va.c Outdated
@@ -1225,7 +1225,7 @@ VAStatus vaMFSubmit (
)
{
VADriverContextP ctx;
VAStatus vaStatus;
VAStatus vaStatus = VA_STATUS_SUCCESS;

CHECK_DISPLAY(dpy);
ctx = CTX(dpy);
Contributor

This is not needed if VA_STATUS_ERROR_UNIMPLEMENTED is returned at once when ctx->vtable->vaCreateMFContext is NULL.

BTW, could you please remove all trailing whitespace in this patch, then squash the two commits into one commit?

Contributor Author

Done

CHECK_DISPLAY(dpy);
ctx = CTX(dpy);
if(ctx->vtable->vaCreateMFContext == NULL)
vaStatus = VA_STATUS_ERROR_UNIMPLEMENTED;
Contributor

*mf_context may be a random value for this case, so va_TraceCreateMFContext shouldn't be called when ctx->vtable->vaCreateMFContext is a NULL pointer.

Contributor Author

agree, fixed

…, FEI Encode/ENC/Pre-ENC, and VPP in future.

Signed-off-by: Artem Shaporenko [email protected]
@xhaihao xhaihao merged commit df192cf into intel:master Nov 22, 2017