New VAAPI definition for multi-frame processing #112

Merged
merged 2 commits into from
Nov 22, 2017

@artem-shaporenko artem-shaporenko commented Sep 6, 2017

New VAAPI definition for multi-frame processing applicable for different entry points.

Signed-off-by: Artem Shaporenko [email protected]

@artem-shaporenko
Contributor Author

artem-shaporenko commented Sep 6, 2017

Multi-frame processing is an optimization for multi-stream transcoding scenarios. It combines several jobs from different parallel pipelines into a single GPU task execution, which makes better use of engines that cannot be fully loaded in the single-frame case and reduces the CPU overhead of task submission.

VAStatus vaCreateMFContext (VADisplay dpy, VAMFContextID *mf_context);

Description:

Entry point that creates a new Multi-Frame context interface. The context encapsulates the memory objects and structures that are common to all streams and required for a single GPU task submission from several VAContextIDs.
Allocation: this call only creates an instance; it does not allocate any additional memory.
Support identification: the application can detect multi-frame support by trying to create a multi-frame context. If the driver supports multi-frame, the call succeeds with mf_context != NULL and VAStatus = VA_STATUS_SUCCESS. Otherwise the driver returns VA_STATUS_ERROR_UNIMPLEMENTED and mf_context = NULL.

Arguments:

VADisplay dpy - display adapter.
VAMFContextID *mf_context - Multi-Frame context encapsulating all associated contexts for multi-frame submission.

Return value:

VA_STATUS_SUCCESS - operation successful.
VA_STATUS_ERROR_UNIMPLEMENTED - no support for multi-frame.
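
The support check described above can be sketched as follows. This is a minimal, self-contained illustration and not libva code: the `mock_` types, the status enum and the two stand-in drivers are hypothetical and only mimic the contract stated for vaCreateMFContext (success with a non-NULL handle, or "unimplemented" with a NULL handle).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VAStatus values and VAMFContextID. */
typedef enum { MOCK_SUCCESS = 0, MOCK_ERROR_UNIMPLEMENTED = 1 } mock_status;
typedef unsigned int mock_mf_context_id;
#define MOCK_NULL_ID 0u

/* Signature modeled on vaCreateMFContext, without the display argument. */
typedef mock_status (*create_mf_fn)(mock_mf_context_id *mf_context);

/* Stand-in driver that supports multi-frame. */
mock_status driver_with_mf(mock_mf_context_id *mf_context)
{
    *mf_context = 1u;   /* any non-NULL handle */
    return MOCK_SUCCESS;
}

/* Stand-in driver that does not support multi-frame. */
mock_status driver_without_mf(mock_mf_context_id *mf_context)
{
    *mf_context = MOCK_NULL_ID;
    return MOCK_ERROR_UNIMPLEMENTED;
}

/* Returns true and fills *out only when multi-frame is available. */
bool probe_multiframe(create_mf_fn create, mock_mf_context_id *out)
{
    assert(create != NULL && out != NULL);
    return create(out) == MOCK_SUCCESS && *out != MOCK_NULL_ID;
}
```

An application would run such a probe once at start-up and fall back to per-context submission when it returns false.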

VAStatus vaMFAddContext (VADisplay dpy, VAMFContextID mf_context, VAContextID context);

Description:

Associates a context used for submission with the common Multi-Frame context.
Query: the application can try to add a context to find out whether it is supported.
Allocation: this call allocates and/or reallocates all memory objects common to all streams associated with the particular Multi-Frame context.
All memory required for each context (pixel buffers, per-stream buffers such as PAKObj, motion vector buffers, predictor buffers, etc.) is allocated during the standard vaCreateContext call for each stream.
Runtime dependency: if the current implementation cannot run different entry points/profiles together, the first context added sets the entry point/profile for the whole Multi-Frame context, and contexts with any other entry point or profile are rejected.

Arguments:

VADisplay dpy - display adapter.
VAContextID context - context being associated with Multi-Frame context.
VAMFContextID mf_context - multi-frame context used to associate other contexts and for multi-frame submission.

Return value:

VA_STATUS_SUCCESS - operation successful, context was added.
VA_STATUS_ERROR_OPERATION_FAILED - something unexpected happened. The application has to close the current mf_context and its associated contexts and start working with new ones.
VA_STATUS_ERROR_INVALID_CONTEXT - ContextID is invalid. Depending on the implementation, a driver can either support different entry points inside one multi-frame context or reject them with this error. The error means the entry point and/or profile conflicts with a previously added context. The application can continue with the current mf_context and the contexts that passed this call; the rejected context can continue to work in stand-alone mode or in another mf_context.
VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT - the VAContextID was created with a VAConfigID associated with a VAEntrypoint that is not supported. For example, VAEntrypointVLD is not supported (a decoder doesn't make sense for Multi-Frame operation), or VAEntrypointVideoProc and/or VAEntrypointFEI is not supported at the moment. The application can continue with the current mf_context and the contexts that passed this call; the rejected context can continue to work in stand-alone mode.
VA_STATUS_ERROR_UNSUPPORTED_PROFILE - the VAEntrypoint the context was created with is supported, but the VAProfile is not. The application can continue with the current mf_context and the contexts that passed this call; the rejected context can continue to work in stand-alone mode.

Limitations:

Number of contexts per Multi-Frame context: should not be limited.
Depending on the implementation, a driver can limit mixing different VAEntrypoints or VAProfiles inside one multi-frame context, or limit multi-frame support to particular VAEntrypoint/VAProfile combinations.
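
On the application side, the return values above boil down to three reactions. The sketch below maps each status to one of them. The enum names are hypothetical stand-ins for the VAStatus codes (they are not the libva constants), and treating any unlisted status as fatal is an assumption of this sketch.

```c
/* Hypothetical stand-ins for the VAStatus codes listed above. */
typedef enum {
    MOCK_SUCCESS,
    MOCK_ERROR_OPERATION_FAILED,
    MOCK_ERROR_INVALID_CONTEXT,
    MOCK_ERROR_UNSUPPORTED_ENTRYPOINT,
    MOCK_ERROR_UNSUPPORTED_PROFILE
} mock_status;

/* What the application does with a context after trying to add it. */
typedef enum {
    ACTION_USE_MULTI_FRAME,  /* context joined the multi-frame context */
    ACTION_RUN_STANDALONE,   /* rejected; mf_context stays usable for the others */
    ACTION_TEAR_DOWN         /* close mf_context and its contexts, start over */
} add_action;

add_action action_for_add_status(mock_status sts)
{
    switch (sts) {
    case MOCK_SUCCESS:
        return ACTION_USE_MULTI_FRAME;
    case MOCK_ERROR_INVALID_CONTEXT:
    case MOCK_ERROR_UNSUPPORTED_ENTRYPOINT:
    case MOCK_ERROR_UNSUPPORTED_PROFILE:
        return ACTION_RUN_STANDALONE;
    case MOCK_ERROR_OPERATION_FAILED:
    default:
        /* Assumption: anything unexpected is handled like OPERATION_FAILED. */
        return ACTION_TEAR_DOWN;
    }
}
```

Keeping this mapping in one function keeps the per-stream set-up loop free of status-specific branches.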

VAStatus vaMFReleaseContext (VADisplay dpy, VAMFContextID mf_context, VAContextID context);

Description:

Removes the association between the Multi-Frame context and a particular encode context. This call is not strictly necessary and could be dropped, because the same can be done through vaDestroyContext; however, relying on that would put logic into the driver that is hidden from the application and unclear.

Arguments:

VADisplay dpy - display adapter.
VAContextID context - context being removed from Multi-Frame context.
VAMFContextID mf_context - Multi-Frame context used to associate context and for submission.

Return value:

VA_STATUS_SUCCESS - operation successful, context was removed.
VA_STATUS_ERROR_OPERATION_FAILED - something unexpected happened.

VAStatus vaMFSubmit (VADisplay dpy, VAMFContextID mf_context, VAContextID *contexts, int num_contexts);

Description:

Submits frames from multiple contexts (streams) for execution.

Arguments:

VADisplay dpy - display adapter.
VAMFContextID mf_context - Multi-Frame context used for submission.
VAContextID *contexts - contexts with frames ready for submission.
int num_contexts - number of contexts in contexts.

Return value:

VA_STATUS_SUCCESS - operation successful, frames from the contexts were submitted.
VA_STATUS_ERROR_INVALID_CONTEXT - mf_context or one of the contexts is invalid, because mf_context was not created or one of the contexts is not associated with mf_context through vaMFAddContext.
VA_STATUS_ERROR_INVALID_PARAMETER - one of the contexts has not submitted its frame through the vaBeginPicture, vaRenderPicture, vaEndPicture call sequence.
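
The two error conditions above can be pictured as a validation pass that runs before anything is submitted. The following is a sketch under stated assumptions, not driver code: `mock_context`, its two flags and the status enum are hypothetical, and returning INVALID_PARAMETER for an empty or NULL list is a choice made for this sketch (the description above does not cover that case).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VAStatus codes listed above. */
typedef enum {
    MOCK_SUCCESS,
    MOCK_ERROR_INVALID_CONTEXT,
    MOCK_ERROR_INVALID_PARAMETER
} mock_status;

/* Hypothetical per-stream state a driver might keep. */
typedef struct {
    bool associated;     /* added through vaMFAddContext */
    bool frame_pending;  /* Begin/Render/End done, frame not yet submitted */
} mock_context;

/* Checks the whole batch before touching any state, so a rejected
 * submission leaves every pending frame intact. */
mock_status mock_mf_submit(bool mf_created, mock_context **contexts, int num_contexts)
{
    if (!mf_created)
        return MOCK_ERROR_INVALID_CONTEXT;
    if (contexts == NULL || num_contexts <= 0)
        return MOCK_ERROR_INVALID_PARAMETER;   /* assumption, see lead-in */

    for (int i = 0; i < num_contexts; i++)
        if (contexts[i] == NULL || !contexts[i]->associated)
            return MOCK_ERROR_INVALID_CONTEXT;

    for (int i = 0; i < num_contexts; i++)
        if (!contexts[i]->frame_pending)
            return MOCK_ERROR_INVALID_PARAMETER;

    for (int i = 0; i < num_contexts; i++)
        contexts[i]->frame_pending = false;    /* frames leave as one task */

    return MOCK_SUCCESS;
}
```

Validating first and consuming afterwards means the application can fix the offending context and retry the same batch.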

Schematic code flow examples:

1. multiple context before adding MFP support

Simplified scheme assuming all streams run in the same thread; in reality, few use cases follow this path.

VADisplay dpy;
VAContextID contexts[N];//all available contexts
VAStatus sts = VA_STATUS_SUCCESS;
....
//initialize display, etc.
.....
for( int i=0; i<N; i++)
{
    VAConfigID stream_config;
    ...
    //fill config
    ...
    sts = vaCreateConfig(dpy,...., &stream_config);
    CHECKSTS(sts);
    sts = vaCreateContext(dpy, stream_config, ...., &contexts[i]);
    CHECKSTS(sts);
}

....
// prepare streams ready to submit frames, fill contexts vector.
....
//runtime submission
for( int i=0; i<N; i++ )
{
    sts = vaBeginPicture(dpy, contexts[i], in_surf);
    CHECKSTS(sts);
    sts = vaRenderPicture(dpy, contexts[i], buffers, buffer_count);
    CHECKSTS(sts);
    sts = vaEndPicture(dpy, contexts[i]); // Submits the frame for execution (no Multi-Frame context involved)
    CHECKSTS(sts);
}

2. MFP flow example

MFP is added on top of the current code and does not break the logic for contexts that do not work through MFP. In reality it will be more complicated: in multi-stream transcoding the contexts work in different threads, so a real vaMFSubmit requires synchronizing all threads before task submission. For transcoding from a single source to multiple outputs it is relatively simple, as the source is the synchronization point.

VADisplay dpy;
VAMFContextID mf_context;
VAContextID contexts[N]; // all available contexts
bool added_contexts[N]; // added_contexts[i] is true if contexts[i] was added to the Multi-Frame context
VAStatus sts = VA_STATUS_SUCCESS;
bool mfSupported;
....
//initialize display, etc.
...
sts = vaCreateMFContext(dpy, &mf_context);
mfSupported = (sts == VA_STATUS_SUCCESS);
...
for( int i=0; i<N; i++)
{
    VAConfigID stream_config;
    ...
    //fill config
    ...
    sts = vaCreateConfig(dpy,...., &stream_config);
    CHECKSTS(sts);
    sts = vaCreateContext(dpy, stream_config, ...., &contexts[i]);
    CHECKSTS(sts);
    added_contexts[i] = false;
    if(mfSupported)
    {
        sts = vaMFAddContext(dpy, mf_context, contexts[i]); // Add this context into the Multi-Frame context
        added_contexts[i] = (sts == VA_STATUS_SUCCESS); // save context state - whether it is available for multi-frame or not.
    }
}

VAContextID submitContexts[N]; // contexts to submit; M is the number of frames to submit, which can be smaller than the number of contexts associated with the Multi-Frame context
....
// prepare streams ready to submit frames, fill contexts vector.
....
//runtime
int M=0; // number of contexts queued for multi-frame submission
for( int i=0; i<N; i++ )
{
    sts = vaBeginPicture(dpy, contexts[i], in_surf);
    CHECKSTS(sts);
    sts = vaRenderPicture(dpy, contexts[i], buffers, buffer_count);
    CHECKSTS(sts);
    sts = vaEndPicture(dpy, contexts[i]); // Delays real submission until vaMFSubmit() if the context was added to the Multi-Frame context
    CHECKSTS(sts);
    if(added_contexts[i])
        submitContexts[M++] = contexts[i];
}

if(mfSupported && M > 0)
{
    sts = vaMFSubmit(dpy, mf_context, submitContexts, M); // Submit through Multi-Frame context
    CHECKSTS(sts);
}

@artem-shaporenko artem-shaporenko changed the title New VAAPI definition for multi-frame processing applicable for Encode… New VAAPI definition for multi-frame processing Sep 6, 2017
@xhaihao
Contributor

xhaihao commented Sep 8, 2017

  1. Could you please consider a more generic API? For example, a driver X might support multi-frame processing for decode.

  2. Could you update the descriptions of vaMFAddContext/vaMFReleaseContext/vaMFSubmit etc.? Currently these functions are limited to encoder contexts only, e.g.

Provide ability to associate each encoder context used for submission and common MF context
VAContextID context - encoder context being removed from MFE context.

  3. Can two VA contexts that have different profile/entrypoint pairs be added to the same multi-frame context, e.g. one context for VPP and one for HEVC encoding? I think it is reasonable to require that the MF context and the VA contexts have the same profile/entrypoint pair. How about adding parameters to pass the profile/entrypoint to the driver when calling vaCreateMFContext(), or a parameter to pass the VA config ID? That way the driver can return the right value for the given profile/entrypoint pair or VA config when vaCreateMFContext(), vaMFAddContext() etc. are called.

  4. Batch buffer, EU etc. are platform dependent. We should add a generic API to libva; could you update the comments to use more generic wording?

va/va.c Outdated
CHECK_DISPLAY(dpy);
ctx = CTX(dpy);

vaStatus = ctx->vtable->vaCreateMFContext( ctx, mf_context);
Contributor

Please check ctx->vtable->vaCreateMFContext against NULL first.

Contributor Author

Fixed

va/va.c Outdated
CHECK_DISPLAY(dpy);
ctx = CTX(dpy);

vaStatus = ctx->vtable->vaMFAddContext( ctx, context, mf_context);
Contributor

Please check ctx->vtable->vaMFAddContext against NULL first, and add a similar check for the other new APIs below.

Contributor Author

Fixed
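
The NULL check requested in the two threads above follows a common dispatch pattern, sketched here in isolation. The vtable struct, its single member and the status values are hypothetical and heavily trimmed; the real libva vtable and macros are not reproduced.

```c
#include <stddef.h>

/* Hypothetical stand-ins for VAStatus values and VAMFContextID. */
typedef enum { MOCK_SUCCESS = 0, MOCK_ERROR_UNIMPLEMENTED = 1 } mock_status;
typedef unsigned int mock_mf_context_id;

/* Trimmed-down driver vtable: an entry stays NULL when the driver
 * predates the multi-frame API. */
typedef struct {
    mock_status (*create_mf_context)(mock_mf_context_id *mf_context);
} mock_vtable;

/* Library-side wrapper: never calls through a missing entry. */
mock_status mock_create_mf_context(const mock_vtable *vtable,
                                   mock_mf_context_id *mf_context)
{
    if (vtable == NULL || vtable->create_mf_context == NULL)
        return MOCK_ERROR_UNIMPLEMENTED;
    return vtable->create_mf_context(mf_context);
}

/* Example driver entry used to exercise the wrapper. */
mock_status mock_driver_create(mock_mf_context_id *mf_context)
{
    *mf_context = 7u;
    return MOCK_SUCCESS;
}
```

With this shape, an application running on an older driver gets a clean "unimplemented" status instead of a crash.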

va/va.h Outdated
/**
* vaCreateMFContext - Create a multi-frame context
* Multi-frame context allow to run tasks from different
* contexts in single batch buffer for performance optimization.
Contributor

The underlying HW might not support batch buffers; could you use more generic wording?

Contributor Author

Fixed

va/va.h Outdated
VAStatus vaMFAddContext (
VADisplay dpy,
VAContextID context,
VAMFContextID mfe_context
Contributor

Please replace mfe_context with mf_context in this commit; we should use the same naming style.

Contributor Author

Fixed

@xhaihao
Contributor

xhaihao commented Sep 8, 2017

There is some trailing whitespace; could you please delete it when updating your patch?

.git/rebase-apply/patch:136: trailing whitespace.
 * After association is done, current context will not submit
.git/rebase-apply/patch:170: trailing whitespace.
 * Make the end of rendering for a pictures in contexts passed with submission.
.git/rebase-apply/patch:174: trailing whitespace.
 * This call is non-blocking. The client can start another
.git/rebase-apply/patch:178: trailing whitespace.
 * contexts: list of contexts submitting their tasks for multi-frame operation.
warning: 4 lines add whitespace errors.

@artem-shaporenko
Contributor Author

Thanks Haihao, I fixed the issues. I also updated the descriptions to remove uncertainty about support for different features.
About the API itself:

  1. It can support a decoder or anything else (any entry point) created through vaCreateContext, depending on the underlying driver/HW implementation.
  2. Fixed
  3. By design, yes: any entrypoint/profile can be combined, but this depends on the driver implementation. One implementation can require the same entrypoint/profile and reject contexts with different ones; another can support different profiles and/or entry points. So it doesn't make sense to add entrypoint/profile to vaCreateMFContext.
  4. Fixed
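
Point 3 describes a policy a restrictive driver could apply without any extra argument to vaCreateMFContext: the first added context fixes the entrypoint/profile pair. A minimal sketch of that policy follows. All types are hypothetical; the integer entrypoint/profile values are placeholders, and using an "invalid context" status for the mismatch follows the vaMFAddContext description rather than any actual driver.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for VAStatus values. */
typedef enum { MOCK_SUCCESS, MOCK_ERROR_INVALID_CONTEXT } mock_status;

/* Placeholder for the entrypoint/profile pair of a VA config. */
typedef struct {
    int entrypoint;
    int profile;
} mock_config;

/* Driver-side multi-frame context state. */
typedef struct {
    bool        has_config;    /* false until the first context is added */
    mock_config config;        /* pair locked in by the first context */
    int         num_contexts;  /* contexts currently associated */
} mock_mf_context;

/* Accepts the first context unconditionally, later ones only if they
 * match the locked-in entrypoint/profile pair. */
mock_status mock_mf_add(mock_mf_context *mf, mock_config cfg)
{
    if (!mf->has_config) {
        mf->config = cfg;
        mf->has_config = true;
    } else if (mf->config.entrypoint != cfg.entrypoint ||
               mf->config.profile != cfg.profile) {
        return MOCK_ERROR_INVALID_CONTEXT;
    }
    mf->num_contexts++;
    return MOCK_SUCCESS;
}
```

A more capable driver would simply skip the comparison, which is why the restriction can stay an implementation detail behind the same call.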

@xhaihao
Contributor

xhaihao commented Sep 12, 2017

Just out of curiosity, do you have a case in practice that adds a decoder context and an encoder context to the same Multi-Frame context?

@xhaihao
Contributor

xhaihao commented Sep 12, 2017

What should an application do when vaMFAddContext returns an error?

  1. Try to add the remaining contexts to the MF context,
    or
  2. Release the MF context at once,
    or
  3. Not add the remaining contexts and submit the current MF context?

Which one is right?

@artem-shaporenko
Contributor Author

Added notes to the vaMFAddContext error code descriptions: the application can't continue if VA_STATUS_ERROR_OPERATION_FAILED is returned, and can continue in all other cases.

@artem-shaporenko
Contributor Author

artem-shaporenko commented Sep 12, 2017

"Just for curious, do you have a case that adds a decoder context and a encoder context in the same Multi-Frame context in practice?"
Not now

@xhaihao
Contributor

xhaihao commented Sep 13, 2017

@sreerenjb @lizhong1008 @xuguangxin Could you help review the patches? What do you think about adding multi-frame processing to gstreamer-vaapi, FFmpeg and libyami?

@xhaihao
Contributor

xhaihao commented Sep 13, 2017

@artem-shaporenko Could you update the comment in va.h too when you update the description on github?

@artem-shaporenko
Contributor Author

@xhaihao Do you think it'll be useful to put full description into comments in va.h?

@artem-shaporenko
Contributor Author

"gstreamer-vaapi, FFmpeg and libyami"
FFmpeg will add support for sure (at least the MSDK path); we are working on it.

@xhaihao
Contributor

xhaihao commented Sep 14, 2017

@artem-shaporenko We can use doxygen to generate the document for user in libva, so you should document how to use multi frame processing clearly in va.h.

va/va.h Outdated
* and other contexts passed this call, rejected context can continue work in stand-alone mode.
* VA_STATUS_ERROR_UNSUPPORTED_PROFILE - Current context with Particular VAEntrypoint is supported
* but VAProfile is not supported(so for example H264 encode or FEI is supported, but H265/Mpeg2 not
* supported at a moment). Application can continue with current mf_context and other contexts passed
Contributor

"H264 encode or FEI is supported": is this a typo? Should it be "H264 encoder of FEI is supported" or "H264 encode without FEI is supported"?

Contributor

I don't think it is a good idea to list in the API file which profiles/entrypoints are or are not supported. It means you must update these comments whenever driver behavior changes.
Why not define a vaQueryXXX interface (just like vaQueryConfigEntrypoints) to expose driver limitations and capabilities? By querying before calling vaCreateMFContext and vaMFAddContext, we would know which features are supported and avoid many inexplicable errors.

Contributor Author

@artem-shaporenko artem-shaporenko Sep 18, 2017

"“H264 encode or FEI is supported”, is this a typo? should be “H264 encoder of FEI is supported” or “H264 encode without FEI is supported”?"

No, it's not a typo, it is just an example. I can change it to exact entry point names so it is clearer.

"I don't think it is a good idea to list which profile/entrypoint is not supported or not support in the API file. It means you must update this comments when driver behavior has been changed."

Once more: it is not a list of supported features, but examples (covered by 'for example').

"I don't think it is a good idea to list which profile/entrypoint is not supported or not support in the API file. It means you must update this comments when driver behavior has been changed."

By architecture and design there is no issue with continuing in single-context mode without Multi-Frame when Multi-Frame is not supported, so why add excessive APIs?

"inexplicable error"

The errors are explicable, as described.

Contributor

"One more time it is not listing of supported features, but examples(covered by 'for example')"
Yes, I know they are examples, but I think they are examples of the implementation status of one driver (the iHD driver). Please correct me if I am wrong.
"so for example H264 encode or FEI is supported, but H265/Mpeg2 not supported at a moment": maybe H265/Mpeg2 will be supported one day; will it then be necessary to change this comment?

Contributor

"As by architecture and assumed design there are not issue to continue with single context mode without Multi-Frame, if Multi-Frame is not supported - why add excessive APIs?"
The vaQueryXXX interface I mean here is for getting the Multi-Frame Processing (MFP) capabilities, not for single-context mode. Judging by the examples in your comments, the iHD driver implementation of MFP has many limitations.

Contributor Author

OK, yes, the current iHD driver doesn't support Mpeg2 and H265. Would it be better to remove the example comment, change it to something unrelated to current iHD driver support such as "HEVC supported, but Mpeg2 not supported", or name a particular entry point?

Limitations are defined by the particular implementation. We can introduce a query interface, but how does it help? How would an application use this function, and what is the benefit over the currently suggested path?

Contributor Author

In general I don't object to adding a query function, but it is not required here since everything works without it; it can be added later as an improvement. I posted some code examples below to show why I don't think a query function will be really useful in general.

va/va.h Outdated
* All contexts passed should be associated through vaMFAddContext
* and call sequence Begin/Render/End performed.
* This call is non-blocking. The client can start another
* Begin/Render/End/vaMFSubmit sequence on a different render targets.
Contributor

According to the current definition of vaEndPicture, the server should start processing all pending operations once vaEndPicture is called. That seems to conflict with vaMFSubmit, because by the time vaMFSubmit is called, vaEndPicture has already processed the current frame. This means either the current implementation of vaEndPicture has to change, or vaEndPicture should not be called in the multi-frame case. Am I right?

Contributor Author

It doesn't require a change to vaEndPicture, only to the context working with multi-frame.
The context knows that it is working through multi-frame and doesn't perform submission during vaEndPicture.

Contributor

"doesn't perform submission during vaEndPicture" seems to conflict with the definition of vaEndPicture: "if vaEndPicture is called, the server should start processing all pending operations". Am I right? Any driver implementation should follow the API definition and not conflict with it.

Contributor Author

Yes, it conflicts with the definition of vaEndPicture, but at the same time everyone using vaMFSubmit will be aware of this change in behavior.

Contributor

So maybe there are two ways to avoid this conflict: 1. Change the definition of vaEndPicture, adding a comment to indicate that vaEndPicture won't process the pending operations in the MFP case. 2. In the MFP case, don't call vaEndPicture at all.

Contributor Author

Isn't it enough to have a comment for vaMFSubmit?

Contributor Author

Posted an example below showing why vaEndPicture should behave the way it is described in vaMFSubmit.
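
The behavior debated in this thread (vaEndPicture holding the frame back once a context belongs to a multi-frame context) can be modeled in a few lines. This is an illustrative sketch only: `mock_context` and both functions are hypothetical stand-ins, and real drivers track far more state.

```c
#include <stdbool.h>

/* Hypothetical per-context state. */
typedef struct {
    bool in_mf_context;  /* context was added to a multi-frame context */
    bool frame_pending;  /* frame is built but not yet sent to hardware */
    int  submissions;    /* how many frames of this context reached hardware */
} mock_context;

/* Stand-in for vaEndPicture: submit now, or hold the frame back. */
void mock_end_picture(mock_context *ctx)
{
    if (ctx->in_mf_context)
        ctx->frame_pending = true;   /* deferred until mock_mf_submit */
    else
        ctx->submissions++;          /* stand-alone: submit immediately */
}

/* Stand-in for vaMFSubmit: flushes every pending frame in one pass and
 * returns how many frames went into the combined task. */
int mock_mf_submit(mock_context **contexts, int num_contexts)
{
    int batched = 0;
    for (int i = 0; i < num_contexts; i++) {
        if (contexts[i]->frame_pending) {
            contexts[i]->frame_pending = false;
            contexts[i]->submissions++;
            batched++;
        }
    }
    return batched;
}
```

The application-facing call sequence stays the same in both modes; only the moment at which work reaches the hardware differs, which is the change the reviewers ask to have documented.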

@lizhong1008
Contributor

lizhong1008 commented Sep 18, 2017

@artem-shaporenko,

  1. Multi-frame processing is a good feature, but it seems this pull request does not describe it well. To take one example: "Number of contexts per Multi-Frame context - should not be limited." How do we know what the maximum number of contexts is?
    We have a libva sync meeting; do you think it is a good idea to invite you to present this feature? It should help us learn more details.

  2. There is a rule: if a new feature is introduced to VAAPI, there should be an open source driver that already implements it (and preferably an open source middleware that has verified it), otherwise it will not be merged.
    So, which multi-frame features have been verified by a driver and middleware? Decoder, encoder, VPP, or the three combined? Only the encoder case has been verified, right? It is risky for the other cases. Take the H.264 decoder as an example: H.264 has a maximum of 16 reference pictures; will decoding fail if multiple frames are sent but some reference pictures are missing? The multi-frame case combining encoder and VPP is also risky.

  3. FFmpeg has two paths. One is ffmpeg-qsv, which is the MSDK path you mentioned. The other is ffmpeg-vaapi, which calls libva directly; this path requires an accurate understanding of this API.

@artem-shaporenko
Contributor Author

@lizhong1008 Yes we can meet and discuss.

"Multi-frame processing is a good feature, but it seems this pull request does not describe it well. To take one example: “Number of contexts per Multi-Frame context - should not be limited.” How do we know what the maximum number of contexts is?
We have a libva sync meeting; do you think it is a good idea to invite you to present this feature? It should help us learn more details."

The number of contexts added is not limited, so any number of contexts can be added.

"There is a rule: if a new feature is introduced to VAAPI, there should be an open source driver that already implements it (and preferably an open source middleware that has verified it), otherwise it will not be merged.
So, which multi-frame features have been verified by a driver and middleware? Decoder, encoder, VPP, or the three combined? Only the encoder case has been verified, right? It is risky for the other cases. Take the H.264 decoder as an example: H.264 has a maximum of 16 reference pictures; will decoding fail if multiple frames are sent but some reference pictures are missing? The multi-frame case combining encoder and VPP is also risky."

It will come together with the iHD driver open source implementation and the Media SDK implementation on top of it.

The Multi-Frame API does not depend on reference frames or anything else; depending on the driver implementation it can support particular VAEntrypoints/profiles/etc. Any issues with parameters for decode/vpp/encode, whether reference pictures or anything else, should be reported during vaBeginPicture/vaRenderPicture/vaEndPicture; vaMFSubmit only reports issues associated with multi-frame submission. Such an architecture keeps changes to both driver and application to a minimum.
Somehow I missed this explanation; I will add it.

@xhaihao xhaihao requested a review from fhvwy September 19, 2017 01:23
@xhaihao
Contributor

xhaihao commented Sep 19, 2017

@fhvwy could you help to provide your input from FFmpeg side?

@artem-shaporenko
Contributor Author

Regarding query and vaEndPicture, here are simple schematic code examples(processing is shown as serial, in real case it most likely will be multi-thread, but general code flow is something like below):

  1. Showing code for multiple context before adding MFP support.
  2. Added MFP support - showing current expected example
  3. Added Query function - just only MFContext logic

1. multiple context before adding MFP support.

VADisplay dpy;
VAContextID contexts[N]; // all available contexts
VAStatus sts = VA_STATUS_SUCCESS;
....
//initialize display, etc.
.....
for( int i=0; i<N; i++)
{
    VAConfigID stream_config;
    ...
    //fill config
    ...
    sts = vaCreateConfig(dpy,...., &stream_config);
    CHECKSTS(sts);
    sts = vaCreateContext(dpy, stream_config, ...., &contexts[i]);
    CHECKSTS(sts);
}

....
// prepare streams ready to submit frames, fill contexts vector.
....
//runtime submission
for( int i=0; i<N; i++ )
{
    sts = vaBeginPicture(dpy, contexts[i], in_surf);
    CHECKSTS(sts);
    sts = vaRenderPicture(dpy, contexts[i], buffers, buffer_count);
    CHECKSTS(sts);
    sts = vaEndPicture(dpy, contexts[i]); // Submits the frame for execution (no Multi-Frame context involved)
    CHECKSTS(sts);
}

2. MFP flow example with changes in bold, clearly visible - MFP is added in addition to current code and not breaking logic for contexts working not through MFP

VADisplay dpy;
VAMFContextID mf_context;
VAContextID contexts[N]; // all available contexts
bool added_contexts[N]; // added_contexts[i] is true if contexts[i] was added to the Multi-Frame context
VAStatus sts = VA_STATUS_SUCCESS;
bool mfSupported;
....
//initialize display, etc.
...
sts = vaCreateMFContext(dpy, &mf_context);
mfSupported = (sts == VA_STATUS_SUCCESS);
.....

for( int i=0; i<N; i++)
{
    VAConfigID stream_config;
    ...
    //fill config
    ...
    sts = vaCreateConfig(dpy,...., &stream_config);
    CHECKSTS(sts);
    sts = vaCreateContext(dpy, stream_config, ...., &contexts[i]);
    CHECKSTS(sts);
    added_contexts[i] = false;
    if(mfSupported)
    {
        sts = vaMFAddContext(dpy, mf_context, contexts[i]); // Add this context into the Multi-Frame context
        added_contexts[i] = (sts == VA_STATUS_SUCCESS); // save context state - whether it is available for multi-frame or not.
    }
}

VAContextID submitContexts[N]; // contexts to submit; M is the number of frames to submit, which can be smaller than the number of contexts associated with the Multi-Frame context
....
// prepare streams ready to submit frames, fill contexts vector.
....
//runtime
int M=0; // number of contexts queued for multi-frame submission
for( int i=0; i<N; i++ )
{
    sts = vaBeginPicture(dpy, contexts[i], in_surf);
    CHECKSTS(sts);
    sts = vaRenderPicture(dpy, contexts[i], buffers, buffer_count);
    CHECKSTS(sts);
    sts = vaEndPicture(dpy, contexts[i]); // Delays real submission until vaMFSubmit() if the context was added to the Multi-Frame context
    CHECKSTS(sts);
    if(added_contexts[i])
        submitContexts[M++] = contexts[i];
}

if(mfSupported && M > 0)
{
    sts = vaMFSubmit(dpy, mf_context, submitContexts, M); // Submit through Multi-Frame context
    CHECKSTS(sts);
}

3. Added Query function - MFContext logic only, additions to (2) shown in bold. It is clearly visible that any query function will be just an addition on top of the context management logic; it can be useful for debugging in some cases, but in general it complicates the code and requires additional support from the driver.

VADisplay dpy;
VAMFContextID mf_context;
VAContextID contexts[N]; // all available contexts
bool added_contexts[N]; // added_contexts[i] is true if contexts[i] was added to the Multi-Frame context
VAStatus sts = VA_STATUS_SUCCESS;
bool mfSupported;
MFQueryParam query_param; // some list of parameters to report
....
//initialize display, etc.
...
sts = vaQueryMF(dpy, &query_param); // some vaQueryXXX function.
...
//check multi-frame operation support based on query functionality
mfSupported = CheckFoo(query_param);

...
if(mfSupported)
{
    sts = vaCreateMFContext(dpy, &mf_context);
    mfSupported = (sts == VA_STATUS_SUCCESS);
}
.....
for( int i=0; i<N; i++)
{
    VAConfigID stream_config;
    ...
    //fill config
    ...
    sts = vaCreateConfig(dpy,...., &stream_config);
    CHECKSTS(sts);
    sts = vaCreateContext(dpy, stream_config, ...., &contexts[i]);
    CHECKSTS(sts);
    added_contexts[i] = false;
    if(mfSupported)
    {
        sts = vaMFAddContext(dpy, mf_context, contexts[i]); // Add this context into the Multi-Frame context
        added_contexts[i] = (sts == VA_STATUS_SUCCESS); // save context state - whether it is available for multi-frame or not.
    }
}

VAContextID submitContexts[N]; // contexts to submit; M is the number of frames to submit, which can be smaller than the number of contexts associated with the Multi-Frame context
....
// prepare streams ready to submit frames, fill contexts vector.
....
//runtime
int M=0; // number of contexts queued for multi-frame submission
for( int i=0; i<N; i++ )
{
    sts = vaBeginPicture(dpy, contexts[i], in_surf);
    CHECKSTS(sts);
    sts = vaRenderPicture(dpy, contexts[i], buffers, buffer_count);
    CHECKSTS(sts);
    sts = vaEndPicture(dpy, contexts[i]); // Delays real submission until vaMFSubmit() if the context was added to the Multi-Frame context
    CHECKSTS(sts);
    if(added_contexts[i])
        submitContexts[M++] = contexts[i];
}

if(mfSupported && M > 0)
{
    sts = vaMFSubmit(dpy, mf_context, submitContexts, M); // Submit through Multi-Frame context
    CHECKSTS(sts);
}

@artem-shaporenko
Contributor Author

@xhaihao, @lizhong1008 , @sreerenjb, any other concern on API?
Query and tool in va_utils will come in later pull requests.

@lizhong1008
Contributor

@artem-shaporenko LGTM for current API definition, will review your pull request of sample code.

@artem-shaporenko
Contributor Author

@lizhong1008 What does LGTM mean?

@artem-shaporenko
Contributor Author

Ok, got it - "looks good for me"

@lizhong1008
Contributor

@artem-shaporenko , yes it means "looks good to me"

@artem-shaporenko
Contributor Author

artem-shaporenko commented Oct 12, 2017

@xhaihao if there are no concerns, can we merge this PR into v2.0-next so it can be used?

@artem-shaporenko
Contributor Author

Combined into a single commit to avoid spreading the API change across different commits.
@xhaihao please update the v2.0-next branch.

va/va_backend.h Outdated
VAContextID *contexts,
int num_contexts
);

VAStatus (*vaCreateBuffer) (
Contributor

Please add the new callback function pointers after the existing callback functions and reduce the reserved bytes. Otherwise it will break compatibility between libva and the driver.

Contributor Author

Done
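The compatibility concern in this thread can be illustrated with a minimal sketch. It assumes a simplified, hypothetical table (`table_old`, `new_hooks`, and the helper names are made up for illustration and are not the real `VADriverVTable`). It compares the offset of an existing hook when new hooks are inserted in the middle versus appended after the existing ones.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*hook_fn)(void);

/* Hypothetical table as an already-built driver sees it. */
struct table_old {
    hook_fn end_picture;
    hook_fn create_buffer;
    unsigned long reserved[64];
};

/* New hooks placed in the middle: create_buffer is pushed down. */
struct table_inserted {
    hook_fn end_picture;
    hook_fn new_hooks[4];
    hook_fn create_buffer;
    unsigned long reserved[60];
};

/* New hooks placed after every existing hook, carved out of the reserved area. */
struct table_appended {
    hook_fn end_picture;
    hook_fn create_buffer;
    hook_fn new_hooks[4];
    unsigned long reserved[60];
};

/* Nonzero when create_buffer sits at the same offset as in the old layout. */
int create_buffer_offset_kept_when_inserted(void)
{
    return offsetof(struct table_inserted, create_buffer) ==
           offsetof(struct table_old, create_buffer);
}

int create_buffer_offset_kept_when_appended(void)
{
    return offsetof(struct table_appended, create_buffer) ==
           offsetof(struct table_old, create_buffer);
}

/* Usage: inserting moves the existing hook, appending does not. */
void layout_demo(void)
{
    assert(!create_buffer_offset_kept_when_inserted());
    assert(create_buffer_offset_kept_when_appended());
}
```

A driver built against the old layout would read `create_buffer` at its old offset, so only the appended layout keeps that lookup valid.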

va/va_backend.h Outdated
/** \brief Reserved bytes for future use, must be zero */
-unsigned long reserved[64];
+unsigned long reserved[63];
Contributor

The new array should be an array of 60 unsigned long integers because you add 4 callback functions.

Contributor Author

Yes, sorry. Done; not 60 but 56, as long is 32 bit but a pointer is 64 bit on 64-bit Linux.

@sreerenjb
Contributor

@artem-shaporenko @xhaihao

I still have a few more concerns.

I would like to take a step back and think about what we are really trying to achieve here. IIUC, the intention is to utilize the extra hardware blocks more efficiently.
Let's discuss this with a simple use case: assume the platform has 4 VMEs and we are trying to achieve maximum efficiency when encoding 4 streams in parallel.
Here the middleware components have no way of knowing the capabilities of the actual hardware (how many units are available, the cost of hw thread instantiation, how the hw utilizes them, etc.).
That means only the underlying driver can optimize based on HW capability. If that is the case, I am wondering what the real use case of these APIs is. Middleware has no idea about the cases (combinations of operations) where the hw can actually optimize parallel operations.
It can only perform some random job scheduling on the assumption that the hardware will perform better, can't it?
Or do we need to assume that the hw always performs better if we submit the jobs together? Then the next question is how many jobs in a single go give better results :)

On the other hand, if the driver really knows the hw capabilities, why shouldn't it be possible to schedule tasks in a better way without any specific APIs? Take the previous example. The first encode job submitted by user space is running on the first VME block and the rest are idle (assuming the hw has not instantiated parallel threads for the other VMEs). Now if the user requests a second encode (either from the same process or from a different process), the driver should internally be able to monitor the hardware utilization and assign the second VME block to the new encode job.

Of course the new APIs bring the benefit of "more gpu tasks in single batchbuffer and decrease the cpu overload". My question is how much performance difference this can really bring compared with the idea of "driver internally handle all resource utilization without new APIs but with multiple batch buffers".

Please correct me if I am wrong in my understanding of hw block utilization.

@fhvwy
Contributor

fhvwy commented Oct 29, 2017

@artem-shaporenko Can you clarify the behaviour of this API with respect to pipelining and dependencies between operations? To offer some specific examples, do the following two cases work:

Internal dependency:

  • Create a scaler and an encoder context and add them to an MF context.
  • Submit frames to the scaler, submit the scaler output to the encoder (in the same way that pipelining currently works).
  • Submit the MF context.

External dependency:

  • Create two each of scaler and encoder contexts.
  • Create two MF contexts, add the two scalers to one of them and the two encoders to the other.
  • Submit frames to both scalers, submit the scaler outputs to the encoders.
  • Submit the two MF contexts, in either order.

@artem-shaporenko
Contributor Author

@sreerenjb you are right that the performance improvement will depend on the number of VMEs (and other factors depending on the target HW). This is why we need to put more effort into implementing a proper query function that reports enough data for different situations, so the application/middleware can properly manage multi-frame submission. Speaking of performance in general, the developer needs to test how it works on different HW anyway to make sure performance is in the expected range. The same is true without multi-frame: you need to test how many parallel transcodes you can run on different HW, and you don't get any report from the driver about it.

@artem-shaporenko
Contributor Author

@fhvwy

Create a scaler and an encoder context and add them to an MF context.
Submit frames to the scaler, submit the scaler output to the encoder (in the same way that pipelining currently works).
Submit the MF context.

In this case the driver should return an error, VA_STATUS_ERROR_INVALID_CONTEXT: there must not be any dependencies between two tasks combined into one operation.

External dependency:
Create two each of scaler and encoder contexts.
Create two MF contexts, add the two scalers to one of them and the two encoders to the other.
Submit frames to both scalers, submit the scaler outputs to the encoders.
Submit the two MF contexts, in either order.

In this case the behavior will be the same as in single-frame mode: if the frame order is right (VP first, encoder second), the frames will be encoded properly; otherwise the output will be broken.
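The rule for the internal-dependency case can be sketched as a check over the jobs combined into one submission. This is only an illustration under assumptions: `sketch_job`, its surface IDs, and `sketch_batch_has_dependency` are hypothetical names, and how a real driver detects the condition before returning VA_STATUS_ERROR_INVALID_CONTEXT is not described in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one job: the surface it reads and the one it writes. */
struct sketch_job {
    unsigned int input_surface;
    unsigned int output_surface;
};

/* Returns 1 if any job reads a surface that another job in the same batch writes. */
int sketch_batch_has_dependency(const struct sketch_job *jobs, size_t count)
{
    size_t i, j;

    for (i = 0; i < count; i++)
        for (j = 0; j < count; j++)
            if (i != j && jobs[i].input_surface == jobs[j].output_surface)
                return 1;
    return 0;
}

/* Usage: scaler feeding an encoder is rejected, two independent encoders are fine. */
void dependency_demo(void)
{
    struct sketch_job scaler_then_encoder[2] = { { 1, 2 }, { 2, 3 } };
    struct sketch_job two_encoders[2] = { { 1, 10 }, { 2, 11 } };

    assert(sketch_batch_has_dependency(scaler_then_encoder, 2) == 1);
    assert(sketch_batch_has_dependency(two_encoders, 2) == 0);
}
```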

@artem-shaporenko
Contributor Author

@xhaihao Can you please merge the latest update for the VTable changes into the v2.0-next branch?

va/va_backend.h Outdated
/** \brief Reserved bytes for future use, must be zero */
-unsigned long reserved[64];
+unsigned long reserved[56];
Contributor

You added 4 hook functions only, so the reserved bytes should be 60 * sizeof(unsigned long) now, not 56 * sizeof(unsigned long).

Contributor Author

unsigned long is 32 bit, isn't it?

Contributor Author

I changed that to be [60], but please make sure you won't get it broken

Contributor

unsigned long is 32bit on 32bit OS and 64bit on 64bit OS
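The 60-versus-56 question can be checked with a small sketch. It assumes simplified, hypothetical structs (not the real `VADriverVTable`): four function pointers are added and the reserved array shrinks to either 60 or 56 entries. On Linux, where `unsigned long` has the same size as a pointer on both 32-bit and 64-bit builds, only 60 keeps the struct size unchanged; 56 would be the matching count only on a platform with 32-bit `long` and 64-bit pointers.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*hook_fn)(void);

/* Hypothetical layout before the change. */
struct sized_before {
    hook_fn existing;
    unsigned long reserved[64];
};

/* Four hooks added, four reserved entries given up. */
struct sized_after_60 {
    hook_fn existing;
    hook_fn added[4];
    unsigned long reserved[60];
};

/* Four hooks added, eight reserved entries given up. */
struct sized_after_56 {
    hook_fn existing;
    hook_fn added[4];
    unsigned long reserved[56];
};

int long_matches_pointer(void)
{
    return sizeof(unsigned long) == sizeof(hook_fn);
}

int size_kept_with_60(void)
{
    return sizeof(struct sized_after_60) == sizeof(struct sized_before);
}

int size_kept_with_56(void)
{
    return sizeof(struct sized_after_56) == sizeof(struct sized_before);
}

/* Usage: 60 is the right count exactly when long and pointers have equal size. */
void reserved_size_demo(void)
{
    assert(size_kept_with_60() == long_matches_pointer());
}
```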

va/va.c Outdated
@@ -439,6 +442,7 @@ static VAStatus va_openDriver(VADisplay dpy, char *driver_name)
CHECK_VTABLE(vaStatus, ctx, BeginPicture);
CHECK_VTABLE(vaStatus, ctx, RenderPicture);
CHECK_VTABLE(vaStatus, ctx, EndPicture);
+CHECK_VTABLE(vaStatus, ctx, MFSubmit);
CHECK_VTABLE(vaStatus, ctx, SyncSurface);
Contributor

The new functions are not mandatory to implement. Could you remove the checks for the new functions?

Contributor Author

Done

va/va.c Outdated

CHECK_DISPLAY(dpy);
ctx = CTX(dpy);
CHECK_VTABLE(vaStatus, ctx, CreateMFContext);
Contributor

@xhaihao xhaihao Nov 14, 2017

Here vaStatus is set to VA_STATUS_ERROR_UNKNOWN if vaCreateMFContext is not implemented by the backend driver. However, vaCreateMFContext is not a mandatory function, so please return VA_STATUS_ERROR_UNIMPLEMENTED instead of VA_STATUS_ERROR_UNKNOWN. The same change should be applied to the other new functions.

va/va.c Outdated
@@ -1225,7 +1225,7 @@ VAStatus vaMFSubmit (
)
{
VADriverContextP ctx;
-VAStatus vaStatus;
+VAStatus vaStatus = VA_STATUS_SUCCESS;

CHECK_DISPLAY(dpy);
ctx = CTX(dpy);
Contributor

This is not needed if the function returns VA_STATUS_ERROR_UNIMPLEMENTED at once when ctx->vtable->vaCreateMFContext is NULL.

BTW, could you please remove all trailing whitespace in this patch, then squash the two commits into one commit?

Contributor Author

Done

@artem-shaporenko force-pushed the libva_mfe branch 2 times, most recently from f724da3 to aa3d902 on November 14, 2017 08:17
CHECK_DISPLAY(dpy);
ctx = CTX(dpy);
if(ctx->vtable->vaCreateMFContext == NULL)
vaStatus = VA_STATUS_ERROR_UNIMPLEMENTED;
Contributor

*mf_context may be a random value for this case, so va_TraceCreateMFContext shouldn't be called when ctx->vtable->vaCreateMFContext is a NULL pointer.

Contributor Author

agree, fixed
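The pattern agreed on in the last three review threads (optional hook, immediate "unimplemented" return, no tracing of an unset output) can be sketched as follows. Everything here is a stand-in: the `sketch_` types, the status values, and the trace counter are hypothetical and are not libva's actual definitions or its merged code.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder status codes; not the real libva values. */
enum {
    SKETCH_STATUS_SUCCESS = 0,
    SKETCH_STATUS_ERROR_UNIMPLEMENTED = 1
};

typedef unsigned int sketch_mf_context_id;

struct sketch_vtable {
    /* Optional hook: a driver without multi-frame support leaves it NULL. */
    int (*create_mf_context)(sketch_mf_context_id *out);
};

int sketch_trace_calls = 0;

/* Stand-in for a trace function that reads the created ID. */
void sketch_trace(sketch_mf_context_id id)
{
    (void)id;
    sketch_trace_calls++;
}

int sketch_create_mf_context(const struct sketch_vtable *vt,
                             sketch_mf_context_id *out)
{
    int status;

    /* Return at once: *out was never written, so it must not be traced. */
    if (vt->create_mf_context == NULL)
        return SKETCH_STATUS_ERROR_UNIMPLEMENTED;

    status = vt->create_mf_context(out);
    if (status == SKETCH_STATUS_SUCCESS)
        sketch_trace(*out);
    return status;
}

/* Hypothetical driver hook used only for the usage example. */
int sketch_fake_driver_create(sketch_mf_context_id *out)
{
    *out = 7;
    return SKETCH_STATUS_SUCCESS;
}

void optional_hook_demo(void)
{
    struct sketch_vtable without_mf = { NULL };
    struct sketch_vtable with_mf = { sketch_fake_driver_create };
    sketch_mf_context_id id = 123;

    assert(sketch_create_mf_context(&without_mf, &id) ==
           SKETCH_STATUS_ERROR_UNIMPLEMENTED);
    assert(id == 123 && sketch_trace_calls == 0);

    assert(sketch_create_mf_context(&with_mf, &id) == SKETCH_STATUS_SUCCESS);
    assert(id == 7 && sketch_trace_calls == 1);
}
```

Tracing only after a successful hook call is a choice made for this sketch; the review only requires that the trace is skipped when the hook is NULL.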

…, FEI Encode/ENC/Pre-ENC, and VPP in future.

Signed-off-by: Artem Shaporenko [email protected]
@xhaihao xhaihao merged commit df192cf into intel:master Nov 22, 2017