Luocheng/vpux/prealloc mem kmb #1
base: releases/vpux/2021/3
Conversation
add_definitions(-DUSE_PREALLOC_MEM)
set(HDDL2_DEP "HddlUnite::HddlUnite")
else()
message(WARNING "hddl2_params.hpp could not be found. Preallocate in KMB feature is disabled.")
Please remove device-specific strings such as "KMB".
Done.
@@ -85,6 +85,7 @@ Options:
-t Optional. Time, in seconds, to execute topology.
-progress Optional. Show progress bar (can affect performance measurement). Default value is "false".
-shape Optional. Set shape for input. For example, "input1[1,3,224,224],input2[1,4]" or "[1,3,224,224]" in case of one input size.
-use_prealloc_mem Optional. Prealloc remote memory in xBay to execute infer request.
Would it be better to use "-use_remote_mem"?
Done.
@@ -97,6 +97,11 @@ static const char load_config_message[] = "Optional. Path to XML/YAML/JSON file
static const char dump_config_message[] = "Optional. Path to XML/YAML/JSON file to dump IE parameters, which were set by application.";
#endif

#ifdef USE_PREALLOC_MEM
// @brief message for the memory preallocation option
static const char use_prealloc_mem_message[] = "Optional. Prealloc remote memory in xBay to execute infer request.";
static const char use_prealloc_mem_message[] = "Optional. Prealloc remote memory in device to execute infer request.";
Done.
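For reference, a minimal sketch of how the renamed flag could be declared in benchmark_app, assuming the app's usual gflags pattern (the help text mirrors the diff; the exact placement and the USE_REMOTE_MEM guard are assumptions):

```cpp
#ifdef USE_REMOTE_MEM
/// @brief message for the remote memory option
static const char use_remote_mem_message[] =
        "Optional. Prealloc remote memory in device to execute infer request.";

/// @brief Enable preallocated remote memory (off by default)
DEFINE_bool(use_remote_mem, false, use_remote_mem_message);
#endif
```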
size_t width;
size_t height;
remoteIE.GetWxH(width, height);
const size_t nv12Size = width * height * 3 / 2 * batchSize;
We also need to support pure NN inference without preprocessing, so the input can be RGB planar data that feeds into the NN directly.
An NV12 buffer will go through PP first and then to the NN.
Let's first support pure inference without PP.
Pure inference without PP is done.
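For readers unfamiliar with NV12, the 3/2 factor in the size computation above follows from the format's 4:2:0 layout; a short breakdown using the same variables:

```cpp
// Why width * height * 3 / 2: NV12 is 4:2:0 -- a full-resolution Y plane
// followed by one interleaved UV plane subsampled 2x2.
const size_t ySize  = width * height;      // luma: 1 byte per pixel
const size_t uvSize = width * height / 2;  // chroma: U+V interleaved, quarter resolution each
const size_t nv12Size = (ySize + uvSize) * batchSize;  // == width * height * 3 / 2 * batchSize
```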
@@ -324,3 +326,89 @@ void fillBlobs(const std::vector<std::string>& inputFiles,
}
}
}

#ifdef USE_PREALLOC_MEM
void fillRemoteBlobs(RemoteHelper& remoteIE,
FillRemoteBlobsNV12
fillRemoteBlobs removed; the function has been merged into fillBlobs.
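A rough sketch of what the merged remote-input path in fillBlobs could look like, assuming the IE 2021 RemoteContext API; makeRemoteInput and the empty ParamMap are illustrative placeholders, not the PR's exact code:

```cpp
#include <cstring>  // std::memcpy

InferenceEngine::Blob::Ptr makeRemoteInput(const InferenceEngine::RemoteContext::Ptr& ctx,
                                           const InferenceEngine::TensorDesc& desc,
                                           const uint8_t* hostData, size_t byteSize) {
    // Allocate device-side memory through the remote context; plugin-specific
    // keys (e.g. from hddl2_params.hpp) would be passed in the ParamMap.
    InferenceEngine::ParamMap params;
    auto remoteBlob = ctx->CreateBlob(desc, params);
    // Copy the prepared host image into remote memory once, up front.
    auto mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(remoteBlob);
    auto holder = mblob->wmap();
    std::memcpy(holder.as<uint8_t*>(), hostData, byteSize);
    return remoteBlob;
}
```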
auto minputHolder = minput->rmap();
auto inputBlobData = minputHolder.as<uint8_t*>();

BGR2NV12(inputBlobData, width, height, batchSize, data.get());
Will it do CSC + resize, or only CSC?
Preprocess removed.
using namespace InferenceEngine;

#define REMOTE_IMAGE_WIDTH 1920
#define REMOTE_IMAGE_HEIGHT 1080
Maybe we need to assign the input resolution if PP is needed; could the parameter be put into the benchmark input config file?
Preprocessing removed.
    THROW_IE_EXCEPTION << "Could not open file: " << graphPath;
}
std::istream graphBlob(&blobFile);
return ie.ImportNetwork(graphBlob, _contextPtr);
In the future we also need to support compiling IR online, i.e. calling LoadNetwork().
LoadNetwork supported.
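A hedged sketch of supporting both paths, extending the snippet above; the ".blob" suffix check and the loadOrImport name are illustrative assumptions, not the PR's exact code:

```cpp
// Assumes <fstream> and <string> are included, as in the surrounding file.
InferenceEngine::ExecutableNetwork loadOrImport(InferenceEngine::Core& ie,
                                                const std::string& graphPath,
                                                const InferenceEngine::RemoteContext::Ptr& contextPtr) {
    if (graphPath.size() >= 5 && graphPath.substr(graphPath.size() - 5) == ".blob") {
        // Precompiled blob: import it directly into the remote context.
        std::filebuf blobFile;
        if (!blobFile.open(graphPath, std::ios::in | std::ios::binary)) {
            THROW_IE_EXCEPTION << "Could not open file: " << graphPath;
        }
        std::istream graphBlob(&blobFile);
        return ie.ImportNetwork(graphBlob, contextPtr);
    }
    // IR model: compile online for the device that owns the context.
    auto network = ie.ReadNetwork(graphPath);
    return ie.LoadNetwork(network, contextPtr);
}
```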
@@ -205,4 +206,27 @@ void load_config(const std::string& filename,
}
}
}

void BGR2NV12(uint8_t* src, size_t width, size_t height, size_t imageNum, uint8_t* dst) {
Why convert RGB to NV12?
CSC removed (not needed anymore).
@riverlijunjie Preprocessing removed, please help review, thanks.
ExecutableNetwork exeNetwork;

#ifdef USE_REMOTE_MEM
RemoteHelper remoteHelper;
Can we get device_name == "VPUX"?
If yes, we can put the init code block into an "if (device == "VPUX")" branch, like "CPU" or "GPU" below.
Done. Moved the initialization into the 'VPUX' branch.
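The relocated initialization could look roughly like this, mirroring the existing per-device branches in benchmark_app's main(); RemoteHelper::Init() is a hypothetical signature for this PR's helper:

```cpp
if (device == "VPUX") {
#ifdef USE_REMOTE_MEM
    if (FLAGS_use_remote_mem) {
        // Create the HddlUnite workload context once, before LoadNetwork,
        // so the compiled network can be bound to it.
        remoteHelper.Init(ie);
    }
#endif
}
```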
using namespace InferenceEngine;

class RemoteHelper::Impl {
Would it be better to name it RemoteContextHelper?
Done.
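For illustration, a sketch of the renamed helper keeping the pimpl, which hides HddlUnite types from the public header; the method names are assumptions, not the PR's exact interface:

```cpp
#include <memory>
#include <ie_core.hpp>
#include <ie_remote_context.hpp>

class RemoteContextHelper {
public:
    RemoteContextHelper();
    ~RemoteContextHelper();  // defined in the .cpp, where Impl is complete
    void init(InferenceEngine::Core& ie);  // creates the HddlUnite workload context
    InferenceEngine::RemoteContext::Ptr getContext() const;

private:
    class Impl;  // holds HddlUnite handles, defined in the .cpp
    std::unique_ptr<Impl> _impl;
};
```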
As I understand it, benchmark_app covers the image workload case, right?
Video workload is used for the E2E pipeline, but benchmark_app does not provide such a test case, especially for KPI. So we need some official KPI data for the video workload.
Do we need to add CPU_THROUGHPUT_STREAMS and CPU_THREADS_NUM configuration in the video case?
We can use the command line 'benchmark_app -load_config config.yml ...' to get support for VPUX_THROUGHPUT_STREAMS, VPUX_INFERENCE_SHAVES, etc.; the config.yml looks like the sketch below.
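Laid out on separate lines, the config.yml from the comment above reads (OpenCV FileStorage-style YAML, as the %YAML:1.0 directive suggests):

```yaml
%YAML:1.0
VPUX: { VPUX_THROUGHPUT_STREAMS: "3", VPUX_INFERENCE_SHAVES: "16" }
```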
Force-pushed from 9528f90 to d7d1ef9 (compare). Commits in the compare, by title:
- Add support for multiple outputs
- Bym/pdpd frontend/op add relu & softmax
- Moved cmake/templates to <root>, removed ngraph versioning and reused the IE one, moved dependencies to <root>/cmake (build refactoring)
- [FrontEnd] enable pdpd ops conversion part3
- …f POT (openvinotoolkit#17398): model optimization guide and PTQ documentation updates
- Remove set_preprocess.cpp, preprocessing.hpp, and locale.hpp; port version.cpp and remove legacy
- Delete ngraph/visibility.hpp, log.hpp, file_util.hpp, type.hpp, dimension.hpp, and coordinate.hpp
This PR will be closed in a week because of 2 weeks of no activity.
Details:
When -use_remote_mem is passed on the command line, the infer request allocates remote memory using HddlUnite. All inference then uses that remote memory, so there is no need to copy image data from IA to the remote device. The creation steps:
-- load the images from the specified folder
-- allocate remote memory and copy the images into it
-- set the remote memory handle
-- run the infer requests in the benchmark loop
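Putting those steps together, a condensed sketch of the flow, with hypothetical helper names (RemoteContextHelper, makeRemoteInput, imageData/imageSize); the real PR wires this through benchmark_app's existing structures:

```cpp
InferenceEngine::Core ie;
RemoteContextHelper remoteHelper;
remoteHelper.init(ie);  // creates the HddlUnite workload context

// Compile the model against the remote context (or ImportNetwork for a precompiled blob).
auto network = ie.ReadNetwork(FLAGS_m);
auto exeNetwork = ie.LoadNetwork(network, remoteHelper.getContext());

// Copy each input image into remote memory once and bind the handle.
auto inferRequest = exeNetwork.CreateInferRequest();
for (const auto& item : exeNetwork.GetInputsInfo()) {
    auto remoteBlob = makeRemoteInput(remoteHelper.getContext(),
                                      item.second->getTensorDesc(),
                                      imageData(item.first),   // hypothetical: image bytes per input
                                      imageSize(item.first));  // hypothetical: byte count per input
    inferRequest.SetBlob(item.first, remoteBlob);
}

// The benchmark loop then reuses the preallocated remote memory on every run,
// with no per-inference host-to-device copy.
inferRequest.Infer();
```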