[Performance]: How to assign model inference to specific CPUs? #27083
Comments
Hi @LinGeLin Do you run two models in one application process?
Yes.
@wangleis Any suggestions? Or is it just that inferencing multiple models will inherently interfere with each other?
@LinGeLin Reserving specific CPU resources for a specific model in CPU inference is planned but not enabled yet. Ticket CVS-154222 has been created to follow this issue. We will update you when the feature is enabled in the master branch.
### Details:
- Add property `ov::hint::enable_cpu_reservation` to reserve CPU resources in CPU inference
- `ov::hint::enable_cpu_reservation` defaults to false; users can explicitly set it to true to enable CPU reservation.
- Update proc_type_table before stream scheduling in compile_model()

### Tickets:
- *CVS-155268*
- *#27083*

Co-authored-by: Shen, Wanglei <[email protected]>
Co-authored-by: Chen Peter <[email protected]>
@LinGeLin The PR has been merged. You can try it with the latest nightly build: https://storage.openvinotoolkit.org/repositories/openvino/packages/nightly/latest/
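A minimal sketch of how the new property could be used for the two-model scenario described in this issue, assuming a build that includes the merged PR. The model paths and per-model thread counts are placeholders, not taken from the reporter's project, and note that the property reserves cores for exclusive use rather than letting you pick specific CPU IDs:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // Load both models used by the application (paths are placeholders).
    auto model_a = core.read_model("model_a.xml");
    auto model_b = core.read_model("model_b.xml");

    // With ov::hint::enable_cpu_reservation(true), the CPU plugin reserves
    // the cores used by each compiled model, so the two models (and other
    // application threads) do not compete for the same CPUs.
    auto compiled_a = core.compile_model(model_a, "CPU",
        ov::inference_num_threads(12),            // e.g. 12 cores for Model A
        ov::hint::enable_cpu_reservation(true));

    auto compiled_b = core.compile_model(model_b, "CPU",
        ov::inference_num_threads(8),             // e.g. 8 cores for Model B
        ov::hint::enable_cpu_reservation(true));

    // Remaining cores stay free for the project's preprocessing/gRPC threads.
    auto request_a = compiled_a.create_infer_request();
    auto request_b = compiled_b.create_infer_request();
    return 0;
}
```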
OpenVINO Version
2024.4.0
Operating System
Ubuntu 20.04 (LTS)
Device used for inference
CPU
OpenVINO installation
Build from source
Programming Language
C++
Hardware Architecture
x86 (64 bits)
Model used
ps model
Model quantization
No
Target Platform
No response
Performance issue description
I am developing a gRPC project in C++ with OpenVINO (ov) integrated into it. The project uses multiple thread pools for preprocessing. I have observed that inference performance is significantly lower than the figures measured by benchmark_app, and I suspect this is due to thread competition between OpenVINO and the project's preprocessing threads. I conducted the following tests:

- With infer_thread=24, the utilization of all 24 CPUs fluctuates around 50%.
- With infer_thread=16, the utilization of the first 16 CPUs is around 80%, while the utilization of the last 8 CPUs is 0%.

Since my project runs with two models loaded simultaneously, I want to dedicate CPUs 0-11 to Model A, CPUs 12-19 to Model B, and CPUs 20-23 to other operations in the project. However, I haven't found an interface in OpenVINO to bind CPUs when loading models. Are there any other suggestions? Thank you.
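For context, a minimal sketch of how such a per-model thread limit is typically set with the OpenVINO C++ API. The model path and thread count are placeholder assumptions, since the reporter's actual code is not shown; this caps how many threads the CPU plugin uses but, as the issue points out, does not pin them to specific CPU IDs:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // Limit the CPU plugin to 16 inference threads. This bounds how many
    // cores OpenVINO will use, but does not bind streams to specific CPUs.
    auto compiled = core.compile_model(model, "CPU",
        ov::inference_num_threads(16));

    auto request = compiled.create_infer_request();
    return 0;
}
```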
Step-by-step reproduction
No response