
Fix the issue of cpu affinity mask changed on a NUMA machine with SEQ #25455

Merged

Conversation

sunxiaoxia2022
Contributor

@sunxiaoxia2022 sunxiaoxia2022 commented Jul 9, 2024

Details:

  • Fix the issue of cpu affinity mask changed on a NUMA machine with SEQ

Tickets:

  • CVS-143272
@sunxiaoxia2022 sunxiaoxia2022 requested a review from a team as a code owner July 9, 2024 06:25
@github-actions github-actions bot added the category: inference OpenVINO Runtime library - Inference label Jul 9, 2024
@dmitry-gorokhov dmitry-gorokhov added this to the 2024.4 milestone Jul 11, 2024
@dmitry-gorokhov dmitry-gorokhov added the category: CPU OpenVINO CPU plugin label Jul 11, 2024
@dmitry-gorokhov dmitry-gorokhov self-assigned this Jul 11, 2024
@dmitry-gorokhov
Contributor

dmitry-gorokhov commented Jul 11, 2024

General comment:
I don't think restoring the affinity mask at the compiled model destructor stage is a good enough solution. A user application usually interleaves additional computational logic with OV inference, so the pipeline looks like: [user code] -> [OV infer] -> [user code] -> [OV infer] -> ... -> [compiled model destructor]. So the current solution still affects user code behavior.
I could propose 2 solutions: a) restore the mask at the end of the Infer call; b) create a separate thread for compilation/inference inside OV if enable pinning is ON.

@wangleis What do you think?

@wangleis
Contributor

@dmitry-gorokhov Since the CPU mask affects the whole process, option b may not work. Xiaoxia will update the PR to change and restore the CPU mask during inference, as in option a.

@github-actions github-actions bot removed the category: CPU OpenVINO CPU plugin label Jul 15, 2024
@sunxiaoxia2022
Contributor Author

@dmitry-gorokhov I put CPU pinning before and after the task(). Please take a look again.

@dmitry-gorokhov
Contributor

As we discussed on the sync, let's clarify 2 things first:

  1. What overhead is created by [un]pin_stream_to_cpus() calls on each inference?
  2. Does OV create a separate thread for inference in SEQ mode? If yes, does it mean the application thread is not affected by OV pinning/affinity behavior?

@sunxiaoxia2022
Contributor Author

As we discussed on the sync, let's clarify 2 things first:

  1. What overhead is created by [un]pin_stream_to_cpus() calls on each inference?
  2. Does OV create a separate thread for inference in SEQ mode? If yes, does it mean the application thread is not affected by OV pinning/affinity behavior?
  1. I tested the performance of master and this PR with a small model, ebgan (ONNX, FP32), 10 times.
    Test machine: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
    Test command: ./benchmark_app -m /home/openvino-ci-69/xiaoxia/models/cv_bench_cache/ww21_weekly_23.0.0-10926-b4452d56304-API2.0/ebgan/onnx/onnx/FP32/1/dldt/ebgan.xml -d CPU -hint none -nstreams 1 -nthreads 1
    Test result: the values are averages over the 10 runs.
    master: latency: 1.829 ms, throughput: 545.311 FPS
    This PR: latency: 1.833 ms, throughput: 543.561 FPS. Latency increased 0.21% compared with master; throughput dropped 0.32%.

  2. For async mode, OV creates a separate thread for inference in SEQ, the same as with TBB. So the application thread is not affected by OV pinning.
    But for sync mode, OV does not create a separate thread; it uses the app thread. So the app thread is affected by OV pinning.

@dmitry-gorokhov
Contributor

Thanks @sunxiaoxia2022 !
So to complete the task we need to fix the behavior for sync mode as well. I would propose to spawn a separate thread in the case of a SEQ build, the sync API, and enable_pinning==true.

@wangleis Do you have any objections?

@wangleis
Contributor

@dmitry-gorokhov In sync mode, the app is blocked during inference and the app thread is used for inference. Since we restore the CPU mask after inference, the current solution should be OK for sync mode, in my understanding.

@dmitry-gorokhov
Contributor

@dmitry-gorokhov In sync mode, the app is blocked during inference and the app thread is used for inference. Since we restore the CPU mask after inference, the current solution should be OK for sync mode, in my understanding.

Good point. Agree. Approved!

@wangleis wangleis added this pull request to the merge queue Jul 18, 2024
Merged via the queue into openvinotoolkit:master with commit 08c4195 Jul 18, 2024
122 checks passed
@wangleis wangleis deleted the xiaoxia/cpu_affinity_SEQ branch July 18, 2024 14:56
spran180 pushed a commit to spran180/openvino that referenced this pull request Jul 27, 2024
…openvinotoolkit#25455)

### Details:
- *Fix the issue of cpu affinity mask changed on a NUMA machine with
SEQ*

### Tickets:
 - *CVS-143272*