[AUTO] Fix the timing issue in AUTO inference #27290
base: master
Conversation
…into ywang2/fix_the_timing_issue
src/plugins/auto/src/schedule.cpp
Outdated
std::unique_lock<std::mutex> lck(worker_infer_mutex);
if (!idle_workerrequests.try_pop(worker)) {
    idle_workerrequests_cv.wait(lck, [&idle_workerrequests, &worker] {
        return idle_workerrequests.try_pop(worker);
Overall, this solution seems to wait forever until it can get a worker, which is not by design (by design we have m_infer_pipeline_tasks for tasks that cannot be scheduled onto a worker at a given moment).
Maybe we can consider increasing the CPU worker infer request count to 2 to avoid this deadlock, or, if we need to use a condition variable to fix this issue, can we at least try to align it with the m_infer_pipeline_tasks design?
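A minimal sketch of the hybrid the reviewer hints at: wait briefly on a condition variable, and if no worker frees up, defer the task to a pending queue (as the m_infer_pipeline_tasks design does) instead of blocking forever. All names, types, and the 10 ms timeout below are illustrative assumptions, not the plugin's actual API.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

// Hypothetical worker handle standing in for the plugin's WorkerInferRequest.
struct Worker { int id; };

struct Scheduler {
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<Worker> idle_workers;            // stands in for idle_workerrequests
    std::deque<std::function<void()>> pending;  // stands in for m_infer_pipeline_tasks

    // Try to grab an idle worker; rather than waiting indefinitely,
    // queue the task when no worker frees up within the timeout.
    bool run_or_queue(std::function<void(Worker&)> task) {
        std::unique_lock<std::mutex> lck(mtx);
        if (cv.wait_for(lck, std::chrono::milliseconds(10),
                        [this] { return !idle_workers.empty(); })) {
            Worker w = idle_workers.front();
            idle_workers.pop_front();
            lck.unlock();
            task(w);
            return true;   // ran immediately on an idle worker
        }
        // Deferred for later re-dispatch, mirroring m_infer_pipeline_tasks.
        pending.push_back([task] { /* re-dispatch when a worker is released */ });
        return false;
    }

    void release(Worker w) {
        {
            std::lock_guard<std::mutex> g(mtx);
            idle_workers.push_back(w);
        }
        cv.notify_one();
    }
};
```

The key difference from the diff above is that the wait is bounded, so a request that cannot get a worker falls back to the pending-task path rather than holding the calling thread forever.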
src/plugins/auto/src/schedule.cpp
Outdated
// This is necessary to handle the case where a request worker is popped from the idle queue before being pushed back.
// Without at least 2 requests, there could be a situation where no requests are available for inference,
// leading to potential deadlocks.
num_requests = num_requests <= 1 ? 2 : num_requests;
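The comment in the diff can be illustrated with a toy model of the idle-request pool (names are illustrative, not the plugin's): with a single worker request, a request that arrives while the only worker is in flight finds the pool empty and would block, whereas clamping to two leaves one available.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Toy stand-in for idle_workerrequests.try_pop: take a worker if one is idle.
std::optional<int> try_take(std::deque<int>& idle) {
    if (idle.empty()) return std::nullopt;  // caller would have to wait
    int w = idle.front();
    idle.pop_front();
    return w;
}
```

With `{0}` (one request), a second `try_take` before the first worker is returned yields nothing, which is the deadlock scenario; with `{0, 1}` (clamped to two), it still succeeds.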
Let's trigger local CI tests for AUTO to ensure there is no regression.
(m_context->m_device_priorities.end() == it_numrequests || it_numrequests->num_requests_per_devices == -1)
    ? optimal_num
    : it_numrequests->num_requests_per_devices;
num_requests = num_requests <= 1 && m_context->m_performance_hint == ov::hint::PerformanceMode::THROUGHPUT
Why check for the THROUGHPUT hint here?
As we discussed offline earlier, not having this option will cause a hang.
No, that was about cumulative throughput, wasn't it? Otherwise, how could this PR fix the customer issue, which occurs in LATENCY mode?
Sorry, I got it. I will debug why it hangs without this condition.
Details:
Tickets: