Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AUTO] Fix the timing issue in AUTO inference #27290

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 7 additions & 2 deletions src/plugins/auto/src/schedule.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -85,8 +85,13 @@ void Schedule::generate_workers(const std::string& device, const SoCompiledModel
OPENVINO_THROW("Every device used with AUTO should support query optimal_number_of_infer_requests property from compiled model ",
iie.what());
}
const auto num_requests = (m_context->m_device_priorities.end() == it_numrequests ||
it_numrequests->num_requests_per_devices == -1) ? optimal_num : it_numrequests->num_requests_per_devices;
auto num_requests =
(m_context->m_device_priorities.end() == it_numrequests || it_numrequests->num_requests_per_devices == -1)
? optimal_num
: it_numrequests->num_requests_per_devices;
num_requests = num_requests <= 1 && m_context->m_performance_hint == ov::hint::PerformanceMode::THROUGHPUT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why tput here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline earlier, not having this option will cause a hang.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, that was about cumulative tput, is not it? otherwise, how can you expect this PR to fix the customer issue which is in latency mode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I got it. will debug why it caused a hang without this option.

? 2
: num_requests;
auto& worker_requests = m_worker_requests[device];
auto& idle_worker_requests = m_idle_worker_requests[device];
worker_requests.resize(num_requests);
Expand Down
Loading