OV Performance Hints (CPU and GPU logic for selecting the actual configs, while AUTO/MULTI are passing them thru) #6993
Conversation
Force-pushed from 296825c to 193375b
@mashoujiang fyi, this PR passes the perf hints thru the AUTO (and MULTI)
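For context, a minimal sketch of how an application could request a performance hint through AUTO with the 2021.4-era Inference Engine API. The `PERFORMANCE_HINT` and `PERFORMANCE_HINT_NUM_REQUESTS` key names come from this PR's description; the exact value strings and the model path below are assumptions for illustration, not a confirmed final API.

```cpp
#include <inference_engine.hpp>
#include <map>
#include <string>

int main() {
    InferenceEngine::Core ie;
    // Hypothetical model path, used only to make the example self-contained.
    auto network = ie.ReadNetwork("model.xml");

    // Pass the hint when loading on AUTO; per this PR, AUTO (and MULTI) only
    // forward the hint, and the actual device plugin (CPU/GPU) translates it
    // into its own configs (e.g. number of streams).
    std::map<std::string, std::string> config = {
        {"PERFORMANCE_HINT", "THROUGHPUT"},        // or "LATENCY" (assumed values)
        {"PERFORMANCE_HINT_NUM_REQUESTS", "4"}     // optional ceiling, per this PR
    };
    auto exeNetwork = ie.LoadNetwork(network, "AUTO", config);
    return 0;
}
```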
Force-pushed from ab27fff to 2f79fd4
Co-authored-by: Tatiana Savina <[email protected]>
The AUTO plugin only passes the hints instead of the full config; this needs to be handled in a follow-up PR.
Force-pushed from 1a7a8e2 to cdc0655
Force-pushed from df26364 to 6af6358
In general, LGTM. A minor comment was left.
No major objections
…igs), while AUTO/MULTI are passing them thru) (openvinotoolkit#6993)

* rebasing the perf-modes-2021.3 to the 2021.4. Caveats: the (explicit) setting of #streams is not disabled (as it was before for experiments with DLBenchmark), and the logic slightly differs (streamsSet) (cherry picked from commit 1ae1edc)
* overriding streams (to force the TPUT mode to the DLBenchmark) (cherry picked from commit 7f506cd)
* disabling the reducing of #streams to fully mimic baseline c4df94d of the 2021.3 (before experiments) (cherry picked from commit 85073dd)
* clang/indentation (cherry picked from commit 050a415)
* splitting the Transformation into general and CPU-specific parts. Now, hopefully, this fully mimics the baseline c4df94d of the 2021.3 (before experiments), as the reducing of the streams num (as well as the early exit on GRU/LSTM/TensorIterator) is disabled (cherry picked from commit e98b2c1)
* disabling GRU/LSTM/TI + reducing of streams + 5D considered compute-limited only for int8 (cherry picked from commit 32b8d80)
* refactored to avoid compute_limited_ratio, reverted the reducing of #streams, removed LSTM from limitations (cherry picked from commit f2b9721)
* isa-based threshold logic (cherry picked from commit b218457)
* mode->hint (cherry picked from commit ec20aa8)
* optional PERFORMANCE_HINT_NUM_REQUESTS (cherry picked from commit 5a3883e)
* moving the perfHints to the common OV config class + initial tests (CPU only, as the actual AUTO/MULTI should be accommodated on the master) (cherry picked from commit (then fixed) 45bafe7d527f466507dea0693aeed51be4ebf776)
* AUTO support for PerfHints
* MULTI support for PerfHints
* Enabling Perf hints for the GPU plugin
* brushing settings output a bit
* disabling the "throughput" perf hint being default (until OV 2.0)
* uncommenting the logic which was disabled to force the DLBenchmark to use the throughput mode by default
* removing dead and experimental code, and debug printfs
* clang/code-style
* code-review remarks
* Moved the output of the actual params that the hint produced to the right place
* aligning MULTI's GetConfig behavior to HETERO's, as captured in the preso (CVS-59960) ratified with the ArchForum
* clang
* benchmark_app brushing
* Update inference-engine/samples/benchmark_app/README.md
* propagating the perf hints thru one more scenario in the merged AUTO-MULTI
* fixed misprint
* Python benchmark_app update for perf hints
* addressing reviewers' comments on the python benchmark_app
* simplifying/brushing logic a bit
* refactor the heuristic to the separate file (to be shared with iGPU soon)
* refactor conversion of modes to the specific GPU config per feedback from Vladimir
For the GPU (until Auto-Batching), the logic is very simple (and mimics the benchmark_app); for the CPU, in contrast, a network-based heuristic selects the #streams.
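For illustration only, a hedged sketch (not the PR's actual code) of how a plugin could translate the THROUGHPUT hint into its pre-existing streams settings under the split described above: the GPU path simply requests the auto number of streams (as the benchmark_app does), while the real CPU path derives #streams from a network-based, isa-threshold heuristic that the placeholder below does not reproduce. The `GPU_THROUGHPUT_STREAMS`/`CPU_THROUGHPUT_STREAMS` keys and their `*_AUTO` values are the existing Inference Engine config names; the helper function itself is hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical helper: map a performance hint to a device-specific config.
std::map<std::string, std::string> HintToConfig(const std::string& device,
                                                const std::string& hint) {
    std::map<std::string, std::string> config;
    if (hint == "THROUGHPUT") {
        if (device == "GPU") {
            // benchmark_app-like default: let the GPU plugin pick the stream count
            config["GPU_THROUGHPUT_STREAMS"] = "GPU_THROUGHPUT_AUTO";
        } else if (device == "CPU") {
            // the actual PR selects #streams from a network-based heuristic;
            // the plugin's AUTO value is used here only as a stand-in
            config["CPU_THROUGHPUT_STREAMS"] = "CPU_THROUGHPUT_AUTO";
        }
    }
    return config;
}
```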
@ArtemySkrebkov-intel and @rzubarev fyi