cpu: gemm: fix nocopy dispatching regressions
Rolls back to the previous, more conservative no-copy dispatching for
sequential mode to avoid performance regressions. This still keeps the
better performance for the inner product primitive listed in #525.
aaraujom committed Mar 12, 2020
1 parent 51bbe3a commit 3a4b5c5
Showing 1 changed file with 8 additions and 7 deletions.
15 changes: 8 additions & 7 deletions src/cpu/gemm/gemm_driver.cpp
@@ -826,17 +826,18 @@ static inline bool nocopy_checker_avx512(int nthr, const int transa,

     bool is_lda_verybad = lda % VERYBAD_LD_MULT == 0;

-    // Crude threshold to nocopy kernels if copy overhead is significant
-    // and nthr greater than 1.
-    if (nthr > 1 && 1.0 / m + 1.0 / n >= FORCE_NOCOPY_THRESH
+    // Copy-based performs better for TN case with small N in sequential case.
+    if (nthr == 1 && is_TN_case && m > 100
+            && ((m < 1200 && n < 200 && k < 1200)
+                    || (is_lda_bad && is_ldb_bad)))
+        return false;
+
+    // Crude threshold for nocopy kernels if copy overhead is significant.
+    if (1.0 / m + 1.0 / n >= FORCE_NOCOPY_THRESH
             && !(is_lda_verybad && is_NT_case)) {
         return true;
     }

-    // Copy-based performs better for TN case with small N in sequential case.
-    if (nthr == 1 && is_TN_case && m > 100 && m < 1200 && n < 200 && k < 1200)
-        return false;
-
     // Copy strategy usually performs better than nocopy on "bad" leading
     // dimensions.
     if (is_ld_bad) {
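The effect of reordering the two checks can be illustrated with a small standalone sketch. It only mirrors the conditions visible in the diff. The `Shape` struct, the `Choice` enum, the names `decide_before` and `decide_after`, and the numeric value of `FORCE_NOCOPY_THRESH` are assumptions made for illustration and are not taken from gemm_driver.cpp. The checks that follow in the real function are not shown in the diff and are represented here by `Undecided`.

```cpp
// Illustrative placeholder; the real constant is defined in gemm_driver.cpp
// and its value is not shown in the diff.
constexpr double FORCE_NOCOPY_THRESH = 0.01;

// Hypothetical container for the inputs the visible checks depend on.
struct Shape {
    int nthr;
    int m, n, k;
    bool is_TN_case, is_NT_case;
    bool is_lda_bad, is_ldb_bad, is_lda_verybad;
};

// Copy: the checker would return false. NoCopy: it would return true.
// Undecided: control falls through to later checks that the diff elides.
enum class Choice { Copy, NoCopy, Undecided };

// Order of checks before the commit: the threshold applies only when
// nthr > 1, and the sequential TN rule comes second.
Choice decide_before(const Shape &s) {
    if (s.nthr > 1 && 1.0 / s.m + 1.0 / s.n >= FORCE_NOCOPY_THRESH
            && !(s.is_lda_verybad && s.is_NT_case))
        return Choice::NoCopy;
    if (s.nthr == 1 && s.is_TN_case && s.m > 100 && s.m < 1200
            && s.n < 200 && s.k < 1200)
        return Choice::Copy;
    return Choice::Undecided;
}

// Order of checks after the commit: the sequential TN rule comes first and
// also covers bad lda and ldb; the threshold then applies for any nthr.
Choice decide_after(const Shape &s) {
    if (s.nthr == 1 && s.is_TN_case && s.m > 100
            && ((s.m < 1200 && s.n < 200 && s.k < 1200)
                    || (s.is_lda_bad && s.is_ldb_bad)))
        return Choice::Copy;
    if (1.0 / s.m + 1.0 / s.n >= FORCE_NOCOPY_THRESH
            && !(s.is_lda_verybad && s.is_NT_case))
        return Choice::NoCopy;
    return Choice::Undecided;
}
```

Under these assumptions the two versions agree whenever `nthr > 1`, because the TN rule requires `nthr == 1` and the threshold test is otherwise unchanged. They differ only in sequential mode: a small non-TN problem now takes the nocopy path at this point instead of falling through, and a large TN problem with bad `lda` and `ldb` now takes the copy path.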