DOT product too slow on CPU and GPU compared to np and pytorch #17971
@anko-intel, please take a look at this issue, thanks.
Hi @djaym7
@anko-intel here are the results
Any relation to #17980 on the CPU side? Are you comparing against MKL-enabled np / pytorch?
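For what it's worth, a quick way to check whether the numpy and pytorch builds in question link against MKL, using only standard introspection calls:

```python
import numpy as np
import torch

np.show_config()                # BLAS/LAPACK sections mention 'mkl' on MKL builds
print(torch.__config__.show())  # build string lists MKL / MKL-DNN when compiled in
```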
Hi @djaym7 Additional measurements on master with the MXNet profiler enabled show that more than 80 us is spent between the Python call and the time recorded by the profiler for the dot operation. The results in the table below, neglecting measurement noise, show that the differences between the time measured in Python and in MKL are almost the same as those between Python and the MXNet profiler, which confirms a Python <-> C++ API issue. The last table shows results for MXNet with both the profiler and MKL verbose mode enabled (which adds extra time to both measurements). Here, too, the difference between the Python time and the profiler time is similar to the previous tables, and it is the most significant one. The exact results of my measurements can be found in the logs: dot_issue_logs.zip
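For context, a minimal sketch of how the Python-side time can be compared with the profiler's view of the same dot call (shape and filename are illustrative assumptions, not from the original measurement); MKL verbose output can additionally be enabled by setting `MKL_VERBOSE=1` in the environment before launch:

```python
import time
import mxnet as mx

# Configure the built-in profiler with aggregate statistics.
mx.profiler.set_config(profile_all=True, aggregate_stats=True,
                       filename='dot_profile.json')

a = mx.nd.random.uniform(shape=(1024, 1024))  # arbitrary example shape
b = mx.nd.random.uniform(shape=(1024, 1024))
mx.nd.waitall()  # make sure setup work is not included in the timing

mx.profiler.set_state('run')
t0 = time.perf_counter()
c = mx.nd.dot(a, b)
mx.nd.waitall()  # dot is asynchronous; wait for the result
t1 = time.perf_counter()
mx.profiler.set_state('stop')

print('python-side time: %.1f us' % ((t1 - t0) * 1e6))
print(mx.profiler.dumps())  # operator times as seen by the profiler
```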
@djaym7, please review whether @anko-intel answered your question :)
Given below are the times for CPU-np, mxnet-CPU, mxnet-GPU, pytorch-CPU, and pytorch-GPU.
CPU times for numpy (np) and pytorch are comparable, but mxnet is dramatically slower.
On GPU, pytorch is far faster than mxnet.
Hardware/software: SageMaker p3.2xlarge, latest (mxnet-cu102mkl 1.6.0)
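A minimal sketch of the kind of side-by-side benchmark described above, with explicit synchronization so the asynchronous mxnet and CUDA backends are timed fairly (the matrix size and repetition count are arbitrary choices, not from the original report):

```python
import time
import numpy as np
import mxnet as mx
import torch

N = 1024  # assumed size; the original matrices may differ
x = np.random.rand(N, N).astype('float32')

def bench(fn, sync=lambda: None, reps=100):
    """Average time per call in microseconds, after one warm-up call."""
    fn(); sync()
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    sync()
    return (time.perf_counter() - t0) / reps * 1e6

# numpy (CPU)
print('np CPU      : %.1f us' % bench(lambda: x @ x))

# mxnet CPU / GPU (operators are asynchronous, so sync with waitall)
a_cpu = mx.nd.array(x)
a_gpu = mx.nd.array(x, ctx=mx.gpu(0))
print('mxnet CPU   : %.1f us' % bench(lambda: mx.nd.dot(a_cpu, a_cpu), mx.nd.waitall))
print('mxnet GPU   : %.1f us' % bench(lambda: mx.nd.dot(a_gpu, a_gpu), mx.nd.waitall))

# pytorch CPU / GPU (CUDA kernels are asynchronous, so synchronize)
t_cpu = torch.from_numpy(x)
t_gpu = t_cpu.cuda()
print('pytorch CPU : %.1f us' % bench(lambda: t_cpu @ t_cpu))
print('pytorch GPU : %.1f us' % bench(lambda: t_gpu @ t_gpu, torch.cuda.synchronize))
```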