After updating to the nightly build of PyTorch 1.12, I ran a performance test comparing 'mps' against 'cpu', as shown below:
```python
import torch
from tqdm import trange

DTYPE = torch.float32
MAT_SIZE = 5000
DEVICE = ["cpu", "mps"][0]  # it's CPU now

mat = torch.randn([MAT_SIZE, MAT_SIZE], dtype=DTYPE, device=DEVICE)
for i in trange(N_ITER := 100):
    mat @= mat  # <--- Main Computation HERE
print(mat[0, 0], end="")  # avoid sync-issue when using 'mps'
```
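To put numbers on the comparison rather than eyeballing the progress bar, a small wall-clock timing wrapper can help. This is only a sketch of mine (`bench_matmul` is a hypothetical helper, not part of the script above); the `.item()` call at the end forces a device sync so that queued `'mps'` work is included in the measurement:

```python
import time
import torch

def bench_matmul(device: str, size: int = 5000, n_iter: int = 100) -> float:
    """Return wall-clock seconds for n_iter repeated matmuls on `device`."""
    mat = torch.randn([size, size], dtype=torch.float32, device=device)
    t0 = time.perf_counter()
    for _ in range(n_iter):
        mat @= mat
    _ = mat[0, 0].item()  # force a sync before stopping the clock
    return time.perf_counter() - t0

# Small size here just as a smoke test; the real runs used 5000 x 100.
print(f"cpu: {bench_matmul('cpu', size=512, n_iter=5):.3f} s")
```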
It's true that "mps" is somewhat faster than "cpu" on this M1 Pro chip. However, I soon noticed that it does not utilize all 10 CPU cores when device="cpu". Specifically, Activity Monitor.app shows that it uses only ≈200% CPU.
After further experiments, I found some interesting facts:
1. As mentioned above, device="cpu" on version 1.12 does not use all CPU cores on the M1 Pro chip.
2. When switching back to version 1.11, device="cpu" *does* take advantage of all the CPU cores.
3. Although 2. is true, 1.11 is actually slower than 1.12! I.e., device="cpu" on 1.12 uses less CPU and less power yet gets better performance.
4. Although 1. is true, manually running N (e.g. 2) instances of this script drops the performance of each instance to 1/N of its original (while indeed more CPU cores are scheduled and more watts are consumed).
I'm wondering about the reasons for 1.–4., and I am not sure whether this is a bug in PyTorch or a mistake in my experiments or my understanding.
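For what it's worth, one way to probe observation 1. from Python (rather than Activity Monitor) is PyTorch's intra-op thread-pool API. These are standard `torch` calls, though I have not verified whether forcing the thread count actually changes the behavior described above:

```python
import torch

# How many threads the CPU backend uses for intra-op parallelism (e.g. matmul).
print("intra-op threads:", torch.get_num_threads())
print("inter-op threads:", torch.get_num_interop_threads())

# Force the pool to all 10 cores of an M1 Pro and re-check utilization.
torch.set_num_threads(10)
print("intra-op threads now:", torch.get_num_threads())
```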
Versions
PyTorch version: 1.12.0.dev20220518
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 12.3.1 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: Could not collect
Libc version: N/A
Python version: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:14) [Clang 12.0.1 ] (64-bit runtime)
Python platform: macOS-12.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
You can test this further by installing NumPy with Accelerate BLAS via conda-forge: conda-forge/numpy-feedstock#253. You should see a similar effect vs. OpenBLAS.
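A minimal NumPy matmul benchmark along those lines might look like the sketch below (`bench_np_matmul` is a hypothetical name of mine); `np.show_config()` reveals which BLAS the installed NumPy is linked against:

```python
import time
import numpy as np

def bench_np_matmul(size: int = 5000, n_iter: int = 100) -> float:
    """Return wall-clock seconds for n_iter float32 matmuls (BLAS sgemm)."""
    a = np.random.randn(size, size).astype(np.float32)
    t0 = time.perf_counter()
    for _ in range(n_iter):
        a = a @ a
        np.clip(a, -1.0, 1.0, out=a)  # keep values finite across iterations
    return time.perf_counter() - t0

np.show_config()  # shows whether Accelerate or OpenBLAS is linked
print(f"{bench_np_matmul(size=512, n_iter=5):.3f} s")
```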
Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.12.0.dev20220518
[pip3] torchlibrosa==0.0.9
[pip3] torchvision==0.9.0a0
[conda] numpy 1.21.6 py39h690d673_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
[conda] pytorch 1.12.0.dev20220518 py3.9_0 pytorch-nightly
[conda] torchlibrosa 0.0.9 pypi_0 pypi
[conda] torchvision 0.9.1 py39h0a40b5a_0_cpu https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge