
1.12 on M1 Pro Chip not using all the CPU cores. (device="cpu") #77938

Closed
PkuCuipy opened this issue May 20, 2022 · 3 comments

Comments

@PkuCuipy commented May 20, 2022

🐛 Describe the bug

After updating to the nightly build of PyTorch 1.12, I ran a performance test comparing 'mps' against 'cpu', as shown below:

import torch
from tqdm import trange

DTYPE = torch.float32
MAT_SIZE = 5000
DEVICE = ["cpu", "mps"][0]      # it's CPU now

mat = torch.randn([MAT_SIZE, MAT_SIZE], dtype=DTYPE, device=DEVICE)

for i in trange(N_ITER := 100):
    mat @= mat                  # <--- Main Computation HERE
    print(mat[0, 0], end="")    # avoid sync-issue when using 'mps'

It's true that "mps" is somewhat faster than "cpu" on this M1 Pro chip.
However, I soon noticed that not all 10 CPU cores are utilized when device="cpu".
Specifically, Activity Monitor.app shows that the process only uses ≈200% CPU.

After further experiments, I found some interesting facts:

  1. As mentioned above, with device="cpu", version 1.12 does not use all the CPU cores on the M1 Pro chip.
  2. When switching back to version 1.11, device="cpu" does take advantage of all the CPU cores.
  3. Although 2. is true, 1.11 is actually slower than 1.12! I.e., device="cpu" on 1.12 uses less CPU and less power yet gets better performance.
  4. Although 1. is true, manually running N (e.g. 2) instances of this script drops the performance of each instance to about 1/N of the original (while more CPU cores are indeed scheduled and more watts are consumed).

I'm wondering about the reasons behind 1.–4., and am not sure whether this is a bug in PyTorch or a mistake in my experiments or my understanding.
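
For reference, a minimal way to check how many intra-op threads PyTorch is configured to use (just a quick sanity check, separate from the timing script above):

import torch

# Number of threads PyTorch uses for intra-op parallelism (e.g. CPU matmul)
print("intra-op threads:", torch.get_num_threads())

# The thread count can also be set explicitly to see how it affects core usage:
# torch.set_num_threads(10)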

Versions

PyTorch version: 1.12.0.dev20220518
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.3.1 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:14) [Clang 12.0.1 ] (64-bit runtime)
Python platform: macOS-12.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.12.0.dev20220518
[pip3] torchlibrosa==0.0.9
[pip3] torchvision==0.9.0a0
[conda] numpy 1.21.6 py39h690d673_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
[conda] pytorch 1.12.0.dev20220518 py3.9_0 pytorch-nightly
[conda] torchlibrosa 0.0.9 pypi_0 pypi
[conda] torchvision 0.9.1 py39h0a40b5a_0_cpu https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge

@psobolewskiPhD

Sounds like "cpu" now uses the AMX matrix coprocessor via the Accelerate library.
See: https://stackoverflow.com/questions/67587455/accelerate-framework-uses-only-one-core-on-mac-m1/67590869#67590869
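
If you want to confirm which BLAS your PyTorch build links against (Accelerate vs. OpenBLAS), something like this should show it (the exact fields in the output depend on the build):

import torch

# Prints the build configuration; look for the BLAS/LAPACK entries
# to see which library the CPU backend was built against.
print(torch.__config__.show())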

@PkuCuipy (Author)

> Sounds like "cpu" now uses the AMX matrix coprocessor via the Accelerate library.
> See: https://stackoverflow.com/questions/67587455/accelerate-framework-uses-only-one-core-on-mac-m1/67590869#67590869

Thanks a lot! This seems to clear up my confusion perfectly!

@psobolewskiPhD

You can try to test this further by installing numpy with Accelerate BLAS via conda-forge:
conda-forge/numpy-feedstock#253
You should see a similar effect vs. OpenBLAS.
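
A rough way to compare the two numpy builds would be something like the following (timings are only illustrative; np.show_config() reports which BLAS is linked):

import time
import numpy as np

# Report which BLAS numpy was built against (Accelerate vs. OpenBLAS)
np.show_config()

N = 5000
a = np.random.randn(N, N).astype(np.float32)
b = np.random.randn(N, N).astype(np.float32)

a @ b  # warm-up

t0 = time.perf_counter()
for _ in range(10):
    a @ b
print(f"average matmul time: {(time.perf_counter() - t0) / 10:.3f} s")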
