@vlad-penkin Thank you for the reply. I think the performance of the two layernorm kernels (~50%) and softmax (~60%) is still not good compared with A100.
```
$ ./scripts/capture-hw-details.sh
LIBIGC1_VERSION=1.0.16900.24-914
LEVEL_ZERO_VERSION=1.3.29735.27-914
AGAMA_VERSION=914
GPU_DEVICE=Intel(R) Data Center GPU Max 1550
```
Hi, I found that some Triton kernels generated by the torch benchmark vit-base model are slower on PVC than on A100. The achieved bandwidth seems pretty low. I'm using the public PyTorch master branch + XPU build.
PVC vs. A100 (data sizes are identical on both; the A100 numbers come from the `_nv` variants of the same scripts):

| Kernel | Size | PVC time | PVC bandwidth | A100 time | A100 bandwidth |
|---|---|---|---|---|---|
| `cat_layernorm.py` | 0.039GB | 0.403ms | 96.59GB/s | 0.089ms | 437.12GB/s |
| `gelu.py` | 0.155GB | 0.362ms | 428.07GB/s | 0.144ms | 1073.02GB/s |
| `layernorm.py` | 0.058GB | 0.296ms | 196.21GB/s | 0.062ms | 930.15GB/s |
| `safe_softmax.py` | 0.238GB | 0.495ms | 481.51GB/s | 0.261ms | 913.15GB/s |
reproducer.zip