
Matmul regression in ToM compared to MLPerf branch. #106

Open
MaheshRavishankar opened this issue Oct 15, 2024 · 4 comments

MaheshRavishankar commented Oct 15, 2024

For reproduction:

Input Model:
https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/punet.mlir

Input data:
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.0.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.1.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.2.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.3.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.4.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.5.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/punet_weights.irpa
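The seven downloads above can also be scripted in one loop; a minimal sketch (`BASE` is just the shared URL prefix, introduced here for brevity):

```shell
# Shared prefix of all the artifact URLs listed above.
BASE=https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet

# Print the full URL list; pipe it to `xargs -n1 wget` to download.
for i in 0 1 2 3 4 5; do
  echo "${BASE}/inference_input.${i}.bin"
done
echo "${BASE}/punet_weights.irpa"
```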

I built IREE on main and used the TD script in https://github.com/nod-ai/sdxl-scripts/blob/shared/sdxl_on_main/int8-model/specs/attention_and_matmul_spec.mlir

Compilation command for IREE on main

iree-compile \
    --iree-execution-model=async-external \
    --iree-hal-target-backends=rocm \
    --iree-hip-target=gfx942 \
    --iree-hip-waves-per-eu=2 \
    --iree-codegen-gpu-native-math-precision=true \
    --iree-codegen-llvmgpu-use-vector-distribution \
    --iree-codegen-transform-dialect-library=${TD_SPEC} \
    --iree-dispatch-creation-enable-aggressive-fusion=true \
    --iree-global-opt-propagate-transposes=true \
    --iree-llvmgpu-enable-prefetch=true \
    --iree-opt-aggressively-propagate-transposes=true \
    --iree-opt-const-eval=false \
    --iree-opt-outer-dim-concat=true \
    --iree-opt-data-tiling=false \
    --iree-preprocessing-pass-pipeline="builtin.module(util.func(iree-global-opt-raise-special-ops, iree-flow-canonicalize), iree-preprocessing-transpose-convolution-pipeline,  iree-preprocessing-pad-to-intrinsics, util.func(iree-preprocessing-generalize-linalg-matmul-experimental))" \
    --iree-vm-target-truncate-unsupported-floats \
    ${PUNET_MODEL} \
    -o ${VMFB}

Run command:

iree-benchmark-module \
    --device=hip:0 \
    --device_allocator=caching \
    --function=main \
    --hip_allow_inline_execution=true \
    --hip_use_stream=true \
    --input=1x4x128x128xf16=@inference_input.0.bin \
    --input=1xf16=@inference_input.1.bin \
    --input=2x64x2048xf16=@inference_input.2.bin \
    --input=2x1280xf16=@inference_input.3.bin \
    --input=2x6xf16=@inference_input.4.bin \
    --input=1xf16=@inference_input.5.bin \
    --module=${VMFB} \
    --parameters=model=punet_weights.irpa 

For compilation on the MLPerf branch I used the same inputs/weights, but with
IREE Commit : https://github.com/iree-org/iree/tree/mlperf_v4.1_20240726
TD script : https://github.com/nod-ai/sdxl-scripts/blob/mlperf_v4.1_20240726/int8-model/specs/attention_and_matmul_spec.mlir

iree-compile \
    --iree-execution-model=async-external \
    --iree-hal-target-backends=rocm \
    --iree-rocm-target-chip=gfx942 \
    --iree-rocm-waves-per-eu=2 \
    --iree-codegen-gpu-native-math-precision=true \
    --iree-codegen-llvmgpu-use-vector-distribution \
    --iree-codegen-transform-dialect-library=${TD_SPEC} \
    --iree-flow-enable-aggressive-fusion=true \
    --iree-global-opt-propagate-transposes=true \
    --iree-llvmgpu-enable-prefetch=true \
    --iree-opt-aggressively-propagate-transposes=true \
    --iree-opt-const-eval=false \
    --iree-opt-outer-dim-concat=true \
    --iree-opt-data-tiling=false \
    --iree-preprocessing-pass-pipeline="builtin.module(util.func(iree-global-opt-raise-special-ops, iree-flow-canonicalize), iree-preprocessing-transpose-convolution-pipeline, util.func(iree-preprocessing-pad-to-intrinsics), util.func(iree-preprocessing-generalize-linalg-matmul-experimental))" \
    --iree-vm-target-truncate-unsupported-floats \
    ${PUNET_MODEL} \
    -o ${VMFB}

and the same run command.

There is a general slowdown in performance. The particular issues are in the following three matmul dispatches:

main$async_matmul_156_matmul_like_*
main$async_matmul_143_matmul_like_*
main$async_matmul_158_matmul_like_*
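To compare these dispatches in isolation, one option is to dump per-dispatch executable sources from both builds and diff them; a sketch, assuming IREE's `--iree-hal-dump-executable-sources-to` flag is available in both builds (directory names are arbitrary):

```shell
# Sketch: add the dump flag to each of the two compile commands above,
# keeping every other flag unchanged:
mkdir -p dumps/tom dumps/mlperf
#   iree-compile --iree-hal-dump-executable-sources-to=dumps/tom    <other flags> ${PUNET_MODEL} -o ${VMFB}
#   iree-compile --iree-hal-dump-executable-sources-to=dumps/mlperf <other flags> ${PUNET_MODEL} -o ${VMFB}

# Then diff the three regressing dispatches between the builds:
for d in dumps/tom/*matmul_1{43,56,58}*; do
  diff "$d" "dumps/mlperf/$(basename "$d")"
done
```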

Attached are the IR logs before and after strategy selection and lowering:

sdxl_mlperf_matmul_143.dump.mlir.txt
sdxl_mlperf_matmul_156.dump.mlir.txt
sdxl_mlperf_matmul_158.dump.mlir.txt
sdxl_tom_matmul_143.dump.mlir.txt
sdxl_tom_matmul_156.dump.mlir.txt
sdxl_tom_matmul_158.dump.mlir.txt

@MaheshRavishankar

I see that there are some differences in the configuration settings. One dispatch has reordering set, and one has prefetching enabled on main but not on the MLPerf branch.
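Such config differences can be spotted quickly by grepping the attached IR dumps; a sketch (the search terms `prefetch`/`reorder` are guesses at what the lowering-config strings contain):

```shell
# Grep each attached IR dump for the config attributes in question.
for f in sdxl_tom_matmul_*.dump.mlir.txt sdxl_mlperf_matmul_*.dump.mlir.txt; do
  echo "== $f"
  grep -nE 'prefetch|reorder' "$f"
done
```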

cc @kuhar and @bangtianliu


MaheshRavishankar commented Oct 15, 2024

Attaching traces as well

MLPerf:
sdxl_mlperf

ToM:
sdxl_tom

(These are images of the traces; I have the trace files themselves but can't upload them.)


kuhar commented Oct 16, 2024

@MaheshRavishankar is this CPX or SPX?

@MaheshRavishankar

I ran it in SPX mode.
