
Matmul regression in ToM compared to MLPerf branch. #106

Open
MaheshRavishankar opened this issue Oct 15, 2024 · 4 comments

MaheshRavishankar commented Oct 15, 2024

For reproduction:

Input Model:
https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/punet.mlir

Input data:
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.0.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.1.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.2.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.3.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.4.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/inference_input.5.bin
wget https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet/punet_weights.irpa
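The seven downloads above can also be scripted in one loop; a minimal sketch (`BASE` is just the shared URL prefix, introduced here for brevity):

```shell
# Shared prefix of all the artifact URLs listed above.
BASE=https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sdxl-punet

# Print the full URL list; pipe it to `xargs -n1 wget` to download.
for i in 0 1 2 3 4 5; do
  echo "${BASE}/inference_input.${i}.bin"
done
echo "${BASE}/punet_weights.irpa"
```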

I built IREE on main and used the TD script in https://github.com/nod-ai/sdxl-scripts/blob/shared/sdxl_on_main/int8-model/specs/attention_and_matmul_spec.mlir

Compilation command for IREE on main

iree-compile \
    --iree-execution-model=async-external \
    --iree-hal-target-backends=rocm \
    --iree-hip-target=gfx942 \
    --iree-hip-waves-per-eu=2 \
    --iree-codegen-gpu-native-math-precision=true \
    --iree-codegen-llvmgpu-use-vector-distribution \
    --iree-codegen-transform-dialect-library=${TD_SPEC} \
    --iree-dispatch-creation-enable-aggressive-fusion=true \
    --iree-global-opt-propagate-transposes=true \
    --iree-llvmgpu-enable-prefetch=true \
    --iree-opt-aggressively-propagate-transposes=true \
    --iree-opt-const-eval=false \
    --iree-opt-outer-dim-concat=true \
    --iree-opt-data-tiling=false \
    --iree-preprocessing-pass-pipeline="builtin.module(util.func(iree-global-opt-raise-special-ops, iree-flow-canonicalize), iree-preprocessing-transpose-convolution-pipeline,  iree-preprocessing-pad-to-intrinsics, util.func(iree-preprocessing-generalize-linalg-matmul-experimental))" \
    --iree-vm-target-truncate-unsupported-floats \
    ${PUNET_MODEL} \
    -o ${VMFB}

Run command:

iree-benchmark-module \
    --device=hip:0 \
    --device_allocator=caching \
    --function=main \
    --hip_allow_inline_execution=true \
    --hip_use_stream=true \
    --input=1x4x128x128xf16=@inference_input.0.bin \
    --input=1xf16=@inference_input.1.bin \
    --input=2x64x2048xf16=@inference_input.2.bin \
    --input=2x1280xf16=@inference_input.3.bin \
    --input=2x6xf16=@inference_input.4.bin \
    --input=1xf16=@inference_input.5.bin \
    --module=${VMFB} \
    --parameters=model=punet_weights.irpa 

For compilation on the MLPerf branch I used the same inputs/weights, but with
IREE Commit : https://github.com/iree-org/iree/tree/mlperf_v4.1_20240726
TD script : https://github.com/nod-ai/sdxl-scripts/blob/mlperf_v4.1_20240726/int8-model/specs/attention_and_matmul_spec.mlir

iree-compile \
    --iree-execution-model=async-external \
    --iree-hal-target-backends=rocm \
    --iree-rocm-target-chip=gfx942 \
    --iree-rocm-waves-per-eu=2 \
    --iree-codegen-gpu-native-math-precision=true \
    --iree-codegen-llvmgpu-use-vector-distribution \
    --iree-codegen-transform-dialect-library=${TD_SPEC} \
    --iree-flow-enable-aggressive-fusion=true \
    --iree-global-opt-propagate-transposes=true \
    --iree-llvmgpu-enable-prefetch=true \
    --iree-opt-aggressively-propagate-transposes=true \
    --iree-opt-const-eval=false \
    --iree-opt-outer-dim-concat=true \
    --iree-opt-data-tiling=false \
    --iree-preprocessing-pass-pipeline="builtin.module(util.func(iree-global-opt-raise-special-ops, iree-flow-canonicalize), iree-preprocessing-transpose-convolution-pipeline, util.func(iree-preprocessing-pad-to-intrinsics), util.func(iree-preprocessing-generalize-linalg-matmul-experimental))" \
    --iree-vm-target-truncate-unsupported-floats \
    ${PUNET_MODEL} \
    -o ${VMFB}

and the same run command.

There is a general slowdown in performance. The particular issues are in the following three matmul dispatches:

main$async_matmul_156_matmul_like_*
main$async_matmul_143_matmul_like_*
main$async_matmul_158_matmul_like_*
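To compare these dispatches in isolation, one option is to dump per-dispatch executable sources from both builds and diff them; a sketch, assuming IREE's `--iree-hal-dump-executable-sources-to` flag is available in both builds (directory names are arbitrary):

```shell
# Sketch: add the dump flag to each of the two compile commands above,
# keeping every other flag unchanged:
mkdir -p dumps/tom dumps/mlperf
#   iree-compile --iree-hal-dump-executable-sources-to=dumps/tom    <other flags> ${PUNET_MODEL} -o ${VMFB}
#   iree-compile --iree-hal-dump-executable-sources-to=dumps/mlperf <other flags> ${PUNET_MODEL} -o ${VMFB}

# Then diff the three regressing dispatches between the builds:
for d in dumps/tom/*matmul_1{43,56,58}*; do
  diff "$d" "dumps/mlperf/$(basename "$d")"
done
```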

Attached are the IR logs before and after strategy selection and lowering:

sdxl_mlperf_matmul_143.dump.mlir.txt
sdxl_mlperf_matmul_156.dump.mlir.txt
sdxl_mlperf_matmul_158.dump.mlir.txt
sdxl_tom_matmul_143.dump.mlir.txt
sdxl_tom_matmul_156.dump.mlir.txt
sdxl_tom_matmul_158.dump.mlir.txt

@MaheshRavishankar

I see that there are some differences in the configuration settings. One dispatch has reordering set, and one has prefetching enabled on main but not on the MLPerf branch.
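Such config differences can be spotted quickly by grepping the attached IR dumps; a sketch (the search terms `prefetch`/`reorder` are guesses at what the lowering-config strings contain):

```shell
# Grep each attached IR dump for the config attributes in question.
for f in sdxl_tom_matmul_*.dump.mlir.txt sdxl_mlperf_matmul_*.dump.mlir.txt; do
  echo "== $f"
  grep -nE 'prefetch|reorder' "$f"
done
```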

cc @kuhar and @bangtianliu


MaheshRavishankar commented Oct 15, 2024

Attaching traces as well

MLPerf:
sdxl_mlperf

ToM:
sdxl_tom

(These are images of the traces; I have the trace files themselves but can't upload them.)


kuhar commented Oct 16, 2024

@MaheshRavishankar is this CPX or SPX?

@MaheshRavishankar

I ran it in SPX mode.
