
iree-codegen-cpu-materialize-host-encoding fails with invalid reference to GlobalOp #18628

Open
sogartar opened this issue Sep 27, 2024 · 19 comments
Labels
bug 🐞 Something isn't working codegen/llvm LLVM code generation compiler backend

Comments

sogartar (Contributor) commented Sep 27, 2024

What happened?

I have input MLIR that is exported from the test in this PR.
Compiling it with assertions enabled fails with:

iree-compile: repo/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceAnalysis.cpp:180: auto mlir::iree_compiler::IREE::HAL::DeviceAnalysis::gatherDeviceTargets(mlir::Attribute, mlir::Operation *, SetVector<IREE::HAL::DeviceTargetAttr> &)::(anonymous class)::operator()(auto) const [symRefAttr:auto = mlir::SymbolRefAttr]: Assertion `globalOp && "global reference must be valid"' failed.

Full error message is in the reproducer zip.

Steps to reproduce your issue

  1. Download the reproducer sharded-toy-llama.zip.
  2. Run compile.sh.

What component(s) does this issue relate to?

Compiler

Version information

5a2dd56

Additional context

No response

benvanik (Collaborator) commented

That pass is fundamentally incompatible with multi-device; you need to disable data tiling.

sogartar (Contributor Author) commented

@benvanik How do you disable data tiling? I tried with

--iree-opt-data-tiling=false
--iree-llvmcpu-disable-arm-sme-tiling

But I get the same error.

sogartar (Contributor Author) commented

@hanhanW do you know how to disable it?

hanhanW (Contributor) commented Sep 27, 2024

Adding --iree-opt-data-tiling=false should disable the data-tiling. I can't tell much without looking at logs.

sogartar (Contributor Author) commented

@hanhanW, I forgot to mention in the description that the full error message is in the reproducer archive.

sogartar (Contributor Author) commented

@benvanik, aside from the problem that this pass should not be called at all, should flow.tensor.transfer ops that reference a nonexistent device be considered illegal? The verifier did not complain. We have

%2 = flow.tensor.transfer %1 : tensor<?x?x?x?xf32>{%c3, %c26, %c2, %c14} to #hal.device.affinity<@__device_1>

but there is no corresponding

util.global private @__device_1 ...

flow.tensor.transfer does not have the op interface SymbolUserOpInterface and does not verify the referenced symbol.
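To illustrate the gap, a minimal hypothetical sketch (function names, types, and shapes are made up for illustration): a module like the following would pass today's verifier even though @__device_1 is never declared, because the symbol sits inside the affinity attribute rather than in anything the op itself knows to check.

```mlir
module {
  util.func public @f(%arg0: tensor<4xf32>) -> tensor<4xf32> {
    // @__device_1 has no matching util.global, yet nothing flags it.
    %0 = flow.tensor.transfer %arg0 : tensor<4xf32> to #hal.device.affinity<@__device_1>
    util.return %0 : tensor<4xf32>
  }
}
```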

benvanik (Collaborator) commented Sep 30, 2024

The attribute is what is referencing the symbol, not the op. We verify these later on but because this pass is running in the wrong place it happens before the verification. I'm guessing you went in and deleted ops manually? There should not be a case where this happens without hand-editing (if there is that's what we need to track down).

sogartar (Contributor Author) commented Sep 30, 2024

I'm guessing you went in and deleted ops manually?

I have not hand-edited. The input IR references the symbol through promises. E.g.

%56 = flow.tensor.transfer %cast_2 : tensor<?x?xi32>{%dim_4, %dim_6} to #hal.device.promise<@__device_1>

The attribute is what is referencing the symbol, not the op.

One could make the same argument for function call ops: the attribute contains the symbol, but the op still verifies the symbol, and it cannot be any other way, since the attribute itself is not bound to the graph (it does not have that context).

There should not be a case where this happens without hand-editing (if there is that's what we need to track down).

@benvanik I will track where the origin of the problem is. But shouldn't we add this to the flow.tensor.transfer verification? It will be a lot easier to track this problem with such a verification.

benvanik (Collaborator) commented

Promises are ok - affinities are not. Promises do not need the symbol to exist.

The op does not know that the affinity attribute even has a symbol. The cases are different.
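The distinction, sketched with the two attribute forms quoted earlier in this thread (types and names illustrative):

```mlir
// A promise names a device that is only bound later in the pipeline;
// no util.global needs to exist at this point:
%a = flow.tensor.transfer %x : tensor<4xf32> to #hal.device.promise<@__device_1>

// An affinity is a direct reference, so a matching global must exist:
util.global private @__device_1 : !hal.device
%b = flow.tensor.transfer %y : tensor<4xf32> to #hal.device.affinity<@__device_1>
```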

IanNod (Contributor) commented Sep 30, 2024

Here is an mlir-print-ir-after-all dump compiling for hip giving the same issue with following iree compile command:

./build/tools/iree-compile ../sharktank/program.mlir --iree-input-type=torch --iree-hal-target-device=hip[0] --iree-hal-target-device=hip[1] --mlir-print-ir-after-all --iree-opt-data-tiling=false --iree-global-opt-enable-early-materialization=false -o program.vmfb &> hip.txt

https://sharkpublic.blob.core.windows.net/sharkpublic/ian/hip.txt

hanhanW (Contributor) commented Sep 30, 2024

There is a bug in getFuncExecutableTargetAttrs on data-tiling path. The bug is likely in HAL device analysis and I can take a look.

There is another issue about disabling data-tiling. I thought that --iree-opt-data-tiling=false should work, but it does not work on my machine. It still runs the set_encoding passes even with the flag. I'll take a look.

sogartar (Contributor Author) commented

The input to the iree-auto-input-conversion pass lacks information about @__device_1; it only has the default local device.

module attributes {hal.device.targets = [#hal.device.target<"local", [#hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "znver4", cpu_features = "+prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,+xsaves,-avx512fp16,-usermsr,-sm4,-egpr,+sse4.1,+avx512ifma,+xsave,+sse4.2,-tsxldtrk,-sm3,-ptwrite,-widekl,+invpcid,+64bit,+xsavec,-avx10.1-512,+avx512vpopcntdq,+cmov,-avx512vp2intersect,+avx512cd,+movbe,-avxvnniint8,-ccmp,-amx-int8,-kl,-avx10.1-256,+evex512,-avxvnni,-rtm,+adx,+avx2,-hreset,-movdiri,-serialize,-sha512,+vpclmulqdq,+avx512vl,-uintr,-cf,+clflushopt,-raoint,-cmpccxadd,+bmi,-amx-tile,+sse,-avx10.2-256,+gfni,-avxvnniint16,-amx-fp16,-zu,-ndd,+xsaveopt,+rdrnd,+avx512f,-amx-bf16,+avx512bf16,+avx512vnni,-push2pop2,+cx8,+avx512bw,+sse3,+pku,-nf,+fsgsbase,+clzero,+mwaitx,-lwp,+lzcnt,+sha,-movdir64b,-ppx,+wbnoinvd,-enqcmd,-avx10.2-512,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,+bmi2,+fma,+popcnt,-avxifma,+f16c,+avx512bitalg,+rdpru,+clwb,+mmx,+sse2,+rdseed,+avx512vbmi2,-prefetchi,+rdpid,-fma4,+avx512vbmi,+shstk,+vaes,-waitpkg,-sgx,+fxsr,+avx512dq,+sse4a", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 64 : i64, target_triple = "x86_64-unknown-unknown-eabi-elf"}>]> : !hal.device], iree.consteval} {

Later at iree-hal-materialize-target-devices pass this becomes

util.global private @__device_0 = #hal.device.target<"local", [#hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "znver4", cpu_features = "+prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,+xsaves,-avx512fp16,-usermsr,-sm4,-egpr,+sse4.1,+avx512ifma,+xsave,+sse4.2,-tsxldtrk,-sm3,-ptwrite,-widekl,+invpcid,+64bit,+xsavec,-avx10.1-512,+avx512vpopcntdq,+cmov,-avx512vp2intersect,+avx512cd,+movbe,-avxvnniint8,-ccmp,-amx-int8,-kl,-avx10.1-256,+evex512,-avxvnni,-rtm,+adx,+avx2,-hreset,-movdiri,-serialize,-sha512,+vpclmulqdq,+avx512vl,-uintr,-cf,+clflushopt,-raoint,-cmpccxadd,+bmi,-amx-tile,+sse,-avx10.2-256,+gfni,-avxvnniint16,-amx-fp16,-zu,-ndd,+xsaveopt,+rdrnd,+avx512f,-amx-bf16,+avx512bf16,+avx512vnni,-push2pop2,+cx8,+avx512bw,+sse3,+pku,-nf,+fsgsbase,+clzero,+mwaitx,-lwp,+lzcnt,+sha,-movdir64b,-ppx,+wbnoinvd,-enqcmd,-avx10.2-512,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,+bmi2,+fma,+popcnt,-avxifma,+f16c,+avx512bitalg,+rdpru,+clwb,+mmx,+sse2,+rdseed,+avx512vbmi2,-prefetchi,+rdpid,-fma4,+avx512vbmi,+shstk,+vaes,-waitpkg,-sgx,+fxsr,+avx512dq,+sse4a", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 64 : i64, target_triple = "x86_64-unknown-unknown-eabi-elf"}>]> : !hal.device

Whatever is generating the jit IR for iree-auto-input-conversion is dropping the device definitions that are present before iree-consteval-jit-globals.

#device_target_local_0_ = #hal.device.target<"local", {ordinal = 0 : index}, [#executable_target_embedded_elf_x86_64_]> : !hal.device
#device_target_local_1_ = #hal.device.target<"local", {ordinal = 1 : index}, [#executable_target_embedded_elf_x86_64_]> : !hal.device
module @module attributes {stream.affinity.default = #hal.device.affinity<@__device_0>} {
  util.global private @__device_0 = #device_target_local_0_
  util.global private @__device_1 = #device_target_local_1_

sogartar (Contributor Author) commented Sep 30, 2024

It makes sense that iree-consteval-jit-globals would ignore device placement while evaluating constants and use only the default 'local' device.
Maybe it should point each declared device at the single CPU device that is defined at compile time.
E.g.

#device_target_local_0_ = #hal.device.target<"local", {ordinal = 0 : index}, [#executable_target_embedded_elf_x86_64_]> : !hal.device
module @module {
  util.global private @__device_0 = #device_target_local_0_
  util.global private @__device_1 = #device_target_local_0_

hanhanW (Contributor) commented Oct 1, 2024

Sorry that I was busy on other stuff today, so no more updates from my side. I'll prioritize this issue tomorrow.

sogartar (Contributor Author) commented Oct 1, 2024

I am working on a PR for iree-consteval-jit-globals to define all the required devices to point to the single default/local device in the JITed module.

benvanik (Collaborator) commented Oct 1, 2024

Hold up - that's not the correct approach.

benvanik (Collaborator) commented Oct 1, 2024

Two issues: data tiling as it is plumbed through codegen today is fundamentally incompatible with jit eval, and jit eval today should be stripping all transfer ops and device references.

Data tiling needs to separate itself from executable target and only use that as a way to seed default information. Data tiling must always be overridable independently of executable target once it gets to codegen. That is, we must be able to codegen data tiling behavior for one executable target on a different executable target. Today it always assumes it can use the executable target to derive the information it needs. Things like jit eval and hoisting do not work if we can't do that - and we need jit eval and hoisting.

In order for that to work jit eval needs to ensure that information is present in the IR separate from the target used for jitting (the host). This means that the split of executable target and tiling information needs to happen before jit eval such that what device is running any particular op is independent from what data tiling is being performed. After that point no device information from the original program is required in the jit eval program, and the only device that should exist is the host default device. All device affinities and transfer ops should be stripped.

So don't alias devices - remove them all. There should be no transfer ops in a jit eval function, and no device affinities specified.
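A sketch of what that stripping might look like inside a jit eval function (illustrative IR, not actual pass output; ops and types are assumptions):

```mlir
// Before: the eval function still carries a device affinity.
%0 = flow.tensor.transfer %input : tensor<4xf32> to #hal.device.affinity<@__device_1>
%1 = arith.addf %0, %0 : tensor<4xf32>

// After: the transfer is elided and its operand is used directly;
// no device globals or affinities remain in the jitted module.
%1 = arith.addf %input, %input : tensor<4xf32>
```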

sogartar (Contributor Author) commented Oct 1, 2024

OK, I will add stripping of flow.tensor.transfer ops in the iree-consteval-jit-globals pass.

I will have to see what other ops can carry device affinities and strip those as well. Maybe some ops can have just their affinities stripped, without removing the whole op, when the op performs some "actual" work.

sogartar (Contributor Author) commented Oct 2, 2024

Here is a PR #18663 that strips the affinities.
