iree-codegen-cpu-materialize-host-encoding fails with invalid reference to GlobalOp #18628
Comments
that pass is fundamentally incompatible with multi-device - you need to disable data tiling |
@benvanik How do you disable data tiling? I tried with
But I get the same error. |
@hanhanW do you know how to disable it? |
Adding |
@hanhanW, I forgot to mention in the description that the full error message is in the reproducer archive. |
@benvanik, aside from the problem that this pass should not be called at all, should %2 = flow.tensor.transfer %1 : tensor<?x?x?x?xf32>{%c3, %c26, %c2, %c14} to #hal.device.affinity<@__device_1> but there is no corresponding util.global private @__device_1 ...
|
The attribute is what is referencing the symbol, not the op. We verify these later on but because this pass is running in the wrong place it happens before the verification. I'm guessing you went in and deleted ops manually? There should not be a case where this happens without hand-editing (if there is that's what we need to track down). |
I have not hand-edited. The input IR references the symbol through promises. E.g.
One could make the same argument for function call ops. The attribute contains the symbol, but the op still verifies the symbol, and it cannot be any other way because the attribute itself is not bound to the graph (it does not have this context).
@benvanik I will track where the origin of the problem is. But shouldn't we add this to the |
Promises are ok - affinities are not. Promises do not need the symbol to exist. The op does not know that the affinity attribute even has a symbol. The cases are different. |
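To make the disagreement above concrete, here is a minimal sketch of the kind of check being discussed: scan textual IR for `#hal.device.affinity<@...>` references and report any symbol that has no `util.global` definition. It is a toy regex over text, not IREE's verifier; the function name `dangling_affinities` and both patterns are assumptions made for illustration only.

```python
from __future__ import annotations

import re

# Toy patterns for textual IR. These are assumptions, not IREE's parser.
AFFINITY_RE = re.compile(r"#hal\.device\.affinity<@([A-Za-z_][\w.$]*)")
GLOBAL_RE = re.compile(
    r"util\.global\s+(?:private\s+|public\s+)?(?:mutable\s+)?@([A-Za-z_][\w.$]*)"
)


def dangling_affinities(ir_text: str) -> set[str]:
    """Return affinity symbols that have no util.global definition in ir_text."""
    defined = set(GLOBAL_RE.findall(ir_text))
    referenced = set(AFFINITY_RE.findall(ir_text))
    return referenced - defined


# The transfer op quoted earlier in the thread, with only @__device_0 defined.
broken_ir = """
util.global private @__device_0 = #device_target_local_0_
%2 = flow.tensor.transfer %1 : tensor<?x?x?x?xf32>{%c3, %c26, %c2, %c14} to #hal.device.affinity<@__device_1>
"""

# Same IR with the missing global added.
fixed_ir = "util.global private @__device_1 = #device_target_local_1_\n" + broken_ir

print(dangling_affinities(broken_ir))
print(dangling_affinities(fixed_ir))
```

A check like this only sees the attribute text, which mirrors the point made above: the reference lives in the attribute, so something with access to the enclosing module has to resolve it.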
Here is an mlir-print-ir-after-all dump compiling for hip giving the same issue with following iree compile command:
https://sharkpublic.blob.core.windows.net/sharkpublic/ian/hip.txt |
There is a bug in There is another issue about disabling data-tiling. I thought that |
The input for pass
module attributes {hal.device.targets = [#hal.device.target<"local", [#hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "znver4", cpu_features = "+prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,+xsaves,-avx512fp16,-usermsr,-sm4,-egpr,+sse4.1,+avx512ifma,+xsave,+sse4.2,-tsxldtrk,-sm3,-ptwrite,-widekl,+invpcid,+64bit,+xsavec,-avx10.1-512,+avx512vpopcntdq,+cmov,-avx512vp2intersect,+avx512cd,+movbe,-avxvnniint8,-ccmp,-amx-int8,-kl,-avx10.1-256,+evex512,-avxvnni,-rtm,+adx,+avx2,-hreset,-movdiri,-serialize,-sha512,+vpclmulqdq,+avx512vl,-uintr,-cf,+clflushopt,-raoint,-cmpccxadd,+bmi,-amx-tile,+sse,-avx10.2-256,+gfni,-avxvnniint16,-amx-fp16,-zu,-ndd,+xsaveopt,+rdrnd,+avx512f,-amx-bf16,+avx512bf16,+avx512vnni,-push2pop2,+cx8,+avx512bw,+sse3,+pku,-nf,+fsgsbase,+clzero,+mwaitx,-lwp,+lzcnt,+sha,-movdir64b,-ppx,+wbnoinvd,-enqcmd,-avx10.2-512,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,+bmi2,+fma,+popcnt,-avxifma,+f16c,+avx512bitalg,+rdpru,+clwb,+mmx,+sse2,+rdseed,+avx512vbmi2,-prefetchi,+rdpid,-fma4,+avx512vbmi,+shstk,+vaes,-waitpkg,-sgx,+fxsr,+avx512dq,+sse4a", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 64 : i64, target_triple = "x86_64-unknown-unknown-eabi-elf"}>]> : !hal.device], iree.consteval} {
Later at
util.global private @__device_0 = #hal.device.target<"local", [#hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "znver4", cpu_features = "+prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,+xsaves,-avx512fp16,-usermsr,-sm4,-egpr,+sse4.1,+avx512ifma,+xsave,+sse4.2,-tsxldtrk,-sm3,-ptwrite,-widekl,+invpcid,+64bit,+xsavec,-avx10.1-512,+avx512vpopcntdq,+cmov,-avx512vp2intersect,+avx512cd,+movbe,-avxvnniint8,-ccmp,-amx-int8,-kl,-avx10.1-256,+evex512,-avxvnni,-rtm,+adx,+avx2,-hreset,-movdiri,-serialize,-sha512,+vpclmulqdq,+avx512vl,-uintr,-cf,+clflushopt,-raoint,-cmpccxadd,+bmi,-amx-tile,+sse,-avx10.2-256,+gfni,-avxvnniint16,-amx-fp16,-zu,-ndd,+xsaveopt,+rdrnd,+avx512f,-amx-bf16,+avx512bf16,+avx512vnni,-push2pop2,+cx8,+avx512bw,+sse3,+pku,-nf,+fsgsbase,+clzero,+mwaitx,-lwp,+lzcnt,+sha,-movdir64b,-ppx,+wbnoinvd,-enqcmd,-avx10.2-512,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,+bmi2,+fma,+popcnt,-avxifma,+f16c,+avx512bitalg,+rdpru,+clwb,+mmx,+sse2,+rdseed,+avx512vbmi2,-prefetchi,+rdpid,-fma4,+avx512vbmi,+shstk,+vaes,-waitpkg,-sgx,+fxsr,+avx512dq,+sse4a", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 64 : i64, target_triple = "x86_64-unknown-unknown-eabi-elf"}>]> : !hal.device
Whatever is generating the jit IR for
#device_target_local_0_ = #hal.device.target<"local", {ordinal = 0 : index}, [#executable_target_embedded_elf_x86_64_]> : !hal.device
#device_target_local_1_ = #hal.device.target<"local", {ordinal = 1 : index}, [#executable_target_embedded_elf_x86_64_]> : !hal.device
module @module attributes {stream.affinity.default = #hal.device.affinity<@__device_0>} {
util.global private @__device_0 = #device_target_local_0_
util.global private @__device_1 = #device_target_local_1_ |
It makes sense that #device_target_local_0_ = #hal.device.target<"local", {ordinal = 0 : index}, [#executable_target_embedded_elf_x86_64_]> : !hal.device
module @module {
util.global private @__device_0 = #device_target_local_0_
util.global private @__device_1 = #device_target_local_0_ |
Sorry that I was busy on other stuff today, so no more updates from my side. I'll prioritize this issue tomorrow. |
I am working on a PR for |
Hold up - that's not the correct approach. |
Two issues: data tiling as it is plumbed through codegen today is fundamentally incompatible with jit eval, and jit eval today should be stripping all transfer ops and device references.
Data tiling needs to separate itself from executable target and only use that as a way to seed default information. Data tiling must always be overridable independently of executable target once it gets to codegen. That is, we must be able to codegen data tiling behavior for one executable target on a different executable target. Today it always assumes it can use the executable target to derive the information it needs. Things like jit eval and hoisting do not work if we can't do that - and we need jit eval and hoisting.
In order for that to work jit eval needs to ensure that information is present in the IR separate from the target used for jitting (the host). This means that the split of executable target and tiling information needs to happen before jit eval such that what device is running any particular op is independent from what data tiling is being performed.
After that point no device information from the original program is required in the jit eval program, and the only device that should exist is the host default device. All device affinities and transfer ops should be stripped. So don't alias devices - remove them all. There should be no transfer ops in a jit eval function, and no device affinities specified. |
OK, I will add stripping of I will have to see what other ops can have device affinities and strip them as well. Maybe some ops can just have the affinities stripped without removing the whole op if it is performing some "actual" operation. |
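As a rough illustration of the stripping described above, the sketch below models a function body as a flat list of ops and applies both ideas: transfer ops are removed entirely (their result is forwarded to their input), while other ops keep their place and only lose their affinity attribute. This is a toy model in Python, not the actual IREE pass; the `Op` class, the `"affinity"` attribute key, and the `example.compute` op name are hypothetical, and the ops are assumed to be listed in definition-before-use order.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for an IR operation."""
    name: str
    operands: list
    results: list
    attrs: dict = field(default_factory=dict)


def strip_device_info(ops):
    """Drop transfer ops and remove affinity attributes from the remaining ops."""
    forwarded = {}  # transfer result -> value that replaces it
    kept = []
    for op in ops:
        # Rewrite operands first so chained transfers resolve to the original value.
        operands = [forwarded.get(v, v) for v in op.operands]
        if op.name == "flow.tensor.transfer":
            forwarded[op.results[0]] = operands[0]
            continue
        # "affinity" is an assumed attribute key for this sketch.
        attrs = {k: v for k, v in op.attrs.items() if k != "affinity"}
        kept.append(Op(op.name, operands, list(op.results), attrs))
    return kept


ops = [
    Op("flow.tensor.transfer", ["%1"], ["%2"], {"target": "@__device_1"}),
    Op("example.compute", ["%2"], ["%3"], {"affinity": "@__device_0", "kind": "add"}),
]
stripped = strip_device_info(ops)
print(stripped)
```

After the call, the only remaining op consumes `%1` directly and carries no device information, which is the state the jit eval program is supposed to be in.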
Here is a PR #18663 that strips the affinities. |
What happened?
I have input MLIR that is exported from the test in this PR.
Its compilation with assertions on fails with
Full error message is in the reproducer zip.
Steps to reproduce your issue
compile.sh
What component(s) does this issue relate to?
Compiler
Version information
5a2dd56
Additional context
No response