[LLVM] Optimize G_SHUFFLE_VECTOR into more efficient generic opcodes #41

ValentijnvdBeek · 2024-05-17T15:39:14Z

This merge requests implements an infrastructure based on streams that allows for the creation of shuffle vector pattern masks that can be used to replace inefficient scalarizations with faster generic opcodes. It implements five patterns that are useful for the AIE going forwards, namely one that matches the concatenation of vectors, the same in reverse, the insertion of a vector and the merging of two vectors.

Some todos in the merge request that can be done after the rest has been OK'ed

The assertions that check the initial state are done a bit adhoc, it might be nice to pass them as function pointers as well
Inefficient matching for larger groups of patterns. The patterns now match the minimum of what they can. For example, only the first two parts of vectors are transformed. Most of these can be extended, but this leads to very large set of patterns quickly
There are two matching functions, since the concat version does a lot of additional work that I am not sure what it does. I should probably figure out if those can be fully merged or whether just the simply variant ought to survive
Use llvm::SmallVector and not std::vector

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

llvm/test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-undef.mir

llvm/test/CodeGen/AArch64/ext-narrow-index.ll

konstantinschwarz · 2024-06-21T19:06:31Z

llvm/test/CodeGen/AArch64/vecreduce-add.ll

-; CHECK-GI-DOT-NEXT:    addv s0, v0.4s
-; CHECK-GI-DOT-NEXT:    fmov w0, s0
-; CHECK-GI-DOT-NEXT:    ret
+; CHECK-LABEL: add_pair_v8i16_v4i32_double_sext_zext_shuffle:


konstantinschwarz · 2024-06-21T19:07:43Z

llvm/test/CodeGen/AIE/aie2/intrinsics-shufflevec.ll

-; CHECK-NEXT:    mova r16, #13
-; CHECK-NEXT:    vextract.s32 r5, x2, r16
-; CHECK-NEXT:    j #.LBB0_3
+; CHECK-NEXT:    nopb ; nopa ; nops ; nopx ; vmov wl0, wh0; nopv


llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

github-actions · 2024-06-25T09:39:01Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff ea89ed6223aa93251b9708c16b9d605ab30dccdb 07244e73f9104198bb9cb52273afc033367e0429 -- llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp llvm/lib/Target/AIE/AIE2PreLegalizerCombiner.cpp

View the diff from clang-format here.

diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 70297e7d43..b42b0b7c29 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -352,32 +352,31 @@ Register CombinerHelper::createUnmergeValue(
     TargetReg = MRI.createGenericVirtualRegister(HalfSizeTy);
   }
 
-    // Each destination fits n times into the source and each iteration we
-    // exactly half the source. Therefore we need to pick on which side we want
-    // to iterate on.
-    const uint32_t DstNumElements =
-        DstTy.isVector() ? DstTy.getNumElements() : 1;
-    const uint32_t HalfWay = Start + ((End - Start) / 2);
-    const uint32_t Position = DestinationIndex * DstNumElements;
-
-    uint32_t NextStart, NextEnd;
-    if (Position < HalfWay) {
-      Builder.buildInstr(TargetOpcode::G_UNMERGE_VALUES, {TargetReg, TmpReg},
-                         {SrcReg});
-      NextStart = Start;
-      NextEnd = HalfWay;
-    } else {
-      Builder.buildInstr(TargetOpcode::G_UNMERGE_VALUES, {TmpReg, TargetReg},
-                         {SrcReg});
-      NextStart = HalfWay;
-      NextEnd = End;
-    }
+  // Each destination fits n times into the source and each iteration we
+  // exactly half the source. Therefore we need to pick on which side we want
+  // to iterate on.
+  const uint32_t DstNumElements = DstTy.isVector() ? DstTy.getNumElements() : 1;
+  const uint32_t HalfWay = Start + ((End - Start) / 2);
+  const uint32_t Position = DestinationIndex * DstNumElements;
+
+  uint32_t NextStart, NextEnd;
+  if (Position < HalfWay) {
+    Builder.buildInstr(TargetOpcode::G_UNMERGE_VALUES, {TargetReg, TmpReg},
+                       {SrcReg});
+    NextStart = Start;
+    NextEnd = HalfWay;
+  } else {
+    Builder.buildInstr(TargetOpcode::G_UNMERGE_VALUES, {TmpReg, TargetReg},
+                       {SrcReg});
+    NextStart = HalfWay;
+    NextEnd = End;
+  }
 
-    if (HalfSizeTy.isVector() && DstTy != HalfSizeTy)
-      return createUnmergeValue(MI, TargetReg, DstReg, DestinationIndex,
-                                NextStart, NextEnd);
+  if (HalfSizeTy.isVector() && DstTy != HalfSizeTy)
+    return createUnmergeValue(MI, TargetReg, DstReg, DestinationIndex,
+                              NextStart, NextEnd);
 
-    return DstReg;
+  return DstReg;
 }
 
 bool CombinerHelper::tryCombineShuffleVector(MachineInstr &MI) {

konstantinschwarz

There are plenty of upstream tests failing. Could you please take a look?

llvm/lib/Target/AIE/AIECombine.td

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

llvm/test/CodeGen/AIE/aie2/GlobalISel/prelegalizercombiner-shufflevector.mir

llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

gbossu · 2024-08-14T15:47:15Z

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

@@ -399,6 +399,54 @@ adderGenerator(const int32_t From, const int32_t To, const int32_t StepSize) {
  };
 }

+Register CombinerHelper::createUnmergeValue(


Curious: Maybe IRTranslator already has a similar piece of code that we can re-use?

Not from what I can see, G_UNMERGE_VALUES isn't meant for vectors to begin with. It only supports scalars, according to the documentation. Most backends just support this as an extension since there is no other opcode to do it. In theory, this will be ripped out in favour of a simple G_SUBREGISTER_EXTRACT in the near future if that opcode gets proper support. Utils.cpp does have something that is a bit similar, called llvm::extractParts, but that splits into two parts while this code goes through it like a binary tree.

In any case, I would keep it like this since this code should be replaced before upstreaming and that is easier if we are the only ones depending on it.

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

ValentijnvdBeek · 2024-08-15T11:28:03Z

llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h

+  Register createUnmergeValue(MachineInstr &MI, const Register SrcReg,
+                              const Register DstReg, uint8_t DestinationIndex,
+                              const uint32_t Start, const uint32_t End);
+


G_UNMERGE_VALUE isn't really meant to be used for vectors like this, it is actually defined as a scalar opcode. There is a nice drop-in replacement opcode called G_EXTRACT_SUBVECTOR which has landed in our tree in early August, but there is no support in other backends for this yet. So this implementation is a necessary evil until the other opcode is able to replace it.

ValentijnvdBeek · 2024-08-15T11:35:47Z

llvm/test/CodeGen/AArch64/ext-narrow-index.ll

-; CHECK-GISEL-NEXT:    ext v0.16b, v0.16b, v1.16b, #8
-; CHECK-GISEL-NEXT:    // kill: def $d0 killed $d0 killed $q0
+; CHECK-GISEL-NEXT:    mov d0, v0.d[1]
 ; CHECK-GISEL-NEXT:    ret


This test is:

define <8 x i8> @i8_off8(<16 x i8> %arg1, <16 x i8> %arg2) { entry: %shuffle = shufflevector <16 x i8> %arg1, <16 x i8> %arg2, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15> ret <8 x i8> %shuffle }

This turns into:

bb.1.entry: liveins: $q0, $q1 %0:_(<16 x s8>) = COPY $q0 %3:_(<8 x s8>), %2:_(<8 x s8>) = G_UNMERGE_VALUES %0:_(<16 x s8>) $d0 = COPY %2:_(<8 x s8>) RET_ReallyLR implicit $d0

This is expected

ValentijnvdBeek · 2024-08-15T11:36:43Z

llvm/test/CodeGen/AArch64/ext-narrow-index.ll

-; CHECK-GISEL-NEXT:    movi v1.2d, #0000000000000000
-; CHECK-GISEL-NEXT:    ext v0.16b, v0.16b, v1.16b, #8
-; CHECK-GISEL-NEXT:    // kill: def $d0 killed $d0 killed $q0
+; CHECK-GISEL-NEXT:    mov d0, v0.d[1]
 ; CHECK-GISEL-NEXT:    ret
 entry:
  %shuffle = shufflevector <16 x i8> %arg1, <16 x i8> zeroinitializer, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>


This is:

define <8 x i8> @i8_zero_off8(<16 x i8> %arg1) { entry: %shuffle = shufflevector <16 x i8> %arg1, <16 x i8> zeroinitializer, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15> ret <8 x i8> %shuffle }

And turns into:

bb.1.entry: liveins: $q0, $q1 %0:_(<16 x s8>) = COPY $q0 %5:_(<8 x s8>), %4:_(<8 x s8>) = G_UNMERGE_VALUES %0:_(<16 x s8>) $d0 = COPY %4:_(<8 x s8>) RET_ReallyLR implicit $d0

this is expected

ValentijnvdBeek · 2024-08-15T11:57:47Z

llvm/test/CodeGen/AArch64/vecreduce-add.ll

-; CHECK-GI-NEXT:    ushll2 v0.4s, v0.8h, #0
 ; CHECK-GI-NEXT:    ushll v5.4s, v1.4h, #0
-; CHECK-GI-NEXT:    ushll2 v1.4s, v1.8h, #0
 ; CHECK-GI-NEXT:    ushll v6.4s, v2.4h, #0
-; CHECK-GI-NEXT:    ushll2 v2.4s, v2.8h, #0
 ; CHECK-GI-NEXT:    ushll v7.4s, v3.4h, #0
-; CHECK-GI-NEXT:    ushll2 v3.4s, v3.8h, #0
-; CHECK-GI-NEXT:    add v0.4s, v4.4s, v0.4s
-; CHECK-GI-NEXT:    add v1.4s, v5.4s, v1.4s
-; CHECK-GI-NEXT:    add v2.4s, v6.4s, v2.4s
-; CHECK-GI-NEXT:    add v3.4s, v7.4s, v3.4s
+; CHECK-GI-NEXT:    uaddw2 v0.4s, v4.4s, v0.8h
+; CHECK-GI-NEXT:    uaddw2 v1.4s, v5.4s, v1.8h
+; CHECK-GI-NEXT:    uaddw2 v2.4s, v6.4s, v2.8h
+; CHECK-GI-NEXT:    uaddw2 v3.4s, v7.4s, v3.8h
 ; CHECK-GI-NEXT:    add v0.4s, v0.4s, v1.4s
 ; CHECK-GI-NEXT:    add v1.4s, v2.4s, v3.4s
 ; CHECK-GI-NEXT:    add v0.4s, v0.4s, v1.4s


this is for me the most surprising test result and hardest to parse. The input LLVM IR looks like:

define i32 @add_pair_v8i16_v4i32_double_sext_zext_shuffle(<8 x i16> %ax, <8 x i16> %ay, <8 x i16> %bx, <8 x i16> %by) { entry: %axx = zext <8 x i16> %ax to <8 x i32> %s1h = shufflevector <8 x i32> %axx, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3> %s1l = shufflevector <8 x i32> %axx, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7> %axs = add <4 x i32> %s1h, %s1l %ayy = zext <8 x i16> %ay to <8 x i32> %s2h = shufflevector <8 x i32> %ayy, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3> %s2l = shufflevector <8 x i32> %ayy, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7> %ays = add <4 x i32> %s2h, %s2l %az = add <4 x i32> %axs, %ays %bxx = zext <8 x i16> %bx to <8 x i32> %s3h = shufflevector <8 x i32> %bxx, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3> %s3l = shufflevector <8 x i32> %bxx, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7> %bxs = add <4 x i32> %s3h, %s3l %byy = zext <8 x i16> %by to <8 x i32> %s4h = shufflevector <8 x i32> %byy, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3> %s4l = shufflevector <8 x i32> %byy, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7> %bys = add <4 x i32> %s4h, %s4l %bz = add <4 x i32> %bxs, %bys %z = add <4 x i32> %az, %bz %z2 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %z) ret i32 %z2 }

This is translated into the following global isel IR:

bb.1.entry: liveins: $q0, $q1, $q2, $q3 %0:_(<8 x s16>) = COPY $q0 %1:_(<8 x s16>) = COPY $q1 %2:_(<8 x s16>) = COPY $q2 %3:_(<8 x s16>) = COPY $q3 %33:_(<4 x s16>), %34:_(<4 x s16>) = G_UNMERGE_VALUES %0:_(<8 x s16>) %35:_(<4 x s32>) = G_ZEXT %33:_(<4 x s16>) %36:_(<4 x s32>) = G_ZEXT %34:_(<4 x s16>) %8:_(<4 x s32>) = G_ADD %35:_, %36:_ %37:_(<4 x s16>), %38:_(<4 x s16>) = G_UNMERGE_VALUES %1:_(<8 x s16>) %39:_(<4 x s32>) = G_ZEXT %37:_(<4 x s16>) %40:_(<4 x s32>) = G_ZEXT %38:_(<4 x s16>) %12:_(<4 x s32>) = G_ADD %39:_, %40:_ %13:_(<4 x s32>) = G_ADD %8:_, %12:_ %41:_(<4 x s16>), %42:_(<4 x s16>) = G_UNMERGE_VALUES %2:_(<8 x s16>) %43:_(<4 x s32>) = G_ZEXT %41:_(<4 x s16>) %44:_(<4 x s32>) = G_ZEXT %42:_(<4 x s16>) %17:_(<4 x s32>) = G_ADD %43:_, %44:_ %45:_(<4 x s16>), %46:_(<4 x s16>) = G_UNMERGE_VALUES %3:_(<8 x s16>) %47:_(<4 x s32>) = G_ZEXT %45:_(<4 x s16>) %48:_(<4 x s32>) = G_ZEXT %46:_(<4 x s16>) %21:_(<4 x s32>) = G_ADD %47:_, %48:_ %22:_(<4 x s32>) = G_ADD %17:_, %21:_ %23:_(<4 x s32>) = G_ADD %13:_, %22:_ %24:_(s32) = G_VECREDUCE_ADD %23:_(<4 x s32>) $w0 = COPY %24:_(s32) RET_ReallyLR implicit $w0

This corresponds to the the shufflevectors perfectly and actually shows that they simplify the unmerges as well, which is a welcome bonus.

ValentijnvdBeek · 2024-08-15T11:59:28Z

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

+    if (DstNumElts <= 2)
+      return false;


Maybe do these tests in a callback? Now, if one fails, it will exit the combiner completely rather than continuing on. For now, this is fine, but with more patterns that are close, this might be problems.

ValentijnvdBeek · 2024-08-15T12:05:25Z

llvm/test/CodeGen/AArch64/arm64-neon-copy.ll

 ; CHECK-GI-NEXT:    mov v0.b[2], v3.b[0]
 ; CHECK-GI-NEXT:    mov v0.b[3], v4.b[0]
 ; CHECK-GI-NEXT:    mov v0.b[4], v5.b[0]
 ; CHECK-GI-NEXT:    mov v0.b[5], v6.b[0]
 ; CHECK-GI-NEXT:    mov v0.b[6], v7.b[0]
 ; CHECK-GI-NEXT:    mov v0.b[7], v16.b[0]
-; CHECK-GI-NEXT:    tbl v0.16b, { v0.16b, v1.16b }, v2.16b
+; CHECK-GI-NEXT:    mov v0.d[1], v1.d[0]


The input LLVM IR is:

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────define <16 x i8> @test_concat_v16i8_v8i8_v16i8(<8 x i8> %x, <16 x i8> %y) #0 { entry: %vecext = extractelement <8 x i8> %x, i32 0 %vecinit = insertelement <16 x i8> undef, i8 %vecext, i32 0 %vecext1 = extractelement <8 x i8> %x, i32 1 %vecinit2 = insertelement <16 x i8> %vecinit, i8 %vecext1, i32 1 %vecext3 = extractelement <8 x i8> %x, i32 2 %vecinit4 = insertelement <16 x i8> %vecinit2, i8 %vecext3, i32 2 %vecext5 = extractelement <8 x i8> %x, i32 3 %vecinit6 = insertelement <16 x i8> %vecinit4, i8 %vecext5, i32 3 %vecext7 = extractelement <8 x i8> %x, i32 4 %vecinit8 = insertelement <16 x i8> %vecinit6, i8 %vecext7, i32 4 %vecext9 = extractelement <8 x i8> %x, i32 5 %vecinit10 = insertelement <16 x i8> %vecinit8, i8 %vecext9, i32 5 %vecext11 = extractelement <8 x i8> %x, i32 6 %vecinit12 = insertelement <16 x i8> %vecinit10, i8 %vecext11, i32 6 %vecext13 = extractelement <8 x i8> %x, i32 7 %vecinit14 = insertelement <16 x i8> %vecinit12, i8 %vecext13, i32 7 %vecinit30 = shufflevector <16 x i8> %vecinit14, <16 x i8> %y, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 2> ret <16 x i8> %vecinit30 }

Our outputted post-combiner IR is:

bb.1.entry: liveins: $d0, $q1 %0:_(<8 x s8>) = COPY $d0 %1:_(<16 x s8>) = COPY $q1 %3:_(s64) = G_CONSTANT i64 0 %2:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %3:_(s64) %7:_(s64) = G_CONSTANT i64 1 %6:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %7:_(s64) %10:_(s64) = G_CONSTANT i64 2 %9:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %10:_(s64) %13:_(s64) = G_CONSTANT i64 3 %12:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %13:_(s64) %16:_(s64) = G_CONSTANT i64 4 %15:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %16:_(s64) %19:_(s64) = G_CONSTANT i64 5 %18:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %19:_(s64) %22:_(s64) = G_CONSTANT i64 6 %21:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %22:_(s64) %25:_(s64) = G_CONSTANT i64 7 %24:_(s8) = G_EXTRACT_VECTOR_ELT %0:_(<8 x s8>), %25:_(s64) %34:_(<8 x s8>) = G_BUILD_VECTOR %2:_(s8), %6:_(s8), %9:_(s8), %12:_(s8), %15:_(s8), %18:_(s8), %21:_(s8), %24:_(s8) %31:_(<8 x s8>), %32:_(<8 x s8>) = G_UNMERGE_VALUES %1:_(<16 x s8>) %27:_(<16 x s8>) = G_CONCAT_VECTORS %34:_(<8 x s8>), %31:_(<8 x s8>) $q0 = COPY %27:_(<16 x s8>) RET_ReallyLR implicit $q0

The IR is alright, it correctly puts the vector together, however there is a combiner missing that takes the extractions, turns it into a shufflevector and then optimizes it out.

ValentijnvdBeek · 2024-08-15T12:06:35Z

llvm/test/CodeGen/AArch64/arm64-neon-copy.ll

+; CHECK-GI-NEXT:    // kill: def $d0 killed $d0 def $q0
 ; CHECK-GI-NEXT:    mov s2, v0.s[1]
 ; CHECK-GI-NEXT:    mov v0.s[1], v2.s[0]
-; CHECK-GI-NEXT:    ldr q2, [x8, :lo12:.LCPI135_0]
-; CHECK-GI-NEXT:    tbl v0.16b, { v0.16b, v1.16b }, v2.16b
+; CHECK-GI-NEXT:    mov v0.d[1], v1.d[0]
 ; CHECK-GI-NEXT:    ret


Same as above

ValentijnvdBeek · 2024-09-12T11:32:40Z

To explain the lateness of the reviews: ASUS managed to lose my hawkpoint laptop and have been promising that they will ship it soon (tm) for the past three weeks. I will see if I can get some progress done on my old 4 core

These generators are used to match onto shufflemask for optimizations. The idea is that each shufflemask essentially encodes a function that turns one vector into another. Generators are those functions and allow us to match shufflevectors by generating masks. Since masks are frequently very similar, this allows to define many masks in relatively few lines.

We check for iterative shift masks which corresponds to the CONCAT_VECTOR instruction.

…get size

… of vector A and B

…rs together

… of two vectors together

ValentijnvdBeek · 2024-09-23T22:37:17Z

@konstantinschwarz @gbossu friendly review bump. Hopefully the tests don't fail (I forgot where the git-clang-format hook was again) :)

Changes from the last time:

Split some changes up into different commits
Fixed a few nits (the ones recently in a nit commit as I am bettering my life)
Explained the assembly changes outside AIE2
Rebased onto current Main

It is a bit late, so I hope I haven't missed anything. Good luck today!

ValentijnvdBeek · 2024-10-22T13:23:07Z

FYI: there will be a conflict issue whenever LLVM main is merged into this branch due to CombinerHelper::tryCombineShuffleVector being split up into two parts. To incorporate it into this branch, you simply put the stream matching code in the new matchVectorMaskSequence function and it should be fine.

The relevant PR is: llvm/llvm-project#110545

@jsetoain the above PR is interesting for you since it sidesteps the need to convert into shufflevector explicitly. It will optimize sequences of inserts/extracts/buildvector automatically. You may want to cherry-pick the original version which turns into a G_SHUFFLE_VECTOR and then optimizes that rather than the current one that runs the optimizer directly.

Cheers!

konstantinschwarz · 2024-10-25T19:27:09Z

Hi @ValentijnvdBeek, I finally came back to this, sorry for the long delay.
In #224 I implemented an alternative combine for matching the G_SHUFFLE_VECTOR -> G_EXTRACT_SUBVECTOR (still G_UNMERGE_VALUES until the next upstream sync).

It uses the already available helper functions for creating contiguous shuffle masks. WDYT?

ValentijnvdBeek · 2024-10-28T22:18:14Z

Hi @ValentijnvdBeek, I finally came back to this, sorry for the long delay. In #224 I implemented an alternative combine for matching the G_SHUFFLE_VECTOR -> G_EXTRACT_SUBVECTOR (still G_UNMERGE_VALUES until the next upstream sync).

It uses the already available helper functions for creating contiguous shuffle masks. WDYT?

It is fine, I am just happy if the work doesn't totally bit rot. I like the linked code, for the most part, it takes my approach and makes it less of a tech demo (which this sort of is a bit). You do cover a part of the approach though, which admittedly the one that this implements, but there are some worries that I have about the general cost of this combiner in the future. It is pretty expensive and putting it in many different combiners does add a lot of code (and causes it to check the same values a lot). But as an approach within AIE, that is probably fine.

So as of this PR, what could be an idea is that I close it and you guys pick it over for those combiners. I will then take it back, be inspired by your approach and rewrite it with some of my crazier ideas to reduce the complexity so that I can push it upstream directly. Together with maybe #129, that way it is also out of your hair and review todo list. Is that an idea @konstantinschwarz?

ValentijnvdBeek requested a review from konstantinschwarz May 17, 2024 15:39

ValentijnvdBeek self-assigned this May 17, 2024

ValentijnvdBeek added llvm:instcombine Code that modifies the combiner llvm:core Modifies non-AIE specific code backend:aie Code that modifies AIE code labels May 17, 2024

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 10708e3 to 0417b1c Compare May 22, 2024 09:15

ValentijnvdBeek mentioned this pull request May 23, 2024

[AIE2] Rewrite G_CONCAT_VECTOR selection as table-gen #48

Merged

konstantinschwarz reviewed May 30, 2024

View reviewed changes

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp Outdated Show resolved Hide resolved

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 0417b1c to 429acc5 Compare May 31, 2024 09:59

ValentijnvdBeek force-pushed the vvandebe.shufflevector.basic branch from 2f752d1 to 3e692b4 Compare June 11, 2024 09:13

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 429acc5 to 72a0a03 Compare June 11, 2024 09:47

ValentijnvdBeek changed the base branch from vvandebe.shufflevector.basic to aie-public June 11, 2024 09:49

ValentijnvdBeek changed the base branch from aie-public to vvandebe.shufflevector.basic June 11, 2024 09:54

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 72a0a03 to 228e499 Compare June 14, 2024 12:31

ValentijnvdBeek marked this pull request as ready for review June 14, 2024 12:34

ValentijnvdBeek requested review from abhinay-anubola, abnikant, andcarminati, gbossu, khallouh, martien-de-jong, SagarMaheshwari99 and stephenneuendorffer as code owners June 14, 2024 12:34

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 228e499 to 0e664aa Compare June 14, 2024 12:56

ValentijnvdBeek mentioned this pull request Jun 15, 2024

[AIE2] Combiners for 8x8->8x8 and 8x4->4x8 matrix transposes #76

Open

konstantinschwarz requested changes Jun 21, 2024

View reviewed changes

ValentijnvdBeek force-pushed the vvandebe.shufflevector.basic branch from 3e692b4 to 79d7e96 Compare June 24, 2024 10:08

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch 2 times, most recently from f872cf4 to 73a92c2 Compare June 25, 2024 09:36

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from ea44d18 to 4c022df Compare August 7, 2024 19:04

konstantinschwarz reviewed Aug 8, 2024

View reviewed changes

llvm/lib/Target/AIE/AIECombine.td Outdated Show resolved Hide resolved

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp Outdated Show resolved Hide resolved

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp Outdated Show resolved Hide resolved

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch 2 times, most recently from b49d34c to f855c29 Compare August 12, 2024 12:27

konstantinschwarz reviewed Aug 13, 2024

View reviewed changes

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp Show resolved Hide resolved

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp Show resolved Hide resolved

llvm/test/CodeGen/AIE/aie2/GlobalISel/prelegalizercombiner-shufflevector.mir Outdated Show resolved Hide resolved

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from f855c29 to 4d6af83 Compare August 13, 2024 14:46

gbossu reviewed Aug 14, 2024

View reviewed changes

llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h Show resolved Hide resolved

gbossu reviewed Aug 14, 2024

View reviewed changes

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp Show resolved Hide resolved

gbossu reviewed Aug 14, 2024

View reviewed changes

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp Outdated Show resolved Hide resolved

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 4d6af83 to 13cc82a Compare August 15, 2024 08:52

ValentijnvdBeek commented Aug 15, 2024

View reviewed changes

llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h Show resolved Hide resolved

ValentijnvdBeek commented Aug 15, 2024

View reviewed changes

ValentijnvdBeek and others added 2 commits September 23, 2024 23:53

[AIE2] Enable G_CONCAT_VECTOR optimizations for AIE2

5a53fd6

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from 13cc82a to c38562a Compare September 23, 2024 22:30

ValentijnvdBeek and others added 7 commits September 24, 2024 00:34

[GISel][CombinerHelper] Counter stream that counts from 0 to N

0cd7ef6

We check for iterative shift masks which corresponds to the CONCAT_VECTOR instruction.

[GISel][CombinerHelper] Use a stream to check for G_CONCAT_VECTOR

b5e22cc

We check for iterative shift masks which corresponds to the CONCAT_VECTOR instruction.

[NIT] Remove some useless newlines and add a newline for the test

d3f07f5

[GISel][CombinerHelper] Add a helper that unmerges a vector to a tar…

8d757e7

…get size

[GISel][CombinerHelper] Add matcher code for unmerging the first half…

e4b0f01

… of vector A and B

[GISel][CombinerHelper] Add a function that chains a list of generato…

38729a7

…rs together

[GISel][CombinerHelper] Add a combiner to concatenate the first halfs…

ebe6489

… of two vectors together

ValentijnvdBeek force-pushed the vvandebe.shufflevector.pattern.optimization branch from c38562a to ebe6489 Compare September 23, 2024 22:36

ValentijnvdBeek mentioned this pull request Oct 28, 2024

[GlobalISel] Match G_SHUFFLE_VECTORs representing sub-vector extracts #224

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LLVM] Optimize G_SHUFFLE_VECTOR into more efficient generic opcodes #41

[LLVM] Optimize G_SHUFFLE_VECTOR into more efficient generic opcodes #41

ValentijnvdBeek commented May 17, 2024 •

edited

Loading

konstantinschwarz Jun 21, 2024

konstantinschwarz Jun 21, 2024

github-actions bot commented Jun 25, 2024 •

edited

Loading

konstantinschwarz left a comment

gbossu Aug 14, 2024

ValentijnvdBeek Aug 15, 2024

ValentijnvdBeek Aug 15, 2024

ValentijnvdBeek Aug 15, 2024

ValentijnvdBeek Aug 15, 2024

ValentijnvdBeek Aug 15, 2024

ValentijnvdBeek Aug 15, 2024

ValentijnvdBeek Aug 15, 2024

ValentijnvdBeek Aug 15, 2024

ValentijnvdBeek commented Sep 12, 2024

ValentijnvdBeek commented Sep 23, 2024

ValentijnvdBeek commented Oct 22, 2024

konstantinschwarz commented Oct 25, 2024

ValentijnvdBeek commented Oct 28, 2024

[LLVM] Optimize G_SHUFFLE_VECTOR into more efficient generic opcodes #41

Are you sure you want to change the base?

[LLVM] Optimize G_SHUFFLE_VECTOR into more efficient generic opcodes #41

Conversation

ValentijnvdBeek commented May 17, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Jun 25, 2024 • edited Loading

konstantinschwarz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ValentijnvdBeek commented Sep 12, 2024

ValentijnvdBeek commented Sep 23, 2024

ValentijnvdBeek commented Oct 22, 2024

konstantinschwarz commented Oct 25, 2024

ValentijnvdBeek commented Oct 28, 2024

ValentijnvdBeek commented May 17, 2024 •

edited

Loading

github-actions bot commented Jun 25, 2024 •

edited

Loading