[AArch64][GlobalISel] Legalization cycle with shufflevector #81244

nikic · 2024-02-09T11:05:24Z

; RUN: llc -mtriple=aarch64-- -O0 < %s
define <4 x i8> @test(<2 x i8> %arg) {
  %shuffle = shufflevector <2 x i8> %arg, <2 x i8> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
  ret <4 x i8> %shuffle
}

Results in a GlobalISel legalization cycle.

llvmbot · 2024-02-09T11:05:40Z

@llvm/issue-subscribers-backend-aarch64

Author: Nikita Popov (nikic)

```llvm
; RUN: llc -mtriple=aarch64-- -O0 < %s
define <16 x i8> @test(<2 x i8> %arg) {
  %shuffle = shufflevector <2 x i8> %arg, <2 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  ret <16 x i8> %shuffle
}
```

Results in a GlobalISel legalization cycle.

nikic · 2024-02-09T11:21:56Z

We start out with

bb.1 (%ir-block.0):
  liveins: $d0
  %1:_(<2 x s32>) = COPY $d0
  %0:_(<2 x s8>) = G_TRUNC %1:_(<2 x s32>)
  %8:_(s8) = G_IMPLICIT_DEF
  %6:_(<2 x s8>) = G_BUILD_VECTOR %8:_(s8), %8:_(s8)
  %7:_(<4 x s8>) = G_CONCAT_VECTORS %0:_(<2 x s8>), %6:_(<2 x s8>)
  %5:_(<4 x s16>) = G_ANYEXT %7:_(<4 x s8>)
  $d0 = COPY %5:_(<4 x s16>)
  RET_ReallyLR implicit $d0

Then after one iteration we get:

bb.1 (%ir-block.0):
  liveins: $d0
  %1:_(<2 x s32>) = COPY $d0
  %9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %1:_(<2 x s32>)
  %11:_(s16) = G_TRUNC %9:_(s32)
  %12:_(s16) = G_TRUNC %10:_(s32)
  %13:_(<2 x s16>) = G_BUILD_VECTOR %11:_(s16), %12:_(s16)
  %0:_(<2 x s8>) = G_TRUNC %13:_(<2 x s16>)
  %15:_(s32) = G_IMPLICIT_DEF
  %14:_(s32) = COPY %15:_(s32)
  %16:_(<2 x s32>) = G_BUILD_VECTOR %14:_(s32), %15:_(s32)
  %6:_(<2 x s8>) = G_TRUNC %16:_(<2 x s32>)
  %7:_(<4 x s8>) = G_CONCAT_VECTORS %0:_(<2 x s8>), %6:_(<2 x s8>)
  %5:_(<4 x s16>) = G_ANYEXT %7:_(<4 x s8>)
  $d0 = COPY %5:_(<4 x s16>)
  RET_ReallyLR implicit $d0

After two we get:

bb.1 (%ir-block.0):
  liveins: $d0
  %1:_(<2 x s32>) = COPY $d0
  %0:_(<2 x s8>) = G_TRUNC %1:_(<2 x s32>)
  %15:_(s32) = G_IMPLICIT_DEF
  %14:_(s32) = COPY %15:_(s32)
  %22:_(s16) = G_TRUNC %14:_(s32)
  %23:_(s16) = G_TRUNC %15:_(s32)
  %24:_(<2 x s16>) = G_BUILD_VECTOR %22:_(s16), %23:_(s16)
  %6:_(<2 x s8>) = G_TRUNC %24:_(<2 x s16>)
  %7:_(<4 x s8>) = G_CONCAT_VECTORS %0:_(<2 x s8>), %6:_(<2 x s8>)
  %5:_(<4 x s16>) = G_ANYEXT %7:_(<4 x s8>)
  $d0 = COPY %5:_(<4 x s16>)
  RET_ReallyLR implicit $d0

And then it repeats.
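
One way to see the repetition without reading the full dumps is to compare only the instruction that defines %0 in each snapshot. The sketch below is a small standalone helper, not part of LLVM; the function name `definition_of` and the regular expression are made up for illustration, and only the %0 lines are copied from the dumps above.

```python
import re

# Only the line defining %0 is copied from each dump in this comment.
START = "  %0:_(<2 x s8>) = G_TRUNC %1:_(<2 x s32>)"
AFTER_ONE = "  %0:_(<2 x s8>) = G_TRUNC %13:_(<2 x s16>)"
AFTER_TWO = "  %0:_(<2 x s8>) = G_TRUNC %1:_(<2 x s32>)"

# Matches single-result generic MIR lines such as "%0:_(<2 x s8>) = G_TRUNC ...".
DEF_RE = re.compile(r"^\s*(%\d+):_\(([^)]*)\) = (\S+) (.*)$")


def definition_of(dump, vreg):
    """Return (opcode, operands) for the line defining vreg, or None."""
    for line in dump.splitlines():
        match = DEF_RE.match(line)
        if match and match.group(1) == vreg:
            return match.group(3), match.group(4).strip()
    return None


# The definition after two iterations is identical to the starting one,
# while the intermediate snapshot differs.
print(definition_of(START, "%0") == definition_of(AFTER_TWO, "%0"))  # True
print(definition_of(START, "%0") == definition_of(AFTER_ONE, "%0"))  # False
```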

tschuett · 2024-02-09T14:04:47Z

llc -march=aarch64 -global-isel -stop-after=irtranslator foo.ll -o foo.mir:

 bb.1 (%ir-block.0):
    liveins: $d0

    %1:_(<2 x s32>) = COPY $d0
    %0:_(<2 x s8>) = G_TRUNC %1(<2 x s32>)
    %4:_(s8) = G_CONSTANT i8 0
    %3:_(<2 x s8>) = G_BUILD_VECTOR %4(s8), %4(s8)
    %2:_(<4 x s8>) = G_SHUFFLE_VECTOR %0(<2 x s8>), %3, shufflemask(0, 1, undef, undef)
    %5:_(<4 x s16>) = G_ANYEXT %2(<4 x s8>)
    $d0 = COPY %5(<4 x s16>)
    RET_ReallyLR implicit $d0

llc -march=aarch64 -run-pass=aarch64-prelegalizer-combiner foo.mir -o foo2.mir

  liveins: $d0

    %1:_(<2 x s32>) = COPY $d0
    %0:_(<2 x s8>) = G_TRUNC %1(<2 x s32>)
    %6:_(<2 x s8>) = G_IMPLICIT_DEF
    %7:_(<4 x s8>) = G_CONCAT_VECTORS %0(<2 x s8>), %6(<2 x s8>)
    %5:_(<4 x s16>) = G_ANYEXT %7(<4 x s8>)
    $d0 = COPY %5(<4 x s16>)
    RET_ReallyLR implicit $d0

tschuett · 2024-02-09T14:15:14Z

The G_CONCAT_VECTORS is illegal. There are no attempts to make illegal concat vectors legal:

llvm-project/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp

Line 956 in a9e546c

getActionDefinitionsBuilder(G_CONCAT_VECTORS)

The G_IMPLICIT_DEF is illegal.

llvm-project/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp

Line 90 in a9e546c

{G_IMPLICIT_DEF, G_FREEZE, G_CONSTANT_FOLD_BARRIER})

G_CONCAT_VECTORS and G_IMPLICIT_DEF would each need a .moreElementsToNextPow2(0) rule.
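
For reference, the arithmetic behind an element-count rule of this kind is just rounding up to the next power of two. The sketch below shows that arithmetic only; `next_pow2` is a hypothetical helper, not the LLVM implementation. Note that the vectors in this issue have 2 and 4 elements, which this rounding leaves unchanged.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("element count must be positive")
    return 1 << (n - 1).bit_length()


# Element counts from this issue (2 and 4) plus a non-power-of-two for contrast.
for count in (2, 3, 4):
    print(count, "->", next_pow2(count))
```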

tschuett · 2024-02-09T18:28:42Z

It seems to be a bigger issue. I tried .moreElementsToNextPow2(0), but it is still insufficient.

nikic · 2024-02-09T19:44:03Z

I spent way too much time today trying to figure this out, but have a really hard time wrapping my head around GlobalISel legalization.

I think the issue could be related to three things:

1. Trying to legalize trunc from <2 x s32> to <2 x s8> by splitting it into trunc <2 x s32> to <2 x s16> and then trunc <2 x s16> to <2 x s8>.
2. The ability to combine such trunc pairs back into the original trunc.
3. Scalarization of trunc into scalar trunc and build_vector, where the build_vector then gets legalized back into ... a trunc.
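
A toy model of how the first two points can undo each other: represent a chain of truncs by its element widths, so [32, 8] is a single trunc from s32 elements to s8 elements. This is only a sketch under that assumption. The function names are hypothetical, the third point is not modeled, and none of this is the actual legalizer code.

```python
def split_trunc(chain):
    """Point 1: split any trunc that narrows by more than 2x into two steps."""
    out = [chain[0]]
    for prev, cur in zip(chain, chain[1:]):
        if prev > 2 * cur:
            out.append(prev // 2)  # intermediate width, e.g. 32 -> 16 -> 8
        out.append(cur)
    return out


def combine_truncs(chain):
    """Point 2: fold consecutive truncs back into a single trunc."""
    return [chain[0], chain[-1]] if len(chain) > 2 else list(chain)


def legalize(chain, max_iterations=8):
    """Alternate split and combine; report whether the chain keeps changing."""
    for _ in range(max_iterations):
        split = split_trunc(chain)
        if split == chain:
            return "stable"  # nothing left to split
        combined = combine_truncs(split)
        if combined == chain:
            return "cycle"  # the split was undone, so no progress was made
        chain = combined
    return "gave up"


print(legalize([32, 8]))   # cycle
print(legalize([32, 16]))  # stable
```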

cc @arsenm

tschuett · 2024-02-09T19:56:39Z

%7:_(<4 x s8>) = G_CONCAT_VECTORS %0(<2 x s8>), %6(<2 x s8>)

must become

%7:_(<4 x s32>) = G_CONCAT_VECTORS %0(<2 x s32>), %6(<2 x s32>)

Note the change from s8 to s32. The legalizer has no means to perform this change.
Then

%6:_(<2 x s8>) = G_IMPLICIT_DEF

must become

%6:_(<2 x s32>) = G_IMPLICIT_DEF

which is legal.
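
To make the size difference concrete, the sketch below computes the total bit width of the vector types mentioned here. It assumes, as a simplification, that a type is a good fit when it fills a 64-bit or 128-bit vector register exactly. The helper names are hypothetical and this is not how AArch64LegalizerInfo decides legality.

```python
import re


def vector_bits(ty):
    """Total size in bits of a generic MIR vector type like '<2 x s8>'."""
    match = re.fullmatch(r"<(\d+) x s(\d+)>", ty)
    if match is None:
        raise ValueError(f"not a fixed vector type: {ty!r}")
    return int(match.group(1)) * int(match.group(2))


def fills_vector_register(ty):
    """Simplifying assumption: only exact 64-bit or 128-bit vectors qualify."""
    return vector_bits(ty) in (64, 128)


for ty in ("<2 x s8>", "<4 x s8>", "<2 x s32>", "<4 x s32>"):
    print(ty, vector_bits(ty), fills_vector_register(ty))
```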

tschuett · 2024-02-09T19:59:10Z

Maybe a .widenScalarOrEltToNextPow2(0) for G_CONCAT_VECTORS could improve the situation.

If we have something like G_TRUNC from v2s32 to v2s8, then lowering this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from v2s16 to v2s8 does not bring us any closer to legality. In fact, the first part of that is a G_BUILD_VECTOR whose legalization will produce a new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get combined to the original, causing a legalization cycle.

Make the lowering condition more precise, by requiring that the original vector is >128 bits, which I believe is the only case where this specific splitting approach is useful.

Note that this doesn't actually produce a legal result (the alwaysLegal is a lie, as before), but it will cause a proper globalisel abort instead of an infinite legalization loop.

Fixes #81244.
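
The tightened condition described in this commit message can be sketched as a simple size check on the source vector. The function below is an illustration only: `should_split_trunc` is a hypothetical name, and the actual patch expresses the condition through the legalizer rule set rather than a standalone helper.

```python
def should_split_trunc(num_elts, elt_bits):
    """Split a vector G_TRUNC only when its source is wider than 128 bits."""
    return num_elts * elt_bits > 128


# v2s32 (the source type from this issue) is 64 bits, so it is not split.
print(should_split_trunc(2, 32))
# v8s32 is 256 bits, the kind of case the splitting is meant for.
print(should_split_trunc(8, 32))
```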

nikic · 2024-02-13T08:34:09Z

/cherry-pick 070848c


llvmbot · 2024-02-13T08:39:06Z

/pull-request #81581


nikic added backend:AArch64 llvm:globalisel labels Feb 9, 2024

nikic added this to the LLVM 18.X Release milestone Feb 9, 2024

github-project-automation bot added this to LLVM Release Status Feb 9, 2024

nikic moved this to Needs Fix in LLVM Release Status Feb 9, 2024

nikic mentioned this issue Feb 9, 2024

Update to LLVM 18 rust-lang/rust#120055

Merged

nikic mentioned this issue Feb 12, 2024

[AArch64][GISel] Don't pointlessly lower G_TRUNC #81479

Merged

nikic closed this as completed in #81479 Feb 13, 2024

github-project-automation bot moved this from Needs Fix to Done in LLVM Release Status Feb 13, 2024
