[RISCV] Recurse on second operand of two operand shuffles #79197
```llvm
@@ -238,26 +238,44 @@ define <64 x half> @interleave_v32f16(<32 x half> %x, <32 x half> %y) {
define <64 x float> @interleave_v32f32(<32 x float> %x, <32 x float> %y) {
; V128-LABEL: interleave_v32f32:
; V128: # %bb.0:
; V128-NEXT: addi sp, sp, -16
```

Review comment on the `addi sp, sp, -16` line above: This was the diff I was originally concerned by, but thinking about it further, I think this is fine. All of the increases in register pressure happen in cases where we're legalizing a shuffle which is more than m8. Such shuffles aren't going to be very common in auto-vectorized code.
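For context, the size math here (assuming VLEN=128): each <32 x float> operand occupies 32 × 4 = 128 bytes, exactly one m8 register group, so a two-source shuffle producing <64 x float>, like the hypothetical source below, has a result wider than the largest register group and must be split during legalization.

```llvm
define <64 x float> @wider_than_m8(<32 x float> %x, <32 x float> %y) {
  ; A <64 x float> interleave: at VLEN=128 the result would need an m16
  ; register group, which does not exist, so legalization splits the shuffle.
  %v = shufflevector <32 x float> %x, <32 x float> %y,
         <64 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35,
                     i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39,
                     i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43,
                     i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47,
                     i32 16, i32 48, i32 17, i32 49, i32 18, i32 50, i32 19, i32 51,
                     i32 20, i32 52, i32 21, i32 53, i32 22, i32 54, i32 23, i32 55,
                     i32 24, i32 56, i32 25, i32 57, i32 26, i32 58, i32 27, i32 59,
                     i32 28, i32 60, i32 29, i32 61, i32 30, i32 62, i32 31, i32 63>
  ret <64 x float> %v
}
```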
```llvm
; V128-NEXT: .cfi_def_cfa_offset 16
; V128-NEXT: csrr a0, vlenb
; V128-NEXT: slli a0, a0, 3
; V128-NEXT: sub sp, sp, a0
; V128-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb
; V128-NEXT: vmv8r.v v0, v16
; V128-NEXT: addi a0, sp, 16
; V128-NEXT: vs8r.v v16, (a0) # Unknown-size Folded Spill
; V128-NEXT: vmv8r.v v16, v8
; V128-NEXT: vsetivli zero, 16, e32, m8, ta, ma
; V128-NEXT: vslidedown.vi v0, v8, 16
; V128-NEXT: vslidedown.vi v8, v0, 16
; V128-NEXT: vsetivli zero, 16, e32, m4, ta, ma
; V128-NEXT: vwaddu.vv v24, v0, v8
; V128-NEXT: li a0, -1
; V128-NEXT: vwmaccu.vx v24, a0, v8
; V128-NEXT: lui a1, %hi(.LCPI10_0)
; V128-NEXT: addi a1, a1, %lo(.LCPI10_0)
; V128-NEXT: li a2, 32
; V128-NEXT: vsetvli zero, a2, e32, m8, ta, mu
; V128-NEXT: vle16.v v12, (a1)
; V128-NEXT: vsetivli zero, 16, e32, m8, ta, ma
; V128-NEXT: vslidedown.vi v0, v16, 16
; V128-NEXT: vsetivli zero, 16, e32, m4, ta, ma
; V128-NEXT: vwaddu.vv v8, v0, v16
; V128-NEXT: vwmaccu.vx v8, a0, v16
; V128-NEXT: lui a1, 699051
; V128-NEXT: addi a1, a1, -1366
; V128-NEXT: li a2, 32
; V128-NEXT: vmv.s.x v0, a1
; V128-NEXT: vrgatherei16.vv v24, v16, v12, v0.t
; V128-NEXT: vsetvli zero, a2, e32, m8, ta, ma
; V128-NEXT: vmerge.vvm v24, v8, v24, v0
; V128-NEXT: vsetivli zero, 16, e32, m4, ta, ma
; V128-NEXT: vwaddu.vv v0, v8, v16
; V128-NEXT: vwmaccu.vx v0, a0, v16
; V128-NEXT: addi a1, sp, 16
; V128-NEXT: vl8r.v v8, (a1) # Unknown-size Folded Reload
; V128-NEXT: vwaddu.vv v0, v16, v8
; V128-NEXT: vwmaccu.vx v0, a0, v8
; V128-NEXT: vmv8r.v v8, v0
; V128-NEXT: vmv8r.v v16, v24
; V128-NEXT: csrr a0, vlenb
; V128-NEXT: slli a0, a0, 3
; V128-NEXT: add sp, sp, a0
; V128-NEXT: addi sp, sp, 16
; V128-NEXT: ret
;
; V512-LABEL: interleave_v32f32:
```
An aside: do you have a plan for handling cases where one shuffle is an identity op? After this patch we get a shuffle for the non-identity half followed by a vmerge to recombine the two halves, but we can do this in one slidedown.
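A minimal sketch of such a case (a hypothetical function and values): lanes 0..1 take a slid-down piece of %b, while lanes 2..3 are an identity of %a.

```llvm
define <4 x i32> @slide_plus_identity(<4 x i32> %a, <4 x i32> %b) {
  ; Lanes 0..1 come from %b (indices 6,7 = %b[2..3], a slide down by 2);
  ; lanes 2..3 come from %a unchanged (an identity permute).
  %s = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 6, i32 7, i32 2, i32 3>
  ret <4 x i32> %s
}
```

Assuming %a arrives in v8 and %b in v9, a single tail-undisturbed slidedown writes the slid lanes while leaving the identity lanes of %a in place, with no separate vmerge:

```asm
vsetivli zero, 2, e32, m1, tu, ma   # vl=2, tail undisturbed
vslidedown.vi v8, v9, 2             # v8[0..1] = v9[2..3]; v8[2..3] keep %a
```

The tu (tail-undisturbed) policy is what lets the identity half of %a survive without a merge.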
Part of the point of this patch series was to better handle identity permutes. :)

Your example looks less like a case where we fail to realize one side is an identity permute, and more like one where we fail to realize that the vmerge can be folded back into the LHS in this case.

One simple thing we could do is extend the swap logic to prefer the identity on the LHS. This feels more like hiding the real issue than a deep fix, but it wouldn't be hard to do.

Can you file a bug for this? I don't plan to look at it immediately, but it'd be good not to forget it.
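A hedged IR-level sketch of that swap (hypothetical values, not code from the patch): when the identity permute sits on the RHS, commuting the sources and remapping the mask moves it to the LHS.

```llvm
define <4 x i32> @swap_to_lhs(<4 x i32> %a, <4 x i32> %b) {
  ; In %s1 the identity is on the RHS: lanes 2..3 pass %b through unchanged.
  %s1 = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 2, i32 3, i32 6, i32 7>
  ; Equivalent shuffle with the sources swapped and each mask element
  ; commuted across the source boundary; the identity half is now the LHS.
  %s2 = shufflevector <4 x i32> %b, <4 x i32> %a, <4 x i32> <i32 6, i32 7, i32 2, i32 3>
  ret <4 x i32> %s2
}
```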
That was my thinking as well. We could do this as a generic MI transform on PseudoVMERGE in RISCVFoldMasks.cpp when the mask is emulating a VL tail truncation, i.e. a mask where the MSBs emulate taking the tail from the %true operand.

I don't think we'll generate many vmerges with this type of mask outside of this shuffle lowering, but this is one way of doing it without SelectionDAG.
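A hedged sketch of the pattern (hypothetical registers and mask; assuming eight e32 elements with v0 = 0b11110000): the set MSBs select exactly the tail of the %true operand, so the merge is equivalent to a VL-truncated, tail-undisturbed copy.

```asm
# Before: lanes 0..3 select %false (v9), lanes 4..7 select %true (v10).
vmerge.vvm v8, v9, v10, v0
# After (assuming the destination can be allocated to %true's register):
vsetivli zero, 4, e32, m1, tu, ma   # vl=4, tail undisturbed
vmv.v.v v10, v9                     # lanes 0..3 = %false; tail keeps %true
```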