[LV] Ignore some costs when loop gets fully unrolled #106699
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: Igor Kirillov (igogo-x86)

Changes: When the fixed-width VF equals the number of iterations, the comparison instruction and the induction operation will be DCEed later. Ignoring the costs of these instructions improves the cost model. I have one benchmark that improves significantly when vectorised with a fixed VF instead of a scalable one on AArch64, due to full unrolling. But I am not sure how to ignore the costs properly. This patch, as it is, makes the new and legacy cost models go out of sync, and the compiler fails with an assertion error on this line:

Identifying variables to ignore in LoopVectorizationCostModel::getInstructionCost is much more challenging. Is the approach shown in the patch OK, and can I wait for the legacy cost model to be removed? If not, how can it be done differently?

Full diff: https://github.com/llvm/llvm-project/pull/106699.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6babfd1eee9108..7dc9a60cb0f034 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7112,6 +7112,26 @@ LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF,
continue;
IVInsts.push_back(CI);
}
+
+ // If the given VF loop gets fully unrolled, ignore the costs of comparison
+ // and increment instruction, as they'll get simplified away
+ auto TC = CM.PSE.getSE()->getSmallConstantTripCount(OrigLoop);
+ auto *Cmp = OrigLoop->getLatchCmpInst();
+ if (Cmp && VF.isFixed() && VF.getFixedValue() == TC) {
+ CostCtx.SkipCostComputation.insert(Cmp);
+ for (Instruction *IVInst : IVInsts) {
+ bool IsSimplifiedAway = true;
+ for (auto *UIV : IVInst->users()) {
+ if (!Legal->isInductionVariable(UIV) && UIV != Cmp) {
+ IsSimplifiedAway = false;
+ break;
+ }
+ }
+ if (IsSimplifiedAway)
+ CostCtx.SkipCostComputation.insert(IVInst);
+ }
+ }
+
for (Instruction *IVInst : IVInsts) {
if (CostCtx.skipCostComputation(IVInst, VF.isVector()))
continue;
Ping
IIUC this should catch cases where the vector loop only executes a single iteration, right? Ideally we would first remove the redundant compare-and-branch and induction recipes, and then rely on the VPlan-based cost model to consider this difference. For the former, I just shared #108378 to do that. For the latter, I think we first need to change the cost of the exit condition to be computed by the VPlan-based cost model, instead of pre-computing it. As you mentioned, this is leading to divergences between the legacy and VPlan-based cost models. Ideally we would first migrate computing costs for all recipes to the VPlan-based cost model, ensuring we don't regress vs the legacy cost model, which is still in the works.
(force-pushed from 9a5cbbc to 5bff837)
@fhahn I updated the patch to synchronise both models. Also, ping on my previous questions.
I think the idea is good, i.e. teach the vectoriser about exit costs when the branch and compare will be removed. However, I think you've assumed that the induction variables have no other uses, and so have almost certainly underestimated the costs in some loops. Also, I noticed that this patch doesn't remove the branch cost either. It's possible that removing the branch and comparison costs is enough to get what you want? It might be easier than analysing the users of all the incremented induction variables.
SmallPtrSet<const Value *, 16> ValuesToIgnoreForVF;
auto TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
auto *Cmp = TheLoop->getLatchCmpInst();
if (Cmp && TC == VF.getKnownMinValue()) {
At the moment you're also including scalable VFs here and I think the only way we can remove the compare, branch and increment for a scalable VF is if:
- We are not tail-folding. Have you checked that we still remove the branch and compare for scalable VFs?
- We are tail-folding. In this case you can actually check if TC <= VF.getKnownMinValue(). If you still want to enable this optimisation for tail-folding too then it makes sense to try <= instead for fixed-width and scalable VFs and see if we also remove the branch and compare.
It might be worth pulling some of the common code in both expectedCost and precomputeCosts out into a static function.
It turns out scalable VF isn't that simple. Branch, next-iteration induction, and comparison are simplified, but only with tail-folding. So I assume we have to keep only fixed-width VF, as I don't see how I can check the type of predication here. Branch cost is zero, so I don't do anything about it.
(force-pushed from 5bff837 to a9b9cad)
Thanks for addressing previous comments @igogo-x86! I have a few more comments - I'm most worried about the changes to the RISCV tail-folding tests.
entry:
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body
Can you rename the exit block here to something more readable? For example, for.cond.cleanup -> exit, and it would be nice to move this after the loop, i.e.
entry:
br label %for.body
for.body:
...
exit:
...
Also, I think in the exit block you should be able to write:
exit:
ret i64 %add
Also, it would be good to pre-commit the test or move it into its own commit within this branch. That way we can see the effect your code changes have on the test.
I'll pre-commit them once I'm more sure that the patch will go through :)
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[SRC:%.*]], i64 [[TMP7]]
; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, ptr [[TMP8]], i32 0
; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i8> @llvm.masked.load.nxv4i8.p0(ptr [[TMP9]], i32 1, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i8> poison)
; CHECK-NEXT: [[TMP10:%.*]] = shl <vscale x 4 x i8> [[WIDE_MASKED_LOAD]], shufflevector (<vscale x 4 x i8> insertelement (<vscale x 4 x i8> poison, i8 1, i64 0), <vscale x 4 x i8> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i8> @llvm.masked.load.v8i8.p0(ptr [[TMP9]], i32 1, <8 x i1> [[ACTIVE_LANE_MASK]], <8 x i8> poison)
I wouldn't have expected this test to be changed by your patch. I think your code changes need an extra check that we're not tail-folding.
Yep, that's also bothering me. But where I'm adding this code, I can't see how to understand if scalable vectorisation is available and whether it would be tail-folded or not.
As far as I understand, optimizeForVFAndUF runs after the cost model is calculated and taken into account to choose the best vectorisation factor:

VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);

Is optimizeForVFAndUF eventually going to be run for all VFs before the cost-model decision? If so, since a VPlan now covers all possible VFs, will it generate new VPlans if optimizeForVFAndUF changes something?
Eventually yes; there should be no need to special-case this. VPlans are valid for a range, and that range could be split. I guess for now doing it as a workaround in precomputeCosts seems OK, even though it is not really intended for that (ideally we only remove stuff from there at this point). Could you add a comment to make sure this is not seen as encouragement to add more code that's not directly related to the transition to the VPlan-based cost model?
@@ -7224,6 +7259,14 @@ LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF,
     continue;
   IVInsts.push_back(CI);
 }

 // If with the given VF loop gets fully unrolled, ignore the costs of
Suggested change:
-// If with the given VF loop gets fully unrolled, ignore the costs of
+// If the vector loop gets executed exactly once with the given VF ....
if (!Cmp)
  return;
Why the early return here? IVs could still be simplified.
Would be good to have a test case for this
See the test_two_ivs test.
for (const auto &[IV, IndDesc] : IL) {
  // Get next iteration value of the induction variable
  Instruction *IVInst =
Can't we completely ignore any IV and update instruction, if the start is a constant? All uses of the update will be replaced by a constant for fixed VFs
Which instructions could that be? We have only PHINodes in the IV chain unprocessed, but they already have zero cost
// Check that this value used only to exit the loop
for (auto *UIV : IVInst->users()) {
  if (UIV != IV && UIV != Cmp) {
    IsSimplifiedAway = false;
would be good to have a test case for this
See the test_external_iv_user test.
(force-pushed from a9b9cad to fbb939a)
(force-pushed from fbb939a to 166f376)
LGTM, but can you address the nit for the tests before landing? Also, I think it would be good to change the commit message to reflect what you've done in the patch.
; This test shows that comparison and next iteration IV have zero cost if the
; vector loop gets executed exactly once with the given VF.
define i64 @test(ptr %a, ptr %b) #0 {
; CHECK: LV: Checking a loop in 'test'
For the first CHECK line in every function I think it's worth changing to CHECK-LABEL, since this divides the CHECK lines into blocks per function. It also makes error reporting more accurate when something does go wrong.
// away.
SmallPtrSet<Instruction *, 2> ValuesToIgnoreForVF;
auto TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
if (VF.isFixed() && TC == VF.getFixedValue() && !foldTailByMasking())
If TC == VF, no tail should remain, so we shouldn't try to fold the tail by masking. Could you make this an assert instead?
Of course you're right! There is code in computeMaxVF that disables tail-folding. In that case an assert does make sense.
@fhahn Well, if I make it an assert, these tests fail:
LLVM :: Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
LLVM :: Transforms/LoopVectorize/RISCV/low-trip-count.ll
LLVM :: Transforms/LoopVectorize/RISCV/short-trip-count.ll
LLVM :: Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-cost.ll
Without the assert, two RISCV tests show different output (that's why I added this check in the first place).
This is because in computeMaxVF the variable MaxPowerOf2RuntimeVF is bigger than the trip count, and so the remainder of MaxPowerOf2RuntimeVF/TC is non-zero. Basically, we only disable tail-folding if we can prove that for every single possible VF there will not be a tail. So I believe @igogo-x86 is right and we need an explicit check.
For example, suppose TC=8 and the max VF=16. In this case we could choose a VF of 16 and there will be a remainder, but the cost model may prefer VF=8 where there isn't a remainder.
Thanks for checking, that makes sense.
(force-pushed from 166f376 to b494e64)
SmallPtrSet<Instruction *, 2> ValuesToIgnoreForVF;
auto TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
if (VF.isFixed() && TC == VF.getFixedValue()) {
  assert(!foldTailByMasking());
Was this intentionally added in the latest version? IIUC this is causing crashes as per the earlier comment thread (which seems to have been dropped by Github here)
Oops, no. I was checking the behaviour and forgot to undo the changes
(force-pushed from b494e64 to 86b9f1e)
I was just about to accept this @igogo-x86, but I suddenly spotted something. Sorry I didn't see this until now, although it's easily fixed. :(
Loop *L, const LoopVectorizationLegality::InductionList &IL,
    SmallPtrSetImpl<Instruction *> &InstsToIgnore) {
  auto *Cmp = L->getLatchCmpInst();
  if (Cmp)
Hmm, I just thought of something. What if there is an additional use of Cmp outside the loop? Should we also be checking that there is only a single use?
Since it is the only latch instruction, its value would be simplified to false when the control flow leaves the loop. So we shouldn't worry about that.
You may be right, but have you tried this out with real IR? There would be more than one use of the cmp so I wasn't sure if InstCombine would apply the fold in this case. In any case, adding an extra one use check wouldn't affect the loops you care about I think?
Do you mean getLatchCmpInst ensures there is only a single user? Would be good to make sure we have a test (and maybe assert that there's a single user).
No, it doesn't check the number of users; it checks that there's only one latch in the loop. The latch is what makes a loop a loop. If there is a single latch and we don't jump to the loop header, then we are exiting the loop, and the value of the condition is constant.
See, it simplifies this condition away outside the loop (pre-LV):
https://godbolt.org/z/s9q1558zY
And if the compiler couldn't do that, we should teach it to rather than add unnecessary checks.
I was actually thinking about something else @igogo-x86:
define i1 @foo(ptr nocapture noundef %dst, ptr nocapture noundef readonly %src) {
entry:
br label %for.body
for.body:
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%arrayidx = getelementptr inbounds nuw i32, ptr %src, i64 %indvars.iv
%0 = load i32, ptr %arrayidx, align 4
%arrayidx2 = getelementptr inbounds nuw i32, ptr %dst, i64 %indvars.iv
%1 = load i32, ptr %arrayidx2, align 4
%add = add nsw i32 %1, %0
store i32 %add, ptr %arrayidx2, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, 16
br i1 %exitcond.not, label %exit, label %for.body
exit:
ret i1 %exitcond.not
}
This is a case where in theory the final comparison is used after the loop. That's what I was hoping you could try out. Anyway, I ran this with the following command:
opt -mcpu=neoverse-v1 -p loop-vectorize,loop-unroll -force-vector-width=4 -force-vector-interleave=1 -S -debug-only=loop-vectorize < foo.ll
and it looks like we do indeed unroll the loop and the comparison disappears. Similarly, we unroll this and the comparison disappears:
define i1 @foo(ptr nocapture noundef %dst, ptr nocapture noundef %p, ptr nocapture noundef readonly %src) {
entry:
br label %for.body
for.body:
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%arrayidx = getelementptr inbounds nuw i32, ptr %src, i64 %indvars.iv
%0 = load i32, ptr %arrayidx, align 4
%arrayidx2 = getelementptr inbounds nuw i32, ptr %dst, i64 %indvars.iv
%1 = load i32, ptr %arrayidx2, align 4
%add = add nsw i32 %1, %0
store i32 %add, ptr %arrayidx2, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, 16
%arrayidx3 = getelementptr inbounds nuw i1, ptr %p, i64 %indvars.iv
store i1 %exitcond.not, ptr %arrayidx3, align 4
br i1 %exitcond.not, label %exit, label %for.body
exit:
ret i1 false
}
In this case I'm happy that the comparison disappears, although I don't think an extra use check would have done any harm.
Would be good to add a test case, if possible.
LGTM!
(force-pushed from 86b9f1e to a380b34)
LGTM, thanks. Would be good to add @david-arm's test case.
When the VF has a fixed width and equals the number of iterations, and we are not tail-folding by masking, the comparison instruction and induction operation will be DCEed later. Ignoring the costs of these instructions improves the cost model.
Add AArch64 test
* Fixing comments
* Adding more tests
* Remove cmp latch presence requirements
(force-pushed from a380b34 to 7417e08)
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/162/builds/12190. Here is the relevant piece of the build log for reference:
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/78/builds/11135. Here is the relevant piece of the build log for reference:
I've fixed a warning from this PR with: 9099d69
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/138/builds/7515. Here is the relevant piece of the build log for reference:
Instruction *IVInst =
    cast<Instruction>(IV->getIncomingValueForBlock(L->getLoopLatch()));
if (all_of(IVInst->users(),
           [&](const User *U) { return U == IV || U == Cmp; }))
https://lab.llvm.org/buildbot/#/builders/78/builds/11135:
/home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian/libc-x86_64-debian-dbg-runtimes-build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2699:49: error: reference to local binding 'IV' declared in enclosing function 'addFullyUnrolledInstructionsToIgnore'
[&](const User *U) { return U == IV || U == Cmp; }))
^
/home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian/libc-x86_64-debian-dbg-runtimes-build/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2694:21: note: 'IV' declared here
for (const auto &[IV, IndDesc] : IL) {
^
ah 9099d69 thanks @kazutakahirata
I've fixed the warning with: 9099d69
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/14076. Here is the relevant piece of the build log for reference:
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/14747. Here is the relevant piece of the build log for reference:
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/10362. Here is the relevant piece of the build log for reference: