-
Notifications
You must be signed in to change notification settings - Fork 12.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VPlan] Compute induction end values in VPlan. #112145
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
Model updating IV users directly in VPlan, replace fixupIVUsers. Depends on llvm#110004, llvm#109975 and llvm#112145.
3b414ba
to
90c7f53
Compare
944283e
to
8001f75
Compare
8001f75
to
087668b
Compare
Allow setting the name to use for the generated IR value of the derived IV in preparations for #112145. This is analogous to VPInstruction::Name.
087668b
to
e9ad7c2
Compare
Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on llvm#110004 and llvm#109975.
e9ad7c2
to
e58cb96
Compare
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-backend-systemz Author: Florian Hahn (fhahn) ChangesUse createDerivedIV to compute IV end values directly in VPlan, instead This allows updating IV users outside the loop as follow-up. Depends on #110004 and Patch is 203.94 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112145.diff 54 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index fbcf181a45a664..0298ab523307db 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -233,8 +233,8 @@ class VPBuilder {
VPDerivedIVRecipe *createDerivedIV(InductionDescriptor::InductionKind Kind,
FPMathOperator *FPBinOp, VPValue *Start,
- VPCanonicalIVPHIRecipe *CanonicalIV,
- VPValue *Step, const Twine &Name = "") {
+ VPValue *CanonicalIV, VPValue *Step,
+ const Twine &Name = "") {
return tryInsertInstruction(
new VPDerivedIVRecipe(Kind, FPBinOp, Start, CanonicalIV, Step, Name));
}
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 4a4553e4a8db8d..1ee596502f9d44 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -513,21 +513,14 @@ class InnerLoopVectorizer {
/// Fix the non-induction PHIs in \p Plan.
void fixNonInductionPHIs(VPTransformState &State);
- /// Create a ResumePHI VPInstruction for the induction \p InductionPhiIRI to
- /// resume iteration count in the scalar epilogue from where the vectorized
- /// loop left off, and add it to the scalar preheader of VPlan. Also creates
- /// the induction resume value, and the value for the bypass block, if needed.
- /// \p Step is the SCEV-expanded induction step to use. In cases where the
- /// loop skeleton is more complicated (i.e., epilogue vectorization) and the
- /// resume values can come from an additional bypass block,
- /// \p MainVectorTripCount provides the trip count of the main vector loop,
- /// used to compute the resume value reaching the scalar loop preheader
- /// directly from this additional bypass block.
- void createInductionResumeVPValue(VPIRInstruction *InductionPhiIRI,
- const InductionDescriptor &ID, Value *Step,
- ArrayRef<BasicBlock *> BypassBlocks,
- VPBuilder &ScalarPHBuilder,
- Value *MainVectorTripCount = nullptr);
+ /// Create the bypass resume value coming from the additional bypass block. \p
+ /// Step is the SCEV-expanded induction step to use. \p MainVectorTripCount
+ /// provides the trip count of the main vector loop, used to compute the
+ /// resume value reaching the scalar loop preheader directly from this
+ /// additional bypass block.
+ void createInductionBypassValue(PHINode *OrigPhi,
+ const InductionDescriptor &ID, Value *Step,
+ Value *MainVectorTripCount);
/// Returns the original loop trip count.
Value *getTripCount() const { return TripCount; }
@@ -584,17 +577,10 @@ class InnerLoopVectorizer {
/// vector loop preheader, middle block and scalar preheader.
void createVectorLoopSkeleton(StringRef Prefix);
- /// Create new phi nodes for the induction variables to resume iteration count
- /// in the scalar epilogue, from where the vectorized loop left off.
- /// In cases where the loop skeleton is more complicated (i.e. epilogue
- /// vectorization), \p MainVectorTripCount provides the trip count of the main
- /// loop, used to compute these resume values. If \p IVSubset is provided, it
- /// contains the phi nodes for which resume values are needed, because they
- /// will generate wide induction phis in the epilogue loop.
- void
- createInductionResumeVPValues(const SCEV2ValueTy &ExpandedSCEVs,
- Value *MainVectorTripCount = nullptr,
- SmallPtrSetImpl<PHINode *> *IVSubset = nullptr);
+ /// Create values for the induction variables to resume iteration count
+ /// in the bypass block.
+ void createInductionBypassValues(const SCEV2ValueTy &ExpandedSCEVs,
+ Value *MainVectorTripCount);
/// Allow subclasses to override and print debug traces before/after vplan
/// execution, when trace information is requested.
@@ -2613,21 +2599,11 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
nullptr, Twine(Prefix) + "scalar.ph");
}
-void InnerLoopVectorizer::createInductionResumeVPValue(
- VPIRInstruction *InductionPhiRI, const InductionDescriptor &II, Value *Step,
- ArrayRef<BasicBlock *> BypassBlocks, VPBuilder &ScalarPHBuilder,
+void InnerLoopVectorizer::createInductionBypassValue(
+ PHINode *OrigPhi, const InductionDescriptor &II, Value *Step,
Value *MainVectorTripCount) {
- // TODO: Move to LVP or general VPlan construction, once no IR values are
- // generated.
- auto *OrigPhi = cast<PHINode>(&InductionPhiRI->getInstruction());
- Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
- assert(VectorTripCount && "Expected valid arguments");
-
Instruction *OldInduction = Legal->getPrimaryInduction();
- // For the primary induction the end values are known.
- Value *EndValue = VectorTripCount;
Value *EndValueFromAdditionalBypass = MainVectorTripCount;
- // Otherwise compute them accordingly.
if (OrigPhi != OldInduction) {
IRBuilder<> B(LoopVectorPreHeader->getTerminator());
@@ -2635,10 +2611,6 @@ void InnerLoopVectorizer::createInductionResumeVPValue(
if (isa_and_nonnull<FPMathOperator>(II.getInductionBinOp()))
B.setFastMathFlags(II.getInductionBinOp()->getFastMathFlags());
- EndValue = emitTransformedIndex(B, VectorTripCount, II.getStartValue(),
- Step, II.getKind(), II.getInductionBinOp());
- EndValue->setName("ind.end");
-
// Compute the end value for the additional bypass (if applicable).
if (MainVectorTripCount) {
B.SetInsertPoint(getAdditionalBypassBlock(),
@@ -2650,22 +2622,12 @@ void InnerLoopVectorizer::createInductionResumeVPValue(
}
}
- auto *ResumePhiRecipe = ScalarPHBuilder.createNaryOp(
- VPInstruction::ResumePhi,
- {Plan.getOrAddLiveIn(EndValue), Plan.getOrAddLiveIn(II.getStartValue())},
- OrigPhi->getDebugLoc(), "bc.resume.val");
- assert(InductionPhiRI->getNumOperands() == 0 &&
- "InductionPhiRI should not have any operands");
- InductionPhiRI->addOperand(ResumePhiRecipe);
-
- if (EndValueFromAdditionalBypass) {
- // Store the bypass value here, as it needs to be added as operand to its
- // scalar preheader phi node after the epilogue skeleton has been created.
- // TODO: Directly add as extra operand to the VPResumePHI recipe.
- assert(!Induction2AdditionalBypassValue.contains(OrigPhi) &&
- "entry for OrigPhi already exits");
- Induction2AdditionalBypassValue[OrigPhi] = EndValueFromAdditionalBypass;
- }
+ // Store the bypass value here, as it needs to be added as operand to its
+ // scalar preheader phi node after the epilogue skeleton has been created.
+ // TODO: Directly add as extra operand to the VPResumePHI recipe.
+ assert(!Induction2AdditionalBypassValue.contains(OrigPhi) &&
+ "entry for OrigPhi already exits");
+ Induction2AdditionalBypassValue[OrigPhi] = EndValueFromAdditionalBypass;
}
/// Return the expanded step for \p ID using \p ExpandedSCEVs to look up SCEV
@@ -2682,29 +2644,15 @@ static Value *getExpandedStep(const InductionDescriptor &ID,
return I->second;
}
-void InnerLoopVectorizer::createInductionResumeVPValues(
- const SCEV2ValueTy &ExpandedSCEVs, Value *MainVectorTripCount,
- SmallPtrSetImpl<PHINode *> *IVSubset) {
- // We are going to resume the execution of the scalar loop.
- // Go over all of the induction variable PHIs of the scalar loop header and
- // fix their starting values, which depend on the counter of the last
- // iteration of the vectorized loop. If we come from one of the
- // LoopBypassBlocks then we need to start from the original start value.
- // Otherwise we provide the trip count from the main vector loop.
- VPBasicBlock *ScalarPHVPBB = Plan.getScalarPreheader();
- VPBuilder ScalarPHBuilder(ScalarPHVPBB, ScalarPHVPBB->begin());
- for (VPRecipeBase &R : *Plan.getScalarHeader()) {
- auto *PhiR = cast<VPIRInstruction>(&R);
- auto *Phi = dyn_cast<PHINode>(&PhiR->getInstruction());
- if (!Phi)
- break;
- if (!Legal->getInductionVars().contains(Phi) ||
- (IVSubset && !IVSubset->contains(Phi)))
- continue;
- const InductionDescriptor &II = Legal->getInductionVars().find(Phi)->second;
- createInductionResumeVPValue(PhiR, II, getExpandedStep(II, ExpandedSCEVs),
- LoopBypassBlocks, ScalarPHBuilder,
- MainVectorTripCount);
+void InnerLoopVectorizer::createInductionBypassValues(
+ const SCEV2ValueTy &ExpandedSCEVs, Value *MainVectorTripCount) {
+ assert(MainVectorTripCount && "Must have bypass information");
+
+ for (const auto &InductionEntry : Legal->getInductionVars()) {
+ PHINode *OrigPhi = InductionEntry.first;
+ const InductionDescriptor &II = InductionEntry.second;
+ createInductionBypassValue(OrigPhi, II, getExpandedStep(II, ExpandedSCEVs),
+ MainVectorTripCount);
}
}
@@ -2766,8 +2714,8 @@ InnerLoopVectorizer::createVectorizedLoopSkeleton(
// faster.
emitMemRuntimeChecks(LoopScalarPreHeader);
- // Emit phis for the new starting index of the scalar loop.
- createInductionResumeVPValues(ExpandedSCEVs);
+ Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
+ assert(VectorTripCount && "Expected valid arguments");
return {LoopVectorPreHeader, nullptr};
}
@@ -7848,19 +7796,6 @@ EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton(
// Generate the induction variable.
EPI.VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
- // Generate VPValues and ResumePhi recipes for wide inductions in the epilogue
- // plan only. Other inductions only need a resume value for the canonical
- // induction, which will get created during epilogue skeleton construction.
- SmallPtrSet<PHINode *, 4> WideIVs;
- for (VPRecipeBase &H :
- EPI.EpiloguePlan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
- if (auto *WideIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&H))
- WideIVs.insert(WideIV->getPHINode());
- else if (auto *PtrIV = dyn_cast<VPWidenPointerInductionRecipe>(&H))
- WideIVs.insert(cast<PHINode>(PtrIV->getUnderlyingValue()));
- }
- createInductionResumeVPValues(ExpandedSCEVs, nullptr, &WideIVs);
-
return {LoopVectorPreHeader, nullptr};
}
@@ -8049,13 +7984,14 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
EPResumeVal->addIncoming(Init, EPI.MainLoopIterationCountCheck);
}
- // Generate induction resume values. These variables save the new starting
- // indexes for the scalar loop. They are used to test if there are any tail
- // iterations left once the vector loop has completed.
- // Note that when the vectorized epilogue is skipped due to iteration count
- // check, then the resume value for the induction variable comes from
- // the trip count of the main vector loop, passed as the second argument.
- createInductionResumeVPValues(ExpandedSCEVs, EPI.VectorTripCount);
+ Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
+ assert(VectorTripCount && "Expected valid arguments");
+
+ // Generate bypass values from the bypass blocks. Note that when the
+ // vectorized epilogue is skipped due to iteration count check, then the
+ // resume value for the induction variable comes from the trip count of the
+ // main vector loop, passed as the second argument.
+ createInductionBypassValues(ExpandedSCEVs, EPI.VectorTripCount);
return {LoopVectorPreHeader, EPResumeVal};
}
@@ -8858,13 +8794,64 @@ static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW,
{CanonicalIVIncrement, &Plan.getVectorTripCount()}, DL);
}
+static VPValue *addResumeValuesForInduction(VPHeaderPHIRecipe *PhiR,
+ VPBuilder &Builder,
+ VPBuilder &ScalarPHBuilder,
+ VPTypeAnalysis &TypeInfo,
+ VPValue *VectorTC) {
+ PHINode *OrigPhi;
+ const InductionDescriptor *ID;
+ VPValue *Start;
+ VPValue *Step;
+ Type *ScalarTy;
+ bool IsCanonical = false;
+ if (auto *WideIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(PhiR)) {
+ if (WideIV->getTruncInst())
+ return nullptr;
+ OrigPhi = cast<PHINode>(WideIV->getUnderlyingValue());
+ ID = &WideIV->getInductionDescriptor();
+ Start = WideIV->getStartValue();
+ Step = WideIV->getStepValue();
+ ScalarTy = WideIV->getScalarType();
+ IsCanonical = WideIV->isCanonical();
+ } else if (auto *WideIV = dyn_cast<VPWidenPointerInductionRecipe>(PhiR)) {
+ OrigPhi = cast<PHINode>(WideIV->getUnderlyingValue());
+ ID = &WideIV->getInductionDescriptor();
+ Start = WideIV->getStartValue();
+ Step = WideIV->getOperand(1);
+ ScalarTy = Start->getLiveInIRValue()->getType();
+ } else {
+ return nullptr;
+ }
+
+ VPValue *EndValue = VectorTC;
+ if (!IsCanonical) {
+ EndValue = Builder.createDerivedIV(
+ ID->getKind(),
+ dyn_cast_or_null<FPMathOperator>(ID->getInductionBinOp()), Start,
+ VectorTC, Step);
+ }
+
+ if (ScalarTy != TypeInfo.inferScalarType(EndValue)) {
+ EndValue = Builder.createScalarCast(Instruction::Trunc, EndValue, ScalarTy);
+ }
+
+ auto *ResumePhiRecipe =
+ ScalarPHBuilder.createNaryOp(VPInstruction::ResumePhi, {EndValue, Start},
+ OrigPhi->getDebugLoc(), "bc.resume.val");
+ return ResumePhiRecipe;
+}
+
/// Create resume phis in the scalar preheader for first-order recurrences and
/// reductions and update the VPIRInstructions wrapping the original phis in the
/// scalar header.
static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan) {
+ VPTypeAnalysis TypeInfo(Plan.getCanonicalIV()->getScalarType());
auto *ScalarPH = Plan.getScalarPreheader();
auto *MiddleVPBB = cast<VPBasicBlock>(ScalarPH->getSinglePredecessor());
VPBuilder ScalarPHBuilder(ScalarPH);
+ VPBuilder VectorPHBuilder(
+ cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor()));
VPBuilder MiddleBuilder(MiddleVPBB, MiddleVPBB->getFirstNonPhi());
VPValue *OneVPV = Plan.getOrAddLiveIn(
ConstantInt::get(Plan.getCanonicalIV()->getScalarType(), 1));
@@ -8874,6 +8861,13 @@ static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan) {
if (!ScalarPhiI)
break;
auto *VectorPhiR = cast<VPHeaderPHIRecipe>(Builder.getRecipe(ScalarPhiI));
+
+ if (VPValue *ResumePhi = addResumeValuesForInduction(
+ VectorPhiR, VectorPHBuilder, ScalarPHBuilder, TypeInfo,
+ &Plan.getVectorTripCount())) {
+ ScalarPhiIRI->addOperand(ResumePhi);
+ continue;
+ }
if (!isa<VPFirstOrderRecurrencePHIRecipe, VPReductionPHIRecipe>(VectorPhiR))
continue;
// The backedge value provides the value to resume coming out of a loop,
@@ -9635,7 +9629,6 @@ void VPDerivedIVRecipe::execute(VPTransformState &State) {
State.Builder, CanonicalIV, getStartValue()->getLiveInIRValue(), Step,
Kind, cast_if_present<BinaryOperator>(FPBinOp));
DerivedIV->setName(Name);
- assert(DerivedIV != CanonicalIV && "IV didn't need transforming?");
State.set(this, DerivedIV, VPLane(0));
}
@@ -10299,6 +10292,39 @@ bool LoopVectorizePass::processLoop(Loop *L) {
EPI, &LVL, &CM, BFI, PSI, Checks,
*BestMainPlan);
+ // Collect PHI nodes of wide inductions in the VPlan for the epilogue.
+ // Those will need their resume-values computed from the main vector
+ // loop. Others can be removed in the main VPlan.
+ SmallPtrSet<PHINode *, 2> WidenedPhis;
+ for (VPRecipeBase &R :
+ BestEpiPlan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
+ if (!isa<VPWidenIntOrFpInductionRecipe,
+ VPWidenPointerInductionRecipe>(&R))
+ continue;
+ if (isa<VPWidenIntOrFpInductionRecipe>(&R))
+ WidenedPhis.insert(
+ cast<VPWidenIntOrFpInductionRecipe>(&R)->getPHINode());
+ else
+ WidenedPhis.insert(
+ cast<PHINode>(R.getVPSingleValue()->getUnderlyingValue()));
+ }
+ for (VPRecipeBase &R :
+ *cast<VPIRBasicBlock>(BestMainPlan->getScalarHeader())) {
+ auto *VPIRInst = cast<VPIRInstruction>(&R);
+ auto *IRI = dyn_cast<PHINode>(&VPIRInst->getInstruction());
+ if (!IRI)
+ break;
+ if (WidenedPhis.contains(IRI) ||
+ !LVL.getInductionVars().contains(IRI))
+ continue;
+ VPRecipeBase *ResumePhi =
+ VPIRInst->getOperand(0)->getDefiningRecipe();
+ VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn(
+ Constant::getNullValue(IRI->getType())));
+ ResumePhi->eraseFromParent();
+ }
+ VPlanTransforms::removeDeadRecipes(*BestMainPlan);
+
auto ExpandedSCEVs = LVP.executePlan(EPI.MainLoopVF, EPI.MainLoopUF,
*BestMainPlan, MainILV, DT, false);
++LoopsVectorized;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index b438700a3fe2ce..9e3567d04d2332 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -64,6 +64,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
case VPInstruction::FirstOrderRecurrenceSplice:
case VPInstruction::LogicalAnd:
case VPInstruction::PtrAdd:
+ case VPInstruction::ResumePhi:
return false;
default:
return true;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
index 1948211858d446..f1191fab8350c0 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
@@ -13,9 +13,9 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 8, [[TMP4]]
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 8
+; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 8)
; CHECK-NEXT: [[TMP8:%.*]] = call <vscale x 8 x i64> @llvm.stepvector.nxv8i64()
; CHECK-NEXT: [[TMP9:%.*]] = add <vscale x 8 x i64> [[TMP8]], zeroinitializer
@@ -100,9 +100,9 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[WIDE_TRIP_COUNT]], [[TMP4]]
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 8
+; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 [[WIDE_TRIP_COUNT]])
; CHECK-NEXT: [[TMP8:%.*]] = call <vscale x 8 x i64> @llvm.stepvector.nxv8i64()
; CHECK-NEXT: [[TMP9:%.*]] = add <vscale x 8 x i64> [[TMP8]], zeroinitializer
diff...
[truncated]
|
Now that #110577 landed, this should be ready for review. Some VPlan tests still need updating, but the relevant IR tests have been updated already. It still needs some updates for outer loop vectorization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice step leveraging ResumePhi and DerivedIV recipes!
Patch should be marked NFC?
@@ -64,6 +64,7 @@ bool VPRecipeBase::mayWriteToMemory() const { | |||
case VPInstruction::FirstOrderRecurrenceSplice: | |||
case VPInstruction::LogicalAnd: | |||
case VPInstruction::PtrAdd: | |||
case VPInstruction::ResumePhi: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Independent fix, but testable only now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually had to undo this change; we have to create a ResumePhi for the canonical IV in the main plan, which won't have any users but cannot be cleaned up
Instruction *OldInduction = Legal->getPrimaryInduction(); | ||
// For the primary induction the end values are known. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment still applies, to additional end value only:
// For the primary induction the additional bypass end value is known. Otherwise it is computed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added back, thanks
/// Create values for the induction variables to resume iteration count | ||
/// in the bypass block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Create values for the induction variables to resume iteration count | |
/// in the bypass block. | |
/// Create and record the values for induction variables to resume coming from the additional bypass block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
/// provides the trip count of the main vector loop, used to compute the | ||
/// resume value reaching the scalar loop preheader directly from this | ||
/// additional bypass block. | ||
void createInductionBypassValue(PHINode *OrigPhi, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void createInductionBypassValue(PHINode *OrigPhi, | |
void createInductionAdditionalBypassValue(PHINode *OrigPhi, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
ArrayRef<BasicBlock *> BypassBlocks, | ||
VPBuilder &ScalarPHBuilder, | ||
Value *MainVectorTripCount = nullptr); | ||
/// Create the bypass resume value coming from the additional bypass block. \p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Create the bypass resume value coming from the additional bypass block. \p | |
/// Create and record the bypass resume value for an induction Phi coming from the additional bypass block. \p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
@@ -9635,7 +9629,6 @@ void VPDerivedIVRecipe::execute(VPTransformState &State) { | |||
State.Builder, CanonicalIV, getStartValue()->getLiveInIRValue(), Step, | |||
Kind, cast_if_present<BinaryOperator>(FPBinOp)); | |||
DerivedIV->setName(Name); | |||
assert(DerivedIV != CanonicalIV && "IV didn't need transforming?"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such redundant recipes should be cleaned up, if generated at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned up in the latest version, except for the case where the vector trip count is constant null, which will only be set in prepareToExecute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and when Index=VTC=0 it may result in DerivedIV==Index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
// Collect PHI nodes of wide inductions in the VPlan for the epilogue. | ||
// Those will need their resume-values computed from the main vector | ||
// loop. Others can be removed in the main VPlan. | ||
SmallPtrSet<PHINode *, 2> WidenedPhis; | ||
for (VPRecipeBase &R : | ||
BestEpiPlan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) { | ||
if (!isa<VPWidenIntOrFpInductionRecipe, | ||
VPWidenPointerInductionRecipe>(&R)) | ||
continue; | ||
if (isa<VPWidenIntOrFpInductionRecipe>(&R)) | ||
WidenedPhis.insert( | ||
cast<VPWidenIntOrFpInductionRecipe>(&R)->getPHINode()); | ||
else | ||
WidenedPhis.insert( | ||
cast<PHINode>(R.getVPSingleValue()->getUnderlyingValue())); | ||
} | ||
for (VPRecipeBase &R : | ||
*cast<VPIRBasicBlock>(BestMainPlan->getScalarHeader())) { | ||
auto *VPIRInst = cast<VPIRInstruction>(&R); | ||
auto *IRI = dyn_cast<PHINode>(&VPIRInst->getInstruction()); | ||
if (!IRI) | ||
break; | ||
if (WidenedPhis.contains(IRI) || | ||
!LVL.getInductionVars().contains(IRI)) | ||
continue; | ||
VPRecipeBase *ResumePhi = | ||
VPIRInst->getOperand(0)->getDefiningRecipe(); | ||
VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn( | ||
Constant::getNullValue(IRI->getType()))); | ||
ResumePhi->eraseFromParent(); | ||
} | ||
VPlanTransforms::removeDeadRecipes(*BestMainPlan); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be outlined and/or placed elsewhere, too detailed for processLoop() which is already too long and doing too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outlined, thanks!
@@ -10299,6 +10292,39 @@ bool LoopVectorizePass::processLoop(Loop *L) { | |||
EPI, &LVL, &CM, BFI, PSI, Checks, | |||
*BestMainPlan); | |||
|
|||
// Collect PHI nodes of wide inductions in the VPlan for the epilogue. | |||
// Those will need their resume-values computed from the main vector | |||
// loop. Others can be removed in the main VPlan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we only need resume values from the main plan, if there is a corresponding wide induction in the epilogue plan.
VPRecipeBase *ResumePhi = | ||
VPIRInst->getOperand(0)->getDefiningRecipe(); | ||
VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn( | ||
Constant::getNullValue(IRI->getType()))); | ||
ResumePhi->eraseFromParent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clean up redundant resume Phi recipes created for inductions that were not widened? VPlan's DCE cannot collect them because of cross-VPlan def-use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, unfortunately the latest version cannot use VPlan DCE, because we also need to add a ResumePHI for the canonical IV in the main vector loop which. won't have any uses in the main plan. Hence manual removal here.
VPIRInst->setOperand(0, BestMainPlan->getOrAddLiveIn( | ||
Constant::getNullValue(IRI->getType()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deserves some explanation/comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added thanks!
@@ -74,6 +74,7 @@ define void @low_vf_ic_is_better(ptr nocapture noundef %p, i32 %tc, i16 noundef | |||
; CHECK-VS1-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]] | |||
; CHECK-VS1-NEXT: [[TMP18:%.*]] = call i64 @llvm.vscale.i64() | |||
; CHECK-VS1-NEXT: [[TMP19:%.*]] = mul i64 [[TMP18]], 16 | |||
; CHECK-VS1-NEXT: [[TMP40:%.*]] = add i64 [[TMP0]], [[N_VEC]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused instruction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes unfortunately now we create the end VPValues during VPlan construction and now during the legacy epilogue skeleton code generation there's no convenient way to remove them for the epilogue only case I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth recording a FIXME somewhere to handle when epilog and main are covered by one VPlan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added TODO, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be gone now, thanks
const InductionDescriptor &ID = WideIV->getInductionDescriptor(); | ||
Type *ScalarTy = TypeInfo.inferScalarType(WideIV); | ||
bool IsCanonical = false; | ||
if (auto *WideIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(PhiR)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (auto *WideIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(PhiR)) { | |
if (auto *WideIntOrFP = dyn_cast<VPWidenIntOrFpInductionRecipe>(PhiR)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks
// Truncated wide inductions resume from the last lane of their vector value | ||
// in the last vector iteration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... which is handled elsewhere (so early return null)? Rather than precompute (which is done here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks
} | ||
|
||
VPValue *EndValue = VectorTC; | ||
if (!IsCanonical) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better positioned next to setting IsCanonical
above, and folded into if (WideIV->isCanonical())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can be moved, we need to compute the end value for all cases except wide canonical IVs
/// reductions and update the VPIRInstructions wrapping the original phis in the | ||
/// scalar header. | ||
static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan) { | ||
/// Create a ResumePhi for \p PhiR, if it is wide induction recipe. If the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Create a ResumePhi for \p PhiR, if it is wide induction recipe. If the | |
/// Create and return a ResumePhi for \p PhiR, if it is wide induction recipe. If the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks
Start, VectorTC, Step); | ||
} | ||
|
||
// EndValue is based on the vector trip count (which has the same type as the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// EndValue is based on the vector trip count (which has the same type as the | |
// EndValue is derived from the vector trip count (which has the same type as the |
consistent with being a DerivedIV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better position addResumeValuesForInduction() above under the condition for inductions below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks
if (!isa<VPFirstOrderRecurrencePHIRecipe, VPReductionPHIRecipe>( | ||
VectorPhiR)) { | ||
assert(cast<VPWidenIntOrFpInductionRecipe>(VectorPhiR)->getTruncInst() && | ||
"should only skip truncated wide inductions"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better also handle truncated wide inductions here? Possibly follow-up.
OTOH, better to first handle (legalize) all loop live-outs the same - using the last value computed in the loop, followed by a VPlan2VPlan pass (optimize) that converts non-truncating inductions to use precomputed values instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO, thanks
@@ -9635,7 +9629,6 @@ void VPDerivedIVRecipe::execute(VPTransformState &State) { | |||
State.Builder, CanonicalIV, getStartValue()->getLiveInIRValue(), Step, | |||
Kind, cast_if_present<BinaryOperator>(FPBinOp)); | |||
DerivedIV->setName(Name); | |||
assert(DerivedIV != CanonicalIV && "IV didn't need transforming?"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and when Index=VTC=0 it may result in DerivedIV==Index?
DerivedIV->setName(Name); | ||
assert(DerivedIV != CanonicalIV && "IV didn't need transforming?"); | ||
|
||
// Index may only be set to constant 0 in prepareToExecute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Index may only be set to constant 0 in prepareToExecute. | |
// Index may only be set to constant 0 temporarily in prepareToExecute. |
is this indeed temporarily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, we only create the expression for the vector trip count late and set it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still seem to be missing something here. The fact that prepareToExecute() creates the vector trip count (typically to non zero?) implies that derived IVs that are based on it will miss the opportunity to be folded by simplifyRecipes(), because it's too late.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is the issue; VectorTripCountV is only created during skeleton creation and set as underlying value here, so we miss this opportunity to fold. I adjusted the assert to check if the index is the vector trip count instead and expanded the comment.
Added a TODO to remove this once we construct VTC in VPlan, and can use it for simplifications.
@@ -9439,6 +9418,20 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) { | |||
bool HasNUW = true; | |||
addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW, | |||
DebugLoc()); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to reduce the disparity between native and legacy VPlan constructions (ultimately converging). Perhaps have the former also record Ingredient2Recipe for widened inductions it creates during VPInstructionsToVPRecipes(), to then use getRecipe(P) as above, rather than searching linearly for each as below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this
VPRecipeBuilder RecipeBuilder(...); // As in tryToBuildVPlanWithVPRecipes() following addCanonicalIVRecipes();
for (auto &R : Plan->getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
auto *HeaderR = cast<VPHeaderPHIRecipe>(&R);
RecipeBuilder.setRecipe(HeaderR->getUnderlyingInstr(), HeaderR);
}
addScalarResumePhis(*Plan, RecipeBuilder);
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, adjusted, thanks!
B.setFastMathFlags(II.getInductionBinOp()->getFastMathFlags()); | ||
|
||
// Compute the end value for the additional bypass. | ||
B.SetInsertPoint(getAdditionalBypassBlock(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to set two builders at the outset (VectorPHBuilder, AdditionalBypassBuilder)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is only a single builder (in the bypass block), moved, thanks!
auto *WideIV = dyn_cast<VPWidenInductionRecipe>(PhiR); | ||
if (!WideIV) | ||
return nullptr; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto *WideIntOrFp = dyn_cast<VPWidenIntOrFpInductionRecipe>(WideIV)); | |
// Truncated wide inductions resume from the last lane of their vector value | |
// in the last vector iteration which is handled elsewhere. | |
if (WidenIntOrFp && WideIntOrFp->getTruncInst()) | |
return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted, thanks!
|
||
VPValue *Start = WideIV->getStartValue(); | ||
VPValue *Step = WideIV->getStepValue(); | ||
const InductionDescriptor &ID = WideIV->getInductionDescriptor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const InductionDescriptor &ID = WideIV->getInductionDescriptor(); | |
const InductionDescriptor &ID = WideIV->getInductionDescriptor(); | |
VPValue *EndValue = VectorTC; | |
if (!WidenIntOrFp || !WidenIntOrFp->IsCanonical()) { | |
EndValue = VectorPHBuilder.createDerivedIV( | |
ID.getKind(), dyn_cast_or_null<FPMathOperator>(ID.getInductionBinOp()), | |
Start, VectorTC, Step); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted, thanks!
@@ -9481,7 +9445,9 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) { | |||
VPlanTransforms::handleUncountableEarlyExit( | |||
*Plan, *PSE.getSE(), OrigLoop, UncountableExitingBlock, RecipeBuilder); | |||
} | |||
addScalarResumePhis(RecipeBuilder, *Plan); | |||
addScalarResumePhis(*Plan, [&RecipeBuilder](PHINode *P) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler to continue pass RecipeBuilder into addScalarResumePhis()? The latter is now also being called by native buildVPlan(), to support induction resumes (right?), so it would need to construct a RecipeBuilder and have it record the mapping from header phis to their recipes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
@@ -9439,6 +9418,20 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) { | |||
bool HasNUW = true; | |||
addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW, | |||
DebugLoc()); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this
VPRecipeBuilder RecipeBuilder(...); // As in tryToBuildVPlanWithVPRecipes() following addCanonicalIVRecipes();
for (auto &R : Plan->getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
auto *HeaderR = cast<VPHeaderPHIRecipe>(&R);
RecipeBuilder.setRecipe(HeaderR->getUnderlyingInstr(), HeaderR);
}
addScalarResumePhis(*Plan, RecipeBuilder);
work?
; CHECK: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) } | ||
;. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, run the script with an unneeded argument, removed, thanks
@@ -348,7 +347,7 @@ define void @single_stride_int_iv(ptr %p, i64 %stride) { | |||
; NOSTRIDED-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]] | |||
; NOSTRIDED: scalar.ph: | |||
; NOSTRIDED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[VECTOR_SCEVCHECK]] ], [ 0, [[ENTRY:%.*]] ] | |||
; NOSTRIDED-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[VECTOR_SCEVCHECK]] ], [ 0, [[ENTRY]] ] | |||
; NOSTRIDED-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[VECTOR_SCEVCHECK]] ], [ 0, [[ENTRY]] ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change to use N_VEC instead of N_VEC * STRIDE is based on scevcheck for unit stride, saving a mul instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
@@ -231,12 +231,15 @@ class VPBuilder { | |||
new VPInstruction(Ptr, Offset, GEPNoWrapFlags::inBounds(), DL, Name)); | |||
} | |||
|
|||
/// Convert the input value \p Current to the corresponding value of an | |||
/// induction with different start and step values, using Start + Current * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// induction with different start and step values, using Start + Current * | |
/// induction with \p Start and \p Step values, using Start + Current * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks!
@@ -9209,9 +9173,9 @@ addUsersInExitBlocks(VPlan &Plan, | |||
static void addExitUsersForFirstOrderRecurrences( | |||
VPlan &Plan, SetVector<VPIRInstruction *> &ExitUsersToFix) { | |||
VPRegionBlock *VectorRegion = Plan.getVectorLoopRegion(); | |||
auto *ScalarPHVPBB = Plan.getScalarPreheader(); | |||
auto *MainScalarPH = Plan.getScalarPreheader(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto *MainScalarPH = Plan.getScalarPreheader(); | |
auto *ScalarPHVPBB = Plan.getScalarPreheader(); |
ahh, here there's only one loop/plan, rather than having both main and epilog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undone, thanks
DerivedIV->setName(Name); | ||
assert(DerivedIV != CanonicalIV && "IV didn't need transforming?"); | ||
|
||
// Index may only be set to constant 0 in prepareToExecute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still seem to be missing something here. The fact that prepareToExecute() creates the vector trip count (typically to non zero?) implies that derived IVs that are based on it will miss the opportunity to be folded by simplifyRecipes(), because it's too late.
Split DerivedIV simplification off from #112145 and use to remove the need for extra checks in createScalarIVSteps. Required an extra simplification run after IV transforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split off the changes to simplify DerivedIV to e1833e3 and use it to simplify createScalarIVSteps
if (isa_and_nonnull<FPMathOperator>(II.getInductionBinOp())) | ||
BypassBuilder.setFastMathFlags( | ||
II.getInductionBinOp()->getFastMathFlags()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
if (isa_and_nonnull<FPMathOperator>(II.getInductionBinOp())) | |
BypassBuilder.setFastMathFlags( | |
II.getInductionBinOp()->getFastMathFlags()); | |
auto *BinOp = II.getInductionBinOp(); | |
if (isa_and_nonnull<FPMathOperator>(BinOp)) | |
BypassBuilder.setFastMathFlags(BinOp->getFastMathFlags()); |
plus additional use below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
if (isa<VPWidenInductionRecipe>(VectorPhiR)) { | ||
if (VPValue *ResumePhi = addResumePhiRecipeForInduction( | ||
VectorPhiR, VectorPHBuilder, ScalarPHBuilder, TypeInfo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (isa<VPWidenInductionRecipe>(VectorPhiR)) { | |
if (VPValue *ResumePhi = addResumePhiRecipeForInduction( | |
VectorPhiR, VectorPHBuilder, ScalarPHBuilder, TypeInfo, | |
if (auto *WideIVR = dyn_cast<VPWidenInductionRecipe>(VectorPhiR)) { | |
if (VPValue *ResumePhi = addResumePhiRecipeForInduction( | |
WideIVR, VectorPHBuilder, ScalarPHBuilder, TypeInfo, |
instead of the dyn_cast in addResumePhiRecipeForInduction().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
|
||
// If index is the vector trip count, the concrete value will only be set in | ||
// prepareToExecute, leading to missed simplifications, e.g. if it is 0. | ||
// TODO: Remove the special case for the vector trip count once it is computed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is clear.
Knowing VTC is zero at compile-time suggests more simplification than eliminating redundant DeriveIV's ... ideally VF's (and UF's) should be restricted so as to prevent this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is slightly counter-intuitive, VTC == 0 can only happen when the original BTC == -1, but should be handled correctly with checking the canonical IV increment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So prepareToExecute() sets the value of VTC too late for any folding to take place. Treating overflow of VTC = max_uint BTC + 1 = 0 by skipping the vector loop, is clear. Question is if we can avoid vectorizing altogether, knowing this is the case at compile-time, considering the effort of computeMaxVF()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, will look into that separately, thanks
VPRecipeBase *ResumePhi = VPIRInst->getOperand(0)->getDefiningRecipe(); | ||
VPIRInst->setOperand( | ||
0, MainPlan.getOrAddLiveIn(Constant::getNullValue(IRI->getType()))); | ||
ResumePhi->eraseFromParent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps worth noting here (also) that this erasure should be followed by dce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a run here; this cleans up the redundant IV computations discussed in one of the tests
@@ -11,6 +11,7 @@ target triple = "aarch64-unknown-linux-gnu" | |||
; VPLANS-LABEL: Checking a loop in 'simple_memset' | |||
; VPLANS: VPlan 'Initial VPlan for VF={vscale x 1,vscale x 2,vscale x 4},UF>=1' { | |||
; VPLANS-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF | |||
; VPLANS-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless considered irrelevant for the test, in which case this check should (continue to) be dropped?
Checking only the def w/o any use is somewhat inconsistent, but there are many test changes...
VPIRInst->setOperand( | ||
0, MainPlan.getOrAddLiveIn(Constant::getNullValue(IRI->getType()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VPIRinst wraps the header phi in the scalar loop, so it is probaby better to keep it here.
Is this operand needed, set artificially to zero (can convey a type... ;-), or should it best be removed, keeping VPIRInst only as a wrapper of the header phi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment it is needed, otherwise it would complicate VPIRInstruction::execute; but in the future we can disconnect the scalar header for the main vector loop, thus removing the need to update them there. (There is code elsewhere that use the scalar phi to get the resume value from the main vector loop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... but in the future we can disconnect the scalar header for the main vector loop ...
When can this scalar header be disconnected? Presumably it is needed for phi's that do have wide users in EpiPlan, so could be dropped altogether (or not constructed at all) if and only if all phi's are free of such users?
Can VPIRInst be erased as well (perhaps with make_early_inc_range)?
(thought I responded but cannot see the response here): VPIRinst wraps the header phi in the scalar loop, so it is probably better to keep it here.
A VPIRInst wraps an IRInst in order to (a) add operands to phi's and/or (b) support placing non-VPIRInst recipes between IRInst's, by bumping the builder. If neither is needed, VPIRInst can be dropped?
If an artificial operand is needed to appease VPIRInstruction::execute(), could undef/poison be used instead of zero?
(There is code elsewhere that use the scalar phi to get the resume value from the main vector loop)
So dropping it or using undef would require updating said code in VPIRInstruction::execute() and/or elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both can be removed in the latest version, updated, thanks!
Disconnection should be able after #120918
BypassBuilder.setFastMathFlags( | ||
II.getInductionBinOp()->getFastMathFlags()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BypassBuilder.setFastMathFlags( | |
II.getInductionBinOp()->getFastMathFlags()); | |
BypassBuilder.setFastMathFlags(BinOp->getFastMathFlags()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DOne thanks
/// Create resume phis in the scalar preheader for first-order recurrences and | ||
/// reductions and update the VPIRInstructions wrapping the original phis in the | ||
/// scalar header. | ||
/// Create and return a ResumePhi for \p PhiR, if it is wide induction recipe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Create and return a ResumePhi for \p PhiR, if it is wide induction recipe. | |
/// Create and return a ResumePhi for \p WideIV, unless it is truncated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, added a couple of final thoughts.
/// Create resume phis in the scalar preheader for first-order recurrences and | ||
/// reductions and update the VPIRInstructions wrapping the original phis in the | ||
/// scalar header. | ||
/// Create and return a ResumePhi for \p WideIF, unless it is truncated. If the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Create and return a ResumePhi for \p WideIF, unless it is truncated. If the | |
/// Create and return a ResumePhi for \p WideIV, unless it is truncated. If the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks!
|
||
// If index is the vector trip count, the concrete value will only be set in | ||
// prepareToExecute, leading to missed simplifications, e.g. if it is 0. | ||
// TODO: Remove the special case for the vector trip count once it is computed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So prepareToExecute() sets the value of VTC too late for any folding to take place. Treating overflow of VTC = max_uint BTC + 1 = 0 by skipping the vector loop, is clear. Question is if we can avoid vectorizing altogether, knowing this is the case at compile-time, considering the effort of computeMaxVF()
?
VPIRInst->setOperand( | ||
0, MainPlan.getOrAddLiveIn(Constant::getNullValue(IRI->getType()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... but in the future we can disconnect the scalar header for the main vector loop ...
When can this scalar header be disconnected? Presumably it is needed for phi's that do have wide users in EpiPlan, so could be dropped altogether (or not constructed at all) if and only if all phi's are free of such users?
Can VPIRInst be erased as well (perhaps with make_early_inc_range)?
(thought I responded but cannot see the response here): VPIRinst wraps the header phi in the scalar loop, so it is probably better to keep it here.
A VPIRInst wraps an IRInst in order to (a) add operands to phi's and/or (b) support placing non-VPIRInst recipes between IRInst's, by bumping the builder. If neither is needed, VPIRInst can be dropped?
If an artificial operand is needed to appease VPIRInstruction::execute(), could undef/poison be used instead of zero?
(There is code elsewhere that use the scalar phi to get the resume value from the main vector loop)
So dropping it or using undef would require updating said code in VPIRInstruction::execute() and/or elsewhere?
// There is no corresponding wide induction in the epilogue plan that would | ||
// need a resume value. Set the operand in VPIRInst to zero, so ResumePhi | ||
// can be removed. The resume values for the scalar loop will be created | ||
// during execution of EpiPlan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resume values for the scalar loop will be created
// during execution of EpiPlan.
All resume values, or only those free of wide users in EpiPlan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be all resume values for the scalar loop, those free of wide users are not need to resume in the epilogue vector loop
if (EpiWidenedPhis.contains(IRI)) | ||
continue; | ||
// There is no corresponding wide induction in the epilogue plan that would | ||
// need a resume value. Set the operand in VPIRInst to zero, so ResumePhi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emphasize that this zero is artificial, along with a FIXME to remove it (one way or another)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both can be removed in the latest version, updated, thanks!
Model updating IV users directly in VPlan, replace fixupIVUsers. Depends on llvm#110004, llvm#109975 and llvm#112145.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/199/builds/506 Here is the relevant piece of the build log for the reference
|
Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on llvm#110004 and llvm#109975. PR: llvm#112145
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.
This allows updating IV users outside the loop as follow-up.
Depends on #110004 and
#109975.