-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LoopVectorize] Enable more early exit vectorisation tests #117008
Conversation
@llvm/pr-subscribers-vectorizers Author: David Sherwood (david-arm) ChangesPR #112138 introduced initial support for dispatching to
Patch is 72.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117008.diff 19 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index f1568781252c06..13af2fde44459b 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1375,6 +1375,13 @@ bool LoopVectorizationLegality::isFixedOrderRecurrence(
}
bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) const {
+ // The only block currently permitted after the early exiting block is the
+ // loop latch, so only that blocks needs predication.
+ // FIXME: Once we support instructions in the loop that cannot be executed
+ // speculatively, such as stores, we will also need to predicate all blocks
+ // leading up to the early exit too.
+ if (hasUncountableEarlyExit() && BB == TheLoop->getLoopLatch())
+ return true;
return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);
}
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5073622a095537..80818a162d121c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -385,6 +385,11 @@ static cl::opt<bool> UseWiderVFIfCallVariantsPresent(
cl::Hidden,
cl::desc("Try wider VFs if they enable the use of vector variants"));
+static cl::opt<bool> EnableEarlyExitVectorization(
+ "enable-early-exit-vectorization", cl::init(false), cl::Hidden,
+ cl::desc(
+ "Enable vectorization of early exit loops with uncountable exits."));
+
// Likelyhood of bypassing the vectorized loop because assumptions about SCEV
// variables not overflowing do not hold. See `emitSCEVChecks`.
static constexpr uint32_t SCEVCheckBypassWeights[] = {1, 127};
@@ -1350,9 +1355,10 @@ class LoopVectorizationCostModel {
LLVM_DEBUG(dbgs() << "LV: Loop does not require scalar epilogue\n");
return false;
}
- // If we might exit from anywhere but the latch, must run the exiting
- // iteration in scalar form.
- if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+ // If we might exit from anywhere but the latch and early exit vectorization
+ // is disabled, we must run the exiting iteration in scalar form.
+ if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch() &&
+ !(EnableEarlyExitVectorization && Legal->hasUncountableEarlyExit())) {
LLVM_DEBUG(dbgs() << "LV: Loop requires scalar epilogue: not exiting "
"from latch block\n");
return true;
@@ -2568,9 +2574,9 @@ BasicBlock *InnerLoopVectorizer::emitMemRuntimeChecks(BasicBlock *Bypass) {
void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
LoopVectorPreHeader = OrigLoop->getLoopPreheader();
assert(LoopVectorPreHeader && "Invalid loop structure");
- assert((OrigLoop->getUniqueExitBlock() ||
+ assert((OrigLoop->getUniqueLatchExitBlock() ||
Cost->requiresScalarEpilogue(VF.isVector())) &&
- "multiple exit loop without required epilogue?");
+ "loops not exiting via the latch without required epilogue?");
LoopMiddleBlock =
SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,
@@ -2753,8 +2759,6 @@ void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,
// value (the value that feeds into the phi from the loop latch).
// We allow both, but they, obviously, have different values.
- assert(OrigLoop->getUniqueExitBlock() && "Expected a single exit block");
-
DenseMap<Value *, Value *> MissingVals;
Value *EndValue = cast<PHINode>(OrigPhi->getIncomingValueForBlock(
@@ -2808,6 +2812,8 @@ void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,
}
}
+ assert((MissingVals.empty() || OrigLoop->getUniqueExitBlock()) &&
+ "Expected a single exit block for escaping values");
for (auto &I : MissingVals) {
PHINode *PHI = cast<PHINode>(I.first);
// One corner case we have to handle is two IVs "chasing" each-other,
@@ -2939,6 +2945,21 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
PSE.getSE()->forgetLoop(OrigLoop);
PSE.getSE()->forgetBlockAndLoopDispositions();
+ // When dealing with uncountable early exits we create middle.split blocks
+ // between the vector loop region and the exit block. These blocks need
+ // adding to any outer loop.
+ VPRegionBlock *VectorRegion = State.Plan->getVectorLoopRegion();
+ Loop *OuterLoop = OrigLoop->getParentLoop();
+ if (Legal->hasUncountableEarlyExit() && OuterLoop) {
+ VPBasicBlock *MiddleVPBB = State.Plan->getMiddleBlock();
+ VPBlockBase *PredVPBB = MiddleVPBB->getSinglePredecessor();
+ while (PredVPBB && PredVPBB != VectorRegion) {
+ BasicBlock *MiddleSplitBB = State.CFG.VPBB2IRBB[cast<VPBasicBlock>(PredVPBB)];
+ OuterLoop->addBasicBlockToLoop(MiddleSplitBB, *LI);
+ PredVPBB = PredVPBB->getSinglePredecessor();
+ }
+ }
+
// After vectorization, the exit blocks of the original loop will have
// additional predecessors. Invalidate SCEVs for the exit phis in case SE
// looked through single-entry phis.
@@ -2969,7 +2990,6 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
for (Instruction *PI : PredicatedInstructions)
sinkScalarOperands(&*PI);
- VPRegionBlock *VectorRegion = State.Plan->getVectorLoopRegion();
VPBasicBlock *HeaderVPBB = VectorRegion->getEntryBasicBlock();
BasicBlock *HeaderBB = State.CFG.VPBB2IRBB[HeaderVPBB];
@@ -3591,7 +3611,8 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
TheLoop->getExitingBlocks(Exiting);
for (BasicBlock *E : Exiting) {
auto *Cmp = dyn_cast<Instruction>(E->getTerminator()->getOperand(0));
- if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse())
+ if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse() &&
+ (TheLoop->getLoopLatch() == E || !Legal->hasUncountableEarlyExit()))
AddToWorklistIfAllowed(Cmp);
}
@@ -4044,7 +4065,8 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
// a bottom-test and a single exiting block. We'd have to handle the fact
// that not every instruction executes on the last iteration. This will
// require a lane mask which varies through the vector loop body. (TODO)
- if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+ if (Legal->hasUncountableEarlyExit() ||
+ TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
// If there was a tail-folding hint/switch, but we can't fold the tail by
// masking, fallback to a vectorization with a scalar epilogue.
if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
@@ -4663,7 +4685,9 @@ bool LoopVectorizationPlanner::isCandidateForEpilogueVectorization(
// Epilogue vectorization code has not been auditted to ensure it handles
// non-latch exits properly. It may be fine, but it needs auditted and
// tested.
- if (OrigLoop->getExitingBlock() != OrigLoop->getLoopLatch())
+ // TODO: Add support for loops with an early exit.
+ if (Legal->hasUncountableEarlyExit() ||
+ OrigLoop->getExitingBlock() != OrigLoop->getLoopLatch())
return false;
return true;
@@ -4913,6 +4937,10 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
if (!Legal->isSafeForAnyVectorWidth())
return 1;
+ // We don't attempt to perform interleaving for early exit loops.
+ if (Legal->hasUncountableEarlyExit())
+ return 1;
+
auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop);
const bool HasReductions = !Legal->getReductionVars().empty();
@@ -7746,11 +7774,14 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
// 2.5 Collect reduction resume values.
auto *ExitVPBB = BestVPlan.getMiddleBlock();
- if (VectorizingEpilogue)
+ if (VectorizingEpilogue) {
+ assert(!ILV.Legal->hasUncountableEarlyExit() &&
+ "Epilogue vectorisation not yet supported with early exits");
for (VPRecipeBase &R : *ExitVPBB) {
fixReductionScalarResumeWhenVectorizingEpilog(
&R, State, State.CFG.VPBB2IRBB[ExitVPBB]);
}
+ }
// 2.6. Maintain Loop Hints
// Keep all loop hints from the original loop on the vector loop (we'll
@@ -7775,6 +7806,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
LoopVectorizeHints Hints(L, true, *ORE);
Hints.setAlreadyVectorized();
}
+
TargetTransformInfo::UnrollingPreferences UP;
TTI.getUnrollingPreferences(L, *PSE.getSE(), UP, ORE);
if (!UP.UnrollVectorizedLoop || CanonicalIVStartValue)
@@ -7787,15 +7819,17 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
ILV.printDebugTracesAtEnd();
// 4. Adjust branch weight of the branch in the middle block.
- auto *MiddleTerm =
- cast<BranchInst>(State.CFG.VPBB2IRBB[ExitVPBB]->getTerminator());
- if (MiddleTerm->isConditional() &&
- hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator())) {
- // Assume that `Count % VectorTripCount` is equally distributed.
- unsigned TripCount = BestVPlan.getUF() * State.VF.getKnownMinValue();
- assert(TripCount > 0 && "trip count should not be zero");
- const uint32_t Weights[] = {1, TripCount - 1};
- setBranchWeights(*MiddleTerm, Weights, /*IsExpected=*/false);
+ if (ExitVPBB) {
+ auto *MiddleTerm =
+ cast<BranchInst>(State.CFG.VPBB2IRBB[ExitVPBB]->getTerminator());
+ if (MiddleTerm->isConditional() &&
+ hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator())) {
+ // Assume that `Count % VectorTripCount` is equally distributed.
+ unsigned TripCount = BestVPlan.getUF() * State.VF.getKnownMinValue();
+ assert(TripCount > 0 && "trip count should not be zero");
+ const uint32_t Weights[] = {1, TripCount - 1};
+ setBranchWeights(*MiddleTerm, Weights, /*IsExpected=*/false);
+ }
}
return State.ExpandedSCEVs;
@@ -8180,7 +8214,7 @@ VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {
// If source is an exiting block, we know the exit edge is dynamically dead
// in the vector loop, and thus we don't need to restrict the mask. Avoid
// adding uses of an otherwise potentially dead instruction.
- if (OrigLoop->isLoopExiting(Src))
+ if (!Legal->hasUncountableEarlyExit() && OrigLoop->isLoopExiting(Src))
return EdgeMaskCache[Edge] = SrcMask;
VPValue *EdgeMask = getVPValueOrAddLiveIn(BI->getCondition());
@@ -8863,47 +8897,46 @@ static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan) {
}
}
-// Collect VPIRInstructions for phis in the original exit block that are modeled
+// Collect VPIRInstructions for phis in the exit blocks that are modeled
// in VPlan and add the exiting VPValue as operand. Some exiting values are not
// modeled explicitly yet and won't be included. Those are un-truncated
// VPWidenIntOrFpInductionRecipe, VPWidenPointerInductionRecipe and induction
// increments.
-static SetVector<VPIRInstruction *> collectUsersInExitBlock(
+static SetVector<VPIRInstruction *> collectUsersInExitBlocks(
Loop *OrigLoop, VPRecipeBuilder &Builder, VPlan &Plan,
const MapVector<PHINode *, InductionDescriptor> &Inductions) {
- auto *MiddleVPBB = Plan.getMiddleBlock();
- // No edge from the middle block to the unique exit block has been inserted
- // and there is nothing to fix from vector loop; phis should have incoming
- // from scalar loop only.
- if (MiddleVPBB->getNumSuccessors() != 2)
- return {};
SetVector<VPIRInstruction *> ExitUsersToFix;
- VPBasicBlock *ExitVPBB = cast<VPIRBasicBlock>(MiddleVPBB->getSuccessors()[0]);
- BasicBlock *ExitingBB = OrigLoop->getExitingBlock();
- for (VPRecipeBase &R : *ExitVPBB) {
- auto *ExitIRI = dyn_cast<VPIRInstruction>(&R);
- if (!ExitIRI)
- continue;
- auto *ExitPhi = dyn_cast<PHINode>(&ExitIRI->getInstruction());
- if (!ExitPhi)
- break;
- Value *IncomingValue = ExitPhi->getIncomingValueForBlock(ExitingBB);
- VPValue *V = Builder.getVPValueOrAddLiveIn(IncomingValue);
- // Exit values for inductions are computed and updated outside of VPlan and
- // independent of induction recipes.
- // TODO: Compute induction exit values in VPlan.
- if ((isa<VPWidenIntOrFpInductionRecipe>(V) &&
- !cast<VPWidenIntOrFpInductionRecipe>(V)->getTruncInst()) ||
- isa<VPWidenPointerInductionRecipe>(V) ||
- (isa<Instruction>(IncomingValue) &&
- OrigLoop->contains(cast<Instruction>(IncomingValue)) &&
- any_of(IncomingValue->users(), [&Inductions](User *U) {
- auto *P = dyn_cast<PHINode>(U);
- return P && Inductions.contains(P);
- })))
- continue;
- ExitUsersToFix.insert(ExitIRI);
- ExitIRI->addOperand(V);
+ for (VPIRBasicBlock *ExitVPBB : Plan.getExitBlocks()) {
+ BasicBlock *ExitBB = ExitVPBB->getIRBasicBlock();
+ for (VPRecipeBase &R : *ExitVPBB) {
+ auto *ExitIRI = dyn_cast<VPIRInstruction>(&R);
+ if (!ExitIRI)
+ continue;
+ auto *ExitPhi = dyn_cast<PHINode>(&ExitIRI->getInstruction());
+ if (!ExitPhi)
+ break;
+ for (BasicBlock *ExitingBB : predecessors(ExitBB)) {
+ if (!OrigLoop->contains(ExitingBB))
+ continue;
+ Value *IncomingValue = ExitPhi->getIncomingValueForBlock(ExitingBB);
+ VPValue *V = Builder.getVPValueOrAddLiveIn(IncomingValue);
+ // Exit values for inductions are computed and updated outside of VPlan
+ // and independent of induction recipes.
+ // TODO: Compute induction exit values in VPlan.
+ if ((isa<VPWidenIntOrFpInductionRecipe>(V) &&
+ !cast<VPWidenIntOrFpInductionRecipe>(V)->getTruncInst()) ||
+ isa<VPWidenPointerInductionRecipe>(V) ||
+ (isa<Instruction>(IncomingValue) &&
+ OrigLoop->contains(cast<Instruction>(IncomingValue)) &&
+ any_of(IncomingValue->users(), [&Inductions](User *U) {
+ auto *P = dyn_cast<PHINode>(U);
+ return P && Inductions.contains(P);
+ })))
+ continue;
+ ExitUsersToFix.insert(ExitIRI);
+ ExitIRI->addOperand(V);
+ }
+ }
}
return ExitUsersToFix;
}
@@ -8911,28 +8944,31 @@ static SetVector<VPIRInstruction *> collectUsersInExitBlock(
// Add exit values to \p Plan. Extracts are added for each entry in \p
// ExitUsersToFix if needed and their operands are updated.
static void
-addUsersInExitBlock(VPlan &Plan,
- const SetVector<VPIRInstruction *> &ExitUsersToFix) {
+addUsersInExitBlocks(VPlan &Plan,
+ const SetVector<VPIRInstruction *> &ExitUsersToFix) {
if (ExitUsersToFix.empty())
return;
- auto *MiddleVPBB = Plan.getMiddleBlock();
- VPBuilder B(MiddleVPBB, MiddleVPBB->getFirstNonPhi());
-
// Introduce extract for exiting values and update the VPIRInstructions
// modeling the corresponding LCSSA phis.
for (VPIRInstruction *ExitIRI : ExitUsersToFix) {
+
VPValue *V = ExitIRI->getOperand(0);
// Pass live-in values used by exit phis directly through to their users in
// the exit block.
if (V->isLiveIn())
continue;
- LLVMContext &Ctx = ExitIRI->getInstruction().getContext();
- VPValue *Ext = B.createNaryOp(VPInstruction::ExtractFromEnd,
- {V, Plan.getOrAddLiveIn(ConstantInt::get(
- IntegerType::get(Ctx, 32), 1))});
- ExitIRI->setOperand(0, Ext);
+ for (VPBlockBase *PredVPB : ExitIRI->getParent()->getPredecessors()) {
+ auto *PredVPBB = cast<VPBasicBlock>(PredVPB);
+ VPBuilder B(PredVPBB, PredVPBB->getFirstNonPhi());
+
+ LLVMContext &Ctx = ExitIRI->getInstruction().getContext();
+ VPValue *Ext = B.createNaryOp(VPInstruction::ExtractFromEnd,
+ {V, Plan.getOrAddLiveIn(ConstantInt::get(
+ IntegerType::get(Ctx, 32), 1))});
+ ExitIRI->setOperand(0, Ext);
+ }
}
}
@@ -9204,11 +9240,17 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
"VPBasicBlock");
RecipeBuilder.fixHeaderPhis();
+ if (Legal->hasUncountableEarlyExit()) {
+ VPlanTransforms::handleUncountableEarlyExit(
+ *Plan, *PSE.getSE(), OrigLoop, Legal->getUncountableExitingBlocks(),
+ RecipeBuilder);
+ }
addScalarResumePhis(RecipeBuilder, *Plan);
- SetVector<VPIRInstruction *> ExitUsersToFix = collectUsersInExitBlock(
+ SetVector<VPIRInstruction *> ExitUsersToFix = collectUsersInExitBlocks(
OrigLoop, RecipeBuilder, *Plan, Legal->getInductionVars());
addExitUsersForFirstOrderRecurrences(*Plan, ExitUsersToFix);
- addUsersInExitBlock(*Plan, ExitUsersToFix);
+ addUsersInExitBlocks(*Plan, ExitUsersToFix);
+
// ---------------------------------------------------------------------------
// Transform initial VPlan: Apply previously taken decisions, in order, to
// bring the VPlan to its final state.
@@ -9968,12 +10010,31 @@ bool LoopVectorizePass::processLoop(Loop *L) {
}
if (LVL.hasUncountableEarlyExit()) {
- reportVectorizationFailure("Auto-vectorization of loops with uncountable "
- "early exit is not yet supported",
- "Auto-vectorization of loops with uncountable "
- "early exit is not yet supported",
- "UncountableEarlyExitLoopsUnsupported", ORE, L);
- return false;
+ if (!EnableEarlyExitVectorization) {
+ reportVectorizationFailure("Auto-vectorization of loops with uncountable "
+ "early exit is disabled",
+ "Auto-vectorization of loops with uncountable "
+ "early exit is disabled",
+ "UncountableEarlyExitLoopsDisabled", ORE,
+ L);
+ return false;
+ }
+ for (BasicBlock *BB : L->blocks()) {
+ for (Instruction &I : *BB) {
+ for (User *U : I.users()) {
+ Instruction *UI = cast<Instruction>(U);
+ if (!L->contains(UI)) {
+ reportVectorizationFailure(
+ "Auto-vectorization of loops with uncountable "
+ "early exit and live-outs is not yet supported",
+ "Auto-vectorization of loop with uncountable "
+ "early exit and live-outs is not yet supported",
+ "UncountableEarlyExitLoopLiveOutsUnsupported", ORE, L);
+ return false;
+ }
+ }
+ }
+ }
}
// Entrance to the VPlan-native vectorization path. Outer loops are processed
@@ -9990,6 +10051,22 @@ bool LoopVectorizePass::processLoop(Loop *L) {
InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL.getLAI());
bool UseInterleaved = TTI->enableInterleavedAccessVectorization();
+ if (LVL.hasUncountableEarlyExit()) {
+ BasicBlock *LoopLatch = L->getLoopLatch();
+ if (IAI.requiresScalarEpilogue() ||
+ llvm::any_of(LVL.getCountableExitingBlocks(), [LoopLatch](BasicBlock *BB) {
+ return BB != LoopLatch;
+ })) {
+ reportVectorizationFailure("Auto-vectorization of early exit loops "
+ "requiring a scalar epilogue is unsupported",
+ "Auto-vectorization of early exit loops "
+ "requiring a scalar epilogue is unsupported",
+ "UncountableEarlyExitUnsupported", ORE,
+ L);
+ return false;
+ }
+ }
+
// If an override option has been passed in for interleaved accesses, use it.
if (EnableInterleavedMemAccesses.getNumOccurrences() > 0)
UseInterleaved = EnableInterleavedMemAccesses;
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 8b1a4aeb88f81f..63c04bbb11e505 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -870,15 +870,9 @@ VPlanPtr VPlan::createInitialVPlan(Type *InductionTy,
auto Plan = std::make_unique<VPlan>(Entry, VecPreheader, ScalarHeader);
// Create SCEV and VPValue for the trip count.
-
- // Currently only loops with countable exits are vectorized, but cal...
[truncated]
|
@llvm/pr-subscribers-llvm-transforms Author: David Sherwood (david-arm) ChangesPR #112138 introduced initial support for dispatching to
Patch is 72.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117008.diff 19 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index f1568781252c06..13af2fde44459b 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1375,6 +1375,13 @@ bool LoopVectorizationLegality::isFixedOrderRecurrence(
}
bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) const {
+ // The only block currently permitted after the early exiting block is the
+ // loop latch, so only that blocks needs predication.
+ // FIXME: Once we support instructions in the loop that cannot be executed
+ // speculatively, such as stores, we will also need to predicate all blocks
+ // leading up to the early exit too.
+ if (hasUncountableEarlyExit() && BB == TheLoop->getLoopLatch())
+ return true;
return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);
}
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5073622a095537..80818a162d121c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -385,6 +385,11 @@ static cl::opt<bool> UseWiderVFIfCallVariantsPresent(
cl::Hidden,
cl::desc("Try wider VFs if they enable the use of vector variants"));
+static cl::opt<bool> EnableEarlyExitVectorization(
+ "enable-early-exit-vectorization", cl::init(false), cl::Hidden,
+ cl::desc(
+ "Enable vectorization of early exit loops with uncountable exits."));
+
// Likelyhood of bypassing the vectorized loop because assumptions about SCEV
// variables not overflowing do not hold. See `emitSCEVChecks`.
static constexpr uint32_t SCEVCheckBypassWeights[] = {1, 127};
@@ -1350,9 +1355,10 @@ class LoopVectorizationCostModel {
LLVM_DEBUG(dbgs() << "LV: Loop does not require scalar epilogue\n");
return false;
}
- // If we might exit from anywhere but the latch, must run the exiting
- // iteration in scalar form.
- if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+ // If we might exit from anywhere but the latch and early exit vectorization
+ // is disabled, we must run the exiting iteration in scalar form.
+ if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch() &&
+ !(EnableEarlyExitVectorization && Legal->hasUncountableEarlyExit())) {
LLVM_DEBUG(dbgs() << "LV: Loop requires scalar epilogue: not exiting "
"from latch block\n");
return true;
@@ -2568,9 +2574,9 @@ BasicBlock *InnerLoopVectorizer::emitMemRuntimeChecks(BasicBlock *Bypass) {
void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
LoopVectorPreHeader = OrigLoop->getLoopPreheader();
assert(LoopVectorPreHeader && "Invalid loop structure");
- assert((OrigLoop->getUniqueExitBlock() ||
+ assert((OrigLoop->getUniqueLatchExitBlock() ||
Cost->requiresScalarEpilogue(VF.isVector())) &&
- "multiple exit loop without required epilogue?");
+ "loops not exiting via the latch without required epilogue?");
LoopMiddleBlock =
SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,
@@ -2753,8 +2759,6 @@ void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,
// value (the value that feeds into the phi from the loop latch).
// We allow both, but they, obviously, have different values.
- assert(OrigLoop->getUniqueExitBlock() && "Expected a single exit block");
-
DenseMap<Value *, Value *> MissingVals;
Value *EndValue = cast<PHINode>(OrigPhi->getIncomingValueForBlock(
@@ -2808,6 +2812,8 @@ void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,
}
}
+ assert((MissingVals.empty() || OrigLoop->getUniqueExitBlock()) &&
+ "Expected a single exit block for escaping values");
for (auto &I : MissingVals) {
PHINode *PHI = cast<PHINode>(I.first);
// One corner case we have to handle is two IVs "chasing" each-other,
@@ -2939,6 +2945,21 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
PSE.getSE()->forgetLoop(OrigLoop);
PSE.getSE()->forgetBlockAndLoopDispositions();
+ // When dealing with uncountable early exits we create middle.split blocks
+ // between the vector loop region and the exit block. These blocks need
+ // adding to any outer loop.
+ VPRegionBlock *VectorRegion = State.Plan->getVectorLoopRegion();
+ Loop *OuterLoop = OrigLoop->getParentLoop();
+ if (Legal->hasUncountableEarlyExit() && OuterLoop) {
+ VPBasicBlock *MiddleVPBB = State.Plan->getMiddleBlock();
+ VPBlockBase *PredVPBB = MiddleVPBB->getSinglePredecessor();
+ while (PredVPBB && PredVPBB != VectorRegion) {
+ BasicBlock *MiddleSplitBB = State.CFG.VPBB2IRBB[cast<VPBasicBlock>(PredVPBB)];
+ OuterLoop->addBasicBlockToLoop(MiddleSplitBB, *LI);
+ PredVPBB = PredVPBB->getSinglePredecessor();
+ }
+ }
+
// After vectorization, the exit blocks of the original loop will have
// additional predecessors. Invalidate SCEVs for the exit phis in case SE
// looked through single-entry phis.
@@ -2969,7 +2990,6 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
for (Instruction *PI : PredicatedInstructions)
sinkScalarOperands(&*PI);
- VPRegionBlock *VectorRegion = State.Plan->getVectorLoopRegion();
VPBasicBlock *HeaderVPBB = VectorRegion->getEntryBasicBlock();
BasicBlock *HeaderBB = State.CFG.VPBB2IRBB[HeaderVPBB];
@@ -3591,7 +3611,8 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
TheLoop->getExitingBlocks(Exiting);
for (BasicBlock *E : Exiting) {
auto *Cmp = dyn_cast<Instruction>(E->getTerminator()->getOperand(0));
- if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse())
+ if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse() &&
+ (TheLoop->getLoopLatch() == E || !Legal->hasUncountableEarlyExit()))
AddToWorklistIfAllowed(Cmp);
}
@@ -4044,7 +4065,8 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
// a bottom-test and a single exiting block. We'd have to handle the fact
// that not every instruction executes on the last iteration. This will
// require a lane mask which varies through the vector loop body. (TODO)
- if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+ if (Legal->hasUncountableEarlyExit() ||
+ TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
// If there was a tail-folding hint/switch, but we can't fold the tail by
// masking, fallback to a vectorization with a scalar epilogue.
if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
@@ -4663,7 +4685,9 @@ bool LoopVectorizationPlanner::isCandidateForEpilogueVectorization(
// Epilogue vectorization code has not been auditted to ensure it handles
// non-latch exits properly. It may be fine, but it needs auditted and
// tested.
- if (OrigLoop->getExitingBlock() != OrigLoop->getLoopLatch())
+ // TODO: Add support for loops with an early exit.
+ if (Legal->hasUncountableEarlyExit() ||
+ OrigLoop->getExitingBlock() != OrigLoop->getLoopLatch())
return false;
return true;
@@ -4913,6 +4937,10 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
if (!Legal->isSafeForAnyVectorWidth())
return 1;
+ // We don't attempt to perform interleaving for early exit loops.
+ if (Legal->hasUncountableEarlyExit())
+ return 1;
+
auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop);
const bool HasReductions = !Legal->getReductionVars().empty();
@@ -7746,11 +7774,14 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
// 2.5 Collect reduction resume values.
auto *ExitVPBB = BestVPlan.getMiddleBlock();
- if (VectorizingEpilogue)
+ if (VectorizingEpilogue) {
+ assert(!ILV.Legal->hasUncountableEarlyExit() &&
+ "Epilogue vectorisation not yet supported with early exits");
for (VPRecipeBase &R : *ExitVPBB) {
fixReductionScalarResumeWhenVectorizingEpilog(
&R, State, State.CFG.VPBB2IRBB[ExitVPBB]);
}
+ }
// 2.6. Maintain Loop Hints
// Keep all loop hints from the original loop on the vector loop (we'll
@@ -7775,6 +7806,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
LoopVectorizeHints Hints(L, true, *ORE);
Hints.setAlreadyVectorized();
}
+
TargetTransformInfo::UnrollingPreferences UP;
TTI.getUnrollingPreferences(L, *PSE.getSE(), UP, ORE);
if (!UP.UnrollVectorizedLoop || CanonicalIVStartValue)
@@ -7787,15 +7819,17 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
ILV.printDebugTracesAtEnd();
// 4. Adjust branch weight of the branch in the middle block.
- auto *MiddleTerm =
- cast<BranchInst>(State.CFG.VPBB2IRBB[ExitVPBB]->getTerminator());
- if (MiddleTerm->isConditional() &&
- hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator())) {
- // Assume that `Count % VectorTripCount` is equally distributed.
- unsigned TripCount = BestVPlan.getUF() * State.VF.getKnownMinValue();
- assert(TripCount > 0 && "trip count should not be zero");
- const uint32_t Weights[] = {1, TripCount - 1};
- setBranchWeights(*MiddleTerm, Weights, /*IsExpected=*/false);
+ if (ExitVPBB) {
+ auto *MiddleTerm =
+ cast<BranchInst>(State.CFG.VPBB2IRBB[ExitVPBB]->getTerminator());
+ if (MiddleTerm->isConditional() &&
+ hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator())) {
+ // Assume that `Count % VectorTripCount` is equally distributed.
+ unsigned TripCount = BestVPlan.getUF() * State.VF.getKnownMinValue();
+ assert(TripCount > 0 && "trip count should not be zero");
+ const uint32_t Weights[] = {1, TripCount - 1};
+ setBranchWeights(*MiddleTerm, Weights, /*IsExpected=*/false);
+ }
}
return State.ExpandedSCEVs;
@@ -8180,7 +8214,7 @@ VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {
// If source is an exiting block, we know the exit edge is dynamically dead
// in the vector loop, and thus we don't need to restrict the mask. Avoid
// adding uses of an otherwise potentially dead instruction.
- if (OrigLoop->isLoopExiting(Src))
+ if (!Legal->hasUncountableEarlyExit() && OrigLoop->isLoopExiting(Src))
return EdgeMaskCache[Edge] = SrcMask;
VPValue *EdgeMask = getVPValueOrAddLiveIn(BI->getCondition());
@@ -8863,47 +8897,46 @@ static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan) {
}
}
-// Collect VPIRInstructions for phis in the original exit block that are modeled
+// Collect VPIRInstructions for phis in the exit blocks that are modeled
// in VPlan and add the exiting VPValue as operand. Some exiting values are not
// modeled explicitly yet and won't be included. Those are un-truncated
// VPWidenIntOrFpInductionRecipe, VPWidenPointerInductionRecipe and induction
// increments.
-static SetVector<VPIRInstruction *> collectUsersInExitBlock(
+static SetVector<VPIRInstruction *> collectUsersInExitBlocks(
Loop *OrigLoop, VPRecipeBuilder &Builder, VPlan &Plan,
const MapVector<PHINode *, InductionDescriptor> &Inductions) {
- auto *MiddleVPBB = Plan.getMiddleBlock();
- // No edge from the middle block to the unique exit block has been inserted
- // and there is nothing to fix from vector loop; phis should have incoming
- // from scalar loop only.
- if (MiddleVPBB->getNumSuccessors() != 2)
- return {};
SetVector<VPIRInstruction *> ExitUsersToFix;
- VPBasicBlock *ExitVPBB = cast<VPIRBasicBlock>(MiddleVPBB->getSuccessors()[0]);
- BasicBlock *ExitingBB = OrigLoop->getExitingBlock();
- for (VPRecipeBase &R : *ExitVPBB) {
- auto *ExitIRI = dyn_cast<VPIRInstruction>(&R);
- if (!ExitIRI)
- continue;
- auto *ExitPhi = dyn_cast<PHINode>(&ExitIRI->getInstruction());
- if (!ExitPhi)
- break;
- Value *IncomingValue = ExitPhi->getIncomingValueForBlock(ExitingBB);
- VPValue *V = Builder.getVPValueOrAddLiveIn(IncomingValue);
- // Exit values for inductions are computed and updated outside of VPlan and
- // independent of induction recipes.
- // TODO: Compute induction exit values in VPlan.
- if ((isa<VPWidenIntOrFpInductionRecipe>(V) &&
- !cast<VPWidenIntOrFpInductionRecipe>(V)->getTruncInst()) ||
- isa<VPWidenPointerInductionRecipe>(V) ||
- (isa<Instruction>(IncomingValue) &&
- OrigLoop->contains(cast<Instruction>(IncomingValue)) &&
- any_of(IncomingValue->users(), [&Inductions](User *U) {
- auto *P = dyn_cast<PHINode>(U);
- return P && Inductions.contains(P);
- })))
- continue;
- ExitUsersToFix.insert(ExitIRI);
- ExitIRI->addOperand(V);
+ for (VPIRBasicBlock *ExitVPBB : Plan.getExitBlocks()) {
+ BasicBlock *ExitBB = ExitVPBB->getIRBasicBlock();
+ for (VPRecipeBase &R : *ExitVPBB) {
+ auto *ExitIRI = dyn_cast<VPIRInstruction>(&R);
+ if (!ExitIRI)
+ continue;
+ auto *ExitPhi = dyn_cast<PHINode>(&ExitIRI->getInstruction());
+ if (!ExitPhi)
+ break;
+ for (BasicBlock *ExitingBB : predecessors(ExitBB)) {
+ if (!OrigLoop->contains(ExitingBB))
+ continue;
+ Value *IncomingValue = ExitPhi->getIncomingValueForBlock(ExitingBB);
+ VPValue *V = Builder.getVPValueOrAddLiveIn(IncomingValue);
+ // Exit values for inductions are computed and updated outside of VPlan
+ // and independent of induction recipes.
+ // TODO: Compute induction exit values in VPlan.
+ if ((isa<VPWidenIntOrFpInductionRecipe>(V) &&
+ !cast<VPWidenIntOrFpInductionRecipe>(V)->getTruncInst()) ||
+ isa<VPWidenPointerInductionRecipe>(V) ||
+ (isa<Instruction>(IncomingValue) &&
+ OrigLoop->contains(cast<Instruction>(IncomingValue)) &&
+ any_of(IncomingValue->users(), [&Inductions](User *U) {
+ auto *P = dyn_cast<PHINode>(U);
+ return P && Inductions.contains(P);
+ })))
+ continue;
+ ExitUsersToFix.insert(ExitIRI);
+ ExitIRI->addOperand(V);
+ }
+ }
}
return ExitUsersToFix;
}
@@ -8911,28 +8944,31 @@ static SetVector<VPIRInstruction *> collectUsersInExitBlock(
// Add exit values to \p Plan. Extracts are added for each entry in \p
// ExitUsersToFix if needed and their operands are updated.
static void
-addUsersInExitBlock(VPlan &Plan,
- const SetVector<VPIRInstruction *> &ExitUsersToFix) {
+addUsersInExitBlocks(VPlan &Plan,
+ const SetVector<VPIRInstruction *> &ExitUsersToFix) {
if (ExitUsersToFix.empty())
return;
- auto *MiddleVPBB = Plan.getMiddleBlock();
- VPBuilder B(MiddleVPBB, MiddleVPBB->getFirstNonPhi());
-
// Introduce extract for exiting values and update the VPIRInstructions
// modeling the corresponding LCSSA phis.
for (VPIRInstruction *ExitIRI : ExitUsersToFix) {
+
VPValue *V = ExitIRI->getOperand(0);
// Pass live-in values used by exit phis directly through to their users in
// the exit block.
if (V->isLiveIn())
continue;
- LLVMContext &Ctx = ExitIRI->getInstruction().getContext();
- VPValue *Ext = B.createNaryOp(VPInstruction::ExtractFromEnd,
- {V, Plan.getOrAddLiveIn(ConstantInt::get(
- IntegerType::get(Ctx, 32), 1))});
- ExitIRI->setOperand(0, Ext);
+ for (VPBlockBase *PredVPB : ExitIRI->getParent()->getPredecessors()) {
+ auto *PredVPBB = cast<VPBasicBlock>(PredVPB);
+ VPBuilder B(PredVPBB, PredVPBB->getFirstNonPhi());
+
+ LLVMContext &Ctx = ExitIRI->getInstruction().getContext();
+ VPValue *Ext = B.createNaryOp(VPInstruction::ExtractFromEnd,
+ {V, Plan.getOrAddLiveIn(ConstantInt::get(
+ IntegerType::get(Ctx, 32), 1))});
+ ExitIRI->setOperand(0, Ext);
+ }
}
}
@@ -9204,11 +9240,17 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
"VPBasicBlock");
RecipeBuilder.fixHeaderPhis();
+ if (Legal->hasUncountableEarlyExit()) {
+ VPlanTransforms::handleUncountableEarlyExit(
+ *Plan, *PSE.getSE(), OrigLoop, Legal->getUncountableExitingBlocks(),
+ RecipeBuilder);
+ }
addScalarResumePhis(RecipeBuilder, *Plan);
- SetVector<VPIRInstruction *> ExitUsersToFix = collectUsersInExitBlock(
+ SetVector<VPIRInstruction *> ExitUsersToFix = collectUsersInExitBlocks(
OrigLoop, RecipeBuilder, *Plan, Legal->getInductionVars());
addExitUsersForFirstOrderRecurrences(*Plan, ExitUsersToFix);
- addUsersInExitBlock(*Plan, ExitUsersToFix);
+ addUsersInExitBlocks(*Plan, ExitUsersToFix);
+
// ---------------------------------------------------------------------------
// Transform initial VPlan: Apply previously taken decisions, in order, to
// bring the VPlan to its final state.
@@ -9968,12 +10010,31 @@ bool LoopVectorizePass::processLoop(Loop *L) {
}
if (LVL.hasUncountableEarlyExit()) {
- reportVectorizationFailure("Auto-vectorization of loops with uncountable "
- "early exit is not yet supported",
- "Auto-vectorization of loops with uncountable "
- "early exit is not yet supported",
- "UncountableEarlyExitLoopsUnsupported", ORE, L);
- return false;
+ if (!EnableEarlyExitVectorization) {
+ reportVectorizationFailure("Auto-vectorization of loops with uncountable "
+ "early exit is disabled",
+ "Auto-vectorization of loops with uncountable "
+ "early exit is disabled",
+ "UncountableEarlyExitLoopsDisabled", ORE,
+ L);
+ return false;
+ }
+ for (BasicBlock *BB : L->blocks()) {
+ for (Instruction &I : *BB) {
+ for (User *U : I.users()) {
+ Instruction *UI = cast<Instruction>(U);
+ if (!L->contains(UI)) {
+ reportVectorizationFailure(
+ "Auto-vectorization of loops with uncountable "
+ "early exit and live-outs is not yet supported",
+ "Auto-vectorization of loop with uncountable "
+ "early exit and live-outs is not yet supported",
+ "UncountableEarlyExitLoopLiveOutsUnsupported", ORE, L);
+ return false;
+ }
+ }
+ }
+ }
}
// Entrance to the VPlan-native vectorization path. Outer loops are processed
@@ -9990,6 +10051,22 @@ bool LoopVectorizePass::processLoop(Loop *L) {
InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL.getLAI());
bool UseInterleaved = TTI->enableInterleavedAccessVectorization();
+ if (LVL.hasUncountableEarlyExit()) {
+ BasicBlock *LoopLatch = L->getLoopLatch();
+ if (IAI.requiresScalarEpilogue() ||
+ llvm::any_of(LVL.getCountableExitingBlocks(), [LoopLatch](BasicBlock *BB) {
+ return BB != LoopLatch;
+ })) {
+ reportVectorizationFailure("Auto-vectorization of early exit loops "
+ "requiring a scalar epilogue is unsupported",
+ "Auto-vectorization of early exit loops "
+ "requiring a scalar epilogue is unsupported",
+ "UncountableEarlyExitUnsupported", ORE,
+ L);
+ return false;
+ }
+ }
+
// If an override option has been passed in for interleaved accesses, use it.
if (EnableInterleavedMemAccesses.getNumOccurrences() > 0)
UseInterleaved = EnableInterleavedMemAccesses;
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 8b1a4aeb88f81f..63c04bbb11e505 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -870,15 +870,9 @@ VPlanPtr VPlan::createInitialVPlan(Type *InductionTy,
auto Plan = std::make_unique<VPlan>(Entry, VecPreheader, ScalarHeader);
// Create SCEV and VPValue for the trip count.
-
- // Currently only loops with countable exits are vectorized, but cal...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
18421f4
to
86cf8a1
Compare
PR llvm#112138 introduced initial support for dispatching to multiple exit blocks via split middle blocks. This patch fixes a few issues so that we can enable more tests to use the new enable-early-exit-vectorization flag. Fixes are: 1. The code to bail out for any loop live-out values happens too late. This is because collectUsersInExitBlocks ignores induction variables, which get dealt with in fixupIVUsers. I've moved the check much earlier in processLoop by looking for outside users of loop-defined values. 2. We shouldn't yet be interleaving when vectorising loops with uncountable early exits, since we've not added support for this yet. 3. Similarly, we also shouldn't be creating vector epilogues. 4. Similarly, we shouldn't enable tail-folding. 5. The existing implementation doesn't yet support loops that require scalar epilogues, although I plan to add that as part of PR llvm#88385. 6. The new split middle blocks weren't being added to the parent loop.
; RUN: opt -S < %s -p loop-vectorize -enable-early-exit-vectorization -enable-early-exit-vectorization \ | ||
; RUN: | FileCheck %s --check-prefix=MAY_FAULT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given there is only a single RUN line, why change the CHECK prefix? As it stands it's just making the change look larger than necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
"Auto-vectorization of loops with uncountable " | ||
"early exit and live-outs is not yet supported", | ||
"Auto-vectorization of loop with uncountable " | ||
"early exit and live-outs is not yet supported", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd just present the facts and drop the "yet".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
declare void @init_mem(ptr, i64); | ||
|
||
|
||
define void @early_exit_in_outer_loop1() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some minimum commentary to highlight the difference between the loop1 and loop2 variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
return false; | ||
} | ||
|
||
// Needed to prevent InnerLoopVectorizer::fixupIVUsers from crashing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on this comment. What scenario causes fixupIVUsers
to crash? I'm not after huge detail but if there is a specific property of a user that is not supported then it would be nice to mention it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
// We don't attempt to perform interleaving for early exit loops. | ||
if (Legal->hasUncountableEarlyExit()) | ||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add more details on why not? Is all that would be needed to deal with multiple parts in AnyOf & co?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's because we'd have to support multiple parts in AnyOf and I think we probably need to investigate the best approach and check that the generated code isn't horrible.
if (LVL.hasUncountableEarlyExit()) { | ||
BasicBlock *LoopLatch = L->getLoopLatch(); | ||
if (IAI.requiresScalarEpilogue() || | ||
llvm::any_of(LVL.getCountableExitingBlocks(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm::any_of(LVL.getCountableExitingBlocks(), | |
any_of(LVL.getCountableExitingBlocks(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
"Auto-vectorization of loops with uncountable " | ||
"early exit and live-outs is not yet supported", | ||
"Auto-vectorization of loop with uncountable " | ||
"early exit and live-outs is not yet supported", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding a variant that takes a single message instead of duplicating?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion! I've added one and I've started using it in this patch, but as a follow-on I'm happy to clean up any other cases in the vectoriser or legality code that take duplicated strings.
BasicBlock *BypassBlock = ILV.getAdditionalBypassBlock(); | ||
for (VPRecipeBase &R : *ExitVPBB) { | ||
fixReductionScalarResumeWhenVectorizingEpilog( | ||
&R, State, State.CFG.VPBB2IRBB[ExitVPBB], BypassBlock); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -4123,7 +4138,8 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) { | |||
// a bottom-test and a single exiting block. We'd have to handle the fact | |||
// that not every instruction executes on the last iteration. This will | |||
// require a lane mask which varies through the vector loop body. (TODO) | |||
if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) { | |||
if (Legal->hasUncountableEarlyExit() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? If there's an uncountable early exit, there won't be a single exiting block and the check below will be true (as there must be a latch)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, the latch should always be non-null.
@@ -4753,7 +4769,9 @@ bool LoopVectorizationPlanner::isCandidateForEpilogueVectorization( | |||
// Epilogue vectorization code has not been auditted to ensure it handles | |||
// non-latch exits properly. It may be fine, but it needs auditted and | |||
// tested. | |||
if (OrigLoop->getExitingBlock() != OrigLoop->getLoopLatch()) | |||
// TODO: Add support for loops with an early exit. | |||
if (Legal->hasUncountableEarlyExit() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? If there's an uncountable early exit, there won't be a single exiting block and the check below will be true (as there must be a latch)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, the latch should always be non-null.
for (BasicBlock *BB : L->blocks()) { | ||
for (Instruction &I : *BB) { | ||
for (User *U : I.users()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the concern only relates to induction variables can you make use of LoopVectorizationLegality::getInductionVars()
? If not and you need to walk across all the BasicBlocks, would it be sufficient to just iterate through a block's phi
instructions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit surprised to see those changes were needed given the existing checks in VPlan. I had a look to see why we were missing cases. Looks like we weren't properly handling exit phis with multiple operands properly, which should be fixable #120260
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an issue I believe I described in a previous vectoriser call and at the time I mentioned I had a fix for this already. I just haven't had chance to land the fix because it depended upon this patch landing first. In the spirit of landing small, incremental patches I decided it was best to run all the vectoriser tests with the new flag to get the code defended, then follow on with incremental fixes and additional functionality. This patch holds up several other patches and I would prefer not to make this patch dependent on #120260.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any concerns regarding #120260 or will the future patches rely on the IR based checks?
It is also possible to share multiple stacked PR to give a preview of what changes are in the pipeline (one option is to have a PR just include all commits or do stacked PRs via user branches https://discourse.llvm.org/t/update-on-github-pull-requests/71540/146?u=fhahn)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sounds good to me, I guess we can remove the extra checks again as part of #120260 |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/12513 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/10908 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/65/builds/9553 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/10400 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/10400 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/10531 Here is the relevant piece of the build log for the reference
|
Hi, looks like a merge issue as the tests were passing before I merged. I'll revert the patch and reapply. |
Looks like the test updates were not in sync with current |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/14787 Here is the relevant piece of the build log for the reference
|
@david-arm it looks like the patch wasn't rebased in a while, and some of the check lines were stale and just needed a simple update. The only way to avoid this currently is updating the branch before merging and waiting for the precommit tests to complete again, as otherwise it won't catch failures due to changes on |
That's fast @fhahn - thank you! I was just about to look into it. I do normally rebase downstream and run |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/15495 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/11593 Here is the relevant piece of the build log for the reference
|
@david-arm just noticed it failing locally. Pressing the |
Well anyway, apologies for my silly mistake. I seem to remember in the past there were times I rebased and it didn't trigger a new run, but maybe I was doing something wrong. Even so, I "normally" prefer to be paranoid and run "make check-all" downstream, except for today when I didn't. :) |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/6556 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/3370 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/8496 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/17922 Here is the relevant piece of the build log for the reference
|
This ensures that all blocks created during VPlan execution are properly added to an enclosing loop, if present. Split off from #108378 and also needed once more of the skeleton blocks are created directly via VPlan. This also allows removing the custom logic for early-exit loop vectorization added as part of #117008.
…xecute (NFCI). This ensures that all blocks created during VPlan execution are properly added to an enclosing loop, if present. Split off from llvm/llvm-project#108378 and also needed once more of the skeleton blocks are created directly via VPlan. This also allows removing the custom logic for early-exit loop vectorization added as part of llvm/llvm-project#117008.
PR #112138 introduced initial support for dispatching to
multiple exit blocks via split middle blocks. This patch
fixes a few issues so that we can enable more tests to use
the new enable-early-exit-vectorization flag. Fixes are:
too late. This is because collectUsersInExitBlocks ignores
induction variables, which get dealt with in fixupIVUsers.
I've moved the check much earlier in processLoop by looking
for outside users of loop-defined values.
with uncountable early exits, since we've not added support
for this yet.
that require scalar epilogues, although I plan to add that
as part of PR [LoopVectorize] Add support for vectorisation of more early exit loops #88385.
parent loop.