Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. #109975

Merged
merged 10 commits into from
Oct 31, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Sep 25, 2024

Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block.

@llvmbot
Copy link
Member

llvmbot commented Sep 25, 2024

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block. This can either be included in this PR or in follow-ups as needed.


Full diff: https://github.com/llvm/llvm-project/pull/109975.diff

8 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+8-5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+15-23)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (-53)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+4-30)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+1-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp (+6-6)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp (-14)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll (+18)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5e4f33c55610f1..4305750eda26fc 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2956,10 +2956,6 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
                    IVEndValues[Entry.first], LoopMiddleBlock, Plan, State);
   }
 
-  // Fix live-out phis not already fixed earlier.
-  for (const auto &KV : Plan.getLiveOuts())
-    KV.second->fixPhi(Plan, State);
-
   for (Instruction *PI : PredicatedInstructions)
     sinkScalarOperands(&*PI);
 
@@ -8816,7 +8812,14 @@ static void addLiveOutsForFirstOrderRecurrences(
         VPInstruction::ResumePhi, {Resume, FOR->getStartValue()}, {},
         "scalar.recur.init");
     auto *FORPhi = cast<PHINode>(FOR->getUnderlyingInstr());
-    Plan.addLiveOut(FORPhi, ResumePhiRecipe);
+    for (VPRecipeBase &R :
+         *cast<VPIRBasicBlock>(ScalarPHVPBB->getSingleSuccessor())) {
+      auto *IRI = cast<VPIRInstruction>(&R);
+      if (&IRI->getInstruction() == FORPhi) {
+        IRI->addOperand(ResumePhiRecipe);
+        break;
+      }
+    }
 
     // Now update VPIRInstructions modeling LCSSA phis in the exit block.
     // Extract the penultimate value of the recurrence and use it as operand for
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 5e4d487261c6f0..e77fd8c5dca840 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -457,10 +457,17 @@ void VPIRBasicBlock::execute(VPTransformState *State) {
   State->Builder.SetInsertPoint(getIRBasicBlock()->getTerminator());
   executeRecipes(State, getIRBasicBlock());
   if (getSingleSuccessor()) {
-    assert(isa<UnreachableInst>(getIRBasicBlock()->getTerminator()));
-    auto *Br = State->Builder.CreateBr(getIRBasicBlock());
-    Br->setOperand(0, nullptr);
-    getIRBasicBlock()->getTerminator()->eraseFromParent();
+    auto *SuccVPIRBB = dyn_cast<VPIRBasicBlock>(getSingleSuccessor());
+    if (SuccVPIRBB && SuccVPIRBB->getIRBasicBlock() ==
+                          getIRBasicBlock()->getSingleSuccessor()) {
+      cast<BranchInst>(getIRBasicBlock()->getTerminator())
+          ->setOperand(0, nullptr);
+    } else {
+      assert(isa<UnreachableInst>(getIRBasicBlock()->getTerminator()));
+      auto *Br = State->Builder.CreateBr(getIRBasicBlock());
+      Br->setOperand(0, nullptr);
+      getIRBasicBlock()->getTerminator()->eraseFromParent();
+    }
   }
 
   for (VPBlockBase *PredVPBlock : getHierarchicalPredecessors()) {
@@ -844,10 +851,6 @@ void VPRegionBlock::print(raw_ostream &O, const Twine &Indent,
 #endif
 
 VPlan::~VPlan() {
-  for (auto &KV : LiveOuts)
-    delete KV.second;
-  LiveOuts.clear();
-
   if (Entry) {
     VPValue DummyValue;
     for (VPBlockBase *Block : vp_depth_first_shallow(Entry))
@@ -902,6 +905,8 @@ VPlanPtr VPlan::createInitialVPlan(Type *InductionTy,
   VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
 
   VPBasicBlock *ScalarPH = new VPBasicBlock("scalar.ph");
+  VPBasicBlock *ScalarHeader = createVPIRBasicBlockFor(TheLoop->getHeader());
+  VPBlockUtils::connectBlocks(ScalarPH, ScalarHeader);
   if (!RequiresScalarEpilogueCheck) {
     VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
     return Plan;
@@ -1051,6 +1056,8 @@ void VPlan::execute(VPTransformState *State) {
   BrInst->insertBefore(MiddleBB->getTerminator());
   MiddleBB->getTerminator()->eraseFromParent();
   State->CFG.DTU.applyUpdates({{DominatorTree::Delete, MiddleBB, ScalarPh}});
+  State->CFG.DTU.applyUpdates(
+      {{DominatorTree::Delete, ScalarPh, ScalarPh->getSingleSuccessor()}});
 
   // Generate code in the loop pre-header and body.
   for (VPBlockBase *Block : vp_depth_first_shallow(Entry))
@@ -1169,12 +1176,6 @@ void VPlan::print(raw_ostream &O) const {
     Block->print(O, "", SlotTracker);
   }
 
-  if (!LiveOuts.empty())
-    O << "\n";
-  for (const auto &KV : LiveOuts) {
-    KV.second->print(O, SlotTracker);
-  }
-
   O << "}\n";
 }
 
@@ -1211,11 +1212,6 @@ LLVM_DUMP_METHOD
 void VPlan::dump() const { print(dbgs()); }
 #endif
 
-void VPlan::addLiveOut(PHINode *PN, VPValue *V) {
-  assert(LiveOuts.count(PN) == 0 && "an exit value for PN already exists");
-  LiveOuts.insert({PN, new VPLiveOut(PN, V)});
-}
-
 static void remapOperands(VPBlockBase *Entry, VPBlockBase *NewEntry,
                           DenseMap<VPValue *, VPValue *> &Old2NewVPValues) {
   // Update the operands of all cloned recipes starting at NewEntry. This
@@ -1283,10 +1279,6 @@ VPlan *VPlan::duplicate() {
   remapOperands(Preheader, NewPreheader, Old2NewVPValues);
   remapOperands(Entry, NewEntry, Old2NewVPValues);
 
-  // Clone live-outs.
-  for (const auto &[_, LO] : LiveOuts)
-    NewPlan->addLiveOut(LO->getPhi(), Old2NewVPValues[LO->getOperand(0)]);
-
   // Initialize remaining fields of cloned VPlan.
   NewPlan->VFs = VFs;
   NewPlan->UFs = UFs;
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index c886a39aec76e5..e2caf3dcdfd4af 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -671,48 +671,6 @@ class VPBlockBase {
   virtual VPBlockBase *clone() = 0;
 };
 
-/// A value that is used outside the VPlan. The operand of the user needs to be
-/// added to the associated phi node. The incoming block from VPlan is
-/// determined by where the VPValue is defined: if it is defined by a recipe
-/// outside a region, its parent block is used, otherwise the middle block is
-/// used.
-class VPLiveOut : public VPUser {
-  PHINode *Phi;
-
-public:
-  VPLiveOut(PHINode *Phi, VPValue *Op)
-      : VPUser({Op}, VPUser::VPUserID::LiveOut), Phi(Phi) {}
-
-  static inline bool classof(const VPUser *U) {
-    return U->getVPUserID() == VPUser::VPUserID::LiveOut;
-  }
-
-  /// Fix the wrapped phi node. This means adding an incoming value to exit
-  /// block phi's from the vector loop via middle block (values from scalar loop
-  /// already reach these phi's), and updating the value to scalar header phi's
-  /// from the scalar preheader.
-  void fixPhi(VPlan &Plan, VPTransformState &State);
-
-  /// Returns true if the VPLiveOut uses scalars of operand \p Op.
-  bool usesScalars(const VPValue *Op) const override {
-    assert(is_contained(operands(), Op) &&
-           "Op must be an operand of the recipe");
-    return true;
-  }
-
-  PHINode *getPhi() const { return Phi; }
-
-  /// Live-outs are marked as only using the first part during the transition
-  /// to unrolling directly on VPlan.
-  /// TODO: Remove after unroller transition.
-  bool onlyFirstPartUsed(const VPValue *Op) const override { return true; }
-
-#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-  /// Print the VPLiveOut to \p O.
-  void print(raw_ostream &O, VPSlotTracker &SlotTracker) const;
-#endif
-};
-
 /// Struct to hold various analysis needed for cost computations.
 struct VPCostContext {
   const TargetTransformInfo &TTI;
@@ -3454,11 +3412,6 @@ class VPlan {
   /// definitions are VPValues that hold a pointer to their underlying IR.
   SmallVector<VPValue *, 16> VPLiveInsToFree;
 
-  /// Values used outside the plan. It contains live-outs that need fixing. Any
-  /// live-out that is fixed outside VPlan needs to be removed. The remaining
-  /// live-outs are fixed via VPLiveOut::fixPhi.
-  MapVector<PHINode *, VPLiveOut *> LiveOuts;
-
   /// Mapping from SCEVs to the VPValues representing their expansions.
   /// NOTE: This mapping is temporary and will be removed once all users have
   /// been modeled in VPlan directly.
@@ -3638,12 +3591,6 @@ class VPlan {
     return cast<VPCanonicalIVPHIRecipe>(&*EntryVPBB->begin());
   }
 
-  void addLiveOut(PHINode *PN, VPValue *V);
-
-  const MapVector<PHINode *, VPLiveOut *> &getLiveOuts() const {
-    return LiveOuts;
-  }
-
   VPValue *getSCEVExpansion(const SCEV *S) const {
     return SCEVToExpansion.lookup(S);
   }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index f33293e65010f9..2ba659fb850852 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -197,35 +197,6 @@ bool VPRecipeBase::mayHaveSideEffects() const {
   }
 }
 
-void VPLiveOut::fixPhi(VPlan &Plan, VPTransformState &State) {
-  VPValue *ExitValue = getOperand(0);
-  VPBasicBlock *MiddleVPBB =
-      cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSingleSuccessor());
-  VPRecipeBase *ExitingRecipe = ExitValue->getDefiningRecipe();
-  auto *ExitingVPBB = ExitingRecipe ? ExitingRecipe->getParent() : nullptr;
-  // Values leaving the vector loop reach live out phi's in the exiting block
-  // via middle block.
-  auto *PredVPBB = !ExitingVPBB || ExitingVPBB->getEnclosingLoopRegion()
-                       ? MiddleVPBB
-                       : ExitingVPBB;
-  BasicBlock *PredBB = State.CFG.VPBB2IRBB[PredVPBB];
-  Value *V = State.get(ExitValue, VPIteration(0, 0));
-  if (Phi->getBasicBlockIndex(PredBB) != -1)
-    Phi->setIncomingValueForBlock(PredBB, V);
-  else
-    Phi->addIncoming(V, PredBB);
-}
-
-#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-void VPLiveOut::print(raw_ostream &O, VPSlotTracker &SlotTracker) const {
-  O << "Live-out ";
-  getPhi()->printAsOperand(O);
-  O << " = ";
-  getOperand(0)->printAsOperand(O, SlotTracker);
-  O << "\n";
-}
-#endif
-
 void VPRecipeBase::insertBefore(VPRecipeBase *InsertPos) {
   assert(!Parent && "Recipe already in some VPBasicBlock");
   assert(InsertPos->getParent() &&
@@ -867,7 +838,10 @@ void VPIRInstruction::execute(VPTransformState &State) {
     State.Builder.SetInsertPoint(PredBB, PredBB->getFirstNonPHIIt());
     Value *V = State.get(ExitValue, VPIteration(0, Lane));
     auto *Phi = cast<PHINode>(&I);
-    Phi->addIncoming(V, PredBB);
+    if (Phi->getBasicBlockIndex(PredBB) == -1)
+      Phi->addIncoming(V, PredBB);
+    else
+      Phi->setIncomingValueForBlock(PredBB, V);
   }
 
   // Advance the insert point after the wrapped IR instruction. This allows
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 3b37a1ec9560ee..00f12bf915e0d8 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -378,7 +378,7 @@ static bool mergeBlocksIntoPredecessors(VPlan &Plan) {
     // Don't fold the exit block of the Plan into its single predecessor for
     // now.
     // TODO: Remove restriction once more of the skeleton is modeled in VPlan.
-    if (VPBB->getNumSuccessors() == 0 && !VPBB->getParent())
+    if (!VPBB->getParent())
       continue;
     auto *PredVPBB =
         dyn_cast_or_null<VPBasicBlock>(VPBB->getSinglePredecessor());
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
index 4907d3f0397274..17a4cbd36bd30d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -265,6 +265,7 @@ void UnrollState::unrollRecipeByUF(VPRecipeBase &R) {
 
   if (auto *VPI = dyn_cast<VPInstruction>(&R)) {
     VPValue *Op0, *Op1;
+
     if (match(VPI, m_VPInstruction<VPInstruction::ExtractFromEnd>(
                        m_VPValue(Op0), m_VPValue(Op1)))) {
       VPI->setOperand(1, getValueForPart(Op1, UF - 1));
@@ -281,6 +282,11 @@ void UnrollState::unrollRecipeByUF(VPRecipeBase &R) {
       }
       return;
     }
+    if (match(VPI, m_VPInstruction<VPInstruction::ResumePhi>(m_VPValue(Op0),
+                                                             m_VPValue(Op1)))) {
+      addUniformForAllParts(VPI);
+      return;
+    }
 
     if (vputils::onlyFirstPartUsed(VPI)) {
       addUniformForAllParts(VPI);
@@ -467,11 +473,5 @@ void VPlanTransforms::unrollByUF(VPlan &Plan, unsigned UF, LLVMContext &Ctx) {
     Part++;
   }
 
-  // Remap the operand of live-outs to the last part.
-  for (const auto &[_, LO] : Plan.getLiveOuts()) {
-    VPValue *In = Unroller.getValueForPart(LO->getOperand(0), UF - 1);
-    LO->setOperand(0, In);
-  }
-
   VPlanTransforms::removeDeadRecipes(Plan);
 }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
index 99bc4c38a3c3cd..4badf295092827 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
@@ -244,14 +244,6 @@ bool VPlanVerifier::verifyVPBasicBlock(const VPBasicBlock *VPBB) {
     return false;
   }
 
-  VPBlockBase *MiddleBB =
-      IRBB->getPlan()->getVectorLoopRegion()->getSingleSuccessor();
-  if (IRBB != IRBB->getPlan()->getPreheader() &&
-      IRBB->getSinglePredecessor() != MiddleBB) {
-    errs() << "VPIRBasicBlock can only be used as pre-header or a successor of "
-              "middle-block at the moment!\n";
-    return false;
-  }
   return true;
 }
 
@@ -416,12 +408,6 @@ bool VPlanVerifier::verify(const VPlan &Plan) {
     return false;
   }
 
-  for (const auto &KV : Plan.getLiveOuts())
-    if (KV.second->getNumOperands() != 1) {
-      errs() << "live outs must have a single operand\n";
-      return false;
-    }
-
   return true;
 }
 
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll b/llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll
index 0f3cd9d4ca4d61..2dddf766cb9cda 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll
@@ -1077,6 +1077,17 @@ define void @merge_with_dead_gep_between_regions(i32 %n, ptr noalias %src, ptr n
 ; CHECK-NEXT: No successors
 ; CHECK-EMPTY:
 ; CHECK-NEXT: scalar.ph
+; CHECK-NEXT: Successor(s): ir-bb<loop>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<loop>:
+; CHECK-NEXT:   IR   %iv = phi i32 [ %n, %entry ], [ %iv.next, %loop ]
+; CHECK-NEXT:   IR   %iv.next = add nsw i32 %iv, -1
+; CHECK-NEXT:   IR   %gep.src = getelementptr inbounds i32, ptr %src, i32 %iv
+; CHECK-NEXT:   IR   %l = load i32, ptr %gep.src, align 16
+; CHECK-NEXT:   IR   %dead_gep = getelementptr inbounds i32, ptr %dst, i64 1
+; CHECK-NEXT:   IR   %gep.dst = getelementptr inbounds i32, ptr %dst, i32 %iv
+; CHECK-NEXT:   IR   store i32 %l, ptr %gep.dst, align 16
+; CHECK-NEXT:   IR   %ec = icmp eq i32 %iv.next, 0
 ; CHECK-NEXT: No successors
 ; CHECK-NEXT: }
 ;
@@ -1156,6 +1167,13 @@ define void @ptr_induction_remove_dead_recipe(ptr %start, ptr %end) {
 ; CHECK-NEXT: No successors
 ; CHECK-EMPTY:
 ; CHECK-NEXT: scalar.ph:
+; CHECK-NEXT: Successor(s): ir-bb<loop.header>
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<loop.header>:
+; CHECK-NEXT:   IR   %ptr.iv = phi ptr [ %start, %entry ], [ %ptr.iv.next, %loop.latch ]
+; CHECK-NEXT:   IR   %ptr.iv.next = getelementptr inbounds i8, ptr %ptr.iv, i64 -1
+; CHECK-NEXT:   IR   %l = load i8, ptr %ptr.iv.next, align 1
+; CHECK-NEXT:   IR   %c.1 = icmp eq i8 %l, 0
 ; CHECK-NEXT: No successors
 ; CHECK-NEXT: }
 ;

@fhahn fhahn force-pushed the vplan-remove-liveout branch from 39f0f73 to d0e7dae Compare September 26, 2024 19:27
@david-arm
Copy link
Contributor

I just noticed this patch today. I have a patch to enable vectorisation of more early exit loops that require adding live outs from the early exiting block - #88385. I hadn't realised that VPLiveOuts were going to be removed from the vplan.

@fhahn If possible, can you point me at the patch or code that shows how we model outside uses of loop values in the exit block? I will need to update PR #88385 to ensure we generate the correct final value in the early exit block.

@david-arm
Copy link
Contributor

Also, is there a document somewhere that highlights (even at a high level) some of the refactoring work like this that is planned in the next few months? Just because I'm also actively working in this area. Or if not, is it possible to tag me as a reviewer or subscriber for other patches in this area? Thanks!

fhahn added a commit to fhahn/llvm-project that referenced this pull request Sep 30, 2024
Updated ILV.crateInductionResumeValues to directly update the
VPIRInstructiosn wrapping the original phis with the created resume
values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Builds on top of llvm#109975, which
is included in this PR.
@fhahn
Copy link
Contributor Author

fhahn commented Oct 1, 2024

I just noticed this patch today. I have a patch to enable vectorisation of more early exit loops that require adding live outs from the early exiting block - #88385. I hadn't realised that VPLiveOuts were going to be removed from the vplan.

@fhahn If possible, can you point me at the patch or code that shows how we model outside uses of loop values in the exit block? I will need to update PR #88385 to ensure we generate the correct final value in the early exit block.

I think we should already have most of the pieces needed to model the exit values in the early exit the same way as for the regular exits after introducing VPIRBasicBlocks and VPIRInstructions. #100735 uses those for the regular exit phis.

It should be possible create a VPIRBasicBlock for the early exit block in a similar fashion and add operands to the VPIRInstructions of the phis that need updating.

A few small changes may be needed to support VPIRBBs and VPIRInst phis with multiple predecessors, but hopefully something like 3cfe25b should be enough.

@fhahn
Copy link
Contributor Author

fhahn commented Oct 1, 2024

Or if not, is it possible to tag me as a reviewer or subscriber for other patches in this area? Thanks!

Probably the best way would be to join the subscriber team for the vectorizer label: https://github.com/orgs/llvm/teams/pr-subscribers-vectorizers

@david-arm
Copy link
Contributor

I just noticed this patch today. I have a patch to enable vectorisation of more early exit loops that require adding live outs from the early exiting block - #88385. I hadn't realised that VPLiveOuts were going to be removed from the vplan.
@fhahn If possible, can you point me at the patch or code that shows how we model outside uses of loop values in the exit block? I will need to update PR #88385 to ensure we generate the correct final value in the early exit block.

I think we should already have most of the pieces needed to model the exit values in the early exit the same way as for the regular exits after introducing VPIRBasicBlocks and VPIRInstructions. #100735 uses those for the regular exit phis.

It should be possible create a VPIRBasicBlock for the early exit block in a similar fashion and add operands to the VPIRInstructions of the phis that need updating.

A few small changes may be needed to support VPIRBBs and VPIRInst phis with multiple predecessors, but hopefully something like 3cfe25b should be enough.

OK thank you for the info - I'll take a look!

@david-arm
Copy link
Contributor

Or if not, is it possible to tag me as a reviewer or subscriber for other patches in this area? Thanks!

Probably the best way would be to join the subscriber team for the vectorizer label: https://github.com/orgs/llvm/teams/pr-subscribers-vectorizers

Yeah, I remember when we first moved to github I added myself to that list and I didn't seem to get any notifications of vectoriser patches, despite it being pretty obvious that patches were being created. So I took the alternative approach of saving a link to a github search query that shows all open vectoriser patches. Maybe it has started working now?

@fhahn
Copy link
Contributor Author

fhahn commented Oct 2, 2024

I just noticed this patch today. I have a patch to enable vectorisation of more early exit loops that require adding live outs from the early exiting block - #88385. I hadn't realised that VPLiveOuts were going to be removed from the vplan.
@fhahn If possible, can you point me at the patch or code that shows how we model outside uses of loop values in the exit block? I will need to update PR #88385 to ensure we generate the correct final value in the early exit block.

I think we should already have most of the pieces needed to model the exit values in the early exit the same way as for the regular exits after introducing VPIRBasicBlocks and VPIRInstructions. #100735 uses those for the regular exit phis.
It should be possible create a VPIRBasicBlock for the early exit block in a similar fashion and add operands to the VPIRInstructions of the phis that need updating.
A few small changes may be needed to support VPIRBBs and VPIRInst phis with multiple predecessors, but hopefully something like 3cfe25b should be enough.

OK thank you for the info - I'll take a look!

https://github.com/llvm/llvm-project/pull/109193/files#diff-c33c5be02bdbedc5499230934c8fc873429b771425f21c3f594acb6fae6339d7 should show how the VPlans would look like

@fhahn fhahn force-pushed the vplan-remove-liveout branch from d0e7dae to f81a43c Compare October 2, 2024 15:09
@david-arm
Copy link
Contributor

Hi @fhahn, so I tried changing #88385 to use the new way of dealing with live-outs and not rely upon VPLiveOut in a similar way to how we deal with normal exits. However, I came across a fundamental ordering constraint that needs dealing with. The problem occurs when the exit block from the latch is the same as the exit from the early exiting block. In this case the vectorised CFG would look something like this:

vector.loop:
  ...
  br i1, label %vector.early.exit, label %vector.loop.latch

vector.loop.latch:
  br i1, label %middle.block, label %vector.loop

vector.early.exit:
  ...
  br label %loop.exit

middle.block:
  ...
  br i1 %finished, label %loop.exit, ...

... original scalar loop sits here ...

VPIRBasicBlock(loop.exit):
  VPIRInstruction(%tea = phi i32 [ ..., %loop ], [ ..., %loop.latch ] (extra operands ...))

Using the example implementation you mentioned in 3cfe25b (thank you for this!) the order of the predecessors in VPIRBasicBlock(loop.exit) by definition has to match the same order in which the extra operands were added to VPIRInstruction(%tea) when collecting/adding users in the exit block during VPlan creation. However, VPlan::execute breaks this assumption immediately when rewiring the middle block from a VPBasicBlock to a VPIRBasicBlock. In this case I have two choices that I know of:

  1. In replaceVPBBWithIRVPBB ensure that the middle block stays in the same position in the predecessor list of any successor. I imagine it's possible, but it does feel fragile.
  2. Add a map to VPIRInstruction that maps the additional operand index with the incoming block so that when rewiring the middle block we can ensure the map stays correct. This removes any fragile ordering dependency. Then I'd adapt VPIRInstruction::execute in 3cfe25b to get the incoming block by looking the index up in the map.

I'll keep playing around with this a bit more and see what works best!

fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 6, 2024
Updated ILV.crateInductionResumeValues to directly update the
VPIRInstructiosn wrapping the original phis with the created resume
values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Builds on top of llvm#109975, which
is included in this PR.
Copy link
Contributor

@david-arm david-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fhahn, whilst the PR looks fine in isolation I'd rather it not land until we've had a chance to discuss its impact on PR #88385 if that's ok? Thanks!

@fhahn
Copy link
Contributor Author

fhahn commented Oct 8, 2024

Sure. Curious if you hit any limitations with VPIRInstruction that prevent it from replacing VPLiveOut?

@david-arm
Copy link
Contributor

Sure. Curious if you hit any limitations with VPIRInstruction that prevent it from replacing VPLiveOut?

Yeah, so in my comment earlier last week I highlighted the problem of ordering constraints. If the same exit block is used by both the latch and the early exit, then the phi in the exit block should have an incoming value for both of the exiting blocks. If I add operands to the VPIRInstruction wrapper - one for each exiting block - then in the VPIRInstruction::execute code using a patch you sketched out in 3cfe25b we introduce a requirement that the order of the extra operands must match the order of the predecessors. replaceVPBBWithIRVPBB breaks this by reordering them, hence I put up a patch yesterday to preempt this - #111514. Perhaps it's fine to introduce this ordering constraint, but I wonder if this makes the code fragile and prone to breaking every time we do some CFG optimisation? Secondly, the phi may or may not reference an induction variable in the loop, which is handled separately to VPIRInstruction::execute. However, you still need to add some sort of dummy operand because the number of operands must match the number of predecessors. I have hacked this downstream by adding a static 'null' VPValue marker that we can skip over the operand in VPIRInstruction::execute.

Another approach would be to introduce a new class derived from VPIRInstruction, i.e. VPIRPhiInstruction that maintains a mapping between the extra operand and the incoming block. I haven't tried this second approach - this is potentially more flexible, but it is moving a bit further away from having a simple IR wrapper and is moving back towards something like VPLiveOut.

fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 10, 2024
Updated ILV.crateInductionResumeValues to directly update the
VPIRInstructiosn wrapping the original phis with the created resume
values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Builds on top of llvm#109975, which
is included in this PR.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 11, 2024
Updated ILV.crateInductionResumeValues to directly update the
VPIRInstructiosn wrapping the original phis with the created resume
values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Builds on top of llvm#109975, which
is included in this PR.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 11, 2024
Updated ILV.crateInductionResumeValues to directly update the
VPIRInstructiosn wrapping the original phis with the created resume
values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Builds on top of llvm#109975, which
is included in this PR.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 11, 2024
Updated ILV.crateInductionResumeValues to directly update the
VPIRInstructiosn wrapping the original phis with the created resume
values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Builds on top of llvm#109975, which
is included in this PR.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 12, 2024
Updated ILV.crateInductionResumeValues to directly update the
VPIRInstructiosn wrapping the original phis with the created resume
values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Builds on top of llvm#109975, which
is included in this PR.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 13, 2024
Updated ILV.crateInductionResumeValues to directly update the
VPIRInstructiosn wrapping the original phis with the created resume
values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Builds on top of llvm#109975, which
is included in this PR.
@fhahn fhahn merged commit b021464 into llvm:main Oct 31, 2024
3 of 5 checks passed
@fhahn fhahn deleted the vplan-remove-liveout branch October 31, 2024 20:36
@kstoimenov
Copy link
Contributor

@fhahn you might have introduced a memory leak in this patch detected by Sanitizer bots: https://lab.llvm.org/buildbot/#/builders/169/builds/4844/steps/10/logs/stdio

Please take a look.

@fhahn
Copy link
Contributor Author

fhahn commented Nov 1, 2024

Should be fixed in 4bc8bde0bc4b0dbace5545de5792be2138e9cf64, thanks!

@kstoimenov
Copy link
Contributor

Looks like there is still one more leak left after your fix: https://lab.llvm.org/buildbot/#/builders/169/builds/4871/steps/10/logs/stdio

@kstoimenov
Copy link
Contributor

FYI, @vitalybuka pushed this fix: 80b9f07

smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
…#109975)

Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.

PR: llvm#109975
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
…#109975)

Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.

PR: llvm#109975
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
Suggested in llvm#109975. This
makes the function consistent throughout.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
…#109975)

Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.

PR: llvm#109975
fhahn added a commit to fhahn/llvm-project that referenced this pull request Nov 4, 2024
Updated ILV.crateInductionResumeValues to directly update the
VPIRInstructiosn wrapping the original phis with the created resume
values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Builds on top of llvm#109975, which
is included in this PR.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Nov 4, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Nov 10, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Nov 10, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Nov 24, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit that referenced this pull request Dec 6, 2024
Updated ILV.createInductionResumeValues (now createInductionResumeVPValue)
to directly update the VPIRInstructions wrapping the original phis with the 
created resume values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Depends on #109975.

PR: #110577
fhahn added a commit to fhahn/llvm-project that referenced this pull request Dec 7, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Dec 7, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.
broxigarchen pushed a commit to broxigarchen/llvm-project that referenced this pull request Dec 10, 2024
Updated ILV.createInductionResumeValues (now createInductionResumeVPValue)
to directly update the VPIRInstructions wrapping the original phis with the 
created resume values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Depends on llvm#109975.

PR: llvm#110577
chrsmcgrr pushed a commit to RooflineAI/llvm-project that referenced this pull request Dec 12, 2024
Updated ILV.createInductionResumeValues (now createInductionResumeVPValue)
to directly update the VPIRInstructions wrapping the original phis with the 
created resume values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Depends on llvm#109975.

PR: llvm#110577
fhahn added a commit to fhahn/llvm-project that referenced this pull request Dec 15, 2024
Model updating IV users directly in VPlan, replace fixupIVUsers.

Depends on llvm#110004,
llvm#109975 and
llvm#112145.
TIFitis pushed a commit to TIFitis/llvm-project that referenced this pull request Dec 18, 2024
Updated ILV.createInductionResumeValues (now createInductionResumeVPValue)
to directly update the VPIRInstructions wrapping the original phis with the 
created resume values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Depends on llvm#109975.

PR: llvm#110577
fhahn added a commit that referenced this pull request Dec 29, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on #110004 and
#109975.

PR: #112145
fhahn added a commit to fhahn/llvm-project that referenced this pull request Dec 29, 2024
Model updating IV users directly in VPlan, replace fixupIVUsers.

Depends on llvm#110004,
llvm#109975 and
llvm#112145.
sunshaoce pushed a commit to sunshaoce/llvm-public that referenced this pull request Dec 30, 2024
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm#110004 and
llvm#109975.

PR: llvm#112145
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 5, 2025
Model updating IV users directly in VPlan, replace fixupIVUsers.

Depends on llvm#110004,
llvm#109975 and
llvm#112145.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 10, 2025
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on llvm/llvm-project#110004 and
llvm/llvm-project#109975.

PR: llvm/llvm-project#112145
fhahn added a commit that referenced this pull request Jan 18, 2025
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform 
(optimizeInductionExitUsers) replaces extracts of inductions with 
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them 
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.


Depends on #110004,
#109975 and
#112145.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 18, 2025
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.

Depends on llvm/llvm-project#110004,
llvm/llvm-project#109975 and
llvm/llvm-project#112145.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants