-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interleave loop pre-header with SWP prologue #236
base: aie-public
Are you sure you want to change the base?
Conversation
This makes the option available for any user of getMinTripCount()
@@ -29,7 +37,11 @@ const MDNode *getLoopID(const MachineBasicBlock &LoopBlock) { | |||
} | |||
|
|||
std::optional<int64_t> getMinTripCount(const MachineBasicBlock &LoopBlock) { | |||
return getMinTripCount(getLoopID(LoopBlock)); | |||
std::optional<int64_t> MinTripCount = getMinTripCount(getLoopID(LoopBlock)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes a lot of sense ;-)
@@ -0,0 +1,422 @@ | |||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever! I like TDD. It also helps to see from where you start and how the work progresses with the sequence of commits.
for (MachineBasicBlock *SuccBB : MBB.successors()) { | ||
auto &SBS = Sched.getInterBlock().getBlockState(SuccBB); | ||
assert(SBS.isScheduled()); | ||
if (SBS.getRegions().empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious about how rare this can be. If this can be frequent, we can take the successor's successor to reduce the pessimism.
Hi @gbossu, as we did before, we could integrate the 1st and 3rd commits before, as they are standalone changes. |
I'm actually tempted to drop the 3rd commit and do not have AIEMaxLatencyFinder/RegionEndEdges care about "timed"/"fixed" instructions. I think it makes sense if they keep focusing on successor MBBs instead. For dependencies between "fixed" and "free" instructions, I can add them in the |
Is this case, I think it makes sense, as we are introducing this new mutator! In this way we can also restrict a bit more the changes. |
} | ||
size_t getNumTopFixedInstructions() const { return NumTopFixedInstructions; } | ||
|
||
/// Iterate over the instructions that are fixed at the top. Typically those |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: at the bot
if (Zone.isTop()) | ||
return nullptr; | ||
const Region &Reg = InterBlock.getBlockState(CurMBB).getCurrentRegion(); | ||
unsigned NumEmitted = getNumEmittedInstrs(DAG, /*IsTop=*/false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const unsigned
Those iterators are left with garbage values when a function has multiple regions. This can confuse DAGMutators.
dd15264
to
d4c8472
Compare
@@ -543,11 +543,10 @@ class WAWEdges : public ScheduleDAGMutation { | |||
LiveRegs.init(*TRI); | |||
bool AddReservedRegs = true; | |||
if (Scheduler) { | |||
assert(!Scheduler->doMBBSchedRegionsTopDown()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHECK: this assert is specific for the prologue handling, and the code needs to be extended for epilogue handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, WAW handling could run on any region. Here I'm making sure that BS.getBottom()
actually represents the bottom region.
@@ -596,6 +596,10 @@ std::optional<unsigned> ScheduleDAGInstrs::initSUnit(MachineInstr &MI) { | |||
if (MI.isDebugOrPseudoInstr()) | |||
return {}; | |||
|
|||
// Ensure all instructions can be given a SUnit without re-allocation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking if we could do this in ScheduleDAGInstrs::enterRegion
to remove this fixed cost from here. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would also be fine, the reason I kept it here is to avoid distributing the logic about creating SUnits
in too many places. But I'm open to discussion :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martien-de-jong Would you find this better if I do the initialization in ScheduleDAGInstrs::enterRegion
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so. I was happy that initSUnit didn't need to care. At the time I hadn't thought about adding SUnits after edge creation. If we can pre-reserve the total number of sunits with some interface I'm fine.
auto Region = BS.getCurrentRegion(); | ||
auto BottomRegion = BS.getBottom(); | ||
if (*Region.begin() == *BottomRegion.begin()) { | ||
if (&BS.getCurrentRegion() == &BS.getBottom()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHECK: this a relic of the pre-GatheringRegions era
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, the DAGMutator can run for all types of regions. For non-loop blocks, we do not pre-gather regions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. now you have me worried about reallocation of the region array. pre-gather would imply pre-allocation, making the test 'obviously correct'
|
||
// Verify that all fixed instructions are at the right place in the MBB | ||
assert(TopFixedBundles.empty() || Begin == BB->begin()); | ||
assert(TopFixedBundles.empty() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it. && at top level in an assert should be split in different asserts.
return getBundleStart(MI->getIterator()) == FreeEnd; | ||
})); | ||
|
||
for (auto It = FreeBegin; It != FreeEnd; ++It) { | ||
SemanticOrder.push_back(&*It); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention whether and where fixed instructions are added to the semantic order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. They aren't. And I know why.
@@ -441,6 +461,12 @@ ScheduleHazardRecognizer::HazardType AIEHazardRecognizer::getHazardType( | |||
iterator_range<const MachineOperand *> MIOperands, | |||
const MachineRegisterInfo &MRI, int DeltaCycles) const { | |||
const unsigned SchedClass = TII->getSchedClass(Desc, MIOperands, MRI); | |||
if (!IsPreRA && SchedClass == 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This verification could be refactored out to a helper function like check...
or validate...
, that can be used also below (same code).
} | ||
size_t size() const { return SemanticOrder.size(); } | ||
ArrayRef<MachineBundle> getBotFixedBundles() const { return BotFixedBundles; } | ||
size_t getNumBotFixedInstructions() const { return BotFixedBundles.size(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: NumBotFixedBundles to be absolutely clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually thinking of deleting that function. One can just do Region.getBotFixedBundles().size()
. Previously it was harder because I only had iterators, and std::distance
isn't O(1)
@@ -318,6 +305,49 @@ class RegionEndEdges : public ScheduleDAGMutation { | |||
}; | |||
}; | |||
|
|||
class EmitFixedSUnits : public ScheduleDAGMutation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could probably add a top-level comment for that mutator
@@ -318,6 +305,49 @@ class RegionEndEdges : public ScheduleDAGMutation { | |||
}; | |||
}; | |||
|
|||
class EmitFixedSUnits : public ScheduleDAGMutation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for the future, we could have a high-level description of the mutator.
if (auto BundleStart = getBundleStart(Instr->getIterator()); | ||
BundleStart->isBundle()) { | ||
assert(BundleStart != Instr->getIterator()); | ||
BundleStart->eraseFromBundle(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering when this occurs, and whether we should not remove BUNDLEs as a precondition to calling this. Apart from that, I wonder whether there's anybody caring about the format order apart from the MClayer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This happens when the prologue has VLIW instructions. They get materialized as
BUNDLE
. I could separate that "debundling" in a helper function if you want. - The ordering is mostly for stable ref updates at this point. Otherwise slight changes to the
tryCandidate
heuristics can be annoying, because they would change the order of instructions within bundles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. ref order is a good reason. Maybe my main confusion is that this function should live somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should. But I blame AIE1 ;)
assert(!(IsTop && FirstBotFixedSU) && "Top-fixed SUnits must be added first"); | ||
assert(DAG->SUnits.size() < DAG->SUnits.capacity() && | ||
"SUnits need to be re-allocated."); | ||
unsigned SUNum = DAG->initSUnit(MI).value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ignoring has_value() here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be an error if the value is missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we don't like asserts? :-b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that I would get to the same behaviour: the error would be caught and not silently ignored.
SU.NodeNum < FirstBotFixedSU.value_or(DAG->SUnits.size()); | ||
} | ||
return FirstBotFixedSU && SU.NodeNum >= *FirstBotFixedSU && | ||
SU.NodeNum <= LastBotFixedSU.value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: SU.NodeNum <= *LastBotFixedSU
(to be symmetric with the previous case) or FirstBotFixedSU.value()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is on purpose to have dynamic checks on LastBotFixedSU
(and have a runtime failure if there is no value). For FirstBotFixedSU
, I previously checked has_value()
, so there is no need for checking again.
Fixed instructions will be handled specifically by the scheduler. They cannot be reordered or inserted in different cycles. Note this commit is just an "interface change". The list of "fixed" instructions is still empty at the moment, and SWP prologues/epilogues are inserted with a specific logic in commitBlockSchedule().
This updates the scheduler code base so it can handle SUnits that are BUNDLE. This particularly focuses on/ - Properly emitting BUNDLEs in the scoreboard - Rebundling instructions that are in the same cycle. The commit also adds a bunch of asserts to verify "hidden" pre-conditions. I had to add some Itineraries in AIE1/AIE2 to make sure those conditions hold.
This means they will be placed by the scheduler, and once they are placed, other "free" instructions can be bundled with them.
3fc8e6b
to
171faec
Compare
/// Those should not be re-ordered by the scheduler. | ||
ArrayRef<MachineBundle> TopFixedBundles; | ||
|
||
/// Instructions that are already scheduled at the bottom, e.g. an swp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: a swp prologue.
Any more comments @andcarminati @martien-de-jong ? |
PatchCycles >= InsertedCycles ? PatchCycles - InsertedCycles : 0; | ||
} | ||
EdgeLatency = std::max(EdgeLatency, PatchCycles); | ||
const unsigned ZOLDistance = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the future: this magic 8 could be retrieved from TII as well.
CurRegion.getBotFixedBundles(); | ||
for (SUnit &FreeSU : make_filter_range(DAG->SUnits, IsFreeSU)) { | ||
const MachineInstr &MI = *FreeSU.getInstr(); | ||
MachineInstr *FixedDepMI = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit:
To help readability, we could extend with overloaded cast operators:
struct InstrAndCycle {
MachineInstr *MI = nullptr;
int Cycle;
+ operator MachineInstr *() const { return MI; }
+ operator int() const { return Cycle; }
};
Then we can just use:
MachineInstr *FixedDepMI = AIE::findEarliestRef(MI, BotFixedBundles, BotFixedBundles.size());
And also:
Earliest = findEarliestRef(MI, TopBundles, Earliest);
|
||
// Keep track of BUNDLE instructions. They need to be cleaned up when | ||
// de-bundling before re-bundling. See applyBundles() | ||
if (Instr->getOpcode() == TargetOpcode::BUNDLE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: Instr->isBundle()
unsigned SUNum = DAG->initSUnit(MI).value(); | ||
SUnit &SU = DAG->SUnits[SUNum]; | ||
|
||
if (IsTop) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is really smart that you are preparing the way for epilogue merging (in some small details) ;-)
Hi @gbossu, I finished my review and left few suggestions. Nice piece of engineering that will make a lot more difference for the post-swp as we increase its use. |
Better review commit by commit, a lot of them are NFC and prepare for the final commit which enables interleaving.
QoR shows only improvements. Those will get bigger as we increase the scope of the post-pipeliner.