Add a pass to remove temporary regs #8732
Conversation
This also works, although maybe reducing load width could make other optimizations (like load combining) harder... in some ways we might even want to expand load widths when it's masked anyway.

```cpp
bool ReduceLoads(const IRWriter &in, IRWriter &out) {
	for (u32 value : in.GetConstants()) {
		out.AddConstant(value);
	}

	// This tells us to skip an AND op that has been optimized out.
	// Maybe we could skip multiple, but that'd slow things down and is pretty uncommon.
	int nextSkip = -1;

	bool logBlocks = false;
	for (int i = 0, n = (int)in.GetInstructions().size(); i < n; i++) {
		IRInst inst = in.GetInstructions()[i];

		if (inst.op == IROp::Load32 || inst.op == IROp::Load16 || inst.op == IROp::Load16Ext) {
			int dest = IRDestGPR(inst);
			for (int j = i + 1; j < n; j++) {
				const IRInst &laterInst = in.GetInstructions()[j];
				const IRMeta *m = GetIRMeta(laterInst.op);

				if ((m->flags & IRFLAG_EXIT) != 0) {
					// Exit, so we can't do the optimization.
					break;
				}
				if (IRReadsFromGPR(laterInst, dest)) {
					if (IRDestGPR(laterInst) == dest && laterInst.op == IROp::AndConst) {
						const u32 mask = in.GetConstants()[laterInst.src2];
						// Here we are, maybe we can reduce the load size based on the mask.
						if ((mask & 0xffffff00) == 0) {
							inst.op = IROp::Load8;
							if (mask == 0xff) {
								nextSkip = j;
							}
						} else if ((mask & 0xffff0000) == 0 && inst.op == IROp::Load32) {
							inst.op = IROp::Load16;
							if (mask == 0xffff) {
								nextSkip = j;
							}
						}
					}
					// If it was read, we can't do the optimization.
					break;
				}
				if (IRDestGPR(laterInst) == dest) {
					// Someone else wrote, so we can't do the optimization.
					break;
				}
			}
		}

		if (i != nextSkip) {
			out.Write(inst);
		}
	}
	return logBlocks;
}
```

Totally skipped ANDs aren't that common (it happens a good number of times in Crisis Core, though). Reducing the width was definitely more common. -[Unknown]
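To make the width reduction concrete, here's a hand-written illustration (not code from the patch; the helper names are hypothetical): on a little-endian target, which matches the PSP's MIPS configuration, a 32-bit load followed by an AND with `0xff` yields the same value as an 8-bit zero-extending load, which is why `Load32` + `AndConst 0xff` can become `Load8` with the AND skipped entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers standing in for the IR ops above; not PPSSPP code.
// Assumes a little-endian host, matching the PSP's memory layout.
uint32_t load32_then_mask(const uint8_t *mem) {
	uint32_t v;
	std::memcpy(&v, mem, sizeof(v));  // Load32
	return v & 0xff;                  // AndConst 0xff
}

uint32_t load8_zext(const uint8_t *mem) {
	return mem[0];                    // Load8 (zero-extending)
}
```

The same reasoning covers the `Load16` case with a `0xffff` mask, and explains why `nextSkip` can safely drop the AND only when the mask exactly matches the new load width.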
Even if there's not a big immediate performance gain, passes like this will help generate better code in real compiler passes. Once we can do things like this across blocks, I'm sure we can eliminate a ton of constant stores to regular registers as well (if a register is later clobbered on both possible exits from a branch, it doesn't need to be written). I think most passes that need an additional inner loop scanning ahead can be done in O(n) by looping backwards and keeping track of some state instead. As for the load-shrinking pass, that makes sense as well; I don't think it's likely to hinder many optimizations.
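The backwards O(n) idea can be sketched like this (simplified stand-in types, not the actual PPSSPP IR; the temp-register range is an assumption for the example): walking the block in reverse, a write whose destination is never read before being overwritten again can be dropped without any inner forward scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for the real IRInst -- illustrative only.
struct Inst {
	int dest;   // register written (0 = none)
	int src1;   // registers read (0 = none)
	int src2;
	bool hasSideEffects;
};

// Single backward pass: track which registers are read later in the block,
// and drop side-effect-free writes that are dead. O(n) total, no inner loop.
std::vector<Inst> RemoveDeadWrites(const std::vector<Inst> &in) {
	// Conservatively treat everything as live at block exit, except a
	// hypothetical block-local temp range (regs >= 192 here).
	std::vector<bool> readLater(256, true);
	for (int r = 192; r < 256; r++)
		readLater[r] = false;

	std::vector<Inst> kept;
	for (int i = (int)in.size() - 1; i >= 0; i--) {
		const Inst &inst = in[i];
		if (inst.dest != 0 && !readLater[inst.dest] && !inst.hasSideEffects)
			continue;  // dead write, drop it
		if (inst.dest)
			readLater[inst.dest] = false;  // overwritten above this point
		if (inst.src1)
			readLater[inst.src1] = true;
		if (inst.src2)
			readLater[inst.src2] = true;
		kept.push_back(inst);
	}
	std::reverse(kept.begin(), kept.end());
	return kept;
}
```

A real version would also have to stop at exits and treat memory ops as side-effecting, but the state-tracking shape is the point: one pass, constant work per instruction.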
Other fun passes we should do (some of which I did in the old IR branch):
This way, if constants have made the temp obsolete (common with ins, for example), it won't even get set anymore.
Rebased this, but I didn't clean up the ReduceLoads thing. -[Unknown]
Without this, Gods Eater Burst is horribly broken.
Without this, Gods Eater Burst crashes before going in game.
Added a few more bugfixes. -[Unknown]
So far, here's some bubbling to the top:

```cpp
bool ReorderLoadStore(const IRWriter &in, IRWriter &out) {
	bool logBlocks = false;

	enum class RegState : u8 {
		UNUSED = 0,
		READ = 1,
		CHANGED = 2,
	};

	bool queuing = false;
	std::vector<IRInst> loadStoreQueue;
	std::vector<IRInst> otherQueue;
	RegState otherRegs[256] = {};

	auto flushQueue = [&]() {
		if (!queuing) {
			return;
		}

		// TODO: Time to reorder loadStoreQueue.

		queuing = false;
		for (IRInst queued : loadStoreQueue) {
			out.Write(queued);
		}
		for (IRInst queued : otherQueue) {
			out.Write(queued);
		}

		loadStoreQueue.clear();
		otherQueue.clear();
		memset(otherRegs, 0, sizeof(otherRegs));
	};

	for (int i = 0; i < (int)in.GetInstructions().size(); i++) {
		IRInst inst = in.GetInstructions()[i];
		switch (inst.op) {
		case IROp::Load32:
			// To move a load up, its dest can't be changed by things we move down.
			if (otherRegs[inst.dest] != RegState::UNUSED || otherRegs[inst.src1] == RegState::CHANGED) {
				flushQueue();
			}
			queuing = true;
			loadStoreQueue.push_back(inst);
			break;

		case IROp::Store32:
			// A store can move above even if it's read, as long as it's not changed by the other ops.
			if (otherRegs[inst.src3] == RegState::CHANGED || otherRegs[inst.src1] == RegState::CHANGED) {
				flushQueue();
			}
			queuing = true;
			loadStoreQueue.push_back(inst);
			break;

		case IROp::LoadFloat:
		case IROp::StoreFloat:
			// Floats can always move as long as their address is safe.
			if (otherRegs[inst.src1] == RegState::CHANGED) {
				flushQueue();
			}
			queuing = true;
			loadStoreQueue.push_back(inst);
			break;

		case IROp::Sub:
		case IROp::Slt:
		case IROp::SltU:
		case IROp::Add:
		case IROp::And:
		case IROp::Or:
		case IROp::Xor:
		case IROp::Shl:
		case IROp::Shr:
		case IROp::Ror:
		case IROp::Sar:
			// We'll try to move this downward.
			otherRegs[inst.dest] = RegState::CHANGED;
			if (inst.src1 && otherRegs[inst.src1] != RegState::CHANGED)
				otherRegs[inst.src1] = RegState::READ;
			if (inst.src2 && otherRegs[inst.src2] != RegState::CHANGED)
				otherRegs[inst.src2] = RegState::READ;
			otherQueue.push_back(inst);
			queuing = true;
			break;

		case IROp::Neg:
		case IROp::Not:
		case IROp::BSwap16:
		case IROp::BSwap32:
		case IROp::Ext8to32:
		case IROp::Ext16to32:
		case IROp::ReverseBits:
		case IROp::Clz:
		case IROp::AddConst:
		case IROp::SubConst:
		case IROp::AndConst:
		case IROp::OrConst:
		case IROp::XorConst:
		case IROp::SltConst:
		case IROp::SltUConst:
		case IROp::ShlImm:
		case IROp::ShrImm:
		case IROp::RorImm:
		case IROp::SarImm:
		case IROp::Mov:
			// We'll try to move this downward.
			otherRegs[inst.dest] = RegState::CHANGED;
			if (inst.src1 && otherRegs[inst.src1] != RegState::CHANGED)
				otherRegs[inst.src1] = RegState::READ;
			otherQueue.push_back(inst);
			queuing = true;
			break;

		case IROp::Downcount:
			if (queuing) {
				// This one's a freebie. Sometimes helps with delay slots.
				otherQueue.push_back(inst);
			} else {
				out.Write(inst);
			}
			break;

		default:
			flushQueue();
			out.Write(inst);
			break;
		}
	}

	// Can reuse the old constants array - not touching constants in this pass.
	for (u32 value : in.GetConstants()) {
		out.AddConstant(value);
	}
	return logBlocks;
}
```

Sorting then just means matching same src1 and only reordering matching types. -[Unknown]
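That "reorder loadStoreQueue" TODO could look something like this sketch (simplified stand-in types and field names, not the actual implementation): sort only within maximal runs of the same kind of instruction, keyed by base register and then offset, so a load never moves across a store and same-base accesses end up adjacent for later combining.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for a queued memory instruction -- illustrative only.
struct QInst {
	int op;      // 0 = load, 1 = store
	int src1;    // base address register
	int offset;  // constant offset from the base
};

// Stable-sort each run of same-kind instructions by (src1, offset).
// Instructions of different kinds never reorder relative to each other.
void SortLoadStoreQueue(std::vector<QInst> &queue) {
	size_t start = 0;
	while (start < queue.size()) {
		size_t end = start + 1;
		while (end < queue.size() && queue[end].op == queue[start].op)
			end++;
		std::stable_sort(queue.begin() + start, queue.begin() + end,
		                 [](const QInst &a, const QInst &b) {
			                 if (a.src1 != b.src1)
				                 return a.src1 < b.src1;
			                 return a.offset < b.offset;
		                 });
		start = end;
	}
}
```

Within a run this is safe in the simplest cases: loads always commute with loads, and stores to distinct offsets off the same base commute with each other; the stable sort keeps the original order for exact duplicates. A production version would still need aliasing checks between different base registers.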
That's a really nice way to do it!
I didn't see much benefit from this performance-wise, unfortunately. It does trigger and reduce code size sometimes, so I think it's still worthwhile.
The three most common types of things I saw this catch were:

- `ins` temporaries, e.g. when inserting `zero`, so all the temporaries fold out. This prevents TEMP0 from even being stored in this case.
- `lw`/`sw`/etc. temporaries, e.g. `li a0, addr1; lw t0, 0(a0); li a0, addr2; lw t1, 0(a0)` etc.

-[Unknown]