-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for allowing direct VEXTRACT to 20-bit registers #233
base: aie-public
Are you sure you want to change the base?
Conversation
llvm/test/CodeGen/AIE/aie2/GlobalISel/prelegalizercombiner-s20-narrowing.mir
Outdated
Show resolved
Hide resolved
llvm/test/CodeGen/AIE/aie2/GlobalISel/prelegalizercombiner-s20-narrowing.mir
Outdated
Show resolved
Hide resolved
Given that you mentioned there are no QoR gain, I would recommend you to re look at the instruction that consume S20 type reg. |
650d8a9
to
d5d7cf0
Compare
d5d7cf0
to
f12f1a4
Compare
f12f1a4
to
ba49cf1
Compare
MachineIRBuilder MIRBuilder(*ExtMI); | ||
MIRBuilder.buildInstr(TargetOpcode::G_TRUNC, {ExtMI->getOperand(0)}, | ||
{DstReg}); | ||
ExtMI->eraseFromParent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to not erase 2 instructions here, because we may corrupt the iterator. Let ExtMI
to be removed by DCE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combiner engine may access freed memory:
while (!WorkList.empty()) {
MachineInstr *CurrInst = WorkList.pop_back_val();
LLVM_DEBUG(dbgs() << "\nTry combining " << *CurrInst;);
Changed |= tryCombineAll(*CurrInst);
WLObserver->reportFullyCreatedInstrs();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question is, should we refactor void makeDeadMI(MachineInstr &MI, MachineRegisterInfo &MRI);
as a common utility function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cant erase just the TRUNC as it is input for EXT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will also erase MI (intrinsic) in the caller function.
const LLT SrcVecTy = MRI.getType(SrcReg); | ||
const unsigned SrcEltSize = SrcVecTy.getScalarSizeInBits(); | ||
if (SrcEltSize == 8 || SrcEltSize == 16) { | ||
tryToCombineTruncExt(DstReg, SignVal.value(), SrcEltSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can trigger the same multiple erased instructions problem here.
ccbdb07
to
6138a60
Compare
return false; | ||
} | ||
|
||
bool AIE2PreLegalizerCombinerImpl::tryToCombineVExtractElt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have a description of the combiner here.
const Register SignReg = MI.getOperand(4).getReg(); | ||
|
||
const auto SignVal = getIConstantVRegSExtVal(SignReg, MRI); | ||
if (!SignVal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an assert. We always need constant signal to be able to handle this intrinsic.
const LLT SrcVecTy = MRI.getType(SrcReg); | ||
const unsigned SrcEltSize = SrcVecTy.getScalarSizeInBits(); | ||
if (SrcEltSize == 8 || SrcEltSize == 16) { | ||
tryToCombineTruncExt(DstReg, SignVal.value(), SrcEltSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood that this tryToCombineTruncExt
is only safe because we know the range of the output of the defining instruction. However, it would be good to have this as a separated combiner (may looking to the input reg. def.). It will help to solve the erasing problem and we also can have different tests and, as a gift, less coupled code.
/// %2(<64 x s8>), %0(s32), %1(s32) | ||
/// %30:_(s20) = G_TRUNC %10(s32) | ||
/// This also enables S20Narrowing for vextract | ||
bool AIE2PreLegalizerCombinerImpl::tryToCombineTruncExt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing this combine here, why don't we just use G_ASSERT_ZEXT
/G_ASSERT_SEXT
?
In your previous commit 16445fb,
we should pre-select
%0:_(s32) = G_INTRINSIC intrinsic(@llvm.aie2.vextract.elem8.I512), ...
into
%new:_(s32) = G_AIE_ZEXT_EXTRACT_VECTOR_ELT
%0:_(s32) = G_ASSERT_ZEXT %new, 16
Then you should get the G_TRUNC + G_ZEXT combine for free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have something like:
%0:_(s32) = G_CONSTANT i32 7
%2:_(<32 x s16>) = COPY $x0
%9:_(s20) = G_AIE_ZEXT_EXTRACT_VECTOR_ELT %2(<32 x s16>), %0(s32)
%3:_(s20) = G_ASSERT_ZEXT %9, 16
%4:_(s16) = G_TRUNC %3(s20)
%5:_(s20) = G_ZEXT %4(s16)
%6:_(p0) = G_CONSTANT i20 0
%7:_(p0), %8:_(s20) = G_INTRINSIC intrinsic(@llvm.aie2.add.2d), %6(p0), %5(s20), %5(s20), %5(s20), %5(s20)
PseudoRET implicit $lr, implicit %7(p0)
However, for SEXT case, there is no combine pattern AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can try to match this pattern explicitly looking to ASSERT, something like:
bool CombinerHelper::matchCombineSextTrunc(MachineInstr &MI, Register &Reg) {
assert(MI.getOpcode() == TargetOpcode::G_SEXT && "Expected a G_SEXT");
Register DstReg = MI.getOperand(0).getReg();
Register SrcReg = MI.getOperand(1).getReg();
LLT DstTy = MRI.getType(DstReg);
if (mi_match(SrcReg, MRI,
m_GTrunc(m_all_of(m_Reg(Reg), m_SpecificType(DstTy))))) {
unsigned DstSize = DstTy.getScalarSizeInBits();
unsigned SrcSize = MRI.getType(SrcReg).getScalarSizeInBits();
MachineInstr* MIDef = MRI.getVRegDef(Reg);
if (MIDef->getOpcode() != TargetOpcode::G_ASSERT_SEXT)
return false;
unsigned ExtBits = MIDef->getOperand(2).getImm();
return SrcSize == ExtBits;
}
return false;
}
And apply replaceSingleDefInstWithReg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of hardcoding the G_ASSERT_SEXT, I'd suggest to use the KnownBits analysis to check if we have known sign bits: KB->computeNumSignBits(Reg)
vextract.8/16/32
intrinsics as source nodes in thes20narrowing
pass.MachineVerifier
has been updated to allowG_AIE_SEXT_EXTRACT_VECTOR_ELT
andG_AIE_ZEXT_EXTRACT_VECTOR_ELT
to accept 20-bit outputs.