Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for allowing direct VEXTRACT to 20-bit registers #233

Open
wants to merge 3 commits into
base: aie-public
Choose a base branch
from

Conversation

abhinay-anubola
Copy link
Collaborator

@abhinay-anubola abhinay-anubola commented Nov 8, 2024

  • This update introduces support for direct VEXTRACT to 20-bit registers by adding vextract.8/16/32 intrinsics as source nodes in the s20narrowing pass.
  • The MachineVerifier has been updated to allow G_AIE_SEXT_EXTRACT_VECTOR_ELT and G_AIE_ZEXT_EXTRACT_VECTOR_ELT to accept 20-bit outputs.
  • Additionally, tests have been added and updated to reflect these functional changes.

@krishnamtibrewala
Copy link
Collaborator

Given that you mentioned there are no QoR gain, I would recommend you to re look at the instruction that consume S20 type reg.
Because for the optimization starts to trace back from an instruction that consumes S20 type which might not be captured in isNativeS20Consumer function.

MachineIRBuilder MIRBuilder(*ExtMI);
MIRBuilder.buildInstr(TargetOpcode::G_TRUNC, {ExtMI->getOperand(0)},
{DstReg});
ExtMI->eraseFromParent();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to not erase 2 instructions here, because we may corrupt the iterator. Let ExtMI to be removed by DCE.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combiner engine may access freed memory:

    while (!WorkList.empty()) {
      MachineInstr *CurrInst = WorkList.pop_back_val();
      LLVM_DEBUG(dbgs() << "\nTry combining " << *CurrInst;);
      Changed |= tryCombineAll(*CurrInst);
      WLObserver->reportFullyCreatedInstrs();
    }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question is, should we refactor void makeDeadMI(MachineInstr &MI, MachineRegisterInfo &MRI); as a common utility function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cant erase just the TRUNC as it is input for EXT.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will also erase MI (intrinsic) in the caller function.

const LLT SrcVecTy = MRI.getType(SrcReg);
const unsigned SrcEltSize = SrcVecTy.getScalarSizeInBits();
if (SrcEltSize == 8 || SrcEltSize == 16) {
tryToCombineTruncExt(DstReg, SignVal.value(), SrcEltSize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can trigger the same multiple erased instructions problem here.

@abhinay-anubola abhinay-anubola force-pushed the sanubola.support.20bit.VEXTRACT branch 2 times, most recently from ccbdb07 to 6138a60 Compare November 20, 2024 08:23
return false;
}

bool AIE2PreLegalizerCombinerImpl::tryToCombineVExtractElt(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have a description of the combiner here.

const Register SignReg = MI.getOperand(4).getReg();

const auto SignVal = getIConstantVRegSExtVal(SignReg, MRI);
if (!SignVal)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an assert. We always need constant signal to be able to handle this intrinsic.

const LLT SrcVecTy = MRI.getType(SrcReg);
const unsigned SrcEltSize = SrcVecTy.getScalarSizeInBits();
if (SrcEltSize == 8 || SrcEltSize == 16) {
tryToCombineTruncExt(DstReg, SignVal.value(), SrcEltSize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood that this tryToCombineTruncExt is only safe because we know the range of the output of the defining instruction. However, it would be good to have this as a separated combiner (may looking to the input reg. def.). It will help to solve the erasing problem and we also can have different tests and, as a gift, less coupled code.

/// %2(<64 x s8>), %0(s32), %1(s32)
/// %30:_(s20) = G_TRUNC %10(s32)
/// This also enables S20Narrowing for vextract
bool AIE2PreLegalizerCombinerImpl::tryToCombineTruncExt(
Copy link
Collaborator

@konstantinschwarz konstantinschwarz Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this combine here, why don't we just use G_ASSERT_ZEXT/G_ASSERT_SEXT?
In your previous commit 16445fb,
we should pre-select

%0:_(s32) = G_INTRINSIC intrinsic(@llvm.aie2.vextract.elem8.I512), ...

into

%new:_(s32) = G_AIE_ZEXT_EXTRACT_VECTOR_ELT
%0:_(s32) = G_ASSERT_ZEXT %new, 16

Then you should get the G_TRUNC + G_ZEXT combine for free.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have something like:

    %0:_(s32) = G_CONSTANT i32 7
    %2:_(<32 x s16>) = COPY $x0
    %9:_(s20) = G_AIE_ZEXT_EXTRACT_VECTOR_ELT %2(<32 x s16>), %0(s32)
    %3:_(s20) = G_ASSERT_ZEXT %9, 16
    %4:_(s16) = G_TRUNC %3(s20)
    %5:_(s20) = G_ZEXT %4(s16)
    %6:_(p0) = G_CONSTANT i20 0
    %7:_(p0), %8:_(s20) = G_INTRINSIC intrinsic(@llvm.aie2.add.2d), %6(p0), %5(s20), %5(s20), %5(s20), %5(s20)
    PseudoRET implicit $lr, implicit %7(p0)

However, for SEXT case, there is no combine pattern AFAIK.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can try to match this pattern explicitly looking to ASSERT, something like:

bool CombinerHelper::matchCombineSextTrunc(MachineInstr &MI, Register &Reg) {
  assert(MI.getOpcode() == TargetOpcode::G_SEXT && "Expected a G_SEXT");
  Register DstReg = MI.getOperand(0).getReg();
  Register SrcReg = MI.getOperand(1).getReg();
  LLT DstTy = MRI.getType(DstReg);
  if (mi_match(SrcReg, MRI,
               m_GTrunc(m_all_of(m_Reg(Reg), m_SpecificType(DstTy))))) {
    unsigned DstSize = DstTy.getScalarSizeInBits();
    unsigned SrcSize = MRI.getType(SrcReg).getScalarSizeInBits();
    MachineInstr* MIDef = MRI.getVRegDef(Reg);
    if (MIDef->getOpcode() != TargetOpcode::G_ASSERT_SEXT)
      return false;
    unsigned ExtBits = MIDef->getOperand(2).getImm();
    return SrcSize == ExtBits;
  }
  return false;
}

And apply replaceSingleDefInstWithReg

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hardcoding the G_ASSERT_SEXT, I'd suggest to use the KnownBits analysis to check if we have known sign bits: KB->computeNumSignBits(Reg)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants