Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Arm64] ASIMD By Element Intrinsics #36916

Merged

Conversation

echesakov
Copy link
Contributor

@echesakov echesakov commented May 23, 2020

Fixes #33683
Fixes #24794 - (by element intrinsics is the remaining part)
Fixes #36300
Fixes #36298
Fixes #33683
Fixes #33490
Fixes #35037 - (InsertSelectedScalar is the remaining part)

@Dotnet-GitSync-Bot
Copy link
Collaborator

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI new-api-needs-documentation labels May 23, 2020
@echesakov echesakov added arch-arm64 area-System.Runtime.Intrinsics and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI new-api-needs-documentation labels May 23, 2020
@ghost
Copy link

ghost commented May 23, 2020

Tagging subscribers to this area: @tannergooding
Notify danmosemsft if you want to be subscribed.

@echesakov echesakov force-pushed the Arm64-ASIMD-BySelectedScalar-ByScalar branch 2 times, most recently from 522dbaa to f745a7e Compare June 11, 2020 01:37
@echesakov echesakov changed the title [Arm64] ASIMD By Element Arithmetic Intrinsics [Arm64] ASIMD By Element Intrinsics Jun 11, 2020
@echesakov
Copy link
Contributor Author

The API part of this PR is ready for review.

This PR includes most of the "by element" intrinsics (excluding the ones that have not been approved yet) and some load/store intrinsic that operate on a single SIMD element. The reason for combining these into one PR is that I found that the JIT changes that I need to make to support all of these are tightly coupled to each other and I was trying to not spread them across many PRs.

@CarolEidt @tannergooding @TamarChristinaArm Can you please take a look while I am working on finishing the JIT part of the changes?

echesakov added 15 commits June 15, 2020 13:00
…n AdvSimd.cs AdvSimd.PlatformNotSupported.cs
…edScalar in AdvSimd.cs AdvSimd.PlatformNotSupported.cs
…Arm32 in AdvSimd.cs AdvSimd.PlatformNotSupported.cs
@echesakov echesakov force-pushed the Arm64-ASIMD-BySelectedScalar-ByScalar branch from a6cc4cd to 283e686 Compare June 15, 2020 20:09
@echesakov echesakov marked this pull request as ready for review June 15, 2020 20:13
@echesakov
Copy link
Contributor Author

@CarolEidt @tannergooding This is ready for review now. Please take a look when you have time. I rebased on top of the changes in #37859

@tannergooding
Copy link
Member

I'll start giving it a look this afternoon. Thanks!

@@ -817,7 +817,7 @@ void emitter::emitInsSanityCheck(instrDesc* id)
assert(isVectorRegister(id->idReg2()));
assert(isVectorRegister(id->idReg3()));
elemsize = optGetElemsize(id->idInsOpt());
assert(isValidVectorIndex(id->idOpSize(), elemsize, emitGetInsSC(id)));
assert(isValidVectorIndex(EA_16BYTE, elemsize, emitGetInsSC(id)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this changed to account for the simd8 result with simd16 selection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of the destination/source registers shouldn't matter here. For example,

FMLA Vd.T, Vn.T, Vm.Ts[index]

index is always encoded as H:L (2 bits) when T is either 2S or 4S (and Ts is S).

In other words, the range of valid values for index is computed based on the assumption that Vm is 16 bytes.

@@ -4728,8 +4728,8 @@ struct GenTreeJitIntrinsic : public GenTreeOp
ClassLayout* m_layout;

union {
var_types gtOtherBaseType; // For AVX2 Gather* intrinsics
regNumberSmall gtOtherReg; // For intrinsics that return 2 registers
var_types gtAuxiliaryType; // For intrinsics than need another type (e.g. Avx2.Gather* or SIMD (by element))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why rename to AuxiliaryType but not AuxiliaryReg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I didn't touch gtOtherReg in other files.

The intent of the renaming was to get rid of BaseType since for SIMD By Element intrinsics I use this field to encode a SIMD type of an indexed element while in some other case we do use it to keep the "other" base type (e.g. in case of wide/long intrinsics). I could've renamed it to gtOtherType.

Do you want me to rename the gtOtherReg -> gtAuxiliaryReg?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was mostly just interested. This makes sense, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the renaming. I think it's reasonable to keep the reg as gtOtherReg, as that naming is used on other GenTree nodes. In any case, if we didn't change it to gtAuxiliaryType we might want to name it gtOtherType (though I'm happy with this change as-is).

static bool SIMDScalar(NamedIntrinsic id)
{
const HWIntrinsicFlag flags = lookupFlags(id);
return (flags & HW_Flag_SIMDScalar) != 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a flag, rather than a category?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because, many intrinsics can have features of SIMDScalar and some other category:

  1. SIMD ShiftLeftByImmediate + SIMDScalar
  2. SIMD ShiftRightByImmediate + SIMDScalar
  3. SIMD ByIndexedElement + SIMDScalar

and I didn't want to have all possible combinations of those.

In addition, on Arm64 flag SIMDScalar is going to be used only in CodeGen to set INS_OPTS_NONE (i.e. switch between vector variant to scalar variant for a given instruction)
while a category value is used in multiple phases (e.g. ByIndexedElement is used in LSRA, Lower and Importer) for making various decisions (what registers can be used for an indexed element, how to perform containment analysis etc.).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Makes sense to me.

HARDWARE_INTRINSIC(AdvSimd, Xor, -1, 2, {INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor}, HW_Category_SimpleSIMD, HW_Flag_Commutative)
HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningLower, 8, 1, {INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg)
HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningUpper, 16, 1, {INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg)
HARDWARE_INTRINSIC(AdvSimd, Abs, -1, 1, {INS_abs, INS_invalid, INS_abs, INS_invalid, INS_abs, INS_invalid, INS_invalid, INS_invalid, INS_fabs, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromFirstArg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Git stops reporting changes for this file after this line. Is it largely just whitespace and flags/category changes to match the previous?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate, so I can only suggest to review in commit-by-commit fashion.
Most of the changes come from this commit so to answer your question - yes, this is to account for flags/category change. I also sorted the values in flags column in alphabetic order.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also seems that you added HW_Flag_NoFloatingPointUsed in some places. I confess that I probably haven't carefully scrutinized the changes in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also seems that you added HW_Flag_NoFloatingPointUsed in some places. I confess that I probably haven't carefully scrutinized the changes in this file.

Yes, I added this flag to the intrinsics that operate on general-purpose registers (e.g. crc32h, rbit, clz)

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall - a couple of minor suggestions and request for comment.

@@ -3782,7 +3782,7 @@ class Compiler
GenTree* getArgForHWIntrinsic(var_types argType, CORINFO_CLASS_HANDLE argClass, bool expectAddr = false);
GenTree* impNonConstFallback(NamedIntrinsic intrinsic, var_types simdType, var_types baseType);
GenTree* addRangeCheckIfNeeded(
NamedIntrinsic intrinsic, GenTree* lastOp, bool mustExpand, int immLowerBound, int immUpperBound);
NamedIntrinsic intrinsic, GenTree* immOp, bool mustExpand, int immLowerBound, int immUpperBound);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's minor, but thanks for renaming this - even if/when it was the last op, semantically it's more important that it's an immediate.

@@ -5827,7 +5827,6 @@ void emitter::emitIns_R_R_R(
elemsize = EA_1BYTE;
opt = optMakeArrangement(size, elemsize);
}
assert(isValidArrangement(size, opt));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why these asserts were removed - are they invalid or somehow redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! For some reason I mistakenly though that they are redundant - I will revert this commit.

@@ -6368,6 +6366,11 @@ void emitter::emitIns_R_R_R_I(instruction ins,
assert((opt == INS_OPTS_4H) || (opt == INS_OPTS_2S));
elemsize = optGetElemsize(opt);
assert(isValidVectorIndex(EA_16BYTE, elemsize, imm));
// Restricted to V0-V15 when element size is H
if ((elemsize == EA_2BYTE) && (reg3 >= REG_V16))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make the conditions consistent, perhaps this could be

Suggested change
if ((elemsize == EA_2BYTE) && (reg3 >= REG_V16))
if ((elemsize == EA_2BYTE) && ((genRegMask(reg3) & RBM_ASIMD_INDEXED_H_ELEMENT_ALLOWED_REGS) == 0))

@@ -4728,8 +4728,8 @@ struct GenTreeJitIntrinsic : public GenTreeOp
ClassLayout* m_layout;

union {
var_types gtOtherBaseType; // For AVX2 Gather* intrinsics
regNumberSmall gtOtherReg; // For intrinsics that return 2 registers
var_types gtAuxiliaryType; // For intrinsics than need another type (e.g. Avx2.Gather* or SIMD (by element))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the renaming. I think it's reasonable to keep the reg as gtOtherReg, as that naming is used on other GenTree nodes. In any case, if we didn't change it to gtAuxiliaryType we might want to name it gtOtherType (though I'm happy with this change as-is).

@@ -671,6 +680,50 @@ static bool isSupportedBaseType(NamedIntrinsic intrinsic, var_types baseType)
return false;
}

struct HWIntrinsicSignatureReader final
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the comment.

src/coreclr/src/jit/hwintrinsic.cpp Outdated Show resolved Hide resolved
{
assert(sig->numArgs == 3);
immOp = impStackTop(1).val;
assert(HWIntrinsicInfo::isImmOp(intrinsic, immOp));
}
else if (intrinsic == NI_AdvSimd_Arm64_InsertSelectedScalar)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this method is already rather large, you might consider extracting this code block, and the one for HW_Category_SIMDByIndexedElement into separate methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I was thinking about splitting the method into multiple methods - or even have a helper class that solely does hardware intrinsic importation - but this is more major refactoring than I want for this PR and likely affect x86/x64 side of the intrinsics, so I decided to defer such change to a later point when all the Arm64 intrinsics are implemented.

#ifdef TARGET_XARCH
if ((intrinsic == NI_SSE42_Crc32) || (intrinsic == NI_SSE42_X64_Crc32))
{
// TODO - currently we use the BaseType to bring the type of the second argument
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This should probably be TODO-ARM64-Cleanup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this should be TODO-XArch-Cleanup - I updated the comment

HARDWARE_INTRINSIC(AdvSimd, Xor, -1, 2, {INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor}, HW_Category_SimpleSIMD, HW_Flag_Commutative)
HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningLower, 8, 1, {INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg)
HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningUpper, 16, 1, {INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg)
HARDWARE_INTRINSIC(AdvSimd, Abs, -1, 1, {INS_abs, INS_invalid, INS_abs, INS_invalid, INS_abs, INS_invalid, INS_invalid, INS_invalid, INS_fabs, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromFirstArg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also seems that you added HW_Flag_NoFloatingPointUsed in some places. I confess that I probably haven't carefully scrutinized the changes in this file.

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@echesakov echesakov closed this Jun 17, 2020
@echesakov echesakov reopened this Jun 17, 2020
@echesakov echesakov closed this Jun 17, 2020
@echesakov echesakov reopened this Jun 17, 2020
@echesakov echesakov requested a review from tannergooding June 17, 2020 17:07
@echesakov echesakov merged commit a23d3b2 into dotnet:master Jun 17, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
@echesakov echesakov deleted the Arm64-ASIMD-BySelectedScalar-ByScalar branch April 13, 2021 20:42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.