#78303 Add transformation ~v1 & v2 to VectorXxx.AndNot(v1, v2) #81993
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak

Issue Details: I created a draft for issue #78303. The code converts ~v1 & v2 to VectorXxx.AndNot(v1, v2).
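For orientation, here is a minimal sketch of the core rewrite, following the gtNewSimdBinOpNode(AND_NOT, ...) operand convention discussed later in this thread; the variable names (v1, v2, simdType, ...) are illustrative, not the PR's exact code:

// ~v1 & v2 is rebuilt as GT_AND_NOT with swapped operands, because
// gtNewSimdBinOpNode(GT_AND_NOT, ...) produces op1 & ~op2.
GenTree* andnNode = gtNewSimdBinOpNode(GT_AND_NOT, simdType, v2, v1, simdBaseJitType, simdSize,
                                       /* isSimdAsHWIntrinsic */ true);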
src/coreclr/jit/morph.cpp (Outdated)
case NI_SSE_And:
case NI_SSE2_And:
case NI_AVX_And:
case NI_AVX2_And:
What about Vector128/256_And and AdvSimd?
Vector64/128/256_And don't exist outside of import at the moment, so they don't need to be handled. AdvSimd should be handled, since we want parity between xarch and arm.
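Concretely, that parity would mean the arm64 intrinsic joins the case list quoted above; a sketch (the exact arm64 list here is an assumption, mirroring the xarch cases):

#if defined(TARGET_XARCH)
        case NI_SSE_And:
        case NI_SSE2_And:
        case NI_AVX_And:
        case NI_AVX2_And:
#elif defined(TARGET_ARM64)
        case NI_AdvSimd_And:
#endif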
src/coreclr/jit/morph.cpp (Outdated)
case NI_AVX_And:
case NI_AVX2_And:
{
    if (node->GetOperandCount() != 2)
When exactly might it not be 2?
Shouldn't ever not be 2. If it were, we'd have a buggy node.
Use an assert then instead?
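That is, something along these lines (a minimal sketch of the suggested change, replacing the early return):

// Every And hwintrinsic node is binary; a different operand count is a JIT bug,
// so assert rather than silently skipping the optimization.
assert(node->GetOperandCount() == 2);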
src/coreclr/jit/morph.cpp (Outdated)
if (op1->OperIs(GT_HWINTRINSIC))
{
    rhs      = op2;
    inner_hw = op1->AsHWIntrinsic();
}
// Transforms v2 & (~v1) to VectorXxx.AndNot(v1, v2)
else if (op2->OperIs(GT_HWINTRINSIC))
{
    rhs      = op1;
    inner_hw = op2->AsHWIntrinsic();
}
else
{
    return node;
}
This is going to miss the optimization for cases like ((x & y) & ~z). You're going to need to check that it is a hwintrinsic and that it is the relevant xor (xarch and arm) or not (arm only) node.
There is also potentially a concern around side effects: ~x & y must be represented as gtNewSimdBinOpNode(AND_NOT, y, x, ...), and that swap needs to preserve the side effects of x being evaluated before y.
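One conservative way to handle that reordering, sketched under the assumption that we simply skip the transform whenever it could be observable (GTF_ALL_EFFECT is the JIT's aggregate side-effect mask; this guard is illustrative, not the PR's code):

// Building AndNot(y, x) evaluates y before x, the reverse of the source order.
// Skip the transform if either operand carries side effects, so the
// reordering can never be observed.
if (((x->gtFlags | y->gtFlags) & GTF_ALL_EFFECT) != 0)
{
    return node;
}
GenTree* andnNode = gtNewSimdBinOpNode(GT_AND_NOT, simdType, y, x, simdBaseJitType, simdSize, true);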
I have resolved some comments and pushed the changes to make sure I got you right. Could you please give me a hint on how to treat not for arm, how to test it, and how to handle the side-effect case?
src/coreclr/jit/morph.cpp (Outdated)
if ((inner_hw->GetOperandCount() != 2) || (!inner_hw->Op(2)->IsVectorAllBitsSet()))
{
    return node;
}
It would be better to check this as part of handling _Xor below; that way you don't need to check the operand count, and it's easier for the general logic to support AdvSimd_Not on Arm64.
src/coreclr/jit/morph.cpp (Outdated)
var_types    hw_type     = node->TypeGet();
CorInfoType  hw_coretype = node->GetSimdBaseJitType();
unsigned int hw_simdsize = node->GetSimdSize();
We refer to these as just simdType, simdBaseJitType, and simdSize almost everywhere else in the JIT.
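Applied to the snippet above, that naming would be a straight rename (shown for clarity; these are the names the suggested code later in this thread uses):

var_types   simdType        = node->TypeGet();
CorInfoType simdBaseJitType = node->GetSimdBaseJitType();
unsigned    simdSize        = node->GetSimdSize();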
src/coreclr/jit/morph.cpp (Outdated)
GenTreeHWIntrinsic* xor_hw = op1->AsHWIntrinsic();
switch (xor_hw->GetHWIntrinsicId())
{
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
This is unnecessary; you're already in a larger identical ifdef from L10873. That being said, the larger identical ifdef on L10873 should also be unnecessary, given we're in a greater #ifdef FEATURE_HW_INTRINSICS.
src/coreclr/jit/morph.cpp (Outdated)
}

// Transforms v2 & (~v1) to VectorXxx.AndNot(v2, v1)
if (op2->OperIs(GT_HWINTRINSIC))
This check is going to miss the opt if we have something like ((x ^ AllBitsSet) & (y ^ z)). Such a tree could have been transformed into AndNot((y ^ z), x).

In general, you're going to need to match (op1 ^ AllBitsSet) up front before determining if it's a match, and then, if that fails, do the same check for (op2 ^ AllBitsSet).

For Arm64, you'll also need to directly check for ~op1 or ~op2 (since NI_AdvSimd_Not exists).

There are some things we could do to make this overall simpler, but they are slightly more involved changes.
I'd, in general, recommend extracting some of this to a helper.
For example, you could define something like:
genTreeOps GenTreeHWIntrinsic::HWOperGet()
{
    switch (GetHWIntrinsicId())
    {
#if defined(TARGET_XARCH)
        case NI_SSE_And:
        case NI_SSE2_And:
        case NI_AVX_And:
        case NI_AVX2_And:
#elif defined(TARGET_ARM64)
        case NI_AdvSimd_And:
#endif
        {
            return GT_AND;
        }

#if defined(TARGET_ARM64)
        case NI_AdvSimd_Not:
        {
            return GT_NOT;
        }
#endif

#if defined(TARGET_XARCH)
        case NI_SSE_Xor:
        case NI_SSE2_Xor:
        case NI_AVX_Xor:
        case NI_AVX2_Xor:
#elif defined(TARGET_ARM64)
        case NI_AdvSimd_Xor:
#endif
        {
            return GT_XOR;
        }

        // TODO: Handle other cases

        default:
        {
            return GT_NONE;
        }
    }
}
Such a helper allows you to instead switch over the genTreeOps equivalent. So you could have something like:
switch (node->HWOperGet())
{
    case GT_AND:
    {
        GenTree* op1 = node->Op(1);
        GenTree* op2 = node->Op(2);
        GenTree* lhs = nullptr;
        GenTree* rhs = nullptr;

        if (op1->OperIsHWIntrinsic())
        {
            // Try handle: ~op1 & op2
            GenTreeHWIntrinsic* hw     = op1->AsHWIntrinsic();
            genTreeOps          hwOper = hw->HWOperGet();

            if (hwOper == GT_NOT)
            {
                lhs = op2;
                rhs = hw->Op(1); // the operand under the NOT
            }
            else if (hwOper == GT_XOR)
            {
                GenTree* hwOp1 = hw->Op(1);
                GenTree* hwOp2 = hw->Op(2);

                if (hwOp1->IsVectorAllBitsSet())
                {
                    lhs = op2;
                    rhs = hwOp2;
                }
                else if (hwOp2->IsVectorAllBitsSet())
                {
                    lhs = op2;
                    rhs = hwOp1;
                }
            }
        }

        if ((lhs == nullptr) && op2->OperIsHWIntrinsic())
        {
            // Try handle: op1 & ~op2
            GenTreeHWIntrinsic* hw     = op2->AsHWIntrinsic();
            genTreeOps          hwOper = hw->HWOperGet();

            if (hwOper == GT_NOT)
            {
                lhs = op1;
                rhs = hw->Op(1); // the operand under the NOT
            }
            else if (hwOper == GT_XOR)
            {
                GenTree* hwOp1 = hw->Op(1);
                GenTree* hwOp2 = hw->Op(2);

                if (hwOp1->IsVectorAllBitsSet())
                {
                    lhs = op1;
                    rhs = hwOp2;
                }
                else if (hwOp2->IsVectorAllBitsSet())
                {
                    lhs = op1;
                    rhs = hwOp1;
                }
            }
        }

        if (lhs == nullptr)
        {
            break;
        }

        GenTree* andnNode = gtNewSimdBinOpNode(GT_AND_NOT, simdType, lhs, rhs, simdBaseJitType, simdSize, true);

        DEBUG_DESTROY_NODE(node);
        INDEBUG(andnNode->gtDebugFlags |= GTF_DEBUG_NODE_MORPHED);

        return andnNode;
    }

    default:
    {
        break;
    }
}
You could of course also extract the NOT op vs. op XOR AllBitsSet matching logic to reduce duplication as well.
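For instance, the duplicated matching in both branches above could become something like this (the helper name and shape are hypothetical, not code from this PR):

// Returns true if 'hw' computes a bitwise NOT of some operand, either as an
// explicit NOT node or as (x ^ AllBitsSet), and reports that operand.
bool TryGetNegatedOperand(GenTreeHWIntrinsic* hw, GenTree** negated)
{
    genTreeOps hwOper = hw->HWOperGet();

    if (hwOper == GT_NOT)
    {
        *negated = hw->Op(1);
        return true;
    }

    if (hwOper == GT_XOR)
    {
        GenTree* hwOp1 = hw->Op(1);
        GenTree* hwOp2 = hw->Op(2);

        if (hwOp1->IsVectorAllBitsSet())
        {
            *negated = hwOp2;
            return true;
        }

        if (hwOp2->IsVectorAllBitsSet())
        {
            *negated = hwOp1;
            return true;
        }
    }

    return false;
}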
Longer term, I think we may want to introduce a "fake" Isa_Not hwintrinsic id for xarch. That would allow morph to transform x ^ AllBitsSet into Isa_Not, which would in turn allow this case to be simplified in its pattern checks.

We may also want to normalize cases like Sse_Xor, Sse2_Xor, and AdvSimd_Xor into Vector128_Xor, so we don't need to consider xplat differences. But that will also involve significant refactorings, far more so than introducing a HWOperGet helper for the time being.
genTreeOps GenTreeHWIntrinsic::HWOperGet()
{
    switch (GetHWIntrinsicId())
@tannergooding do you think we can then (not necessarily in this PR) add this to the table where intrinsics are defined?
We should be able to do so, yes.
However, I rather think we'd want to represent it just a little bit differently to avoid bloating the metadata tables given most intrinsics end up as none.
src/coreclr/jit/morph.cpp (Outdated)
// Transforms:
// 1. (~v1 & v2) to VectorXxx.AndNot(v1, v2)
// 2. (v1 & (~v2)) to VectorXxx.AndNot(v1, v2)
Did you mean (v2, v1) here?
LGTM, thanks!
@SkiFoD, please resolve the merge conflict. .NET 8 rc1 snap is 8/14.
OK, I will look at it as soon as possible.
Force-pushed from c68ba6f to bec1de3.
@SkiFoD thanks! Sorry for the delay; there was also a small issue in the codegen that I fixed.
The new logic introduced in dotnet#81993 would swap the LHS and RHS of the operands without any additional checks for side effects. Fix dotnet#91855