Introduce intrinsic llvm.isnan

This is recommit of the patch 16ff91e, reverted in 0c28a7c because it had an error in call of getFastMathFlags (base type should be FPMathOperator but not Instruction). The original commit message is duplicated below: Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854
vladimirlaz · Aug 6, 2021 · 4c4093e · 4c4093e
1 parent 3e58dd1
commit 4c4093e
Show file tree

Hide file tree

Showing 24 changed files with 2,604 additions and 145 deletions.
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -3068,37 +3068,17 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     // ZExt bool to int type.
     return RValue::get(Builder.CreateZExt(LHS, ConvertType(E->getType())));
   }
+
   case Builtin::BI__builtin_isnan: {
     CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
     Value *V = EmitScalarExpr(E->getArg(0));
-    llvm::Type *Ty = V->getType();
-    const llvm::fltSemantics &Semantics = Ty->getFltSemantics();
-    if (!Builder.getIsFPConstrained() ||
-        Builder.getDefaultConstrainedExcept() == fp::ebIgnore ||
-        !Ty->isIEEE()) {
-      V = Builder.CreateFCmpUNO(V, V, "cmp");
-      return RValue::get(Builder.CreateZExt(V, ConvertType(E->getType())));
-    }
 
     if (Value *Result = getTargetHooks().testFPKind(V, BuiltinID, Builder, CGM))
       return RValue::get(Result);
 
-    // NaN has all exp bits set and a non zero significand. Therefore:
-    // isnan(V) == ((exp mask - (abs(V) & exp mask)) < 0)
-    unsigned bitsize = Ty->getScalarSizeInBits();
-    llvm::IntegerType *IntTy = Builder.getIntNTy(bitsize);
-    Value *IntV = Builder.CreateBitCast(V, IntTy);
-    APInt AndMask = APInt::getSignedMaxValue(bitsize);
-    Value *AbsV =
-        Builder.CreateAnd(IntV, llvm::ConstantInt::get(IntTy, AndMask));
-    APInt ExpMask = APFloat::getInf(Semantics).bitcastToAPInt();
-    Value *Sub =
-        Builder.CreateSub(llvm::ConstantInt::get(IntTy, ExpMask), AbsV);
-    // V = sign bit (Sub) <=> V = (Sub < 0)
-    V = Builder.CreateLShr(Sub, llvm::ConstantInt::get(IntTy, bitsize - 1));
-    if (bitsize > 32)
-      V = Builder.CreateTrunc(V, ConvertType(E->getType()));
-    return RValue::get(V);
+    Function *F = CGM.getIntrinsic(Intrinsic::isnan, V->getType());
+    Value *Call = Builder.CreateCall(F, V);
+    return RValue::get(Builder.CreateZExt(Call, ConvertType(E->getType())));
   }
 
   case Builtin::BI__builtin_matrix_transpose: {

diff --git a/clang/test/CodeGen/X86/strictfp_builtins.c b/clang/test/CodeGen/X86/strictfp_builtins.c
@@ -17,7 +17,7 @@ int printf(const char *, ...);
 // CHECK-NEXT:    store i32 [[X:%.*]], i32* [[X_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load i8*, i8** [[STR_ADDR]], align 8
 // CHECK-NEXT:    [[TMP1:%.*]] = load i32, i32* [[X_ADDR]], align 4
-// CHECK-NEXT:    [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) [[ATTR4:#.*]]
+// CHECK-NEXT:    [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) #[[ATTR3:[0-9]+]]
 // CHECK-NEXT:    ret void
 //
 void p(char *str, int x) {
@@ -29,13 +29,13 @@ void p(char *str, int x) {
 // CHECK-LABEL: @test_long_double_isinf(
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
-// CHECK-NEXT:    store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
+// CHECK-NEXT:    store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
-// CHECK-NEXT:    [[SHL1:%.*]] = shl i80 [[BITCAST]], 1
-// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i80 [[SHL1]], -18446744073709551616
-// CHECK-NEXT:    [[RES:%.*]] = zext i1 [[CMP]] to i32
-// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
+// CHECK-NEXT:    [[TMP2:%.*]] = shl i80 [[TMP1]], 1
+// CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i80 [[TMP2]], -18446744073709551616
+// CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_long_double_isinf(long double ld) {
@@ -47,13 +47,13 @@ void test_long_double_isinf(long double ld) {
 // CHECK-LABEL: @test_long_double_isfinite(
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
-// CHECK-NEXT:    store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
+// CHECK-NEXT:    store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
-// CHECK-NEXT:    [[SHL1:%.*]] = shl i80 [[BITCAST]], 1
-// CHECK-NEXT:    [[CMP:%.*]] = icmp ult i80 [[SHL1]], -18446744073709551616
-// CHECK-NEXT:    [[RES:%.*]] = zext i1 [[CMP]] to i32
-// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
+// CHECK-NEXT:    [[TMP2:%.*]] = shl i80 [[TMP1]], 1
+// CHECK-NEXT:    [[TMP3:%.*]] = icmp ult i80 [[TMP2]], -18446744073709551616
+// CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.2, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_long_double_isfinite(long double ld) {
@@ -65,14 +65,11 @@ void test_long_double_isfinite(long double ld) {
 // CHECK-LABEL: @test_long_double_isnan(
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
-// CHECK-NEXT:    store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
+// CHECK-NEXT:    store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
-// CHECK-NEXT:    [[ABS:%.*]] = and i80 [[BITCAST]], 604462909807314587353087
-// CHECK-NEXT:    [[TMP1:%.*]] = sub i80 604453686435277732577280, [[ABS]]
-// CHECK-NEXT:    [[ISNAN:%.*]] = lshr i80 [[TMP1]], 79
-// CHECK-NEXT:    [[RES:%.*]] = trunc i80 [[ISNAN]] to i32
-// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.isnan.f80(x86_fp80 [[TMP0]]) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
+// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.3, i64 0, i64 0), i32 [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_long_double_isnan(long double ld) {

diff --git a/clang/test/CodeGen/aarch64-strictfp-builtins.c b/clang/test/CodeGen/aarch64-strictfp-builtins.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // RUN: %clang_cc1 %s -emit-llvm -ffp-exception-behavior=maytrap -fexperimental-strict-floating-point -o - -triple arm64-none-linux-gnu | FileCheck %s
 
 // Test that the constrained intrinsics are picking up the exception
@@ -15,7 +16,7 @@ int printf(const char *, ...);
 // CHECK-NEXT:    store i32 [[X:%.*]], i32* [[X_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load i8*, i8** [[STR_ADDR]], align 8
 // CHECK-NEXT:    [[TMP1:%.*]] = load i32, i32* [[X_ADDR]], align 4
-// CHECK-NEXT:    [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]])  [[ATTR4:#.*]]
+// CHECK-NEXT:    [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) #[[ATTR3:[0-9]+]]
 // CHECK-NEXT:    ret void
 //
 void p(char *str, int x) {
@@ -27,13 +28,13 @@ void p(char *str, int x) {
 // CHECK-LABEL: @test_long_double_isinf(
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca fp128, align 16
-// CHECK-NEXT:    store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
+// CHECK-NEXT:    store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
-// CHECK-NEXT:    [[SHL1:%.*]] = shl i128 [[BITCAST]], 1
-// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i128 [[SHL1]], -10384593717069655257060992658440192
-// CHECK-NEXT:    [[RES:%.*]] = zext i1 [[CMP]] to i32
-// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = bitcast fp128 [[TMP0]] to i128
+// CHECK-NEXT:    [[TMP2:%.*]] = shl i128 [[TMP1]], 1
+// CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i128 [[TMP2]], -10384593717069655257060992658440192
+// CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_long_double_isinf(long double ld) {
@@ -45,13 +46,13 @@ void test_long_double_isinf(long double ld) {
 // CHECK-LABEL: @test_long_double_isfinite(
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca fp128, align 16
-// CHECK-NEXT:    store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
+// CHECK-NEXT:    store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
-// CHECK-NEXT:    [[SHL1:%.*]] = shl i128 [[BITCAST]], 1
-// CHECK-NEXT:    [[CMP:%.*]] = icmp ult i128 [[SHL1]], -10384593717069655257060992658440192
-// CHECK-NEXT:    [[RES:%.*]] = zext i1 [[CMP]] to i32
-// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = bitcast fp128 [[TMP0]] to i128
+// CHECK-NEXT:    [[TMP2:%.*]] = shl i128 [[TMP1]], 1
+// CHECK-NEXT:    [[TMP3:%.*]] = icmp ult i128 [[TMP2]], -10384593717069655257060992658440192
+// CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.2, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_long_double_isfinite(long double ld) {
@@ -63,14 +64,11 @@ void test_long_double_isfinite(long double ld) {
 // CHECK-LABEL: @test_long_double_isnan(
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca fp128, align 16
-// CHECK-NEXT:    store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
+// CHECK-NEXT:    store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
-// CHECK-NEXT:    [[ABS:%.*]] = and i128 [[BITCAST]], 170141183460469231731687303715884105727
-// CHECK-NEXT:    [[TMP1:%.*]] = sub i128 170135991163610696904058773219554885632, [[ABS]]
-// CHECK-NEXT:    [[ISNAN:%.*]] = lshr i128 [[TMP1]], 127
-// CHECK-NEXT:    [[RES:%.*]] = trunc i128 [[ISNAN]] to i32
-// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]])
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.isnan.f128(fp128 [[TMP0]]) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
+// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.3, i64 0, i64 0), i32 [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void test_long_double_isnan(long double ld) {