Introduce intrinsic llvm.isnan
This is a recommit of patch 16ff91e, which was
reverted in 0c28a7c because it had
an error in a call to getFastMathFlags (the base type should be
FPMathOperator, not Instruction). The original commit message is duplicated below:

    Clang has a builtin function '__builtin_isnan', which implements the C
    library function 'isnan'. This function is currently implemented entirely
    in clang codegen, which expands it into a set of IR operations. There
    are three mechanisms by which the expansion can be made.

    * The most common mechanism is an unordered comparison made by the
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It is not suitable, however, if
      floating-point exceptions are tracked. The corresponding IEEE 754
      operation and C function must never raise an FP exception, even if
      the argument is a signaling NaN. Compare instructions usually do not
      have this property: they raise the 'invalid' exception in that case.
      So this mechanism is unsuitable when exception behavior is strict. In
      particular, it could result in unexpected trapping if the argument is
      an SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in cases where 'isnan' is not allowed to raise FP
      exceptions. This solution implements 'isnan' using integer
      operations. It solves the problem of exceptions, but it offers a
      single solution for all targets, even though some targets can do the
      check more efficiently.

    * The solution implemented by https://reviews.llvm.org/D96568 introduced
      a hook 'clang::TargetCodeGenInfo::testFPKind', which injects
      target-specific code into IR. Currently only SystemZ implements this
      hook, and it generates a call to a target-specific intrinsic function.
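The integer-only approach from the second bullet can be sketched in plain C++ for a 32-bit float. This is only an illustration under stated assumptions: the helper name `isnan_bits` is hypothetical, the masks assume the IEEE 754 binary32 layout, and the code mirrors the formula `(exp mask - (abs(V) & exp mask)) < 0` rather than reproducing what clang emits.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float uses the IEEE 754 binary32 layout.
static_assert(std::numeric_limits<float>::is_iec559,
              "this sketch assumes IEEE 754 floats");

// Hypothetical helper, not part of clang or LLVM: tests a float for NaN
// using only integer operations, so no FP exception can be raised.
inline bool isnan_bits(float v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);         // reinterpret the float's bits
  const std::uint32_t abs_mask = 0x7FFFFFFFu;  // all bits except the sign
  const std::uint32_t exp_mask = 0x7F800000u;  // bit pattern of +infinity
  const std::uint32_t abs_bits = bits & abs_mask;
  // abs_bits exceeds exp_mask only for NaNs, so the unsigned subtraction
  // wraps around and sets the top bit exactly in that case.
  const std::uint32_t diff = exp_mask - abs_bits;
  return (diff >> 31) != 0;
}
```

Because the value is never used as a floating-point operand, a signaling NaN passes through this check without any comparison instruction being executed on it.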

    Although these mechanisms allow 'isnan' to be implemented efficiently
    enough, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. This complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang; in that case the
      treatment of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of the options '-ffast-math' or '-fno-honor-nans'. If such an option
    is specified, 'fcmp uno' may be optimized to 'false'. This is a valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption, however, should not be applied
    to functions that check FP number properties, including 'isnan'. If
    such a function returns the expected result instead of actually making
    the check, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance-critical code, as it can speed up execution
    at the expense of manual treatment of corner cases. If 'isnan' returns
    an assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, such as making the check using integer
    operations. There is a discussion in
    https://reviews.llvm.org/D18513#387418, which also expresses the
    opinion that the limitations imposed by '-ffast-math' should be applied
    only to 'math' functions and not to 'tests'.
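As an illustration of the kind of replacement a user might have to invent under '-ffast-math', here is a minimal sketch for double. The name `isnan_by_bits` is hypothetical and the constant assumes the IEEE 754 binary64 layout. The check contains no floating-point operations, so assumptions about NaN-free operands should have nothing to fold away, though that depends on the compiler.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: double uses the IEEE 754 binary64 layout.
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// Hypothetical user-written stand-in for std::isnan.
inline bool isnan_by_bits(double v) {
  std::uint64_t bits;
  std::memcpy(&bits, &v, sizeof bits);  // copy out the raw representation
  const std::uint64_t inf_bits = 0x7FF0000000000000ull;  // +infinity
  // Shifting left by one discards the sign bit. A NaN has all exponent
  // bits set and a nonzero significand, so it compares strictly above
  // the shifted pattern of infinity.
  return (bits << 1) > (inf_bits << 1);
}
```

A typical use would be guarding a hot loop compiled with '-ffast-math': the caller filters inputs with `isnan_by_bits` before handing them to code that assumes NaN-free operands.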

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by the
    IEEE 754 and C standards in a target-agnostic way. During IR
    transformations it does not undergo undesirable optimizations. It
    reaches instruction selection, where it is lowered in a target-dependent
    way. The lowering can vary depending on options like '-ffast-math' or
    '-ffp-model', so that the resulting code satisfies the requested
    semantics.

    Differential Revision: https://reviews.llvm.org/D104854
spavloff committed Aug 6, 2021
1 parent 3e58dd1 commit 4c4093e
Showing 24 changed files with 2,604 additions and 145 deletions.
28 changes: 4 additions & 24 deletions clang/lib/CodeGen/CGBuiltin.cpp
@@ -3068,37 +3068,17 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
// ZExt bool to int type.
return RValue::get(Builder.CreateZExt(LHS, ConvertType(E->getType())));
}

case Builtin::BI__builtin_isnan: {
CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
Value *V = EmitScalarExpr(E->getArg(0));
llvm::Type *Ty = V->getType();
const llvm::fltSemantics &Semantics = Ty->getFltSemantics();
if (!Builder.getIsFPConstrained() ||
Builder.getDefaultConstrainedExcept() == fp::ebIgnore ||
!Ty->isIEEE()) {
V = Builder.CreateFCmpUNO(V, V, "cmp");
return RValue::get(Builder.CreateZExt(V, ConvertType(E->getType())));
}

if (Value *Result = getTargetHooks().testFPKind(V, BuiltinID, Builder, CGM))
return RValue::get(Result);

// NaN has all exp bits set and a non zero significand. Therefore:
// isnan(V) == ((exp mask - (abs(V) & exp mask)) < 0)
unsigned bitsize = Ty->getScalarSizeInBits();
llvm::IntegerType *IntTy = Builder.getIntNTy(bitsize);
Value *IntV = Builder.CreateBitCast(V, IntTy);
APInt AndMask = APInt::getSignedMaxValue(bitsize);
Value *AbsV =
Builder.CreateAnd(IntV, llvm::ConstantInt::get(IntTy, AndMask));
APInt ExpMask = APFloat::getInf(Semantics).bitcastToAPInt();
Value *Sub =
Builder.CreateSub(llvm::ConstantInt::get(IntTy, ExpMask), AbsV);
// V = sign bit (Sub) <=> V = (Sub < 0)
V = Builder.CreateLShr(Sub, llvm::ConstantInt::get(IntTy, bitsize - 1));
if (bitsize > 32)
V = Builder.CreateTrunc(V, ConvertType(E->getType()));
return RValue::get(V);
Function *F = CGM.getIntrinsic(Intrinsic::isnan, V->getType());
Value *Call = Builder.CreateCall(F, V);
return RValue::get(Builder.CreateZExt(Call, ConvertType(E->getType())));
}

case Builtin::BI__builtin_matrix_transpose: {
37 changes: 17 additions & 20 deletions clang/test/CodeGen/X86/strictfp_builtins.c
@@ -17,7 +17,7 @@ int printf(const char *, ...);
// CHECK-NEXT: store i32 [[X:%.*]], i32* [[X_ADDR]], align 4
// CHECK-NEXT: [[TMP0:%.*]] = load i8*, i8** [[STR_ADDR]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[X_ADDR]], align 4
// CHECK-NEXT: [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) [[ATTR4:#.*]]
// CHECK-NEXT: [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) #[[ATTR3:[0-9]+]]
// CHECK-NEXT: ret void
//
void p(char *str, int x) {
@@ -29,13 +29,13 @@ void p(char *str, int x) {
// CHECK-LABEL: @test_long_double_isinf(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
// CHECK-NEXT: store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT: store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT: [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT: [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
// CHECK-NEXT: [[SHL1:%.*]] = shl i80 [[BITCAST]], 1
// CHECK-NEXT: [[CMP:%.*]] = icmp eq i80 [[SHL1]], -18446744073709551616
// CHECK-NEXT: [[RES:%.*]] = zext i1 [[CMP]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
// CHECK-NEXT: [[TMP1:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
// CHECK-NEXT: [[TMP2:%.*]] = shl i80 [[TMP1]], 1
// CHECK-NEXT: [[TMP3:%.*]] = icmp eq i80 [[TMP2]], -18446744073709551616
// CHECK-NEXT: [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
// CHECK-NEXT: ret void
//
void test_long_double_isinf(long double ld) {
@@ -47,13 +47,13 @@ void test_long_double_isinf(long double ld) {
// CHECK-LABEL: @test_long_double_isfinite(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
// CHECK-NEXT: store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT: store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT: [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT: [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
// CHECK-NEXT: [[SHL1:%.*]] = shl i80 [[BITCAST]], 1
// CHECK-NEXT: [[CMP:%.*]] = icmp ult i80 [[SHL1]], -18446744073709551616
// CHECK-NEXT: [[RES:%.*]] = zext i1 [[CMP]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
// CHECK-NEXT: [[TMP1:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
// CHECK-NEXT: [[TMP2:%.*]] = shl i80 [[TMP1]], 1
// CHECK-NEXT: [[TMP3:%.*]] = icmp ult i80 [[TMP2]], -18446744073709551616
// CHECK-NEXT: [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.2, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
// CHECK-NEXT: ret void
//
void test_long_double_isfinite(long double ld) {
@@ -65,14 +65,11 @@ void test_long_double_isfinite(long double ld) {
// CHECK-LABEL: @test_long_double_isnan(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
// CHECK-NEXT: store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT: store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT: [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT: [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
// CHECK-NEXT: [[ABS:%.*]] = and i80 [[BITCAST]], 604462909807314587353087
// CHECK-NEXT: [[TMP1:%.*]] = sub i80 604453686435277732577280, [[ABS]]
// CHECK-NEXT: [[ISNAN:%.*]] = lshr i80 [[TMP1]], 79
// CHECK-NEXT: [[RES:%.*]] = trunc i80 [[ISNAN]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
// CHECK-NEXT: [[TMP1:%.*]] = call i1 @llvm.isnan.f80(x86_fp80 [[TMP0]]) #[[ATTR3]]
// CHECK-NEXT: [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.3, i64 0, i64 0), i32 [[TMP2]]) #[[ATTR3]]
// CHECK-NEXT: ret void
//
void test_long_double_isnan(long double ld) {
38 changes: 18 additions & 20 deletions clang/test/CodeGen/aarch64-strictfp-builtins.c
@@ -1,3 +1,4 @@
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
// RUN: %clang_cc1 %s -emit-llvm -ffp-exception-behavior=maytrap -fexperimental-strict-floating-point -o - -triple arm64-none-linux-gnu | FileCheck %s

// Test that the constrained intrinsics are picking up the exception
@@ -15,7 +16,7 @@ int printf(const char *, ...);
// CHECK-NEXT: store i32 [[X:%.*]], i32* [[X_ADDR]], align 4
// CHECK-NEXT: [[TMP0:%.*]] = load i8*, i8** [[STR_ADDR]], align 8
// CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[X_ADDR]], align 4
// CHECK-NEXT: [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) [[ATTR4:#.*]]
// CHECK-NEXT: [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) #[[ATTR3:[0-9]+]]
// CHECK-NEXT: ret void
//
void p(char *str, int x) {
@@ -27,13 +28,13 @@ void p(char *str, int x) {
// CHECK-LABEL: @test_long_double_isinf(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[LD_ADDR:%.*]] = alloca fp128, align 16
// CHECK-NEXT: store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT: store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT: [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
// CHECK-NEXT: [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
// CHECK-NEXT: [[SHL1:%.*]] = shl i128 [[BITCAST]], 1
// CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[SHL1]], -10384593717069655257060992658440192
// CHECK-NEXT: [[RES:%.*]] = zext i1 [[CMP]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
// CHECK-NEXT: [[TMP1:%.*]] = bitcast fp128 [[TMP0]] to i128
// CHECK-NEXT: [[TMP2:%.*]] = shl i128 [[TMP1]], 1
// CHECK-NEXT: [[TMP3:%.*]] = icmp eq i128 [[TMP2]], -10384593717069655257060992658440192
// CHECK-NEXT: [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
// CHECK-NEXT: ret void
//
void test_long_double_isinf(long double ld) {
@@ -45,13 +46,13 @@ void test_long_double_isinf(long double ld) {
// CHECK-LABEL: @test_long_double_isfinite(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[LD_ADDR:%.*]] = alloca fp128, align 16
// CHECK-NEXT: store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT: store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT: [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
// CHECK-NEXT: [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
// CHECK-NEXT: [[SHL1:%.*]] = shl i128 [[BITCAST]], 1
// CHECK-NEXT: [[CMP:%.*]] = icmp ult i128 [[SHL1]], -10384593717069655257060992658440192
// CHECK-NEXT: [[RES:%.*]] = zext i1 [[CMP]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
// CHECK-NEXT: [[TMP1:%.*]] = bitcast fp128 [[TMP0]] to i128
// CHECK-NEXT: [[TMP2:%.*]] = shl i128 [[TMP1]], 1
// CHECK-NEXT: [[TMP3:%.*]] = icmp ult i128 [[TMP2]], -10384593717069655257060992658440192
// CHECK-NEXT: [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.2, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
// CHECK-NEXT: ret void
//
void test_long_double_isfinite(long double ld) {
@@ -63,14 +64,11 @@ void test_long_double_isfinite(long double ld) {
// CHECK-LABEL: @test_long_double_isnan(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[LD_ADDR:%.*]] = alloca fp128, align 16
// CHECK-NEXT: store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT: store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT: [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
// CHECK-NEXT: [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
// CHECK-NEXT: [[ABS:%.*]] = and i128 [[BITCAST]], 170141183460469231731687303715884105727
// CHECK-NEXT: [[TMP1:%.*]] = sub i128 170135991163610696904058773219554885632, [[ABS]]
// CHECK-NEXT: [[ISNAN:%.*]] = lshr i128 [[TMP1]], 127
// CHECK-NEXT: [[RES:%.*]] = trunc i128 [[ISNAN]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]])
// CHECK-NEXT: [[TMP1:%.*]] = call i1 @llvm.isnan.f128(fp128 [[TMP0]]) #[[ATTR3]]
// CHECK-NEXT: [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.3, i64 0, i64 0), i32 [[TMP2]]) #[[ATTR3]]
// CHECK-NEXT: ret void
//
void test_long_double_isnan(long double ld) {