-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pure Julia ldexp function #19491
Conversation
22c56ef
to
d680b8f
Compare
ldexp(x::Float64,e::Integer) = ccall((:scalbn,libm), Float64, (Float64,Int32), x, Int32(e)) | ||
ldexp(x::Float32,e::Integer) = ccall((:scalbnf,libm), Float32, (Float32,Int32), x, Int32(e)) | ||
function ldexp{T<:AbstractFloat}(x::T, q::Integer) | ||
n = Int(q) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might as well correctly handle values outside of the range of Int
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I have this is if q::Int128
then things are type unstable with k + q
. It's not a problem for Int32 or Int64
return flipsign(T(0.0), x) | ||
end | ||
end | ||
k += n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this overflows (e.g. if n = typemax(Int)
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k
will overflow and be negative and line 378 catches that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could handle Int128
and BigInt
by moving line 353 here, and
if q > 50000
return flipsign(T(Inf), x)
elseif q < -50000
return flipsign(T(0), x)
end
n = q % Int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would add another branch, is it worth it, because you still have to check that k
is not overflowing and underflowing in addition to this, which currently is cleverly handled. This adds, based on benchmarking the various branches, approximately a 1ns penalty.
Currently we do a lot worse and cast to Int32.
I don't necessarily have a problem with this, but question if this is the best way to do it. The quest for the most generic function is satisfying, but it typically entails performance penalties.
Internally for my codes that need this function, I don't use this since it's a bit too slow in practice but instead use the following:
https://github.com/JuliaMath/Amal.jl/blob/master/src/ldexp.jl
it has some important cases where it doesn't pass all cases, but that's not important since those cases are handeled externally.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, another option would be
if q > typemax(Int)
return flipsign(T(Inf), x)
elseif q < typemin(Int)
return flipsign(T(0), x)
end
n = q % Int
which should be able to optimise away the check if q::Int
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice!
9b88338
to
ce4a913
Compare
ys = xs << m | ||
xu = ys | (xu & sign_mask(T)) | ||
k = 1 - signed(m) | ||
if n < -50000 # underflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only needs to checked in the subnormal number case, otherwise k > 0 and it's not possible to underflow, thereby eliminating an unecessary branch.
3782359
to
928c973
Compare
inttype(::Type{Float64}) = Int64 | ||
inttype(::Type{Float32}) = Int32 | ||
inttype(::Type{Float16}) = Int16 | ||
exponent_max{T<:AbstractFloat}(::Type{T}) = Int(exponent_mask(T) >> significand_bits(T)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonbyrne what do you think about these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to do without inttype
(see comment below), but if not perhaps a more descriptive name (maybe intofwidth
or something like that?).
an exponent_max
function seems reasonable, but as it's really the biased exponent, maybe exponent_raw_max
?
Also, it would probably be more appropriate to have these in float.jl rather than here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we don't need inttype
using your comment below.
Another case where I think this function would be useful is for something like this
intasfloat{T<:IEEEFloat}(::Type{T}, m::Integer) = reinterpret(T, (m % inttype(T)) << significand_bits(T))
but personally I think it's best to not include it, until there's actually another function that would benefit.
for exponent_max
I've just inlined it. It seems rather special case to add it as a another function in float.jl
. However, if you have objections I don't mind to put it in float.jl or similar
return flipsign(T(Inf), x) | ||
end | ||
if k > 0 # normal case | ||
xu = (xu & ~exponent_mask(T)) | unsigned((k % inttype(T)) << significand_bits(T)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not k % typeof(xu)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep thanks
ys = xs << unsigned(m) | ||
xu = ys | (xu & sign_mask(T)) | ||
k = 1 - m | ||
if q < -50000 # underflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This number is more than safe for up to Float128
Fixed some typos in the comment and cleaned things up. |
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
Would you mind adding some tests for |
Get rid of openlibm ldexp function for a pure Julia implementation, which is also faster.
hi @simonbyrne I added tests, thanks. |
I made a very trivial change to using |
I've already mistakenly merged 1 "trivial" change that broke things in the last 24 hours, so maybe best to wait. |
@@ -72,22 +72,22 @@ end | |||
@test y == 0 | |||
|
|||
# more ldexp tests | |||
@test ldexp(T(0.8), 4) === T(12.8) | |||
@test ldexp(T(-0.854375), 5) === T(-27.34) | |||
@test ldexp(T(0.8), 4) == T(12.8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why test less specifically? ===
verifies that the types are the same. Would it fail here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no it doesn't fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then ===
is better, as ==
could miss return-type regressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rest of the file is inconsistent anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok w/e I dropped the commit, we can now merge
@@ -349,8 +349,49 @@ julia> ldexp(5., 2) | |||
20.0 | |||
``` | |||
""" | |||
ldexp(x::Float64,e::Integer) = ccall((:scalbn,libm), Float64, (Float64,Int32), x, Int32(e)) | |||
ldexp(x::Float32,e::Integer) = ccall((:scalbnf,libm), Float32, (Float32,Int32), x, Int32(e)) | |||
function ldexp{T<:AbstractFloat}(x::T, e::Integer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't really valid for every single AbstractFloat subtype, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not: realistically it's probably only valid for Float16
,Float32
and Float64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it assumes:
- Is a
bitstype
. - Is base-2
- Uses the same layout as the IEEE format (sign | biased exponent | significand )
- Uses gradual underflow for subnormals
So it could be valid for, e.g. an x87 Float80
or an IEEE Float128
, but not BigFloat
, IEEE decimal (as in DecFP.jl), DoubleDoubles, IBM floats or formats which do funny things like two's complement exponents.
Get rid of openlibm ldexp function for a pure Julia implementation, which is also faster.
benchmark (median) using BenchmarkTools (master) and julia 0.5 for Float64
regular branch is
~ 1.5x speedup
subnormal number and subnormal output
~ 3.6x speedup
normal number and subnormal output
~ 2.6x speedup
subnormal number and normal output
~ 13x speedup
Test values for all branches (
T = Float64, Float32
)