
return in any cases/platforms float32 #35

Open
daniel-mohr opened this issue Aug 11, 2021 · 1 comment

Comments

@daniel-mohr
Contributor

Do we really want to return float32 in all cases/on all platforms? (Changing this could break existing usage of cfunits.)

I believe that every 4 byte integer can be stored in an 8 byte float (double) without losing precision. Therefore it would be more accurate to use numpy.float64 in all cases. Furthermore, for an integer with more than 4 bytes a float32 is not sufficient. What do you think?
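As a quick illustration (a minimal sketch using numpy, not cfunits code): an int32 value above 2**24 already loses precision when cast to float32, but not when cast to float64:

```python
import numpy as np

# 2**24 + 1 = 16777217 fits easily into an int32, but its mantissa needs 25 bits
i = np.int32(2**24 + 1)

print(int(np.float32(i)))  # 16777216 -> precision lost in float32
print(int(np.float64(i)))  # 16777217 -> exact in float64
```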

@daniel-mohr
Contributor Author

Recently I was talking with a friend about floating-point numbers, and suddenly this issue became clear to me.

If we assume IEEE 754, a 4 byte float (float32) consists of:

  • s: 1 bit for the sign
  • m_i: 23 bits for the mantissa
  • e_i: 8 bits for the exponent

From these bits we get the represented decimal number with a few formulas:

sign: $S = (-1)^s$
mantissa: $M = 1 + \frac{\sum_{i=0}^{22} m_i \cdot 2^i}{2^{23}}$
exponent: $E = \sum_{i=0}^{7} e_i \cdot 2^i - (2^7 - 1) = \sum_{i=0}^{7} e_i \cdot 2^i - 127$
decimal number: $S \cdot M \cdot 2^E$

And similarly for a double (float64):

sign: $S = (-1)^s$
mantissa: $M = 1 + \frac{\sum_{i=0}^{51} m_i \cdot 2^i}{2^{52}}$
exponent: $E = \sum_{i=0}^{10} e_i \cdot 2^i - (2^{10} - 1) = \sum_{i=0}^{10} e_i \cdot 2^i - 1023$
decimal number: $S \cdot M \cdot 2^E$
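Just to make the formulas concrete, here is a small sketch (plain Python, not cfunits code) that pulls the sign, exponent and mantissa bits out of a float64 and reconstructs the value with exactly these formulas; float32 works the same way with widths 1/8/23 and bias 127:

```python
import struct

x = -6.25
bits = struct.unpack("<Q", struct.pack("<d", x))[0]  # raw 64-bit pattern of the float64

s = bits >> 63                    # 1 sign bit
e_bits = (bits >> 52) & 0x7FF     # 11 exponent bits
m_bits = bits & ((1 << 52) - 1)   # 52 mantissa bits

S = (-1) ** s
M = 1 + m_bits / 2**52            # implicit leading 1 (normal numbers only)
E = e_bits - 1023                 # subtract the bias 2**10 - 1

print(S * M * 2**E)               # -6.25, matches x
```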

So this means that for a float32 we can only keep 23 bits (plus the implicit leading bit) of the original mantissa of an integer. Converting an int32 to float32 therefore leads to a relative error of up to $\left|\frac{(2^{24}+1) - \mathrm{float32}(2^{24}+1)}{2^{24}+1}\right| \approx 6 \cdot 10^{-8}$.

Using a float64, on the other hand, we can store an int32 at full precision.

But converting an int64 to float64 leads again to a relative error of up to $\left|\frac{(2^{53}+1) - \mathrm{float64}(2^{53}+1)}{2^{53}+1}\right| \approx 1 \cdot 10^{-16}$.

So in both cases (int32 -> float32, int64 -> float64) we have a relative error on the order of the machine epsilon.
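A quick numerical check of both error bounds (again a numpy sketch, not cfunits code):

```python
import numpy as np

x32 = 2**24 + 1
err32 = abs(x32 - int(np.float32(x32))) / x32
print(err32, np.finfo(np.float32).eps)  # ~5.96e-08 vs float32 eps ~1.19e-07

x64 = 2**53 + 1
err64 = abs(x64 - int(np.float64(x64))) / x64
print(err64, np.finfo(np.float64).eps)  # ~1.11e-16 vs float64 eps ~2.22e-16
```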

There are some tools where you can play around (and maybe more and better ones):
