src_float_to_short_array: Use division instead of shift #85

erikd · 2019-08-07T22:05:22Z

@janstary I wonder if this fixes your issue on PowerPC.

Flamefire · 2019-08-09T10:56:52Z

No it won't fix the issue and it is slower due to the integer division (and harder to understand then the solution proposed in #84 (comment))

The PowerPC problem is a broken lrint which produces wrong results for large values as show in #65 (comment) As your change works on the already wrong value, it can't get correct.

An example: lrint(INT_MAX) = lrint(2147483647.000000) = -1
Even using smaller values runs into the same: lrint(2^30 - 2) = lrint(1073741822.000000) = -2

erikd · 2019-08-09T11:20:41Z

slower due to the integer division

On modern CPUs with deep caches, branches are usually considered more expensive than division. If you can prove to me that my solution using division is slower than your solution, I will take your solution.

The PowerPC problem is a broken lrint which

If PowerPC has a broken lrint function then I will accept a PowerPC #if solution, but i will not take a performance hit on this code for x86_64 just because PowerPC's lrint is busted.

Meanwhile, I am still interested if this fixes @janstary's issue.

Flamefire · 2019-08-09T11:52:59Z

Meanwhile, I am still interested if this fixes @janstary's issue.

Haven't we already proved that the output of lrint on his machine is wrong for values definitely used here? How could a change working on that wrong value fix anything?

On modern CPUs with deep caches, branches are usually considered more expensive than division.

Speculative execution mitigates some of that. AND:

If you can prove to me that my solution using division is slower than your solution, I will take your solution.

Prove is simple: LONG on most 64bit systems is 64bit: https://godbolt.org/z/HMv8iu
This will disable the "clipping optimization" (which just heuristically checks implementation defined behavior) Hence my code has the exact same branches and float-multiplications but 1 less shift/division and hence is faster.

janstary · 2019-08-09T15:24:37Z

@janstary I wonder if this fixes your issue on PowerPC.

tests/float_short_test

        float_to_short_test ............................. 

        Line 64 : out [1] == 0

janstary · 2019-08-09T16:33:58Z

I also tested on an arm machine (current OpenBSD/armv7
on a BeagleBone Black). Without this patch, it fails with

        float_to_short_test ............................. 
        Line 64 : out [1] == -1

With the patch, it fails as

        float_to_short_test ............................. 
        Line 64 : out [1] == 0

erikd · 2019-08-09T22:36:44Z

Prove is simple:

That's not proof, that's opinion. Proof requires a proper benchmark.

This will disable the "clipping optimization" (which just heuristically checks implementation defined behavior) Hence my code has the exact same branches

You do realize that for CPUs where it works (eg at least x86 and x86_64) the "clipping optimization" results in zero branches, don't you?

Flamefire · 2019-08-10T10:02:18Z

That's not proof, that's opinion. Proof requires a proper benchmark.

No, you are missing my point. For CPU_CLIPS_POSITIVE ==0 && CPU_CLIPS_NEGATIVE == 0 (what I called "disabled clipping optimization") your code and my code are exactly the same (except the used constants) and both have (at least) 2 branches. BUT my code has 1 operation less (either shift or division) , and hence it is faster. Do you agree until here?

Now examine what the clipping optimization is and when it is enabled:

It assumes that for x>= LONG_MAX: lrint(x) == LONG_MAX and x<= LONG_MIN: lrint(x) == LONG_MIN which is implementation defined. I know if 1 implementation (can't find it again right now) which does this by if-checks on the float
It assumes that long is 32 bit
Given 1 & 2 it holds that lrint(x * 8.0 * 0x10000000) == INT32_MAX for x>=1.0 (and similar for negative numbers)
the behavior of 3 is checked at configure

My claim: On most architectures the clipping optimization is disabled.
If this is true then on most architectures my code is faster.

Quick check: Do you have a "common" 64 bit Linux on a x86_64 architecture? Then please run configure and report the values for CPU_CLIPS_POSITIVE, CPU_CLIPS_NEGATIVE & SIZEOF_LONG. For me those are 0, 0, 8

Proof for "most architectures":

The optimization does not work for sizeof(long) == 8. 1 Example: for x==1 the expression lrint(x * 8.0 * 0x10000000) yields INT32_MAX+1. This shifted by 16 results in INT32_MIN. Proof: https://godbolt.org/z/zS1K-q
"Most" 64bit architectures use the LP64 model where long is 64 bit (as opposed to LLP64 where it is 32 bit): https://en.wikipedia.org/wiki/64-bit_computing

Many 64-bit platforms today use an LP64 model (including Solaris, AIX, HP-UX, Linux, macOS, BSD, and IBM z/OS). Microsoft Windows uses an LLP64 model.

Due to configure being used I assume Linux as the main target system, not Windows.

Summary:

Linux is the target system
Linux uses LP64 model with sizeof(long)==64
sizeof(long)==64 disables clipping optimization
With disabled clipping optimization your code execute 1 more instruction (hence is slower) and is less readable

src_float_to_short_array: Use division instead of shift

0e7b430

Flamefire mentioned this pull request Aug 9, 2019

Remove shift of signed int in src_float_to_short_array #86

Merged

erikd force-pushed the master branch 2 times, most recently from 1b3d5b1 to c10c819 Compare August 25, 2019 20:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

src_float_to_short_array: Use division instead of shift #85

src_float_to_short_array: Use division instead of shift #85

erikd commented Aug 7, 2019

Flamefire commented Aug 9, 2019

erikd commented Aug 9, 2019

Flamefire commented Aug 9, 2019

janstary commented Aug 9, 2019

janstary commented Aug 9, 2019

erikd commented Aug 9, 2019

Flamefire commented Aug 10, 2019

src_float_to_short_array: Use division instead of shift #85

Are you sure you want to change the base?

src_float_to_short_array: Use division instead of shift #85

Conversation

erikd commented Aug 7, 2019

Flamefire commented Aug 9, 2019

erikd commented Aug 9, 2019

Flamefire commented Aug 9, 2019

janstary commented Aug 9, 2019

janstary commented Aug 9, 2019

erikd commented Aug 9, 2019

Flamefire commented Aug 10, 2019