`freeze` sometimes performs better than `unsafeFreeze` #409

julmb · 2021-08-26T16:58:59Z

I have encountered a situation where freeze is actually faster than unsafeFreeze. Minimal example:

import Control.Monad.ST
import Criterion.Main
import Data.Vector.Unboxed
import Data.Vector.Unboxed.Mutable

go :: Int -> Vector Double
go n = runST $ do
    pv <- unsafeNew n
    let w 0 i = return ()
        w n i = unsafeWrite pv i 42.0 >> w (n - 1) (i + 1)
    w n 0
    unsafeFreeze pv

main :: IO ()
main = defaultMain [bench "freeze" $ whnf go 1000000]

Compiled with ghc -O2 -fllvm freeze.hs, this gives me a mean running time of 985.8μs when using unsafeFreeze and 642.8μs when using freeze. How can this be? Shouldn't unsafeFreeze always be faster since it performs strictly less work?

I am using ghc 8.10.5 and llvm 12.0.1.

The text was updated successfully, but these errors were encountered:

lehins · 2021-08-26T17:12:26Z

I can't exactly reproduce this problem on my machine, much will depend on llvm and ghc versions (please report yours).

When compiled with llvm I see that unsafeFreeze version is indeed slower, but barely:

$ stack exec -- ghc -O2 freeze.hs -fllvm -fforce-recomp
[1 of 1] Compiling Main             ( freeze.hs, freeze.o )
You are using an unsupported version of LLVM!
Currently only 7 is supported. System LLVM version: 6.0.1
We will try though...
Linking freeze ...
$ ./freeze 
benchmarking freeze
time                 1.434 ms   (1.383 ms .. 1.505 ms)
                     0.988 R²   (0.978 R² .. 0.997 R²)
mean                 1.428 ms   (1.402 ms .. 1.468 ms)
std dev              103.3 μs   (74.59 μs .. 144.5 μs)
variance introduced by outliers: 56% (severely inflated)

benchmarking unsafeFreeze
time                 1.559 ms   (1.532 ms .. 1.594 ms)
                     0.996 R²   (0.991 R² .. 1.000 R²)
mean                 1.537 ms   (1.527 ms .. 1.564 ms)
std dev              53.98 μs   (28.68 μs .. 108.7 μs)
variance introduced by outliers: 23% (moderately inflated)

However, when compiled natively the result is even better than llvm and unsafe version is almost double the performance:

$ stack exec -- ghc -O2 freeze.hs -fforce-recomp
[1 of 1] Compiling Main             ( freeze.hs, freeze.o )
Linking freeze ...
$ ./freeze 
benchmarking freeze
time                 1.490 ms   (1.469 ms .. 1.512 ms)
                     0.995 R²   (0.990 R² .. 0.998 R²)
mean                 1.558 ms   (1.526 ms .. 1.602 ms)
std dev              126.7 μs   (86.57 μs .. 164.1 μs)
variance introduced by outliers: 61% (severely inflated)

benchmarking unsafeFreeze
time                 895.3 μs   (878.4 μs .. 911.7 μs)
                     0.996 R²   (0.990 R² .. 0.999 R²)
mean                 910.7 μs   (896.4 μs .. 954.8 μs)
std dev              71.18 μs   (33.01 μs .. 144.1 μs)
variance introduced by outliers: 63% (severely inflated)

julmb · 2021-08-26T17:37:30Z

I have added version information to my original post.

I tried the native code generator and I get the following:

$ ghc -O2 freeze.hs && ./freeze
[1 of 1] Compiling Main             ( freeze.hs, freeze.o )
Linking freeze ...
benchmarking freeze
time                 819.5 μs   (806.3 μs .. 832.0 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 830.8 μs   (822.5 μs .. 843.2 μs)
std dev              32.51 μs   (23.72 μs .. 52.26 μs)
variance introduced by outliers: 30% (moderately inflated)

benchmarking unsafeFreeze
time                 718.6 μs   (714.9 μs .. 721.6 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 720.2 μs   (717.8 μs .. 723.8 μs)
std dev              9.737 μs   (6.709 μs .. 17.12 μs)

And with LLVM:

$ ghc -O2 -fllvm freeze.hs && ./freeze
[1 of 1] Compiling Main             ( freeze.hs, freeze.o )
You are using an unsupported version of LLVM!
Currently only 10 to 12 is supported. System LLVM version: 12.0.1
We will try though...
Linking freeze ...
benchmarking freeze
time                 658.7 μs   (650.8 μs .. 668.2 μs)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 670.7 μs   (663.4 μs .. 679.7 μs)
std dev              26.76 μs   (20.40 μs .. 36.87 μs)
variance introduced by outliers: 32% (moderately inflated)

benchmarking unsafeFreeze
time                 967.5 μs   (955.3 μs .. 975.9 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 958.4 μs   (954.0 μs .. 963.7 μs)
std dev              16.22 μs   (13.12 μs .. 20.98 μs)

So the native code generator's unsafeFreeze is still slightly slower than freeze with LLVM. Although at least within the native code generator, things behave as expected. Also, we seem to be getting very different results courtesy of our different versions!

I am now even more puzzled. What could be causing such behavior and what can I do about it? I need LLVM as it performs much better than the native code generator on other parts of the software.

andrewthad · 2021-08-26T18:25:16Z

The difference between freeze and unsafeFreeze is that freeze performs a memcpy and unsafeFreeze just flips a bit.

You may want to check alignment of the loop. One thing that can help here is marking go with a noinline pragma. Also, for LLVM, can you instruct GHC to dump the assembly (both for freeze and unsafeFreeze)? Just paste the assembly for the go function, not for main. You may need to add the noinline pragma before dumping assembly since otherwise, it'll get inlined, and you'll never be able to figure which code is for go.

lehins · 2021-08-26T18:34:05Z

I'd suggest to report this issue on ghc issue tracker however. There is nothing can be done in vector about this issue

FYI, just for my own sanity checked performance with PrimArray from primitive and it is identical

lehins · 2021-08-26T18:42:38Z

@julianbrunner Can you try this implementation instead. It fixed the issue for me:

go :: Int -> Vector Double
go n = runST $ do
    pv <- unsafeNew n
    let w 0 !i = return ()
        w n !i = unsafeWrite pv i 42.0 >> w (n - 1) (i + 1)
    w n 0
    unsafeFreeze pv

Shimuuar · 2021-08-26T20:06:36Z

I can reproduce this with GHC8.10 and LLVM9. Results are summarized in table below. ! means version with strict accumulator as proposed by @lehins

|          | unsafeFreeze | freeze |
|----------+--------------+--------|
| NCG      |          550 |    730 |
| NCG(!)   |          390 |    740 |
| LLVM9    |         1550 |    730 |
| LLVM9(!) |          420 |    710 |

Performance with strict accumulator is essentially same for both NCG and LLVM. For lazy accumulator LLVM generates something bad. This I think bug in LLVM code generator and reason for poor performance could only be found by inspecting assembly.

julmb · 2021-08-27T12:10:04Z

@julianbrunner Can you try this implementation instead. It fixed the issue for me:

go :: Int -> Vector Double
go n = runST $ do
    pv <- unsafeNew n
    let w 0 !i = return ()
        w n !i = unsafeWrite pv i 42.0 >> w (n - 1) (i + 1)
    w n 0
    unsafeFreeze pv

Yes, this fixes the issue. I am still quite puzzled as to what is going on though. I might investigate some more when I get the chance.

lehins · 2021-08-27T12:23:21Z

@julianbrunner I suggest you open an issue on ghc issue tracker about this. It looks like ghc strictness analyzer is not interacting well with llvm, which I am more than sure is not Vector/ByteArray specific and can pop up in other scenarios.

I am closing this issue, as there is nothing we can do in vector to alleviate this, beside suggesting users to be more strict

julmb · 2021-08-27T17:11:28Z

@julianbrunner I suggest you open an issue on ghc issue tracker about this. It looks like ghc strictness analyzer is not interacting well with llvm, which I am more than sure is not Vector/ByteArray specific and can pop up in other scenarios.

I am closing this issue, as there is nothing we can do in vector to alleviate this, beside suggesting users to be more strict

Done: https://gitlab.haskell.org/ghc/ghc/-/issues/20302

lehins closed this as completed Aug 27, 2021

gksato mentioned this issue Sep 12, 2021

unstreamM etc destroys performance; expose unstreamPrimM & document most-performant method #416

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`freeze` sometimes performs better than `unsafeFreeze` #409

`freeze` sometimes performs better than `unsafeFreeze` #409

julmb commented Aug 26, 2021 •

edited

Loading

lehins commented Aug 26, 2021

julmb commented Aug 26, 2021 •

edited

Loading

andrewthad commented Aug 26, 2021

lehins commented Aug 26, 2021 •

edited

Loading

lehins commented Aug 26, 2021

Shimuuar commented Aug 26, 2021

julmb commented Aug 27, 2021

lehins commented Aug 27, 2021 •

edited

Loading

julmb commented Aug 27, 2021

freeze sometimes performs better than unsafeFreeze #409

freeze sometimes performs better than unsafeFreeze #409

Comments

julmb commented Aug 26, 2021 • edited Loading

lehins commented Aug 26, 2021

julmb commented Aug 26, 2021 • edited Loading

andrewthad commented Aug 26, 2021

lehins commented Aug 26, 2021 • edited Loading

lehins commented Aug 26, 2021

Shimuuar commented Aug 26, 2021

julmb commented Aug 27, 2021

lehins commented Aug 27, 2021 • edited Loading

julmb commented Aug 27, 2021

`freeze` sometimes performs better than `unsafeFreeze` #409

`freeze` sometimes performs better than `unsafeFreeze` #409

julmb commented Aug 26, 2021 •

edited

Loading

julmb commented Aug 26, 2021 •

edited

Loading

lehins commented Aug 26, 2021 •

edited

Loading

lehins commented Aug 27, 2021 •

edited

Loading