Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insufficient Laziness? #35

Closed
idontgetoutmuch opened this issue Aug 24, 2014 · 4 comments
Closed

Insufficient Laziness? #35

idontgetoutmuch opened this issue Aug 24, 2014 · 4 comments

Comments

@idontgetoutmuch
Copy link
Member

Consider

{-# OPTIONS_GHC -Wall                      #-}
{-# OPTIONS_GHC -fno-warn-name-shadowing   #-}
{-# OPTIONS_GHC -fno-warn-type-defaults    #-}

import System.Random.MWC
import qualified Data.Vector.Unboxed as V

n :: Int
n = 10^8

main :: IO ()
main = do
  vs <- withSystemRandom $ asGenST $ \gen -> uniformVector gen n
  let s :: Double
      s = V.foldl (+) 0 vs
  print s

This runs quite fast

~/Dropbox/Private/Multinomial $ ./MWCTestVec  +RTS -s
5.000191620413539e7
     800,203,544 bytes allocated in the heap
           7,760 bytes copied during GC
          44,312 bytes maximum residency (1 sample(s))
          51,008 bytes maximum slop
             764 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         1 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s
  Gen  1         1 colls,     0 par    0.00s    0.05s     0.0504s    0.0504s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    1.35s  (  1.54s elapsed)
  GC      time    0.00s  (  0.05s elapsed)
  EXIT    time    0.00s  (  0.05s elapsed)
  Total   time    1.35s  (  1.64s elapsed)

  %GC     time       0.0%  (3.1% elapsed)

  Alloc rate    594,939,795 bytes per MUT second

  Productivity 100.0% of total user, 82.1% of total elapsed

and beats Matlab / Octave

num_particles = 10^8;

tic();
s = sum(rand(num_particles,1));
toc()
octave> tic();
octave> s = sum(rand(num_particles,1));
octave> toc()
Elapsed time is 1.70731 seconds.
octave> s
s =    5.0003e+07

But look at the space usage: 764 MB total memory. Maybe that is not surprising as we are using (unboxed) vectors. So let's try creating a list equivalent.

{-# OPTIONS_GHC -Wall                      #-}
{-# OPTIONS_GHC -fno-warn-name-shadowing   #-}
{-# OPTIONS_GHC -fno-warn-type-defaults    #-}

import System.Random.MWC
import Control.Monad.State
import Control.Monad.Primitive ( PrimMonad, PrimState )

import Control.Monad.ST

n :: Int
n = 10^7

uniformList :: (PrimMonad m, Variate a)
               => Gen (PrimState m) -> m [a]
uniformList gen = replicateM n (uniform gen)

uniforms :: GenST s -> ST s [Double]
uniforms = asGenST uniformList

xs :: [Double]
xs = runST (create >>= uniforms)

main :: IO ()
main = do
  print $ head xs

Notice that we only want the head of the list and we are creating only 10^7 samples rather than 10^8.

~/Dropbox/Private/Multinomial $ ./MWCTest +RTS -s
2.481036288296201e-2
     566,548,776 bytes allocated in the heap
   1,397,568,864 bytes copied during GC
     386,135,776 bytes maximum residency (9 sample(s))
      56,138,016 bytes maximum slop
             905 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       769 colls,     0 par    0.58s    0.64s     0.0008s    0.0016s
  Gen  1         9 colls,     0 par    0.51s    0.75s     0.0828s    0.3593s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.24s  (  0.19s elapsed)
  GC      time    1.09s  (  1.39s elapsed)
  EXIT    time    0.00s  (  0.07s elapsed)
  Total   time    1.33s  (  1.65s elapsed)

  %GC     time      81.9%  (84.3% elapsed)

  Alloc rate    2,399,054,756 bytes per MUT second

  Productivity  18.1% of total user, 14.6% of total elapsed

So not only does it run slower than the vector version (which has more samples) but it also consumes 905 MB total memory.

On the other hand if I use random-fu (https://hackage.haskell.org/package/random-fu) e.g.

{-# OPTIONS_GHC -Wall                      #-}
{-# OPTIONS_GHC -fno-warn-name-shadowing   #-}
{-# OPTIONS_GHC -fno-warn-type-defaults    #-}

import Data.Random.Source.PureMT
import Data.Random

import Control.Monad.State

testUniform :: MonadRandom m => Int -> m [Double]
testUniform n = replicateM (fromIntegral n) (sample stdUniform)

n :: Int
n = 10^8

us :: [Double]
us = evalState (testUniform n) (pureMT $ fromIntegral arbSeed)

arbSeed :: Int
arbSeed = 8

mean :: Double
mean = sum us / fromIntegral n

main :: IO ()
main = do
  print mean
~/Dropbox/Private/Multinomial $ ./TestUniform +RTS -s
0.5000472000960731
  51,228,308,968 bytes allocated in the heap
      12,030,928 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     100159 colls,     0 par    0.27s    0.34s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0002s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   10.74s  ( 10.85s elapsed)
  GC      time    0.27s  (  0.34s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   11.01s  ( 11.19s elapsed)

  %GC     time       2.4%  (3.1% elapsed)

  Alloc rate    4,771,251,674 bytes per MUT second

  Productivity  97.6% of total user, 96.0% of total elapsed

Then it is very slow (x10 worse than the Vector version using mwc-random and x10 worse than Matlab) but look at the space usage: 1 MB total memory.

Now I believe random-fu uses mwc-random so it should be possible to create a sampler using mwc-random that consumes the same space as random-fu (i.e. 1 MB). How can this be done? Is it possible to get the speed of unboxed vectors and the space usage of random-fu? Now that I think about it shouldn't the production of (unboxed) vectors get fused with the consumption of them? So my list experiment is perhaps redundant and we should investigate why so much space is used with the (unboxed) vectors.

@idontgetoutmuch
Copy link
Member Author

https://hackage.haskell.org/package/vector says it has "a powerful loop optimisation framework " so I think the loop in the vector version should get fused.

@idontgetoutmuch
Copy link
Member Author

Using V.foldl' performs no better.

@idontgetoutmuch
Copy link
Member Author

And indeed Data.Vector does fuse very nicely in this example

{-# OPTIONS_GHC -Wall                      #-}
{-# OPTIONS_GHC -fno-warn-name-shadowing   #-}
{-# OPTIONS_GHC -fno-warn-type-defaults    #-}

import qualified Data.Vector.Unboxed as V

n :: Int
n = 10^8

vecsUpto :: Int -> V.Vector Double
vecsUpto n = V.generate n fromIntegral

main :: IO ()
main = do
  vs <- return $ vecsUpto n
  let s :: Double
      s = V.foldl' (+) 0 vs
  print s
~/Dropbox/Private/Multinomial $ ./FuseTest +RTS -s
4.99999995e15
         102,776 bytes allocated in the heap
           3,408 bytes copied during GC
          44,312 bytes maximum residency (1 sample(s))
          17,128 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.00s    0.00s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.00s    0.00s     0.0002s    0.0002s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.12s  (  0.12s elapsed)
  GC      time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.12s  (  0.12s elapsed)

  %GC     time       0.1%  (0.2% elapsed)

  Alloc rate    887,951 bytes per MUT second

  Productivity  99.8% of total user, 100.2% of total elapsed

@bos
Copy link
Collaborator

bos commented Aug 25, 2014

I think you may not be understanding how laziness works. The reason that mwc-random allocates an entire vector of random data when you use uniformVector is that this is exactly what it is designed to do: it allocates a vector, fills it, then returns it. Vector fusion has no relation to this.

If it's any comfort, your misunderstanding is common enough, but I don't think this bug report is the right place to debug and correct it (and sadly I don't have time to help you individually). You might consider asking for assistance with getting a better grasp in #haskell.

Good luck!

@bos bos closed this as completed Aug 25, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants