Binomial Sampler #90

idontgetoutmuch · 2024-05-02T04:14:49Z

I am in the middle of implementing Kachitvichyanukul, V. and Schmeiser, B. W. (1988) but the inverse method (in the current PR) could be a first step. What do you think?

I tried just sampling from Bernoulli n times but this is very inefficient.
Using the saved calculation of q^n for the inverse method doesn't seem to give a performance improvement.

Shimuuar · 2024-05-03T19:45:20Z

That would be great addition! There's implementation using condensed tables algorithms but it requires costly setup. Also there's inverse incomplete beta in math-functions which is somewhat tested and may have better performance.

P.S. I'll check whether CI still works.

idontgetoutmuch · 2024-05-04T16:30:58Z

P.S. I'll check whether CI still works.

It does not :-(

idontgetoutmuch · 2024-05-04T16:51:23Z

I've added some benchmarks but I get

Name	Mean
mwc/CT/gen/binomial 0.5 4	5.79E-09
mwc/CT/gen/binomial 0.1 10	5.78E-09
mwc/CT/gen/binomial 0.6 10	5.78E-09
mwc/CT/gen/binomial 0.8 10	5.78E-09
mwc/CT/gen/binomial 0.4 100	5.79E-09
mwc/CT/gen/binomial 0.4 1400	5.79E-09
mwc/CT/gen/binomiak 0.5 4	1.06E-07
mwc/CT/gen/binomiak 0.1 10	1.03E-07
mwc/CT/gen/binomiak 0.6 10	1.23E-07
mwc/CT/gen/binomiak 0.8 10	1.15E-07
mwc/CT/gen/binomiak 0.4 100	3.10E-07
mwc/CT/gen/binomiak 0.4 1400	2.19E-07

So it would seem that my method is x50 slower. But in https://github.com/idontgetoutmuch/mwc-random/blob/master/System/Random/MWC/Binomial.hs I can run

ghc -O2 System/Random/MWC/Binomial.hs -main-is mainTab -o mainTab

./mainTab +RTS -s
559.95944
 202,045,775,928 bytes allocated in the heap
      88,204,248 bytes copied during GC
       4,872,176 bytes maximum residency (2 sample(s))
         104,464 bytes maximum slop
              22 MiB total memory in use (1 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     32758 colls,     0 par    0.092s   0.131s     0.0000s    0.0035s
  Gen  1         2 colls,     0 par    0.002s   0.007s     0.0034s    0.0062s

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    6.676s  (  6.724s elapsed)
  GC      time    0.094s  (  0.138s elapsed)
  EXIT    time    0.000s  (  0.007s elapsed)
  Total   time    6.771s  (  6.871s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    30,262,537,823 bytes per MUT second

  Productivity  98.6% of total user, 97.9% of total elapsed

So 6.8 seconds via the table method vs

ghc System/Random/MWC/Binomial.hs -O2 -main-is mainTPE -o mainTPE

./mainTPE +RTS -s
559.97745
      98,831,560 bytes allocated in the heap
      33,920,168 bytes copied during GC
       7,810,184 bytes maximum residency (4 sample(s))
          29,400 bytes maximum slop
              23 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        21 colls,     0 par    0.012s   0.013s     0.0006s    0.0031s
  Gen  1         4 colls,     0 par    0.006s   0.009s     0.0023s    0.0036s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    0.036s  (  0.037s elapsed)
  GC      time    0.018s  (  0.022s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time    0.054s  (  0.061s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    2,735,746,000 bytes per MUT second

  Productivity  66.8% of total user, 60.2% of total elapsed

So 0.05 seconds for the TPE (triangle parallelogram exponential) method making it about x10 faster.

I don't know what benchmarking actually measures but maybe you @Shimuuar can explain what is going on?

Shimuuar · 2024-05-05T14:57:09Z

It does not :-(

I fixed CI and added PAPI-based benchmarks. It turns out they are quite useful for making a lot of measurements quickly I'm going to look into what current benchmarks actually measure.

idontgetoutmuch · 2024-05-06T10:59:04Z

I ran some benchmarks. If I use

liftM sum $ replicateM 1 $ genFromTable (tableBinomial n p) mwc

and

liftM sum $ replicateM 1 $ binomial n p mwc

then the TPE method is faster but if I use

genFromTable (tableBinomial n p) mwc

and

binomial n p mwc

then the table method is faster. I am guessing the table is calculated once in this case and cached somehow but not if replicateM is used. So which method is really faster or does it depend on your application?

Name	Mean (ps)	Name	Mean (ps)	Ratio

100 repetitions

binomial 0.5 4	78262646	binomiak 0.5 4	45466943	1.721308732
binomial 0.1 10	336691796	binomiak 0.1 10	45204858	7.44813303
binomial 0.6 10	438881835	binomiak 0.6 10	47054516	9.327092749
binomial 0.8 10	419028320	binomiak 0.8 10	46269506	9.056252297
binomial 0.4 100	1625602343	binomiak 0.4 100	134359570	12.0988951
binomial 0.4 1400	7465168750	binomiak 0.4 1400	102613574	72.75030446

10 repetitions

binomial 0.5 4	7834100	binomiak 0.5 4	4526589	1.73068507
binomial 0.1 10	33740417	binomiak 0.1 10	4501223	7.495833244
binomial 0.6 10	44393627	binomiak 0.6 10	4695062	9.455386745
binomial 0.8 10	42145361	binomiak 0.8 10	4608663	9.144812932
binomial 0.4 100	162304394	binomiak 0.4 100	13418182	12.09585576
binomial 0.4 1400	739081640	binomiak 0.4 1400	10178796	72.60992754

One repetition

binomial 0.5 4	785509	binomiak 0.5 4	453731	1.731221803
binomial 0.1 10	3362857	binomiak 0.1 10	451447	7.449062681
binomial 0.6 10	4452917	binomiak 0.6 10	474554	9.383372598
binomial 0.8 10	4206065	binomiak 0.8 10	465411	9.037313256
binomial 0.4 100	16207208	binomiak 0.4 100	1339769	12.09701672
binomial 0.4 1400	73531445	binomiak 0.4 1400	1022201	71.93442875

No repititions

binomial 0.5 4	7444	binomiak 0.5 4	448100	0.016612363
binomial 0.1 10	7527	binomiak 0.1 10	443362	0.016977098
binomial 0.6 10	7644	binomiak 0.6 10	461722	0.016555416
binomial 0.8 10	7587	binomiak 0.8 10	454549	0.01669127
binomial 0.4 100	8396	binomiak 0.4 100	1335464	0.006286953
binomial 0.4 1400	9986	binomiak 0.4 1400	1015023	0.009838201

idontgetoutmuch · 2024-05-06T12:50:14Z

I tried running these (which is why I needed a faster binomial sampler):

betaBinomialNaive :: StatefulGen g m =>
                Double -> Double -> Int -> g -> m Int
betaBinomialNaive a b n g = do
  p <- beta a b g
  xs <- fmap (fmap fromEnum) $ replicateM n $ bernoulli p g
  return $ sum xs

betaBinomialTPE :: StatefulGen g m =>
                Double -> Double -> Int -> g -> m Int
betaBinomialTPE a b n g = do
  p <- beta a b g
  binomial n p g

betaBinomialTable :: StatefulGen g m =>
                  Double -> Double -> Int -> g -> m Int
betaBinomialTable a b n g = do
  p <- beta a b g
  genFromTable (tableBinomial n p) g

The TPE method is about x100 faster so I am still baffled as to what the benchmarks are measuring.

Shimuuar · 2024-05-06T19:03:22Z

I've looked into benchmarks and have no idea what they're measure for condensed table benchmarks either. They should measure either sample generation time or table setup time (which is expensive). They measure some mix of two.

There's no other way. I'll have to rewrite them

Shimuuar · 2024-05-07T11:40:23Z

Yes. Current benchmarks in CT/gen group measure time for setting up table. Which is of course wrong-wrong-wrong.

idontgetoutmuch · 2024-05-08T08:06:11Z

I am putting my scribbles here in case I need them. I am cleaning up the PR so it can be merged without "squeezing". I can do that later and measure performance improvements. I am following the gsl implementation https://git.savannah.gnu.org/cgit/gsl.git/tree/randist/binomial_tpe.c.


-- The Rust author also comments
--
--  * It is possible for BINV to get stuck, so we break if x >
-- BINV_MAX_X and try again.
--
-- * It would be safer to set BINV_MAX_X to self.n, but it is
-- extremely unlikely to be relevant.
--
-- * When n*p < 10, so is n*p*q which is the variance, so a result >
-- 110 would be 100 / sqrt(10) = 31 standard deviations away.



  -- if uPrev > rPrev

          -- if xPrev > bInvMaxX
          -- then foo s a undefined


bInvMaxX :: Int
bInvMaxX = 110

idontgetoutmuch · 2024-05-08T10:42:11Z

@Shimuuar this can be further improved but can be merged - maybe I should add some tests (not that the other distributions have any). I didn't want to do too much more until you had bottomed out the benchmarking. Are tables really faster? They are according to the benchmarks but not when I use that method in a program.

Shimuuar · 2024-05-08T11:21:41Z

I'm going to fix benchmarks tomorrow.

Tables have rather peculiar performance characteristic: sampling from table should be fast. AFAIR on the order of single Word32 but creation of table could be very expensive. That cost also depends on distribution: more possible outcomes results in table that's costlier to setup and could be easily 1e2-1e4 costlier. So it's rather poor algorithm if not amortized by sampling a lot. Especially if distribution's parameters are random variates themselves

P.S. I haven't looked at PR in details yet

idontgetoutmuch · 2024-05-11T09:43:55Z

@Shimuuar ready for review :-)

Shimuuar · 2024-05-11T11:04:48Z

Great! I'm still trying to fix benchmarks.

Shimuuar · 2024-05-11T15:06:48Z

Regarding performance of condensed tables. Yes sampling is fast. Below is sampling time for Poisson distribution for different value of parameter and GHC versions in CPU cycles

And here is uniform for comparison

In contrast creation of table is expensive

Shimuuar

Now that's proper impenetrable numerical codes :) It will take some time for me to understand what's going on

Shimuuar · 2024-05-11T15:17:20Z

System/Random/MWC/Distributions.hs

+         -> g                 -- ^ Generator
+         -> m Int
+{-# INLINE binomial #-}
+binomial nTrials prob gen =


What is semantics of p<0 || p>1? There are two choices: throw error as gamma and chiSquare do or to clamp in [0,1] range (and deal with NaN separately)

I'll follow the convention and throw an error. I can't see why anyone would want a NaN propagated. Conceivably we could have a fast version with no checking?

Shimuuar · 2024-05-11T15:18:35Z

System/Random/MWC/Distributions.hs

+         -> m Int
+{-# INLINE binomial #-}
+binomial nTrials prob gen =
+  if prob == 0.0


Purely stylistically. Could you turn nested ifs into guards?

Yes done :-)

Shimuuar · 2024-05-11T15:19:08Z

System/Random/MWC/Distributions.hs

+
+          -- Acceptance / rejection comparison
+          step5 :: Int -> Double -> m Int
+          step5 ix v = if var <= accept


And here as well

Shimuuar · 2024-05-11T15:20:08Z

mwc-random.cabal

@@ -1,7 +1,7 @@
 cabal-version:  3.0
 build-type:     Simple
 name:           mwc-random
-version:        0.15.0.2


That would be minor bump to 0.15.1.0 not patch bump. We're adding new function

idontgetoutmuch · 2024-05-12T09:04:32Z

Now that's proper impenetrable numerical codes :) It will take some time for me to understand what's going on

I am hope I am not "teaching grandmothers to suck eggs" but it is a rejection sampling method with an empirical proposal distribution (which I believe they prove to have the correct support in a reference which seems can only be obtained by paying a lot of money). They divide up the region into the triangular segment, two parallelogram segments and two segments covered by the exponential distribution. I might try and draw the picture and put it in the docs. The triangular region is below the target distribution so if you are there then you always accept. You then decide where you are in the parallel or exponential regions and do an accept / reject (depending on whether you are above or below the target - sorry for being obvious here).

Further they do something they call squeezing (to improve the acceptance rate?) which I haven't investigated yet but will be a future PR. HTH.

idontgetoutmuch · 2024-05-27T09:24:20Z

idontgetoutmuch · 2024-05-27T09:26:06Z

I hope that is now clear: if you are in area "One" then you are below the target so you always accept. For areas "Two", "Three" and "Four" you have to decide whether you are above or below and then reject or not. See the original description which I think has an error in the majorizing and minorizing functions which is luckily benign except when you try to recreate the diagram.

Shimuuar · 2024-05-27T10:53:33Z

Sorry I was terribly busy lately and only got to skip paper and PR.

My plan is to write better test for sampler: using likelihood ratio check that we sample from correct distribution and if everything goes well just merge PR and do further documentation improvements/microptimizations separately

idontgetoutmuch · 2024-05-27T13:11:47Z

Should this be in the repo to recreate the figure? Probably not but it would be nice to put it somewhere other than in a PR comment.

{-# LANGUAGE ScopedTypeVariables #-}

{-# OPTIONS_GHC -Wall              #-}
{-# OPTIONS_GHC -Wno-type-defaults #-}

module RecreateFigure(lineChart) where

import System.FilePath ()
import Data.Colour
import Diagrams.Backend.SVG.CmdLine
import Diagrams.Prelude hiding ( sample, render, Vector, offset )
import System.Environment


majorizingFn :: Int -> Double -> Double -> Double
majorizingFn i pp x
  | x <= xL - 0.5 = c * exp (-lambdaL * (xL - x - 0.5))
  | x <= xR - 0.5 = (1 + c) - abs (bigM - x) / p1
  | otherwise     = c * exp (-lambdaR * (x + 0.5 - xR))
  where
    (xL, xR, _, c, p1, bigM, fm, q, r) = xLRMc i pp
    a = (fm - xL) / (fm - xL * r)
    lambdaL = a * (1 + a / 2)
    b = (xR - fm) / (xR * q)
    lambdaR = b * (1 + b / 2)

minorizingFn :: Int-> Double -> Double -> Double
minorizingFn i pp x
  | x <= xL - 0.5 = 0.0
  | x <= xR - 0.5 = 1.0 - abs (bigM - x) / p1
  | otherwise     = 0.0
  where
    (xL, xR, _, _, p1, bigM, _, _, _) = xLRMc i pp

xLRMc :: Int -> Double ->
         (Double, Double, Double, Double, Double, Double, Double, Double, Double)
xLRMc i pp = (xL, xR, xM, c, p1, bigM, fm, q, r)
  where
    m :: Double
    m = fromIntegral i
    r = min pp (1 - pp)
    q = 1 - r
    fm = m * r + r
    bigM :: Double
    bigM = fromIntegral $ floor fm
    p1 = (fromIntegral (floor (2.195 * sqrt (m * pp * q) - 4.6 * q) :: Int)) + 0.5
    xM = bigM + 0.5
    xL = xM - p1
    xR = xM + p1
    c  = 0.134 + 20.5 / (15.3 + bigM)

n :: Int
n = 200

p :: Double
p = 0.25

m1, sd :: Double
m1 = (fromIntegral n) * p + p
sd = sqrt $ (fromIntegral n) * p * (1- p)

lb, ub :: Double
lb = m1 - 3 * sd
ub = m1 + 3 * sd

tick, skala :: Double
tick = 0.01
skala = 20.0

majors :: [(Double, Double)]
majors = [(x, majorizingFn n p x) | x <- [lb, lb + tick .. ub]]

minors :: [(Double, Double)]
minors = [(x, minorizingFn n p x) | x <- [lb, lb + tick .. ub]]

integralBinomialPDF :: (Integral a, Fractional b) => a -> b -> a -> b
integralBinomialPDF t q 0 = (1 - q)^t
integralBinomialPDF t q x = integralBinomialPDF t q (x - 1) * a * b
  where
    a = fromIntegral (t - x + 1) / fromIntegral x
    b = q / (1 - q)

binPdfs :: [Double]
binPdfs = map (/ v) $ map (integralBinomialPDF n p) $ map floor $ map (+ 0.5) [lb :: Double, lb + tick .. ub]

v :: Double
v = integralBinomialPDF n p bigM
  where
    bigM = floor $ (fromIntegral n) * p + p

majorVs :: [P2 Double]
majorVs = map p2 majors

binPdfVs :: [P2 Double]
binPdfVs = map p2 $ zip (map fst majors) binPdfs

minorVs :: [P2 Double]
minorVs = map p2 minors

lineChart :: String -> IO ()
lineChart fn = do
  withArgs ["-h 800", "-w 800", "-o" ++ fn ++ ".svg"] (mainWith exampleQ)

exampleQ :: Diagram B
exampleQ = strutX 2 ||| mconcat
  [ mconcat cs # fc purple # lw none

  -- x-axis
  , strokeLocT xAxisT  # lc black # lwL 0.01 # scaleY skala
  , moveTo (p2 (xM - offset , - 2.0)) $ text "Number of Successes"
  , strokeLocT brokenXAxisT # lc black # lwL 0.05

  -- y-axis
  , strokeLocT yAxisT  # lc black # lwL 0.01 # scaleY skala
  , moveTo (p2 (-sd - yAxixOffset - 3.0 , skala * 1.0)) $ text "Probability Density (Unnormalised)" # rotateBy (1/4)

  -- Key
  , let majKey = fromVertices $ map p2 [ (xM - offset - 10.0, skala * 2.0 - 2.0)
                                       , (xM - offset, skala * 2.0 - 2.0)
                                       ]
    in strokeLocT majKey # lc blue # lwL 0.1 ===
       moveTo (p2 (xM - offset + 8, skala * 2.0 - 4.0)) (text "Majorizing Function" # fc blue)
  , let minKey = fromVertices $ map p2 [ (xM - offset - 10.0, skala * 2.0 - 4.0)
                                       , (xM - offset, skala * 2.0 - 4.0)
                                       ]
    in strokeLocT minKey # lc green # lwL 0.1 ===
       moveTo (p2 (xM - offset + 8, skala * 2.0 - 6.0)) (text "Minorizing Function" # fc green)
  , let tgtKey = fromVertices $ map p2 [ (xM - offset - 10.0, skala * 2.0 - 6.0)
                                       , (xM - offset, skala * 2.0 - 6.0)
                                       ]
    in strokeLocT tgtKey # lc red # lwL 0.1 ===
       moveTo (p2 (xM - offset + 8, skala * 2.0 - 8.0)) (text "Target PDF" # fc red)
  , moveTo (p2 (xM - offset - 3 * sd, skala * 2.0 - 10.0)) (text ("n = " ++ show n)  # fc black)
  , moveTo (p2 (xM - offset - 3 * sd, skala * 2.0 - 11.0)) (text ("p = " ++ show p)  # fc black)

  -- Areas
  , area1, area2, area3, area4
  , moveTo (p2 (xM - 0.5 - offset, 0.5 * skala * xMinM)) $ text "One"
  , moveTo (p2 (xM - 0.5 - offset, skala * (0.5 * (xMajM - xMinM) + xMinM))) $ text "Two"
  , moveTo (p2 (lb + (xL - lb) / 2 - offset, 0.5 * skala * majorizingFn n p (lb + (xL - lb) / 2))) $ text "Three"
  , moveTo (p2 (ub + (xR - ub) / 2 - offset, 0.5 * skala * majorizingFn n p (ub + (xR - ub) / 2))) $ text "Four"

  , strokeLocT majorT  # lc blue  # lwL 0.01 # scaleY skala
  , strokeLocT binPdfT # lc red   # lwL 0.01 # scaleY skala
  , strokeLocT minorT  # lc green # lwL 0.01 # scaleY skala
  , strokeLocT verticalLT # lc purple # lwL 0.01
  , strokeLocT verticalRT # lc purple # lwL 0.01

  , moveTo (p2 (xL - 0.5 - offset, - 1.0)) $ text $ show (xL - 0.5)
  , moveTo (p2 (xR - 0.5 - offset, - 1.0)) $ text $ show (xR - 0.5)
  , moveTo (p2 (0.0, - 1.0)) $ text $ show $ (fromIntegral (round (lb * 100))) / 100
  , moveTo (p2 (ub - offset, - 1.0)) $ text $ show $ (fromIntegral (round (ub * 100))) / 100
  , let y = 1.5 in moveTo (p2 (-sd - yAxixOffset - 1.0, skala * y)) $ text $ show $ (fromIntegral (round (y * 100))) / 100
  , let y = 1.0 in moveTo (p2 (-sd - yAxixOffset - 1.0, skala * y)) $ text $ show $ (fromIntegral (round (y * 100))) / 100
  , let y = 0.5 in moveTo (p2 (-sd - yAxixOffset - 1.0, skala * y)) $ text $ show $ (fromIntegral (round (y * 100))) / 100
  ]
  where
    xAxisT :: Located (Trail V2 Double)
    xAxisT = (fromVertices $ map p2 [(m1 - 4 * sd, 0), (m1 + 4 * sd, 0)]) `at` (-sd ^& 0)

    brokenXAxisT :: Located (Trail V2 Double)
    brokenXAxisT = fromVertices $ map p2 [ (m1 - 4 * sd - offset - 0.0,  0.0)
                                         , (m1 - 4 * sd - offset - 0.1,  0.3)
                                         , (m1 - 4 * sd - offset - 0.2,  0.0)
                                         , (m1 - 4 * sd - offset - 0.3, -0.3)
                                         , (m1 - 4 * sd - offset - 0.4,  0.0)
                                         , (m1 - 4 * sd - offset - 2.0,  0.0)
                                         ]

    yAxixOffset = 2.0
    yAxisT :: Located (Trail V2 Double)
    yAxisT = fromVertices $ map p2 [(-sd - yAxixOffset , 0.0), (-sd - yAxixOffset , 2.0)]

    binPdfT :: Located (Trail V2 Double)
    binPdfT = fromVertices binPdfVs `at` (0 ^& (minimum binPdfs))

    majorT :: Located (Trail V2 Double)
    majorT = fromVertices majorVs `at` (0 ^& (minimum (map snd majors)))

    minorT :: Located (Trail V2 Double)
    minorT = fromVertices minorVs `at` (0 ^& (minimum (map snd minors)))

    verticalLT :: Located (Trail V2 Double)
    verticalLT = fromVertices $ map p2 [(xL - 0.5 - offset, 0.0)
                                      , (xL - 0.5 - offset, skala * xMajL)
                                      ]

    verticalRT :: Located (Trail V2 Double)
    verticalRT = fromVertices $ map p2 [ (xR - 0.5 - offset, 0.0)
                                       , (xR - 0.5 - offset, skala * xMajR)
                                       ]

    pt1 = (xL - 0.5, 0.0)
    pt2 = (xR - 0.5, 0.0)
    pt3  = (xM - 0.5, skala * xMinM)

    area1 :: Diagram B
    area1 = t # fc red # opacity 0.1
      where
        t :: Diagram B
        t = strokeLocT $ mapLoc Trail u
        u :: Located (Trail' Loop V2 Double)
        u = ((fromVertices $ map p2 [pt1, pt2, pt3]) # closeLine) `at`  ((xL - offset -0.5) ^& 0)

    area2 :: Diagram B
    area2 = t # fc blue # opacity 0.1
      where
        t :: Diagram B
        t = strokeLocT $ mapLoc Trail u
        u :: Located (Trail' Loop V2 Double)
        u = (fromVertices ([w1] ++ vs ++ [wn, w2]) # closeLine) `at` ((xL - offset - 0.5) ^& 0)
        vs = fmap (scaleY 20.0) $ filter (\x -> fst (unp2 x) <= xR - 0.5) $ filter (\x -> fst (unp2 x) > xL - 0.5) majorVs
        w1 = p2 (fst (unp2 (head vs)), 0.0)
        w2 = p2 (xM - 0.5, skala * xMinM)
        wn = p2 (fst (unp2 (last vs)), 0.0)

    area3 :: Diagram B
    area3 = t # fc yellow # opacity 0.1
      where
        t :: Diagram B
        t = strokeLocT $ mapLoc Trail u
        u :: Located (Trail' Loop V2 Double)
        u = (fromVertices ([w1] ++ vs ++ [wn]) # closeLine) `at` (0 ^& 0)
        vs = fmap (scaleY 20.0) $ filter (\x -> fst (unp2 x) <= xL - 0.5) majorVs
        w1 = p2 (fst (unp2 (head vs)), 0.0)
        wn = p2 (fst (unp2 (last vs)), 0.0)

    area4 :: Diagram B
    area4 = t # fc yellow # opacity 0.1
      where
        t :: Diagram B
        t = strokeLocT $ mapLoc Trail u
        u :: Located (Trail' Loop V2 Double)
        u = (fromVertices ([w1] ++ vs ++ [wn]) # closeLine) `at` ((xR - offset - 0.5) ^& 0)
        vs = fmap (scaleY 20.0) $ filter (\x -> fst (unp2 x) > xR - 0.5) majorVs
        w1 = p2 (fst (unp2 (head vs)), 0.0)
        wn = p2 (fst (unp2 (last vs)), 0.0)

    cs = map mkCircle $ map p2 $ [ (xL - 0.5 - offset, 0.0)
                                 , (xL - 0.5 - offset, skala * xMajL)
                                 , (xM - 0.5 - offset, skala * xMinM)
                                 , (xM - 0.5 - offset, skala * xMajM)
                                 , (xR - 0.5 - offset, 0.0)
                                 , (xR - 0.5 - offset, skala * xMajR)
                                 , (0.0, 0.0)
                                 , (ub - offset, 0.0)
                                 ]

    (xL, xR, xM, _, _, _, _, _, _) = xLRMc n p

    xMinM = minorizingFn n p (xM - 0.5)
    xMajL = majorizingFn n p (xL - 0.5)
    xMajR = majorizingFn n p (xR - 0.5)
    xMajM = majorizingFn n p (xM - 0.5)
    offset = minimum (map fst majors)

    mkCircle q = circle 0.1 # moveTo q

Test fails for n_trials >≈ 25

Shimuuar · 2024-06-03T09:12:35Z

I wrote property test (branch binomial-test in this repo) to check that binomial really generates samples from binomial distribution. It passes test for n<=10 but starts failing for larger n. I didn't looked in details whether generator or test is to blame.

idontgetoutmuch · 2024-06-03T12:13:55Z

That rather suggests the implementation:

    -- Threshold for preferring the BINV algorithm / inverse cdf
    -- logic. The paper suggests 10, Ranlib uses 30, R uses 30, Rust uses
    -- 10 and GSL uses 14.
    bInvThreshold :: Double
    bInvThreshold = 10

I will go and take a look.

idontgetoutmuch · 2024-06-03T14:54:23Z

I did a chi-squared test by hand for the values that quick check gave and agree that we can reject the null hypothesis :-(

Shimuuar · 2024-06-03T16:21:42Z

Yes it's quite visibly off for N=27, p=0.4. Black dots is sampler's output and blue is expectation

For N=8, p=0.4 agreement is perfect

Shimuuar · 2024-06-03T16:25:17Z

Oh. And another plot which could be useful for you. Ratio of sampler's probability vs expectation for same parameters as above

idontgetoutmuch · 2024-06-04T15:34:11Z

I just tried a reference implementation in C and the chi squared checks out so I must have transcribed something incorrectly. I will look tomorrow at my code.

idontgetoutmuch · 2024-06-12T06:51:38Z

[nix-shell:~/mwc-random]$ cabal bench --benchmark-options="--pattern beta"
cabal bench --benchmark-options="--pattern beta"
Build profile: -w ghc-9.4.8 -O1
In order, the following will be built (use -v for more details):
 - mwc-random-0.15.1.0 (bench:mwc-bench) (first run)
Preprocessing benchmark 'mwc-bench' for mwc-random-0.15.1.0..
Building benchmark 'mwc-bench' for mwc-random-0.15.1.0..
Running 1 benchmarks...
Benchmark mwc-bench: RUNNING...
All
  mwc
    D
      beta binomial 10:        OK (1.55s)
        1.50 ms ± 108 μs
      beta binomial 100:       OK (2.13s)
        2.08 ms ± 128 μs
      beta binomial table 10:  OK (1.54s)
        49.4 ms ± 4.2 ms
      beta binomial table 100: OK (2.95s)
        196  ms ±  15 ms

All 4 tests passed (8.16s)
Benchmark mwc-bench: FINISH

idontgetoutmuch · 2024-06-12T06:58:32Z

@Shimuuar I think it is now ready to merge but I have 2 questions:

I added some benchmark tests to test the table method for beta binomial against the new method (actually very old from the 80s method). Maybe you want these in a different group or something?
I seem to have added your prop tests to the PR. This seems ok but would you like me to do anything about it?

Shimuuar · 2024-06-12T20:27:02Z

Thanks!

At this point benchmarks doesn't matter. It seems there're several opportunities for microoptimizations. So it's likely I'll need to tweak benchmarks anyway
Nothing. It's fine

Also I forgot about program that draws diagram. If you want to add to repository program which draws diagram for algorithm just drop it into (nonexitent) docsdirectory with sort comment explaining what does it do

idontgetoutmuch · 2024-06-14T15:30:30Z

@Shimuuar I don't understand why we get these bounds warnings but of course with the suggested bounds, some of the builds fail.

Shimuuar · 2024-06-15T18:37:43Z

Thank you!

This PR could be merged. I'm going to make few documentation tweaks and some optimization work. But that could be done separately

idontgetoutmuch mentioned this pull request May 2, 2024

A Pleasant Surprise #86

Closed

idontgetoutmuch force-pushed the master branch from 9421765 to 56b1aa7 Compare May 4, 2024 16:27

idontgetoutmuch force-pushed the master branch from fe3a169 to 7e4d898 Compare May 6, 2024 08:02

idontgetoutmuch marked this pull request as ready for review May 8, 2024 10:39

idontgetoutmuch changed the title ~~A Placeholder for the Binomial~~ Binomial Sampler May 10, 2024

Shimuuar reviewed May 11, 2024

View reviewed changes

idontgetoutmuch added 9 commits May 12, 2024 10:52

The start of a binomial sampler

f16f4d1

Some experiments

908186f

Nix setup

d27d6ed

Maybe needed to run the tests

1ea8c7e

Use Binomial.hs as the development module for now

a9048ee

The BTPE without the optimisations

030b284

Switch between inv and tpe

d7fb0c3

Now the doctests pass

d675d35

Bump version and allow newer doctest

94b24cc

idontgetoutmuch added 4 commits May 12, 2024 10:55

Fix dependencies for test

13d3c20

Leave benchmarks as they are and add one for binomial

e3c2362

.envrc not needed in repo

b57a2eb

Response to comments

bb03f54

idontgetoutmuch force-pushed the master branch from c6a2359 to bb03f54 Compare May 12, 2024 10:00

Add test that `binomial' really samples from binomial distribution

8f1b2bf

Test fails for n_trials >≈ 25

Fix failure of chi-squared test

4ebd207

Benchmark a "realistic" usage of binomial

118b1b1

Update the change log

a7ee5a0

idontgetoutmuch force-pushed the master branch 2 times, most recently from 91ee327 to a7ee5a0 Compare June 15, 2024 08:15

idontgetoutmuch added 3 commits June 15, 2024 09:16

A diagram showing the binomial sampling regions (no cabal)

e7c7fef

Extra documentation

c4cbd2d

Changelog is documentation

4b2bc36

Shimuuar merged commit 08c6be9 into haskell:master Jun 15, 2024
32 checks passed

Binomial Sampler #90

Binomial Sampler #90

Conversation

idontgetoutmuch commented May 2, 2024

Shimuuar commented May 3, 2024

idontgetoutmuch commented May 4, 2024

idontgetoutmuch commented May 4, 2024 • edited Loading

Shimuuar commented May 5, 2024

idontgetoutmuch commented May 6, 2024 • edited Loading

idontgetoutmuch commented May 6, 2024

Shimuuar commented May 6, 2024

Shimuuar commented May 7, 2024

idontgetoutmuch commented May 8, 2024 • edited Loading

idontgetoutmuch commented May 8, 2024

Shimuuar commented May 8, 2024

idontgetoutmuch commented May 11, 2024

Shimuuar commented May 11, 2024

Shimuuar commented May 11, 2024

Shimuuar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

idontgetoutmuch commented May 12, 2024

idontgetoutmuch commented May 27, 2024

idontgetoutmuch commented May 27, 2024 • edited Loading

Shimuuar commented May 27, 2024

idontgetoutmuch commented May 27, 2024

Shimuuar commented Jun 3, 2024

idontgetoutmuch commented Jun 3, 2024

idontgetoutmuch commented Jun 3, 2024

Shimuuar commented Jun 3, 2024

Shimuuar commented Jun 3, 2024

idontgetoutmuch commented Jun 4, 2024

idontgetoutmuch commented Jun 12, 2024 • edited Loading

idontgetoutmuch commented Jun 12, 2024

Shimuuar commented Jun 12, 2024

idontgetoutmuch commented Jun 14, 2024

Shimuuar commented Jun 15, 2024

idontgetoutmuch commented May 4, 2024 •

edited

Loading

idontgetoutmuch commented May 6, 2024 •

edited

Loading

idontgetoutmuch commented May 8, 2024 •

edited

Loading

idontgetoutmuch commented May 27, 2024 •

edited

Loading

idontgetoutmuch commented Jun 12, 2024 •

edited

Loading