Multinomial sampling is very slow #183

Open
jamaltas opened this issue Feb 9, 2023 · 13 comments
Comments

@jamaltas

jamaltas commented Feb 9, 2023

Looking into a program I wrote I found the limiting factor was the multinomial sampling:

let mut rng = SmallRng::from_entropy();
let result = Multinomial::new(&weights, 100000).unwrap();
let counts = result.sample(&mut rng);

Specifically the sampling portion of the above code, the SmallRng call is small in comparison.

I rewrote this particular portion of my code in Python using numpy.random.multinomial and found a roughly 400x speedup. The C code that numpy calls uses an implementation that chains many binomial draws together, whereas the statrs implementation inverts a CDF.

Are there any plans to change this?
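For context, the chaining works because of the standard conditional decomposition of the multinomial (my notation, not taken from either library):

```latex
X_1 \sim \mathrm{Binomial}(n, p_1), \qquad
X_k \mid X_1, \dots, X_{k-1} \sim \mathrm{Binomial}\!\left(n - \sum_{j<k} X_j,\ \frac{p_k}{1 - \sum_{j<k} p_j}\right)
```

so a d-category multinomial reduces to d-1 binomial draws.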

@vks
Contributor

vks commented Feb 9, 2023 via email

@jamaltas
Author

jamaltas commented Feb 9, 2023

Yep. It appears the compiled C that numpy uses employs an algorithm known as BTPE, which is a significantly faster binomial sampler when p*n > 30. That is my use case.

Is there any interest in an implementation of this algorithm?

@YeungOnion
Contributor

I think this could be useful, but I don't have the background for it.

I did notice that rand_distr::Binomial uses the BTPE algorithm. Would you know how to extend BTPE for multinomial from an implementation for binomial?

More broadly, perhaps we should expose rand_distr's versions of sample when available, for performance. It relies only on num_traits, and it is already in our dependency tree via nalgebra.

@benjamin-lieser

I have a working but not yet polished implementation of Multinomial for rand_distr. I am planning to merge it but have not had the time. If you are interested, you can try it at https://github.com/benjamin-lieser/rand/tree/multinomial
Right now it only works for a compile-time constant number of categories; figuring out a good API for both cases is one of the pain points.

@YeungOnion
Contributor

@benjamin-lieser since we already depend on nalgebra, #209 made our vector (the math kind) dimension generic over allocation type, and #253 adds the same for Multinomial. Not splitting out a separate alloc feature allowed use of nalgebra's generic API, which made it clearer how to proceed.

I looked at your code. I could not find free access to the paper you mention about the sampling, so I tried reading your implementation and then made my own attempt, but I don't think it's correct; I pushed that attempt to my fork. It seems to update a prior that $\sum n_i$ samples were already drawn as it iterates over each type of sample, but this seems like it would be biased toward samples with index $i$ near 0 and away from $i$ near $n$. Perhaps I misunderstood it.

Let me know if you have a strong way of verifying it. I'm not confident my implementation will always produce exactly n draws (i.e. assert_eq!(samples.into_iter().sum::<u64>(), n) may sometimes fail).
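For what it's worth, here is a minimal self-contained sanity check of the chained-binomial construction (a sketch: it uses a naive Bernoulli-sum binomial sampler and a toy inline LCG instead of rand, so it exercises only the chaining logic, not any fast binomial algorithm):

```rust
// Toy RNG standing in for rand's RNGs; not for real use.
struct Lcg(u64);

impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        // Use the high 53 bits to form a uniform in [0, 1).
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

// Obviously-correct (but slow) binomial: sum of n Bernoulli trials.
fn naive_binomial(p: f64, n: u64, rng: &mut Lcg) -> u64 {
    (0..n).filter(|_| rng.next_f64() < p).count() as u64
}

// Chained-binomial multinomial, mirroring numpy's random_multinomial.
fn multinomial(pix: &[f64], n: u64, rng: &mut Lcg) -> Vec<u64> {
    let mut remaining_p = 1.0;
    let mut dn = n;
    let mut mnix = vec![0u64; pix.len()];
    for i in 0..pix.len() - 1 {
        mnix[i] = naive_binomial(pix[i] / remaining_p, dn, rng);
        dn -= mnix[i];
        if dn == 0 {
            return mnix;
        }
        remaining_p -= pix[i];
    }
    // Remaining trials all land in the last category.
    mnix[pix.len() - 1] = dn;
    mnix
}

fn main() {
    let p = [0.2, 0.3, 0.5];
    let n = 10_000u64;
    let mut rng = Lcg(42);
    let rounds = 200;
    let mut totals = [0u64; 3];
    for _ in 0..rounds {
        let s = multinomial(&p, n, &mut rng);
        // Invariant: every draw must account for exactly n trials.
        assert_eq!(s.iter().sum::<u64>(), n);
        for (t, x) in totals.iter_mut().zip(&s) {
            *t += x;
        }
    }
    // Loose mean check: empirical frequency should be close to p_i.
    for (t, &pi) in totals.iter().zip(&p) {
        let freq = *t as f64 / (rounds as f64 * n as f64);
        assert!((freq - pi).abs() < 0.01, "freq {freq} vs p {pi}");
    }
    println!("ok");
}
```

The sum invariant holds by construction (the last category absorbs the remainder), so a stronger check of distributional correctness would need something like a chi-squared test against the expected counts.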

@benjamin-lieser

benjamin-lieser commented Aug 5, 2024

The problem is that the binomial implementation in statrs is very slow for large n. Unfortunately, the current state of the art (BINV and BTPE) is very tricky to get right. Without a fast binomial, there is not much you can do to improve multinomial.

Ideally, rand_distr and statrs would build on top of a common numerical implementation of these algorithms. The BTPE part in rand_distr also still has some issues with very small probabilities and large n.

@jamaltas
Author

jamaltas commented Aug 5, 2024

It's been quite a while since I looked into this implementation, but I will do my best to offer what I can.

I peeked under the numpy hood at the time and found their code for this, which ran significantly faster as I noted above:

void random_multinomial(bitgen_t *bitgen_state, RAND_INT_TYPE n,
                        RAND_INT_TYPE *mnix, double *pix, npy_intp d,
                        binomial_t *binomial) {
  double remaining_p = 1.0;
  npy_intp j;
  RAND_INT_TYPE dn = n;
  for (j = 0; j < (d - 1); j++) {
    mnix[j] = random_binomial(bitgen_state, pix[j] / remaining_p, dn, binomial);
    dn = dn - mnix[j];
    if (dn <= 0) {
      break;
    }
    remaining_p -= pix[j];
  }
  if (dn > 0) {
      mnix[d - 1] = dn;
  }
}

I adapted this to my use case in Rust, chaining binomial calls:

// Sample a multinomial by chaining conditional binomial draws,
// mirroring numpy's random_multinomial above.
pub fn get_multinomial(pix: &[f64], n: u64, rng: &mut SmallRng) -> Vec<u64> {
    let mut remaining_p = 1.0;
    let mut dn = n;
    let mut mnix = vec![0; pix.len()];
    let last = pix.len() - 1;

    for i in 0..last {
        // Conditioned on the draws so far, category i's count is
        // Binomial(dn, pix[i] / remaining_p).
        let binomial = get_binomial(pix[i] / remaining_p, dn, rng);
        mnix[i] = binomial;
        dn -= binomial;
        if dn == 0 {
            break;
        }
        remaining_p -= pix[i];
    }
    // Whatever is left goes to the last category.
    if dn > 0 {
        mnix[last] = dn;
    }
    mnix
}

The above worked for my use case and was much, much faster than the original Rust code:

let mut rng = SmallRng::from_entropy();
let result = Multinomial::new(&weights, 100000).unwrap();
let counts = result.sample(&mut rng);

The get_binomial function in my Rust code just uses the rand_distr::Binomial distribution as it was at that time, so I didn't have to rewrite that in any way.

use rand_distr::{Distribution, Binomial};
use rand::rngs::SmallRng;

pub fn get_binomial(probability: f64, n: u64, rng: &mut SmallRng) -> u64 {
    Binomial::new(n, probability).unwrap().sample(rng)
}

Perhaps a full rewrite that includes the Binomial aspect is warranted, but for my use case this was totally sufficient. It may be as easy as replacing the multinomial function with approximately what I've included here. I admit I have not sufficiently checked every edge case in my translation from C to Rust. This approach is also nice because it means you only really have to optimize Binomial in the future, as Multinomial performance is directly tied to Binomial performance.

@YeungOnion
Contributor

@benjamin-lieser perhaps we can work with rust-random on that. Do you know much about the details of the GSL implementation compared to rand_distr's? At a quick glance, it appears rand_distr referenced it, especially in the choice of threshold values for switching approaches.

@YeungOnion
Contributor

@jamaltas I did something similar, and I've got it working now. The benchmark I wrote shows a minor improvement (~10%) when sampling via chained binomials (with randomly assigned probabilities) compared to inverting the categorical CDF for single samples.

I redid the bench with rand_distr::Binomial instead of statrs::Binomial and the speedup was 1000x, so I completely agree that we should rely on and improve binomial sampling.

@benjamin-lieser

@benjamin-lieser perhaps we can work with rust-random on that. Do you know much about the details of the GSL implementation compared to rand_distr's? At a quick glance, it appears rand_distr referenced it, especially in the choice of threshold values for switching approaches.

I have written some of these references into the source :D

I am not that deep into the BTPE part, but I have a good understanding of the BINV part. Apart from rust-random/rand#1484, the implementations in GSL, rand_distr, and numpy are very close.
The BINV threshold is mostly a performance knob, as long as it is not too small; the best value depends heavily on the compiler and hardware. BINV's cost scales with n*p but is quick to set up, while BTPE scales far better but requires quite a bit of setup.
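For readers unfamiliar with BINV: it is plain sequential inversion of the binomial CDF, walking the pmf upward from x = 0 via the recurrence P(X = x) = P(X = x-1) * (n - x + 1)/x * p/(1 - p), which is why its expected cost grows with n*p. A sketch of the idea (my own, using an inline toy RNG rather than rand, and assuming 0 < p <= 0.5 as real implementations do by symmetry):

```rust
// Toy RNG standing in for rand's RNGs; not for real use.
struct Lcg(u64);

impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// BINV: invert the binomial CDF by walking the pmf from x = 0 upward.
/// Assumes 0 < p <= 0.5; for p > 0.5, sample n - binv(n, 1 - p, rng).
fn binv(n: u64, p: f64, rng: &mut Lcg) -> u64 {
    let q = 1.0 - p;
    let s = p / q;
    let a = (n + 1) as f64 * s;
    let mut r = q.powf(n as f64); // P(X = 0); underflows for very large n
    let mut u = rng.next_f64();
    let mut x = 0u64;
    while u > r {
        u -= r;
        x += 1;
        if x > n {
            // Safety net against pathological floating-point rounding: restart.
            r = q.powf(n as f64);
            u = rng.next_f64();
            x = 0;
        } else {
            // pmf recurrence: P(x) = P(x-1) * ((n+1)/x - 1) * p/q = P(x-1) * (a/x - s)
            r *= a / x as f64 - s;
        }
    }
    x
}

fn main() {
    let (n, p) = (20u64, 0.3);
    let mut rng = Lcg(123);
    let samples = 100_000;
    let mean: f64 =
        (0..samples).map(|_| binv(n, p, &mut rng) as f64).sum::<f64>() / samples as f64;
    // E[X] = n*p = 6.0; loose sanity check only.
    assert!((mean - 6.0).abs() < 0.1, "mean {mean}");
    println!("mean ~ {mean:.2}");
}
```

The expected number of loop iterations is roughly n*p, which is exactly why BTPE (a rejection method with near-constant cost after setup) takes over above the threshold.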

@YeungOnion
Contributor

I don't know of an established library where that kind of shared implementation would live, so I think it makes sense for us to borrow rust-random's work and direct efforts to their library as needed. We could do the same for other distributions as well.*

*Since we derived from .NET and distribution functions were important, we don't distinguish distributions on open vs. closed intervals, so perhaps Uniform may change.

@vks
Contributor

vks commented Aug 12, 2024

At a quick glance, it appears rand_distr referenced it, especially in the choice of threshold values for switching approaches.

I specifically avoided looking at the GSL implementation when working on the rand_distr implementation, because GSL is GPL-licensed, which is incompatible with Rand. I only looked at the original paper describing the algorithm.

@YeungOnion
Contributor

Would it make sense to have an optional feature that wraps some of rand_distr for sampling (where available), since performance and correctness are among rust-random's focuses?


Aside: I'm unsure about the licensing here. The move to Apache didn't happen because of a lack of response / contact info for previous contributors.

4 participants