Multinomial sampling is very slow #183

Open
jamaltas opened this issue Feb 9, 2023 · 13 comments
Comments

@jamaltas

jamaltas commented Feb 9, 2023

Looking into a program I wrote I found the limiting factor was the multinomial sampling:

let mut rng = SmallRng::from_entropy();
let result = Multinomial::new(&weights, 100000).unwrap();
let counts = result.sample(&mut rng);

Specifically the sampling portion of the above code, the SmallRng call is small in comparison.

I rewrote this particular portion of my code in Python using numpy.random.multinomial and found a roughly 400x speedup. The C code that numpy calls uses an implementation that chains many binomial draws together, whereas the statrs implementation inverts a CDF.

Are there any plans to change this?
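For context, the chaining works because of the standard conditional decomposition of the multinomial (my notation, not taken from either library):

```latex
X_1 \sim \mathrm{Binomial}(n, p_1), \qquad
X_k \mid X_1, \dots, X_{k-1} \sim \mathrm{Binomial}\!\left(n - \sum_{j<k} X_j,\ \frac{p_k}{1 - \sum_{j<k} p_j}\right)
```

so a d-category multinomial reduces to d-1 binomial draws.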

@vks
Contributor

vks commented Feb 9, 2023 via email

@jamaltas
Author

jamaltas commented Feb 9, 2023

Yep. It appears the compiled C that numpy uses employs an algorithm known as BTPE, which is a significantly faster binomial sampler when p*n > 30. That is my use case.

Is there any interest in an implementation of this algorithm?

@YeungOnion
Contributor

I think this could be useful, but I don't have the background for it.

I did notice that rand_distr::Binomial uses the BTPE algorithm. Would you know how to extend BTPE for multinomial from an implementation for binomial?

More broadly, perhaps we should expose rand_distr's versions of sample when available, for performance. It relies only on num_traits, and it is already in our dependency tree via nalgebra.

@benjamin-lieser

I have a working but not yet polished implementation of Multinomial for rand_distr. I am planning to merge it but have not had the time. If you are interested, you can try it at https://github.com/benjamin-lieser/rand/tree/multinomial
Right now it only works for a compile-time constant number of categories; figuring out a good API for both cases is one of the pain points.

@YeungOnion
Contributor

@benjamin-lieser since we already depend on nalgebra, #209 made our vector (the math kind) dimension generic over allocation type, and #253 adds the same for Multinomial. Not splitting out a separate alloc feature allowed use of nalgebra's generic API, which made it clearer how to proceed.

I looked at your code. I could not find free access to the paper you mention about the sampling, so I tried reading your implementation and then made my own attempt, but I don't think it's correct; I pushed that attempt to my fork. It seems to update a prior that $\sum n_i$ samples were already drawn as it iterates over each type of sample, but this seems like it would be biased toward samples with index $i$ near 0 and away from $i$ near $n$. Perhaps I misunderstood it.

Let me know if you have a strong way of verifying it. I'm not confident my implementation will always produce exactly n draws (i.e. assert_eq!(samples.into_iter().sum::<u64>(), n) may sometimes fail).
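For what it's worth, here is a minimal self-contained sanity check of the chained-binomial construction (a sketch: it uses a naive Bernoulli-sum binomial sampler and a toy inline LCG instead of rand, so it exercises only the chaining logic, not any fast binomial algorithm):

```rust
// Toy RNG standing in for rand's RNGs; not for real use.
struct Lcg(u64);

impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        // Use the high 53 bits to form a uniform in [0, 1).
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

// Obviously-correct (but slow) binomial: sum of n Bernoulli trials.
fn naive_binomial(p: f64, n: u64, rng: &mut Lcg) -> u64 {
    (0..n).filter(|_| rng.next_f64() < p).count() as u64
}

// Chained-binomial multinomial, mirroring numpy's random_multinomial.
fn multinomial(pix: &[f64], n: u64, rng: &mut Lcg) -> Vec<u64> {
    let mut remaining_p = 1.0;
    let mut dn = n;
    let mut mnix = vec![0u64; pix.len()];
    for i in 0..pix.len() - 1 {
        mnix[i] = naive_binomial(pix[i] / remaining_p, dn, rng);
        dn -= mnix[i];
        if dn == 0 {
            return mnix;
        }
        remaining_p -= pix[i];
    }
    // Remaining trials all land in the last category.
    mnix[pix.len() - 1] = dn;
    mnix
}

fn main() {
    let p = [0.2, 0.3, 0.5];
    let n = 10_000u64;
    let mut rng = Lcg(42);
    let rounds = 200;
    let mut totals = [0u64; 3];
    for _ in 0..rounds {
        let s = multinomial(&p, n, &mut rng);
        // Invariant: every draw must account for exactly n trials.
        assert_eq!(s.iter().sum::<u64>(), n);
        for (t, x) in totals.iter_mut().zip(&s) {
            *t += x;
        }
    }
    // Loose mean check: empirical frequency should be close to p_i.
    for (t, &pi) in totals.iter().zip(&p) {
        let freq = *t as f64 / (rounds as f64 * n as f64);
        assert!((freq - pi).abs() < 0.01, "freq {freq} vs p {pi}");
    }
    println!("ok");
}
```

The sum invariant holds by construction (the last category absorbs the remainder), so a stronger check of distributional correctness would need something like a chi-squared test against the expected counts.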

@benjamin-lieser

benjamin-lieser commented Aug 5, 2024

The problem is that the binomial implementation in statrs is very slow for large n. Unfortunately, the current state of the art (BINV and BTPE) is very tricky to get right. Without a fast binomial, there is not much you can do to improve multinomial.

Ideally, rand_distr and statrs would build on top of a common numerical implementation of these algorithms. The BTPE part in rand_distr also still has some issues with very small probabilities and large n.

@jamaltas
Author

jamaltas commented Aug 5, 2024

It's been quite a while since I looked into this implementation, but I will do my best to offer what I can.

I peeked under the numpy hood at the time and found their code for this, which ran significantly faster as I noted above:

void random_multinomial(bitgen_t *bitgen_state, RAND_INT_TYPE n,
                        RAND_INT_TYPE *mnix, double *pix, npy_intp d,
                        binomial_t *binomial) {
  double remaining_p = 1.0;
  npy_intp j;
  RAND_INT_TYPE dn = n;
  for (j = 0; j < (d - 1); j++) {
    mnix[j] = random_binomial(bitgen_state, pix[j] / remaining_p, dn, binomial);
    dn = dn - mnix[j];
    if (dn <= 0) {
      break;
    }
    remaining_p -= pix[j];
  }
  if (dn > 0) {
      mnix[d - 1] = dn;
  }
}

I adapted this to my use case in Rust, chaining binomial calls:

// Sample a multinomial by chaining conditional binomial draws,
// mirroring numpy's random_multinomial above.
pub fn get_multinomial(pix: &[f64], n: u64, rng: &mut SmallRng) -> Vec<u64> {
    let mut remaining_p = 1.0;
    let mut dn = n;
    let mut mnix = vec![0; pix.len()];
    let last = pix.len() - 1;

    for i in 0..last {
        // Conditioned on the draws so far, category i's count is
        // Binomial(dn, pix[i] / remaining_p).
        let binomial = get_binomial(pix[i] / remaining_p, dn, rng);
        mnix[i] = binomial;
        dn -= binomial;
        if dn == 0 {
            break;
        }
        remaining_p -= pix[i];
    }
    // Whatever is left goes to the last category.
    if dn > 0 {
        mnix[last] = dn;
    }
    mnix
}

The above worked for my use case and was much, much faster than the original Rust code:

let mut rng = SmallRng::from_entropy();
let result = Multinomial::new(&weights, 100000).unwrap();
let counts = result.sample(&mut rng);

The get_binomial function in my Rust code just uses the rand_distr::Binomial distribution as it was at that time, so I didn't have to rewrite that in any way.

use rand_distr::{Distribution, Binomial};
use rand::rngs::SmallRng;

pub fn get_binomial(probability: f64, n: u64, rng: &mut SmallRng) -> u64 {
    Binomial::new(n, probability).unwrap().sample(rng)
}

Perhaps a full rewrite that includes the Binomial aspect is warranted, but for my use case this was totally sufficient. It may be as easy as replacing the multinomial function with approximately what I've included here. I admit I have not sufficiently checked every edge case in my translation from C to Rust. This approach is also nice because it means you only really have to optimize Binomial in the future, as Multinomial performance is directly tied to Binomial performance.

@YeungOnion
Contributor

@benjamin-lieser perhaps we can work with rust-random on that. Do you know much about the details of the GSL implementation compared to rand_distr's? At a quick glance, it appears rand_distr referenced it, especially in the choice of threshold values for switching approaches.

@YeungOnion
Contributor

@jamaltas I did something similar, and I've got it working now. The benchmark I wrote shows a minor improvement (~10%) when sampling via chained binomials (with randomly assigned probabilities) compared to inverting the categorical CDF for single samples.

I redid the bench with rand_distr::Binomial instead of statrs::Binomial and the speedup was 1000x, so I completely agree that we should rely on and improve binomial sampling.

@benjamin-lieser

@benjamin-lieser perhaps we can work with rust-random on that. Do you know much about the details of the GSL implementation compared to rand_distr's? At a quick glance, it appears rand_distr referenced it, especially in the choice of threshold values for switching approaches.

I have written some of these references into the source :D

I am not that deep into the BTPE part, but I have a good understanding of the BINV part. Apart from rust-random/rand#1484, the implementations in GSL, rand_distr, and numpy are very close.
The BINV threshold is mostly a performance knob, as long as it is not too small; the best value depends heavily on the compiler and hardware. BINV's cost scales with n*p but is quick to set up, while BTPE scales far better but requires quite a bit of setup.
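For readers unfamiliar with BINV: it is plain sequential inversion of the binomial CDF, walking the pmf upward from x = 0 via the recurrence P(X = x) = P(X = x-1) * (n - x + 1)/x * p/(1 - p), which is why its expected cost grows with n*p. A sketch of the idea (my own, using an inline toy RNG rather than rand, and assuming 0 < p <= 0.5 as real implementations do by symmetry):

```rust
// Toy RNG standing in for rand's RNGs; not for real use.
struct Lcg(u64);

impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// BINV: invert the binomial CDF by walking the pmf from x = 0 upward.
/// Assumes 0 < p <= 0.5; for p > 0.5, sample n - binv(n, 1 - p, rng).
fn binv(n: u64, p: f64, rng: &mut Lcg) -> u64 {
    let q = 1.0 - p;
    let s = p / q;
    let a = (n + 1) as f64 * s;
    let mut r = q.powf(n as f64); // P(X = 0); underflows for very large n
    let mut u = rng.next_f64();
    let mut x = 0u64;
    while u > r {
        u -= r;
        x += 1;
        if x > n {
            // Safety net against pathological floating-point rounding: restart.
            r = q.powf(n as f64);
            u = rng.next_f64();
            x = 0;
        } else {
            // pmf recurrence: P(x) = P(x-1) * ((n+1)/x - 1) * p/q = P(x-1) * (a/x - s)
            r *= a / x as f64 - s;
        }
    }
    x
}

fn main() {
    let (n, p) = (20u64, 0.3);
    let mut rng = Lcg(123);
    let samples = 100_000;
    let mean: f64 =
        (0..samples).map(|_| binv(n, p, &mut rng) as f64).sum::<f64>() / samples as f64;
    // E[X] = n*p = 6.0; loose sanity check only.
    assert!((mean - 6.0).abs() < 0.1, "mean {mean}");
    println!("mean ~ {mean:.2}");
}
```

The expected number of loop iterations is roughly n*p, which is exactly why BTPE (a rejection method with near-constant cost after setup) takes over above the threshold.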

@YeungOnion
Contributor

I don't know of an established library where that kind of shared implementation would live, so I think it makes sense for us to borrow rust-random's work and direct efforts to their library as needed. We could do the same for other distributions as well.*

*Since we derived from .NET and distribution functions were important, we don't distinguish distributions on open vs. closed intervals, so perhaps Uniform may change.

@vks
Contributor

vks commented Aug 12, 2024

At a quick glance, it appears rand_distr referenced it, especially in the choice of threshold values for switching approaches.

I specifically avoided looking at the GSL implementation when working on the rand_distr implementation, because GSL is GPL-licensed, which is incompatible with Rand. I only looked at the original paper describing the algorithm.

@YeungOnion
Contributor

Would it make sense to have an optional feature that wraps some of rand_distr for sampling (where available), since performance and correctness are among rust-random's focuses?


Aside: I'm unsure about the licensing here. The move to Apache didn't happen because of a lack of response / contact info for previous contributors.

4 participants