Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Request] Alternative parameterization for negative binomial distributions? #198

Open
diaorch opened this issue Jan 19, 2024 · 13 comments
Open

Comments

@diaorch
Copy link

diaorch commented Jan 19, 2024

Currently, the struct NegativeBinomial uses the parameters r and p, as in the interpretation of negative binomial distribution as the distribution of the number of failures in a sequence of Bernoulli trials with probability of success p that continue until r successes occur.

Another way to describe a negative binomial distribution is through the mean and the size/dispersion parameters. This is worth implementing especially given that a lot of the statistical questions center around the mean of the distribution.

r and p can be calculated from the mean and the dispersion parameter.

Perhaps the alternative parameterization of negative binomial distributions can be a useful addition to the crate's functionality?

@diaorch diaorch changed the title Alternative parameterization for negative binomial distributions? [Request] Alternative parameterization for negative binomial distributions? Jan 22, 2024
@YeungOnion
Copy link
Contributor

I expect that you'd want this at initialization, but I don't have the background to know when else a different parametrization would be useful. Can you help me understand such a scenario?

@diaorch
Copy link
Author

diaorch commented Apr 1, 2024

Thank you for getting back to me!

Here's a quick summary of what parameterizations we are discussing: the current parameterization of negative binomial distribution as implemented uses r and p, which have specific meanings in terms of Bernoulli trials. The alternative parameterization I am interested in uses mean and dispersion as parameters, which is described in more detail in the last two paragraphs of "Alternative formulation" of negative binomial distribution on Wikipedia page. Also, I appreciate the warning in the current documentation to note carefully the meaning of the parameters.

The mean-and-dispersion is more widely used in regression analysis, because the explanatory variables can be linked to the mean, similar to linear regression. This is also, in my opinion, more intuitive when it comes to interpreting the effects of explanatory variables on the negative binomial counts. (Personally, I don't think I have seen negative binomial regression done with the r-and-p parameterization.) In my work, having the mean in parameterization also helps me compare a negative binomial distribution with a Poisson distribution of the same mean, since I can more quickly evaluate the dispersion level in the negative binomial distribution. I would also imagine that the mean-and-dispersion parameterization is more useful outside of a frequentist statistics context.

The parameters, or statistics like means and variances, can be derived using the current parameterization. In my own codes, I just wrote a separate struct that "wraps" the calls to the statrs::distribution::NegativeBinomial but with calculations to convert the r-and-p parameterization to the mean-and-dispersion parameterization. I imagine that something similar can be done natively within statsrs, which will make the use of negative binomial distribution more consistent, and it shouldn't break anything. I also understand that it may not be a priority, since conversion on the user end, such as the one I have done, is reasonably doable.

@tessob
Copy link
Contributor

tessob commented Apr 2, 2024

This "Alternative Parameterization" usually called the "Method of moments estimator". This parameterization exists for many distributions, so in my opinion it is rather a trait.

@YeungOnion
Copy link
Contributor

Hmm, were it a trait, how would you define that, i.e. what methods and generics would go with it?

@tessob
Copy link
Contributor

tessob commented Apr 5, 2024

I don't really think such a feature is one of the highest priorities, in any case I think the implementation should look like the following:

trait StandardizedMoments {
    fn mean(&self) -> Option<f64>;
    fn variance(&self) -> Option<f64>;
    fn skewness(&self) -> Option<f64>;
    fn kurtosis(&self) -> Option<f64>;
}

trait MethodOfMomentsEstimator {
    fn from_moments<M: StandardizedMoments>(moments: M) -> Result<Self>;
}

Skewness & kurtosis here as negative binomial distribution (not only this distribution) can be fitted not only to fist 2 moments, but to first 4 moments. There are multiple ways how to fit data to distributions.

@YeungOnion
Copy link
Contributor

I see this as a way to refer to distributions or estimators with constraints by their moments instead of other parameters. I was expecting something narrower, things that would specify distributions given family and number of moments that fully specify the parametrization of the distribution.

I see usefulness in thinking about moments by specifying mean [and variance [and skew...]] , (notation is bash-style optionals) but I think these are little far off from where the crate is now

  • specifying a moment without specifying all moments of lower order
  • overconstrained number of moments relative to parameters under some sense of fit

Not opposed to them, but think they would take a significant amount of work.

@diaorch how close is this to what you were wanting?

@diaorch
Copy link
Author

diaorch commented Apr 8, 2024

The discussion brings up several good points. Firstly, I agree that it's likely not of the highest priority for the crate right now.

Secondly, I wasn't considering generalizing to the Method of Moments Estimator, but I agree that if we are going with the Method of Moments Estimator in general, a trait would be a better choice than, say, a method specific to a distribution struct. As for how exactly the trait should work, I have to admit it's a bit beyond my ability for me to confidently conclude the discussion.

Hope this is still helpful.

@tessob
Copy link
Contributor

tessob commented Apr 8, 2024

@YeungOnion I don't have ready to implement API design. Maybe instead of trait it could be a builder pattern... this way set of parameters can be effectively constrained if, for instance, set of variance will return an "extended" builder.

As an additional example – Gamma distribution can be parametrized by:

  • Mean only – and it will be identical to exponential distribution.
  • Mean and variance – using method of moments from Wiki.
  • Mean, variance with skewness and/or kurtosis – using numerical optimizer.

@YeungOnion
Copy link
Contributor

@diaorch that's alright by me, but I think as a user you have some great input for how you'd like it to work as a library despite implementation. Could you share what the struct you wrapped the negative binomial distribution in?

@YeungOnion
Copy link
Contributor

@tessob yeah, I think it will take some digging, perhaps as its own feature request. I also think a builder pattern would work well, as it seems like there are a few things that can come out depending on what info is specified, assuming all have specified the family of distribution,

  • Underconstrained parameters for distribution. If you also specify a fit function and an optimizer, then you'd specify parametrized distribution
  • Exactly constrained parameters for distribution, this is equivalent to specifying a distribution but perhaps not in the parameters we use in the existing constructors.
  • Overconstrainted parameters for distribution with fit function. I don't think this makes sense without a fit function. Then adding an optimizer would specify parameters for a distribution?

Thoughts?

@diaorch
Copy link
Author

diaorch commented Jun 17, 2024

@YeungOnion hey sorry for the very late reply and thank you for the appreciation for input from a user like me. Here are the codes I used as a wrapper to use the alternative parameterization for its probability mass function. Some of the ideas are taken from R source codes. Hope they are useful for showing what I was describing as the alternative parameterization and how I was using it.

impl NegativeBinomialAlt {
    pub fn new(mu: f64, r: f64) -> Result<NegativeBinomialAlt> {
        if mu.is_nan() || r.is_nan() || mu < 0. || r <= 0. {
            Err(StatsError::BadParams)
        } else {
            Ok(NegativeBinomialAlt { mu, r })
        }
    }

    // probability mass function
    pub fn pmf(&self, x: usize, ln_trans: bool) -> f64 {
        let ln_res: f64;

        // handling zero counts, to be accurate for n << mu and n >> mu
        // based on R source codes
        if x == 0 {
            if self.r < self.mu {
                ln_res = (self.r / (self.r + self.mu)).ln() * self.r;
            } else {
                ln_res = (-1. * self.mu / (self.r + self.mu)).ln_1p() * self.r;
            }
        } else {
            // parameterization prob. = r / (r + mu)
            let p = self.r / (self.r + self.mu);

            // handling of when p close to 1
            // when p is 1, the NB is a point mass at zero
            // however, == with floating points is not trustworthy, 
            // in this case, i specifically want to handle the case with
            // p as f64 is exacly 1 causing NaN, therefore == is used here
            if p == 1. {
                if x == 0 {
                    ln_res = (1_f64).ln();
                } else {
                    ln_res = (0_f64).ln();
                    if !ln_trans {
                        return 0_f64;
                    }
                }
            } else {
                // if x << r, use Martin Maechler, June 2008 formula cited
                // in R source code dnbinom()
                if (x as f64) < 1e-10 * self.r {
                    let q: f64;
                    if self.r < self.mu {
                        q = (self.r / (1. + self.r / self.mu)).ln();
                    } else {
                        q = (self.mu / (1. + self.mu / self.r)).ln();
                    }
                    let x_f = x as f64;
                    ln_res = x_f * q - self.mu - gamma::ln_gamma(x_f + 1.) + (x_f * (x_f + 1.) / (2. * self.r)).ln_1p();
                } else {
                    let nb = NegativeBinomial::new(self.r, p).unwrap(); 
                    ln_res = nb.ln_pmf(x.try_into().expect("statrs::NegativeBinomial calc PMF of u64, found usize and failed try_into()."));
                }
            }
        }
        if ln_trans {
            return ln_res;
        } else {
            return ln_res.exp();
        }
    }
}

Let me know if there is anything else I can do to help and I will do my best to be less than months late this time. :)

@FreezyLemon
Copy link
Contributor

FreezyLemon commented Sep 14, 2024

One idea on how to implement this pattern would be with generics.
So e.g. NegativeBinomial currently looks like this:

struct NegativeBinomial {
    r: f64,
    p: f64,
}

We could move the parametrization out into a wrapper struct:

struct NegativeBinomial<P> {
    param: P
}

// naming TBD
struct SuccessProbability {
    r: f64,
    p: f64,
}

struct MeanVariance {
    m: f64, // μ
    v: f64, // σ²
}

impl NegativeBinomial<SuccessProbability> {
    pub fn new_success_probability(r: f64, p: f64) -> Result<Self, Error>;

    pub const fn p(&self) -> f64 { self.param.p }
    pub const fn r(&self) -> f64 { self.param.r }

    // maybe return Result, if conversion can fail
    pub fn into_mean_variance(self) -> NegativeBinomial<MeanVariance>;
}

impl NegativeBinomial<MeanVariance> {
    pub fn new_mean_variance(m: f64, v: f64) -> Result<Self, Error>;

    pub const fn m(&self) -> f64 { self.param.m }
    pub const fn v(&self) -> f64 { self.param.v }

    pub fn into_success_probability(self) -> NegativeBinomial<SuccessProbability>;
}

impl DiscreteCDF<u64, f64> for NegativeBinomial<SuccessProbability> {
    // implement functions for r, p
}

impl DiscreteCDF<u64, f64> for NegativeBinomial<MeanVariance> {
    // implement functions for m, v
}

// etc.

This would mean that we need to implement traits for both NegativeBinomial<SP> and NegativeBinomial<MV> (and other potential parametrizations) separately. But I think it's fair to assume that different parametrizations will usually use different formulas1? Not 100% sure, but that's easily solved by a shared function which is called by both implementations.

Anyways, from a user's point of view this should mean that when you type NegativeBinomial::new into your IDE, it should recommend the new function for both parametrizations. Most API would stay the same because of the implemented traits (min, max, mean, variance, mode, cdf, sf, etc.) with some obvious differences (I can't call nb.p() on a NegativeBinomial<MeanVariance>).

The documentation would need to explain this well. And it would add a generic parameter to the type, which makes things more complicated (often you don't need to explicitly type your variables, this is Rust after all, but for beginners it can get confusing).


As an additional example – Gamma distribution can be parametrized by:

Mean only – and it will be identical to exponential distribution.
Mean and variance – using method of moments from Wiki.
Mean, variance with skewness and/or kurtosis – using numerical optimizer.

I think a builder would be great for this. You should even be able to handle different types.

pub struct GammaBuilder<P> {
    p: P
}

impl<P> GammaBuilder<P> {
    pub fn new(mean: f64) -> GammaBuilder<ExpBuilderParams> {
        GammaBuilder {
            p: ExpBuilderParams { mean }
        }
    }
}

impl GammaBuilder<ExpBuilderParams> {
    pub fn variance(self, variance: f64) -> GammaBuilder<GammaBuilderParams> {
        GammaBuilder {
            GammaBuilderParams {
                mean: self.mean,
                variance,
             }
         }
    }

    pub fn build(self) -> Exp {
        // ...
    }
}

impl GammaBuilder<GammaBuilderParams> {
    pub fn build(self) -> Gamma {
        // ...
    }
}

struct ExpBuilderParams {
    mean: f64
}

struct GammaBuilderParams {
    mean: f64,
    variance: f64,
}

Needs a bit of polish and I can't sketch an API for the numerical optimizer part since I don't know too much about that, but that should generally work, I think.

Footnotes

  1. i.e. f(m, v) is not just convert to r, p; then call g(r, p) but instead a different formula altogether.

@YeungOnion
Copy link
Contributor

YeungOnion commented Sep 16, 2024

I think supporting construction by specifying moments would be useful and as general fitting/optimizing of distribution functions should be a feature of statrs, we should aim to have helpful, unsurprising API to work with the popular optimizer crates.

I also think the builder is a good choice. It even extends well for fitting to data.

I've got other thoughts, but while writing it I realized we haven't defined if this project aims to be more for those writing single analyses or more as a library for applications / other library-like crates.

The notion of a Gamma distribution shouldn't rely on its parametrization, but numerically speaking a caller could opt into tradeoffs by choosing a parametrization that's closer to their use case. C.f. when running your own potentially exploratory analysis, you might just expect numerical stability across parameters, so we would runtime select from suite of methods to account for broad parameter ranges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants