TST Unit test for nb_nll #75

Merged
merged 9 commits into from Feb 14, 2023
38 changes: 34 additions & 4 deletions pydeseq2/utils.py
@@ -223,27 +223,57 @@ def dispersion_trend(normed_mean, coeffs):


def nb_nll(counts, mu, alpha):
"""Negative log-likelihood of a negative binomial.
"""Negative log-likelihood of a negative binomial of parameters ``mu`` and ``alpha``.

Unvectorized version.

Mathematically, if ``counts`` is a vector of counting entries :math:`y_i`
then the likelihood of each entry :math:`y_i` to be drawn from a negative
binomial :math:`NB(\\mu, \\alpha)` is [1]

.. math::
p(y_i | \\mu, \\alpha) = \\frac{\\Gamma(y_i + \\alpha^{-1})}{
\\Gamma(y_i + 1)\\Gamma(\\alpha^{-1})
}
\\left(\\frac{1}{1 + \\alpha \\mu} \\right)^{1/\\alpha}
\\left(\\frac{\\mu}{\\alpha^{-1} + \\mu} \\right)^{y_i}

As a consequence, assuming there are :math:`n` entries,
the total negative log-likelihood for ``counts`` is

.. math::
\\ell(\\mu, \\alpha) = \\frac{n}{\\alpha} \\log(\\alpha) +
\\sum_i \\left \\lbrace
- \\log \\left( \\frac{\\Gamma(y_i + \\alpha^{-1})}{
\\Gamma(y_i + 1)\\Gamma(\\alpha^{-1})
} \\right)
+ (\\alpha^{-1} + y_i) \\log (\\alpha^{-1} + y_i)
- y_i \\log \\mu
\\right \\rbrace

This is implemented in this function.

Parameters
----------
counts : ndarray
Observations.

mu : float
Mean of the distribution.
Mean of the distribution :math:`\\mu`.

alpha : float
Dispersion of the distribution,
s.t. the variance is :math:`\\mu + \\alpha * \\mu^2`.
Dispersion of the distribution :math:`\\alpha`,
s.t. the variance is :math:`\\mu + \\alpha \\mu^2`.

Returns
-------
float
Negative log likelihood of the observations counts
following :math:`NB(\\mu, \\alpha)`.

Notes
-----
[1] https://en.wikipedia.org/wiki/Negative_binomial_distribution
"""

n = len(counts)
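For reference, the documented expression maps directly onto NumPy/SciPy. The sketch below is illustrative, not necessarily the exact code merged in this PR; the name ``nb_nll_sketch`` is hypothetical, and ``scipy.special.gammaln`` evaluates the log-Gamma terms.

import numpy as np
from scipy.special import gammaln


def nb_nll_sketch(counts, mu, alpha):
    # Negative log-likelihood of NB(mu, alpha), following the docstring formula.
    n = len(counts)
    alpha_neg1 = 1 / alpha
    # log of the Gamma ratio Gamma(y + 1/alpha) / (Gamma(y + 1) * Gamma(1/alpha))
    logbinom = gammaln(counts + alpha_neg1) - gammaln(counts + 1) - gammaln(alpha_neg1)
    return n * alpha_neg1 * np.log(alpha) + np.sum(
        -logbinom
        + (counts + alpha_neg1) * np.log(alpha_neg1 + mu)
        - counts * np.log(mu)
    )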
29 changes: 29 additions & 0 deletions tests/test_utils.py
@@ -0,0 +1,29 @@
import numpy as np
import pytest

from pydeseq2.utils import nb_nll


@pytest.mark.parametrize("mu, alpha", [(10, 0.5), (10, 0.1), (3, 0.5), (9, 0.05)])
def test_nb_nll_moments(mu, alpha):
    # get the probability of many points
    y = np.arange(int(10 * (mu + mu**2 / alpha)))
    probas = np.zeros(y.shape)
    for i in range(y.size):
        # pmf of a single observation, recovered from the negative log-likelihood
        probas[i] = np.exp(-nb_nll(np.array([y[i]]), mu, alpha))
    # check that the probabilities sum very close to 1
    assert np.allclose(probas.sum(), 1.0)
    # Re-sample according to probas
    n_montecarlo = int(1e6)
    rng = np.random.default_rng(42)
    sample = rng.choice(y, size=(n_montecarlo,), p=probas)
    # Get the theoretical values
    mean_th = mu
    var_th = (mu * alpha + 1) * mu
    # Check that the mean is in an acceptable range, up to stochasticity
    diff = sample.mean() - mean_th
    deviation = var_th / np.sqrt(n_montecarlo)
    assert np.abs(diff) < 0.2 * deviation
    error_var = np.abs(sample.var() - var_th) / var_th
    assert error_var < 1 / np.sqrt(n_montecarlo)
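As an aside (not part of this PR), the moment checks above can be complemented by a point-wise comparison against ``scipy.stats.nbinom``, which implements the same distribution under the parametrization ``n = 1/alpha``, ``p = 1/(1 + alpha * mu)``. A minimal sketch, assuming scipy is available:

import numpy as np
from scipy.stats import nbinom

from pydeseq2.utils import nb_nll

mu, alpha = 10, 0.5
y = np.arange(100)
# pmf recovered from nb_nll, one observation at a time
ours = np.exp([-nb_nll(np.array([k]), mu, alpha) for k in y])
# scipy's NB pmf under the equivalent (n, p) parametrization
ref = nbinom.pmf(y, 1 / alpha, 1 / (1 + alpha * mu))
assert np.allclose(ours, ref)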