
Silent integer overflow in test_values #4279

Closed
mschmidt87 opened this issue Nov 30, 2020 · 15 comments · Fixed by #7114

Comments

@mschmidt87
Contributor

Description of your problem

I am trying to fit a Negative-Binomial regression model. However, I am running into SamplingError: Initial evaluation of model at starting point failed!, even when I am fitting samples drawn from the model's own prior.

I found out that the test values of my observed variable (which should be non-negative because of the Negative-Binomial distribution) are sometimes negative, which leads to the error. I then checked model.check_test_point() and got:

alpha_log__   -1.14
mu            -1.61
obs            -inf
Name: Log-probability of test_point, dtype: float64

This happens because some of the test values of the observed variable are negative:

model_instance.obs.tag.test_value.min()

If I understand correctly, the test values of the observed variable are basically the same as the values I am trying to fit:

>>> model_instance.obs.tag.test_value[0][:10]
array([136519719, 139778075, 162329337, 147874471, 176346820, 131886145,
       131664941, 145409569, 141731980, 152439463])
>>> prior_trace['obs'][0][:10]
array([136519719, 139778075, 162329337, 147874471, 176346820, 131886145,
       131664941, 145409569, 141731980, 152439463])

However, for very large sampled values, the test value is negative:

>>> model_instance.obs.tag.test_value.min()
-2146971425
>>> prior_trace['obs'].reshape(-1)[model_instance.obs.tag.test_value.reshape(-1).argmin()]
2147995871
>>> model_instance.obs.tag.test_value.reshape(-1)[np.where(prior_trace['obs'].reshape(-1) != model_instance.obs.tag.test_value.reshape(-1))]
array([-2041161858, -1615277538, -2008654162, -1830866193, -2066486365,
       -1939932683, -1927434696, -1920384235, -1997835169, -1856732049,
       -1849204327, -1857766493, -2045704721, -1855359298, -2123503404,
       -1829876267, -2071833989, -1976140492, -2110619502, -1816045663, ...

Thus, it seems that the test value is, for some reason, overflowing and turning negative for very large values.

Below is a minimal reproducer:


import numpy as np
import pymc3 as pm
import theano.tensor as tt


def model_factory(y):
    with pm.Model() as model:
        alpha = pm.HalfCauchy("alpha", 100)
        mu = pm.Normal('mu', 17., 2.)
        lam = pm.Deterministic('lambda', tt.exp(mu))
        pm.NegativeBinomial("obs", mu=lam, alpha=alpha, observed=y)
    return model

# Just some dummy data to let us sample from the prior
y = np.random.binomial(n=10, p=0.5, size=100)

# Sample from the model's prior
with model_factory(y):
    prior_trace = pm.sample_prior_predictive(1000)

# Fit the model to the prior samples
model_instance = model_factory(prior_trace['obs'])
with model_instance:
    trace = pm.sample()

Please provide the full traceback.

---------------------------------------------------------------------------
SamplingError                             Traceback (most recent call last)
<ipython-input-21-1bfc50620c9f> in <module>
      1 model_instance = model_factory(prior_trace['obs'])
      2 with model_instance:
----> 3     trace = pm.sample()

~/path/to/lib/python3.7/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, return_inferencedata, idata_kwargs, mp_ctx, pickle_backend, **kwargs)
    423                 update_start_vals(chain_start_vals, model.test_point, model)
    424 
--> 425     check_start_vals(start, model)
    426     if cores is None:
    427         cores = min(4, _cpu_count())

~/path/to/lib/python3.7/site-packages/pymc3/util.py in check_start_vals(start, model)
    228                 "Initial evaluation of model at starting point failed!\n"
    229                 "Starting values:\n{}\n\n"
--> 230                 "Initial evaluation results:\n{}".format(elem, str(initial_eval))
    231             )
    232 

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'alpha_log__': array(4.60517019), 'mu': array(17.)}

Initial evaluation results:
alpha_log__   -1.14
mu            -1.61
obs            -inf
Name: Log-probability of test_point, dtype: float64


Versions and main components

  • PyMC3 Version: current master (9deb5ee) (Note that this also happens with the current release)
  • Theano Version: 1.0.11
  • Python Version: 3.7.7
  • Operating system: Mac OS X
  • How did you install PyMC3: (conda/pip)
@Spaak
Member

Spaak commented Nov 30, 2020

Thanks @mschmidt87 for the detailed report! Pinging @StephenHogg here, who recently did some work (#4211) on initial model checks before sampling; could be related.

Spaak added this to the 3.10 milestone on Nov 30, 2020
@StephenHogg
Contributor

check_test_eval is really just calling model.check_test_point under the hood, which in turn just evaluates the log-probability of each of the RV objects contained in the model. So I would try calling that directly on your initial values in order to pick apart what's going on a bit more.
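
A minimal sketch of that per-variable check (my own illustration against the PyMC3 v3 API, using the model_instance from the reproducer above):

# check_test_point essentially evaluates each RV's logp at the test point;
# the variable that returns -inf is the one with an invalid test value.
for RV in model_instance.basic_RVs:
    print(RV.name, RV.logp(model_instance.test_point))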

@mschmidt87
Contributor Author

Yes, I looked a bit into the code, but I think the problem is in the construction of the test value for my observed variable (which becomes negative for very high values for some reason). But I don't really know where those test values are determined/initialized.

@mschmidt87
Contributor Author

mschmidt87 commented Dec 1, 2020

Okay, I have dug deeper into this and found the source of the error:

In my example, I am specifying the data as a numpy array with dtype np.int64. PyMC3 then takes that data and calls pandas_to_array on it:

https://github.com/pymc-devs/pymc3/blob/162fd92da607e133a83d360e97fbde2b364cb703/pymc3/model.py#L1785-L1786

Now, in pandas_to_array, we end up in the else branch of a long list of type checks:

https://github.com/pymc-devs/pymc3/blob/162fd92da607e133a83d360e97fbde2b364cb703/pymc3/model.py#L1686-L1703

which executes np.asarray(data). Since my data is already an array, nothing really happens here. But then the code checks whether the dtype is an integer and, if so, calls pm.intX(ret):

https://github.com/pymc-devs/pymc3/blob/162fd92da607e133a83d360e97fbde2b364cb703/pymc3/model.py#L1705-L1708

Now, pm.intX converts the dtype to int32, which overflows a bit above 2.1e9 (the int32 maximum is 2**31 - 1 = 2147483647). This causes the negative values, which then lead to the error in check_test_eval.

Now, I am not familiar with all the intricacies of the code, but the comments suggest this casting might happen because PyMC3 thinks I am passing an array of indices, since the dtype is integer. I don't quite understand why that necessitates casting it to int32, but I presume that is related to Theano?
Another thing that is a bit odd is that we call pandas_to_array in the first place, because we already passed a numpy array to the model. Thus, either we shouldn't actually call pandas_to_array for numpy arrays, or the function should be given a more informative name, I would say.
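
To make the wraparound concrete, here is a minimal illustration of my own (not part of the original report), using the value shown above; casting an int64 value beyond 2**31 - 1 to int32 wraps modulo 2**32 and turns negative:

import numpy as np

# The largest prior sample from above, cast the way pm.intX casts integer data.
y_big = np.array([2147995871], dtype=np.int64)
print(y_big.astype(np.int32))  # [-2146971425], matching the negative test value above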

@mschmidt87
Contributor Author

So, to me, there are essentially two questions: why does integer data need to be cast down to int32 at all, and should pandas_to_array even be called (or at least be renamed) when the user already passes a numpy array?

@lucianopaz
Contributor

The reason that we cast to int32 and not to int64 is that the latter can overflow float64 calculations for large integer values, whereas int32 won't. You could try to see if this happens by casting your observations to float64. Maybe some inf values will start popping up.
How were your observations generated? Is there a chance to use int32 throughout the entire process? Is there a way to rescale your observations so that they don't end up being huge?
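
As an aside, a small illustration of my own (not from this thread) of why very large integers are fragile in float64 arithmetic: above 2**53, float64 can no longer represent every integer, so exact counts are silently rounded:

import numpy as np

# 2**53 + 1 cannot be represented in float64; it rounds back down to 2**53.
a = np.int64(2**53) + 1
print(np.float64(a) == np.float64(2**53))  # True: the "+ 1" is lost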

Spaak removed this from the 3.10 milestone on Dec 2, 2020
@mschmidt87
Contributor Author

Thank you for your reply. I could scale down my observations, I guess, at the (acceptable) cost of losing some precision in the model.

However, could you explain what you mean by "we cast to int32 and not to int64 is that the latter can overflow float64 calculations for large integer values"? It's not clear to me what you mean by this.

@michaelosthege
Member

However, could you explain what you mean by "we cast to int32 and not to int64 is that the latter can overflow float64 calculations for large integer values"? It's not clear to me what you mean by this.

I think he refers to operations like int64 * float64.

Now let's appreciate for a moment that the recently introduced checks from #4211 surfaced a bug that would otherwise have caused very hard-to-debug problems in the MCMC.

What if we just add a check in intX to raise an error when the conversion results in an under/overflow?
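
A hypothetical sketch of such a guard (the helper name and error type are my own, not actual PyMC3 code):

import numpy as np

def intX_with_overflow_check(x):
    # Cast to int32 as pm.intX does, but refuse to return silently wrapped values.
    arr = np.asarray(x)
    converted = arr.astype(np.int32)
    if not np.array_equal(arr, converted):
        raise OverflowError(
            "Data contains integers outside the int32 range; "
            "casting to int32 would silently change their values."
        )
    return converted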

michaelosthege changed the title from "Overflow in Model.test_value" to "Silent integer overflow in test_values" on Dec 5, 2020
@mschmidt87
Contributor Author

However, could you explain what you mean by "we cast to int32 and not to int64 is that the latter can overflow float64 calculations for large integer values"? It's not clear to me what you mean by this.

I think he refers to operations like int64 * float64.

Okay, I was not aware that this cannot happen with int32 * float64. If there is no safe way around casting those integers to int32, I would suggest adding some warnings to the code to make the user aware of this behavior. In addition, I would suggest renaming the pandas_to_array function to something more informative.

jybook added a commit to phys201/bayboone that referenced this issue Apr 14, 2021
There's a known issue with pymc3 ignoring bounds on priors and then ending up in impossible places (similar to the open issues documented here: ICB-DCM/pyPESTO#365, here: pymc-devs/pymc#4279, and here: https://discourse.pymc.io/t/square-decision-region-model-specification-initial-evaluation-failed/6413).

We discovered this bug at around 3am, and will continue attempting to fix it, but would appreciate help from the teaching staff if possible.

Otherwise, we can rewrite the model before the next deadline.
@michaelosthege
Member

In addition, I would suggest to rename the pandas_to_array function to a more informative name.

Heads-up: pandas_to_array is being renamed to convert_observed_data by #5743.

@fonnesbeck
Member

Now that #4569 and #5743 have been merged, is this still an issue?

@ricardoV94
Member

I think the intX change was never included in V4

michaelosthege pushed a commit to michaelosthege/pymc that referenced this issue Nov 20, 2022
* add type guard for intX
* fix test for pandas
* fix posterior test, ints passed for float data

Closes pymc-devs#4279
@ricardoV94
Member

Closing as stale, feel free to reopen if still relevant

@ricardoV94
Member

ricardoV94 commented Feb 6, 2024

Should be solved by #7114, although we don't use test_values anymore

@ricardoV94
Member

The problem may still persist, but we would need a better reproducible example that's compatible with today's API.

If anyone faces this problem, calling model.point_logps() should return -inf caused by some unexpected overflow
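
A minimal sketch of that check against the current PyMC API (my own example; the model and data are stand-ins modeled on the original reproducer):

import numpy as np
import pymc as pm

# Dummy data containing a value beyond the int32 range, as in the original report.
y = np.array([136519719, 2147995871], dtype=np.int64)

with pm.Model() as model:
    alpha = pm.HalfCauchy("alpha", 100)
    mu = pm.Normal("mu", 17.0, 2.0)
    pm.NegativeBinomial("obs", mu=pm.math.exp(mu), alpha=alpha, observed=y)

# Per-variable log-probability at the initial point; an unexpected -inf on "obs"
# would point at this kind of silent overflow.
print(model.point_logps())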
