preprocess_diagonal RuntimeWarning: Divide by Zero encountered in Σ_T_inverse = 1.0 / Σ_T #1018

Closed
ejorgensen-wl opened this issue Jul 25, 2024 · 3 comments

Comments

@ejorgensen-wl
Contributor

Minimal example to reproduce the warning:

>>> import stumpy
>>> import numpy as np
>>> stumpy.stump(np.array([0, 1, 2, 3, np.nan, np.nan, np.nan]), m=3)
/Users/user0/code/venv/lib/python3.10/site-packages/stumpy/core.py:2257: RuntimeWarning: divide by zero encountered in divide
  Σ_T_inverse = 1.0 / Σ_T
mparray([[inf, -1, -1, -1],
         [inf, -1, -1, -1],
         [inf, -1, -1, -1],
         [inf, -1, -1, -1],
         [inf, -1, -1, -1]], dtype=object)

This happens when the time series contains a run of at least m consecutive NaN values, so that at least one length-m subsequence is all NaN. The output itself seems to be handled without issues, but I'm curious whether we can handle the all-NaN subsequence case without raising this potentially confusing warning, or else raise a more useful one.

As indicated by the warning, at least one element of Σ_T is 0 in preprocess_diagonal (stumpy/core.py), even though the preceding line tries to avoid the divide-by-zero warning by setting the rolling std of constant subsequences to 1.0 instead of 0.0.
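To make the failure mode concrete, here is a minimal NumPy-only sketch of what I think is going on. It is not STUMPY's actual code: it assumes that non-finite values are replaced with 0 before the rolling std is computed, and that the constancy check is a peak-to-peak test that comes out False for any window containing a NaN. Under those assumptions the all-NaN window ends up with a std of exactly 0 that the "constant" guard never touches.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

T = np.array([0, 1, 2, 3, np.nan, np.nan, np.nan], dtype=np.float64)
m = 3

# Peak-to-peak constancy check on the raw data. Any NaN in a window makes
# the range NaN, and NaN == 0 is False, so no window is flagged here.
is_constant = np.ptp(sliding_window_view(T, m), axis=1) == 0

# ASSUMPTION (not confirmed from the STUMPY source): non-finite values are
# replaced with 0 before the rolling standard deviation is computed.
T_clean = np.where(np.isfinite(T), T, 0.0)
sigma = sliding_window_view(T_clean, m).std(axis=1)

# Mirror of the guard described above: constant windows get a std of 1.0.
sigma[is_constant] = 1.0

print(is_constant)  # no window is flagged as constant
print(sigma == 0)   # only the last (all-NaN) window still has a zero std
```

Taking `1.0 / sigma` at this point would emit the same "divide by zero encountered" RuntimeWarning and produce an inf in the last position.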

It looks like preprocess_diagonal calls process_isconstant --> rolling_isconstant --> _rolling_isconstant to check if the subsequence is constant. Link to _rolling_isconstant.

Currently _rolling_isconstant says the sequence is not constant if any value is NaN. A possible fix would be to call a subsequence constant if all of its values are NaN.

That could perhaps be implemented with something like this for _rolling_isconstant:

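# Peak-to-peak range of each window (NaN whenever the window contains a NaN)
out = np.empty(l)
# Boolean flags for windows made up entirely of NaN
all_nan = np.empty(l, dtype=np.bool_)
for i in prange(l):
    out[i] = np.ptp(a[i : i + w])
    all_nan[i] = np.all(np.isnan(a[i : i + w]))
# Element-wise OR; Python's `or` is ambiguous on arrays and would raise
return (out == 0) | all_nan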

or

# Boolean output so the flags are not stored as floats
out = np.empty(l, dtype=np.bool_)
for i in prange(l):
    out[i] = (np.ptp(a[i : i + w]) == 0) or np.all(np.isnan(a[i : i + w]))
return out

Would this break other things or be otherwise undesirable?
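As a quick way to probe that question outside of STUMPY, here is a vectorized NumPy sketch of the proposed rule. The helper name `rolling_isconstant_allnan` is hypothetical and this is not the library's implementation (no numba, no prange); it only checks that all-NaN windows get flagged while ordinary constant windows and partially-NaN windows behave as before.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_isconstant_allnan(a, w):
    """Hypothetical helper: True where a length-w window is constant or all NaN."""
    windows = sliding_window_view(np.asarray(a, dtype=np.float64), w)
    # A window with any NaN has a NaN range, so this is False for it
    ptp_is_zero = np.ptp(windows, axis=1) == 0
    # Proposed extra rule: a window made up entirely of NaN counts as constant
    all_nan = np.isnan(windows).all(axis=1)
    return ptp_is_zero | all_nan


# The series from the minimal example: only the last window is all NaN
print(rolling_isconstant_allnan([0, 1, 2, 3, np.nan, np.nan, np.nan], 3))
# → [False False False False  True]

# A genuinely constant window plus windows with a single NaN
print(rolling_isconstant_allnan([5, 5, 5, 1, np.nan, 2, 2], 3))
# → [ True False False False False]
```

This says nothing about how the rest of the pipeline treats such windows downstream, which is the part I'm unsure about.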

@seanlaw
Contributor

seanlaw commented Jul 25, 2024

@ejorgensen-wl Please correct me if I'm wrong, but I believe that this has already been reported by @NimaSarajpoor in #1006 and that he is currently working on PR #1012?

@ejorgensen-wl
Contributor Author

You're completely correct, yes. I apologize for not looking through existing issues to confirm! That solution seems to be a better alternative to what I had in mind as well so I'll close this issue.

@seanlaw
Contributor

seanlaw commented Jul 25, 2024

@ejorgensen-wl No problem at all and excellent timing! Please feel free to chime in on the other issue if there is anything else that we failed to consider.
