
[BUG] LowessRegressor: ZeroDivisionError: Weights sum to zero, can't be normalized #707

Closed
srt19170 opened this issue Oct 29, 2024 · 4 comments · Fixed by #708
Labels
bug Something isn't working

Comments

@srt19170

Using LowessRegressor, I received a "ZeroDivisionError: Weights sum to zero, can't be normalized" error. A little debugging revealed that the problem is in _calc_wts:

    def _calc_wts(self, x_i):
        """Calculate the weights for a single point `x_i` using the training data `self.X_` and the parameters
        `self.sigma` and `self.span`. The weights are calculated as `np.exp(-(distances**2) / self.sigma)`,
        where distances are the distances between `x_i` and all the training samples.

        If `self.span` is not None, then the weights are multiplied by
        `(distances <= np.quantile(distances, q=self.span))`.
        """
        distances = np.linalg.norm(self.X_ - x_i, axis=1)
        weights = np.exp(-(distances**2) / self.sigma)
        if self.span:
            weights = weights * (distances <= np.quantile(distances, q=self.span))
        return weights

If sigma is too small relative to the distances, the scaled squared distances become large enough that np.exp() underflows to zero for every weight, resulting in an exception being raised when the weights are applied.

This seems like the wrong behavior. Perhaps the weights should be clipped to some minimum non-zero weight so that the predictor can still be used, perhaps with a warning message to explain the problem?
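A minimal sketch of the failure mode, using hypothetical numbers: float64 `exp` underflows to exactly 0.0 once the exponent drops below roughly -745, i.e. once `distances**2 / sigma` exceeds about 745 for every training point.

```python
import numpy as np

# Hypothetical training data whose distances are large relative to sigma.
X = np.array([[0.0], [100.0], [200.0]])
y = np.array([1.0, 2.0, 3.0])
sigma = 0.1

x_i = np.array([50.0])
distances = np.linalg.norm(X - x_i, axis=1)  # [50., 50., 150.]
weights = np.exp(-(distances**2) / sigma)    # exp(-25000) etc. -> all 0.0

try:
    np.average(y, weights=weights)
except ZeroDivisionError as e:
    print(e)  # Weights sum to zero, can't be normalized
```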

@srt19170 added the bug label on Oct 29, 2024
@FBruzzesi
Collaborator

Hey @srt19170 , thanks for spotting and raising this issue.

> This seems like the wrong behavior.

Not sure I would call it wrong 😇

> perhaps with a warning message to explain the problem?

Surely a better error message can be provided.

> Perhaps the weights should be clipped to some minimum non-zero weight so that the predictor can still be used

My understanding is that this issue occurs only if all the -(distances**2) / self.sigma values blow up to minus infinity, so all the weights are zero. Clipping all elements to any constant value would amount to computing the plain average of y in

results = np.stack([np.average(self.y_, weights=self._calc_wts(x_i=x_i)) for x_i in X])

I am not too comfortable with doing that implicitly for the user. I would still tend to raise an error that explains the issue.
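A tiny sketch (hypothetical values) of why clipping degenerates into the plain mean: once every raw weight has underflowed to zero, any uniform floor makes the weighted average equal `y.mean()`.

```python
import numpy as np

y = np.array([1.0, 2.0, 4.0])
weights = np.zeros_like(y)               # the all-underflowed case
clipped = np.clip(weights, 1e-12, None)  # hypothetical "minimum weight" fix

# Uniform weights reduce the weighted average to the plain mean of y.
assert np.isclose(np.average(y, weights=clipped), y.mean())
```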

@FBruzzesi
Collaborator

RFC on the following error message:

try:
    results = np.stack([np.average(self.y_, weights=self._calc_wts(x_i=x_i)) for x_i in X])
except ZeroDivisionError:
    msg = (
        "Weights, resulting from `np.exp(-(distances**2) / self.sigma)`, are all zero. "
        "Try to increase the value of `sigma` or to normalize the input data.\n\n"
        "`distances` refers to the distance between each sample `x_i` and all the "
        "training samples."
    )
    raise ValueError(msg)

@koaning
Owner

koaning commented Oct 29, 2024

+1 on the better error message.

@srt19170 is there anything you can share about your dataset that might help us understand the kinds of datasets for which this issue comes up?

@srt19170
Author

My data has 75 features, and I'm using StandardScaler to condition the data. The initial test set has 2905 elements, and it looks like the distances end up being in the range 8-35. On my first attempt I just copied the documentation example and used sigma=0.1, which in this case is right at the edge of where it turns all the weights to zero.

I admit my math is weak -- is the norm being calculated more likely to yield large numbers when the number of features is large?

At any rate, thanks for considering the issue; the proposed error message would have helped me figure it out more quickly.
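As a side note on the question above, a quick simulation sketch (synthetic standard-normal data, not the dataset from this issue) suggests the answer is yes: for standardized, roughly independent features, the expected squared distance between two random points is about 2 × n_features, so distances grow like sqrt(2d). With d = 75 that gives sqrt(150) ≈ 12.2, in line with the observed 8-35 range.

```python
import numpy as np

rng = np.random.default_rng(0)
mean_dists = {}
for d in (5, 75, 500):
    # Two batches of 10,000 standardized points in d dimensions.
    a, b = rng.standard_normal((2, 10_000, d))
    mean_dists[d] = np.linalg.norm(a - b, axis=1).mean()
    print(d, round(mean_dists[d], 1), "vs sqrt(2d) =", round(np.sqrt(2 * d), 1))
```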
