This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

Numerical Instability of GMM #152

Open
1 of 3 tasks
AtheMathmo opened this issue Oct 13, 2016 · 9 comments

Comments

@AtheMathmo (Owner) commented Oct 13, 2016

There are a few problems with the current GMM implementation.

  • Using det and inverse is numerically unstable; we should use the Cholesky decomposition instead (see the sketch below).
  • The regularization constant is currently added incorrectly. It should be added only to the diagonal, and after the nested loop rather than during it.
  • We should compute probabilities in log-space and work there wherever possible.

Scikit-learn's implementation is a good reference.
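For concreteness, here is a minimal sketch of evaluating the multivariate Gaussian log-density via a Cholesky factor, written in plain Rust with `Vec<f64>`/`Vec<Vec<f64>>` rather than rusty-machine's matrix types (all names are illustrative, not from the library). The Mahalanobis term comes from a triangular solve instead of an explicit inverse, and the log-determinant from the diagonal of the factor instead of det:

```rust
// Cholesky factorization A = L * L^T for a symmetric positive-definite matrix.
fn cholesky(a: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
    let n = a.len();
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let s: f64 = (0..j).map(|k| l[i][k] * l[j][k]).sum();
            if i == j {
                let d = a[i][i] - s;
                if d <= 0.0 {
                    return None; // not positive definite
                }
                l[i][j] = d.sqrt();
            } else {
                l[i][j] = (a[i][j] - s) / l[j][j];
            }
        }
    }
    Some(l)
}

// Solve L y = b by forward substitution (L lower triangular).
fn forward_solve(l: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();
    let mut y = vec![0.0; n];
    for i in 0..n {
        let s: f64 = (0..i).map(|k| l[i][k] * y[k]).sum();
        y[i] = (b[i] - s) / l[i][i];
    }
    y
}

/// ln N(x | mu, Sigma), where `l` is the Cholesky factor of Sigma.
fn gaussian_log_pdf(x: &[f64], mu: &[f64], l: &[Vec<f64>]) -> f64 {
    let n = x.len() as f64;
    let diff: Vec<f64> = x.iter().zip(mu.iter()).map(|(a, b)| a - b).collect();
    // Mahalanobis distance ||L^{-1}(x - mu)||^2 via a triangular solve,
    // no explicit matrix inverse required.
    let y = forward_solve(l, &diff);
    let maha: f64 = y.iter().map(|v| v * v).sum();
    // ln|Sigma| = 2 * sum(ln L[i][i]), avoiding an explicit determinant.
    let log_det: f64 = 2.0 * l.iter().enumerate().map(|(i, row)| row[i].ln()).sum::<f64>();
    -0.5 * (n * (2.0 * std::f64::consts::PI).ln() + log_det + maha)
}
```

The factor only needs to be computed once per component per EM iteration and can then be reused for every point, so this is also cheaper than recomputing an inverse for each evaluation.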

@andrewcsmith (Contributor)

Can you break this down a little bit more, or cite reference literature or a reference implementation?

I'm very invested in the stability of GMM, as it's the standard way to cluster spectral frames in audio segmentation problems. Currently I've found that if I initialize too many clusters it totally goes off the rails.

andrewcsmith added a commit to andrewcsmith/rusty-machine that referenced this issue Oct 15, 2016
@AtheMathmo (Owner, Author)

Added a reference implementation. I haven't got any literature references yet; I'll try to find some.

I've started working on this already but it is a sizable task and so progress is a little slow.

@andrewcsmith (Contributor)

Okay, do you want to push your work to a separate branch on this project? Then I can pull down the changes and build on them. I spent an hour or so digging into sklearn last night.

AtheMathmo pushed a commit that referenced this issue Oct 15, 2016
* Issue #152: Only add regularization constant to diagonal

* Move covariance initialization into its own method

* Move compute_cov outside of nested loop
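As a rough illustration of the first point in that commit, adding the regularization only to the diagonal, and only once after the covariance has been accumulated, looks something like the following. The function and variable names are placeholders, not rusty-machine identifiers:

```rust
// Hypothetical helper: add the regularization constant to the diagonal of an
// accumulated covariance matrix, once, after the accumulation loop finishes.
fn regularize_covariance(cov: &mut [Vec<f64>], reg: f64) {
    for i in 0..cov.len() {
        cov[i][i] += reg;
    }
}
```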
@AtheMathmo (Owner, Author)

I haven't had much time to work on it today and I'm away from my PC, so I can't push the changes right now. The only part I had really started was switching to log-space for the probabilities - this part here.

I hadn't pushed anything to a branch yet as it was broken :D. Feel free to push forward with it, and I'd be happy to open a PR into your branch later (if I can indeed contribute anything).
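For anyone following along, the usual building block for that log-space switch is a log-sum-exp helper along these lines (a generic sketch, not the linked code); shifting by the maximum before exponentiating is what keeps small probabilities from underflowing to zero:

```rust
/// Numerically stable ln(sum(exp(x_i))); a generic sketch, not rusty-machine API.
fn log_sum_exp(xs: &[f64]) -> f64 {
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    if !max.is_finite() {
        return max; // empty input or all terms -inf
    }
    // Shift by the maximum so the largest term exponentiates to exactly 1.0.
    max + xs.iter().map(|x| (x - max).exp()).sum::<f64>().ln()
}
```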

@andrewcsmith (Contributor)

Sounds great. I'll work on the Cholesky implementation first; it seems that's where most of the numerical instability comes from.

@AtheMathmo (Owner, Author)

Ok, sure! I agree that's the worst part; I just thought doing the log-space probabilities would be easier...

@andrewcsmith (Contributor)

Cool. Why don't I take a crack at it first? I'm finding it pretty complicated to imagine things in non-log-space while looking at the sklearn code, so I'm doing it all at once: sticking to log-space while using Cholesky.

I see quite a few rusty improvements we could make to the sklearn implementation, but I'm saving those for later. At the moment it's just a bad translation.

@AtheMathmo (Owner, Author)

That sounds perfect, thank you for your help with this!

@andrewcsmith (Contributor)

I've taken a first crack at it with #155. I can't seem to figure out why the means aren't separating though. They do drift in the direction of the points, but they all drift basically the same way. I think this has something to do with the log responsibilities of the components, but I can't find a bug anywhere.

Lots of println!s, too.
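One place worth checking for the symptom described above (all the means drifting the same way) is the per-sample normalization of the log responsibilities: each row of weighted log-likelihoods has to be normalized by its own log-sum-exp, otherwise every component ends up with nearly the same responsibility for every point. A hedged sketch with illustrative names, not taken from #155:

```rust
// Stable ln(sum(exp(x_i))), as sketched earlier in the thread.
fn log_sum_exp(xs: &[f64]) -> f64 {
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    max + xs.iter().map(|x| (x - max).exp()).sum::<f64>().ln()
}

/// `weighted_log_prob[i][k]` holds ln(pi_k) + ln N(x_i | mu_k, Sigma_k).
/// Returns log responsibilities: each row is normalized by its own
/// log-sum-exp, so exponentiating a row sums to 1 for every sample.
fn log_responsibilities(weighted_log_prob: &[Vec<f64>]) -> Vec<Vec<f64>> {
    weighted_log_prob
        .iter()
        .map(|row| {
            let norm = log_sum_exp(row); // per-sample normalizer
            row.iter().map(|lp| lp - norm).collect()
        })
        .collect()
}
```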
