creg2: use sparse updates #7

Open
nschneid opened this issue Mar 13, 2014 · 3 comments

Comments

@nschneid (Collaborator):

Perhaps with scipy data structures: http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.dok_matrix.html#scipy.sparse.dok_matrix
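A minimal sketch of the dok_matrix pattern — incremental construction, then conversion to CSR for fast arithmetic. (Names here are illustrative, not from the creg2 code.)

```python
# Minimal dok_matrix sketch: build incrementally, convert to CSR for math.
from scipy.sparse import dok_matrix

X = dok_matrix((3, 5))   # 3 instances x 5 features, all zeros initially
X[0, 2] = 1.0            # set individual nonzero entries as they appear
X[1, 4] = 2.5
X_csr = X.tocsr()        # CSR supports fast row slicing and matrix products
```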

@brendano (Collaborator):

scipy sparse matrices are the only game in town in the numpy world. The documentation is a little disappointing, though.

@nschneid (Collaborator, Author):

Via the scikit-learn docs, I think dropping the .toarray() from

    X = X_dict.fit_transform(X).toarray()

will leave it as a sparse matrix. Does that work? @as1986, it might be worth trying, to see if it saves memory and speeds things up.
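For context: DictVectorizer returns a scipy.sparse matrix by default, and .toarray() is what densifies it. A hedged sketch (the variable names and toy data are assumptions, not from the creg2 code):

```python
# Sketch: DictVectorizer output is sparse by default; .toarray() densifies it.
from sklearn.feature_extraction import DictVectorizer

X_dict = DictVectorizer()                       # sparse=True is the default
X_raw = [{'word=the': 1.0}, {'word=cat': 1.0}]  # toy feature dicts
X = X_dict.fit_transform(X_raw)                 # a scipy.sparse CSR matrix
print(type(X), X.nnz)                           # sparse type, 2 stored values
```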

@nschneid (Collaborator, Author):

I have something that is hopefully a step toward a working sparse-matrix implementation: https://gist.github.com/nschneid/9748235 (includes the non-sparse code for comparison).

The sparse code is much slower than the original code on the iris dataset, which, after all, has dense instances. I have not tested it on a text dataset, and there are probably inefficiencies (e.g., I am building a sparse matrix for Hsqrt just to get the division to work out, even though it is really dense; also, there is less reuse of matrix instances due to limitations in the APIs, though there may be better workarounds).
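One possible way around the Hsqrt workaround: divide only at the gradient's stored nonzeros by operating on the CSR internals directly. This is a sketch under my assumptions about the shapes (sparse gradient G, dense ndarray Hsqrt of the same shape); it is not code from the gist, and the names are illustrative.

```python
# Sketch: elementwise G / Hsqrt without densifying G or wrapping Hsqrt
# in a sparse matrix. Only G's stored nonzeros are touched.
import numpy as np
from scipy.sparse import csr_matrix

def divide_at_nonzeros(G, Hsqrt):
    G = G.tocsr().copy()
    G.eliminate_zeros()                      # keep .data aligned with true nonzeros
    rows = np.repeat(np.arange(G.shape[0]),  # row index of each stored entry
                     np.diff(G.indptr))
    G.data /= Hsqrt[rows, G.indices]         # divide in place, nonzeros only
    return G

G = csr_matrix(np.array([[0.0, 2.0], [4.0, 0.0]]))
Hsqrt = np.array([[1.0, 2.0], [2.0, 1.0]])
print(divide_at_nonzeros(G, Hsqrt).toarray())  # [[0. 1.] [2. 0.]]
```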

For testing purposes, I have modified the code to only look at the first 10 training examples rather than sampling a different minibatch for each iteration. On the iris dataset, the loss values printed by the sparse code are close to, but do not exactly match, those of the original code. It is unclear whether this is a bug or a numerical-precision issue.

This code does not implement regularization. For sparse L2 regularization there is a trick from Alex Smola's blog, which I explain in my features document; a sketch of the idea follows.
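My understanding of that lazy-L2 trick, sketched below: rather than shrinking every weight at every step, record when each feature was last touched and apply the accumulated decay just before its next gradient update, raising the per-step factor to the number of skipped steps. (All names here are illustrative assumptions, not the creg2 code.)

```python
# Sketch of lazy L2 for sparse SGD: decay each weight only when its
# feature next appears in a gradient, catching up on skipped steps.
import numpy as np

class LazyL2SGD:
    def __init__(self, dim, eta=0.1, lam=1e-4):
        self.w = np.zeros(dim)
        self.last = np.zeros(dim, dtype=int)  # step of each feature's last update
        self.eta, self.lam, self.t = eta, lam, 0

    def update(self, idxs, grads):
        """One sparse step: idxs = active feature indices, grads = their gradients."""
        self.t += 1
        skipped = self.t - self.last[idxs]
        self.w[idxs] *= (1.0 - self.eta * self.lam) ** skipped  # catch-up decay
        self.w[idxs] -= self.eta * grads                        # gradient step
        self.last[idxs] = self.t
```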
