
Support sparse matrices for X #240

Merged
merged 11 commits into master from support-sparse-matrix-X
Jul 22, 2022

Conversation

adriangb (Owner)

Closes #239

codecov-commenter commented Aug 11, 2021

Codecov Report

Merging #240 (38feeca) into master (2c8e9e0) will increase coverage by 0.01%.
The diff coverage is 100.00%.

```
@@           Coverage Diff           @@
##           master     #240   +/-   ##
=======================================
  Coverage   98.27%   98.28%
=======================================
  Files           7        7
  Lines         755      759    +4
=======================================
+ Hits          742      746    +4
  Misses         13       13
```

| Impacted Files | Coverage Δ |
| --- | --- |
| scikeras/wrappers.py | 97.53% <100.00%> (+0.02%) ⬆️ |


github-actions bot commented Aug 11, 2021

📝 Docs preview for commit 38feeca at: https://www.adriangb.com/scikeras/refs/pull/240/merge/

adriangb mentioned this pull request Aug 11, 2021

adriangb (Owner, Author) commented

Todo: update docs, maybe an example notebook?

carlogeertse left a comment

Tutorial notebook looks good, explaining why and how to use sparse matrices. See my last comment for some minor suggestions.


The dataset we'll be using is designed to demonstrate a worst-case scenario for dense input features and a best-case scenario for sparse ones.
It consists of a single categorical feature with as many categories as there are rows.
This means the one-hot encoded representation will require as many columns as it has rows, making it very inefficient to store as a dense matrix but very efficient to store as a sparse matrix.
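The dataset the notebook describes can be sketched directly with scipy (a minimal illustration, not the notebook's actual code; the row count here is arbitrary):

```python
# Sketch: one categorical feature with as many categories as rows.
# Its one-hot encoding is an n x n matrix with a single 1 per row --
# enormous as a dense array, tiny as CSR.
import numpy as np
from scipy import sparse

n_rows = 10_000
categories = np.arange(n_rows)  # every row gets its own category

# Build the one-hot encoding by hand: row i has a 1 in column categories[i]
X_sparse = sparse.csr_matrix(
    (np.ones(n_rows), (np.arange(n_rows), categories)),
    shape=(n_rows, n_rows),
)

dense_bytes = n_rows * n_rows * 8  # what a float64 dense array would need
sparse_bytes = (
    X_sparse.data.nbytes + X_sparse.indices.nbytes + X_sparse.indptr.nbytes
)
print(f"dense/sparse storage ratio: {dense_bytes // sparse_bytes}x")
```

The CSR form stores only the nonzero values plus index arrays, so its footprint grows with `n_rows` rather than `n_rows**2`.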

I think this is a perfect example, as data with many categorical features and/or categories per feature seems like the main reason to make use of a sparse matrix. Taking that to the extreme highlights the benefit of sparse matrices well.

```python
%memit sparse_pipeline.fit(X, y)
```

Happy to see that you knew how to properly monitor memory usage.
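For readers following along outside IPython: `%memit` comes from the `memory_profiler` extension. A rough stand-in using only the standard library's `tracemalloc` (which tracks Python-level allocations, including NumPy buffers on modern NumPy, rather than process RSS) might look like:

```python
import tracemalloc
import numpy as np

def peak_mib(fn):
    """Return peak traced memory (MiB) allocated while fn() runs."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 2**20

# Stand-in for pipeline.fit: allocating a dense 1000x1000 float64
# array (~7.6 MiB) is the kind of spike %memit would report.
peak = peak_mib(lambda: np.ones((1000, 1000)))
print(f"peak: {peak:.1f} MiB")
```

This is only an approximation of `%memit`'s measurement, but it makes the dense-vs-sparse comparison reproducible in a plain script.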


```python
%memit sparse_pipeline_uint8.fit(X, y)
```

Two more things you might want to add:

  • Monitor and mention the computation time for the sparse variant. While reduced memory usage is an obvious advantage of sparse matrices, in my experiments it came with increased computation time. I'd be curious whether you measure the same in this better-designed setup.
  • Maybe mention using tf.data.Dataset with a generator as an alternative solution to memory issues. I think that works in most cases; it just fails when the SciKeras wrapper is used inside another wrapper or pipeline that doesn't support tf.data.Dataset, which was the reason I needed a sparse matrix.

Not sure how relevant these two suggestions are, so I'll leave it up to you whether to include that information.
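The generator alternative can be sketched without TensorFlow: the idea is to keep the full matrix sparse and densify only one batch at a time, which is exactly the kind of generator `tf.data.Dataset.from_generator` can wrap (helper name and sizes below are hypothetical):

```python
import numpy as np
from scipy import sparse

def dense_batches(X_csr, y, batch_size=256):
    """Yield (dense_batch, y_batch) pairs; only one batch is dense at a time."""
    for start in range(0, X_csr.shape[0], batch_size):
        stop = start + batch_size
        yield X_csr[start:stop].toarray(), y[start:stop]

X = sparse.random(1000, 50, density=0.01, format="csr", random_state=0)
y = np.zeros(1000)

batches = list(dense_batches(X, y))
print(len(batches), batches[-1][0].shape)  # 4 batches; the last has 232 rows
```

As the comment notes, this works only where the consumer accepts a Dataset or generator; a wrapper or pipeline step that validates its input as an array would force the full matrix dense anyway.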

adriangb (Owner, Author)

Great points. I added a run-time measurement and mentioned Datasets; feel free to suggest changes to those sections.

carlogeertse

Those additions look good to me.

adriangb merged commit 8d5e1a9 into master Jul 22, 2022
mattalhonte-srm commented

Heya! Just tested this and it doesn't work for me: converting to LIL makes the container blow up when I try to train. I need the CSR matrix to stay a CSR matrix (TF can do much more efficient math on those!). What worked for me was just passing accept_sparse=True, with no other code changes. Thanks!
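For context, `accept_sparse` is a parameter of scikit-learn's input validation (`sklearn.utils.check_array`); with `accept_sparse=True` a CSR matrix passes through without any format conversion. A minimal sketch of the behavior the commenter relied on:

```python
from scipy import sparse
from sklearn.utils import check_array

X = sparse.random(100, 20, density=0.1, format="csr", random_state=0)

# accept_sparse=True (or an explicit list like ["csr"]) leaves the CSR
# format intact, avoiding a costly conversion such as X.tolil()
checked = check_array(X, accept_sparse=True)
print(checked.format)
```

Passing a list of formats instead of `True` is the stricter option when downstream code can only handle specific sparse layouts.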

adriangb (Owner, Author)

Ouch. I forget why I had to put the conversion in there (I mean, there's a comment, but I'm sure some test failed).

adriangb deleted the support-sparse-matrix-X branch July 23, 2022 03:20
adriangb restored the support-sparse-matrix-X branch July 23, 2022 03:20
Development

Successfully merging this pull request may close these issues.

Use of sparse matrices
4 participants