Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark gap encoder early stopping #681

Merged

Conversation

LeoGrin
Copy link
Contributor

@LeoGrin LeoGrin commented Jul 28, 2023

Benchmark to evaluate the early stopping change made in #680, compared to the previous version of the gap encoder.

Based on #663 by @simonamaggio and #593 by @LilianBoulard

Compared to #663 , we don't compute the full score regularly, as this operation is very slow. Instead, we use an exponentially weighted average of each batch score. This is based on sklearn's MinibatchNMF's code.

Results

It seems that the new version is faster, while the KL score and balanced accuracy don't change. I think the speed can be improved further by tuning hyperparameters, this is the subject of another benchmark in #680. Furthermore, id columns like seqid take a long time to fit even with early stopping, as the score keeps decreasing, so #585 is still relevant even after this speedup.

image image image

@LeoGrin LeoGrin requested a review from LilianBoulard July 28, 2023 14:05
@LeoGrin LeoGrin mentioned this pull request Jul 28, 2023
Copy link
Member

@LilianBoulard LilianBoulard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark looks good, thanks for the contribution!
Maybe one minor modif before merging (ping me when you think this is ready): remove the print statements (and pass verbose=False).

benchmarks/bench_gap_es_score.py Show resolved Hide resolved
benchmarks/bench_gap_es_score.py Outdated Show resolved Hide resolved
benchmarks/bench_gap_es_score.py Outdated Show resolved Hide resolved
@LeoGrin
Copy link
Contributor Author

LeoGrin commented Jul 31, 2023

@LilianBoulard Running with the same batch size, the speedup is smaller, but I think still useful.

image image image

@LeoGrin
Copy link
Contributor Author

LeoGrin commented Aug 4, 2023

@LilianBoulard I think it's ready, WDYT?

Copy link
Member

@jovan-stojanovic jovan-stojanovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@jovan-stojanovic jovan-stojanovic merged commit 82d31c9 into skrub-data:main Aug 8, 2023
LeoGrin added a commit to LeoGrin/skrub that referenced this pull request Aug 24, 2023
* benchmark

* fix bug due to mixed type

* verbose

* test

* fix bug

* add benchmark results

* add balenced accuracy

* Update benchmarks/bench_gap_es_score.py

Co-authored-by: Lilian <[email protected]>

* remove prints

* run with the same batch size

* benchmark results with the same batch size

---------

Co-authored-by: Lilian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants