Sparse get dummies perf #21997
Hello @TomAugspurger! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 20, 2018 at 15:48 Hours UTC
Here's the ASV (only a 3x speedup).
            dtype=pd.api.types.CategoricalDtype(categories))
        self.s = s

    def time_get_dummies_1d(self):
Small nit: you could parametrize over sparse=False/True.
I have a slight preference for leaving them separate, since they're such distinct code paths and it's a tad easier to run just sparse with this layout. Happy to change if you feel strongly about this.
Sounds good, no strong preference to use params then.
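For reference, a sketch of what parametrizing over `sparse` could look like, assuming asv's `params` / `param_names` class attributes. The class name, category list, and data size are hypothetical and not taken from this PR.

```python
import string

import numpy as np
import pandas as pd


class GetDummies1D:
    # asv calls setup() and each time_* method once per value in params.
    params = [False, True]
    param_names = ["sparse"]

    def setup(self, sparse):
        # Hypothetical data: 12 single-letter categories, 10,000 rows.
        categories = list(string.ascii_letters[:12])
        self.s = pd.Series(
            np.random.choice(categories, size=10_000),
            dtype=pd.api.types.CategoricalDtype(categories),
        )

    def time_get_dummies_1d(self, sparse):
        # The dense and sparse code paths share one benchmark body.
        pd.get_dummies(self.s, sparse=sparse)
```

The trade-off discussed above still applies: separate methods make it slightly easier to run only the sparse case, while a parametrized class avoids duplicating the benchmark body.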
lgtm.
thanks!
Previously, we did a scalar `elem == -1` check for every element in the ndarray. This replaces that check with a vectorized `array == -1`.

Running the ASV now. In the meantime, here's a simple timeit on the same problem
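
For illustration only (this is neither the author's timeit nor the actual pandas internals), here is a minimal sketch of the difference between a per-element `== -1` check and a vectorized one. It assumes a NumPy array of integer category codes in which -1 marks a missing value; the function names and array size are hypothetical.

```python
import timeit

import numpy as np


def count_missing_scalar(codes):
    """Compare each element to -1 in a Python-level loop."""
    n = 0
    for elem in codes:
        if elem == -1:
            n += 1
    return n


def count_missing_vectorized(codes):
    """Compare the whole array to -1 in a single NumPy operation."""
    return int((codes == -1).sum())


rng = np.random.default_rng(0)
codes = rng.integers(-1, 12, size=100_000)  # values drawn from -1..11

# Both approaches must agree before timing them.
assert count_missing_scalar(codes) == count_missing_vectorized(codes)

scalar_t = timeit.timeit(lambda: count_missing_scalar(codes), number=3)
vector_t = timeit.timeit(lambda: count_missing_vectorized(codes), number=3)
print(f"scalar: {scalar_t:.4f}s, vectorized: {vector_t:.4f}s")
```

Absolute timings depend on the machine and array size, so no expected output is shown.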