PERF: Fixed regression in Series(index=idx) constructor #20865
From pandas-dev#18496 Special cases empty series construction, since the reindex is not necessary.
cc @toobaz
Codecov Report
@@            Coverage Diff             @@
##           master   #20865      +/-   ##
==========================================
+ Coverage   91.77%   91.78%   +<.01%
==========================================
  Files         153      153
  Lines       49313    49327      +14
==========================================
+ Hits        45259    45275      +16
+ Misses       4054     4052       -2
can you add an asv & a whatsnew (needed?)
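For readers unfamiliar with asv, a benchmark for this case could look roughly like the sketch below. The class name, method name, and date range are hypothetical and are not taken from this PR; asv calls `setup` before timing any method whose name starts with `time_`. The explicit `dtype` is only there to keep the example quiet on newer pandas versions.

```python
import pandas as pd


class SeriesIndexOnlyConstructor:
    """Hypothetical asv-style benchmark; names are illustrative only."""

    def setup(self):
        # Build a reasonably large index once, outside the timed section.
        self.idx = pd.date_range(start="2015-10-26", end="2016-01-01", freq="50s")

    def time_constructor_index_only(self):
        # The operation under test: no data, only an index.
        pd.Series(index=self.idx, dtype="float64")
```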
pandas/core/series.py
Outdated
@@ -207,10 +211,20 @@ def __init__(self, data=None, index=None, dtype=None, name=None,
                 else:
                     data = data.reindex(index, copy=copy)
                 data = data._data
-            elif isinstance(data, dict):
+            elif isinstance(data, dict) and (len(data) or index is None):
rather than doing this here, why not inside _init_dict itself? (you can simply return early if these conditions are met). you can also return the type, copy parameter as well. ideally like to locate all of a single 'type' code in the same place (dict here)
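The early-return idea can be sketched in plain Python. `init_from_dict` below is a hypothetical stand-in, not pandas' actual `_init_dict`, and it returns plain lists rather than a BlockManager and index; it only shows the shape of the suggestion: when there is no data but an index was given, fill with missing values directly and skip the build-then-reindex path.

```python
import math


def init_from_dict(data, index=None):
    """Hypothetical helper illustrating the early return; not pandas code."""
    # Early return: nothing to align, so fill directly instead of reindexing.
    if not data and index is not None:
        labels = list(index)
        return [math.nan] * len(labels), labels

    # General path: use the dict's own labels unless an index was requested.
    keys = list(data) if data else []
    if index is None:
        return [data[k] for k in keys], keys

    # Align the dict to the requested labels, filling gaps with NaN.
    labels = list(index)
    return [data.get(k, math.nan) for k in labels], labels
```

Keeping both branches inside one helper is what keeps all of the dict handling in a single place, as the comment asks.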
I agree - in particular, _init_dict already checks whether data is empty
Good call on the _init_dict. For some reason I thought it had to be in the main Series.__init__, since _init_dict returns a BlockManager & index, but this seems to work.
Shouldn't need a whatsnew since #18496 was in 0.23. I think that was the original slowdown. And this was caught by the
Does d8b1312 change your opinion on putting this stuff in
Series(index=['b', 'a']) Since the user input isn't actually dict-like, it's just going down _init_dict.
I might be wrong (can't test now, sorry), but a simpler solution could maybe be to replace Line 180 in 4afc756 with
if data in [None, {}]:
    data = []
?
That won't quite work since `data` may not be hashable, but I like the thought. Will try a variant.
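To make the concern with `data in [None, {}]` concrete: list membership compares with `==` rather than hashing, so the check works for unhashable inputs such as a dict, but it can still fail for array-like data whose `==` is elementwise. The sketch below uses a hypothetical stand-in class instead of a real array, on the assumption that it mimics how recent NumPy arrays with more than one element behave.

```python
class _AmbiguousTruth:
    """Result of an elementwise comparison; has no single truth value."""

    def __bool__(self):
        raise ValueError("truth value of an elementwise comparison is ambiguous")


class ElementwiseEq:
    """Hypothetical stand-in for an array-like object (not a real ndarray).

    Defining __eq__ without __hash__ also makes instances unhashable.
    """

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        return _AmbiguousTruth()


def is_empty_marker(data):
    # The membership test proposed in the thread.
    return data in [None, {}]


none_result = is_empty_marker(None)        # matches None
empty_dict_result = is_empty_marker({})    # dicts are unhashable, still fine
full_dict_result = is_empty_marker({"a": 1})
list_result = is_empty_marker([1, 2])

try:
    is_empty_marker(ElementwiseEq([1, 2]))
    array_like_raised = False
except ValueError:
    # bool() of the comparison result fails inside the membership test.
    array_like_raised = True
```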
Ah, no, doesn't quite work since the list isn't broadcast to the length of
index. We raise on Series([], index=[0]) since they're different lengths.
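The difference between the two spellings can be checked directly. This is a small sketch against a recent pandas; the explicit `dtype` is an addition here, only to avoid version-dependent defaults and warnings for empty data, and the exact error message varies between versions, so only the exception type is relied on.

```python
import pandas as pd

# Index-only construction: every label gets a missing value.
index_only = pd.Series(index=["b", "a"], dtype="float64")

# An explicit empty list is not broadcast to the length of the index.
try:
    pd.Series([], index=[0], dtype="float64")
    length_mismatch_raised = False
except ValueError:
    length_mismatch_raised = True
```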
Right! I thought at least replacing
Merging in 2 hours if there are no further comments. I think we're just choosing between the two least-bad outcomes here.
let me look again
lgtm. thanks (I believe you said this was after 0.22 so no whatsnew needed)
Right, slowdown was only on master.
From #18496
Special cases empty series construction, since the reindex is not necessary.
xref #18532 (comment)
Setup
benchmark
HEAD:
0.21.0
Master