Separate MultiIndex names from levels #27242
7d93c89
to
ab2fdf5
Compare
in MultiIndex, no problem setting the level.name attribute, but if you are in frame.py or reshape.py I would avoid doing this and instead use the name= parameter in ._shallow_copy() or use .rename()? (on a level)
pandas/core/indexes/multi.py
Outdated
@@ -259,6 +259,7 @@ def __new__(
result._set_levels(levels, copy=copy, validate=False)
result._set_codes(codes, copy=copy, validate=False)

result._names = [None for _ in levels]
[None] * len(levels)
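For a list of None placeholders the two spellings give the same result, because None is immutable. A minimal sketch (the `levels` value below is made up for illustration and is not taken from the PR):

```python
# Hypothetical stand-in for the levels passed to MultiIndex.__new__.
levels = [["a", "b"], [1, 2, 3]]

names_comprehension = [None for _ in levels]
names_repeat = [None] * len(levels)

# Both spellings produce one None placeholder per level.
assert names_comprehension == names_repeat == [None, None]

# The shortcut is only safe for immutable fill values: with a mutable
# element, every slot would reference the same object.
shared = [[]] * 2
shared[0].append("x")
print(shared)  # [['x'], ['x']]
```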
pandas/core/reshape/reshape.py
Outdated
@@ -260,10 +260,13 @@ def get_new_values(self):
def get_new_columns(self):
if self.value_columns is None:
if self.lift == 0:
return self.removed_level
lev = self.removed_level._shallow_copy()
why wouldn't you do lev = self.removed_level._shallow_copy(name=self.removed_name)?
_shallow_copy, rename and the other indirect ways of setting .name all call ._set_names, which does a lot of checks. Those checks are not needed in this internal functionality, as the name has already been validated.
Perhaps have a fastpath parameter in _set_names?
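The fastpath idea can be sketched with a toy container. This is not the pandas implementation: the `NamedLevels` class and the `validate` flag are hypothetical and only illustrate how a trusted internal caller could skip checks that a public caller still gets.

```python
from typing import Hashable, List, Optional, Sequence


class NamedLevels:
    """Toy container; NOT the pandas MultiIndex implementation."""

    def __init__(self, nlevels: int) -> None:
        self._names: List[Optional[Hashable]] = [None] * nlevels

    def _set_names(
        self, names: Sequence[Optional[Hashable]], validate: bool = True
    ) -> None:
        # `validate` is a hypothetical fastpath switch for internal callers
        # that have already validated the names.
        if validate:
            if len(names) != len(self._names):
                raise ValueError("Length of names must match number of levels.")
            for name in names:
                hash(name)  # raises TypeError for unhashable names
        self._names = list(names)


idx = NamedLevels(2)
idx._set_names(["x", "y"])                  # checked path
idx._set_names(["x", "z"], validate=False)  # internal path, checks skipped
print(idx._names)  # ['x', 'z']
```

The cost of such a flag is that a wrong internal call is no longer caught, so it only makes sense where the name is known to be valid.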
pandas/core/reshape/reshape.py
Outdated
@@ -658,7 +663,9 @@ def _convert_level_number(level_num, columns):
new_names = this.columns.names[:-1]
new_columns = MultiIndex.from_tuples(unique_groups, names=new_names)
else:
new_columns = unique_groups = this.columns.levels[0]
new_columns = this.columns.levels[0]._shallow_copy()
use name= here
pandas/core/reshape/reshape.py
Outdated
@@ -302,7 +305,9 @@ def get_new_index(self):
lev, lab = self.new_index_levels[0], result_codes[0]
if (lab == -1).any():
lev = lev.insert(len(lev), lev._na_value)
return lev.take(lab)
new_index = lev.take(lab)
I would use .rename()
lev = self.removed_level
return lev.insert(0, lev._na_value)
lev = self.removed_level.insert(0, item=self.removed_level._na_value)
I would use .rename()
@@ -979,7 +979,7 @@ def test_reset_index(self, float_frame):
):
values = lev.take(level_codes)
name = names[i]
tm.assert_index_equal(values, Index(deleveled[name]))
tm.assert_index_equal(values, Index(deleveled[name]), check_names=False)
why is this changed?
lev.take(level_codes) doesn't provide a name any more, while a reset index does provide its Series with a name (as it should).
I've added a test, assert values.name is None, to make this more explicit.
I’ve changed how the name is set. This is a bit slower (many checks that are not needed), but that could be fixed separately in a later PR. |
pandas/tests/test_multilevel.py
Outdated
@@ -1609,12 +1607,12 @@ def test_constructor_with_tz(self):
)

result = MultiIndex.from_arrays([index, columns])
tm.assert_index_equal(result.levels[0], index)
tm.assert_index_equal(result.levels[1], columns)
tm.assert_index_equal(result.levels[0], index, check_names=False)
I now find these tests very confusing that we lose the names on the levels themselves. (I know that's the point of this PR).
maybe add a .set_names(index.name) (for example) and remove the check_names arg (so it's the default of True)
I've made changes so we avoid check_names=False (so it is implicitly True).
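The two options discussed in this review thread are relaxing the comparison with check_names=False or aligning the names first and keeping the strict default. A rough sketch with a toy comparison helper, which is a simplified stand-in and not pandas' tm.assert_index_equal (`ToyIndex` and `assert_toy_index_equal` are hypothetical names):

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyIndex:
    """Simplified stand-in for an Index: values plus an optional name."""

    values: Tuple
    name: Optional[str] = None

    def set_names(self, name: Optional[str]) -> "ToyIndex":
        # Return a renamed copy, leaving the original untouched.
        return replace(self, name=name)


def assert_toy_index_equal(
    left: ToyIndex, right: ToyIndex, check_names: bool = True
) -> None:
    assert left.values == right.values, "values differ"
    if check_names:
        assert left.name == right.name, "names differ"


index = ToyIndex(("a", "b"), name="first")
level = ToyIndex(("a", "b"))  # a level that carries no name of its own

# Option 1: relax the check.
assert_toy_index_equal(level, index, check_names=False)
# Option 2 (suggested in review): align the names, keep the strict default.
assert_toy_index_equal(level.set_names(index.name), index)
```

Option 2 keeps the test strict about everything except the one difference that the change introduces on purpose.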
5bcf204
to
efcfeac
Compare
Is this ok? I'd like to get this in 0.25, as this is a breaking change. I'll add a 0.25 label as a reminder. The rest of #27138 will be non-breaking, so can go in later, if needed. |
I haven’t looked, but we shouldn’t merge breaking changes in the release candidate.
From: Terji Petersen
Sent: Monday, July 8, 2019 6:25 PM
To: pandas-dev/pandas
Cc: Subscribed
Subject: Re: [pandas-dev/pandas] Separate MultiIndex names from levels (#27242)
Is this ok? I'd like to get this in 0.25, as this is a breaking change. I'll add a 0.25 label as a reminder.
The rest of #27138 will be non-breaking, so can go in later, if needed.
|
Yeah, I understand that, but the followups to this PR will, in addition to the benefits mentioned in #27138, also allow some nice simplifications of MultiIndex (by delegating all single-level checks to Categorical), and I assume release of 0.25 will mean a stop to breaking changes for a while, because next up will be 1.0? |
@TomAugspurger I don't believe we have held off on merging even breaking changes to an RC. I don't see this as a big deal and would merge as is. |
I haven't had a chance to look (and won't this week), but if we're merging
API changes in RC0 then we'll need a second RC.
On Tue, Jul 9, 2019 at 3:50 PM Jeff Reback wrote:
@TomAugspurger I don't believe we have
held off on merging even breaking changes to an RC. I don't see this as a
big deal and would merge as is.
|
Ping. |
efcfeac
to
e7b8927
Compare
I've moved the whatsnew section to 1.0. |
@topper-123 lgtm (tiny comment). @TomAugspurger any objections? |
e7b8927
to
c81a26e
Compare
Question on the broad goal of #27138: IIUC, the motivation is for MultiIndex to be backed by a Why can
@property
def levels(self):
    return FrozenList(pd.Index(self._data[i], name=self.names[i]) for i in range(self.nlevels)) |
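The property-based approach in the comment above can be sketched with plain Python stand-ins. `ToyLevel` and `ToyMultiIndex` are hypothetical classes, not pandas objects; the point is only that names live on the container and are attached to fresh level objects on each access.

```python
from typing import List, Optional, Tuple


class ToyLevel:
    """Simplified stand-in for a pandas Index holding one level's values."""

    def __init__(self, values: Tuple, name: Optional[str] = None) -> None:
        self.values = values
        self.name = name


class ToyMultiIndex:
    """Toy model: names live on the container, levels are rebuilt on access."""

    def __init__(self, data: List[Tuple], names: List[Optional[str]]) -> None:
        self._data = data
        self._names = list(names)

    @property
    def nlevels(self) -> int:
        return len(self._data)

    @property
    def levels(self) -> List[ToyLevel]:
        # Attach the current name to a fresh level object on every access.
        return [
            ToyLevel(self._data[i], name=self._names[i])
            for i in range(self.nlevels)
        ]


mi = ToyMultiIndex([(1, 2), ("a", "b")], names=["x", "y"])
print([lev.name for lev in mi.levels])  # ['x', 'y']
```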
@topper-123 do you have thoughts on #27242 (comment)? |
But if someone does
>>> mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])
>>> lev = mi.levels[0]
>>> mi = mi.set_names('z', level=0)
# then
>>> mi.names[0], lev.name
('z', 'x')
So the names will be stored in two places (or users should not store individual levels separately, which they can't be expected to know). So for this reason I think it's most practical to make a clean cut.
EDIT: Ok I got an idea: What if we deprecate
@property
def levels(self) -> FrozenList[Index]:
    warnings.warn(...)
    return FrozenList(pd.Index(lev.categories, name=name) for name, lev in zip(self.names, self._data))
@property
def categories(self) -> FrozenList[Index]:
    return FrozenList(lev.categories for lev in self._data)
This would also make the API for MultiIndex more similar to CategoricalIndex. |
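Both points in the comment above, the stale name on a stored level and the deprecate-and-replace idea, can be illustrated with a toy model. The class below is a simplified stand-in (not pandas code); the warning text and the tuple-based level representation are assumptions made for the sketch.

```python
import warnings
from typing import List, Optional, Tuple


class ToyMultiIndex:
    """Toy model only; attribute names follow the comment above."""

    def __init__(self, data: List[Tuple], names: List[Optional[str]]) -> None:
        self._data = data
        self._names = list(names)

    @property
    def categories(self) -> List[Tuple]:
        # Name-free view of each level's unique values.
        return list(self._data)

    @property
    def levels(self) -> List[Tuple[Optional[str], Tuple]]:
        # Deprecated accessor: still works, but warns and returns a snapshot
        # of (name, values) pairs taken at access time.
        warnings.warn(
            "levels is deprecated; use categories",
            DeprecationWarning,
            stacklevel=2,
        )
        return list(zip(self._names, self._data))


mi = ToyMultiIndex([(1, 2), ("a", "b")], names=["x", "y"])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lev = mi.levels[0]  # snapshot taken while the name is still 'x'

mi._names[0] = "z"  # rename on the container
print(mi._names[0], lev[0])  # z x  (the stored level keeps the stale name)
print(caught[0].category.__name__)  # DeprecationWarning
```

The name-free `categories` accessor cannot go stale in this way, because it never carries a name in the first place.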
Ahh, a new name for |
+1 on @topper-123 new idea. |
On the new name, is I suspect that in the near-term we'll have a |
I'm not set on exact name for this, but would like consistency. So maybe if you make a suggestion on the attribute name for that new array type? I BTW don't know if I like the name DictEncodedArray. Can't it be just EncodedArray instead? Edit: Or is the idea that what is now |
I think `.categories` is fine for now.
It's a bit unfortunate that it's a `List[Categorical]` rather than an Index
like on CategoricalIndex, but that's probably OK.
On Wed, Oct 16, 2019 at 8:44 AM Terji Petersen wrote:
I'm not set on exact name for this, but would like consistency. So maybe
if you make a suggestion on the attribute name for that new array type?
I BTW don't know if I like the name DictEncodedArray. Can't it be just
EncodedArray instead?
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#27242?email_source=notifications&email_token=AAKAOITMBHZEU2367ULJJMDQO4LDFA5CNFSM4H6F75WKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBMRBII#issuecomment-542707873>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKAOISNZBJLIMRV35ZQE6DQO4LDFANCNFSM4H6F75WA>
.
|
Ok, if I can get this PR merged, I will start implementing |
👍 I'm fine with merging this as long as we also do the |
thanks @topper-123 no need to create an issue, you can just ref this PR. |
I made #29032 so that this isn't
dropped.
On Wed, Oct 16, 2019 at 9:41 AM Jeff Reback wrote:
thanks @topper-123
no need to create an issue, you can just ref this PR.
|
This is breaking pyarrow (https://issues.apache.org/jira/browse/ARROW-6922). The above is a very long discussion (which I didn't really follow before, sorry for that), but what I somewhat understand is that for 1.0 we want to restore the |
If the idea is to deprecate the Short-term, can we add back the names to |
Right. Restoring getting is relatively straightforward. I can put a PR up with that later today. |
xref https://issues.apache.org/jira/browse/ARROW-6922 / pandas-dev#27242 (comment) / pandas-dev#29032 No docs yet, since it isn't clear how this will eventually sort out. But we at least want to preserve this behavior for 1.0
* API: Restore getting name from MultiIndex level xref https://issues.apache.org/jira/browse/ARROW-6922 / #27242 (comment) / #29032 No docs yet, since it isn't clear how this will eventually sort out. But we at least want to preserve this behavior for 1.0 * fixups
git diff upstream/master -u -- "*.py" | flake8 --diff
In #27138 I proposed doing some changes to `MultiIndex`, so that the index type can have its data collected in `_data` as type `List[Categorical]`, plus adding `MultiIndex.arrays` in order to access each full level as zero-copy `Categorical`.
This is the first part of that proposal, and drops setting the names on the `levels[x].name` attribute and instead sets the names on the `MultiIndex._names` attribute.
This PR is a minorly backward-breaking change (so would be good to get into 0.25), while the followup will not break anything.
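A rough sketch of the data layout described above, using plain Python stand-ins rather than pandas objects. `ToyCategorical` and `ToyMultiIndex` are hypothetical classes written for illustration; only the attribute names `_data`, `_names` and `arrays` come from the description.

```python
from typing import List, Optional, Sequence


class ToyCategorical:
    """Stand-in for Categorical: unique categories plus integer codes."""

    def __init__(self, values: Sequence) -> None:
        self.categories = sorted(set(values))
        self.codes = [self.categories.index(v) for v in values]


class ToyMultiIndex:
    def __init__(
        self, arrays: Sequence[Sequence], names: Sequence[Optional[str]]
    ) -> None:
        # One categorical per level; the names are kept only on the container.
        self._data: List[ToyCategorical] = [ToyCategorical(a) for a in arrays]
        self._names: List[Optional[str]] = list(names)

    @property
    def arrays(self) -> List[ToyCategorical]:
        # Hands out the stored objects directly, i.e. without copying.
        return self._data

    @property
    def names(self) -> List[Optional[str]]:
        return list(self._names)


mi = ToyMultiIndex([[1, 1, 2], ["a", "b", "a"]], names=["x", "y"])
print(mi.names)                        # ['x', 'y']
print(mi.arrays[1].codes)              # [0, 1, 0]
print(hasattr(mi.arrays[0], "name"))   # False
```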