Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Raise when setting name via level #30574

Merged
merged 4 commits into from
Jan 3, 2020

Conversation

TomAugspurger
Copy link
Contributor

Closes #29032

This is extremely ugly, but gets the job done. It's probably worth doing to avoid silently failing, but not sure.

cc @topper-123.

@TomAugspurger TomAugspurger added this to the 1.0 milestone Dec 30, 2019
@@ -234,6 +234,7 @@ def _simple_new(cls, array, name, closed=None):
result = IntervalMixin.__new__(cls)
result._data = array
result.name = name
result._no_setting_name = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be set to False in each _simple_new, or can it just use the class-level default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it’s just on the class, won’t it be shared amongst all instances? I think I should change the type declaration to not have a default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it’s just on the class, won’t it be shared amongst all instances?

I think the relevant thing is: when you set level._no_setting_name = True in the one place where you do that, will that update the class attribute, and the answer should be no.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, level._no_setting_name = True adds an instance attribute, so there shouldn't happen anything on the class attribute.

def test_setting_names_from_levels_raises():
idx = pd.MultiIndex.from_product([["a"], [1, 2]], names=["a", "b"])
with pytest.raises(RuntimeError, match="set_names"):
idx.levels[0].name = "foo"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i could do something weird like:

lvl = idx.levels[0]

ser = pd.Series(lvl, index=lvl)
ser.index.name="foo"

is there a less-contrived way in which idx.levels[0] could come un-attached from the multiindex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That raises (correctly, I think). I can add a test for it if you like.

Copy link
Contributor

@topper-123 topper-123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can avoid setting this in subclasses (other than MultiIndex), it would simplify quite a bit.

This will be removed later, when we feel confident that people have switched idiom, right? It's in general not possible to prevent people setting attributes on child objects (e.g. in CategoricalIndex, people can already do cidx.categories.name = "name") and it just adds complexity trying to prevent it, IMO.

@@ -520,6 +524,7 @@ def _simple_new(cls, values, name=None, dtype=None):
# we actually set this value too.
result._index_data = values
result._name = name
result._no_setting_name = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If _no_setting_name = False in the class declaration, isn't it unnecessary here? In both cases the attribute is set to False.

@@ -1214,6 +1219,12 @@ def name(self):

@name.setter
def name(self, value):
if self._no_setting_name:
# Used in MultiIndex.levels.
raise RuntimeError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeError instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a RuntimeError is best, since we don't check the type or valuje of value checked here.

@TomAugspurger
Copy link
Contributor Author

I think it has to be on all index subclasses since they make up the levels of a MultiIndex.

@topper-123
Copy link
Contributor

@TomAugspurger , but if it's set on Index all index subclasses inherit the attribute?

@TomAugspurger
Copy link
Contributor Author

Right, since it’s the level of the MI that can’t have its name set, not the MI itself.

@TomAugspurger
Copy link
Contributor Author

My apologies, I've simplified things a bit. I misunderstood what happened to class attributes when the attribute is set on an instance.

@TomAugspurger
Copy link
Contributor Author

This will be removed later, when we feel confident that people have switched idiom, right?

I'm not sure. We can weigh the maintenance cost down the road and remove it if it's burdensome.

@jbrockmendel
Copy link
Member

I'm fine with this as-is.

Spitballing another solution: we could have Index._names be an ndarray for all Index subclasses, with

@property
def name(self):
    return self._names[0]     

@name.setter
def name(self, value):
    if np.is_view_or_whatever(self._names):
        raise RuntimeError(...)
    self._names[0] = value

@jreback
Copy link
Contributor

jreback commented Jan 1, 2020

this looks fine, butI would consider something slightly different. We could have an attribute defined on Index._parent = None

then set a weakref when it is contained in a MultiIndex. This gives us provenance on where things are coming from. might be useful.

@TomAugspurger
Copy link
Contributor Author

Mmm, that's also reasonable. My initial concern is that indexes are potentially shared, so they could have multiple parents? For this purpose it's not a problem since we're creating the Index from scratch, just something to keep in mind going forward.

@jreback
Copy link
Contributor

jreback commented Jan 2, 2020

Mmm, that's also reasonable. My initial concern is that indexes are potentially shared, so they could have multiple parents? For this purpose it's not a problem since we're creating the Index from scratch, just something to keep in mind going forward.

I don’t think we share for a contained Index inside a MultiIndex but not really sure

@TomAugspurger
Copy link
Contributor Author

You're right, these levels of a MI are created on the fly.

In theory, a user-action could end up with an Index that's created from the level of a MI

s = pd.Series(index=idx.levels[0])

but that probably doesn't affect your proposed "parent" reference.

That implementation is going to be more complex though, since subclasses don't call Index._simple_new. Thoughts on waiting till we need it?

@jreback
Copy link
Contributor

jreback commented Jan 2, 2020

yeah was just an idea i think what u did here is ok
let me have another glance

@jreback jreback merged commit 7b35099 into pandas-dev:master Jan 3, 2020
@jreback
Copy link
Contributor

jreback commented Jan 3, 2020

thanks @TomAugspurger

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deprecate getting the name from MultiIndex.levels
4 participants