-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: Store metadata in an attrs dict #29062
Conversation
This aids in the implementation of pandas-dev#28394. Over there, I'm having issues with using `NDFrame.__finalize__` to copy attributes, in part because getattribute on NDFrame is so complicated. This simplifies this because we only need to look in NDFrame.attrs, which is just a plain dictionary. Aside from the addition of a public NDFrame.attrs dictionary, there aren't any user-facing API changes.
return dict( | ||
_data=self._data, | ||
_typ=self._typ, | ||
_metadata=self._metadata, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are attrs linked to _metadata in any way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet, though perhaps eventually if we can find a suitable way to deprecate _metadata
for subclasses. With this PR, pandas doesn't use _metadata
itself anymore (previously, we just used it for Series.name
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, then we should deprecate _metadata right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely? I have no idea who is using it or what they're using it for though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok can you open an issue to deprecate that
return dict( | ||
_data=self._data, | ||
_typ=self._typ, | ||
_metadata=self._metadata, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, then we should deprecate _metadata right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
return dict( | ||
_data=self._data, | ||
_typ=self._typ, | ||
_metadata=self._metadata, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok can you open an issue to deprecate that
@@ -5213,6 +5245,9 @@ def __finalize__(self, other, method=None, **kwargs): | |||
|
|||
""" | |||
if isinstance(other, NDFrame): | |||
for name in other.attrs: | |||
self.attrs[name] = other.attrs[name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.attrs.update(other) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd work for now, but we'll need a for loop in my other PR since we'll have per-attr logic to resolve these.
do you want to add docs anywhere / whatsnew? or just keep private mention for now? |
I pushed some basics API docs. Will want a larger section once things settle I think. |
thanks @TomAugspurger very nice! |
In xarray, we currently use duck-typing to check for My bigger concern is that I noticed that |
From a quick test, it seems this deletes the name ( But you can indeed set it to a non-hashable value that way, which doesn't directly raise an error, but can confusingly give errors in later operations:
|
Do we need to further discuss @shoyer's concern? I was planning to try this out in GeoPandas (to see if it can replace our usage of |
Sorry I missed @shoyer's comment. I'll take a look. |
This aligns with xarray and h5py: pandas-dev#29062 (comment)
* API: Store name outside attrs This aligns with xarray and h5py: #29062 (comment)
Hi, not sure this is the best issue for my comments below, or too early, but I've been wanting for a while to store metadata along with Series or Dataframes. I've tried the subclassing approach, with In [17]: s = pd.Series([0,1,2], name="x")
In [18]: s.attrs["mydata"] = "test"
In [19]: s.attrs
Out[19]: {'mydata': 'test'}
In [20]: df = s.to_frame()
In [21]: df.x.attrs
Out[21]: {} Is this supposed to work? Or do I misinterpret the purpose of Edit: |
Pandas will need to call finalize in more places to ensure the metadata is propagated. |
@buhrmann can you open a new issue with this specific use case? |
Is it appearing likely that |
This aids in the implementation of
#28394. Over there, I'm having
issues with using
NDFrame.__finalize__
to copy attributes, in partbecause getattribute on NDFrame is so complicated.
This simplifies that PR because we only need to look in NDFrame.attrs (name borrowed from xarray),
which is just a plain dictionary.
Aside from the addition of a public NDFrame.attrs dictionary, there
aren't any user-facing API changes.