Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a .drop_attrs method #8258

Merged
merged 25 commits into from
Jul 11, 2024
Merged

Add a .drop_attrs method #8258

merged 25 commits into from
Jul 11, 2024

Conversation

max-sixty
Copy link
Collaborator

@max-sixty max-sixty commented Sep 30, 2023

Part of #3891

Do we think this is a good idea? I'll add docs & tests if so...

Ready to go, just needs agreement on whether it's good

@headtr1ck
Copy link
Collaborator

I think it's a good idea.

But should probably have the same arguments and workings as drop_vars and others.

If you want to control the behavior of dropping attrs from variables or only dataset level attrs is another question.

@max-sixty
Copy link
Collaborator Author

If you want to control the behavior of dropping attrs from variables or only dataset level attrs is another question.

+1

What are variable-level attrs generally used for? For semantic information ("collected on 2023-09-30") or encoding-like information ("encoded as unicode")?

@max-sixty
Copy link
Collaborator Author

But should probably have the same arguments and workings as drop_vars and others.

Yes, I was thinking about it a bit more like .reset_encoding#8259. We could definitely allow passing attrs to drop. I'm not sure what we should do with the errors term, given attrs can be in multiple places...

@max-sixty
Copy link
Collaborator Author

Hi team — what do we think about this? @pydata/xarray

xarray/core/dataarray.py Outdated Show resolved Hide resolved

# Remove attributes from each variable in the dataset
for var in self.variables:
self[var].attrs = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding this should wipe the attrs of the index_vars of the original object?

xarray/xarray/core/dataset.py

Lines 1361 to 1365 in e8be4bb

for k, v in self._variables.items():
if k in index_vars:
variables[k] = index_vars[k]
else:
variables[k] = v._copy(deep=deep, data=data.get(k), memo=memo)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atm an Index can't have attrs, at least on main testing ds.indexes['y'].attrs fails.

I added an explicit test for the coords. Lmk if you have anything else we could do for the index...

Copy link
Member

@benbovy benbovy Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the invariant checks will pass when testing this on a Dataset with a multi-index.

It might be problematic to copy coordinates variables individually when they have a common index and especially when they all wrap exactly the same index object in their ._data attribute (like in the case of a PandasMultiIndex). A safer approach would be to handle the indexed variables with something like this:

new_variables = {}

for idx, idx_vars in self.xindexes.group_by_index():
    # copy each coordinate variable of an index and drop their attrs
    temp_variables = {k: v.copy() for k, v in idx_vars.items()}
    for v in temp_variables.values():
        v.attrs = {}
    # maybe re-wrap the index object in new coordinate variables
    new_variables.update(idx.create_variables(temp_variables))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done. I don't think I've done it elegantly, but I'm a bit out of my depth and so won't try and solve perfectly on this pass...

@max-sixty
Copy link
Collaborator Author

Thanks for the review @crusaderky .

I realize it's much easier to review when there are tests; I added those

@shoyer
Copy link
Member

shoyer commented Oct 8, 2023

My main concern is the meaning of "drop attrs" could be confusing:

  • does it refer to dropping specific attrs (like the other drop methods) or all attrs? Potentially this could be controlled by an optional argument. I don't think ds.drop_attrs() would be confusing.
  • does it refer to attrs on the top level object or also attrs on coordinates/data variables? The former is what I would expect but some users might want the later, too.

Overall I think this is probably a good idea, though. Certainly I have written this sort of thing many times!

@max-sixty
Copy link
Collaborator Author

max-sixty commented Oct 8, 2023

My main concern is the meaning of "drop attrs" could be confusing:

I'm very open-minded here! My main motivation is to put less weight on keep_attrs parameter throughout the API.

What's a good way of deciding? We could do what keep_attrs does? Or add a deep param?

Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add these methods to api

xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
@max-sixty
Copy link
Collaborator Author

I've now added a deep kwarg, which controls whether it clears the attrs of all variables.

I've provisionally set it as deep=True. This is more aggressive than keep_attrs. What do we think?

@max-sixty
Copy link
Collaborator Author

@pydata/xarray I realize this is still hanging!

What's the best way of resolving? My guess is that there's consensus that the method would be useful, but maybe not on how "deep" it should go. Should we pick something and merge?

Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
When you make deep kwarg only, it's easier to add additional arguments later (see other comments)

xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
@max-sixty
Copy link
Collaborator Author

max-sixty commented Jul 11, 2024

Thanks for the ping @headtr1ck !

Any thoughts from others before we merge?

@headtr1ck headtr1ck added the plan to merge Final call for comments label Jul 11, 2024
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's try it!

(i didn't look at the tests)

@max-sixty max-sixty merged commit b8aaa53 into pydata:main Jul 11, 2024
32 of 34 checks passed
@max-sixty max-sixty deleted the drop-attrs branch July 11, 2024 18:57
dcherian added a commit to dcherian/xarray that referenced this pull request Jul 17, 2024
* main:
  Enable pandas type checking (pydata#9213)
  Per-variable specification of boolean parameters in open_dataset (pydata#9218)
  test push
  Added a space to the documentation (pydata#9247)
  Fix typing for test_plot.py (pydata#9234)
  Allow mypy to run in vscode (pydata#9239)
  Revert "Test main push"
  Test main push
  Revert "Update _typing.py"
  Update _typing.py
  Add a `.drop_attrs` method (pydata#8258)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs review plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants