Opt out of auto creating index variables #8711
This currently doesn't work - when I run
FAILED xarray/tests/test_coordinates.py::TestCoordinates::test_init_default_index - AssertionError: {'x': <class 'xarray.core.variable.Variable'>}
FAILED xarray/tests/test_coordinates.py::TestCoordinates::test_getitem - AssertionError: {'x': <class 'xarray.core.variable.Variable'>}
FAILED xarray/tests/test_coordinates.py::TestCoordinates::test_assign - AssertionError: {'x': <class 'xarray.core.variable.Variable'>, 'y': <class 'xarray.core.variable.IndexVariable'>}
FAILED xarray/tests/test_coordinates.py::TestCoordinates::test_align - ValueError: cannot reindex or align along dimension 'x' because of conflicting dimension sizes: {2, 3}
The first couple of errors seem to be because the 1D variable doesn't have a corresponding index created by default. I noticed that the local variable |
* ``Coordinates.__init__`` create default indexes ... for any input dimension coordinate, if ``indexes=None``. Also, if another ``Coordinates`` object is passed, extract its indexes and raise if ``indexes`` is not None (no align/merge supported here).
* add docstring examples
* fix doctests
* fix tests
* update what's new
after unintentionally reverted a valid previous change.
About the |
    if auto_convert:
        if name is not None and name in obj.dims and obj.ndim == 1:
            # automatically convert the Variable into an Index
            emit_user_level_warning(
Maybe we should remove this warning since we're not exactly sure yet what the future behavior will be?
Or at least let's make sure this warning won't be emitted in user code unless explicitly calling `as_variable(..., auto_convert=True)`, i.e., in xarray internals all new variables should be converted into index variables explicitly (if needed).
Okay, so I tried to ensure that this warning won't be emitted in user code simply by replacing the warning with an exception and running all the tests to see where it raised. The only places it raised were:
- In `core.dataarray._check_data_shape`, which is a place where creating an index is unnecessary, so I fixed it in 6bbcc8a,
- In the `TestVariable.test_as_variable` test itself, which seems like it actually should raise the warning?

Is that sufficient?
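For readers who want to repeat the "replace the warning with an exception" check without editing library code, here is a minimal sketch using only the standard `warnings` module. `convert_with_warning` and `find_warning_sites` are hypothetical stand-ins written for this illustration; they are not xarray functions, and the real `as_variable` logic is only loosely imitated.

```python
import warnings


def convert_with_warning(name, dims, auto_convert=True):
    """Hypothetical stand-in for a function that warns on implicit conversion."""
    if auto_convert and name is not None and dims == (name,):
        warnings.warn(
            f"variable {name!r} is being converted implicitly",
            FutureWarning,
            stacklevel=2,
        )
        return "index-variable"
    return "variable"


def find_warning_sites(calls):
    """Run each (label, callable) pair and collect the labels that warn."""
    offenders = []
    for label, func in calls:
        with warnings.catch_warnings():
            # Promote the warning to an exception so the emitting call fails loudly.
            warnings.simplefilter("error", FutureWarning)
            try:
                func()
            except FutureWarning:
                offenders.append(label)
    return offenders


calls = [
    ("implicit", lambda: convert_with_warning("x", ("x",))),
    ("opt-out", lambda: convert_with_warning("x", ("x",), auto_convert=False)),
    ("2d", lambda: convert_with_warning("x", ("x", "y"))),
]
offenders = find_warning_sites(calls)
print(offenders)
```

In a pytest run, passing `-W error::FutureWarning` on the command line has a similar effect across the whole suite, assuming the warning in question is a `FutureWarning`.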
Yes that may be enough for now.
Thanks @benbovy !
I mean I don't think that's a huge deal, but if you definitely wanted to hide it from users couldn't you make a private |
Why is |
More context in #1303
Me neither (I wanted to see what others think). @TomNicholas have you tried running your notebook mentioned in #8704 (comment) with this PR? Does it fix the errors raised?
Yes I did, and I think it does! I still can't do a full |
@benbovy I would really like to get this merged as I'm using it in a new package (see zarr-developers/VirtualiZarr#42), but I'm not really sure what the steps to make this PR ready for merging are. Does this need any new tests?
@TomNicholas I think that this PR is almost good to go in.
We just need to ensure that unnecessary warnings won't get emitted all over the place (see #8711 (comment)).
Also a test for the skipped creation of index variables would be welcome (see my comment below).
    var = as_variable(data, name=name, auto_convert=False)
    if var.dims == (name,) and indexes is None:
This place is (I think) the only place where a newly created "dimension coordinate" might not always be an `IndexVariable` (i.e., when an empty dictionary or any other value than `None` is passed as the `indexes` argument). It would be nice to add a test for this specific case.
The other refactored `as_variable` places in this PR always create an `IndexVariable` for a dimension coordinate so I think it is covered by the existing tests (invariant checks).
I've read this comment many times over, and I don't understand why this test isn't already testing the case you're asking about (i.e. passing `indexes={}` to the `Coordinates` constructor, hence preventing an `IndexVariable` being automatically created).
We might want to add a check in that specific test to make sure that the created "x" variable is not coerced to an `IndexVariable` (before this PR it was still the case even though no `xr.indexes.PandasIndex` was created from it).
Okay, I added both an explicit check and a check for whether that warning is raised.
Can we catch these warnings if they are expected?
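One way to both assert that an expected warning is raised and confirm that the opt-out path stays silent is to record warnings with the standard library. This is only a sketch: `legacy_convert` and `emitted_warnings` are hypothetical names invented for the example, not part of xarray or its test suite.

```python
import warnings


def legacy_convert(auto_convert=True):
    """Hypothetical function that warns only on the implicit path."""
    if auto_convert:
        warnings.warn("implicit conversion", FutureWarning, stacklevel=2)
    return auto_convert


def emitted_warnings(func, *args, **kwargs):
    """Call func and return the categories of the warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones a default filter would suppress.
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return [w.category for w in caught]


expected = emitted_warnings(legacy_convert, auto_convert=True)
silent = emitted_warnings(legacy_convert, auto_convert=False)
print(expected, silent)
```

Inside a pytest suite, `pytest.warns(FutureWarning)` used as a context manager covers the "expected warning" half of this more concisely.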
This should be done now!
Done
LGTM, Thanks @TomNicholas!
* main: (26 commits) [pre-commit.ci] pre-commit autoupdate (pydata#8900) Bump the actions group with 1 update (pydata#8896) New empty whatsnew entry (pydata#8899) Update reference to 'Weighted quantile estimators' (pydata#8898) 2024.03.0: Add whats-new (pydata#8891) Add typing to test_groupby.py (pydata#8890) Avoid in-place multiplication of a large value to an array with small integer dtype (pydata#8867) Check for aligned chunks when writing to existing variables (pydata#8459) Add dt.date to plottable types (pydata#8873) Optimize writes to existing Zarr stores. (pydata#8875) Allow multidimensional variable with same name as dim when constructing dataset via coords (pydata#8886) Don't allow overwriting indexes with region writes (pydata#8877) Migrate datatree.py module into xarray.core. (pydata#8789) warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend (pydata#8874) groupby: Dispatch quantile to flox. (pydata#8720) Opt out of auto creating index variables (pydata#8711) Update docs on view / copies (pydata#8744) Handle .oindex and .vindex for the PandasMultiIndexingAdapter and PandasIndexingAdapter (pydata#8869) numpy 2.0 copy-keyword and trapz vs trapezoid (pydata#8865) upstream-dev CI: Fix interp and cumtrapz (pydata#8861) ...
Tries fixing #8704 by cherry-picking from #8124 as @benbovy suggested in #8704 (comment)
whats-new.rst
New functions/methods are listed in api.rst