Keep the original ordering of the coordinates #4409
Could we keep them in sorted order like a normal |
thanks for the hint, sorting in Edit: that breaks two of the |
Though I was thinking we wouldn't sort it; we'd take the order as given
I reckon it's fine, it's such a corner case, and it'll only affect these doctests |
Can you give an example of a non-deterministic coordinate order? That sounds surprising to me, given that on Python 3.6+ dictionaries preserve insertion order. My preference would be not to sort mappings automatically, either in It's true that this is only a guarantee on Python 3.7+, but both CPython 3.6 and all versions of pypy 3 preserve dict insertion order, so in practice we can pretty much always guarantee this. (And soon, Python 3.7 will be required for xarray.) |
sure, just try running
Not sure if that's evidence against this (maybe we need to rearrange the |
Here's another example that yields non-deterministic coordinate order, which propagates into a plot title when selection is done on the coordinates. When I run the code below, the title is sometimes This is in a new conda environment that I created using the command I think the non-determinism is coming from the command
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
xarray: 0.16.0 |
I think that's still deterministic (although I agree that the change in the title is annoying): if you run your script multiple times, it will always return the same output. The issue I'm trying to fix here is much worse: the output of the same code will return different results with each run. Edit: but maybe I just have a messed up environment |
I disagree that this is deterministic. If I run the script multiple times, the plot title varies, and I consider the plot title part of the output. I have jupyter notebooks that create figures and use this code idiom. If I refactor code of mine that is used by these notebooks, I would like to rerun the notebooks to confirm that the notebook results don't change. Having the plot titles change at random complicates this comparison. I think sorting the coordinates would avoid this difficulty that I encounter. |
yes, sorry, you're right, that is indeed non-deterministic, and exactly the problem I'm having with the doctests. Interestingly, this seems to be deterministic per python session: running
import xarray as xr
ds = xr.Dataset(
{"a": (("x", "y"), [[0, 1], [2, 3]])}, coords={"x": ["a", "b"], "y": [0, 1]}
)
print(ds)
print(ds + 1)
print(ds)
print(ds + 1)
will print the same result for |
The plotting thing could be fixed relatively easily here: xarray/xarray/core/dataarray.py Lines 2805 to 2832 in 23dc2fc
|
OK, my guess is that this is happening because there is someplace where we iterate over a Python |
unfortunately, that's pretty difficult: the set in question is |
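A possible explanation for the "deterministic per python session" observation, assuming the order really does come from iterating over a set of str keys: CPython randomizes str hashes once per process unless PYTHONHASHSEED is fixed, so set iteration order is stable within a session but can change between sessions. The sketch below does not use xarray; the key names and the helper set_order are made up for illustration.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a set of str keys.
# The keys are arbitrary placeholders, not taken from xarray.
CHILD = "print(list({'x', 'y', 'a', 'b'}))"


def set_order(seed: str) -> str:
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Same seed: the order is reproducible across processes.
same_seed_stable = set_order("1") == set_order("1")

# Different seeds: the order may (but need not) differ.
orders = {set_order(str(seed)) for seed in range(10)}

print(same_seed_stable, len(orders))
```

If this is indeed the cause, running the doctests with a fixed PYTHONHASHSEED would hide the problem rather than fix it, which is why changing the iteration itself is the more robust route.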
One way to fix this is to iterate over variables instead of
for k in self._coord_names:
    ...
use:
for k in self._variables:
    if k in self._coord_names:
        ...
I believe we already use this trick in a few places for exactly this reason. |
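A minimal, self-contained sketch of that trick, assuming only that _variables is a dict (which preserves insertion order) and _coord_names is a set. TinyDataset and coords_in_order are hypothetical stand-ins for illustration, not xarray's actual classes or methods.

```python
class TinyDataset:
    """Hypothetical stand-in for illustration; not xarray's Dataset."""

    def __init__(self, variables, coord_names):
        self._variables = dict(variables)      # dict: keeps insertion order
        self._coord_names = set(coord_names)   # set: no guaranteed order

    def coords_in_order(self):
        # Iterate over the ordered dict and use the set only for membership
        # tests, so the result never depends on set iteration order.
        return [k for k in self._variables if k in self._coord_names]


ds = TinyDataset(
    variables={"a": [[0, 1], [2, 3]], "x": ["a", "b"], "y": [0, 1]},
    coord_names={"y", "x"},
)
print(ds.coords_in_order())  # ['x', 'y']
```

The set is still useful for fast lookups; the only change is which container drives the loop.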
thanks, that almost fixed the doctests. However, |
What do you mean by "mix up"? Also see #2811 |
"mix up" in this case means that the order is random (i.e. a |
the tests pass so this should be ready for review |
Thanks @keewis
Co-authored-by: Deepak Cherian <[email protected]>
🚀
@kmuehlbauer, I modified Edit: at least #4072 does not seem to be fixed yet. |
@keewis I'm traveling currently, will check on Monday and come back to you. |
thanks for checking! |
should we merge this before releasing 0.16.1? |
LGTM 👍 |
In #4408, the formatting of coords turned out to be non-deterministic. This sorts the data variables, coordinates and dimensions without coordinates sections as well as the dimensions summary of Dataset objects.
No tests, yet, because I'm not quite sure sorting using the str representation of the key is the best way to make the formatting deterministic (so I'd appreciate reviews).
It might also be good to document somewhere that these sections are now sorted, and that it only makes sense to look at the order in the dimension summary of DataArray and Variable objects.
isort . && black . && mypy . && flake8
whats-new.rst