Add a simple nbytes representation in DataArrays and Dataset repr #8702

Merged

etienneschalk
Contributor

@etienneschalk etienneschalk commented Feb 4, 2024

Edit: contrary to what the title suggests, this is not an opt-in feature; it is enabled by default.

  • Closes Add nbytes to repr? #8690
    • (or at least is a proposal)
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

@etienneschalk etienneschalk marked this pull request as ready for review February 4, 2024 16:39
@dcherian
Contributor

dcherian commented Feb 4, 2024

Thanks @etienneschalk . Very nice! However I don't think we need yet another global option for the repr. This takes up very little extra space so we can just show it always.

@etienneschalk
Contributor Author

Thanks @etienneschalk . Very nice! However I don't think we need yet another global option for the repr. This takes up very little extra space so we can just show it always.

Hello,
Thanks for the review!
I originally added the feature flag to avoid changing existing tests, and so that the feature would be opt-in. After enabling it by default, I noticed that many tests and doctests fail due to the change in repr (searching the code for the string <xarray. yields 800+ results).
Before making this many changes, maybe the new representation should be discussed again, consolidated, and validated by others, so that these changes don't have to be made twice.

@etienneschalk etienneschalk force-pushed the feature/eschalk/issue-8690-nbytes-repr branch from c1dab61 to 859daea Compare February 5, 2024 22:48
@etienneschalk
Contributor Author

etienneschalk commented Feb 5, 2024

Procedure used:

pytest --doctest-modules xarray/backends/api.py --accept
pytest --doctest-modules xarray/backends/common.py --accept
pytest --doctest-modules xarray/backends/file_manager.py --accept
pytest --doctest-modules xarray/backends/h5netcdf_.py --accept
pytest --doctest-modules xarray/backends/locks.py --accept
pytest --doctest-modules xarray/backends/lru_cache.py --accept
pytest --doctest-modules xarray/backends/memory.py --accept
pytest --doctest-modules xarray/backends/netcdf3.py --accept
pytest --doctest-modules xarray/backends/netCDF4_.py --accept
pytest --doctest-modules xarray/backends/plugins.py --accept
pytest --doctest-modules xarray/backends/pydap_.py --accept
pytest --doctest-modules xarray/backends/pynio_.py --accept
pytest --doctest-modules xarray/backends/scipy_.py --accept
pytest --doctest-modules xarray/backends/store.py --accept
pytest --doctest-modules xarray/backends/zarr.py --accept
pytest --doctest-modules xarray/coding/calendar_ops.py --accept
pytest --doctest-modules xarray/coding/cftimeindex.py --accept
pytest --doctest-modules xarray/coding/cftime_offsets.py --accept
pytest --doctest-modules xarray/coding/frequencies.py --accept
pytest --doctest-modules xarray/coding/strings.py --accept
pytest --doctest-modules xarray/coding/times.py --accept
pytest --doctest-modules xarray/coding/variables.py --accept
pytest --doctest-modules xarray/conventions.py --accept
pytest --doctest-modules xarray/convert.py --accept
pytest --doctest-modules xarray/core/accessor_dt.py --accept
pytest --doctest-modules xarray/core/accessor_str.py --accept
pytest --doctest-modules xarray/core/_aggregations.py --accept
pytest --doctest-modules xarray/core/alignment.py --accept
pytest --doctest-modules xarray/core/arithmetic.py --accept
pytest --doctest-modules xarray/core/combine.py --accept
pytest --doctest-modules xarray/core/common.py --accept
pytest --doctest-modules xarray/core/computation.py --accept
pytest --doctest-modules xarray/core/concat.py --accept
pytest --doctest-modules xarray/core/coordinates.py --accept
pytest --doctest-modules xarray/core/dask_array_ops.py --accept
pytest --doctest-modules xarray/core/daskmanager.py --accept
pytest --doctest-modules xarray/core/dataarray.py --accept
pytest --doctest-modules xarray/core/dataset.py --accept
pytest --doctest-modules xarray/core/dtypes.py --accept
pytest --doctest-modules xarray/core/duck_array_ops.py --accept
pytest --doctest-modules xarray/core/extensions.py --accept
pytest --doctest-modules xarray/core/formatting_html.py --accept
pytest --doctest-modules xarray/core/formatting.py --accept
pytest --doctest-modules xarray/core/groupby.py --accept
pytest --doctest-modules xarray/core/indexes.py --accept
pytest --doctest-modules xarray/core/indexing.py --accept
pytest --doctest-modules xarray/core/merge.py --accept
pytest --doctest-modules xarray/core/missing.py --accept
pytest --doctest-modules xarray/core/nanops.py --accept
pytest --doctest-modules xarray/core/npcompat.py --accept
pytest --doctest-modules xarray/core/nputils.py --accept
pytest --doctest-modules xarray/core/ops.py --accept
pytest --doctest-modules xarray/core/options.py --accept
pytest --doctest-modules xarray/core/parallelcompat.py --accept
pytest --doctest-modules xarray/core/parallel.py --accept
pytest --doctest-modules xarray/core/pdcompat.py --accept
pytest --doctest-modules xarray/core/pycompat.py --accept
pytest --doctest-modules xarray/core/resample_cftime.py --accept
pytest --doctest-modules xarray/core/resample.py --accept
pytest --doctest-modules xarray/core/rolling_exp.py --accept
pytest --doctest-modules xarray/core/rolling.py --accept
pytest --doctest-modules xarray/core/_typed_ops.py --accept
pytest --doctest-modules xarray/core/types.py --accept
pytest --doctest-modules xarray/core/utils.py --accept
pytest --doctest-modules xarray/core/variable.py --accept
pytest --doctest-modules xarray/core/weighted.py --accept
pytest --doctest-modules xarray/datatree_/conftest.py --accept
pytest --doctest-modules xarray/datatree_/datatree/common.py --accept
pytest --doctest-modules xarray/datatree_/datatree/datatree.py --accept
pytest --doctest-modules xarray/datatree_/datatree/extensions.py --accept
pytest --doctest-modules xarray/datatree_/datatree/formatting_html.py --accept
pytest --doctest-modules xarray/datatree_/datatree/formatting.py --accept
pytest --doctest-modules xarray/datatree_/datatree/io.py --accept
pytest --doctest-modules xarray/datatree_/datatree/iterators.py --accept
pytest --doctest-modules xarray/datatree_/datatree/mapping.py --accept
pytest --doctest-modules xarray/datatree_/datatree/ops.py --accept
pytest --doctest-modules xarray/datatree_/datatree/render.py --accept
pytest --doctest-modules xarray/datatree_/datatree/testing.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/conftest.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_dataset_api.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_datatree.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_extensions.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_formatting_html.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_formatting.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_io.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_mapping.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_treenode.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_version.py --accept
pytest --doctest-modules xarray/datatree_/datatree/treenode.py --accept
pytest --doctest-modules xarray/datatree_/docs/source/conf.py --accept
pytest --doctest-modules xarray/namedarray/_aggregations.py --accept
pytest --doctest-modules xarray/namedarray/_array_api.py --accept
pytest --doctest-modules xarray/namedarray/core.py --accept
pytest --doctest-modules xarray/namedarray/dtypes.py --accept
pytest --doctest-modules xarray/namedarray/_typing.py --accept
pytest --doctest-modules xarray/namedarray/utils.py --accept
pytest --doctest-modules xarray/plot/accessor.py --accept
pytest --doctest-modules xarray/plot/dataarray_plot.py --accept
pytest --doctest-modules xarray/plot/dataset_plot.py --accept
pytest --doctest-modules xarray/plot/facetgrid.py --accept
pytest --doctest-modules xarray/plot/utils.py --accept
pytest --doctest-modules xarray/testing/assertions.py --accept
pytest --doctest-modules xarray/testing/strategies.py --accept
pytest --doctest-modules xarray/tutorial.py --accept
pytest --doctest-modules xarray/util/deprecation_helpers.py --accept
pytest --doctest-modules xarray/util/generate_aggregations.py --accept
pytest --doctest-modules xarray/util/generate_ops.py --accept
pytest --doctest-modules xarray/util/print_versions.py --accept

Finally the backslash-removing commit is reverted
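
The per-file commands listed above all follow one pattern, so they could be generated rather than typed out. A minimal sketch, assuming the `--accept` flag comes from the pytest-accept plugin mentioned later in this thread; `doctest_accept_commands` is a hypothetical helper name, and the file list is passed in explicitly rather than discovered:

```python
import shlex


def doctest_accept_commands(paths: list[str]) -> list[str]:
    """Build one `pytest --doctest-modules <path> --accept` command per file."""
    return [
        shlex.join(["pytest", "--doctest-modules", path, "--accept"])
        for path in sorted(paths)
    ]


commands = doctest_accept_commands(
    ["xarray/core/dataset.py", "xarray/core/dataarray.py"]
)
for command in commands:
    print(command)
```

Running one command per module, as in the list above, keeps each rewrite confined to a single file, which makes the resulting diff easier to review.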

@max-sixty
Collaborator

Hi @etienneschalk — this looks really good — thank you!

The only issue I see is that the Windows tests seem to have different values for the size in the repr tests, for example:

  - <xarray.Dataset> Size: 1kB
  ?                        ^^
  + <xarray.Dataset> Size: 640B
  ?                        ^^^

Is anyone familiar with what's going on here?

I guess we could exclude those in windows if there isn't an easily reconcilable approach...

@keewis
Collaborator

keewis commented Feb 6, 2024

different default dtypes (e.g. float32 instead of float64), I'd assume. The easiest fix would be to hard-code the dtypes, although that makes the code example a bit more verbose.

As for the formatting, is there a reason why the size is on the same line? Now that it is prefixed with Size:, we could just as well move it on a new line. Additionally, since this is the combined size (the sum of coordinates and data variables for datasets), maybe we should call it Total size or something similar? (to be clear, I'm hoping for a discussion, not requesting an immediate change)

@max-sixty
Collaborator

As for the formatting, is there a reason why the size is on the same line? Now that it is prefixed with Size:, we could just as well move it on a new line.

In general I'm a fan of more concise reprs, but no strong view. I agree the current version doesn't have perfect aesthetics.

Additionally, since this is the combined size (the sum of coordinates and data variables for datasets), maybe we should call it Total size or something similar? (to be clear, I'm hoping for a discussion, not requesting an immediate change)

I figured that having it at the dataset level meant that it referred to the whole dataset. But again, no strong view!

@etienneschalk
Contributor Author

Hello,

About windows failing tests

The only issue I see is that the Windows tests seem to have different values for the size in the repr tests, for example:

Regarding the failing tests on windows, I had a similar issue recently, and a workaround is to use a "non-default" dtype that triggers the rendering of the dtype on the numpy array representation.
However, to avoid this kind of tricky workaround, the best option would be to force numpy to show the dtype in array reprs, at least in a testing context, to produce repeatable output. I made a quick search but could not find a way. Here is what I would like to do with numpy, forcing the dtype to be printed in the representation, something like:

import numpy as np

# Wished-for option, not an existing NumPy API: np.set_printoptions has no `dtype` parameter
np.set_printoptions(dtype="always")

About conciseness of the repr

In general I'm a fan of more concise reprs, but no strong view. I agree the current version doesn't have perfect aesthetics.

Regarding aesthetics: I agree that a new line would look much cleaner and would declutter the header along the way:

<xarray.Dataset>
Total size: 640B
...

However, it would come at the cost of an extra line. The question is whether this extra line is acceptable, as it is an incompressible cost (the repr changes so far only added information to existing lines). Personally, I would prefer the new line too, just for the aesthetics.

Here is what it would look like:

(A)

<xarray.Dataset>
Dimensions:  (x: 1)
Total size: 16B
Coordinates:
	y        (x) int64 8B dask.array<chunksize=(1,), meta=np.ndarray>
Dimensions without coordinates: x
Data variables:
	a        (x) int64 8B dask.array<chunksize=(1,), meta=np.ndarray>

(total size after dimensions, consistent with the inline repr)

or

(B)

<xarray.Dataset>
Total size: 16B
Dimensions:  (x: 1)
Coordinates:
	y        (x) int64 8B dask.array<chunksize=(1,), meta=np.ndarray>
Dimensions without coordinates: x
Data variables:
	a        (x) int64 8B dask.array<chunksize=(1,), meta=np.ndarray>

(total size before dimensions, not consistent with the inline repr, but keep dims and coords grouped)

About the definition of the size and total size of a DataArray

⚠️ This may be out of scope of the required change, as in both Dataset and DataArray cases, the nbytes attribute is used. But this may seem surprising from a user perspective. (maybe for a future issue)

Here is an example of the output of a DataArray then Dataset repr with the proposed change:

In [10]: xda = xr.DataArray([[1,2,3],[4,5,6]], coords = {"y": [40, 60], "x": [700, 800, 900]})

In [11]: xda
Out[11]: 
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[1, 2, 3],
       [4, 5, 6]])
Coordinates:
  * y        (y) int64 16B 40 60
  * x        (x) int64 24B 700 800 900

In [12]: xr.Dataset({"var": xda})
Out[12]: 
<xarray.Dataset> Size: 88B
Dimensions:  (y: 2, x: 3)
Coordinates:
  * y        (y) int64 16B 40 60
  * x        (x) int64 24B 700 800 900
Data variables:
    var      (y, x) int64 48B 1 2 3 4 5 6

48 // 6 == 8: the 48B corresponds to the size of the data alone for the DataArray (six int64 elements of 8 bytes each). However, the coordinates y and x also bring extra weight that is not reflected here. For the Dataset containing the same DataArray, nbytes also takes the coordinates into account: 16 + 24 + 48 == 88.
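
The arithmetic above can be spelled out as a short worked example. This is only a sketch of the bookkeeping, using the element counts from the reprs shown; it does not call xarray, and the variable names are illustrative:

```python
INT64_ITEMSIZE = 8  # bytes per int64 element

# Element counts taken from the example above: y has 2 values, x has 3,
# and the data variable has 2 * 3 values.
element_counts = {"y": 2, "x": 3, "var": 2 * 3}

variable_nbytes = {
    name: count * INT64_ITEMSIZE for name, count in element_counts.items()
}

data_array_size = variable_nbytes["var"]      # data only, as in the DataArray header
dataset_size = sum(variable_nbytes.values())  # data plus coordinates, as in the Dataset header

print(variable_nbytes)
print(data_array_size, dataset_size)
```

The two headers therefore answer slightly different questions: the DataArray figure counts only the data, while the Dataset figure counts every variable it holds.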

The doc indicates:

xarray.DataArray.nbytes
Total bytes consumed by the elements of this DataArray’s data.

xarray.Dataset.nbytes
Total bytes consumed by the data arrays of all variables in this dataset.

The distinction is not perfectly clear to me: a DataArray can also group other DataArrays (its coordinates).

In [14]: xda.x
Out[14]: 
<xarray.DataArray 'x' (x: 3)> Size: 24B
array([700, 800, 900])
Coordinates:
  * x        (x) int64 24B 700 800 900

@max-sixty
Collaborator

Regarding the failing tests on windows, I had a similar issue recently, and a workaround is to use a "non-default" dtype that triggers the rendering of the dtype on the numpy array representation.

I think that's reasonable for the moment.

Tbc, it looks like it's not just the rendering — the actual dtype looks to be different, such that the size is correctly reported differently. So we need to make the dtypes be the same on windows. Does that make sense?

The tests are failing and should pass if this is done correctly.

@etienneschalk
Contributor Author

About Windows

Tbc, it looks like it's not just the rendering — the actual dtype looks to be different, such that the size is correctly reported differently. So we need to make the dtypes be the same on windows. Does that make sense?

It seems Ubuntu and macOS default to 64 bits while Windows defaults to 32 bits. This default depends on the OS; I don't know if it's a "max" (e.g. whether we cannot use 64 bits on the Windows machines used by the CI because they have 32-bit OSes). I think it's worth trying to set the dtype to 64 bits first, and seeing if it still fails.
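
As background rather than something established in this thread: NumPy releases before 2.0 map the default integer to the C `long` type, which is 4 bytes on 64-bit Windows and 8 bytes on 64-bit Linux and macOS, so a 64-bit OS does not by itself imply a 64-bit default. A small standard-library sketch to check the C type sizes on a given machine:

```python
import ctypes
import sys

c_long_bytes = ctypes.sizeof(ctypes.c_long)
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)

print(f"platform: {sys.platform}")
print(f"C long: {c_long_bytes} bytes, pointer: {pointer_bytes} bytes")

# Three integers stored as C long occupy 12 or 24 bytes depending on the platform.
print(f"3 C longs: {3 * c_long_bytes} bytes")
```

If the pointer size is 8 while C `long` is 4, the machine is 64-bit and the smaller integer default is a convention of the platform, not a hardware limit.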

Indeed, I looked for dtype=np.int64 in the tests, and the occurrences are in tests that don't have any "skipif windows" decorator. However, the issue would move from the actual size to the , dtype=int64 suffix in the numpy array repr.

Another option is to use the ON_WINDOWS constant to make the 9 failing tests dependent on the OS. It is not the cleanest approach, for sure.

I guess we could exclude those in windows if there isn't an easily reconcilable approach...

This is definitely the least effort, but excluding 9 tests for this PR seems overkill

This is the function used for numpy array representations. We can see the logic where it adds the suffix, there is no way to force print the dtype, or force not printing it.

https://github.com/numpy/numpy/blob/d35cd07ea997f033b2d89d349734c61f5de54b0d/numpy/core/arrayprint.py#L1487

What would solve this repeatable output issue would be to allow overriding this param:

def _array_repr_implementation(
        arr, max_line_width=None, precision=None, suppress_small=None,
+       skipdtype: bool | None = None,       
        array2string=array2string):
        ...
- skipdtype = dtype_is_implied(arr.dtype) and arr.size > 0
+ if skipdtype is None:
+     skipdtype = dtype_is_implied(arr.dtype) and arr.size > 0

About the basicness of the current size printing algorithm

Also, the current algorithm to print a human readable size is basic, and never shows decimal numbers. Maybe a better usage of the space could be made, eg:

Max used space: 5 letters
999kB 

What could be improved is:

9kB 
vvvvv
9.9kB  (use 5 letter space)

Indeed, the current size is not reliable and is just an estimation that should not replace the integer value returned by nbytes 🤔
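
To make the "no decimals" behaviour concrete, here is a minimal sketch of such a formatter. It is an illustration written for this discussion, not the code in the PR; `format_nbytes` and `UNITS` are hypothetical names, and decimal (power-of-1000) units are assumed:

```python
UNITS = ["B", "kB", "MB", "GB", "TB", "PB"]  # assumed decimal (power-of-1000) units


def format_nbytes(nbytes: int) -> str:
    """Render a byte count with no decimals, e.g. 640 -> '640B'."""
    value = float(nbytes)
    unit = UNITS[0]
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1000 or unit == UNITS[-1]:
            break
        value /= 1000
    return f"{value:.0f}{unit}"


for n in (48, 640, 1280, 3_000_000):
    print(n, "->", format_nbytes(n))
```

With this scheme 1280 bytes renders as 1kB, which shows how much information the rounding discards and why the rendered string should be read as an estimate next to the exact integer from nbytes.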

@max-sixty
Collaborator

I don't really understand why the tests fail. It seems that on windows, 3 int64 values take up 48 bytes of space??

https://github.com/pydata/xarray/actions/runs/7792432615/job/21250418347?pr=8702#step:9:419

 self = <xarray.tests.test_dataarray.TestDataArray object at 0x00000234F4DB19D0>

    def test_repr(self) -> None:
        v = Variable(["time", "x"], [[1, 2, 3], [4, 5, 6]], {"foo": "bar"})
        coords = {"x": np.arange(3, dtype=np.int64), "other": np.int64(0)}
        data_array = DataArray(v, coords, name="my_variable")
        expected = dedent(
            """\
            <xarray.DataArray 'my_variable' (time: 2, x: 3)> Size: 48B
            array([[1, 2, 3],
                   [4, 5, 6]])
            Coordinates:
              * x        (x) int64 24B 0 1 2
                other    int64 8B 0
            Dimensions without coordinates: time
            Attributes:
                foo:      bar"""
        )
>       assert expected == repr(data_array)
E       AssertionError: assert '<xarray.Data...foo:      bar' == '<xarray.Data...foo:      bar'
E         
E         Skipping 45 identical leading characters in diff, use -v to show
E         Skipping 166 identical trailing characters in diff, use -v to show
E         - 3)> Size: 24B
E         ?           -
E         + 3)> Size: 48B
E         ?            +
E           array([

Unless anyone has a better idea, I think skipping on Windows is OK.


Also, the current algorithm to print a human readable size is basic, and never shows decimal numbers. Maybe a better usage of the space could be made, eg:

I would keep it simple, and not conditionally change precision, at least for the moment.

@etienneschalk
Contributor Author

Unless anyone has a better idea, I think skipping on Windows is OK.

https://github.com/pydata/xarray/pull/8702/files/e98a97d3085e7dc3b1bcb11ac8af012fc1acc1c4..e2db82a6d2a322e6ed18ebb0cb2bd696458b540c

For big tests, I used a skipif approach to test both Windows and non-Windows. For smaller, I added the condition in-test.

It adds many uses of the ON_WINDOWS constant, but this is unavoidable, as the representation including the size is OS-dependent in the current CI. (It seems win32 is not a reliable way to determine whether we are on a 32-bit OS, as all Windows versions would return win32.)
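
A minimal sketch of the in-test condition described above. The way `ON_WINDOWS` is derived here is an assumption, `expected_header` is a hypothetical helper, and the two sizes are the ones from the CI diff quoted earlier in this thread:

```python
import sys

# Assumption: a constant like the ON_WINDOWS one named above can be derived this way.
# sys.platform is "win32" on every Windows build, 32-bit or 64-bit.
ON_WINDOWS = sys.platform == "win32"

# Bitness of the running interpreter, independent of the platform name.
IS_64BIT_PYTHON = sys.maxsize > 2**32


def expected_header() -> str:
    """Return the header that test_repr would expect on the current platform."""
    size = "24B" if ON_WINDOWS else "48B"  # values from the CI diff quoted earlier
    return f"<xarray.DataArray 'my_variable' (time: 2, x: 3)> Size: {size}"


print(ON_WINDOWS, IS_64BIT_PYTHON)
print(expected_header())
```

The platform name and the interpreter bitness are separate facts, which is why the platform check alone cannot tell whether 64-bit integers are available.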

Crazy how a simple "add size to repr" issue turned into "platform-dependent shenanigans"!

Links I consulted:

I would keep it simple, and not conditionally change precision, at least for the moment.

OK!

@keewis
Collaborator

keewis commented Feb 6, 2024

I don't really understand why the tests fail. It seems that on windows, 3 int64 values take up 48 bytes of space??

int64 has 8 bytes per element, so I agree, 3 values should be 24 bytes. However, if you look at the data it's actually a 2×3 array, and with 6 values 48 bytes makes sense. And looking at the traceback, the issue is that on windows the data only has a size of 24 bytes, which means that it is using int32 as a dtype, with 4 bytes per element. Which tells me that once again the default size is the issue.
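
The reasoning above comes down to "number of elements times bytes per element". A short worked sketch, with the hypothetical helper `nbytes` standing in for the attribute of the same name:

```python
import math

ITEMSIZE = {"int32": 4, "int64": 8}  # bytes per element


def nbytes(shape: tuple[int, ...], dtype: str) -> int:
    """Bytes used by a dense array: number of elements times element size."""
    return math.prod(shape) * ITEMSIZE[dtype]


shape = (2, 3)  # the [[1, 2, 3], [4, 5, 6]] variable from test_repr
print(nbytes(shape, "int64"))  # six 8-byte elements
print(nbytes(shape, "int32"))  # six 4-byte elements
print(nbytes((3,), "int64"))   # three 8-byte elements
```

Six int32 elements and three int64 elements both come to 24 bytes, which is what made the failing value look at first like a miscounted coordinate rather than a different dtype.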

@max-sixty
Collaborator

int64 has 8 bytes per element, so I agree, 3 values should be 24 bytes. However, if you look at the data it's actually a 2×3 array, and with 6 values 48 bytes makes sense. And looking at the traceback, the issue is that on windows the data only has a size of 24 bytes, which means that it is using int32 as a dtype, with 4 bytes per element. Which tells me that once again the default size is the issue.

Ah, very good point. In the case above it's that this array:

        v = Variable(["time", "x"], [[1, 2, 3], [4, 5, 6]], {"foo": "bar"})

is being cast to the default size. So we can instead cast it with .astype(int64) or similar (or create it from np.arange, whichever is easier).
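
A minimal sketch of that suggestion, assuming NumPy is installed. With an explicit dtype the byte count no longer depends on the platform default, although the array repr may still differ between platforms:

```python
import numpy as np

values = [[1, 2, 3], [4, 5, 6]]

default_array = np.array(values)                # dtype chosen by NumPy for this platform
fixed_array = np.array(values, dtype=np.int64)  # same dtype on every platform
cast_array = default_array.astype(np.int64)     # equivalent result via a cast

print(default_array.dtype, default_array.nbytes)
print(fixed_array.dtype, fixed_array.nbytes)
print(cast_array.dtype, cast_array.nbytes)
```

Only the first line of output can vary from one platform to another; the other two always report int64 and 48 bytes.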

@etienneschalk
Contributor Author

Tests pass with a differentiation between Windows and non-Windows environments.

I tagged test_dask_roundtrip as flaky (xfail) as it frequently failed in my previous CI runs.

Hopefully this is acceptable!

The current state of the PR doesn't integrate the recent discussions about total size. The solution implemented is the one described in #8690 (comment). Maybe further discussion should happen on the original issue rather than on this PR.

@max-sixty
Collaborator

I would be +1 on merging.

I think the windows issues could be better handled by setting the dtype (i.e. #8702 (comment), building on @keewis 's observation), but we can also do that in another PR, and this has a large enough blast radius that it would be better to merge sooner.

Would someone else agree? (Or feel free to just hit the button...)

@etienneschalk
Contributor Author

@max-sixty

By setting the dtype, I think only part of the issue would be solved, as numpy would print out , dtype=int64, since numpy always prints out non-default dtypes. Unfortunately, it does not seem possible to turn this behaviour off.

So the repr would still be different and the need to differentiate between Windows and non-Windows environments still remains 🤔

This comment on the numpy repo is interesting: numpy/numpy#9464 (comment)

@max-sixty
Collaborator

By setting the dtype, I think only part of the issue would be solved, as numpy would print out , dtype=int64, since numpy always prints out non-default dtypes. Unfortunately, it does not seem possible to turn this behaviour off.

What's an xarray object that is explicitly typed that would show different reprs in linux/mac vs windows?

@max-sixty
Collaborator

@pydata/xarray could someone second the approval here?

@max-sixty max-sixty merged commit db680b0 into pydata:main Feb 7, 2024
29 checks passed

welcome bot commented Feb 7, 2024

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again!

@max-sixty
Collaborator

@etienneschalk great work!

Thanks also for the issues into pytest-accept.

I wanted to merge asap so we didn't get merge conflicts. If you're up for simplifying the formatting tests — assuming I'm not mistaken above — that would be a very nice 2nd PR...

@etienneschalk etienneschalk deleted the feature/eschalk/issue-8690-nbytes-repr branch February 7, 2024 20:51
@etienneschalk
Contributor Author

Thanks @max-sixty!

By setting the dtype, I think only part of the issue would be solved, as numpy would print out , dtype=int64, since numpy always prints out non-default dtypes. Unfortunately, it does not seem possible to turn this behaviour off.

What's an xarray object that is explicitly typed that would show different reprs in linux/mac vs windows?

I detailed the issue in a previous comment #8702 (comment)

If we use dtype=np.int32, macOS and Linux will add a , dtype=int32 suffix in the array repr, and if we use dtype=np.int64, then Windows will add a , dtype=int64 suffix in the array repr.

Also, even when explicitly providing a dtype, if the dtype is the default, it won't show up in the repr. Here is an example on my machine (Linux):

numpy-only

In [3]: import numpy as np

In [4]: np.array([1,2,3])
Out[4]: array([1, 2, 3])

In [5]: np.array([1,2,3], dtype=np.int64)
Out[5]: array([1, 2, 3])

In [6]: np.array([1,2,3], dtype=np.int32)
Out[6]: array([1, 2, 3], dtype=int32)

xarray (delegating repr to numpy)

In [14]: import xarray as xr

In [15]: xr.DataArray(np.array([1,2,3]))
Out[15]: 
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3])
Dimensions without coordinates: dim_0

In [16]: xr.DataArray(np.array([1,2,3], dtype=np.int64))
Out[16]: 
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3])
Dimensions without coordinates: dim_0

In [17]: xr.DataArray(np.array([1,2,3], dtype=np.int32))
Out[17]: 
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3], dtype=int32)
Dimensions without coordinates: dim_0

I would expect the opposite on the Windows CI: , dtype=int64 would be shown. So the only way to always get an explicit dtype repr would be to:

I wanted to merge asap so we didn't get merge conflicts. If you're up for simplifying the formatting tests — assuming I'm not mistaken above — that would be a very nice 2nd PR...

I still need to confirm all I said above by more experimentation. Actually I can add such "repr testing" in a next PR yes, to try to make a catalog of all these problematic reprs

@etienneschalk etienneschalk mentioned this pull request Feb 7, 2024
1 task
@max-sixty
Collaborator

@etienneschalk sorry, you're completely correct. I was thinking about the Dataset repr. But you're correct that the DataArray repr just inherits from numpy.

So I don't have any strong views about better ways of doing this — the existing way seems OK. Another approach would be to just re.sub Size: \d+B in all the reprs...
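
A minimal sketch of that last idea, using only the standard library. `mask_size` is a hypothetical helper, not part of xarray or its test suite, and the pattern is widened slightly beyond `Size: \d+B` so that values such as 1kB are also matched:

```python
import re

# Hypothetical helper: mask the header size so that expected strings
# can be compared across platforms.
_SIZE_PATTERN = re.compile(r"Size: \d+[kMGTP]?B")


def mask_size(text: str) -> str:
    """Replace the 'Size: <n><unit>' part of a repr header with a placeholder."""
    return _SIZE_PATTERN.sub("Size: <size>", text)


linux_header = "<xarray.DataArray 'my_variable' (time: 2, x: 3)> Size: 48B"
windows_header = "<xarray.DataArray 'my_variable' (time: 2, x: 3)> Size: 24B"

print(mask_size(linux_header))
assert mask_size(linux_header) == mask_size(windows_header)
```

This only normalises the header. The per-variable sizes and any dtype suffix printed by numpy would still differ, so it trades some test coverage for portability.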

@djhoese
Contributor

djhoese commented Feb 19, 2024

Is there a reason the title of this PR can't be updated to reflect what the description was edited to say, namely that this is not opt-in?

@dcherian dcherian changed the title Add a simple nbytes representation in DataArrays and Dataset repr (opt-in) Add a simple nbytes representation in DataArrays and Dataset repr Feb 19, 2024
TomNicholas pushed a commit that referenced this pull request Feb 27, 2024
* Update the formatting tests

PR (#8702) added nbytes representation in DataArrays and Dataset repr, this adds it to the datatree tests.

* Migrate treenode module

Moves treenode.py and test_treenode.py.
Updates some typing.
Updates imports from treenode.

* Update NotFoundInTreeError description.

* Reformat some comments

Add test tree structure for easier understanding.

* Updates whats-new.rst

* mypy typing. (terrible?)

There must be a better way, but I don't know it.
particularly the list comprehension casts.

* Adds __repr__ to NamedNode and updates test

This test was broken because only the root node was being tested and none of
the previous nodes were represented in the __str__.

* Adds quotes to NamedNode __str__ representation.

* swaps " for ' in NamedNode __str__ representation.

* Adding Tom in so he gets blamed properly.

* resolve conflict whats-new.rst

Question is I did update below the released line to give Tom some credit.  I
hope that is allowable.

* Moves test_treenode.py to xarray/tests.

Integrated tests.

* refactors backend tests for datatree IO

* Add explicit engine back in test_to_zarr

* Removes OrderedDict from treenode

* Renames tests/test_io.py -> tests/test_backends_datatree.py

* typo

* Add types

* Pass mypy for 3.9