add info method to dataset #1176

jhamman · 2016-12-21T05:23:22Z

I don't know if this is exactly what we want but here's an idea that emulates ncdump -h. I'm sure people will have thoughts on the implementation and output so I'll just throw this first cut up and let people discuss.

closes #1150
fixes #244

xref: #1044, #820

max-sixty · 2016-12-21T05:45:20Z

Nice!

Why pass a buffer into that function though? Why not return a string and let the user do what they want with it?

jhamman · 2016-12-21T05:52:21Z

Good question. I tried to merge the output format from ncdump -h with the API of pandas.DataFrame.info. I'm certainly not tied to the return format I chose.

shoyer · 2016-12-21T07:17:19Z

I never noticed that DataFrame.info() doesn't actually have a return value. It does seem strange to write to a buffer taken as an argument, but I can see one reason why it sort of makes sense: otherwise you get quotation marks printed around the returned string.
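The trade-off can be seen in a small sketch. The `info` function below is a hypothetical stand-in, not the method from this PR: it writes to a buffer (defaulting to `sys.stdout`) and returns `None`, so an interactive session would show the text itself rather than the quoted `repr` of a returned string.

```python
import io
import sys


def info(lines, buf=None):
    """Write a summary to ``buf`` (default: sys.stdout) and return None."""
    if buf is None:
        buf = sys.stdout
    buf.write(u"\n".join(lines) + u"\n")


summary = [u"dimensions:", u"\tx = 3 ;"]

# Returning a string instead: an interactive session echoes its repr,
# with surrounding quotes and escaped newlines/tabs.
as_string = u"\n".join(summary)
quoted = repr(as_string)

# Writing to a buffer: a caller who wants the text can still capture it.
buf = io.StringIO()
info(summary, buf=buf)
captured = buf.getvalue()
```

Passing an `io.StringIO` keeps the string available to callers who want it, which may address the concern about not returning a value.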

max-sixty · 2016-12-21T17:37:34Z

Yes interesting, I didn't know that either.
IIRC there are also IPython- and Jupyter-specific methods that can return REPL and notebook outputs.

fmaussion · 2016-12-21T18:30:09Z

Thanks @jhamman !

About the name: what about just .info(), or .var_info() ? the additional info (which isn't in the repr already) is more about variables, right?

jhamman · 2016-12-21T18:38:00Z

I'm also thinking .info() seems like a better name. It would be fairly trivial to add additional argument options to fill in the pandas .info() functionality.

spencerahill · 2016-12-21T22:37:42Z

doc/whats-new.rst

@@ -147,6 +147,10 @@ Enhancements
  plots (:issue:`897`). See :ref:`plotting.figsize` for more details.
  By `Stephan Hoyer <https://github.com/shoyer>`_ and
  `Fabien Maussion <https://github.com/fmaussion>`_.
+- New :py:meth:`~Dataset.attr_info` method to summarize ``Dataset`` variables
+  and attributes. The method produces a stirng output similar to what the


'stirng' typo

jhamman · 2016-12-22T00:20:35Z

Tests are passing on Python 2 now.

@shoyer - how do you feel about info() for the method name?

shoyer · 2016-12-22T00:27:49Z

I'm pretty happy with .info(), as long as that indicates that we aren't wedded to a display matching ncdump -h. In particular displaying the size of each variable in memory in appropriate units could be a nice addition (for later, doesn't need to be in this PR).
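As a rough illustration of what "appropriate units" could mean, here is a minimal sketch of a byte-count formatter. The name `format_nbytes` and the choice of binary (1024-based) units are assumptions made for this example, not something proposed in the PR.

```python
def format_nbytes(nbytes):
    """Return a byte count as a short string using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(nbytes)
    # Step up one unit at a time until the value drops below 1024.
    for unit in units[:-1]:
        if size < 1024.0:
            return "{:.1f} {}".format(size, unit)
        size /= 1024.0
    # Anything larger is reported in the biggest unit available.
    return "{:.1f} {}".format(size, units[-1])


small = format_nbytes(1536)
large = format_nbytes(8 * 1024 ** 2)
```

NumPy arrays expose their size in bytes as `.nbytes`, so a value like that could be passed straight to such a helper.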

shoyer

Looks pretty good to me, ju

shoyer · 2016-12-22T00:30:25Z

xarray/core/dataset.py

+            lines.append('\t:{k} = {v} ;'.format(k=k, v=v))
+        lines.append('}')
+
+        formatting._put_lines(buf, lines)


I would probably just use buf.write(u'\n'.join(lines)) here, which will work as long as you ensure all elements in lines are the same (unicode/str) type.
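A minimal sketch of that suggestion, assuming a plain dict of attributes in place of a real `Dataset` (the `attrs` contents are made up for illustration). Every element of `lines` is a unicode string, so a single join-and-write is enough.

```python
import io

# Hypothetical attributes standing in for Dataset.attrs.
attrs = {u"title": u"example", u"units": u"K"}

lines = [u"xarray.Dataset {"]
for k, v in sorted(attrs.items()):
    lines.append(u"\t:{k} = {v} ;".format(k=k, v=v))
lines.append(u"}")

# One write call; no helper function is needed as long as all
# elements of ``lines`` share the same (unicode) type.
buf = io.StringIO()
buf.write(u"\n".join(lines))
output = buf.getvalue()
```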

shoyer · 2016-12-22T00:32:01Z

xarray/core/dataset.py

+        lines.append('xarray.Dataset {')
+        lines.append('dimensions:')
+        for name, size in self.dims.items():
+            lines.append('\t{name} = {size} ;'.format(name=name, size=size))


I think these probably should all be unicode (using u literals), otherwise this will break for non-ASCII characters on Python 2. Take a look at what things look like in formatting.py.

It would also be good to add a test for attributes with non-ASCII values.

shoyer · 2016-12-22T00:32:23Z

xarray/test/test_dataset.py

@@ -12,6 +12,10 @@
    import dask.array as da
 except ImportError:
    pass
+try:
+    from io import StringIO


We might put this in pycompat instead.

shoyer · 2016-12-23T04:47:17Z

xarray/core/dataset.py

+            lines.append(u'\t:{k} = {v} ;'.format(k=k, v=v))
+        lines.append(u'}')
+
+        lines = [ensure_valid_repr(line) for line in lines]


I don't think we need this. I think you can write unicode to string buffers, even on Python 2 (at least it works for sys.stdout and StringIO).

jhamman · 2016-12-23

Without it, we get:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 363: ordinal not in range(128)

I've always had a tough time getting the string/bytes/unicode stuff straight, so I'm open to other ideas here.

shoyer · 2016-12-23

Ah, the issue is that you're using StringIO from cStringIO, which only handles bytes, not unicode. Instead, use StringIO from io, on both Python 2 and 3 (no need for the separate compatibility module even).
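A short sketch of the `io.StringIO` approach with a non-ASCII attribute value. The attribute name and value are invented for the example; only the `u'\xae'` character comes from the traceback in this thread. It could also serve as a starting point for the non-ASCII test requested earlier.

```python
import io

# Hypothetical attribute whose value contains a non-ASCII character
# (u'\xae' is the registered-trademark sign).
attrs = {u"institution": u"Example\xae Labs"}

lines = [u"\t:{k} = {v} ;".format(k=k, v=v) for k, v in attrs.items()]

# io.StringIO stores text (unicode), so writing does not trigger an
# implicit ASCII encode.
buf = io.StringIO()
buf.write(u"\n".join(lines))

result = buf.getvalue()
```

On Python 2, `io.StringIO.write` accepts only `unicode` objects, which is one more reason to keep the `u` literals throughout.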

jhamman · 2016-12-23T17:32:03Z

@shoyer --- green.

add attr_info method to dataset, first pass

50d15ea

fix py2 string bug

84ec045

spencerahill reviewed Dec 21, 2016


change name to ds.info and fix py2 string issue

49bb76c

jhamman changed the title ~~add attr_info method to dataset~~ add info method to dataset Dec 22, 2016

shoyer reviewed Dec 22, 2016


Joe Hamman added 2 commits December 22, 2016 20:18

cleanup after @shoyer's review

7c4c4e0

add tabs to example

2b56678

shoyer reviewed Dec 23, 2016


jhamman mentioned this pull request Dec 23, 2016

Things to complete before releasing xarray v0.9.0 #1167

Closed

4 tasks

fix unicode error on python2

74df621

shoyer approved these changes Dec 23, 2016


jhamman merged commit 8192190 into pydata:master Dec 23, 2016

jhamman deleted the feature/attr_info branch December 23, 2016 17:36

fmaussion mentioned this pull request Oct 13, 2017

html repr of xarray object (for the notebook) #1627

Closed
