add info method to dataset #1176
Nice! Why pass a buffer into that function though? Why not return a string and let the user do what they want with it?
Good question. I tried to merge the output format from |
I never noticed that DataFrame.info() doesn't actually have a return value. It does seem strange to write to a buffer taken as an argument, but I can see one reason why it sort of makes sense: otherwise you get quotation marks printed around the returned string.
Yes, interesting, I didn't know that either.
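The trade-off discussed here can be sketched in plain Python. The function names `info_as_string` and `info_to_buffer` are hypothetical and are not part of xarray or pandas; the sketch only illustrates why writing to a buffer avoids the quoted repr that an interactive session echoes for a returned string.

```python
import io
import sys


def info_as_string(lines):
    """Hypothetical variant: return the summary and let the caller print it."""
    return "\n".join(lines)


def info_to_buffer(lines, buf=None):
    """Hypothetical variant: write the summary to a buffer and return None."""
    if buf is None:  # default to standard output, as DataFrame.info() does
        buf = sys.stdout
    buf.write("\n".join(lines) + "\n")


lines = ["dimensions:", "\tx = 3 ;"]

# At an interactive prompt a bare expression is echoed via repr(),
# so the returned string appears quoted, with newlines and tabs escaped.
echoed = repr(info_as_string(lines))

# Writing to a buffer produces the raw text instead.
captured = io.StringIO()
result = info_to_buffer(lines, buf=captured)
```

Returning a string remains usable with an explicit `print(...)`; the buffer form simply makes the readable output the default.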
Thanks @jhamman ! About the name: what about just |
I'm also thinking |
@@ -147,6 +147,10 @@ Enhancements
  plots (:issue:`897`). See :ref:`plotting.figsize` for more details.
  By `Stephan Hoyer <https://github.com/shoyer>`_ and
  `Fabien Maussion <https://github.com/fmaussion>`_.
- New :py:meth:`~Dataset.attr_info` method to summarize ``Dataset`` variables
  and attributes. The method produces a stirng output similar to what the
'stirng' typo
Tests are passing on Python 2 now. @shoyer - how do you feel about |
I'm pretty happy with |
Looks pretty good to me, ju
lines.append('\t:{k} = {v} ;'.format(k=k, v=v))
lines.append('}')

formatting._put_lines(buf, lines)
I would probably just use `buf.write(u'\n'.join(lines))` here, which will work as long as you ensure all elements in `lines` are the same (unicode/str) type.
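A minimal sketch of that suggestion follows. The helper name `put_lines` is a hypothetical stand-in; the body of `formatting._put_lines` is not shown in this review, so this is not a copy of it.

```python
import io


def put_lines(buf, lines):
    """Write all lines to buf in one call, separated by newlines."""
    # On Python 3 every str is unicode. On Python 2 each element would
    # need to be a unicode object (u'...') for the joined result to stay
    # unicode as well.
    buf.write(u"\n".join(lines))


buf = io.StringIO()
put_lines(buf, [u"xarray.Dataset {", u"dimensions:", u"\tx = 3 ;", u"}"])
text = buf.getvalue()
```

Joining first means the buffer receives a single write, and a mixed bytes/unicode list fails at the join rather than partway through the output.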
lines.append('xarray.Dataset {')
lines.append('dimensions:')
for name, size in self.dims.items():
    lines.append('\t{name} = {size} ;'.format(name=name, size=size))
I think these probably should all be unicode (using `u` literals), otherwise this will break for non-ASCII characters on Python 2. Take a look at what things look like in `formatting.py`.
It would also be good to add a test for attributes with non-ASCII values.
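Such a test might look roughly like the sketch below. `write_info` is a hypothetical stand-in that mirrors the line-building code under review; it is not xarray's implementation, and a real test would call the new `Dataset` method instead.

```python
import io


def write_info(dims, attrs, buf):
    """Hypothetical stand-in for the method under review (not xarray's code)."""
    lines = [u"xarray.Dataset {", u"dimensions:"]
    for name, size in dims.items():
        lines.append(u"\t{name} = {size} ;".format(name=name, size=size))
    for k, v in attrs.items():
        lines.append(u"\t:{k} = {v} ;".format(k=k, v=v))
    lines.append(u"}")
    buf.write(u"\n".join(lines))


def test_info_non_ascii_attrs():
    # Attribute value contains e-acute and the registered-trademark sign.
    buf = io.StringIO()
    write_info({u"x": 3}, {u"title": u"caf\xe9 \xae"}, buf)
    actual = buf.getvalue()
    assert u"\t:title = caf\xe9 \xae ;" in actual
    assert actual.endswith(u"}")


test_info_non_ascii_attrs()
```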
@@ -12,6 +12,10 @@
    import dask.array as da
except ImportError:
    pass
try:
    from io import StringIO
We might put this in `pycompat` instead.
lines.append(u'\t:{k} = {v} ;'.format(k=k, v=v))
lines.append(u'}')

lines = [ensure_valid_repr(line) for line in lines]
I don't think we need this. I think you can write unicode to string buffers, even on Python 2 (at least it works for `sys.stdout` and `StringIO`).
Without it, we get:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 363: ordinal not in range(128)
I've always had a tough time getting the string/bytes/unicode stuff straight so I'm open to other ideas here.
Ah, the issue is that you're using `StringIO` from `cStringIO`, which only handles bytes, not unicode. Instead, use `StringIO` from `io`, on both Python 2 and 3 (no need for the separate compatibility module even).
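As a small illustration of the text-only behaviour of `io.StringIO`, here is a sketch written for Python 3. On Python 2 the same class likewise expects `unicode` objects; `cStringIO` does not exist on Python 3, so it is not shown.

```python
import io

buf = io.StringIO()

# Text (unicode) is accepted, including non-ASCII characters.
buf.write(u"units = \xae ;")
value = buf.getvalue()

# Bytes are rejected outright rather than silently encoded or decoded.
try:
    buf.write(b"bytes")
    rejected = False
except TypeError:
    rejected = True
```

Because the buffer never performs an implicit ASCII encode, the non-ASCII character is stored and returned unchanged.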
@shoyer --- green.
I don't know if this is exactly what we want but here's an idea that emulates `ncdump -h`. I'm sure people will have thoughts on the implementation and output so I'll just throw this first cut up and let people discuss.
closes #1150
fixes #244
xref: #1044, #820