to_dict without data #2659
Hello @rabernat! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on January 08, 2019 at 08:43 Hours UTC
Looks good, just a couple of nits
xarray/core/dataarray.py
Outdated
d.update({'data': ensure_us_time_resolution(self.values).tolist(),
          'name': self.name})
d['coords'].update({k: self.coords[k].variable.to_dict(data=data)})
This is only adding one item, so just using indexing for assignment seems cleaner: d['coords'][k] = self.coords[k].variable.to_dict(data=data)
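The two spellings in the suggestion above behave the same when only one key is added. A minimal sketch with plain dictionaries (the names `coords`, `via_update`, and `via_index` are placeholders for illustration, not the xarray code):

```python
# Placeholder data standing in for the per-coordinate dictionaries.
coords = {"t": {"dims": ("t",), "attrs": {}}}

via_update = {"coords": {}}
via_index = {"coords": {}}

for k, v in coords.items():
    # Builds a temporary one-item dict only to merge it in.
    via_update["coords"].update({k: v})
    # Assigns the single item directly, with no temporary dict.
    via_index["coords"][k] = v

# Both approaches produce the same nested dictionary.
assert via_update == via_index
```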
xarray/tests/test_dataarray.py
Outdated
@@ -2909,6 +2909,13 @@ def test_to_and_from_dict(self):
ValueError, "cannot convert dict without the key 'data'"):
DataArray.from_dict(d)

# check the data=False option
expected_no_data = {**expected}
I guess this is a fancy Py3 way of doing a copy? :) I would lean towards the more explicit expected.copy(), given that you aren't inserting any extra fields here.
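For reference, both forms produce a new, shallow copy of the dictionary. A small sketch with a made-up `expected` dictionary (its contents are an assumption for illustration, not the test's actual fixture):

```python
# Made-up stand-in for the `expected` dictionary used in the test.
expected = {"dims": ("x",), "attrs": {}, "data": [1, 2, 3]}

unpacked = {**expected}    # dict unpacking (Python 3.5+)
copied = expected.copy()   # explicit shallow copy

# Both are new top-level dicts with equal contents.
assert unpacked == copied == expected
assert unpacked is not expected and copied is not expected

# Both copies are shallow: nested objects are shared with the original.
assert unpacked["attrs"] is expected["attrs"]
assert copied["attrs"] is expected["attrs"]

# Removing a top-level key from a copy leaves the original untouched.
del copied["data"]
assert "data" in expected
```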
It just occurred to me that it would be nice to have some extra info about the data, such as dtype.
xarray/tests/test_dataset.py
Outdated
@@ -3045,11 +3045,20 @@ def test_to_and_from_dict(self):
# check roundtrip
assert_identical(ds, Dataset.from_dict(actual))

# check the data=False option
expected_no_data = expected.copy()
print(expected_no_data)
There is a forgotten print here!
I was about to ask about it ;-). What other attribute than
Shape might also be a good idea.
Here is what the call looks like:
x = np.random.randn(10)
y = np.random.randn(10)
t = list('abcdefghij')
ds = Dataset(OrderedDict([('a', ('t', x)),
                          ('b', ('t', y)),
                          ('t', ('t', t))]))
ds.to_dict(data=False)
The one thing I don't like about this is the empty attributes. Maybe we could change it so that
Does this help with #2347?
How do people feel about this? If folks are fine with it as is, then LGTM.
I'd rather have an empty dict as in the current implementation (like xarray)
Let's get this merged?
----------
data : bool, optional
    Whether to include the actual data in the dictionary. When set to
    False, returns just the schema.
It could be useful to allow including the data for the coordinates but not the data variables?
I'm thinking something like
ds.to_dict(data='coords')
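To make the suggestion concrete, here is a rough sketch of how such an option could behave. Everything in it is hypothetical: `export` is a toy function over plain tuples, and `data='coords'` is only the idea floated in this comment, not something this PR implements.

```python
def export(variables, coord_names, data=True):
    """Toy exporter: `data` may be True, False, or the hypothetical 'coords'."""
    out = {"coords": {}, "data_vars": {}}
    for name, (dims, values) in variables.items():
        is_coord = name in coord_names
        # Include values for everything (True), or only for coordinates ('coords').
        include = data is True or (data == "coords" and is_coord)
        entry = {"dims": dims}
        if include:
            entry["data"] = list(values)
        out["coords" if is_coord else "data_vars"][name] = entry
    return out


# Toy inputs: one coordinate 't' and one data variable 'a'.
variables = {"t": (("t",), "abc"), "a": (("t",), [1.0, 2.0, 3.0])}
result = export(variables, coord_names={"t"}, data="coords")

# Coordinate values are kept, data variable values are dropped.
assert "data" in result["coords"]["t"]
assert "data" not in result["data_vars"]["a"]
```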
I think we should merge this -- @rabernat feel free to go ahead and do that. We can leave
* master: stale requires a label (pydata#2701) Update indexing.rst (pydata#2700) add line break to message posted (pydata#2698) Config for closing stale issues (pydata#2684) to_dict without data (pydata#2659) Update asv.conf.json (pydata#2693) try no rasterio in py36 env (pydata#2691) Detailed report for testing.assert_equal and testing.assert_identical (pydata#1507) Hotfix for pydata#2662 (pydata#2678) Update README.rst (pydata#2682) Fix test failures with numpy=1.16 (pydata#2675)
* refactor-plot-utils: (22 commits) review comment. small rename stale requires a label (pydata#2701) Update indexing.rst (pydata#2700) add line break to message posted (pydata#2698) Config for closing stale issues (pydata#2684) to_dict without data (pydata#2659) Update asv.conf.json (pydata#2693) try no rasterio in py36 env (pydata#2691) Detailed report for testing.assert_equal and testing.assert_identical (pydata#1507) Hotfix for pydata#2662 (pydata#2678) Update README.rst (pydata#2682) Fix test failures with numpy=1.16 (pydata#2675) lint Back to map_dataarray_line Refactor out cmap_params, cbar_kwargs processing Refactor out colorbar making to plot.utils._add_colorbar flake8 facetgrid refactor Refactor out utility functions. ...
Too late, I found this:
And this
p.s. I learned about this from @kwilcox in pangeo-data/pangeo-datastore#3. He might be a good person to loop into this discussion.
If you are interested I could implement an
This PR adds the ability to export Datasets and DataArrays to dictionaries without the actual data. This could be useful for generating indices of dataset contents to expose to search indices or other automated data discovery tools.
In the process of doing this, I refactored the core dictionary export function to live in the Variable class, since the same code was duplicated in several places.
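The shape of that refactor can be sketched with toy classes: the variable-level object owns the dictionary export, and the container only delegates to it. `ToyVariable` and `ToyDataset` are illustrative stand-ins, not xarray's classes, and reporting `dtype` and `shape` in the schema is an assumption based on the suggestions in the review thread.

```python
class ToyVariable:
    """Illustrative stand-in for a variable; not xarray's Variable."""

    def __init__(self, dims, values, attrs=None):
        self.dims = tuple(dims)
        self.values = list(values)
        self.attrs = dict(attrs or {})

    def to_dict(self, data=True):
        item = {"dims": self.dims, "attrs": dict(self.attrs)}
        if data:
            item["data"] = list(self.values)
        else:
            # Schema only: describe the values instead of including them.
            item["dtype"] = type(self.values[0]).__name__ if self.values else None
            item["shape"] = (len(self.values),)
        return item


class ToyDataset:
    """Container that delegates the export to each variable."""

    def __init__(self, data_vars, coords):
        self.data_vars = data_vars
        self.coords = coords

    def to_dict(self, data=True):
        d = {"coords": {}, "data_vars": {}}
        for k, v in self.coords.items():
            d["coords"][k] = v.to_dict(data=data)
        for k, v in self.data_vars.items():
            d["data_vars"][k] = v.to_dict(data=data)
        return d


ds = ToyDataset(
    data_vars={"a": ToyVariable(["t"], [0.5, 1.5, 2.5])},
    coords={"t": ToyVariable(["t"], ["x", "y", "z"])},
)
schema = ds.to_dict(data=False)  # no values, only dims, attrs, dtype, shape
full = ds.to_dict()              # includes the values under 'data'
```

Because the export logic lives in one place, a change such as adding a new schema field only has to be made in `ToyVariable.to_dict`.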
whats-new.rst for all changes and api.rst for new API