Source encoding always set when opening datasets #2626

Merged
shoyer merged 12 commits into pydata:master from feature/source_encoding
Dec 30, 2018

Conversation

TomNicholas
Member

Closes #2550 by ensuring that the filename is always stored in a dataset's encoding dictionary, under the key 'source'. Previously, whether it was set depended on the backend.

This was motivated by wanting the preprocess function passed to open_mfdataset to have access to the filename of the dataset it is operating on. A specific use case is when you are opening a spatial grid of datasets, and want to apply a different operation to datasets at the edge of the grid (e.g. to deal with guard cells). It is therefore also relevant for discussion in #2159.
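A minimal sketch of that use case follows. It assumes a hypothetical file naming scheme (`grid_x<i>.nc`), a dimension named `x`, and a guard width of 2; none of these come from the PR. The helper names `grid_index` and `guard_cell_slice` are likewise invented for illustration, and the datasets are replaced by stand-in objects so the block runs without any files.

```python
import os
import re
from types import SimpleNamespace

# Assumed (hypothetical) naming scheme: one file per grid column, e.g. "grid_x3.nc".
_GRID_INDEX = re.compile(r"_x(\d+)\.nc$")


def grid_index(ds):
    """Parse the grid index out of the file path stored in ds.encoding['source']."""
    source = ds.encoding["source"]
    match = _GRID_INDEX.search(os.path.basename(source))
    if match is None:
        raise ValueError(f"cannot parse a grid index from {source!r}")
    return int(match.group(1))


def guard_cell_slice(ds, n_files, guard=2):
    """Return the slice along x that drops guard cells shared with a neighbour.

    Datasets at the edge of the grid keep their outer guard cells.
    """
    idx = grid_index(ds)
    start = None if idx == 0 else guard
    stop = None if idx == n_files - 1 else -guard
    return slice(start, stop)


# Stand-ins for opened datasets; only the .encoding mapping matters here.
left = SimpleNamespace(encoding={"source": "/data/run/grid_x0.nc"})
middle = SimpleNamespace(encoding={"source": "/data/run/grid_x1.nc"})

print(guard_cell_slice(left, n_files=3))    # slice(None, -2, None)
print(guard_cell_slice(middle, n_files=3))  # slice(2, -2, None)

# With xarray, a preprocess function could then be written as
#   lambda ds: ds.isel(x=guard_cell_slice(ds, n_files=3))
# and passed to xarray.open_mfdataset(paths, preprocess=...).
```

The point of the sketch is only that the preprocess function can branch on `ds.encoding['source']`; how the filename maps to a grid position depends entirely on the data layout.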

@pep8speaks
pep8speaks commented Dec 22, 2018

Hello @TomNicholas! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 23, 2018 at 20:05 Hours UTC

Member

shoyer left a comment

Thanks Tom! This looks pretty good, just one minor concern.

xarray/backends/api.py: two outdated, resolved review threads (collapsed)
@@ -61,6 +61,9 @@ Enhancements
- :py:meth:`DataArray.resample` and :py:meth:`Dataset.resample` now supports the
``loffset`` kwarg just like Pandas.
By `Deepak Cherian <https://github.com/dcherian>`_
- Datasets are now guaranteed to have a ``'source'`` encoding, so the source
Contributor

Suggested change (current line, then proposed line):
- Datasets are now guaranteed to have a ``'source'`` encoding, so the source
- Datasets are now guaranteed to have an ``encoding.source`` attribute, so the source

Member

You can't do attribute lookups in encoding, so encoding.source isn't valid.

Contributor

Oh right. So encoding['source'] then?

Member

Yep, encoding['source'] would work
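To illustrate the distinction discussed in this thread, here is a small sketch that uses a plain dict as a stand-in for `ds.encoding` (the path is made up): key lookup works, while attribute lookup raises `AttributeError`.

```python
# Stand-in for ds.encoding, which behaves like a plain dict.
encoding = {"source": "/data/run/example.nc"}

# Key lookup works.
source = encoding["source"]
print(source)

# Attribute lookup does not: dicts expose keys via [], not via dot access.
try:
    encoding.source
except AttributeError as err:
    attribute_error = err
    print(f"AttributeError: {err}")
```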

@dcherian
Contributor

I think the what's new entry could be a little clearer and it would be good to add this to the docs somewhere. Here's one option: http://xarray.pydata.org/en/stable/io.html?highlight=encoding#reading-encoded-data

LGTM otherwise

@shoyer
Member

shoyer commented Dec 30, 2018

I pushed a commit with a few doc additions, will merge shortly.

shoyer merged commit 250b19c into pydata:master on Dec 30, 2018
@shoyer
Member

shoyer commented Dec 30, 2018

thanks!

TomNicholas deleted the feature/source_encoding branch on December 31, 2018 16:53
dcherian pushed a commit to yohai/xarray that referenced this pull request Jan 2, 2019
* master:
  DEP: drop python 2 support and associated ci mods (pydata#2637)
  TST: silence warnings from bottleneck (pydata#2638)
  revert to dev version
  DOC: fix docstrings and doc build for 0.11.1
  Source encoding always set when opening datasets (pydata#2626)
  Add flake check to travis (pydata#2632)
  Fix dayofweek and dayofyear attributes from dates generated by cftime_range (pydata#2633)
  silence import warning (pydata#2635)
  fill_value in shift (pydata#2470)
  Flake fixed (pydata#2629)
  Allow passing of positional arguments in `apply` for Groupby objects (pydata#2413)
  Fix failure in time encoding for pandas < 0.21.1 (pydata#2630)
  Fix multiindex selection (pydata#2621)
  Close files when CachingFileManager is garbage collected (pydata#2595)
  added some logic to deal with rasterio objects in addition to filepaths (pydata#2589)
  Get 0d slices of ndarrays directly from indexing (pydata#2625)
  FIX Don't raise a deprecation warning for xarray.ufuncs.{angle,iscomplex} (pydata#2615)
  CF: also decode time bounds when available (pydata#2571)