Source encoding always set when opening datasets #2626

TomNicholas · 2018-12-22T00:02:26Z

Closes #2550 by ensuring that the filename is always stored in the encoding dictionary of a dataset, under 'source'. Previously, whether this key was set depended on the backend.

This was motivated by wanting the preprocess function passed to open_mfdataset to have access to the filename of the dataset it is operating on. A specific use case is when you are opening a spatial grid of datasets, and want to apply a different operation to datasets at the edge of the grid (e.g. to deal with guard cells). It is therefore also relevant for discussion in #2159.

Closes #2550 (Include filename or path in open_mfdataset)
Tests added
Fully documented, including whats-new.rst for all changes and api.rst for new API
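
A minimal sketch of the preprocess use case described above. The file naming scheme (tile_x<N>.nc), the guard-cell width of 2, and the dimension name x are all hypothetical and chosen only for illustration; none of them come from this PR. The only thing taken from the PR is that the filename can be read from ds.encoding['source'].

```python
import re
from pathlib import Path

GUARD = 2  # hypothetical number of guard cells on each side of a tile
_TILE_PATTERN = re.compile(r"tile_x(\d+)\.nc$")  # hypothetical naming scheme


def tile_index(source):
    """Extract the tile position along x from a filename like 'tile_x3.nc'."""
    match = _TILE_PATTERN.search(Path(source).name)
    if match is None:
        raise ValueError(f"unrecognised tile filename: {source!r}")
    return int(match.group(1))


def make_preprocess(n_tiles):
    """Build a preprocess callable for a row of ``n_tiles`` tiles."""

    def preprocess(ds):
        # The source filename is looked up as a dictionary key.
        ix = tile_index(ds.encoding["source"])
        # Interior tiles drop guard cells on both sides; tiles at the edge
        # of the grid keep the cells on their outer boundary.
        lower = 0 if ix == 0 else GUARD
        upper = None if ix == n_tiles - 1 else -GUARD
        return ds.isel(x=slice(lower, upper))

    return preprocess
```

Under these assumptions it would be passed as xr.open_mfdataset(paths, preprocess=make_preprocess(n_tiles=len(paths))), where paths is the list of tile files.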

pep8speaks · 2018-12-22T00:02:31Z

Hello @TomNicholas! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 23, 2018 at 20:05 Hours UTC

shoyer

Thanks Tom! This looks pretty good, just one minor concern.

xarray/backends/api.py

dcherian · 2018-12-28T23:28:29Z

doc/whats-new.rst

@@ -61,6 +61,9 @@ Enhancements
 - :py:meth:`DataArray.resample` and :py:meth:`Dataset.resample` now supports the
  ``loffset`` kwarg just like Pandas.
  By `Deepak Cherian <https://github.com/dcherian>`_
+- Datasets are now guaranteed to have a ``'source'`` encoding, so the source


Suggested change

- Datasets are now guaranteed to have a ``'source'`` encoding, so the source

- Datasets are now guaranteed to have an ``encoding.source`` attribute, so the source

shoyer · Dec 28, 2018

You can't do attribute lookups in encoding, so encoding.source isn't valid.

dcherian · Dec 28, 2018

Oh right. So encoding['source'] then?

shoyer · Dec 29, 2018

Yep, encoding['source'] would work
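
To illustrate the point made in this thread: a dataset's encoding is a plain dictionary, so the source is read by key rather than by attribute. The sketch below uses an ordinary dict as a stand-in for ds.encoding, and the path is a hypothetical placeholder.

```python
# Stand-in for ``ds.encoding``; the path is a hypothetical placeholder.
encoding = {"source": "/data/example.nc"}

# Key lookup works.
source = encoding["source"]

# ``.get`` returns None instead of raising if the key is absent.
missing = encoding.get("not_a_key")

# Attribute lookup does not work on a dict.
try:
    encoding.source
except AttributeError:
    attribute_lookup_works = False
else:
    attribute_lookup_works = True
```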

dcherian · 2018-12-28T23:32:07Z

I think the what's new entry could be a little clearer and it would be good to add this to the docs somewhere. Here's one option: http://xarray.pydata.org/en/stable/io.html?highlight=encoding#reading-encoded-data

LGTM otherwise

shoyer · 2018-12-30T00:24:22Z

I pushed a commit with a few doc additions, will merge shortly.

shoyer · 2018-12-30T01:00:42Z

thanks!

* master:
  DEP: drop python 2 support and associated ci mods (pydata#2637)
  TST: silence warnings from bottleneck (pydata#2638)
  revert to dev version
  DOC: fix docstrings and doc build for 0.11.1
  Source encoding always set when opening datasets (pydata#2626)
  Add flake check to travis (pydata#2632)
  Fix dayofweek and dayofyear attributes from dates generated by cftime_range (pydata#2633)
  silence import warning (pydata#2635)
  fill_value in shift (pydata#2470)
  Flake fixed (pydata#2629)
  Allow passing of positional arguments in `apply` for Groupby objects (pydata#2413)
  Fix failure in time encoding for pandas < 0.21.1 (pydata#2630)
  Fix multiindex selection (pydata#2621)
  Close files when CachingFileManager is garbage collected (pydata#2595)
  added some logic to deal with rasterio objects in addition to filepaths (pydata#2589)
  Get 0d slices of ndarrays directly from indexing (pydata#2625)
  FIX Don't raise a deprecation warning for xarray.ufuncs.{angle,iscomplex} (pydata#2615)
  CF: also decode time bounds when available (pydata#2571)

TomNicholas added 3 commits December 21, 2018 23:42

Add source encoding if not already present when opening dataset

bcfd759

Test source encoding present

dfc55b1

Updated what's new

7858799

TomNicholas and others added 4 commits December 22, 2018 00:02

Merge branch 'real_master' into feature/source_encoding

09f38d0

Merge branch 'master' into feature/source_encoding

2525068

Merge branch 'feature/source_encoding' of https://github.com/TomNicholas/xarray into feature/source_encoding

e365047

Revert "Updated what's new"

bcc5968

This reverts commit 7858799.

shoyer reviewed Dec 23, 2018

xarray/backends/api.py (outdated, resolved)

xarray/backends/api.py (outdated, resolved)

TomNicholas and others added 3 commits December 23, 2018 19:57

Don't close file-like objects

8d62c51

Updated whats's new

d753780

Merge branch 'master' into feature/source_encoding

0e8d497

dcherian reviewed Dec 28, 2018

shoyer added 2 commits December 29, 2018 16:12

Merge branch 'master' into feature/source_encoding

d879107

DOC: document source encoding for datasets

f76593e

shoyer merged commit 250b19c into pydata:master Dec 30, 2018

TomNicholas deleted the feature/source_encoding branch December 31, 2018 16:53
