pint support for Dataset #3975

Merged
merged 60 commits into from
Jun 17, 2020

keewis
Collaborator

@keewis keewis commented Apr 15, 2020

This is part of the effort to add pint support (see #3594) to Dataset objects. It will probably be a test-only PR, just like #3643.

  • Tests added
  • Passes isort -rc . && black . && mypy . && flake8
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

The list of failing tests from #3594:

  • Dataset methods
    • __init__: needs unit support in IndexVariable, and merge does not work yet (a bug in the test is also possible)
    • aggregation: xarray does not implement __array_function__ (see #3917, running numpy functions on xarray objects)
    • rank: depends on bottleneck and thus only works with numpy.ndarray
    • ffill, bfill: use bottleneck
    • interpolate_na: uses numpy.vectorize, which does not support NEP-18 yet
    • equals, identical: works (but no units / unit checking in IndexVariable)
    • broadcast_like: works (but no units / unit checking in IndexVariable)
    • to_stacked_array: no units in IndexVariable
    • sel, loc: no units in IndexVariable
    • interp, reindex: partially blocked by IndexVariable; reindex works with units in the data, but interp uses scipy
    • interp_like, reindex_like: same as interp / reindex
    • quantile: works, but needs pint >= 0.12
    • groupby_bins: needs pint >= 0.12 (for isclose)
    • rolling: uses numpy.lib.stride_tricks.as_strided
    • rolling_exp: uses numbagg (supports NEP-18, but pint doesn't support its functions)
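For the aggregation item, the core issue is NEP-18 dispatch: when a NumPy function receives an object that does not implement __array_function__, many functions simply convert it with np.asarray, which drops any attached unit. Below is a minimal sketch, assuming NumPy >= 1.17; the classes TaggedNoDispatch and TaggedWithDispatch are hypothetical stand-ins (not xarray or pint classes) and only np.median is handled.

```python
import numpy as np


class TaggedNoDispatch:
    """Hypothetical array wrapper carrying a unit, without NEP-18 support."""

    def __init__(self, data, unit):
        self.data = np.asarray(data)
        self.unit = unit

    def __array__(self, dtype=None, copy=None):
        # NumPy falls back to this conversion, so the unit is lost
        return self.data if dtype is None else self.data.astype(dtype)


class TaggedWithDispatch(TaggedNoDispatch):
    """Same wrapper, but NumPy functions are routed through __array_function__."""

    def __array_function__(self, func, types, args, kwargs):
        if func is not np.median:
            return NotImplemented  # this sketch only supports np.median
        # unwrap, compute on the raw data, then re-attach the unit
        result = func(args[0].data, *args[1:], **kwargs)
        return TaggedWithDispatch(result, self.unit)


plain = np.median(TaggedNoDispatch([1.0, 2.0, 3.0], "m"))
wrapped = np.median(TaggedWithDispatch([1.0, 2.0, 3.0], "m"))
print(type(plain).__name__, plain)        # float64 2.0 (unit gone)
print(wrapped.unit, float(wrapped.data))  # m 2.0
```

The same mechanism is why numpy functions called on xarray objects cannot preserve pint units until xarray itself implements the protocol.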

@keewis keewis changed the title from "pint support for datasets" to "pint support for Dataset" Apr 15, 2020
@dcherian dcherian mentioned this pull request May 5, 2020
@keewis
Collaborator Author

keewis commented May 27, 2020

It seems assert_allclose was one of the sources of UnitStrippedWarnings. However, since there's a bug in pint's isclose (fixed on master), the tests now fail.

Edit: a new pint version should be released in the next few days, so most of the failing CI should pass after that.

Also, because assert_allclose cast to numpy, a few bugs were hidden.
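The hidden bugs come from comparing numbers after the units are gone. A pure-Python sketch of the difference, assuming a toy Quantity dataclass and the helpers allclose_stripped / allclose_unit_aware (all hypothetical, not the xarray or pint API):

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Quantity:
    """Hypothetical stand-in for a unit-carrying array (not pint's class)."""

    magnitude: List[float]
    unit: str


def allclose_stripped(a: Quantity, b: Quantity) -> bool:
    # Mimics casting to plain numbers first: only magnitudes are compared.
    return len(a.magnitude) == len(b.magnitude) and all(
        math.isclose(x, y) for x, y in zip(a.magnitude, b.magnitude)
    )


def allclose_unit_aware(a: Quantity, b: Quantity) -> bool:
    # Compare units first. This toy version requires identical units and does
    # not convert between compatible units (e.g. m and km) the way pint can.
    return a.unit == b.unit and allclose_stripped(a, b)


length = Quantity([1.0, 2.0], "m")
duration = Quantity([1.0, 2.0], "s")
print(allclose_stripped(length, duration))    # True: the unit mismatch is hidden
print(allclose_unit_aware(length, duration))  # False
```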

@dcherian
Contributor

Thanks for working on this, @keewis. Since there are no changes outside test_units.py, I think you should merge whenever you consider it ready.

@keewis
Collaborator Author

keewis commented Jun 17, 2020

I just found another issue: pint implements prod (but not yet nanprod), so the prod tests could be un-xfailed. However, we define a custom nanprod function that uses where to replace nan with 1. This won't work with quantities: unlike nan and 0, a bare 1 cannot be put into a quantity with a dimension (i.e. with a unit other than dimensionless).

I don't really understand the purpose of nanprod's min_count parameter (and _maybe_null_out), so I'm not sure how to fix that.

For now, I think we can merge this PR on green, and I'll add that issue to the list in #3594.
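On min_count: xarray documents it as the required number of valid (non-NA) values for a reduction; if fewer are present, the result is NA. A minimal pure-Python sketch of that behaviour for a product follows. The function name and list input are hypothetical, and skipping NaNs here is equivalent to the where-based replacement with 1 mentioned above, which is precisely the step that cannot be applied to a dimensioned quantity.

```python
import math
from typing import Optional, Sequence


def nanprod(values: Sequence[float], min_count: Optional[int] = None) -> float:
    """Product that skips NaNs, with an xarray-style min_count check."""
    # Skipping NaNs gives the same result as replacing them with 1,
    # the multiplicative identity, before multiplying.
    valid = [v for v in values if not math.isnan(v)]
    # Null out the result if there are too few valid values.
    if min_count is not None and len(valid) < min_count:
        return math.nan
    return float(math.prod(valid))  # the empty product is 1


nan = math.nan
print(nanprod([2.0, nan, 3.0]))          # 6.0
print(nanprod([nan, nan]))               # 1.0: nothing left to multiply
print(nanprod([nan, nan], min_count=1))  # nan
```

Without min_count, an all-NaN slice silently becomes 1; with min_count=1 it stays NaN instead.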

@dcherian
Contributor

Looks like an incompatibility with the latest pandas:

____________________ TestDataset.test_resample[int-coords] _____________________

self = <xarray.tests.test_units.TestDataset object at 0x7fa84fb9e490>
variant = 'coords', dtype = <class 'int'>
    return func(*all_args, **all_kwargs)
xarray/core/common.py:1123: in resample
    grouper = pd.Grouper(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'pandas.core.groupby.grouper.Grouper'>, args = ()
kwargs = {'base': 0, 'closed': None, 'freq': '6m', 'label': None, ...}
TimeGrouper = <class 'pandas.core.resample.TimeGrouper'>, stacklevel = 2

    def __new__(cls, *args, **kwargs):
        if kwargs.get("freq") is not None:
            from pandas.core.resample import TimeGrouper
    
            # Deprecation warning of `base` and `loffset` since v1.1.0:
            # we are raising the warning here to be able to set the `stacklevel`
            # properly since we need to raise the `base` and `loffset` deprecation
            # warning from three different cases:
            #   core/generic.py::NDFrame.resample
            #   core/groupby/groupby.py::GroupBy.resample
            #   core/groupby/grouper.py::Grouper
            # raising these warnings from TimeGrouper directly would fail the test:
            #   tests/resample/test_deprecated.py::test_deprecating_on_loffset_and_base
    
            # hacky way to set the stacklevel: if cls is TimeGrouper it means
            # that the call comes from a pandas internal call of resample,
            # otherwise it comes from pd.Grouper
            stacklevel = 4 if cls is TimeGrouper else 2
            if kwargs.get("base", None) is not None:
>               warnings.warn(
                    "'base' in .resample() and in Grouper() is deprecated.\n"
                    "The new arguments that you should use are 'offset' or 'origin'.\n"
                    '\n>>> df.resample(freq="3s", base=2)\n'
                    "\nbecomes:\n"
                    '\n>>> df.resample(freq="3s", offset="2s")\n',
                    FutureWarning,
                    stacklevel=stacklevel,
                )
E               FutureWarning: 'base' in .resample() and in Grouper() is deprecated.
E               The new arguments that you should use are 'offset' or 'origin'.
E               
E               >>> df.resample(freq="3s", base=2)
E               
E               becomes:
E               
E               >>> df.resample(freq="3s", offset="2s")

@keewis
Collaborator Author

keewis commented Jun 17, 2020

Yeah, I don't know how to filter only for pint warnings. I tried pytest.mark.filterwarnings("error:::pint[.*]"), but that doesn't work.

Edit: pytest.mark.filterwarnings("error::pint.UnitStrippedWarning") works so I'm merging.
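The mark's argument follows the action:message:category:module:lineno format of Python's warning filters, so "error::pint.UnitStrippedWarning" escalates one category only. The module-based attempt most likely failed because the module field is matched against the module a warning is attributed to, which depends on stacklevel and is typically the calling code rather than pint. A stdlib-only sketch of the category-based equivalent, where UnitStrippedWarning is a hypothetical local stand-in for pint's class:

```python
import warnings
from typing import Callable, List, Tuple


class UnitStrippedWarning(UserWarning):
    """Hypothetical local stand-in for pint.UnitStrippedWarning."""


def strip_units() -> None:
    warnings.warn("stand-in for pint's unit-stripping warning", UnitStrippedWarning)


def call_deprecated_pandas_api() -> None:
    warnings.warn("'base' in .resample() and in Grouper() is deprecated.", FutureWarning)


def run(func: Callable[[], None]) -> Tuple[str, List[type]]:
    """Run func with only UnitStrippedWarning escalated to an error."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every other warning
        # Roughly what "error::pint.UnitStrippedWarning" adds; filterwarnings()
        # inserts at the front of the filter list, so this filter wins.
        warnings.filterwarnings("error", category=UnitStrippedWarning)
        try:
            func()
        except UnitStrippedWarning:
            return "error", [w.category for w in caught]
        return "ok", [w.category for w in caught]


print(run(strip_units))                 # ('error', [])
print(run(call_deprecated_pandas_api))  # ('ok', [<class 'FutureWarning'>])
```

The pandas FutureWarning is only recorded, so the unrelated deprecation no longer fails the pint tests.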

@keewis keewis merged commit 66e7730 into pydata:master Jun 17, 2020
@keewis keewis deleted the pint-support-dataset branch June 17, 2020 20:40
@keewis keewis mentioned this pull request Jun 17, 2020
dcherian added a commit to TomNicholas/xarray that referenced this pull request Jun 24, 2020
…o-combine

* 'master' of github.com:pydata/xarray: (81 commits)
  use builtin python types instead of the numpy alias (pydata#4170)
  Revise pull request template (pydata#4039)
  pint support for Dataset (pydata#3975)
  drop eccodes in docs (pydata#4162)
  Update issue templates inspired/based on dask (pydata#4154)
  Fix failing upstream-dev build & remove docs build (pydata#4160)
  Improve typehints of xr.Dataset.__getitem__ (pydata#4144)
  provide a error summary for assert_allclose (pydata#3847)
  built-in accessor documentation (pydata#3988)
  Recommend installing cftime when time decoding fails. (pydata#4134)
  parameter documentation for DataArray.sel (pydata#4150)
  speed up map_blocks (pydata#4149)
  Remove outdated note from datetime accessor docstring (pydata#4148)
  Fix the upstream-dev pandas build failure (pydata#4138)
  map_blocks: Allow passing dask-backed objects in args (pydata#3818)
  keep attrs in reset_index (pydata#4103)
  Fix open_rasterio() for WarpedVRT with specified src_crs (pydata#4104)
  Allow non-unique and non-monotonic coordinates in get_clean_interp_index and polyfit (pydata#4099)
  update numpy's intersphinx url (pydata#4117)
  xr.infer_freq (pydata#4033)
  ...
dcherian added a commit to raphaeldussin/xarray that referenced this pull request Jul 1, 2020
* upstream/master: (21 commits)
  fix typo in error message in plot.py (pydata#4188)
  Support multiple dimensions in DataArray.argmin() and DataArray.argmax() methods (pydata#3936)
  Show data by default in HTML repr for DataArray (pydata#4182)
  Blackdoc (pydata#4177)
  Add CONTRIBUTING.md for the benefit of GitHub
  Correct dask handling for 1D idxmax/min on ND data (pydata#4135)
  use assert_allclose in the aggregation-with-units tests (pydata#4174)
  Remove old auto combine (pydata#3926)
  Fix 4009 (pydata#4173)
  Limit length of dataarray reprs (pydata#3905)
  Remove <pre> from nested HTML repr (pydata#4171)
  Proposal for better error message about in-place operation (pydata#3976)
  use builtin python types instead of the numpy alias (pydata#4170)
  Revise pull request template (pydata#4039)
  pint support for Dataset (pydata#3975)
  drop eccodes in docs (pydata#4162)
  Update issue templates inspired/based on dask (pydata#4154)
  Fix failing upstream-dev build & remove docs build (pydata#4160)
  Improve typehints of xr.Dataset.__getitem__ (pydata#4144)
  provide a error summary for assert_allclose (pydata#3847)
  ...
2 participants