
Support pandas copy-on-write behaviour #8846

Merged · 6 commits into pydata:main · Mar 18, 2024
Conversation

dcherian (Contributor)

import numpy as np
import pandas as pd

pd.set_option("mode.copy_on_write", True)

from xarray.core.variable import _possibly_convert_objects

string_var = np.array(["a", "bc", "def"], dtype=object)
datetime_var = np.array(
    ["2019-01-01", "2019-01-02", "2019-01-03"], dtype="datetime64[ns]"
)
# With copy-on-write enabled, both assertions fail: pandas hands back
# read-only arrays.
assert _possibly_convert_objects(string_var).flags.writeable
assert _possibly_convert_objects(datetime_var).flags.writeable

The core issue is that we now get read-only arrays back from pandas here:

def _possibly_convert_objects(values):
    """Convert arrays of datetime.datetime and datetime.timedelta objects into
    datetime64 and timedelta64, according to the pandas convention. For the time
    being, convert any non-nanosecond precision DatetimeIndex or TimedeltaIndex
    objects to nanosecond precision. While pandas is relaxing this in version
    2.0.0, in xarray we will need to make sure we are ready to handle
    non-nanosecond precision datetimes or timedeltas in our code before allowing
    such values to pass through unchanged. Converting to nanosecond precision
    through pandas.Series objects ensures that datetimes and timedeltas are
    within the valid date range for ns precision, as pandas will raise an error
    if they are not.
    """
    as_series = pd.Series(values.ravel(), copy=False)
    if as_series.dtype.kind in "mM":
        as_series = _as_nanosecond_precision(as_series)
    return np.asarray(as_series).reshape(values.shape)
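The symptom can be reproduced without xarray at all — this is a minimal standalone sketch (not the xarray function itself) of the same Series round-trip, assuming pandas >= 2.0 where the `mode.copy_on_write` option exists (it is a no-op or removed once copy-on-write is always on):

```python
import numpy as np
import pandas as pd

# The option exists in pandas 2.x; newer versions may reject it because
# copy-on-write is always enabled there.
try:
    pd.set_option("mode.copy_on_write", True)
except Exception:
    pass

values = np.array(["2019-01-01", "2019-01-02"], dtype="datetime64[ns]")

# Round-trip through a Series the way _possibly_convert_objects does.
as_series = pd.Series(values.ravel(), copy=False)
result = np.asarray(as_series).reshape(values.shape)

# Under copy-on-write, `result` can be a read-only view of the Series'
# internal buffer, so in-place writes to it would raise ValueError.
print(result.flags.writeable)
```

The values themselves round-trip unchanged; only the `writeable` flag on the returned array differs between copy-on-write and legacy pandas.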

@phofl is this expected?

@dcherian dcherian added the run-upstream Run upstream CI label Mar 16, 2024
@phofl (Contributor) commented Mar 16, 2024

Yes, pandas now avoids copies wherever possible, which means an in-place modification made outside of pandas could modify an arbitrary number of pandas objects. That's why we return read-only arrays (the same as you would now get with arrow-backed arrays). You can either copy the array or reset its writeable flag manually if you want to get rid of that.
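The two remedies mentioned above can be sketched as follows — a minimal illustration, not the fix this PR actually applies, assuming pandas >= 2.0 (where the `mode.copy_on_write` option exists):

```python
import numpy as np
import pandas as pd

try:
    pd.set_option("mode.copy_on_write", True)  # no-op/absent in newer pandas
except Exception:
    pass

values = np.array(["2019-01-01", "2019-01-02"], dtype="datetime64[ns]")
arr = np.asarray(pd.Series(values, copy=False))

# Option 1: take an explicit copy. The copy is always writeable and is
# detached from any buffer that pandas objects might share.
copied = arr.copy()
assert copied.flags.writeable

# Option 2: reset the flag in place. NumPy allows this when the underlying
# base buffer is writeable, but writes may then be visible to pandas
# objects sharing that buffer -- use only if you accept that.
arr2 = np.asarray(pd.Series(values, copy=False))
if not arr2.flags.writeable:
    arr2.flags.writeable = True
```

Copying is the safer default; resetting the flag avoids the copy but reintroduces exactly the aliasing that copy-on-write is designed to prevent.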

Review comment on xarray/tests/__init__.py (outdated, resolved)
@dcherian dcherian marked this pull request as ready for review March 16, 2024 03:38
@dcherian dcherian marked this pull request as draft March 16, 2024 04:05
@dcherian dcherian marked this pull request as ready for review March 16, 2024 04:15
@dcherian dcherian added plan to merge Final call for comments and removed needs review labels Mar 18, 2024
@dcherian (Contributor, Author)

Merging so we can get more useful upstream failure reports.

@dcherian dcherian merged commit c6c01b1 into pydata:main Mar 18, 2024
27 of 30 checks passed
@dcherian dcherian deleted the fix-pd-cow branch March 18, 2024 16:00
dcherian added a commit to kmsquire/xarray that referenced this pull request Mar 21, 2024
* upstream/main: (765 commits)
  increase typing annotations coverage in `xarray/core/indexing.py` (pydata#8857)
  pandas 3 MultiIndex fixes (pydata#8847)
  FIX: adapt handling of copy keyword argument in scipy backend for numpy >= 2.0dev (pydata#8851)
  FIX: do not cast _FillValue/missing_value in CFMaskCoder if _Unsigned is provided (pydata#8852)
  Implement setitem syntax for `.oindex` and `.vindex` properties (pydata#8845)
  Support pandas copy-on-write behaviour (pydata#8846)
  correctly encode/decode _FillValues/missing_values/dtypes for packed data (pydata#8713)
  Expand use of `.oindex` and `.vindex` (pydata#8790)
  Return a dataclass from Grouper.factorize (pydata#8777)
  [skip-ci] Fix upstream-dev env (pydata#8839)
  Add dask-expr for windows envs (pydata#8837)
  [skip-ci] Add dask-expr dependency to doc.yml (pydata#8835)
  Add `dask-expr` to environment-3.12.yml (pydata#8827)
  Make list_chunkmanagers more resilient to broken entrypoints (pydata#8736)
  Do not attempt to broadcast when global option ``arithmetic_broadcast=False`` (pydata#8784)
  try to get the `upstream-dev` CI to complete again (pydata#8823)
  Bump the actions group with 1 update (pydata#8818)
  Update documentation for clarity (pydata#8817)
  DOC: link to zarr.convenience.consolidate_metadata (pydata#8816)
  Refactor Grouper objects (pydata#8776)
  ...
Labels: plan to merge (Final call for comments), run-upstream (Run upstream CI)
Linked issue (may be closed by this pull request): Get ready for pandas 3 copy-on-write
3 participants