Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(fix): don't handle time-dtypes as extension arrays in from_dataframe #9042

Merged
merged 8 commits into from
Jun 11, 2024

Conversation

ilan-gold
Copy link
Contributor

This is probably the simplest way to fix the bug but I now have many questions about PandasIndexingAdapter. I initially attempted to make datetime dtypes go into ExtensionArray (because they are extension arrays) but there are so many tests with behavior that I don't understand that this seemed easier. Primarily, my issue with doing that wasn't that the tests broke in a way that was "this behavior doesn't make sense" but rather "this behavior makes more sense than what the test expects" i.e., there were a number of places where tests expect datetime dtypes to not be preserved, but they are preserved by using ExtensionArray. But I don't understand why certain things are expected or not. For example:

if var.dtype.kind == "M":
assert var.dtype == np.dtype("datetime64[ns]")

became a condition that expected the dtype be preserved and not just blanket converting everything to np.dtype("datetime64[ns]") i.e., the dtype on var remained pd.DatetimeTZDtype("s", pytz.timezone("America/New_York"))

@ilan-gold
Copy link
Contributor Author

@dcherian could you review?

@dcherian dcherian added the plan to merge Final call for comments label Jun 11, 2024
@dcherian dcherian merged commit 52fb457 into pydata:main Jun 11, 2024
29 of 31 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Jun 13, 2024
* upstream/main:
  [skip-ci] Try fixing hypothesis CI trigger (pydata#9112)
  Undo custom padding-top. (pydata#9107)
  add remaining core-dev citations [skip-ci][skip-rtd] (pydata#9110)
  Add user survey announcement to docs (pydata#9101)
  skip the `pandas` datetime roundtrip test with `pandas=3.0` (pydata#9104)
  Adds Matt Savoie to CITATION.cff (pydata#9103)
  [skip-ci] Fix skip-ci for hypothesis (pydata#9102)
  open_datatree performance improvement on NetCDF, H5, and Zarr files (pydata#9014)
  Migrate datatree io.py and common.py into xarray/core (pydata#9011)
  Micro optimizations to improve indexing (pydata#9002)
  (fix): don't handle time-dtypes as extension arrays in `from_dataframe` (pydata#9042)
andersy005 pushed a commit that referenced this pull request Jun 14, 2024
…e` (#9042)

* (fix): don't handle time-dtypes as extension arrays

* (fix): check series type

* Add whats-new

---------

Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential regression in Dataset.from_dataframe() not preserving timezone
3 participants