pyarrow: Support date32[day] and date64[ms] dtypes in pandas objects #2845

Merged
merged 9 commits into from
Dec 17, 2023

Conversation

weiji14
Member

@weiji14 weiji14 commented Dec 3, 2023

Description of proposed changes

Handle date columns in pandas.DataFrame with pyarrow dtypes like date32[day][pyarrow] or date64[ms][pyarrow].

This is implemented by modifying the vectors_to_arrays conversion function to convert date32/date64 dtypes to numpy.datetime64.
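The conversion described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `vectors_to_arrays_sketch` and `FakeDateColumn` are hypothetical names, and the stand-in class only mimics the `dtype` string of a pyarrow-backed column so that the example runs with NumPy alone.

```python
import datetime

import numpy as np


def vectors_to_arrays_sketch(vectors):
    """Convert 1-D vectors to C-contiguous NumPy arrays (illustrative only)."""
    arrays = []
    for vector in vectors:
        # Python lists have no ``dtype`` attribute, so default to "".
        vec_dtype = str(getattr(vector, "dtype", ""))
        # "date32[day][pyarrow]" and "date64[ms][pyarrow]" both start with "date".
        target = np.datetime64 if vec_dtype.startswith("date") else None
        arrays.append(np.ascontiguousarray(np.asarray(vector, dtype=target)))
    return arrays


class FakeDateColumn(list):
    """Hypothetical stand-in that only carries a pyarrow-style dtype string."""

    dtype = "date32[day][pyarrow]"


dates = FakeDateColumn([datetime.date(2021, 1, 1), datetime.date(2021, 1, 2)])
date_array, float_array = vectors_to_arrays_sketch([dates, [1.5, 2.5]])
print(date_array.dtype.kind)  # "M" is NumPy's kind code for datetime64
print(float_array.dtype)
```

With pandas and pyarrow installed, a column created with `dtype="date32[day][pyarrow]"` reports that same string through `str(series.dtype)`, so it would be expected to take the same branch as the stand-in here.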

An alternative implementation would be to handle pyarrow date32/date64 dtypes natively without converting, but this might require more work and discussion at #2800.

Note:

References:

Part of #2800, related to #242, extends #464

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.
  • Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

  • /format: automatically format and lint the code
  • /test-gmt-dev: run full tests on the latest GMT development version

Handle date columns in pandas.DataFrame with pyarrow dtypes like date32[day][pyarrow] or date64[ms][pyarrow] by modifying the vectors_to_arrays conversion function. Added some parametrized unit tests to test_info.py to ensure this works.
@weiji14 weiji14 added the enhancement Improving an existing feature label Dec 3, 2023
@weiji14 weiji14 added this to the 0.11.0 milestone Dec 3, 2023
@weiji14 weiji14 self-assigned this Dec 3, 2023
Need to handle Python lists that don't have the dtype attribute, unlike pandas.Series objects. Also ensure that we return a C-contiguous array.
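The two concerns in this commit message can be reproduced with NumPy alone. The snippet below is an illustrative sketch with made-up variable names, not code taken from the PR: it shows that a plain list needs a default when its `dtype` is looked up, and that a strided view is not C-contiguous until it is copied.

```python
import numpy as np

values = [1.0, 2.0, 3.0]
# A list has no ``dtype`` attribute; getattr with a default avoids AttributeError.
list_dtype = str(getattr(values, "dtype", ""))

grid = np.arange(12.0).reshape(3, 4)
column = grid[:, 1]  # strided view into ``grid``, not C-contiguous
fixed = np.ascontiguousarray(column)  # copies into a contiguous buffer

print(repr(list_dtype))
print(column.flags.c_contiguous, fixed.flags.c_contiguous)
```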
Ensure that pyarrow date32 and date64 dtypes are converted to numpy.datetime64 dtype. Added pyarrow dependency to ci_doctests.yaml. Also changed from using `"date" in vec_dtype` to `vec_dtype.startswith("date")`.
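The difference between the substring check and the prefix check can be seen on plain strings. This is a small sketch: the nested `list<...>` name is an assumed spelling used only for illustration, and the commit message does not say which dtype names motivated the change.

```python
candidates = [
    "date32[day][pyarrow]",
    "date64[ms][pyarrow]",
    "list<item: date32[day]>[pyarrow]",  # assumed spelling of a nested dtype
    "float64",
]

for name in candidates:
    print(f"{name!r}: substring={'date' in name}, prefix={name.startswith('date')}")

# Names that the substring check accepts but the prefix check rejects.
substring_only = [
    name for name in candidates if "date" in name and not name.startswith("date")
]
print(substring_only)
```

The prefix check is the narrower of the two, since it only accepts names that begin with `date`.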
@weiji14 weiji14 changed the title Support pyarrow date32/date64 dtypes in pandas objects pyarrow: Support pyarrow date32/date64 dtypes in pandas objects Dec 9, 2023
@weiji14 weiji14 changed the title pyarrow: Support pyarrow date32/date64 dtypes in pandas objects pyarrow: Support date32[day]/date64[ms] dtypes in pandas objects Dec 9, 2023
@weiji14 weiji14 changed the title pyarrow: Support date32[day]/date64[ms] dtypes in pandas objects pyarrow: Support date32[day] and date64[ms] dtypes in pandas objects Dec 9, 2023
@weiji14 weiji14 added the needs review This PR has higher priority and needs review. label Dec 16, 2023
Member

@seisman seisman left a comment

Looks good to me, but the PR needs to be marked as ready for review, and the main branch needs to be merged in to make sure all tests pass.

@weiji14
Member Author

weiji14 commented Dec 17, 2023

I'm still debating on whether to add pyarrow into test_plot_datetime that was added in #464:

@pytest.mark.mpl_image_compare
def test_plot_datetime():
    """
    Test various datetime input data.
    """
    fig = Figure()
    fig.basemap(
        projection="X15c/5c",
        region=[
            np.array("2010-01-01T00:00:00", dtype=np.datetime64),
            pd.Timestamp("2020-01-01"),
            0,
            10,
        ],
        frame=True,
    )
    # numpy.datetime64 types
    x = np.array(
        ["2010-06-01", "2011-06-01T12", "2012-01-01T12:34:56"], dtype="datetime64"
    )
    y = [1.0, 2.0, 3.0]
    fig.plot(x=x, y=y, style="c0.2c", pen="1p")
    # pandas.DatetimeIndex
    x = pd.date_range("2013", freq="YS", periods=3)
    y = [4, 5, 6]
    fig.plot(x=x, y=y, style="t0.2c", pen="1p")
    # xarray.DataArray
    x = xr.DataArray(data=pd.date_range(start="2015-03", freq="QS", periods=3))
    y = [7.5, 6, 4.5]
    fig.plot(x=x, y=y, style="s0.2c", pen="1p")
    # raw datetime strings
    x = ["2016-02-01", "2017-03-04T00:00"]
    y = [7, 8]
    fig.plot(x=x, y=y, style="a0.2c", pen="1p")
    # the Python built-in datetime and date
    x = [datetime.date(2018, 1, 1), datetime.datetime(2019, 1, 1)]
    y = [8.5, 9.5]
    fig.plot(x=x, y=y, style="i0.2c", pen="1p")
    return fig

That test is already a bit long though ...

@weiji14 weiji14 added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Dec 17, 2023
@weiji14 weiji14 marked this pull request as ready for review December 17, 2023 03:48
@weiji14
Member Author

weiji14 commented Dec 17, 2023

I'm still debating on whether to add pyarrow into test_plot_datetime that was added in #464:

On second thought, I might do this in a follow-up PR. PyArrow also has time32/time64 types, so I might add a test_plot_datetime_pyarrow unit test that reuses test_plot_datetime's baseline image but uses pyarrow inputs instead.

@weiji14 weiji14 merged commit 005de65 into main Dec 17, 2023
17 of 23 checks passed
@weiji14 weiji14 deleted the pyarrow/date-dtype branch December 17, 2023 05:46
@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Jan 14, 2024