pyarrow: Check compatibility of pyarrow-backed pandas objects with numeric dtypes #2774
Install pyarrow as an optional dependency, and check that pandas.Series objects backed by pyarrow dtypes (e.g. 'uint8[pyarrow]') can be read by virtualfile_from_vectors.
Remove duck-typing that didn't work, and use a proper try-except `import pyarrow`.
Ok, these two dtypes are not as trivial to support in PyGMT as I thought, because we have a lot of converters that assume numpy-backed arrays. Gonna open a proper issue for this - #2800
Check that pandas.Series and pandas.DataFrame objects backed by pyarrow dtypes (e.g. 'int64[pyarrow]' and 'float64[pyarrow]') can be read by pygmt.info. Using the pandas.util._test_decorators.skip_if_no pytest mark to simplify the pytest parametrize.
Geopandas doesn't support casting to pyarrow dtypes like 'int32[pyarrow]' and 'int64[pyarrow]' yet, but adding an xfail test so that we don't forget to test in the future.
Actually, casting to pyarrow integer dtypes works, but writing to the temporary OGR_GMT file doesn't.
pygmt/tests/test_geopandas.py
pytest.param(
    "int32[pyarrow]",
    marks=[
        td.skip_if_no(package="pyarrow"),
        pytest.mark.xfail(
            reason="geopandas doesn't support writing columns with pyarrow dtypes to OGR_GMT yet."
        ),
    ],
),
pytest.param(
    "int64[pyarrow]",
    marks=[
        td.skip_if_no(package="pyarrow"),
        pytest.mark.xfail(
            reason="geopandas doesn't support writing columns with pyarrow dtypes to OGR_GMT yet."
        ),
    ],
),
So geopandas.GeoDataFrame objects with columns that have pyarrow dtypes can't be plotted in PyGMT yet, because we write them to a temporary OGR_GMT file, and geopandas doesn't support writing int32[pyarrow] or int64[pyarrow] columns. We'll either need to wait for geopandas to support this, or switch PyGMT away from using temporary OGR_GMT files.
Ensure that previous and future versions of GMT are compatible with PyArrow too.
Mention that PyGMT does have some initial support of Pandas objects backed by PyArrow-dtype arrays, but only uint/int/float dtypes for now.
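To make the "uint/int/float only" scope concrete, here is a rough sketch of converting numpy-backed and pyarrow-backed pandas.Series to plain numpy vectors. The helper `as_numpy_vector` is hypothetical, and this is not PyGMT's actual conversion path, which is not shown in this thread. The pyarrow-backed dtypes are only exercised when pyarrow is importable.

```python
import importlib.util

import numpy as np
import pandas as pd

# Always cover the numpy-backed dtypes; add the pyarrow-backed ones only when
# the optional pyarrow package is available.
dtypes = ["uint8", "int64", "float64"]
if importlib.util.find_spec("pyarrow") is not None:
    dtypes += ["uint8[pyarrow]", "int64[pyarrow]", "float64[pyarrow]"]


def as_numpy_vector(series: pd.Series) -> np.ndarray:
    """Hypothetical helper: return the Series values as a contiguous 1-D numpy array."""
    return np.ascontiguousarray(series.to_numpy())


# One converted vector per dtype, keyed by the dtype string.
vectors = {
    dtype: as_numpy_vector(pd.Series([1, 2, 3], dtype=dtype)) for dtype in dtypes
}
```

Columns containing nulls, strings, or datetimes are left out on purpose, since the description lists those as still to be checked.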
Ready for review! I've added
Encountered a bug with passing
I don't have access to my computer this weekend. Will find some time this Sunday night.
Implementing the td.skip_if_no pytest mark inline instead of using the private function from pandas.
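One way an inline replacement for the private pandas helper could look is sketched below. The function name `skip_if_no` mirrors the pandas helper, but this body is an assumption for illustration, not the code that was committed; it relies only on `importlib.util.find_spec` and `pytest.mark.skipif`.

```python
import importlib.util

import pytest


def skip_if_no(package: str):
    """Hypothetical stand-in for pandas' private helper.

    Returns a skipif mark that skips the test when `package` cannot be found.
    """
    return pytest.mark.skipif(
        importlib.util.find_spec(package) is None,
        reason=f"Could not import '{package}'",
    )


# The numpy-backed case always runs; the pyarrow-backed case is skipped when
# pyarrow is not installed.
DTYPE_PARAMS = [
    pytest.param("int64", id="numpy"),
    pytest.param("int64[pyarrow]", marks=skip_if_no("pyarrow"), id="pyarrow"),
]


@pytest.mark.parametrize("dtype", DTYPE_PARAMS)
def test_dtype_is_int64(dtype):
    # Placeholder assertion; a real test would build a pandas object with this dtype.
    assert dtype.startswith("int64")
```

Keeping the mark local avoids depending on `pandas.util._test_decorators`, which is a private module and may change between pandas releases.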
Co-Authored-By: Dongdong Tian <[email protected]>
Cleaner way to check if pyarrow is installed or not.
Co-authored-by: Dongdong Tian <[email protected]>
Description of proposed changes
Pandas is moving towards using PyArrow-backed arrays (previously just numpy), and will require pyarrow in pandas 3.0, so we should ensure that pyarrow dtypes work across PyGMT.
This Pull Request installs pyarrow as an optional dependency, and updates some unit tests to check both numpy and pyarrow-backed dtypes for pandas.Series and pandas.DataFrame objects.
TODO:
[ ] Check null and bool types?
[ ] Check string dtypes
[ ] Check datetime dtypes
Relates to #1318 (comment), part of #2800
Reminders
make format and make check to make sure the code follows the style guide.
doc/api/index.rst.
Slash Commands
You can write slash commands (/command) in the first line of a comment to perform specific operations. Supported slash commands are:
/format: automatically format and lint the code
/test-gmt-dev: run full tests on the latest GMT development version