pyarrow: Check compatibility of pyarrow.array and pyarrow.table with numeric and timestamp types #2864

Draft: wants to merge 9 commits into main

@weiji14 (Member) commented Dec 9, 2023

Description of proposed changes

Check that pyarrow.array and pyarrow.table objects of various types (uint/int/timestamp) can be passed to PyGMT modules/functions.

TODO:

  • Initial compatibility check of pyarrow.array by adding parametrized tests to pygmt.info
  • Add pyarrow.table to unit tests that already have the @pytest.mark.parametrize("array_func", ...)
  • Document that pyarrow.array objects can be used in place of numpy.array
  • Allow solar's terminator_datetime parameter to accept a pyarrow.TimestampScalar input
  • etc

Addresses #2800 (comment), part of #2800.
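As a rough illustration of the first TODO item, the sketch below shows how pyarrow.array objects with unsigned-integer and timestamp types convert to NumPy arrays through `np.asarray`. It assumes that conversion to NumPy is the relevant step for compatibility. The helper name `as_numpy_column` is hypothetical and not part of PyGMT, and the pyarrow import is guarded because pyarrow is an optional dependency.

```python
import datetime

import numpy as np

try:
    import pyarrow as pa
except ImportError:  # pyarrow is an optional dependency
    pa = None


def as_numpy_column(values):
    """Hypothetical helper: convert a 1-D array-like to a numpy.ndarray."""
    return np.asarray(values)


if pa is not None:
    # pyarrow arrays implement the NumPy array protocol (__array__)
    uint_column = as_numpy_column(pa.array([1, 2, 3], type=pa.uint8()))
    time_column = as_numpy_column(
        pa.array([datetime.datetime(2023, 12, 9)], type=pa.timestamp("s"))
    )
    print(uint_column.dtype.kind)  # kind "u" denotes unsigned integers
    print(time_column.dtype.kind)  # kind "M" denotes datetime64
```

Checking `dtype.kind` rather than an exact dtype keeps the check independent of the timestamp unit.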

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.
  • Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

  • /format: automatically format and lint the code
  • /test-gmt-dev: run full tests on the latest GMT development version

Install pyarrow as an optional dependency, and ensure that pyarrow.array objects with int64 and timestamp types can be read by pygmt.info.
@weiji14 weiji14 self-assigned this Dec 9, 2023
@weiji14 weiji14 changed the title from "pyarrow: Check compatibility of pyarrow array objects with numeric and timestamp dtypes" to "pyarrow: Check compatibility of pyarrow.array and pyarrow.table with numeric and timestamp dtypes" Dec 9, 2023
Ensure that pyarrow.table objects can be passed into pygmt functions like blockm, info, nearneighbor, project, triangulate and xyz2grd.
@weiji14 (Member, Author) commented Dec 9, 2023

/format

Comment on lines 165 to 178
          pytest.param(
              getattr(pa, "table", None),
              "vector memory",
              marks=td.skip_if_no(package="pyarrow"),
          ),
      ],
  )
  def test_info_2d_array(array_func, expected_memory):
      """
-     Make sure info works on 2-D numpy.ndarray inputs.
+     Make sure info works on 2-D numpy.ndarray and pyarrow.table inputs.
      """
-     table = np.loadtxt(POINTS_DATA)
+     table = array_func(pd.read_csv(POINTS_DATA, sep=" ", header=None))
      output = info(data=table)
-     expected_output = (
-         "<matrix memory>: N = 20 <11.5309/61.7074> <-2.9289/7.8648> <0.1412/0.9338>\n"
-     )
+     expected_output = f"<{expected_memory}>: N = 20 <11.5309/61.7074> <-2.9289/7.8648> <0.1412/0.9338>\n"
Member Author

Note that pyarrow.table goes through put_vector rather than put_matrix, because pyarrow.table is more akin to pandas.DataFrame (which allows for columns with different dtypes) than a numpy.array object (which only allows for one dtype).
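The dtype difference mentioned above can be seen in a small sketch. It only illustrates the data model, not PyGMT's internal dispatch: a pandas.DataFrame (and a pyarrow.table built from it) keeps one type per column, while converting to a 2-D numpy array promotes everything to a single dtype. The column names and values are made up for illustration, and the pyarrow part is skipped when pyarrow is not installed.

```python
import numpy as np
import pandas as pd

try:
    import pyarrow as pa
except ImportError:  # pyarrow is an optional dependency
    pa = None

# Two columns with different dtypes (illustrative data only)
df = pd.DataFrame(
    {
        "x": np.array([1, 2, 3], dtype="uint8"),
        "y": np.array([0.5, 1.5, 2.5], dtype="float64"),
    }
)

# Column-oriented container: each column keeps its own dtype
column_dtypes = [str(dtype) for dtype in df.dtypes]

# 2-D numpy array: a single dtype for the whole array, so uint8 is promoted
matrix = df.to_numpy()

# pyarrow.table is column-oriented too, with one Arrow type per column
arrow_types = None
if pa is not None:
    table = pa.table(df)
    arrow_types = [str(arrow_type) for arrow_type in table.schema.types]

print(column_dtypes, matrix.dtype, arrow_types)
```

Passing columns one at a time preserves per-column types, which is consistent with the "vector memory" label expected in the test for pyarrow.table.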

@weiji14 weiji14 added the maintenance Boring but important stuff for the core devs label Dec 16, 2023
Support for pyarrow.TimestampScalar in solar's terminator_datetime is done by casting the scalar to a string first, before letting pd.to_datetime do the formatting. Added new parametrized cases to test_solar_set_terminator_datetime, with pyarrow and pd.Timestamp as array_func, to test this.
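A minimal sketch of the string-casting approach described in this commit message is shown below. The function name `format_terminator_datetime` and the output format string are assumptions made for illustration; they are not taken from the PyGMT source. The pyarrow import is guarded because it is an optional dependency.

```python
import datetime

import pandas as pd

try:
    import pyarrow as pa
except ImportError:  # pyarrow is an optional dependency
    pa = None


def format_terminator_datetime(value) -> str:
    """
    Hypothetical helper: cast the input to a string first, then let
    pd.to_datetime parse it and strftime produce an ISO-like string.
    """
    # str() works for str, datetime.datetime, pd.Timestamp and
    # pyarrow.TimestampScalar inputs alike
    return pd.to_datetime(str(value)).strftime("%Y-%m-%dT%H:%M:%S.%f")


moment = datetime.datetime(2016, 2, 4, 12, 0)
print(format_terminator_datetime(moment))
print(format_terminator_datetime(pd.Timestamp(moment)))
if pa is not None:
    # pa.scalar on a datetime.datetime yields a pyarrow.TimestampScalar
    print(format_terminator_datetime(pa.scalar(moment)))
```

Casting to a string gives a single code path for all input types, at the cost of a parse round trip.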

codspeed-hq bot commented Dec 30, 2023

CodSpeed Performance Report

Merging #2864 will degrade performance by 6.62%

⚠️ No base runs were found

Falling back to comparing pyarrow/array-compat (8bf81ff) with main (21a8722)

Summary

❌ 1 regressions
✅ 56 untouched benchmarks

🆕 8 new benchmarks
⁉️ 7 dropped benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | pyarrow/array-compat | Change |
|---|---|---|---|
| test_grdsample_dataarray_out | 105.6 ms | 113.1 ms | -6.62% |
| 🆕 test_project_input_table_matrix[array] | N/A | 40.5 ms | N/A |
| 🆕 test_solar_set_terminator_datetime[Timestamp] | N/A | 139.9 ms | N/A |
| 🆕 test_solar_set_terminator_datetime[fromisoformat] | N/A | 139.8 ms | N/A |
| 🆕 test_project_input_table_matrix[DataFrame] | N/A | 41.8 ms | N/A |
| 🆕 test_solar_set_terminator_datetime[str] | N/A | 139.7 ms | N/A |
| 🆕 test_project_input_table_matrix[Dataset] | N/A | 51 ms | N/A |
| 🆕 test_xyz2grd_input_table_matrix[Dataset] | N/A | 245.9 ms | N/A |
| 🆕 test_xyz2grd_input_table_matrix[array] | N/A | 249 ms | N/A |
| ⁉️ test_project_input_matrix[array] | 40.4 ms | N/A | N/A |
| ⁉️ test_project_input_matrix[Dataset] | 50.9 ms | N/A | N/A |
| ⁉️ test_project_input_matrix[DataFrame] | 41.7 ms | N/A | N/A |
| ⁉️ test_solar_set_terminator_datetime[terminator_datetime_string] | 139.5 ms | N/A | N/A |
| ⁉️ test_solar_set_terminator_datetime[terminator_datetime1] | 135.7 ms | N/A | N/A |
| ⁉️ test_xyz2grd_input_array[array] | 248.9 ms | N/A | N/A |
| ⁉️ test_xyz2grd_input_array[Dataset] | 245.7 ms | N/A | N/A |

@weiji14 weiji14 changed the title from "pyarrow: Check compatibility of pyarrow.array and pyarrow.table with numeric and timestamp dtypes" to "pyarrow: Check compatibility of pyarrow.array and pyarrow.table with numeric and timestamp types" Dec 30, 2023
@weiji14 weiji14 added this to the 0.11.0 milestone Dec 30, 2023
@seisman seisman modified the milestones: 0.11.0, 0.12.0 Feb 1, 2024
@seisman seisman removed this from the 0.12.0 milestone Feb 26, 2024
Labels
maintenance Boring but important stuff for the core devs
3 participants