Add pygmt.read to read a dataset/grid/image into pandas.DataFrame/xarray.DataArray #3673

seisman · 2024-12-04T10:05:38Z

Description of proposed changes

This PR adds the pygmt.read function to read any recognized data file (currently a dataset, grid, or image) into a pandas.DataFrame or xarray.DataArray object.

The new read function can replace most load_dataarray/xr.open_dataarray/xr.load_dataarray calls.
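A rough sketch of how such a function might map each supported kind to the type code passed to GMT's `read` module and to the Python type it returns. The `g` (grid) and `i` (image) codes match the `_load_remote_dataset` snippet quoted in a later comment. The `d` code for datasets is an assumption based on GMT's `read -T` option. `KIND_INFO`, `gmt_read_args`, and `return_type` are hypothetical names, not PyGMT's API.

```python
# Hypothetical sketch: map each kind accepted by the proposed read function
# to the GMT "read -T" type code and the Python type it is converted to.
KIND_INFO = {
    "dataset": ("d", "pandas.DataFrame"),  # "d" for datasets is an assumption
    "grid": ("g", "xarray.DataArray"),
    "image": ("i", "xarray.DataArray"),
}


def gmt_read_args(source: str, outfile: str, kind: str) -> list[str]:
    """Build the argument list for GMT's read module (hypothetical helper)."""
    if kind not in KIND_INFO:
        msg = f"Invalid kind '{kind}': must be one of {sorted(KIND_INFO)}."
        raise ValueError(msg)
    code, _ = KIND_INFO[kind]
    return [source, outfile, f"-T{code}"]


def return_type(kind: str) -> str:
    """Name of the Python type that read would return for this kind."""
    return KIND_INFO[kind][1]


print(gmt_read_args("@earth_relief_01d", "out.nc", "grid"))
# ['@earth_relief_01d', 'out.nc', '-Tg']
print(return_type("dataset"))
# pandas.DataFrame
```

In the real code, the output file would be a virtual file (as with `voutgrd` in the quoted snippet) rather than a file on disk.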

Related to #3643 (comment).

Preview: https://pygmt-dev--3673.org.readthedocs.build/en/3673/api/generated/pygmt.read.html

Reminders

Run make format and make check to make sure the code follows the style guide.
Add tests for new features or tests that would have caught the bug that you're fixing.
Add new public functions/methods/classes to doc/api/index.rst.
Write detailed docstrings for all functions/methods.
If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
If adding new functionality, add an example to docstrings or tutorials.
Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash command is:

/format: automatically format and lint the code


seisman · 2024-12-09T09:53:15Z

pygmt/src/read.py

+        raise ValueError(msg)
+
+    kwdict = {
+        "R": "/".join(f"{v}" for v in region) if is_nonstr_iter(region) else region,  # type: ignore[union-attr]


This line converts region to a GMT -R string directly, so we can avoid using the kwargs_to_string and use_alias decorators:

"R": "/".join(f"{v}" for v in region) if is_nonstr_iter(region) else region
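To show what that expression produces, here is a self-contained sketch. In PyGMT, `is_nonstr_iter` comes from `pygmt.helpers`. Here a simplified stand-in is defined instead, so the snippet runs without PyGMT; it is an assumption, not PyGMT's exact implementation. `region_to_arg` is a hypothetical name.

```python
from collections.abc import Iterable


def is_nonstr_iter(value) -> bool:
    """Simplified stand-in: True for iterables other than strings."""
    return isinstance(value, Iterable) and not isinstance(value, str)


def region_to_arg(region):
    """Convert region to the value used for GMT's -R option (hypothetical)."""
    # Sequences like [xmin, xmax, ymin, ymax] become "xmin/xmax/ymin/ymax";
    # strings (e.g. "g") and None are passed through unchanged.
    return "/".join(f"{v}" for v in region) if is_nonstr_iter(region) else region


print(region_to_arg([0, 10, 20, 30]))  # 0/10/20/30
print(region_to_arg("g"))  # g
print(region_to_arg(None))  # None (no -R option)
```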

seisman · 2024-12-09T10:06:30Z

In the _load_remote_dataset function, we can't replace the following code with the new read function, because read calls which to get the full path of the source grid, and that doesn't work well for tiled grids.

pygmt/pygmt/datasets/load_remote_dataset.py

Lines 444 to 459 in 7768e93

    fname = f"@{prefix}_{resolution}_{reg}"
    kind = "image" if name in {"earth_day", "earth_night"} else "grid"
    kwdict = {"R": region, "T": {"grid": "g", "image": "i"}[kind]}
    with Session() as lib:
        with lib.virtualfile_out(kind=kind) as voutgrd:
            lib.call_module(
                module="read",
                args=[fname, voutgrd, *build_arg_list(kwdict)],
            )
            grid = lib.virtualfile_to_raster(kind=kind, outgrid=None, vfname=voutgrd)
    # Full path to the grid if not tiled grids.
    source = which(fname, download="a") if not resinfo.tiled else None
    # Manually add source to xarray.DataArray encoding to make the GMT accessors work.
    if source:
        grid.encoding["source"] = source


seisman · 2024-12-09T10:33:19Z

Now, the load_dataarray function is used only in pygmt/src/grdcut.py (related to #3115).

xr.open_dataarray is used in test_accessors.py.

michaelgrund · 2024-12-11T12:13:26Z

pygmt/src/read.py

+        A list of column names.
+    header
+        Row number containing column names. ``header=None`` means not to parse the
+        column names from table header. Ignored if the row number is larger than the


Suggested change

column names from table header. Ignored if the row number is larger than the

column names from the table header. Ignored if the row number is larger than the

michaelgrund · 2024-12-11T12:13:47Z

pygmt/src/read.py

+    header
+        Row number containing column names. ``header=None`` means not to parse the
+        column names from table header. Ignored if the row number is larger than the
+        number of headers in the table.


Suggested change

number of headers in the table.

number of header lines in the table.
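The `header` behavior described in this docstring can be sketched in plain Python. This is only an illustration under stated assumptions: header lines start with `#`, the row number is a zero-based index into those header lines, and column names are separated by whitespace. `column_names_from_header` is a hypothetical helper. The actual parsing in pygmt.read is done through GMT, not by this code.

```python
def column_names_from_header(lines, header=None):
    """Return column names from header row ``header`` (hypothetical sketch).

    Assumptions: header lines start with "#", ``header`` is a zero-based index
    into those lines, and names are separated by whitespace.
    """
    if header is None:
        return None  # header=None: don't parse column names from the header
    headers = [line.lstrip("#").strip() for line in lines if line.startswith("#")]
    if header >= len(headers):
        return None  # ignored: row number exceeds the number of header lines
    return headers[header].split()


table = [
    "# Sample table",
    "# x y z name",
    "1.0 2.0 3.0 A",
    "4.0 5.0 6.0 B",
]
print(column_names_from_header(table, header=1))  # ['x', 'y', 'z', 'name']
print(column_names_from_header(table, header=5))  # None
print(column_names_from_header(table))  # None
```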

seisman added the feature and needs review labels and removed the needs review label Dec 4, 2024

seisman force-pushed the feature/read branch 2 times, most recently from cac7d74 to c50232e, December 4, 2024 10:18

seisman marked this pull request as draft December 5, 2024 03:23

seisman force-pushed the feature/read branch from c50232e to cef4cdf, December 5, 2024 03:24

Add pygmt.read to read a dataset/grid/image into pandas.DataFrame/xarray.DataArray

d913c86

seisman force-pushed the feature/read branch from cef4cdf to d913c86, December 5, 2024 07:58

seisman added 15 commits December 5, 2024 23:17

Set GMT accessor

f456bf8

Need to set 'source' encoding to make GMT accessor work

c3cbb6e

Merge branch 'main' into feature/read

f2a4ce4

Fix the source encoding

1dd97c6

No need to set the source encoding in load_remote_dataset.py

7790ea3

Revert changes in pygmt/datasets/load_remote_dataset.py

e588008

Improve docstring in pygmt/helpers/testing.py

40d12ee

Improve docstrings

fa1021d

Get rid of decorators

c378225

Improve comment

7b749e0

Get rid of the fmt_docstring alias

8befa58

Fix type hints issue with overload

a758752

Remove the type ignore flag

9d66cf4

region defaults to None

a05383a

Merge branch 'main' into feature/read

6ca4ef2

seisman added this to the 0.14.0 milestone Dec 9, 2024

Improve type hints and add tests

7851ced

seisman marked this pull request as ready for review December 9, 2024 09:47

Improve the checking of return value of which

084b87a

seisman added the needs review This PR has higher priority and needs review. label Dec 9, 2024

Use the read function in pygmt/tests/test_datatypes_dataset.py

b21997c

seisman commented Dec 9, 2024


seisman added 3 commits December 9, 2024 18:18

Use the read function instead of the load_dataarray method

a812317

Add one test to make sure that read and load_dataarray returns the same DataArray

1f0f158

Simplify pygmt/tests/test_clib_read_data.py with read

957c7eb

seisman added 3 commits December 9, 2024 18:42

Fix a typo

6aef3ca

Replace xr.open_dataarray with read

72afbfe

Fix a typo

03de9b7

michaelgrund approved these changes Dec 11, 2024
