Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap GMT's standard data type GMT_GRID for grids #2398

Merged
merged 15 commits into from
Apr 16, 2024
Merged

Conversation

seisman
Copy link
Member

@seisman seisman commented Mar 5, 2023

Description of proposed changes

This PR wraps GMT's standard data type GMT_GRID for grids, so that we can store the output grid in GMT virtual files and don't have to write the output grid into a temporary file and load it as xarray.DataArray.

Expected enhancements/features

Here is a list of enhancements/features related to the GMT_GRID data type:

The GMT_GRID/GMT_GRID_HEADER data structure

The GMT_GRID data structure is defined at gmt/src/gmt_resources.h:

struct GMT_GRID {	/* To hold a GMT gmt_grdfloat grid and its header in one container */
	struct GMT_GRID_HEADER *header;	/* Pointer to full GMT header for the grid */
	gmt_grdfloat *data;             /* Pointer to the gmt_grdfloat grid */
	double *x, *y;                  /* Vector of coordinates */
	void *hidden;                   /* Row-by-row machinery information [NULL] */
};

it contains the x/y array, the data array and the grid header. The grid header GMT_GRID_HEADER is defined at gmt/src/gmt_resources.h. It contains many important metadata information about the grid. Here are some important header variables:

  • n_columns: Number of columns
  • n_rows: Number of rows
  • registration: 0 for gridline registration and 1 for pixel registration.
  • wesn: min/max of x/y coordinates, i.e., [xmin, xmax, ymin, ymax]
  • z_min/z_max: min/max of z values
  • inc: x and y increment [should be equal to x[1] - x[0] and y[1] - y[0]]
  • z_scale_factor/z_add_offset: Used for data compressing by storing floating-point data as smaller integers in a netCDF file. The relation between the actual data z' and the data stored in netCDF file (z) is z' = z * scale_factor + offset (i.e., unpacking). Need to note that when reading a netCDF file, GMT already does the conversion, so z_min/z_max and the data array are already unpacked so we usually don't need to care about these two variables.
  • x_units/y_units: x/y units in the form of long_name [units]. The [units] part is optional. Some examples are: x/y/longitude [degrees_east]/latitude [degrees_north].
  • z_units: Also in the form of long_name [units].
  • pad: Extra columns/rows on west/east/south/north sides. Paddings are mainly used for grid manipulations at grid boundary. Defaults are [2, 2, 2, 2].
  • mx: actual columns in memory. Equal to n_columns + pad[0] + pad[1]
  • my: actual rows in memory. Equal to n_rows + pad[2] + pad[3]

Conversions between netCDF and GMT_GRID

Although netCDF is the default grid format for I/O, GMT uses the GMT_GRID data structure as its internal representation of grids. Thus, GMT needs to convert netCDF data/metadata to GMT_GRID for input and do the inverse for output. The conversions between netCDF and GMT_GRID are done by functions gmt_nc_read_grd/gmt_nc_write_grd in gmt/src/gmt_nc.c. Below Here are some technical notes:

  • x_units/y_units/z_units has the form of long_name [units], in which [units] is optional. long_name and units are netCDF attributes. gmtnc_get_units
  • When reading, GMT reads the variable's long_name and units attributes and then sets the x_units/y_units/z_units string. If long_name is not defined, then use the variable's name instead. If units is not defined, then [units] is not appended. gmtnc_get_units
  • For writing, GMT extracts the long_name and units attributes by parsing the x_units/y_units/z_units string, and setting the netCDF attributes if they're defined. More attributes are added if it's a geographic coordinate. If units="degrees_east", set attributes standard_name to longitude and axis to X; If units="degrees_north", set attributes standard_name to latitude and axis to Y. gmtnc_put_units
  • The default dimension names are lon/time/x for x axis and lat/time/y for y axis. reference
  • The default name for the data is always z for 2-D grid or cube for 3-D cube, unless the variable name is given. The variable name is stored in the hidden structure HH->varname, but how is HH->varname determined?. reference
  • When writing a netCDF file, GMT defines some global attributes, including Conventions (CF-1.7), title, history, description, GMT_version, node_offset (only for pixel grids). reference
  • When writing x/y vectors, axis attributes actual_range (from header-wesn) and axis (X or Y) are also set. reference
  • When writing the data, some attributes are defined: scale_factor (if not 1.0), add_offset (if not 0.0), cpt (if defined by GMT), _FillValue (for NaN), actual_range.

Benefits of current implementation

  1. Simplify the codes of module wrappers
  2. No longer to write output grid to temporary files
  3. Performance improvements (Wrap GMT's standard data type GMT_GRID for grids #2398 (comment))

Limitations of current implementation

In addition to the metadata stored in the GMT_GRID_HEADER object, GMT also has some hidden metadata which is stored in a hidden data structure, for example, grdtype (Cartesian or Geographic grid?), cpt (the default CPT). The hidden data structure is private and may change between different GMT versions. So it's not a good practice to wrap and use the hidden data structure. But it also means we can't know the grdtype/cpt attributes when converting GMT_GRID to xarray.DataArray.

For grdtype, it's reasonable to say that the grid is geographic if x's unit is degrees_east and y's unit is degrees_north (see the note above and also CF conventions), but for cpt, we can't know if a grid has a default CPT or not.

Another limitation is that, grdcut now can't cut an image because we assume that the output is a grid. An initial fix is at #3115. But, as far as I can see, although many GMT modules can accept an image (e.g., grdcut, grdimage, grdview), only grdcut and grdmix can output an image.

References

  • Function gmtnc_put_units in gmt_nc.c
  • Function gmtnc_get_units in gmt_nc.c
  • Function gmtnc_grd_info in gmt_nc.c
  • Function gmt_grd_init in gmt_grdio.c

@seisman seisman changed the title Support GMT's standard data type GMT_GRID for grids WIP: Support GMT's standard data type GMT_GRID for grids Mar 5, 2023
@seisman seisman added maintenance Boring but important stuff for the core devs enhancement Improving an existing feature labels Mar 5, 2023
pygmt/src/grdcut.py Outdated Show resolved Hide resolved
pygmt/src/grdcut.py Outdated Show resolved Hide resolved
pad = header.pad[:]
x = self.x[: header.n_columns]
y = self.y[: header.n_rows]
data = np.array(self.data[: header.mx * header.my]).reshape(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grid has n_columns columns and n_rows rows, but usually the number of data points in memory are not n_columns * n_rows due to grid pads. For some specific reasons, GMT usually adds two more rows/columns on west, east, south, north sides. So, we should have:

mx = pad[0] + n_columns + pad[1]
my = pad[2] + n_rows + pad[3]

That's why we need to use the following codes to get the data matrix:

data = np.array(self.data[: header.mx * header.my]).reshape(
           (header.my, header.mx)
       )[pad[2] : header.my - pad[3], pad[0] : header.mx - pad[1]]

@weiji14 weiji14 removed the maintenance Boring but important stuff for the core devs label Mar 5, 2023
Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing, thanks for pulling this together @seisman! I'll need to take a closer look to review this properly, but it's looking very promising 🤩

Currently the unit tests are failing, mostly because the output xarray.DataArray doesn't have the same attributes/metadata as when we were using the NetCDF tempfile. Will need to find a way to copy those metadata over somehow.

pygmt/datatypes.py Outdated Show resolved Hide resolved
pygmt/datatypes.py Outdated Show resolved Hide resolved
@seisman

This comment was marked as outdated.

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the tests on grdview (which uses grdcut for the fixture) are failing with an incorrect figure size. Is there something to be fixed with the padding?

pygmt/datatypes.py Outdated Show resolved Hide resolved
@seisman
Copy link
Member Author

seisman commented Mar 6, 2023

Most of the tests on grdview (which uses grdcut for the fixture) are failing with an incorrect figure size. Is there something to be fixed with the padding?

No, if you compare the baseline and result images, you'll see that the differences are caused by the gtype property. gtype should be 1 for the grid used in grdview tests, but currently the GMT_GRID structure doesn't store the gtype information.

pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
@seisman
Copy link
Member Author

seisman commented Mar 6, 2023

Most of the tests on grdview (which uses grdcut for the fixture) are failing with an incorrect figure size. Is there something to be fixed with the padding?

No, if you compare the baseline and result images, you'll see that the differences are caused by the gtype property. gtype should be 1 for the grid used in grdview tests, but currently the GMT_GRID structure doesn't store the gtype information.

Added a temporary workaround at 6280e39. And now all tests pass except test_accessor_grid_source_file_not_exist which is expected to fail.

pygmt/datatypes.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
@seisman seisman changed the title WIP: Support GMT's standard data type GMT_GRID for grids POC: Support GMT's standard data type GMT_GRID for grids Oct 11, 2023
@seisman seisman added this to the 0.12.0 milestone Dec 11, 2023
@seisman seisman changed the title POC: Support GMT's standard data type GMT_GRID for grids Support GMT's standard data type GMT_GRID for grids Feb 20, 2024
@seisman seisman removed this from the 0.12.0 milestone Feb 26, 2024
@seisman seisman added the run/benchmark Trigger the benchmark workflow in PRs label Mar 13, 2024
Copy link

codspeed-hq bot commented Mar 14, 2024

CodSpeed Performance Report

Merging #2398 will improve performances by ×2

⚠️ No base runs were found

Falling back to comparing datatypes/gmtgrid (84d6546) with main (e3d52e2)

Summary

⚡ 18 improvements
✅ 80 untouched benchmarks

Benchmarks breakdown

Benchmark main datatypes/gmtgrid Change
test_binstats_no_outgrid 159.7 ms 141.2 ms +13.06%
test_load_remote_dataset_benchmark_with_region 147.8 ms 121.3 ms +21.82%
test_dimfilter_no_outgrid 87 ms 59.8 ms +45.6%
test_grdclip_no_outgrid 86.8 ms 59 ms +46.99%
test_grdcut_dataarray_in_dataarray_out 86.6 ms 58.9 ms +46.96%
test_grdfill_dataarray_out 86.6 ms 59.4 ms +45.88%
test_grdfilter_dataarray_in_dataarray_out 87.3 ms 59.7 ms +46.32%
test_grdgradient_no_outgrid 86.9 ms 59.3 ms +46.46%
test_equalize_grid_no_outgrid 86.6 ms 58.9 ms +46.96%
test_grdproject_no_outgrid[+proj=merc +ellps=WGS84 +units=m +width=10] 103.7 ms 69 ms +50.22%
test_grdproject_no_outgrid[EPSG:3395 +width=10] 104.3 ms 62.2 ms +67.65%
test_grdproject_no_outgrid[M10c] 101.5 ms 59.3 ms +71.3%
test_grdsample_dataarray_out 91.5 ms 68 ms +34.51%
test_sphdistance_no_outgrid 582.9 ms 512.3 ms +13.77%
test_surface_input_xyz 127.2 ms 63.1 ms ×2
test_regular_grid_no_outgrid 312.2 ms 247.5 ms +26.17%
test_xyz2grd_input_array[Dataset] 245.3 ms 180.8 ms +35.67%
test_xyz2grd_input_array[array] 248.6 ms 184 ms +35.12%

@seisman seisman force-pushed the datatypes/gmtgrid branch from 4ba67df to ddb7a81 Compare April 1, 2024 11:46
pygmt/clib/session.py Outdated Show resolved Hide resolved
@seisman seisman added the needs review This PR has higher priority and needs review. label Apr 8, 2024
@seisman
Copy link
Member Author

seisman commented Apr 8, 2024

I think this PR is now ready for review. Generally speaking, this PR does three things:

  1. Wrap GMT_GRID data structure
  2. Add Session.virtualfile_to_raster method which can read a virtualfile content as GMT_GRID/GMT_IMAGE/GMT_CUBE
  3. Refactor all wrappers to the virtualfiles for grid output and fix any failing tests

I think it makes more sense to split this big PR into three smaller PRs. Maybe after this PR is reviewed and approved.

@@ -1946,6 +1951,70 @@ def virtualfile_to_dataset(
return result.to_numpy()
return result # pandas.DataFrame output

def virtualfile_to_raster(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this method, kind can be grid, cube, image or None. If None, we will call Session.inquire_virtualfile to determine the family of the virtualfile.

Most GMT modules can write grids only, so we can set kind="grid" to avoid calling Session.inquire_virtualfile.

Here are some exceptions:

  1. grdmix writes images
  2. grdcut writes grids or images
  3. grdimage -A writes images [Unimplemented in PyGMT]
  4. greenspine writes grids or cubes
  5. grdinterpolate writes grids or cubes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most GMT modules can write grids only, so we can set kind="grid" to avoid calling Session.inquire_virtualfile.

Yes, I think we can set kind="grid", unless we know if GMT will allow passing GMT_IMAGE to certain grd* modules in the future. For example, maybe grdclip would allow GMT_IMAGE inputs at some point, and so we might want to set kind=None for future-proofing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do more refactoring if grdclip would support GMT_IMAGE in the future, since we need to check the kind of the input data (grid in, grid out or image in, image out), similar to what we have to do to grdcut (xref: #2398 (comment)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I meant if upstream GMT were to support passing GMT_IMAGE into grdclip in the future, but not sure when that would happen. Ok to just default to kind='grid' for now, since that's the current behaviour I think.

@weiji14
Copy link
Member

weiji14 commented Apr 8, 2024

I think it makes more sense to split this big PR into three smaller PRs. Maybe after this PR is reviewed and approved.

Can probably keep it as one PR, I think you've done a good job of cherrypicking the GMT grid header and inquire virtualfile stuff out already, and the code here is reasonable to review.

I'll give this a proper final review after Thursday-ish as I'm attending the MIGARS conference this week (have already received thanks from a couple of PyGMT users here, so I'll pass that on 😃).

pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/datatypes/grid.py Outdated Show resolved Hide resolved
Comment on lines +188 to +193
# Flip the coordinates and data if necessary so that coordinates are ascending.
# `grid.sortby(list(grid.dims))` sometimes causes crashes.
# The solution comes from https://github.com/pydata/xarray/discussions/6695.
for dim in grid.dims:
if grid[dim][0] > grid[dim][1]:
grid = grid.isel({dim: slice(None, None, -1)})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate a bit on this part? Sort ascending is probably needed for the x (longitude) dimension. But for the y (latitude) dimension, it is quite common to sort in descending order (y going from North Pole +90 to South Pole -90). What is the typical sorting order returned from GMT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think GMT requires ascending order for longitude/latitude. Similar codes are already used at (The codes below use sortby, but we can't use it here due to a mysterious reason (xref #2398 (comment)).)

if any(i < 0 for i in inc): # Sort grid when there are negative increments
inc = [abs(i) for i in inc]
grid = grid.sortby(variables=list(grid.dims), ascending=True)

I tried to find some evidence from the GMT source codes but failed because the codes are so complicated. Perhaps the most straightforward evidence is from following lines (https://github.com/GenericMappingTools/gmt/blob/e493ccc220d9e997abb655cb461142ed2e7cc777/src/gmt_nc.c#L754):

		/* Determine grid step */
		header->inc[GMT_X] = gmt_M_get_inc (GMT, dummy[0], dummy[1], header->n_columns, registration);
		if (gmt_M_is_dnan(header->inc[GMT_X]) || gmt_M_is_zero (header->inc[GMT_X])) header->inc[GMT_X] = 1.0;
		if (header->n_columns == 1) registration = GMT_GRID_PIXEL_REG;	/* The only way to have a grid like that */

		/* Extend x boundaries by half if we found pixel registration */
		if (registration == GMT_GRID_NODE_REG && header->registration == GMT_GRID_PIXEL_REG)
			header->wesn[XLO] = dummy[0] - header->inc[GMT_X] / 2.0,
			header->wesn[XHI] = dummy[1] + header->inc[GMT_X] / 2.0;
		else	/* Use as is */
			header->wesn[XLO] = dummy[0], header->wesn[XHI] = dummy[1];

in which dummy[0] and dummy[1] is the min/max value of the x vector. For pixel registration, GMT extends the x boundaries by half a grid. The codes only make sense when head->inc[GMT_X] is positive.

Copy link
Member

@weiji14 weiji14 Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, from what I can tell, this means GMT expects grids with coordinates sorted in ascending order. But here in the to_dataarray() method, we're going from GMT virtualfile to xarray.DataArray, so is this sorting still needed or can we return the grid as is? Maybe the sort ascending logic is only needed in virtualfile_from_grid (which is implemented in the dataarray_to_matrix conversion function already)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But here in the to_dataarray() method, we're going from GMT virtualfile to xarray.DataArray, so is this sorting still needed or can we return the grid as is? Maybe the sort ascending logic is only needed in virtualfile_from_grid (which is implemented in the dataarray_to_matrix conversion function already)?

Hmmm, your comments make more sense. I tried to remove these sorting codes locally. Here is the output:

In [1]: import xarray as xr

In [2]: from pygmt import which

In [3]: path = which("@earth_relief_01d_g", download="a")

In [4]: grid1 = xr.load_dataarray(path)

In [6]: from pygmt.clib import Session
   ...: with Session() as lib:
   ...:     with lib.virtualfile_out(kind="grid") as voutgrd:
   ...:         lib.call_module("read", f"{path} {voutgrd} -Tg")
   ...:         grid = lib.read_virtualfile(voutgrd, kind="grid")
   ...:         grid2 = grid.contents.to_dataarray()
   ...: 

In [7]: grid1
Out[7]: 
<xarray.DataArray 'z' (lat: 181, lon: 361)> Size: 523kB
array([[ 2865. ,  2865. ,  2865. , ...,  2865. ,  2865. ,  2865. ],
       [ 3088. ,  3087.5,  3087. , ...,  3088.5,  3088. ,  3088. ],
       [ 3100.5,  3100.5,  3101. , ...,  3101.5,  3101. ,  3100.5],
       ...,
       [-3745.5, -3729. , -3722.5, ..., -3734. , -3742. , -3745.5],
       [-2940. , -2945. , -2951. , ..., -2895.5, -2921.5, -2940. ],
       [-3861. , -3861. , -3861. , ..., -3861. , -3861. , -3861. ]])
Coordinates:
  * lon      (lon) float64 3kB -180.0 -179.0 -178.0 -177.0 ... 178.0 179.0 180.0
  * lat      (lat) float64 1kB -90.0 -89.0 -88.0 -87.0 ... 87.0 88.0 89.0 90.0
Attributes:
    actual_range:  [-7174.  5350.]
    long_name:     elevation (m)

In [8]: grid2
Out[8]: 
<xarray.DataArray 'z' (lat: 181, lon: 361)> Size: 523kB
array([[-3861. , -3861. , -3861. , ..., -3861. , -3861. , -3861. ],
       [-2940. , -2945. , -2951. , ..., -2895.5, -2921.5, -2940. ],
       [-3745.5, -3729. , -3722.5, ..., -3734. , -3742. , -3745.5],
       ...,
       [ 3100.5,  3100.5,  3101. , ...,  3101.5,  3101. ,  3100.5],
       [ 3088. ,  3087.5,  3087. , ...,  3088.5,  3088. ,  3088. ],
       [ 2865. ,  2865. ,  2865. , ...,  2865. ,  2865. ,  2865. ]])
Coordinates:
  * lat      (lat) float64 1kB 90.0 89.0 88.0 87.0 ... -87.0 -88.0 -89.0 -90.0
  * lon      (lon) float64 3kB -180.0 -179.0 -178.0 -177.0 ... 178.0 179.0 180.0
Attributes:
    Conventions:   CF-1.7
    title:         SRTM15 Earth Relief v2.5.5 at 01 arc degree
    history:       
    description:   Reduced by Gaussian Cartesian filtering (314.5 km fullwidt...
    long_name:     elevation (m)
    actual_range:  [-7174.  5350.]

It seems xr.load_dataarray returns an xarray.DataArray with ascending lon/lat coodinates, but now GMT returns descending latitude. I think I was trying to make GMT_GRID.to_dataaray() return the exactly same xarray.DataArray object as xr.load_dataarray. But, maybe there are some misunderstandings here.

Copy link
Member

@weiji14 weiji14 Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems xr.load_dataarray returns an xarray.DataArray with ascending lon/lat coodinates, but now GMT returns descending latitude. I think I was trying to make GMT_GRID.to_dataaray() return the exactly same xarray.DataArray object as xr.load_dataarray. But, maybe there are some misunderstandings here.

Hmm yeah, I don't think we need to match the behaviour of xr.load_dataarray here. I tried with rioxarray.open_rasterio and it gave descending latitude:

import pygmt
import rioxarray
import xarray as xr

path = pygmt.which("@earth_relief_01d_g", download="a")
grid2 = rioxarray.open_rasterio(path)
print(grid2)
# <xarray.DataArray 'z' (band: 1, y: 181, x: 361)> Size: 131kB
# [65341 values with dtype=int16]
# Coordinates:
#   * band         (band) int64 8B 1
#   * x            (x) float64 3kB -180.0 -179.0 -178.0 ... 178.0 179.0 180.0
#   * y            (y) float64 1kB 90.0 89.0 88.0 87.0 ... -87.0 -88.0 -89.0 -90.0
#     spatial_ref  int64 8B 0
# Attributes: (12/20)
#     lat#actual_range:   [-90.  90.]
#     lat#axis:           Y
#     lat#long_name:      latitude
#     lat#standard_name:  latitude
#     lat#units:          degrees_north
#     lon#actual_range:   [-180.  180.]
#     ...                 ...
#     title:              SRTM15 Earth Relief v2.5.5 at 01 arc degree
#     actual_range:       [-7174.  5350.]
#     long_name:          elevation (m)
#     scale_factor:       0.5
#     _FillValue:         -32768
#     add_offset:         0.0

so we could probably remove the sorting in to_dataarray(), assuming this doesn't break any of the unit tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to match the behaviour of xr.load_dataarray here.

Matching the behavior of xr.load_dataarray makes it easier to test the output of the virtualfiles using xr.DataArray.indentical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to match the behaviour of xr.load_dataarray here.

Matching the behavior of xr.load_dataarray makes it easier to test the output of the virtualfiles using xr.DataArray.identical.

The returned attributes are not the same, so xr.testing.assert_identical will fail anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we're using xr.testing.assert_allclose in the tests. If we don't sort the coordinates ascendingly, then we have to update all these tests.

Copy link
Member

@weiji14 weiji14 Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah, I just tried running make test locally after commenting out the sorting code, and got 19 test failures. Most of these are from the expected_grid fixture returning a DataArray with y (latitude) sorted in ascending order:

================================================================================== short test summary info ===================================================================================
FAILED ../pygmt/datatypes/grid.py::pygmt.datatypes.grid._GMT_GRID.to_dataarray
FAILED ../pygmt/tests/test_dimfilter.py::test_dimfilter_no_outgrid - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_grdclip.py::test_grdclip_no_outgrid - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_grdfill.py::test_grdfill_dataarray_out - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_grdfill.py::test_grdfill_asymmetric_pad - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_grdfilter.py::test_grdfilter_dataarray_in_dataarray_out - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_grdgradient.py::test_grdgradient_no_outgrid - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_grdhisteq.py::test_equalize_grid_no_outgrid - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_grdlandmask.py::test_grdlandmask_no_outgrid - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_grdproject.py::test_grdproject_no_outgrid[M10c] - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_grdproject.py::test_grdproject_no_outgrid[EPSG:3395 +width=10] - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_grdproject.py::test_grdproject_no_outgrid[+proj=merc +ellps=WGS84 +units=m +width=10] - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_grdsample.py::test_grdsample_dataarray_out - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_surface.py::test_surface_input_file - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_surface.py::test_surface_input_data_array - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_surface.py::test_surface_input_xyz - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_triangulate.py::test_regular_grid_no_outgrid - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_xyz2grd.py::test_xyz2grd_input_array[array] - AssertionError: Left and right DataArray objects are not close
FAILED ../pygmt/tests/test_xyz2grd.py::test_xyz2grd_input_array[Dataset] - AssertionError: Left and right DataArray objects are not close
======================================================== 19 failed, 657 passed, 4 skipped, 5 xfailed, 2 xpassed, 5 warnings in 58.94s ========================================================

Let's just keep the ascending sort for now then.

pygmt/datatypes/grid.py Outdated Show resolved Hide resolved
@@ -35,8 +33,9 @@ def test_load_remote_dataset_benchmark_with_region():
assert data.gmt.registration == 0
assert data.shape == (11, 21)
# The cpt attribute was added since GMT 6.4.0
if Version(__gmt_version__) >= Version("6.4.0"):
assert data.attrs["cpt"] == "@earth_age.cpt"
# Can't access the cpt attribute using virtual files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so auto colormap won't work after this PR is merged? We'll need to remember to document this in the PyGMT v0.12.0 changelog.

pygmt/clib/session.py Outdated Show resolved Hide resolved
Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @seisman for the wonderful work here, from digging into the GMT C code and handling all the grid header parsing! There are still a few limitations such in regard to the CPT attribute and grdtype detection, as you've mentioned in the top post, but we can address that in the future. The speedup improvements will be worth it 😄

@@ -35,8 +33,9 @@ def test_load_remote_dataset_benchmark_with_region():
assert data.gmt.registration == 0
assert data.shape == (11, 21)
# The cpt attribute was added since GMT 6.4.0
if Version(__gmt_version__) >= Version("6.4.0"):
assert data.attrs["cpt"] == "@earth_age.cpt"
# Can't access the cpt attribute using virtual files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's keep track about this .cpt attribute idea at #499 (comment).

@weiji14 weiji14 added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Apr 16, 2024
@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Apr 16, 2024
@seisman seisman merged commit 809880c into main Apr 16, 2024
20 of 21 checks passed
@seisman seisman deleted the datatypes/gmtgrid branch April 16, 2024 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants