Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make a gmt xarray accessor to store metadata from grdinfo #499

Open
weiji14 opened this issue Jun 29, 2020 · 9 comments
Open

Make a gmt xarray accessor to store metadata from grdinfo #499

weiji14 opened this issue Jun 29, 2020 · 9 comments
Labels
feature request New feature wanted help wanted Helping hands are appreciated longterm Long standing issues that need to be resolved

Comments

@weiji14
Copy link
Member

weiji14 commented Jun 29, 2020

Description of the desired feature

Metadata (data about data) is important, and there are specific kinds of metadata used by GMT when it decides how to plot a grid, such as:

Instead of explicitly wrapping gmt info or gmt grdinfo as in #147, why not allow the user to just access it via grid.gmt.someattribute? This would make use of xarray accessors and call grdinfo under the hood to retrieve those metadata properties.

Pros:

Cons:

  • We would be designing yet another standard, as per https://xkcd.com/927/
  • Would involve some development effort
  • Might not be as discoverable as simply using grid.attrs["someattribute"] or pygmt.grdinfo(grid), but that's just a documentation thing.

In reality, this gmt accessor can do more than just hold metadata. We could extend it to do grid.gmt.plot and more. But let's start with having it hold the metadata we want, and then build on top of it from there.

References:

Other xarray accessor examples:

Are you willing to help implement and maintain this feature? Yes

@weiji14 weiji14 added the feature request New feature wanted label Jun 29, 2020
@weiji14 weiji14 added the help wanted Helping hands are appreciated label Jul 11, 2020
@weiji14 weiji14 added the longterm Long standing issues that need to be resolved label Feb 17, 2021
@weiji14 weiji14 added the scipy-sprint Things to work on during the SciPy sprint! label May 2, 2021
@seisman seisman removed the scipy-sprint Things to work on during the SciPy sprint! label Feb 17, 2023
@seisman
Copy link
Member

seisman commented Feb 17, 2023

GMTDataArrayAccessor is powerful but it's important to know that xarray accessors have some limitations (https://docs.xarray.dev/en/stable/internals/extending-xarray.html):

Accessors are created once per DataArray and Dataset instance. New instances, like those created from arithmetic operations or when accessing a DataArray from a Dataset (ex. ds[var_name]), will have new accessors created.

More in-depth discussions are available at pydata/xarray#3205 and pydata/xarray#3268.

To understand these limitations, here is a simplified version of the GMTDataArrayAccessor xarray accessor. Some print statements are added to show how the xarray accessor works:

import xarray as xr

@xr.register_dataarray_accessor("gmt")
class GMTDataArrayAccessor:
    def __init__(self, obj):
        print("Inialize the gmt accessor")
        self._obj = obj
        self._registration = 0
        self._gtype = 0

    @property
    def registration(self):
        print("Returning registration")
        return self._registration

    @registration.setter
    def registration(self, value):
        print("Setting registration")
        self._registration = value

    @property
    def gtype(self):
        print("Returning gtype")
        return self._gtype

    @gtype.setter
    def gtype(self, value):
        print("Setting gtype")
        self._gtype = value

@seisman
Copy link
Member

seisman commented Feb 17, 2023

The following test shows that accessors are created/initialized after any grid operations (like slice and arithmetic operations):

import numpy as np
import xarray as xp

interval = 2.5
lat = np.arange(90, -90 - interval, -interval)
lon = np.arange(0, 360 + interval, interval)
longrid, latgrid = np.meshgrid(lon, lat)
data = np.sin(np.deg2rad(longrid)) * np.cos(np.deg2rad(latgrid))
grid = xr.DataArray(data, coords=[("latitude", lat), ("longitude", lon)])

print(grid.gmt.registration, grid.gmt.gtype)
grid.gmt.registration = 1
grid.gmt.gtype = 1
print(grid.gmt.registration, grid.gmt.gtype)

sliced_grid = grid[0:30, 0:30]
print(sliced_grid.gmt.registration, sliced_grid.gmt.gtype)

grid *= 2.0                                                                     
print(grid.gmt.registration, grid.gmt.gtype)

grid2 = grid * 2.0
print(grid2.gmt.registration, grid2.gmt.gtype)

The output of the script is:

Inialize the gmt accessor
Returning registration
Returning gtype
0 0
Setting registration
Setting gtype
Returning registration
Returning gtype
1 1
Inialize the gmt accessor
Returning registration
Returning gtype
0 0
Returning registration
Returning gtype
1 1
Inialize the gmt accessor
Returning registration
Returning gtype
0 0

@seisman
Copy link
Member

seisman commented Feb 18, 2023

The following test shows that accesors are created/initialized when accessing a DataArray from a Dataset. (The test is modified from pydata/xarray#3205 (comment))

import xarray as xr
import numpy as np

ds = xr.Dataset({'a': xr.DataArray(np.array([1, 2, 3])), 'b': xr.DataArray(np.array([4, 5, 6]))})

print(ds["a"].gmt.registration, ds["a"].gmt.gtype)

print(ds["a"].gmt.registration, ds["a"].gmt.gtype)

var = ds["a"]
print(var.gmt.registration, var.gmt.gtype)

print(var.gmt.registration, var.gmt.gtype)

The output is:

Inialize the gmt accessor
Returning registration
Inialize the gmt accessor
Returning gtype
0 0
Inialize the gmt accessor
Returning registration
Inialize the gmt accessor
Returning gtype
0 0
Inialize the gmt accessor
Returning registration
Returning gtype
0 0
Returning registration
Returning gtype
0 0

@weiji14
Copy link
Member Author

weiji14 commented Apr 16, 2024

Originally posted by @seisman in #2398 (comment). Keeping track of the .gmt.cpt attribute idea.

Yes, it will break. We may take different actions for input netCDF files or xarray.DataArray object. For xarray.DataArray, we can check whether it has the cpt attribute (makes more sense in the gmt accessor, xref #499); For netCDF files, we need to read the file header (header only so it should be quick) and check the cpt attribute. If given, then we add the same attribute to the output xarray.DataArray/netCDF. But we need to be careful, for example, grdedit -D+c can change the CPT, so the output grid can have different grid for the input grid.

For GMT remote datasets, we probably can add a CPT attribue to the GMTRemoteDataset class (

"earth_age": GMTRemoteDataset(
).

@seisman
Copy link
Member

seisman commented Aug 6, 2024

Here is an idea to improve user experiences with GMT accessor property registration and gtype. I'll take gmt.registration as an example. Currently, gmt.registration can be 0 or 1, and users have to remember that 0 means gridline and 1 means pixel. It would be much more convenient if users could assign "pixel" or "gridline" to it. This can be implemented by storing 0/1 internally (i.e., self._registration) but do the 0->"gridline"/1->"pixel" mapping when the setter/getter method of gmt.registration is called.

Here is an example class to demonstrate the implementation:

class GMTAccessor:

    _registration_mapping = {
        "gridline": 0,
        "pixel": 1,
    }       
    
    @property
    def registration(self):
        return {v: k for k, v in self._registration_mapping.items()}[self._registration]

    @registration.setter
    def registration(self, value):
        valid_values = [*self._registration_mapping.keys(), *self._registration_mapping.values()]
        if value not in valid_values:
            msg = f"Invalid registration value. Must be in {valid_values}."
            raise ValueError(msg)
            
        self._registration = self._registration_mapping.get(value, value)

Here is an example showing how it works. Users can assign the more readable "pixel"/"gridline" to gmt.registration. Of course, 0/1 are still supported for backward compatibility:

obj = GMTAccessor()

obj.registration = "gridline"
print(obj.registration)

obj.registration = "pixel"
print(obj.registration)

obj.registration = 0
print(obj.registration)

obj.registration = 1
print(obj.registration)

The output is:

gridline
pixel
gridline
pixel

The new implementation is still a breaking change. In the old implementation, users can check if gmt.registration == 0, but now they must do gmt.registration == "gridline".

Thoughts?

@weiji14
Copy link
Member Author

weiji14 commented Aug 6, 2024

The new implementation is still a breaking change. In the old implementation, users can check if gmt.registration == 0, but now they must do gmt.registration == "gridline".

How about using an enum so that backwards compatibility is preserved? Something similar to the rasterio.enums module. We could make a pygmt.enums module, and there's a lot of GMT enums we can throw in there as well.

@seisman
Copy link
Member

seisman commented Aug 7, 2024

How about using an enum so that backwards compatibility is preserved? Something similar to the rasterio.enums module. We could make a pygmt.enums module, and there's a lot of GMT enums we can throw in there as well.

It means a steeper learning curve for users. Users have to write codes like:

from pygmt.enums import Registration

da.gmt.registration = Registration.GRIDLINE

@weiji14
Copy link
Member Author

weiji14 commented Aug 7, 2024

How about using an enum so that backwards compatibility is preserved? Something similar to the rasterio.enums module. We could make a pygmt.enums module, and there's a lot of GMT enums we can throw in there as well.

It means a steeper learning curve for users. Users have to write codes like:

from pygmt.enums import Registration

da.gmt.registration = Registration.GRIDLINE

But you can tab-complete the registration and gtype if they were enums 😆 I actually think the enums look much more readable. Can also consider making a StrEnum maybe if you insist on enabling support for da.gmt.registration = "gridline", we could probably support either str/int/enum input types in the setter logic?

@seisman
Copy link
Member

seisman commented Aug 7, 2024

I will see how it works later. Anyway, it won't be in the v0.13.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature wanted help wanted Helping hands are appreciated longterm Long standing issues that need to be resolved
Projects
None yet
Development

No branches or pull requests

2 participants