Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POC: Wrap GMT_Read_Data and read datasets/grids/images into GMT data container #3318

Closed
wants to merge 4 commits into from

Conversation

seisman
Copy link
Member

@seisman seisman commented Jul 8, 2024

Description of proposed changes

This PR wraps the GMT API function GMT_Read_Data.

Currently, this PR contains 4 commits:

All new tests pass and the most important thing I learned is, that for an input grid/image, we can always read it into a GMT_IMAGE container. For grid, header.n_bands=1 and for image, header.n_bands=3(or any other values). The GMT_Read_Data API function can read a grid/image in either one step or two steps. Here "two steps" means reading the header first and then reading the data. Reading the header only is very efficient (a few milliseconds) even for huge grid/image files. Thus, we can read the grid/image header in and check header.n_bands to determine the input data kind, which addresses the concerns in ##3115 (comment).

So, the next steps are:

I plan to work on the above steps in separate PRs and keep this PR unchanged so that we can trace back to this PR in the future.

@seisman seisman force-pushed the api/read-data branch 2 times, most recently from c66965e to 316390b Compare July 8, 2024 13:54
@seisman seisman added the needs review This PR has higher priority and needs review. label Jul 8, 2024
Comment on lines +1072 to +1080
def read_data(
self,
family: str,
geometry: str,
mode: str,
wesn: Sequence[float] | None,
infile: str,
data=None,
):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition of the method follows the definition of the API function GMT_Read_Data. We need to call it like this:

lib.read_data("GMT_IS_DATASET", "GMT_IS_PLP", "GMT_READ_NORMAL", None, infile, None)
lib.read_data("GMT_IS_GRID", "GMT_IS_SURFACE", "GMT_READ_NORMAL", None, infile, None)
lib.read_data("GMT_IS_IMAGE", "GMT_IS_SURFACE", "GMT_READ_NORMAL", None, infile, None)

but they look really boring and weird.

I prefer to refactor the function definition like this:

lib.read_data(infile, kind="dataset")
lib.read_data(infile, kind="grid")
lib.read_data(infile, kind="image")

similar to the syntax of the gmt read infile outfile -Tc|d|g|i|p|u syntax of the special read/write module. For reference, the read module mainly calls the gmt_copy function and the gmt_copy function calls GMT_Read_Data with different arguments based on the data family.

Anyway, this should be discussed in more detail later.

@@ -1697,7 +1769,9 @@ def virtualfile_from_data(

@contextlib.contextmanager
def virtualfile_out(
self, kind: Literal["dataset", "grid"] = "dataset", fname: str | None = None
self,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes below are copied from #3128 so can be ignored when reviewing this PR.

@@ -4,3 +4,4 @@

from pygmt.datatypes.dataset import _GMT_DATASET
from pygmt.datatypes.grid import _GMT_GRID
from pygmt.datatypes.image import _GMT_IMAGE
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from #3128

@@ -203,7 +203,8 @@ def data_attrs(self) -> dict[str, Any]:
Attributes for the data variable from the grid header.
"""
attrs: dict[str, Any] = {}
attrs["Conventions"] = "CF-1.7"
if self.type == 18: # Grid file format: ns = GMT netCDF format
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from #3128

@@ -0,0 +1,182 @@
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from #3128 and can be ignored when reviwing this PR.

assert data[3][4] == 250.0


def test_clib_read_data_grid_two_steps():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GMT_CONTAINER_ONLY & GMT_DATA_ONLY: Read the grid header first and then read the grid data.

assert data[3][4] == 250.0


def test_clib_read_data_image_as_grid():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read an image @earth_day_01d_p into a GMT_GRID container. The header.n_bands=1, so only the first band is read.

assert image.data


def test_clib_read_data_image():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read the @earth_day_01d_p file into the GMT_IMAGE container. Note that header.n_bands=3!

assert image.data


def test_clib_read_data_image_two_steps():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two-step reading for images.

assert image.data is not None # The data is read


def test_clib_read_data_grid_as_image():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read a grid @earth_relief_01d_p into a GMT_IMAGE container. header.n_bands=1.

Tests test_clib_read_data_image and test_clib_read_data_grid_as_image tell us that, we can call lib.read_data with the same arguments to read either a grid or an image, and then we can determine the data kind based on n_bands.

@seisman seisman changed the title POC + WIP: Wrap GMT_Read_Data POC: Wrap GMT_Read_Data and read datasets/grids/images into GMT data container Jul 10, 2024
@seisman seisman removed the needs review This PR has higher priority and needs review. label Jul 17, 2024
@seisman seisman closed this Jul 24, 2024
@seisman seisman deleted the api/read-data branch July 24, 2024 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant