-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POC: Wrap GMT_Read_Data and read datasets/grids/images into GMT data container #3318
Conversation
c66965e
to
316390b
Compare
def read_data( | ||
self, | ||
family: str, | ||
geometry: str, | ||
mode: str, | ||
wesn: Sequence[float] | None, | ||
infile: str, | ||
data=None, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The definition of the method follows the definition of the API function GMT_Read_Data
. We need to call it like this:
lib.read_data("GMT_IS_DATASET", "GMT_IS_PLP", "GMT_READ_NORMAL", None, infile, None)
lib.read_data("GMT_IS_GRID", "GMT_IS_SURFACE", "GMT_READ_NORMAL", None, infile, None)
lib.read_data("GMT_IS_IMAGE", "GMT_IS_SURFACE", "GMT_READ_NORMAL", None, infile, None)
but they look really boring and weird.
I prefer to refactor the function definition like this:
lib.read_data(infile, kind="dataset")
lib.read_data(infile, kind="grid")
lib.read_data(infile, kind="image")
similar to the syntax of the gmt read infile outfile -Tc|d|g|i|p|u
syntax of the special read
/write
module. For reference, the read
module mainly calls the gmt_copy
function and the gmt_copy
function calls GMT_Read_Data
with different arguments based on the data family.
Anyway, this should be discussed in more detail later.
@@ -1697,7 +1769,9 @@ def virtualfile_from_data( | |||
|
|||
@contextlib.contextmanager | |||
def virtualfile_out( | |||
self, kind: Literal["dataset", "grid"] = "dataset", fname: str | None = None | |||
self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes below are copied from #3128 so can be ignored when reviewing this PR.
@@ -4,3 +4,4 @@ | |||
|
|||
from pygmt.datatypes.dataset import _GMT_DATASET | |||
from pygmt.datatypes.grid import _GMT_GRID | |||
from pygmt.datatypes.image import _GMT_IMAGE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied from #3128
@@ -203,7 +203,8 @@ def data_attrs(self) -> dict[str, Any]: | |||
Attributes for the data variable from the grid header. | |||
""" | |||
attrs: dict[str, Any] = {} | |||
attrs["Conventions"] = "CF-1.7" | |||
if self.type == 18: # Grid file format: ns = GMT netCDF format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied from #3128
@@ -0,0 +1,182 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied from #3128 and can be ignored when reviwing this PR.
assert data[3][4] == 250.0 | ||
|
||
|
||
def test_clib_read_data_grid_two_steps(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GMT_CONTAINER_ONLY
& GMT_DATA_ONLY
: Read the grid header first and then read the grid data.
assert data[3][4] == 250.0 | ||
|
||
|
||
def test_clib_read_data_image_as_grid(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read an image @earth_day_01d_p
into a GMT_GRID
container. The header.n_bands=1
, so only the first band is read.
assert image.data | ||
|
||
|
||
def test_clib_read_data_image(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read the @earth_day_01d_p
file into the GMT_IMAGE
container. Note that header.n_bands=3
!
assert image.data | ||
|
||
|
||
def test_clib_read_data_image_two_steps(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two-step reading for images.
assert image.data is not None # The data is read | ||
|
||
|
||
def test_clib_read_data_grid_as_image(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read a grid @earth_relief_01d_p
into a GMT_IMAGE
container. header.n_bands=1
.
Tests test_clib_read_data_image
and test_clib_read_data_grid_as_image
tell us that, we can call lib.read_data
with the same arguments to read either a grid or an image, and then we can determine the data kind based on n_bands
.
Description of proposed changes
This PR wraps the GMT API function GMT_Read_Data.
Currently, this PR contains 4 commits:
GMT_Read_Data
function_GMT_DATASET
/_GMT_GRID
_GMT_IMAGE
(with minor fixes)All new tests pass and the most important thing I learned is, that for an input grid/image, we can always read it into a GMT_IMAGE container. For grid,
header.n_bands=1
and for image,header.n_bands=3
(or any other values). TheGMT_Read_Data
API function can read a grid/image in either one step or two steps. Here "two steps" means reading the header first and then reading the data. Reading the header only is very efficient (a few milliseconds) even for huge grid/image files. Thus, we can read the grid/image header in and checkheader.n_bands
to determine the input data kind, which addresses the concerns in ##3115 (comment).So, the next steps are:
GMT_Read_Data
wrapper and the tests for datasets/grids and finalize the wrapper Wrap the GMT API function GMT_Read_Data to read data into GMT data containers #3324GMT_IMAGE
wrapper (line 1-71 ofpygmt/datatypes/image.py
in PR GMT_IMAGE: Implement the GMT_IMAGE.to_dataarray method for 3-band images #3128, even without any doctests), and another PR with the remaining WIP codes. Then review/merge the first PR and work on the 2nd PR in the future.I plan to work on the above steps in separate PRs and keep this PR unchanged so that we can trace back to this PR in the future.