Add an internal function to load GMT remote datasets #2200

willschlitzer · 2022-11-20T19:45:43Z

As mentioned in this comment, this adds an internal function that can be used to load a grid file from a remote dataset.

Reminders

Run make format and make check to make sure the code follows the style guide.
Add tests for new features or tests that would have caught the bug that you're fixing.
Add new public functions/methods/classes to doc/api/index.rst.
Write detailed docstrings for all functions/methods.
If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
If adding new functionality, add an example to docstrings or tutorials.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

/format: automatically format and lint the code
/test-gmt-dev: run full tests on the latest GMT development version

seisman · 2022-11-21T15:32:05Z

I'm wondering whether we should define a class (or a namedtuple) that stores all the metadata of a remote dataset. For example, the object for the earth_relief dataset could look like this:

from collections import namedtuple

Dataset = namedtuple(
    "Dataset",
    [
        "name",
        "prefix",
        "long_name",
        "units",
        "tiled_resolutions",
        "non_tiled_resolutions",
        "pixel_only_resolutions",
        "gridline_only_resolutions",
        "vertical_datum",
        "horizontal_datum",
    ],
)

datasets = {
    "earth_relief": Dataset(
        name="elevation",
        prefix="earth_relief",
        long_name="elevation relative to the geoid",
        units="meters",
        vertical_datum="EGM96",
        horizontal_datum="WGS84",
        tiled_resolutions=[
            "05m",
            "04m",
            "03m",
            "02m",
            "01m",
            "30s",
            "15s",
            "03s",
            "01s",
        ],
        non_tiled_resolutions=["01d", "30m", "20m", "15m", "10m", "06m"],
        pixel_only_resolutions=["15s"],
        gridline_only_resolutions=["03s", "01s"],
    )
}

print(datasets["earth_relief"])

willschlitzer · 2022-11-21T15:39:43Z

> I'm wondering whether we should define a class (or a namedtuple) that stores all the metadata of a remote dataset. For example, the object for the earth_relief dataset could look like this:

Not opposed to creating a named tuple or a dictionary for this. Creating a class for this seems unnecessarily complicated in my opinion, but I'm open to discussion!

Do all of the gridded datasets have the same attribute fields (e.g. long name, units, vertical datum, horizontal datum)?

seisman · 2022-11-22T02:09:12Z

> Do all of the gridded datasets have the same attribute fields (e.g. long name, units, vertical datum, horizontal datum)?

I'm not sure about it. Perhaps these fields can be set to None if they make no sense to a dataset.
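As a minimal sketch of the "set to None" idea, a typing.NamedTuple can give the optional fields a default of None, so a dataset without a datum simply omits them. The field names follow the proposal above. The second dataset ("example_dataset") is hypothetical and exists only for illustration; this is not the code that was merged.

```python
from typing import NamedTuple, Optional


class Dataset(NamedTuple):
    """Metadata for one remote dataset; optional fields default to None."""

    name: str
    prefix: str
    long_name: str
    units: Optional[str] = None
    vertical_datum: Optional[str] = None
    horizontal_datum: Optional[str] = None


# Values for earth_relief follow the proposal earlier in this thread.
earth_relief = Dataset(
    name="elevation",
    prefix="earth_relief",
    long_name="elevation relative to the geoid",
    units="meters",
    vertical_datum="EGM96",
    horizontal_datum="WGS84",
)

# Hypothetical dataset for which units and a vertical datum make no sense.
example = Dataset(
    name="example",
    prefix="example_dataset",
    long_name="a hypothetical dataset",
    horizontal_datum="WGS84",
)

# Keep only the fields that are actually defined for this dataset.
attrs = {key: value for key, value in example._asdict().items() if value is not None}
print(attrs)
```

Filtering out the None values before attaching the metadata to a grid would keep meaningless attributes from showing up on datasets that have no datum.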

maxrjones · 2022-11-22T18:43:12Z

I like the idea of using NamedTuple for storing the metadata. The access patterns for metadata could be simplified if the resolution information was linked to the registration and tiling info, for example:

from typing import Dict, List, NamedTuple


class Resolution(NamedTuple):
    # Registrations available at this resolution: "pixel" and/or "gridline".
    registrations: List[str]
    # Whether the grid is served as tiles.
    tiled: bool


class Dataset(NamedTuple):
    name: str
    prefix: str
    long_name: str
    units: str
    resolutions: Dict[str, Resolution]
    vertical_datum: str
    horizontal_datum: str

earth_relief_resolutions = {
    "01d": Resolution(["pixel", "gridline"], False),
    "30m": Resolution(["pixel", "gridline"], False),
    "20m": Resolution(["pixel", "gridline"], False),
    "15m": Resolution(["pixel", "gridline"], False),
    "10m": Resolution(["pixel", "gridline"], False),
    "06m": Resolution(["pixel", "gridline"], False),
    "05m": Resolution(["pixel", "gridline"], True),
    "04m": Resolution(["pixel", "gridline"], True),
    "03m": Resolution(["pixel", "gridline"], True),
    "02m": Resolution(["pixel", "gridline"], True),
    "01m": Resolution(["pixel", "gridline"], True),
    "30s": Resolution(["pixel", "gridline"], True),
    "15s": Resolution(["pixel"], True),
    "03s": Resolution(["gridline"], True),
    "01s": Resolution(["gridline"], True),
}


datasets = {
    "earth_relief": Dataset(
        name="elevation",
        prefix="earth_relief",
        long_name="elevation relative to the geoid",
        units="meters",
        vertical_datum="EGM96",
        horizontal_datum="WGS84",
        resolutions=earth_relief_resolutions,
    )
}

This would allow logic like this:

if pixel_only_resolutions:
    if registration == "gridline" and resolution in pixel_only_resolutions:
        raise GMTInvalidInput(
            f"{resolution} resolution is only available in pixel registration."
        )
if gridline_only_resolutions:
    if registration == "pixel" and resolution in gridline_only_resolutions:
        raise GMTInvalidInput(
            f"{resolution} resolution is only available in gridline registration."
        )

to be simplified to something like:

valid_registrations = datasets[dataset_prefix].resolutions[resolution].registrations
if registration not in valid_registrations:
    raise GMTInvalidInput(
        f"{registration} registration is not available for the {resolution} "
        f"{dataset_prefix} dataset. Only {valid_registrations[0]} registration "
        "is available."
    )
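The lookup-based check can be tried in isolation with the sketch below. It uses a trimmed-down copy of the resolution table from this comment. GMTInvalidInput is redefined locally as a stand-in so the snippet runs without PyGMT installed, and check_registration is a hypothetical helper name, not a function from the PR.

```python
from typing import Dict, List, NamedTuple


class GMTInvalidInput(Exception):
    """Local stand-in for PyGMT's GMTInvalidInput exception."""


class Resolution(NamedTuple):
    registrations: List[str]
    tiled: bool


# Trimmed-down copy of the earth_relief table from the comment above.
resolutions: Dict[str, Resolution] = {
    "01d": Resolution(["pixel", "gridline"], False),
    "15s": Resolution(["pixel"], True),
    "03s": Resolution(["gridline"], True),
}


def check_registration(resolution: str, registration: str) -> None:
    """Raise GMTInvalidInput if the registration is unavailable (hypothetical helper)."""
    valid = resolutions[resolution].registrations
    if registration not in valid:
        raise GMTInvalidInput(
            f"{registration} registration is not available for the {resolution} "
            f"dataset. Only {valid[0]} registration is available."
        )


check_registration("01d", "gridline")  # both registrations exist, so no error
try:
    check_registration("15s", "gridline")  # 15s is pixel-only
except GMTInvalidInput as err:
    print(err)

# The same table also answers the tiling question without a separate list.
tiled = [res for res, info in resolutions.items() if info.tiled]
print(tiled)
```

With this layout, a single dictionary lookup replaces the separate pixel-only, gridline-only, and tiled lists.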

seisman · 2022-11-23T04:14:57Z

earth_relief_resolutions={
    "01d": Resolution(["pixel", "gridline"], False),
    "30m": Resolution(["pixel", "gridline"], False),
    "20m": Resolution(["pixel", "gridline"], False),
    "15m": Resolution(["pixel", "gridline"], False),
    "10m": Resolution(["pixel", "gridline"], False),
    "06m": Resolution(["pixel", "gridline"], False),
    "05m": Resolution(["pixel", "gridline"], True),
    "04m": Resolution(["pixel", "gridline"], True),
    "03m": Resolution(["pixel", "gridline"], True),
    "02m": Resolution(["pixel", "gridline"], True),
    "01m": Resolution(["pixel", "gridline"], True),
    "30s": Resolution(["pixel", "gridline"], True),
    "15s": Resolution(["pixel"], True),
    "03s": Resolution(["gridline"], True),
    "01s": Resolution(["gridline"], True),
}

I like it!

willschlitzer · 2022-11-23T14:41:36Z

@maxrjones and @seisman Would it be alright to split these changes between two PRs? My thought is the first PR is the creation of load_earth_dataset.py (AKA what's been done already), and the second PR moves all of the information to named tuples.

seisman · 2022-11-23T23:37:06Z

> @maxrjones and @seisman Would it be alright to split these changes between two PRs? My thought is the first PR is the creation of load_earth_dataset.py (AKA what's been done already), and the second PR moves all of the information to named tuples.

I feel it's better to do everything in a single PR, because if we use namedtuple, the current _load_earth_dataset function would need a lot of changes.

seisman · 2022-12-04T15:53:49Z

Here are some issues that I found while reviewing the new code. I think they can be addressed in separate PRs after this PR is merged.

For the load_earth_relief function:

Add from pygmt.datasets import load_earth_relief at the beginning of the examples, so that users can copy and run them.
In the function definition, I think it makes more sense to put data_source before use_srtm, because use_srtm is less commonly used and only works with data_source="igpp".
The docstrings need to explain what earth_relief_type means.
The code says synbath, gebco and gebcosi are not available for GMT 6.3.0. I haven't tested it, but if I remember correctly, synbath should work with GMT 6.3.0.
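The parameter-order suggestion can be illustrated with the stub below. It is a hypothetical stand-in, not PyGMT's load_earth_relief: it loads no data, its name and return value are invented for illustration, and the ValueError is an assumption about how the use_srtm restriction might be enforced.

```python
def load_earth_relief_sketch(
    resolution="01d",
    region=None,
    registration=None,
    data_source="igpp",  # more commonly used, so it comes first
    use_srtm=False,  # only meaningful together with data_source="igpp"
):
    """
    Validate the arguments and return them as a dict (hypothetical stub).
    """
    if use_srtm and data_source != "igpp":
        raise ValueError("use_srtm is only supported with data_source='igpp'.")
    return {
        "resolution": resolution,
        "region": region,
        "registration": registration,
        "data_source": data_source,
        "use_srtm": use_srtm,
    }


# Callers who never touch use_srtm can stop at data_source.
print(load_earth_relief_sketch("30m", None, None, "gebco"))
```

Placing data_source before use_srtm keeps the dependent option next to the one it depends on.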

For the load_earth_age function:

Add examples to the docstrings.

seisman

Looks great!

weiji14

Nice work! Just noticed one small typo carried over from previous functions, otherwise all good from me.

seisman · 2022-12-05T14:46:14Z

FYI, I removed the "feature" label and added the "maintenance" label, because this PR adds an internal function that users don't care about.

willschlitzer added 5 commits November 20, 2022 10:45

create load_earth_dataset.py and move functions over from earth_age.py

188dd58

remove unused imports

79560e3

shorten docstring

f171dcc

add error handling for pixel/gridline only registrations

39d8b25

update earth_relief.py to use load_earth_dataset

31a728a

willschlitzer added the "feature" label (Brand new feature) Nov 20, 2022

willschlitzer self-assigned this Nov 20, 2022

seisman added this to the 0.8.0 milestone Nov 20, 2022

willschlitzer added 12 commits November 21, 2022 08:09

change pylint disable argument

4ffbe82

Merge branch 'main' into load-remote-dataset/load-earth-dataset

ffcf9f4

add test for incompatible registration/resolution to test_datasets_ea…

16059d5

…rth_age.py

fix test name

2780568

add test_earth_relief_incorrect_resolution_registration to test_datas…

6a08ca2

…ets_earth_relief.py

modify test to check that grid name and attributes are properly set

e2378c3

modify test to check that grid name and attributes are properly set

d81a62f

run make format

1bf3060

Merge branch 'main' into load-remote-dataset/load-earth-dataset

46dedf0

remove error handling from earth_relief.py that has been moved to loa…

9dba27c

…d_earth_dataset.py

add synbath to parameters for test

cde524f

Merge branch 'main' into load-remote-dataset/load-earth-dataset

f2e6bdf

seisman mentioned this pull request Nov 23, 2022

load_earth_relief: Add the support of data source 'GEBCOSI' #2192

Merged

6 tasks

willschlitzer added 2 commits November 25, 2022 09:44

Merge branch 'main' into load-remote-dataset/load-earth-dataset

40ce1ed

create classes for Resolutions and Dataset info in load_earth_dataset.py

83f1f74

seisman reviewed Dec 4, 2022

Apply suggestions from code review

4d3f7d3

Co-authored-by: Dongdong Tian <[email protected]>

run make format

7b10624

seisman reviewed Dec 4, 2022

pygmt/datasets/load_earth_dataset.py

rename load_remote_dataset.py

7be9eea

seisman reviewed Dec 4, 2022

pygmt/datasets/load_earth_dataset.py

willschlitzer added 2 commits December 4, 2022 12:13

change name to extra attributes

8302535

change extra attributes docstring

0051c5a

seisman reviewed Dec 5, 2022

pygmt/datasets/load_remote_dataset.py

seisman approved these changes Dec 5, 2022

seisman added the "final review call" label (This PR requires final review and approval from a second reviewer) Dec 5, 2022

seisman requested a review from a team December 5, 2022 01:28

weiji14 reviewed Dec 5, 2022

pygmt/tests/test_datasets_earth_relief.py

pygmt/datasets/load_remote_dataset.py

weiji14 approved these changes Dec 5, 2022

michaelgrund approved these changes Dec 5, 2022

willschlitzer and others added 2 commits December 5, 2022 07:45

Apply suggestions from code review

cc8249f

Co-authored-by: Dongdong Tian <[email protected]> Co-authored-by: Wei Ji <[email protected]>

Merge branch 'main' into load-remote-dataset/load-earth-dataset

193b4f2

seisman removed the "final review call" label (This PR requires final review and approval from a second reviewer) Dec 5, 2022

seisman changed the title from "Add a function to load Earth grid remote datasets" to "Add an internal function to load GMT remote datasets" Dec 5, 2022

seisman merged commit 54a8559 into main Dec 5, 2022

seisman deleted the load-remote-dataset/load-earth-dataset branch December 5, 2022 14:45

seisman added the "maintenance" label (Boring but important stuff for the core devs) and removed the "feature" label (Brand new feature) Dec 5, 2022

This was referenced Dec 5, 2022

Improve the load_earth_relief and load_earth_age functions #2225

Closed

Add load_earth_magnetic_anomaly function for Earth magnetic anomaly dataset #2196

Merged

sixy6e pushed a commit to sixy6e/pygmt that referenced this pull request Dec 21, 2022

Add an internal function to load GMT remote datasets (GenericMappingT…

7587a5c

…ools#2200) Co-authored-by: Dongdong Tian <[email protected]> Co-authored-by: Wei Ji <[email protected]>

weiji14 mentioned this pull request Dec 28, 2022

Changelog entry for v0.8.0 #2272

Merged

15 tasks

willschlitzer mentioned this pull request Dec 28, 2022

More GMT data sources in load_earth_relief and other load_earth functions #1786

Closed

10 tasks
