Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an internal function to load GMT remote datasets #2200

Merged
merged 51 commits into from
Dec 5, 2022

Conversation

willschlitzer
Copy link
Contributor

As mentioned in this comment, this adds an internal function that can be used to load a grid file from a remote dataset.

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

  • /format: automatically format and lint the code
  • /test-gmt-dev: run full tests on the latest GMT development version

@willschlitzer willschlitzer added the feature Brand new feature label Nov 20, 2022
@willschlitzer willschlitzer self-assigned this Nov 20, 2022
@seisman seisman added this to the 0.8.0 milestone Nov 20, 2022
@seisman
Copy link
Member

seisman commented Nov 21, 2022

I'm thinking if we should define a class (or a namedtuple) which stores all metadata of a remote dataset. For example, the object for the earth_relief dataset should be like this:

from collections import namedtuple

Dataset = namedtuple(
    "Dataset",
    [
        "name",
        "prefix",
        "long_name",
        "units",
        "tiled_resolutions",
        "non_tiled_resolutions",
        "pixel_only_resolutions",
        "gridline_only_resolutions",
        "vertical_datum",
        "horizontal_datum",
    ],
)

datasets = {
    "earth_relief": Dataset(
        name="elevation",
        prefix="earth_relief",
        long_name="elevation relative to the geoid",
        units="meters",
        vertical_datum="EMG96",
        horizontal_datum="WGS84",
        tiled_resolutions=[
            "05m",
            "04m",
            "03m",
            "02m",
            "01m",
            "30s",
            "15s",
            "03s",
            "01s",
        ],
        non_tiled_resolutions=["01d", "30m", "20m", "15m", "10m", "06m"],
        pixel_only_resolutions=["15s"],
        gridline_only_resolutions=["03s", "01s"],
    )
}

print(datasets["earth_relief"])

@willschlitzer
Copy link
Contributor Author

I'm thinking if we should define a class (or a namedtuple) which stores all metadata of a remote dataset. For example, the object for the earth_relief dataset should be like this:

Not opposed to creating a named tuple or a dictionary for this. Creating a class for this seems unnecessarily complicated in my opinion, but I'm open to discussion!

Do all of the gridded datasets have the same attribute fields (e.g. long name, units, vertical datum, horizontal datum)?

@seisman
Copy link
Member

seisman commented Nov 22, 2022

Do all of the gridded datasets have the same attribute fields (e.g. long name, units, vertical datum, horizontal datum)?

I'm not sure about it. Perhaps these fields can be set to None if they make no sense to a dataset.

@maxrjones
Copy link
Member

I like the idea of using NamedTuple for storing the metadata. The access patterns for metadata could be simplified if the resolution information was linked to the registration and tiling info, for example:

from typing import NamedTuple, Dict

class Resolution(NamedTuple):
    registrations: list[str]
    tiled: bool

class Dataset(NamedTuple):
    name: str
    prefix: str
    long_name: str
    units: str
    resolutions: Dict[str, Resolution]    
    vertical_datum: str
    horizontal_datum: str

earth_relief_resolutions={
    "01d": Resolution(["pixel", "gridline"], False),
    "30m": Resolution(["pixel", "gridline"], False),
    "20m": Resolution(["pixel", "gridline"], False),
    "15m": Resolution(["pixel", "gridline"], False),
    "10m": Resolution(["pixel", "gridline"], False),
    "06m": Resolution(["pixel", "gridline"], False),
    "05m": Resolution(["pixel", "gridline"], True),
    "04m": Resolution(["pixel", "gridline"], True),
    "03m": Resolution(["pixel", "gridline"], True),
    "02m": Resolution(["pixel", "gridline"], True),
    "01m": Resolution(["pixel", "gridline"], True),
    "30s": Resolution(["pixel", "gridline"], True),
    "15s": Resolution(["pixel"], True),
    "03s": Resolution(["gridline"], True),
    "01s": Resolution(["gridline"], True),
}


datasets = {
    "earth_relief": Dataset(
        name="elevation",
        prefix="earth_relief",
        long_name="elevation relative to the geoid",
        units="meters",
        vertical_datum="EMG96",
        horizontal_datum="WGS84",
        resolutions=earth_relief_resolutions
    )
}

This would allow logic like this:

if pixel_only_resolutions:
    if registration == "gridline" and resolution in pixel_only_resolutions:
        raise GMTInvalidInput(
            f"{resolution} resolution is only available in pixel registration."
        )
if gridline_only_resolutions:
    if registration == "pixel" and resolution in gridline_only_resolutions:
        raise GMTInvalidInput(
            f"{resolution} resolution is only available in gridline registration."

to be simplified to something like

if registration not in datasets[dataset_prefix].resolutions[resolution].registrations:
    raise GMTInvalidInput(
        f"{registration} registration is not available for the {resolution} {dataset_prefix} dataset. Only {datasets[dataset_prefix].resolutions[resolution].registrations[0]} registration is available."
    )

@seisman
Copy link
Member

seisman commented Nov 23, 2022

earth_relief_resolutions={
    "01d": Resolution(["pixel", "gridline"], False),
    "30m": Resolution(["pixel", "gridline"], False),
    "20m": Resolution(["pixel", "gridline"], False),
    "15m": Resolution(["pixel", "gridline"], False),
    "10m": Resolution(["pixel", "gridline"], False),
    "06m": Resolution(["pixel", "gridline"], False),
    "05m": Resolution(["pixel", "gridline"], True),
    "04m": Resolution(["pixel", "gridline"], True),
    "03m": Resolution(["pixel", "gridline"], True),
    "02m": Resolution(["pixel", "gridline"], True),
    "01m": Resolution(["pixel", "gridline"], True),
    "30s": Resolution(["pixel", "gridline"], True),
    "15s": Resolution(["pixel"], True),
    "03s": Resolution(["gridline"], True),
    "01s": Resolution(["gridline"], True),
}

I like it!

@willschlitzer
Copy link
Contributor Author

@maxrjones and @seisman Would it be alright to split these changes between two PRs? My thought is the first PR is the creation of load_earth_dataset.py (AKA what's been done already), and the second PR moves all of the information to named tuples.

@seisman
Copy link
Member

seisman commented Nov 23, 2022

@maxrjones and @seisman Would it be alright to split these changes between two PRs? My thought is the first PR is the creation of load_earth_dataset.py (AKA what's been done already), and the second PR moves all of the information to named tuples.

I feel it's better to have all things done in a single PR, because if we use namedtuple, the current _load_earth_dataset functions need a lot of changes.

pygmt/datasets/load_earth_dataset.py Outdated Show resolved Hide resolved
pygmt/datasets/earth_age.py Outdated Show resolved Hide resolved
pygmt/datasets/earth_relief.py Outdated Show resolved Hide resolved
pygmt/datasets/earth_age.py Outdated Show resolved Hide resolved
pygmt/datasets/earth_relief.py Outdated Show resolved Hide resolved
@seisman
Copy link
Member

seisman commented Dec 4, 2022

Here are some issues that I found when I review the new codes. I think these issues can be addressed in separate PRs after this PR is merged.

For the load_earth_relief function:

  1. Add from pygmt.datasets import load_earth_relief at the beginning of the examples, so that users can copy and run these examples
  2. In the function definition, I think it makes more sense to put data_source before use_srtm, because use_srtm is less commonly used and use_srtm only works for data_source="igpp".
  3. In the docstrings, need to explain what earth_relief_type means.
  4. The code says synbath, gebco and gebcosi are not available for GMT 6.3.0. I haven't tested it, but if I remember it correctly, synbath should work with GMT 6.3.0

For the load_earth_age function:

  • Need to add examples in the docstrings

Copy link
Member

@seisman seisman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@seisman seisman added the final review call This PR requires final review and approval from a second reviewer label Dec 5, 2022
@seisman seisman requested a review from a team December 5, 2022 01:28
Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Just noticed one small typo carried over from previous functions, otherwise all good from me.

pygmt/tests/test_datasets_earth_relief.py Outdated Show resolved Hide resolved
pygmt/datasets/load_remote_dataset.py Outdated Show resolved Hide resolved
@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Dec 5, 2022
@seisman seisman changed the title Add a function to load Earth grid remote datasets Add an internal function to load GMT remote datasets Dec 5, 2022
@seisman seisman merged commit 54a8559 into main Dec 5, 2022
@seisman seisman deleted the load-remote-dataset/load-earth-dataset branch December 5, 2022 14:45
@seisman seisman added maintenance Boring but important stuff for the core devs and removed feature Brand new feature labels Dec 5, 2022
@seisman
Copy link
Member

seisman commented Dec 5, 2022

FYI, I remove the "feature" label and added the "maintenance" label, because this PR adds an internal function that users don't care about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants