-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the internal _load_remote_dataset function to simplify datasets' definitions #2917
Conversation
…ataset definitions
if resolution not in dataset.resolutions: | ||
for res in dataset.resolutions: | ||
if res.code == resolution: | ||
valid_registrations = res.registrations | ||
is_tiled = res.tiled | ||
break | ||
else: | ||
raise GMTInvalidInput(f"Invalid resolution '{resolution}'.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This for-loop seems a lot more inefficient compared to the previous if resolution not in dataset.resolutions
dictionary-based lookup. I almost think we should go with the intermediate {"01d": Resolution(), ...}
dictionary you mentioned at #2917 (comment), or come up with a better data structure than a list of Resolution
NamedTuples. I almost feel like that datasets
variable could be a nested JSON or a multi-level pandas.DataFrame object (but maybe that's overkill).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could just keep the datasets
variable as is, but bring in all the type hint stuff. The remote datasets aren't really updated that often, though I know you're working on those Moon/Venus/Mars PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This for-loop seems a lot more inefficient compared to the previous
if resolution not in dataset.resolutions
dictionary-based lookup.
Yes, it's almost 10-times slower.
I almost think we should go with the intermediate
{"01d": Resolution(), ...}
dictionary you mentioned at #2917 (comment), or come up with a better data structure than a list ofResolution
NamedTuples. I almost feel like thatdatasets
variable could be a nested JSON or a multi-level pandas.DataFrame object (but maybe that's overkill).
What about a dictionary:
{
"01d": Resolution("01d"),
}
It's more clear than:
{
"01d": Resolution(),
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a dictionary:\n\n{\n "01d": Resolution("01d"),\n}
Yes, that could work.
For each GMT remote dataset, we need to define it like:
The
resolutions
property is a dictionary of available resolutions and the corresponding registrations and tile information. As you can see,["gridline", "pixel"]
is duplicated multiple times. I feel maintaining such a dictionary is tedious.Since most resolutions support both "gridline" and "pixel" registrations, it makes sense to let
Resolution.registrations
have a default value["gridline", "pixel"]
. Similarly,Resolution.tiled
can have a default valueFalse
.Then, the dataset definition can be written as:
The new function definition looks more compact, but an entry like
"01d": Resolution()
looks weird, since it may be unclear whatResolution()
means.Thus, I add the new
code
property to theResolution
class, then aResolution
object can be defined likeor the shortest version:
After the above changes, entries like
"01d": Resolution("01d")
are still weird since the resolution code (e.g.,01d
) must be duplicated twice. Thus, I changedresolutions
from a dict to a list.Here is the final version of the dataset definition:
which I think is more compact and more readable.