Showing 15 changed files with 938 additions and 405 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
--- | ||
jupytext: | ||
text_representation: | ||
format_name: myst | ||
kernelspec: | ||
display_name: Python 3 | ||
name: python3 | ||
--- | ||
|
||
```{code-cell} | ||
:tags: [remove-cell] | ||
from intake_esgf import ESGFCatalog | ||
from pprint import pprint | ||
``` | ||
|
||
# Just the Paths | ||
|
||
While the basic paradigm of `intake-esgf` is to return xarray datasets for everything in your catalog, we recognize that you may wish to get just the paths back and use them in other ways, for reasons such as the following. | ||
|
||
1. You may not want to use xarray datasets. We highly recommend learning the package and using it in your research, but you may have alternatives and we do not want to prohibit you from working as you see fit. | ||
2. The analysis script you are running may not have been written to leverage xarray datasets. | ||
3. You may need just the paths to pass into another tool or benchmark package. | ||
4. You may have specific options you want to pass to `xarray.open_dataset()` that our interface does not support. | ||
|
||
For this purpose the catalog provides a method called `to_path_dict()`. It works just like `to_dataset_dict()`, except that we do not call the xarray dataset constructors on the returned paths. The two functions also share most of the same keyword arguments. Suppose we perform a search: | ||
|
||
```{code-cell} | ||
cat = ESGFCatalog().search( | ||
experiment_id="historical", | ||
source_id="CanESM5", | ||
frequency="mon", | ||
variable_id=["gpp", "tas", "pr"], | ||
member_id="r1i1p1f1", | ||
) | ||
``` | ||
|
||
We can then call the path function instead and print the local paths. | ||
|
||
```{code-cell} | ||
paths = cat.to_path_dict() | ||
pprint(paths) | ||
``` | ||
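
As a minimal sketch of the third use case listed above, the returned dictionary can be flattened into plain strings before handing it to another tool. This assumes that `to_path_dict()` returns a mapping from dataset keys to lists of paths (check the printed output above to confirm). The helper name `flatten_path_dict` and the sample keys and file names are hypothetical and exist only for illustration.

```python
from pathlib import Path


def flatten_path_dict(path_dict):
    """Return {key: [str, ...]} with every entry converted to a string.

    Assumption: each value is either a single path-like entry or a
    list/tuple of them. This helper is illustrative and is not part
    of intake-esgf.
    """
    flat = {}
    for key, entries in path_dict.items():
        # Tolerate a bare path as well as a list of paths
        if isinstance(entries, (str, Path)):
            entries = [entries]
        flat[key] = [str(entry) for entry in entries]
    return flat


# Hypothetical stand-in for the output of cat.to_path_dict()
example = {
    "tas": [Path("cache/tas_a.nc"), Path("cache/tas_b.nc")],
    "pr": [Path("cache/pr_a.nc")],
}
flat = flatten_path_dict(example)

# One sorted list of every file, e.g. to write to a manifest
all_files = sorted(f for files in flat.values() for f in files)
```

The per-key lists can then be passed to whatever reader you prefer, for example `xarray.open_mfdataset(flat["tas"], ...)` with the options you need.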
|
||
Note that, just as with `to_dataset_dict()`, `to_path_dict()` first checks whether the data is available locally and downloads it if not. In fact, internally our `to_dataset_dict()` function calls `to_path_dict()` first. You can also use `to_path_dict()` to obtain the OPeNDAP links if you prefer. | ||
|
||
```{code-cell} | ||
cat = ESGFCatalog().search( | ||
experiment_id="historical", | ||
source_id="CanESM5", | ||
frequency="mon", | ||
variable_id=["cSoil"], | ||
member_id="r1i1p1f1", | ||
) | ||
paths = cat.to_path_dict(prefer_streaming=True) | ||
pprint(paths) | ||
``` |
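
When streaming is preferred, the result may mix local files with remote links. The sketch below separates the two by checking for an `http`/`https` scheme. It assumes that entries are either local paths or URL strings. The function names and the sample URL in the usage comment are hypothetical.

```python
from urllib.parse import urlparse


def is_remote(entry):
    """Return True if the entry looks like an http(s) URL rather than a local path."""
    return urlparse(str(entry)).scheme in ("http", "https")


def split_by_location(path_dict):
    """Split {key: [entries]} into (local, remote) dictionaries of strings."""
    local, remote = {}, {}
    for key, entries in path_dict.items():
        for entry in entries:
            target = remote if is_remote(entry) else local
            target.setdefault(key, []).append(str(entry))
    return local, remote


# Usage with made-up entries (not real ESGF locations):
# local, remote = split_by_location(
#     {"cSoil": ["https://example.org/dodsC/cSoil.nc"], "tas": ["/tmp/tas.nc"]}
# )
```

Keys that have no entries of a given kind are simply absent from that dictionary.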
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
--- | ||
jupytext: | ||
text_representation: | ||
format_name: myst | ||
kernelspec: | ||
display_name: Python 3 | ||
name: python3 | ||
--- | ||
|
||
```{code-cell} | ||
:tags: [remove-cell] | ||
import matplotlib.pyplot as plt | ||
from intake_esgf import ESGFCatalog | ||
``` | ||
|
||
# Streaming Data | ||
|
||
In addition to transferring entire files, data may be streamed to the user as their script requires it. The benefit is that if only a small portion of the data is to be used, we avoid downloading the whole file. At the time of this writing, ESGF indices only contain [OPeNDAP](https://www.opendap.org/) access information. However, as we consider expanding support, the interface below will extend to other streaming/cloud-ready technologies such as [Zarr](https://zarr.dev/) stores, [kerchunk](https://github.com/fsspec/kerchunk), and [VirtualiZarr](https://github.com/zarr-developers/VirtualiZarr). | ||
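
To make the benefit concrete, here is a back-of-the-envelope estimate comparing a full `(time, lat, lon)` variable with a time series at a single grid cell. The grid dimensions and the 4-byte value size are assumed purely for illustration. Real transfers also depend on chunking, compression, and protocol overhead, which this sketch ignores.

```python
def nbytes(ntime, nlat, nlon, itemsize=4):
    """Uncompressed size in bytes of a (time, lat, lon) array."""
    return ntime * nlat * nlon * itemsize


# Assumed sizes, for illustration only
ntime, nlat, nlon = 1032, 144, 192

full = nbytes(ntime, nlat, nlon)  # the whole variable
trace = nbytes(ntime, 1, 1)       # a time series at one grid cell

print(f"full: {full / 1e6:.1f} MB, trace: {trace / 1e3:.1f} kB, ratio: {full // trace}")
```

The ratio is simply `nlat * nlon`, so the finer the grid, the more a point extraction gains from streaming.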
|
||
To demonstrate this functionality, consider the following search for some future surface air temperature data from the UKESM model. | ||
|
||
```{code-cell} | ||
cat = ESGFCatalog().search( | ||
experiment_id="ssp585", | ||
source_id="UKESM1-0-LL", | ||
variable_id="tas", | ||
frequency="mon", | ||
) | ||
cat.remove_ensembles() | ||
``` | ||
|
||
To harvest the OPeNDAP access link from the index nodes, pass `prefer_streaming=True`. Not all files will have this capability, but if they do, then this will tell `intake-esgf` to use it. Also, in this example we do not need any cell measures and so we will disable them in this call. | ||
|
||
```{code-cell} | ||
dsd = cat.to_dataset_dict(prefer_streaming=True, add_measures=False) | ||
``` | ||
|
||
At this point, the dataset dictionary is returned, but you will notice that no file download messages were received. The OPeNDAP access link was passed to the xarray constructor. We now proceed with our analysis as if the data were local. In this example, we wish to see what future temperatures will be under the SSP585 scenario over the author's hometown. | ||
|
||
```{code-cell} | ||
ds = dsd["tas"] | ||
ds = ds.sel(lat=35.96, lon=-83.92 + 360, method="nearest") | ||
``` | ||
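
The `+ 360` in the selection above converts a western longitude to the 0 to 360 degree convention, which the code implies this dataset uses. A small sketch of the conversion in both directions follows. The function names are hypothetical helpers, not part of `intake-esgf` or xarray.

```python
def to_0_360(lon):
    """Map a longitude in degrees to the equivalent value in [0, 360)."""
    return lon % 360.0


def to_pm180(lon):
    """Map a longitude in degrees to the equivalent value in [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


# The longitude used in the selection above
lon_0_360 = to_0_360(-83.92)
```

Using the modulo form rather than a bare `+ 360` means that longitudes already in the 0 to 360 range pass through unchanged.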
|
||
Now we can plot this trace using matplotlib. When the xarray dataset needs data, it uses the OPeNDAP protocol to stream just the time trace at the specific location. | ||
|
||
```{code-cell} | ||
fig, ax = plt.subplots(figsize=(10, 3)) | ||
ds["tas"].plot(ax=ax); | ||
``` | ||
|
||
This can be a very fast alternative if the data volume is relatively low. If you want to verify that data has indeed been streamed and not accessed locally, you may print the session log and look at what was accessed. | ||
|
||
```{code-cell} | ||
print(cat.session_log()) | ||
``` | ||
|
||
If you look towards the bottom of that log, you will see that an https link was accessed in place of a local file. Note that if a file is already present in your local cache, we will use that file even if you have preferred to use streaming. |
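
Rather than reading the log by eye, you could scan it for links. The sketch below extracts every http(s) URL from a block of text. It makes no assumption about the log's format beyond URLs being separated by whitespace, and `find_urls` is a hypothetical helper name.

```python
import re

# Match http:// or https:// followed by any run of non-whitespace characters
URL_PATTERN = re.compile(r"https?://\S+")


def find_urls(log_text):
    """Return every http(s) URL found in a block of text, in order of appearance."""
    return URL_PATTERN.findall(log_text)


# Usage, with `cat` from the search above:
# for url in find_urls(cat.session_log()):
#     print(url)
```

Because the pattern stops only at whitespace, trailing punctuation attached to a link would be included in the match.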