Matthew Harris edited this page Oct 7, 2013 · 9 revisions

Scripting Files Download in the Earth System Grid Federation

Summary

One of the most powerful features of the Earth System Grid Federation (ESGF) is its ability to generate scripts that download files matching arbitrary query parameters and that work across all sites in the federation. Currently, these scripts are based on the **wget** command, which is typically installed by default on Unix-like systems. Before downloading the data, the script prompts the user for their OpenID and password, which are used to retrieve a short-lived digital certificate from the ESGF site where the user registered. This certificate (which is valid for only 72 hours) is passed by _wget_ to the server holding the data as proof of the user's identity.

The exact procedure to generate and execute a wget download script is described below.

Pre-requisites

Before a wget download script can be executed, the following pre-requisites must be satisfied:

  • **The user must have the wget application (version 1.12 or later, compiled with the OpenSSL libraries, as it normally is) and Java 1.5 or later installed on their computer and available in their execution PATH**

    • Note: on a Unix system, you can typically check the availability and version of wget by typing the following at the command line: _wget --version_
  • **The user must have registered with one of the sites (a.k.a. _Nodes_) in the ESGF**

    • To register with an ESGF Node, simply use a browser to access the site's home page, and follow the _Create Account_ hyperlink
  • **The user must have been authorized to access the desired data**

    • At this time, the easiest way to request authorization to download a dataset is to try to download one file from a web browser. If the dataset is restricted, you will be prompted to request membership in the proper access control group. Depending on the group, this process may be automatic (i.e. your membership is granted immediately) or might require approval from an administrator. Once you are notified that your membership has been granted (through the site itself, or via email), try to download that file again from the web, to make sure everything is working.

The above operations need to be performed only once. Once the pre-requisites are satisfied, follow the steps below to generate and execute a wget download script.
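As a rough sketch, the software pre-requisite can be checked from the command line with two small shell functions. The names `version_ge` and `check_prereqs` are made up for this example, and searching the `wget --version` output for the string "openssl" is only a heuristic assumption, not an official ESGF check.

```shell
# Succeeds when dotted numeric version $1 is greater than or equal to $2.
version_ge() {
  local a=$1 b=$2 x y
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}
    y=${b%%.*}
    # Drop the field that was just read from each version string.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    if [ "${x:-0}" -gt "${y:-0}" ]; then return 0; fi
    if [ "${x:-0}" -lt "${y:-0}" ]; then return 1; fi
  done
  return 0
}

# Hypothetical helper: reports the first missing pre-requisite it finds.
check_prereqs() {
  local ver
  if ! command -v wget >/dev/null 2>&1; then
    echo "wget not found in PATH"
    return 1
  fi
  # First line of 'wget --version' normally looks like "GNU Wget 1.x ...".
  ver=$(wget --version | head -n 1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
  if ! version_ge "$ver" 1.12; then
    echo "wget ${ver:-unknown} is older than 1.12"
    return 1
  fi
  # Heuristic (assumption): an OpenSSL build mentions it in the version text.
  if ! wget --version | grep -qi 'openssl'; then
    echo "warning: 'wget --version' does not mention OpenSSL"
  fi
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found in PATH"
    return 1
  fi
  echo "wget $ver and java are available"
}
```

After sourcing the snippet, calling `check_prereqs` prints either a short confirmation or the first problem it detects. The registration and authorization pre-requisites still have to be checked through the ESGF site itself.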

Step 1: generate a wget script

To generate a wget script, simply start typing a URL in your browser window, starting with the address of any ESGF Node, and optionally adding any parameters that specify which files you are interested in.

For example, the following URL will generate a wget script that matches _all_ files in the ESGF, across all sites:

http://esg-datanode.jpl.nasa.gov/esg-search/wget

Note that the script generated by this URL will contain download links for only the first 100 files. You can page through the files, and limit the total number of files, by combining the _offset_ and _limit_ parameters. For example, the following URL will retrieve files 301 through 400:

http://esg-datanode.jpl.nasa.gov/esg-search/wget?offset=300&limit=100

Please note that the _limit_ parameter cannot exceed 1000, so a single script can download at most 1000 files.
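The pagination described above can be sketched as a small shell function that prints one wget-script URL per page of results. The function name `esgf_page_urls` and its argument order are invented for this illustration; only the _offset_ and _limit_ parameters and the ceiling of 1000 come from the text above.

```shell
# Print one URL per page so that together the pages cover files 1..total.
# usage: esgf_page_urls BASE_URL TOTAL_FILES [PAGE_SIZE]
esgf_page_urls() {
  local base=$1 total=$2 page=${3:-100} offset=0 n
  if [ "$page" -lt 1 ] || [ "$page" -gt 1000 ]; then
    echo "page size must be between 1 and 1000" >&2
    return 1
  fi
  while [ "$offset" -lt "$total" ]; do
    # The last page may be shorter than the others.
    n=$((total - offset))
    if [ "$n" -gt "$page" ]; then n=$page; fi
    printf '%s?offset=%d&limit=%d\n' "$base" "$offset" "$n"
    offset=$((offset + page))
  done
}

# Example call: three URLs covering the first 250 files, 100 per script.
esgf_page_urls http://esg-datanode.jpl.nasa.gov/esg-search/wget 250 100
```

Each printed URL yields a separate script, so a large selection is split into several scripts that can be run one after another.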

To download only files that match specific criteria, you can constrain the wget URL via any of the allowed ESGF search categories. For example, the following URL will select only files for _observations_ that contain the variable _hus_:

http://esg-datanode.jpl.nasa.gov/esg-search/wget?variable=hus&project=obs4MIPs

The following URL will select all _model_ files that contain the variable with CF standard name _air_temperature_ and conform to the _CMIP5_ experiment _decadal2000_, across all models and sites:

http://pcmdi9.llnl.gov/esg-search/wget?cf_standard_name=air_temperature&experiment=decadal2000&project=CMIP5

Note that you can address your wget URL to _any_ of the ESGF sites; the generated wget script will automatically find files across all ESGF sites that match the criteria specified in the URL.
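Constrained URLs like the ones above can also be assembled programmatically. The following is a minimal sketch, assuming that every constraint is a simple `name=value` pair that needs no URL encoding; the function name `esgf_wget_url` is hypothetical.

```shell
# Build a wget-script URL from a node host name and name=value constraints.
# usage: esgf_wget_url HOST [NAME=VALUE ...]
esgf_wget_url() {
  local host=$1 query="" kv
  shift
  for kv in "$@"; do
    # Join the constraints with '&', without a leading separator.
    query="${query:+$query&}$kv"
  done
  printf 'http://%s/esg-search/wget%s\n' "$host" "${query:+?$query}"
}

# Reproduces the obs4MIPs example above.
esgf_wget_url esg-datanode.jpl.nasa.gov variable=hus project=obs4MIPs

# Hypothetical usage: save the generated script from the command line.
# This assumes the node serves the same script to command-line clients
# as it does to a browser.
#   wget -O wget.sh "$(esgf_wget_url pcmdi9.llnl.gov project=CMIP5 experiment=decadal2000)"
```

Values containing spaces or reserved characters would need to be percent-encoded before being passed in, which this sketch does not attempt.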

To learn more about generating a wget script, please see the list of references at the bottom of this page.

Step 2: run the script

Once the script has been downloaded to the user's desktop, it needs to be made executable, and then it is ready for running. On a Unix system:

  • _chmod +x wget.sh_

  • _./wget.sh_

Or, since it is a bash script, you may pass it to bash directly:

  • _bash wget.sh_

The script will perform the following operations:

  • Retrieve the most up-to-date trusted certificates for all sites in the ESGF
  • Ask the user for their OpenID and password, and retrieve a short-term digital certificate that will be sent to the ESGF servers as part of the data request
  • Start downloading the data
  • If the download is interrupted, resume it from the point where it stopped
  • Verify the checksum of each file after the download completes, if the checksum is available from the server
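The script performs the checksum step itself, but the same idea can be sketched for checking a single file by hand. This example assumes the GNU coreutils tools `md5sum` and `sha256sum` are installed; the function name `verify_checksum` is hypothetical, and which algorithm a given server publishes is not stated on this page.

```shell
# Succeeds when FILE's digest equals EXPECTED (hex, case-insensitive).
# usage: verify_checksum FILE EXPECTED [md5|sha256]
verify_checksum() {
  local file=$1 expected=$2 algo=${3:-sha256} actual
  case $algo in
    md5)    actual=$(md5sum "$file") || return 2 ;;
    sha256) actual=$(sha256sum "$file") || return 2 ;;
    *)      echo "unsupported algorithm: $algo" >&2; return 2 ;;
  esac
  # The tools print "<digest>  <filename>"; keep only the digest.
  actual=${actual%% *}
  # Normalize both digests to lower case before comparing.
  actual=$(printf '%s' "$actual" | tr 'A-F' 'a-f')
  expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
  [ "$actual" = "$expected" ]
}
```

A return status of 0 means the digests match, 1 means they differ, and 2 means the digest could not be computed.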

The wget script also has advanced options, which are listed when it is run with the help flag '-h' (e.g. 'bash wget.sh -h').

Note: the (default) files created and/or managed by the script are:

  • $HOME/.MyProxyLogon - shared with the MyProxy GUI; stores MyProxy server access data

  • $HOME/.esg/getcert.jar - Java program that retrieves credentials for a given OpenID without a GUI

  • $HOME/.esg/certificates - retrieved federation CAs

  • $HOME/.esg/credentials.pem - retrieved credential
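Because the retrieved credential is short-lived, it can be useful to check whether the stored file is still usable before starting a large download. The sketch below assumes the `openssl` command-line tool is installed and that the first certificate in the file is the user's credential; the function name `esgf_credential_status` is hypothetical.

```shell
# Print "missing", "valid" or "expired-or-unreadable" for a credential file.
# usage: esgf_credential_status [PATH]   (defaults to the location listed above)
esgf_credential_status() {
  local cred=${1:-$HOME/.esg/credentials.pem}
  if [ ! -f "$cred" ]; then
    echo "missing"
  # '-checkend 0' succeeds only if the certificate has not yet expired.
  elif openssl x509 -in "$cred" -noout -checkend 0 >/dev/null 2>&1; then
    echo "valid"
  else
    echo "expired-or-unreadable"
  fi
}
```

If the result is anything other than "valid", running the wget script again should prompt for the OpenID and password and fetch a fresh credential.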

Appendix

Alternatively, to obtain a short-term certificate before running the wget script, click the MyProxyLogon link on the home page of **the ESGF site where you registered**. A MyProxy logon application will pop up, in which you enter the _username_ and _password_ that you chose during registration (this is why you _must_ start from the ESGF Node where you registered). When you click the _Logon_ button, the server generates a new short-term certificate and sends it to your computer, where it is saved in a location (_~/.esg/credentials.pem_ on a Unix system) in which the wget script will find it automatically.

References
