This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

Software Requirements

Norman Fomferra edited this page Apr 5, 2016 · 2 revisions

Data Access Requirements

SR-01 Global data access function

The API shall offer a global function, or a method of a global object, that returns a dataset representation when given a dataset name and a list of optional selectors. The selectors allow users to subset the overall dataset in the data store.

Possible solutions:

The following example presents access to a local data store:

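# Open a data store rooted at a local directory tree.
data_store = LocalDataStore('/home/norman/esa-cci-data')
# Load the files whose path matches the pattern and whose 'year' group passes the selector.
# Raw strings keep '\d' from being treated as a string escape.
ozone_dataset = data_store.load(r'ozone/data/total_columns/l3/merged/v0100/'
                                r'ESACCI-OZONE-L3S-TC-MERGED-DLR_1M-(?P<year>\d\d\d\d)0104-fv0100.nc',
                                lambda kv: int(kv['year']) == 2012)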
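
To make the pattern-plus-selector idea concrete, here is a minimal sketch of how such a load call might resolve files. It is not the actual LocalDataStore: the class name `PatternDataStore` is hypothetical, it assumes the pattern is a regular expression matched against the whole path relative to the store root, and it returns the matching relative paths rather than a dataset representation.

```python
import os
import re
import tempfile


class PatternDataStore:
    """Hypothetical stand-in for LocalDataStore; it only resolves file paths."""

    def __init__(self, root):
        self.root = root

    def load(self, path_pattern, selector=None):
        regex = re.compile(path_pattern)
        matches = []
        for dir_path, _dir_names, file_names in os.walk(self.root):
            for file_name in file_names:
                full_path = os.path.join(dir_path, file_name)
                # Match against the path relative to the root, using '/' separators.
                rel_path = os.path.relpath(full_path, self.root).replace(os.sep, '/')
                match = regex.fullmatch(rel_path)
                # The selector receives the named groups, e.g. {'year': '2012'}.
                if match and (selector is None or selector(match.groupdict())):
                    matches.append(rel_path)
        return sorted(matches)


def demo():
    # Build a tiny throwaway tree with one file per year, then select 2012 only.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'ozone'))
        for year in (2011, 2012):
            open(os.path.join(root, 'ozone', 'tc-%d.nc' % year), 'w').close()
        store = PatternDataStore(root)
        return store.load(r'ozone/tc-(?P<year>\d{4})\.nc',
                          lambda kv: int(kv['year']) == 2012)


print(demo())  # ['ozone/tc-2012.nc']
```

Passing the named groups to the selector as a plain dictionary keeps the store itself free of any knowledge about the tree layout; all naming conventions live in the caller's pattern.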

This pattern-based approach has the advantage that the LocalDataStore does not need to know anything about the contents of the source data tree. Its disadvantage is that users must know how the file tree is organised and how dataset files are named. A smarter LocalDataStore could scan the tree, then create and maintain an index of the available datasets. Each dataset comprises a set of netCDF files or shapefiles, where each file contributes to a unique time series. For each dataset, the index would also provide the common file content schema (information about variables and dimensions), the spatial and temporal coverage, and other information such as the coordinate reference system used.

data_store = LocalDataStore('/home/norman/esa-cci-data')
# Query the index built by scanning the tree for the available datasets.
data_store.dataset_info()
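
The scan-and-index idea could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: the function name `build_index` is hypothetical, each directory containing data files is assumed to be one dataset, and temporal coverage is guessed from an eight-digit date stamp (YYYYMMDD) in the file name. Reading the file content schema, spatial coverage, or coordinate reference system would require opening the files and is left out.

```python
import os
import re
import tempfile
from collections import defaultdict

# Assumption: file names carry a date stamp as exactly eight consecutive digits.
DATE_STAMP = re.compile(r'(?<!\d)(\d{8})(?!\d)')


def build_index(root):
    """Return {dataset_name: {'files': [...], 'start': stamp, 'end': stamp}}."""
    files_by_dataset = defaultdict(list)
    for dir_path, _dir_names, file_names in os.walk(root):
        for file_name in file_names:
            if file_name.endswith(('.nc', '.shp')):
                # Assumption: the directory relative to the root names the dataset.
                dataset_name = os.path.relpath(dir_path, root).replace(os.sep, '/')
                files_by_dataset[dataset_name].append(file_name)

    index = {}
    for dataset_name, file_names in files_by_dataset.items():
        # Collect the date stamps found in the file names, in ascending order.
        stamps = sorted(m.group(1) for m in map(DATE_STAMP.search, file_names) if m)
        index[dataset_name] = {
            'files': sorted(file_names),
            'start': stamps[0] if stamps else None,
            'end': stamps[-1] if stamps else None,
        }
    return index


def demo_index():
    # Build a throwaway tree holding one dataset with two time steps.
    with tempfile.TemporaryDirectory() as root:
        dataset_dir = os.path.join(root, 'ozone', 'v0100')
        os.makedirs(dataset_dir)
        for stamp in ('20110101', '20120101'):
            open(os.path.join(dataset_dir, 'OZONE-%s-fv0100.nc' % stamp), 'w').close()
        return build_index(root)


print(demo_index())
```

With an index like this, users could ask for a dataset by name and time range instead of writing path patterns, at the cost of the store having to understand the naming conventions of the tree it scans.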