This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

Why do we (still) require CCI data to be stored in the FS?

Norman Fomferra edited this page Mar 14, 2017 · 2 revisions

ECT currently requires original CCI data to be stored in the user's file system for a number of reasons:

  • We started developing the software in January 2016, when CCI data was accessible only via FTP. The first ODP services were made available only in June 2016, and only partially (e.g. datasets limited to 1000 files, no temporal coverage from the data index server, see issue #11). Therefore the EsaCciFtpDataStore downloads CCI data and caches it locally, from where we open datasets spanning multiple netCDF files using the xarray.open_mfdataset() function.
  • Furthermore, the ESA statement of work required us to develop a toolbox that runs on the user's computer and can open large datasets from local files rather than depending solely on remote services. Gathering data from the ODP services nevertheless remains a core requirement.
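The multi-file pattern mentioned above can be sketched as follows. This is a minimal illustration of what xarray.open_mfdataset() automates over a set of local netCDF files: opening each file as a dataset and concatenating them along the time dimension. Here, small in-memory datasets with made-up variable and coordinate names (sst, lat, lon) stand in for the cached CCI files; it is a sketch of the pattern, not ECT's actual data model.

```python
import numpy as np
import pandas as pd
import xarray as xr

def make_slice(start, days):
    # Synthetic stand-in for one local netCDF file covering `days` time steps.
    time = pd.date_range(start, periods=days, freq="D")
    data = np.random.rand(days, 4, 8)  # time x lat x lon
    return xr.Dataset(
        {"sst": (("time", "lat", "lon"), data)},
        coords={
            "time": time,
            "lat": np.linspace(-45.0, 45.0, 4),
            "lon": np.linspace(0.0, 315.0, 8),
        },
    )

# Two "files" with consecutive temporal coverage.
parts = [make_slice("2010-01-01", 5), make_slice("2010-01-06", 5)]

# xarray.open_mfdataset(paths) does essentially this over real files on disk:
combined = xr.concat(parts, dim="time")
print(combined.sizes["time"])  # 10
```

With real files, `xr.open_mfdataset("cache_dir/*.nc")` performs the open-and-concatenate step lazily, which is why the whole time range must already be present in the local cache.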

If we relied solely on remote data access services, it would consequently make more sense to perform the processing remotely as well.

Main problem

The main problem with the local-data approach arises in analyses that focus on the time dimension. To perform a regional time series analysis, even at a single latitude-longitude point, users must currently download every file of a CCI dataset that falls within the considered time range.
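A rough back-of-the-envelope calculation makes the mismatch concrete. All numbers below are illustrative assumptions (one global file per day at an assumed 100 MB each), not actual CCI dataset figures; the point is the ratio between what is downloaded and what a single-point time series actually needs.

```python
# Illustrative cost of a single-point time series under the local-files approach.
# All figures are assumptions for the sake of the example, not real CCI numbers.

years = 10
files_per_year = 365          # assume one global file per day
file_size_mb = 100            # assumed size of one global netCDF file
point_bytes_per_step = 8      # one float64 value actually needed per time step

downloaded_mb = years * files_per_year * file_size_mb
needed_mb = years * files_per_year * point_bytes_per_step / 1e6

print(f"downloaded: {downloaded_mb} MB")       # 365000 MB (~365 GB)
print(f"actually needed: {needed_mb:.4f} MB")  # 0.0292 MB
```

Under these assumptions the user moves hundreds of gigabytes to extract a few kilobytes, which is exactly the access pattern a remote subsetting service would avoid.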