add how to write file commands to data storage page
hot007 authored May 12, 2022
1 parent 3e6235b commit d70ae79
Showing 1 changed file (`BigData/data_storage.md`) with 3 additions and 1 deletion.
@@ -51,7 +51,9 @@ CLEX CMS team have produced [some very useful advice](http://climate-cms.wikis.u
* chunking, and
* command-line tools to control file storage structure (some are specific to [NCI](https://nci.org.au/)).

All of these things can have a large impact on file I/O performance, both for regular access and particularly for parallelised access using `dask` (e.g. with `xarray`). Data will perform best with a tool like `dask` if it is structured to suit the expected read patterns, and the chunk sizes supplied to `dask` must align with the chunk sizes the data is physically stored in (e.g. be whole multiples of them); otherwise you can end up with *worse* performance. See also the [Tools](https://acdguide.github.io/BigData/tools/intro.html) section of this book for further information.
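The alignment rule can be sketched in plain Python. The helpers `aligned_chunks` and `is_aligned` below are hypothetical (they are not part of `dask` or `xarray`), and the dimension names and chunk sizes are illustrative only. The idea is to round each requested `dask` chunk length to a whole number of on-disk chunks, so that no storage chunk is split between two `dask` chunks (which would typically force that storage chunk to be read and decompressed more than once).

```python
def aligned_chunks(disk_chunks, dim_sizes, target):
    """Round requested dask chunk sizes to whole multiples of the on-disk chunks.

    disk_chunks: {dim: on-disk chunk length}
    dim_sizes:   {dim: dimension length}
    target:      {dim: approximate dask chunk length wanted}
    """
    chunks = {}
    for dim, disk in disk_chunks.items():
        wanted = target.get(dim, disk)
        # Never split a storage chunk: use at least one whole storage chunk.
        n_storage_chunks = max(1, wanted // disk)
        # A dask chunk longer than the dimension gains nothing, so cap it.
        chunks[dim] = min(n_storage_chunks * disk, dim_sizes[dim])
    return chunks


def is_aligned(disk_chunks, dask_chunks, dim_sizes):
    """True if every dask chunk boundary falls on an on-disk chunk boundary."""
    return all(
        dask_chunks[dim] % disk == 0 or dask_chunks[dim] >= dim_sizes[dim]
        for dim, disk in disk_chunks.items()
    )


# Hypothetical layout: daily data on a 1-degree grid, stored as one time step
# and a quarter of the globe per chunk.
disk = {"time": 1, "lat": 90, "lon": 180}
sizes = {"time": 365, "lat": 180, "lon": 360}

print(aligned_chunks(disk, sizes, {"time": 30, "lat": 100, "lon": 360}))
# {'time': 30, 'lat': 90, 'lon': 360}
```

With `xarray` and the `netcdf4` engine, the on-disk chunk shape of a variable is usually recorded in its `encoding["chunksizes"]`, and a result like the one above can be passed as the `chunks` argument of `xarray.open_dataset()`. Passing `chunks={}` instead asks `xarray` to use the backend's preferred chunk sizes, which are generally the on-disk ones.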

If working in `xarray`, data can be saved to NetCDF or Zarr format using the [`xarray.Dataset.to_netcdf()`](https://docs.xarray.dev/en/latest/generated/xarray.Dataset.to_netcdf.html) and [`xarray.Dataset.to_zarr()`](https://docs.xarray.dev/en/latest/generated/xarray.Dataset.to_zarr.html) methods respectively. Both accept an `encoding` argument that controls per-variable storage settings such as chunking and compression.

Also consider data and metadata standards, and ease of use for other researchers or data consumers. For example, data should be [CF-compliant](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html), but also consider [ACDD](https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3) for discovery metadata, [UGRID](https://ugrid-conventions.github.io/ugrid-conventions/) for unstructured-grid data, and [CMOR](https://pcmdi.github.io/cmor-site/) and other data request requirements for Earth System Grid Federation (ESGF) data submission.
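As a minimal sketch (assuming `xarray` and `numpy`; the variable name `tas`, the attribute values and the `missing_cf_attrs` helper are hypothetical), CF variable attributes and ACDD global attributes can be attached when the dataset is built, before it is written out. The helper is only a quick sanity check, not a compliance test; dedicated tools such as the CF checker or the IOOS `compliance-checker` are the proper way to validate a file.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars={
        # CF-style variable attributes given as the third element of the tuple.
        "tas": (
            ("lat", "lon"),
            np.full((3, 4), 288.0, dtype=np.float32),
            {"standard_name": "air_temperature",
             "long_name": "Near-surface air temperature",
             "units": "K"},
        )
    },
    coords={
        "lat": ("lat", [-30.0, 0.0, 30.0],
                {"standard_name": "latitude", "units": "degrees_north"}),
        "lon": ("lon", [0.0, 90.0, 180.0, 270.0],
                {"standard_name": "longitude", "units": "degrees_east"}),
    },
    # Global attributes: the Conventions string plus a few ACDD discovery attributes.
    attrs={
        "Conventions": "CF-1.7, ACDD-1.3",
        "title": "Example near-surface air temperature field",
        "summary": "Synthetic data used to illustrate CF and ACDD metadata.",
        "keywords": "air temperature",
    },
)


def missing_cf_attrs(dataset, required=("standard_name", "units")):
    """Hypothetical quick check: map each variable to the required attributes it lacks."""
    problems = {}
    for name, var in dataset.variables.items():
        absent = [key for key in required if key not in var.attrs]
        if absent:
            problems[name] = absent
    return problems


print(missing_cf_attrs(ds))
# {}
```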
