Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #27 from hot007/main
Replacement PR to avoid merge problems. If this works then we can also accept #25, else I can delete that one and @paolap can go from here.
Showing 3 changed files with 151 additions and 77 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,126 +1,144 @@ | ||
# Identifying which languages/tools are best suited to specific tasks | ||
## Python | ||
|
||
This page contains: | ||
- [Python](#python) | ||
- [R](#r) | ||
- [MATLAB](#matlab) | ||
- [NCO](#nco-netcdf-operators) | ||
- [CDO](#cdo-climate-data-operators) | ||
|
||
Other languages and tools exist which can work with netCDF data (e.g. C, FORTRAN, ArcGIS, QGIS, paraview, panoply, Ferret, as well as the deprecated NCL), but on this page we focus on tools commonly used for *analysis* of large scale (typically netCDF) climate data. | ||
|
||
## Python | ||
This is a free, open-source language that is a standard tool used in many organisations and industries. It interfaces with other programs and tools like ArcGIS. Packages like xarray are great for analysing large gridded time-series data in climate and environmental science fields. Python creates beautiful plots. | ||
|
||
Os, sys, glob: to handle directories and files | ||
`os, sys, glob` - to handle directories and files | ||
|
||
`numpy` - numerical python | ||
|
||
Numpy: numerical math | ||
`matplotlib` - to create plots | ||
|
||
Matplotlib: to create plots | ||
`cartopy` - plots maps from geospatial data | ||
|
||
Other plotting packages: https://mode.com/blog/python-data-visualization-libraries/: plotly, seaborn, holoviews | ||
Other plotting packages: https://mode.com/blog/python-data-visualization-libraries/ - `plotly, seaborn, holoviews, bokeh` | ||
|
||
Pandas: timeseries, integrates with numpy | ||
`pandas` - timeseries, integrates with numpy | ||
|
||
Xarray: gridded data, integrates with pandas, include basic plotting capabilities | ||
`xarray` - gridded data, integrates with pandas, includes basic plotting capabilities | ||
|
||
Dask: to parallelise tasks and manage memory more efficiently , integrates with xarray | ||
`dask` - to parallelise tasks and manage memory more efficiently, integrates with xarray | ||
|
||
Calendar: to handle calendars and time information | ||
`calendar, datetime` - to handle calendars and time information | ||
|
||
Netcdf4 - to handle netcdf files, usually integrated in tools like xarray, pandas | ||
`netcdf4` - to handle netCDF files, usually integrated in tools like xarray, pandas | ||
|
||
hdf5, hdf4, h4netcdf, hdfeos2, hdfeos5, h5py, pyhdf - to handle various hdf formats they have different advantages | ||
`hdf5, hdf4, h4netcdf, hdfeos2, hdfeos5, h5py, pyhdf` - to handle various HDF formats; they have different advantages | ||
|
||
Pygrib -m to handle grib file | ||
`pygrib` - to handle GRIB files | ||
|
||
Requests: download/upload from/to website (not specifically analysis but can be useful for data handling) | ||
`requests` - download/upload from/to website (not specifically analysis but can be useful for data handling) | ||
|
||
Csv - to handle csv files | ||
`csv` - to handle CSV files | ||
|
||
Json - to handle json files (often useful to store table information and pass schema, vocabularies and other dictionary style information to programs) | ||
`json` - to handle JSON files (often useful to store table information and pass schema, vocabularies and other dictionary style information to programs) | ||
|
||
Yaml - to handle yaml files - often use to handle program configurations | ||
`yaml` - to handle YAML files, often used to handle program configurations | ||
|
||
Rasterio, rasterstats, rio-xarray, geopandas, fiona - to handle raster and shapefiles | ||
`rasterio, rasterstats, rio-xarray, geopandas, fiona` - to handle raster and shapefiles | ||
|
||
Zarr - | ||
`gdal` - useful for reprojecting data and interfacing with geoTIFFs | ||
|
||
Specific tools: | ||
`scipy` - scientific python tools | ||
|
||
Iris - | ||
`zarr` - to read and write datasets as zarr archives | ||
|
||
Cfcheker.py - checking against CF and ACDD conventions | ||
### Specific tools: | ||
|
||
marineHeatwaves / xmhw - calculate MHW statistics | ||
`Iris` - MetOffice tool for working with CF-compliant netCDF data | ||
|
||
CleF - discovering ESGF datasets at NCI | ||
`cfcheker.py` - checks netCDF files against CF and ACDD conventions | ||
|
||
ClimTas - makes it easier to apply and extend dask functions | ||
`marineHeatwaves` / `xmhw` - calculate MHW statistics | ||
|
||
Xclim - … | ||
`CleF` - discovering ESGF datasets at NCI | ||
|
||
Cosima cookbook | ||
`ClimTas` - makes it easier to apply and extend dask functions | ||
|
||
Cdo - to call cdo operators (Scott has a regridding function that exploit this) | ||
`Xclim` - … | ||
|
||
Wrf-python - | ||
`Cosima cookbook` - various python libraries for ocean and sea ice | ||
|
||
Siphon - to navigate thredds servers | ||
`cdo` - to call cdo operators (Scott has a regridding function that exploits this) | ||
|
||
Xesmf - | ||
`Wrf-python` - | ||
|
||
Udunits2 - | ||
`Siphon` - to query and navigate THREDDS servers | ||
|
||
Eofs - | ||
`Xesmf` - regridding tool | ||
|
||
Eccodes - | ||
`udunits2` - Library used to interpret units of measurement | ||
|
||
Earthpy - | ||
`Eofs` - | ||
|
||
xgcm - work with offset grids | ||
`Eccodes` - | ||
|
||
Specific distributions: | ||
`Earthpy` - | ||
|
||
Anaconda | ||
`xgcm` - work with offset grids | ||
|
||
miniconda | ||
### Specific toolsets: | ||
|
||
Pangeo | ||
[Anaconda](https://www.anaconda.com/): Contains pretty much all the python libraries you'd want to get started, great for newcomers but takes up a lot of space. Not recommended on shared systems with quotas but good on local laptops. Includes Spyder, a Matlab-like programming environment (IDE). | ||
|
||
scipy | ||
[miniconda](https://docs.conda.io/en/latest/miniconda.html): A lightweight version of anaconda which by default only includes core libraries, good for building specific environments for data analysis. This underpins the `conda` modules in the `hh5` project at NCI. | ||
|
||
[Pangeo](https://pangeo.io/): A community for analysis of large scale climate data. Built on tools like python, xarray, dask, iris, cartopy. | ||
|
||
## R | ||
This is a free, open-source statistical programming language. It is used mainly in research, but it is also a standard tool in many organisations. This tool is great for statistical analysis. | ||
|
||
Dplyr, tidyr, tidyverse - Dataframe manipulation | ||
`dplyr, tidyr, tidyverse` - Dataframe manipulation | ||
|
||
ggplot2 - Creating graphics | ||
`ggplot2` - creating graphics | ||
|
||
purrr - data wrangling | ||
`purrr` - data wrangling | ||
|
||
rio - data import/export | ||
`rio` - data import/export | ||
|
||
Shiny - report results, e.g., build interactive web apps | ||
`Shiny` - report results, e.g., build interactive web apps | ||
|
||
Mlr - machine learning tasks | ||
`Mlr` - machine learning tasks | ||
|
||
Leaflet - mapping and working on interactive maps | ||
`Leaflet` - mapping and working on interactive maps | ||
|
||
tidymodels - modeling and machine learning | ||
`tidymodels` - modeling and machine learning | ||
|
||
sp, maptools - processing spatial data | ||
`sp, maptools` - processing spatial data | ||
|
||
Zoo,xls - for time series data | ||
`zoo,xls` - for time series data | ||
|
||
climpact - https://github.com/ARCCSS-extremes/climpact heatwave/extremes statistics | ||
`climpact` - https://github.com/ARCCSS-extremes/climpact Heatwave/extremes statistics | ||
|
||
https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages | ||
Recommended list of packages: https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages | ||
|
||
### Specific toolsets | ||
|
||
[Rstudio](https://support.rstudio.com/hc/en-us) IDE | ||
|
||
## MATLAB | ||
MATLAB (Matrix Laboratory) is a licenced tool. It is the best tool when dealing with large matrices and matrix manipulations. It allows examining the content of data quickly in a built-in docked or undocked window within the tool to gain an overview of the pattern and structures presented in the data. This tool is helpful because many data types, for example, large image files and large tabular data, can be converted into matrices and analysed efficiently in MATLAB. MATLAB provides an easy-to-use environment with interactive applications, which is excellent for novel programmers. | ||
MATLAB (Matrix Laboratory) is a licenced tool. It is a good tool when dealing with large matrices and matrix manipulations. It allows examining the content of data quickly in a built-in docked or undocked window within the tool to gain an overview of the pattern and structures presented in the data. This tool is helpful because many data types, for example, large image files and large tabular data, can be converted into matrices and analysed efficiently in MATLAB. MATLAB provides an easy-to-use environment with interactive applications, which is excellent for novice programmers. MATLAB also has excellent help resources and a useful online community. | ||
|
||
As a licensed tool matlab might not be available to other researchers and collaborators, so even if you are producing data with matlab, avoid saving the data as ‘mat’ files, use the best alternative open source format instead. | ||
As a licensed tool, MATLAB might not be available to other researchers and collaborators, so even if you are producing data with MATLAB, it is best to avoid saving the data as `.mat` files and to use the best alternative open source format instead. | ||
|
||
## NCO - NetCDF Operators | ||
NetCDF Operators toolkit of command-line operators to both handle and perform analysis on netCDF files. It is the tool of choices to add, rename, modified attributes and variables. It can add internal compression to netcdf4 files and convert between different formats. It is also useful to concatenate files, performing averages and other simple mathematical operations on an entire variable, extracting or deleting variables. The advantage is that the results will be automatically saved in a netcdf file. | ||
Limitations: memory? File size? | ||
[NetCDF Operators](http://nco.sourceforge.net/) is a toolkit of command-line operators to both handle and perform analysis on netCDF files. It is the tool of choice to add, rename, and modify attributes and variables. It can add internal compression to netCDF4 files and convert between different formats. It is also useful for concatenating files, computing averages and other simple mathematical operations on an entire variable, and extracting or deleting variables. The advantage is that the results are automatically saved in a netCDF file. | ||
|
||
|
||
## CDO - Climate Data Operators | ||
CDO, like NCO is a large tool set to handle and analyse climate and weather data. CDO can also work with grib files, in fact it is a useful tool to convert from grib to netcdf and vice versa. CDO can also be used to compress, convert and concatenate files. However this is usually in conjunction with another operation. | ||
[CDO](https://code.mpimet.mpg.de/projects/cdo/), like NCO, is a large command-line tool set to handle and analyse climate and weather data. CDO can also work with GRIB files; in fact, it is a useful tool to convert from GRIB to netCDF and vice versa. CDO can also be used to compress, convert and concatenate files, often in conjunction with another operation. | ||
|
||
One of the strengths of CDO is its ability to combine operations in a succession of steps without creating intermediate files, using little additional memory in the process. | ||
|
||
One of the strengths of CDO is its ability to combine operations in succession of steps without creating intermediate files. | ||
CDO is useful to calculate climatologies, regrid datasets, select subset both spatially and temporally. It can be used to perform simple transformations across an entire variable as for NCO. It is useful to handle time axis operations as going from unlimited to limited dimension and setting a new reference time. CDO can integrate with other languages such as python using the ‘cdo’ module. | ||
CDO is useful to calculate climatologies, regrid datasets, and select subsets both spatially and temporally. It can be used to perform simple transformations across an entire variable, as with NCO. It is useful to handle time axis operations such as going from an unlimited to a limited dimension and setting a new reference time. CDO can integrate with other languages such as Python using the `cdo` module. | ||
|
||
Limitations: specific versions can have issues with threading | ||
Limitations: specific versions can have issues with threading, meaning chained commands are not always safe. CDO **cannot** be built in thread-safe mode because of its underlying HDF dependencies, which means some versions are not reliable and can cause random segfaults when using chained operations. |
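
The chaining behaviour described for CDO can be sketched from Python with only the standard library. This is a minimal illustration rather than part of the commit. The helper `build_cdo_chain`, the file names and the region are hypothetical. The sketch also assumes the installed CDO accepts the `-L` option to serialise I/O, which is commonly used to work around the threading problem noted above. Nothing is executed; the sketch only builds and prints the command.

```python
import shlex


def build_cdo_chain(operators, infile, outfile, lock_io=True):
    """Build the argument list for one chained CDO call.

    `operators` is given in the order the operations should be applied.
    CDO applies the right-most operator first, so the list is reversed
    and each operator is prefixed with "-".
    """
    args = ["cdo"]
    if lock_io:
        # Assumption: the installed CDO version supports -L (lock I/O).
        args.append("-L")
    args.extend("-" + op for op in reversed(operators))
    args.extend([infile, outfile])
    return args


# Hypothetical inputs: select a lon/lat box, then take the time mean.
cmd = build_cdo_chain(
    ["sellonlatbox,110,155,-45,-10", "timmean"],
    "input.nc",
    "output.nc",
)
print(shlex.join(cmd))
```

On a system where CDO is installed, the list could be passed to `subprocess.run(cmd, check=True)`. Building the chain as a list avoids shell quoting issues and keeps the whole operation in a single CDO call with no intermediate files.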