Xarray-DataAccessor Documentation

Core Features

Efficiently reads remote gridded data for an Area of Interest (AOI) into xarray.Dataset objects, using dask.distributed for parallelization.
Transforms data to fit your needs: resample the grid, resample along the time dimension, convert time zones, etc.
Extracts time series data at coordinates and saves it to a tabular file (e.g. .xlsx, .csv, or .parquet) for use in physical or machine learning models.
Extensible, modular package architecture that supports open-source contributions and connections to more datasets and sources.

Getting Started

Start by cloning this repository locally.
Next, in a conda terminal, navigate to the local repository location, then create and activate our conda virtual environment using environment_demo.yml.
- Note: the environment built from environment_dev.yml currently does not recognize the 'jupyter lab' command.

# mock conda terminal
(base) C://User: cd Path/To/Xarray-DataAccessor
(base) C://User/Path/To/Xarray-DataAccessor conda env create -f environment_demo.yml
...
(base) C://User/Path/To/Xarray-DataAccessor conda activate data_accessor_full
(data_accessor_full) C://User/Path/To/Xarray-DataAccessor

(Optional) If you plan to use the CDSDataAccessor, follow the instructions here to allow your computer to interact with the CDS API. In short, you must manually create a .cdsapirc text file (no extension) where cdsapi.Client() expects to find it.
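As a rough illustration of the optional CDS step, the sketch below writes a two-line .cdsapirc file with Python. It assumes the usual `url:` / `key:` layout that cdsapi reads and that the file belongs in your home directory by default; the helper name write_cdsapirc and its parameters are made up for this example, and the actual URL and key must be copied from your own CDS account page.

```python
from pathlib import Path
from typing import Optional


def write_cdsapirc(url: str, key: str, directory: Optional[Path] = None) -> Path:
    """Write a two-line .cdsapirc file and return its path.

    Hypothetical helper: `url` and `key` are whatever your CDS account
    page tells you to use. Nothing here is part of Xarray-DataAccessor.
    """
    # Assumption: cdsapi looks in the home directory unless told otherwise.
    target_dir = Path.home() if directory is None else Path(directory)
    rc_path = target_dir / ".cdsapirc"  # note: no file extension
    rc_path.write_text(f"url: {url}\nkey: {key}\n", encoding="utf-8")
    return rc_path


# Example call (not executed here; fill in your own values):
# write_cdsapirc(url="<URL from your CDS account page>", key="<your key>")
```

Writing the file by hand in a text editor works just as well; the point is only that the file name has no extension and contains the two fields.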
Use the conda develop command, pointed at the src directory, to make the repo importable.

# mock conda terminal with the env activated
(data_accessor_full) C://User/Path/To/Xarray-DataAccessor conda develop src

# at this point you are ready to open an IDE/Notebook of your choice to run your code!
# For example:
(data_accessor_full) C://User/Path/To/Xarray-DataAccessor jupyter lab

Finally, import the library into your workflow:

import xarray_data_accessor
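
If the import fails, it can help to check whether Python can locate the package at all, which is what the conda develop step is meant to ensure. The following is a small standard-library sketch; the helper name is_importable is only illustrative.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Check the package name used in this README.
print(is_importable("xarray_data_accessor"))
```

A result of False usually means the active environment is not the one in which conda develop was run.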

Exploring Available Data

All data that can be retrieved with this library is organized in a three-tier hierarchy:

A "data accessor" is a Python class that interacts with a given data source.
- Each data accessor can retrieve data from any number of specific datasets.
- For example: CDSDataAccessor accesses the CDS API and can currently be used to access a few ERA5 datasets.
A specific dataset may be something like "reanalysis-era5-single-levels". Note that the same dataset may be accessible through more than one data accessor.
Each dataset will contain one or more variables.

To allow this library to be extendable, the "data accessors", the datasets they can access, and the variables that exist in each dataset are not hardcoded anywhere in the repo.
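
To make the idea of "not hardcoded" concrete, here is a minimal sketch of a registry pattern in which accessor classes register themselves and the available datasets and variables are discovered at runtime. This is not the library's actual implementation: AccessorRegistry, DemoAccessor, and the dataset and variable names are all hypothetical.

```python
from typing import Dict, List, Type


class AccessorRegistry:
    """Hypothetical registry; not the library's DataAccessorFactory."""

    _accessors: Dict[str, Type] = {}

    @classmethod
    def register(cls, accessor_cls: Type) -> Type:
        """Class decorator that records an accessor under its class name."""
        cls._accessors[accessor_cls.__name__] = accessor_cls
        return accessor_cls

    @classmethod
    def data_accessor_names(cls) -> List[str]:
        return list(cls._accessors)

    @classmethod
    def supported_datasets(cls) -> Dict[str, List[str]]:
        return {
            name: list(accessor.datasets)
            for name, accessor in cls._accessors.items()
        }

    @classmethod
    def supported_variables(
        cls, data_accessor_name: str, dataset_name: str
    ) -> List[str]:
        return list(cls._accessors[data_accessor_name].datasets[dataset_name])


@AccessorRegistry.register
class DemoAccessor:
    """Made-up accessor: each dataset name maps to its variable names."""

    datasets = {
        "demo-dataset": ["temperature", "wind_speed"],
    }


print(AccessorRegistry.data_accessor_names())  # ['DemoAccessor']
```

With a design like this, adding a new data source only requires defining and registering a new class; nothing in the registry itself needs to change.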

Therefore, to explore what is available, use the following xarray_data_accessor.DataAccessorFactory class methods:

from xarray_data_accessor import DataAccessorFactory

# to return a list of all data accessor names
DataAccessorFactory.data_accessor_names()

# to return a dictionary with data accessor names as keys and their respective objects as values
DataAccessorFactory.data_accessor_objects()

# to return a dictionary with data accessor names as keys, and their supported dataset names as values
DataAccessorFactory.supported_datasets()

# to return a list of variable names for a specific data accessor / dataset combination
# (both arguments are strings)
DataAccessorFactory.supported_variables(
    data_accessor_name='AWSDataAccessor',
    dataset_name='reanalysis-era5-single-levels',
)
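
The factory methods above can be combined to print or inspect everything that is available in one pass. The sketch below assumes, based on the comments above, that supported_datasets() returns a dictionary mapping accessor names to iterables of dataset names and that supported_variables() returns an iterable of variable names; build_catalog is a hypothetical helper, not part of the library.

```python
from typing import Any, Dict, List


def build_catalog(factory: Any) -> Dict[str, Dict[str, List[str]]]:
    """Map accessor name -> dataset name -> list of variable names."""
    catalog: Dict[str, Dict[str, List[str]]] = {}
    for accessor_name, dataset_names in factory.supported_datasets().items():
        catalog[accessor_name] = {
            dataset_name: list(
                factory.supported_variables(
                    data_accessor_name=accessor_name,
                    dataset_name=dataset_name,
                )
            )
            for dataset_name in dataset_names
        }
    return catalog


# Intended usage (requires the package to be installed):
# from xarray_data_accessor import DataAccessorFactory
# catalog = build_catalog(DataAccessorFactory)
```

Passing the factory in as an argument keeps the helper easy to try out with a stand-in object before pointing it at the real class.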

We also intend to keep documentation about data accessors and their respective datasets updated here.

Getting Data

To get data, use the get_xarray_dataset() function after specifying the spatial and temporal AOI.

The spatial AOI can be specified with a shapefile, a raster, a list of lat/lon coordinate tuples, or a CSV file with lat/lon columns.
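
For intuition about how a list of coordinate tuples defines a spatial extent, the sketch below reduces (lat, lon) pairs to a bounding box. This is only an illustration of the concept; how the library itself handles point inputs may differ, and bounding_box is a hypothetical helper that ignores edge cases such as extents crossing the antimeridian.

```python
from typing import Iterable, Tuple


def bounding_box(
    coords: Iterable[Tuple[float, float]],
) -> Tuple[float, float, float, float]:
    """Return (min_lat, min_lon, max_lat, max_lon) for (lat, lon) pairs."""
    points = list(coords)
    if not points:
        raise ValueError("at least one (lat, lon) pair is required")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return min(lats), min(lons), max(lats), max(lons)


print(bounding_box([(42.0, -83.5), (43.25, -82.0), (41.5, -84.0)]))
# (41.5, -84.0, 43.25, -82.0)
```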

The temporal AOI can be specified as a string or a datetime object. Additionally, a time zone can be set with the timezone parameter.
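
Accepting either a string or a datetime typically means normalizing both to one timezone-aware type first. The sketch below shows one way to do that with the standard library; normalize_time is a hypothetical helper, and the rule that naive inputs are treated as already being in the target time zone is an assumption made for this example, not documented library behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Union


def normalize_time(
    value: Union[str, datetime],
    tz: timezone = timezone.utc,
) -> datetime:
    """Return a timezone-aware datetime from an ISO string or a datetime.

    Naive inputs are assumed to already be in `tz`; aware inputs are
    converted to `tz`.
    """
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        return value.replace(tzinfo=tz)
    return value.astimezone(tz)


eastern = timezone(timedelta(hours=-5))  # fixed UTC-5 offset, no DST handling
start = normalize_time("2019-01-30", eastern)
print(start.isoformat())  # 2019-01-30T00:00:00-05:00
```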

In the example below we fetch ERA5 data from AWS for a shapefile-defined extent.

import xarray_data_accessor

# fetch two ERA5 variables for the extent of a shapefile
dataset = xarray_data_accessor.get_xarray_dataset(
    data_accessor_name='AWSDataAccessor',
    dataset_name='reanalysis-era5-single-levels',
    variables=[
        'air_temperature_at_2_metres',
        'eastward_wind_at_100_metres',
    ],
    start_time='2019-01-30',
    end_time='2019-02-02',
    shapefile='path/to/shapefile.shp',
)

Transforming Data

This functionality has not been thoroughly tested yet; documentation is pending.
