Processing historical emissions for CMIP7 harmonization routines

Scripts that combine historical emissions records from several datasets, such as CEDS and GFED, into complete historical emissions files. These files are inputs to the IAM emissions harmonization algorithms in IAMconsortium/concordia (regional harmonization and spatial gridding for ESMs) and iiasa/climate-assessment (global climate emulator workflow).

Status

  • prototype: the project is just starting up and all of the code should be treated as prototype quality

Installation

We do all our environment management using pixi. To get started, you will need to make sure that pixi is installed (instructions here; we found that using the pixi-provided script was best on a Mac).

To create the virtual environment, run

pixi install
pixi run pre-commit install

These steps are also captured in the Makefile, so if you want a single command you can instead simply run make virtual-environment.

Having installed your virtual environment, you can now run commands in your virtual environment using

pixi run <command>

For example, to run Python within the virtual environment, run

pixi run python

As another example, to run a notebook server, run

pixi run jupyter lab

Data

Some of our data is managed using git lfs. To install it, please follow the instructions here.

Then, before doing anything else, run

git lfs install

Once you have git lfs installed, you can grab all the files we track with

git lfs fetch --all

To grab a specific file, use

git lfs pull --include="path/to/file"
# e.g.
git lfs pull --include="data/national/gfed/data_aux/iso_mask.nc"

For more information see, for example, here.

Input data

Note that this repository focuses on processing data; it does not currently (re)host the input data files themselves.

Files that need to be downloaded before you can run the notebooks are listed in README files in the relevant data subfolders, e.g. data/national/ceds/data_raw/README.txt for the CEDS data download and data/national/gfed/data_raw/README.txt for the GFED data download.

Processed data

Data is processed by the Jupyter notebooks (saved as .py scripts using jupytext, under the notebooks folder). The output paths are generally specified at the beginning of each notebook.

For instance, you will find processed CEDS data at data/national/ceds/processed and processed GFED data at data/national/gfed/processed.
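As a rough illustration (the file name below is hypothetical; check each notebook for the actual output paths), you could load a processed file with pandas:

import pandas as pd

# Hypothetical output file, for illustration only; see the notebook that
# produces the CEDS outputs for the real file names.
df = pd.read_csv("data/national/ceds/processed/ceds_emissions.csv")
df.head()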

Development

Install and run instructions are the same as the above (this is a simple repository, without tests etc., so there are no development-only dependencies).

Adding new dependencies

If a dependency is missing, you can add it with pixi. Please only add dependencies with pixi, as this ensures that all the other developers get the same dependencies as you (dependencies added directly with conda or pip are not recorded in the pixi.lock file, so other developers will not realise they are needed!).

To add a conda dependency,

pixi add <dependency-name>

To add a PyPI/pip dependency,

pixi add --pypi <dependency-name>

The full documentation can be found here in case you have a more exotic use case.

Repository structure

Notebooks

These are the main processing scripts. They are saved as plain .py files using jupytext, which lets you open them as Jupyter notebooks.
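For reference, a jupytext-managed notebook is an ordinary Python script with cell markers. A minimal sketch in jupytext's percent format (the exact pairing format configured in this repository may differ):

# %% [markdown]
# Markdown cells are comment blocks flagged with a [markdown] marker.

# %%
# Code cells begin at each bare # %% marker and contain plain Python.
import pandas as pd

pd.DataFrame({"year": [2000, 2001], "co2": [1.0, 1.1]})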

In general, you should run the notebooks in numerical order. We do not yet have a comprehensive way of capturing the dependencies between notebooks. We try to make the notebooks in each numbered series independent (i.e. you can run the 02** notebooks without running the 01** notebooks), but we do not guarantee this, so if in doubt, run the notebooks in numerical order (see the sketch after the overview below).

Overview of notebooks:

  • 01**: preparing input data for IAMconsortium/concordia.
  • 02**: preparing input data for iiasa/climate-assessment.
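If you want to execute all the notebooks in numerical order from a single entry point, a minimal sketch (assuming each jupytext .py file runs top-to-bottom as a plain script) is:

import subprocess
from pathlib import Path

# Run each jupytext .py notebook, in numerical order, inside the pixi environment.
for script in sorted(Path("notebooks").glob("*.py")):
    subprocess.run(["pixi", "run", "python", str(script)], check=True)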

Local package

We have a local package, emissions_harmonization_historical, that lives in src, which we use to share general functions across the notebooks.
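For example, once the environment is installed, the package is importable from within it (pixi run python); this minimal check assumes no particular function names:

# The helpers live in src/emissions_harmonization_historical;
# printing __file__ confirms the local package is importable.
import emissions_harmonization_historical

print(emissions_harmonization_historical.__file__)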

Data

All data files should be saved in data. We divide data sources into national, i.e. sources used for country-level data (e.g. CEDS, GFED), and global, i.e. sources used for global-level data (e.g. GCB). Within each data source's folder, we use data_raw for raw data. Where raw data is not included, we include a README.txt file which explains how to generate it. A sketch of the layout is given below.
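As a sketch of the resulting layout (only the sources named above are shown; other sources follow the same pattern):

data/
  national/
    ceds/
      data_raw/    (raw data, or a README.txt with download instructions)
      processed/
    gfed/
      data_raw/
      processed/
  global/
    ...          (global sources, e.g. GCB, laid out the same way)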

Tools

In this repository, we use the following tools:

  • git for version-control (for more on version control, see general principles: version control)
  • Pixi for environment management (for more on environment management, see general principles: environment management)
    • there are lots of environment management systems. Pixi works well in our experience and, for projects that need conda, it is the only solution we have tried that worked really well.
    • we track the pixi.lock file so that the environment is completely reproducible on other machines or by other people (e.g. if you want a colleague to take a look at what you've done)
  • pre-commit with some very basic settings to get some easy wins in terms of maintenance, specifically:
    • code formatting with ruff
    • basic file checks (removing unneeded whitespace, not committing large files etc.)
    • (for more thoughts on the usefulness of pre-commit, see general principles: automation)
  • jupytext to track our notebooks as plain .py files (for more thoughts on the usefulness of Jupytext, see tips and tricks: Jupytext)
    • this avoids nasty merge conflicts and incomprehensible diffs

Original template

This project was generated from this template: basic python repository. The copier tool is used to manage and distribute this template.