The project-template package can create a project structure and template scripts for a data science project. The package also provides tools to automate common data science tasks. It has been developed for use with Visual Studio Code for Python projects and RStudio for R projects.
In the command line run:
pip3 install cookiecutter
In the command line move to where you want to create the project directory and run:
python3 -m cookiecutter https://github.com/NICD-UK/project-template
You will be prompted for the:
- Project Name
- Project Directory Name
- Project Manager Name
- Project Manager Email
- Project Sponsor Name
- Project Sponsor Email
- Project Summary
- Project Language
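For scripted setups, the prompts can be answered ahead of time rather than interactively. The sketch below only builds the command; it relies on cookiecutter's `--no-input` option and its `key=value` extra-context arguments. The variable names (`project_name`, `project_language`) are hypothetical: the real ones are defined in the template's cookiecutter.json and are not listed here.

```python
import shlex

TEMPLATE_URL = "https://github.com/NICD-UK/project-template"


def build_command(answers):
    """Build a non-interactive cookiecutter command from prompt answers."""
    command = ["python3", "-m", "cookiecutter", "--no-input", TEMPLATE_URL]
    # Each answer is passed as a key=value argument after the template URL.
    command += [f"{key}={value}" for key, value in answers.items()]
    return command


# Hypothetical variable names: check the template's cookiecutter.json
# for the real ones before relying on this.
answers = {
    "project_name": "Sales Forecast",
    "project_language": "Python",
}

# shlex.join quotes values containing spaces so the line can be pasted into a shell.
print(shlex.join(build_command(answers)))
```

With `--no-input`, any prompt that is not supplied falls back to the default defined by the template.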
In the command line, move into the new project directory and run:
make
This command will:
- Initialise a virtual environment: venv for Python, renv for R
- Install the packages required for the template scripts
- Save the packages to a dependencies file: requirements.txt for Python, renv.lock for R
- Initialise a git repository
To install a package in Python run:
venv/bin/pip install <package>
To install a package in R use the Packages tab in RStudio.
To save packages to the dependencies file run:
make save
To load packages from the dependencies file run:
make load
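For Python projects, saving and loading dependencies with pip usually comes down to `pip freeze` and `pip install -r`. The sketch below shows that pattern. It is an assumption about what the save and load targets do, not a copy of the project's Makefile, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path

PIP = "venv/bin/pip"  # pip inside the project's virtual environment


def save_packages(requirements="requirements.txt", freeze_command=(PIP, "freeze")):
    """Write the installed packages, pinned to their versions, to a file."""
    result = subprocess.run(
        list(freeze_command), capture_output=True, text=True, check=True
    )
    Path(requirements).write_text(result.stdout)


def load_packages(requirements="requirements.txt", pip=PIP):
    """Install every package listed in the dependencies file."""
    subprocess.run([pip, "install", "-r", str(requirements)], check=True)
```

Neither function runs on import. Call `save_packages()` or `load_packages()` from the project directory, where `venv/` lives.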
The project has the following structure:
Makefile
README.md
data/
├─ clean/
├─ raw/
├─ wrangle/
models/
notebooks/
presentations/
reports/
src/
├─ 1-import/
├─ 2-clean/
├─ 3-wrangle/
├─ 4-model/
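The layout above can be checked with a short script, for example after cloning an existing project. This is a minimal sketch that only tests whether each directory exists; `missing_directories` is a hypothetical helper, not part of the package.

```python
from pathlib import Path

# Directories taken from the project structure listed above.
DIRECTORIES = [
    "data/clean",
    "data/raw",
    "data/wrangle",
    "models",
    "notebooks",
    "presentations",
    "reports",
    "src/1-import",
    "src/2-clean",
    "src/3-wrangle",
    "src/4-model",
]


def missing_directories(project_root="."):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in DIRECTORIES if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_directories()
    print("All directories present" if not missing else f"Missing: {missing}")
```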
There are template scripts, available in Python or R, for:
- transforming raw data into cleaned data in src/2-clean/
- visualising cleaned data in src/2-clean/
- transforming cleaned data into wrangled data in src/3-wrangle/
- visualising wrangled data in src/3-wrangle/
Answer Python or R to the Project Language prompt during setup for the corresponding template scripts. All template transformation scripts include code to read data from and write data to the appropriate data directories. All template visualisation scripts include code to read data from the appropriate data directory and to generate a data report. There is also a template script, available in Quarto, for presenting data in presentations/.
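To illustrate the read, transform, write pattern of a transformation script, here is a minimal sketch using only the standard library. It is not the template's actual code: the function names and the cleaning rule (drop incomplete rows) are hypothetical and stand in for whatever cleaning a real project needs.

```python
import csv
from pathlib import Path

RAW_DIR = Path("data/raw")
CLEAN_DIR = Path("data/clean")


def clean_rows(rows):
    """Keep only rows in which every field is present and non-empty."""
    return [
        row
        for row in rows
        # Short or over-long rows yield non-string values and are dropped too.
        if all(isinstance(value, str) and value.strip() for value in row.values())
    ]


def clean_file(name, raw_dir=RAW_DIR, clean_dir=CLEAN_DIR):
    """Read raw_dir/name, clean it and write the result to clean_dir/name."""
    with open(Path(raw_dir) / name, newline="") as source:
        reader = csv.DictReader(source)
        fieldnames = reader.fieldnames or []
        rows = clean_rows(reader)

    Path(clean_dir).mkdir(parents=True, exist_ok=True)
    with open(Path(clean_dir) / name, "w", newline="") as target:
        writer = csv.DictWriter(target, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)  # number of rows kept
```

A wrangling script in src/3-wrangle/ would follow the same shape, reading from data/clean/ and writing to data/wrangle/.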