This repository provides guidance for a standard data engineering project that consists of a data lake and a data warehouse. The documentation grew out of a need to standardize a requirements-gathering methodology. It is derived from CRISP-DM (Cross-Industry Standard Process for Data Mining).
The templates directory contains nine templates, numbered in logical order. Each template includes italicized text that serves as reference guidance. You may clone, modify, or fork the repository at your leisure.
Note that some documentation processes may overlap as you learn more about your project, so do not feel obligated to fill everything out in sequence. Generally, you will fill out the first few documents in order and adjust as needed. For more details, learn about CRISP-DM.
My goal in publishing these templates is to teach others how to formalize a process around data engineering. It would be awesome for the community to expand on some of the templates to make them more featureful.
To contribute, fork this repository and open a pull request with your changes.