This project is a template for a data engineering project using Kubernetes, Apache Airflow, and Apache Spark. The project is structured as follows:
- `airflow/`: Contains the Apache Airflow DAGs and plugins.
- `spark/`: Contains the Apache Spark jobs.
- `k8s/`: Contains the Kubernetes manifests.
- `scripts/`: Contains the scripts to deploy the project.
To install and manage the project:
- Clone the repository.
- Run the `install.sh` script.
- Run the `deploy.sh` script.
- Run the `run.sh` script.
- Run the `teardown.sh` script.
- Run the `uninstall.sh` script.
- Done!
To use the project:
- Create DAGs in the `airflow/dags/` directory.
- Create Spark jobs in the `spark/jobs/` directory.
- Create Kubernetes manifests in the `k8s/` directory.
- Prototype with the Jupyter notebook in the `sandbox/` directory.
- Commit your changes to the repository to trigger the CI/CD pipeline.
- Done!
To contribute to the project:
- Fork the repository.
- Create a branch.
- Make your changes.
- Push your changes to the branch.
- Create a pull request.
- Done!
This project is licensed under the terms of the LICENSE file.