Merge pull request #21 from IVproger/19-docs
Update Docs
ArtemSBulgakov authored Jul 23, 2024
2 parents f0ae1c7 + ca8b4fe commit fe8296e
Showing 13 changed files with 79 additions and 21 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -85,3 +85,6 @@ We use Docker Compose to run all services of Airflow and ZenML server.
```
3. Wait for all models to train.
4. Access MLFlow server at http://localhost:5000.
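
Once the stack is up, a quick way to confirm the MLFlow server is reachable is to list its experiments from Python. A minimal sketch, assuming the `mlflow` client package is installed in your local environment:

```python
import mlflow

# Point the client at the server from step 4.
mlflow.set_tracking_uri("http://localhost:5000")

client = mlflow.MlflowClient()
for experiment in client.search_experiments():
    print(experiment.experiment_id, experiment.name)
```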

## Docs
Each folder contains a `README.md` file with a short description of every file. The code is well documented with inline comments and descriptive symbol names.
9 changes: 9 additions & 0 deletions api/README.md
@@ -0,0 +1,9 @@
# API
```
api
├── api.Dockerfile # Dockerfile for starting the Flask backend
├── app.py # Flask entrypoint
├── gradio_app.py # Gradio config file
├── gradio.Dockerfile # Dockerfile for starting the Gradio frontend
└── ml.Dockerfile # Dockerfile for starting the model with MLFlow
```
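
For orientation, a Flask backend like `app.py` typically loads the MLFlow model and exposes a predict endpoint. A minimal sketch, not the actual `app.py`; the model URI, port, and JSON schema are assumptions:

```python
import mlflow.pyfunc
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
# Hypothetical registry URI; the real model name and stage may differ.
model = mlflow.pyfunc.load_model("models:/champion/Production")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON list of feature records, e.g. [{"feature": 1.0, ...}].
    features = pd.DataFrame(request.get_json())
    return jsonify(model.predict(features).tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5001)
```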
16 changes: 15 additions & 1 deletion configs/README.md
@@ -1 +1,15 @@
# MLops-project
# Configs
```
configs
├── data_sample.yaml # Sample data configuration
├── data_transformations.yaml # Defines transformations applied to data rows
├── data_version.txt # Current data version (for convenience)
├── experiment.yaml # MLFlow experiment definition
├── main.yaml # Main configuration file
├── model
│   ├── lr.yaml # Parameters for Logistic Regression
│   ├── model.yaml # Definition of folds and metrics
│   ├── rf.yaml # Parameters for Random Forest
│   └── xgboost.yaml # Parameters for XGBoost
└── README.md # Project documentation
```
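
This layout matches the Hydra convention (a `main.yaml` entry point plus a `model` config group). If that is how these configs are consumed, a training script would load them roughly like this (a sketch; the `config_path` and the `train` function are illustrative):

```python
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="../configs", config_name="main", version_base=None)
def train(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))  # the fully merged configuration
    print(cfg.model)               # whichever model group was selected

if __name__ == "__main__":
    train()
```

With this setup, `python train.py model=xgboost` would swap in `model/xgboost.yaml` from the command line.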
8 changes: 7 additions & 1 deletion data/README.md
@@ -1,2 +1,8 @@
# Data outline
- `samples/sample.csv` - our primary sample file. Synced via DVC.
```
data
├── README.md
└── samples
    ├── sample.csv # Our primary sample file, synced via DVC
    └── sample.csv.dvc # DVC pointer file for sample.csv
```
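
Because `sample.csv` is tracked by DVC, it can be read through the `dvc.api` helpers without a full checkout. A minimal sketch, assuming the DVC remote is configured and `rev` points at a revision that has the file:

```python
import dvc.api
import pandas as pd

# Stream the DVC-tracked sample straight from the remote.
with dvc.api.open("data/samples/sample.csv", rev="main") as f:
    df = pd.read_csv(f)

print(df.head())
```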
1 change: 0 additions & 1 deletion docs/README.md

This file was deleted.

3 changes: 2 additions & 1 deletion models/README.md
@@ -1 +1,2 @@
# MLops-project
# Models
This folder contains the trained "champion" model for each architecture.
15 changes: 10 additions & 5 deletions notebooks/README.md
@@ -1,6 +1,11 @@
# Notebooks outline
- `business_data_understanding.ipynb` - showcases the Business Data Understanding of the project.
- `data_analysis.ipynb` - EDA for the dataset.
- `data_quality.ipynb` - Defines data requirements for the project and showcases data checks.
- `feature_descriptions.csv` - Human-understandable description of all of the available features (taken from the dataset's datacard on Kaggle)
- `poc.ipynb` - Proof-of-concept model showcase that solves the business problem.
```
notebooks
├── business_data_understanding.ipynb # Showcases the Business Data Understanding of the project
├── data_analysis.ipynb # EDA for the dataset
├── expectations.ipynb # Great Expectations data checks
├── data_quality.ipynb # Defines data requirements for the project and showcases data checks
├── poc.ipynb # Proof-of-concept model showcase that solves the business problem
├── xgboost_experiment.ipynb # Proof-of-concept XGBoost model
└── feature_descriptions.csv # Human-readable descriptions of all available features (from the dataset's Kaggle datacard)
```
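
As a rough illustration of what `expectations.ipynb` and `data_quality.ipynb` involve, a Great Expectations check looks roughly like this (a sketch against the 0.16+ fluent API; the column name is a placeholder, not one of the project's actual expectations):

```python
import great_expectations as gx
import pandas as pd

context = gx.get_context()
df = pd.read_csv("data/samples/sample.csv")

# Wrap the dataframe in a validator and run one illustrative check.
validator = context.sources.pandas_default.read_dataframe(df)
result = validator.expect_column_values_to_not_be_null(column="some_column")
print(result.success)
```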
1 change: 0 additions & 1 deletion outputs/README.md

This file was deleted.

3 changes: 2 additions & 1 deletion reports/README.md
@@ -1 +1,2 @@
# MLops-project
# Reports
This folder contains Giskard reports (if any meaningful reports have been generated and tracked with Git).
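
For reference, generating such a report with Giskard's scan API looks roughly like this (a sketch; the dataset path, target column, and the stand-in prediction function are all placeholders for the project's real model):

```python
import giskard
import numpy as np
import pandas as pd

df = pd.read_csv("data/samples/sample.csv")
dataset = giskard.Dataset(df=df, target="label")  # "label" is a placeholder

def predict_fn(batch: pd.DataFrame) -> np.ndarray:
    # Stand-in for the champion model: constant class probabilities.
    return np.tile([0.9, 0.1], (len(batch), 1))

model = giskard.Model(model=predict_fn, model_type="classification",
                      classification_labels=[0, 1])
report = giskard.scan(model, dataset)
report.to_html("reports/giskard_scan.html")
```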
17 changes: 12 additions & 5 deletions scripts/README.md
@@ -1,5 +1,12 @@
# Scripts outline
- `install_requirements.sh` - installs all of the requirements. Make sure that you have activated a local environment.
- `test_data.sh` - installs, tests, and runs GX on the data.
- `airflow_activate.sh` - runs all Airflow services to track and run DAGs.
- `airflow_cleanup.sh` - kills all Airflow processes to clean up the working directory and restart Airflow pipelines.
# Scripts
```
scripts
├── airflow_activate.sh # Outdated by Docker
├── airflow_cleanup.sh # Outdated by Docker
├── airflow_logs.sh # Outdated by Docker
├── extend_activate.sh # "Extends" shell to include AIRFLOW_HOME env var
├── extract_data.sh # Runs src.data Python script
├── install_requirements.sh # Installs dependencies
├── push_sample_version.sh # Outdated
└── test_data.sh # Samples, validates, versions, and commits data
```
4 changes: 2 additions & 2 deletions services/README.md
@@ -1,2 +1,2 @@
# GX outline
- `gx/great_expectations.yaml` - primary configuration file for GX
# Services outline
This folder contains config files related to Airflow and GX
6 changes: 6 additions & 0 deletions services/airflow/dags/README.md
@@ -0,0 +1,6 @@
```
pipelines
├── data_extract_v0_dag.py # Data extraction Airflow pipeline (unmaintained)
├── data_extract_v1_dag.py # Data extraction Airflow pipeline (latest)
├── data_prepare_dag.py # Data preparation Airflow pipeline
└── data_prepare.py # Data preparation ZenML pipeline
```
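
For orientation, the extraction DAGs follow the standard Airflow TaskFlow shape. A minimal sketch, assuming Airflow 2.x; the task bodies, names, and schedule here are placeholders, not the real pipeline logic:

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def data_extract_sketch():
    @task
    def extract() -> str:
        return "data/samples/sample.csv"  # placeholder extraction step

    @task
    def validate(path: str) -> None:
        print(f"validating {path}")       # placeholder validation step

    validate(extract())

data_extract_sketch()
```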
14 changes: 11 additions & 3 deletions src/README.md
@@ -1,4 +1,12 @@
# Source code outline
- `data_quality.py` - Python script to run Great eXpectations on the dataset
- `data_transformations.py` - Python script with all data transformation functions for EDA and POC model
- `data.py` - script for data sampling and validation
```
src
├── data.py # Functions to manipulate data. If run on its own, downloads, samples, validates, and versions data
├── data_quality.py # Unmaintained; `load_context_and_sample_data` is still used in two notebooks. Kept for archival reasons
├── data_transformations.py # Functions for data transformation
├── evaluate.py # Script that validates a given model
├── main.py # Entry point that runs model training (MLFlow)
├── model.py # MLFlow-related functions
├── utils.py # Utility functions
└── validate.py # Giskard model validation. Generates a Giskard report
```
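
To make the `main.py`/`model.py` flow concrete, a typical MLFlow training run has this shape (a sketch only; the model, parameters, and metric are illustrative, not the project's actual setup):

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("sketch-experiment")

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = LogisticRegression(C=1.0).fit(X_train, y_train)
    mlflow.log_param("C", 1.0)
    mlflow.log_metric("f1", f1_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```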
