Merge pull request #21 from IVproger/19-docs
Update Docs
Showing 13 changed files with 79 additions and 21 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# API | ||
``` | ||
api | ||
├── api.Dockerfile # Dockerfile for starting the Flask backend | ||
├── app.py # Flask entrypoint | ||
├── gradio_app.py # Gradio config file | ||
├── gradio.Dockerfile # Dockerfile for starting the Gradio frontend | ||
└── ml.Dockerfile # Dockerfile for starting the model with MLFlow | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,15 @@ | ||
# MLops-project | ||
# Configs | ||
``` | ||
configs | ||
├── data_sample.yaml # Sample data configuration | ||
├── data_transformations.yaml # Defines transformations applied to data rows | ||
├── data_version.txt # Current data version (for convenience) | ||
├── experiment.yaml # MLFlow experiment definition | ||
├── main.yaml # Main configuration file | ||
├── model | ||
│ ├── lr.yaml # Parameters for Logistic Regression | ||
│ ├── model.yaml # Definition of folds and metrics | ||
│ ├── rf.yaml # Parameters for Random Forest | ||
│ └── xgboost.yaml # Parameters for XGBoost | ||
└── README.md # Project documentation | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,8 @@ | ||
# Data outline | ||
- `samples/sample.csv` - our primary sample file. Synced via DVC. | ||
``` | ||
data | ||
├── README.md | ||
└── samples | ||
    ├── sample.csv # Our primary sample file. Synced via DVC. | ||
    └── sample.csv.dvc # DVC shadow for sample.csv | ||
``` |
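The layout above pairs each DVC-tracked file with a pointer file of the same name plus a `.dvc` suffix. As a minimal sketch, assuming that naming convention holds throughout `data/`, the following checks which tracked files are not yet present locally (for example, before `dvc pull` has been run). The function name `missing_dvc_outputs` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def missing_dvc_outputs(data_dir: str) -> list[str]:
    """Return DVC-tracked files under data_dir that are absent locally.

    Assumption: every pointer file is named `<data file>.dvc` and sits next
    to the file it tracks, as `sample.csv.dvc` does for `sample.csv`.
    """
    root = Path(data_dir)
    missing = []
    for pointer in sorted(root.rglob("*.dvc")):
        tracked = pointer.with_suffix("")  # drop the trailing ".dvc"
        if not tracked.exists():
            # Report paths relative to data_dir, with forward slashes.
            missing.append(tracked.relative_to(root).as_posix())
    return missing
```

Calling `missing_dvc_outputs("data")` from the repository root returns an empty list once every tracked file is present. The check only looks at file names; it does not verify that the contents match the hash recorded in the pointer file.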
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
# MLops-project | ||
# Models | ||
This folder contains the trained "champion" model for each architecture. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,11 @@ | ||
# Notebooks outline | ||
- `business_data_understanding.ipynb` - showcases the Business Data Understanding of the project. | ||
- `data_analysis.ipynb` - EDA for the dataset. | ||
- `data_quality.ipynb` - Defines data requirements for the project and showcases data checks. | ||
- `feature_descriptions.csv` - Human-understandable description of all of the available features (taken from the dataset's datacard on Kaggle) | ||
- `poc.ipynb` - Proof-of-concept model showcase that solves the business problem. | ||
``` | ||
notebooks | ||
├── business_data_understanding.ipynb # Showcases the Business Data Understanding of the project. | ||
├── data_analysis.ipynb # EDA for the dataset | ||
├── expectations.ipynb # Great Expectations | ||
├── data_quality.ipynb # Defines data requirements for the project and showcases data checks | ||
├── poc.ipynb # Proof-of-concept model showcase that solves the business problem. | ||
├── xgboost_experiment.ipynb # Proof-of-concept XGBoost model | ||
└── feature_descriptions.csv # Human-understandable description of all of the available features (taken from the dataset's datacard on Kaggle) | ||
``` |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
# MLops-project | ||
# Reports | ||
This folder contains Giskard reports (if any meaningful reports have been generated and tracked with Git) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,12 @@ | ||
# Scripts outline | ||
- `install_requirements.sh` - installs all of the requirements. Make sure that you have activated a local environment. | ||
- `test_data.sh` - installs, tests, and runs GX on the data. | ||
- `airflow_activate.sh` - runs all Airflow services to track and run DAGs. | ||
- `airflow_activate.sh` - deletes all Airflow processes to clean up the working directory and restart Airflow pipelines. | ||
# Scripts | ||
``` | ||
scripts | ||
├── airflow_activate.sh # Outdated by Docker | ||
├── airflow_cleanup.sh # Outdated by Docker | ||
├── airflow_logs.sh # Outdated by Docker | ||
├── extend_activate.sh # "Extends" shell to include AIRFLOW_HOME env var | ||
├── extract_data.sh # Runs src.data Python script | ||
├── install_requirements.sh # Installs dependencies | ||
├── push_sample_version.sh # Outdated | ||
└── test_data.sh # Samples, validates, versions, and commits data | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,2 @@ | ||
# GX outline | ||
- `gx/great_expectations.yaml` - primary configuration file for GX | ||
# Services outline | ||
This folder contains config files related to Airflow and GX |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
``` | ||
pipelines | ||
├── data_extract_v0_dag.py # Data extraction Airflow pipeline (unmaintained) | ||
├── data_extract_v1_dag.py # Data extraction Airflow pipeline (latest) | ||
├── data_prepare_dag.py # Data preparation Airflow pipeline | ||
└── data_prepare.py # Data preparation ZenML pipeline | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,12 @@ | ||
# Source code outline | ||
- `data_quality.py` - Python script to run Great eXpectations on the dataset | ||
- `data_transformations.py` - Python script with all data transformation functions for EDA and POC model | ||
- `data.py` - script for data sampling and validation | ||
``` | ||
src | ||
├── data.py # Functions to manipulate data. If run on its own, downloads, samples, validates, and versions data | ||
├── data_quality.py # Unmaintained. `load_context_and_sample_data` is used in two notebooks. Keep this for archival reasons | ||
├── data_transformations.py # Functions for data transformation | ||
├── evaluate.py # Script that validates a given model | ||
├── main.py # Script that runs model training (MLFlow) | ||
├── model.py # MLFlow-related functions | ||
├── utils.py # Utility functions | ||
└── validate.py # Giskard model validation. Generates a Giskard report | ||
``` |