-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
8b6b314
commit 00c6fd7
Showing
1 changed file
with
43 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,45 @@ | ||
# bridgeAI-drift-monitoring | ||
|
||
## ENV VARS | ||
## Drift detection | ||
|
||
1. The data used is available [here](https://www.kaggle.com/datasets/yasserh/housing-prices-dataset). | ||
2. Ensure you have a model endpoint available that serves the regression model that we want to test the data against | ||
3. Update the python environment in `.env` file | ||
4. Install `poetry` if not already installed | ||
5. Install the dependencies using poetry `poetry install` | ||
6. update the config and other parameters in the `config.yaml` file | ||
7. Add `./src` to the `PYTHONPATH` - `export PYTHONPATH="${PYTHONPATH}:./src"` | ||
8. Run `poetry run python src/main.py` | ||
|
||
**The above manual steps are automated using the drift detection dag in the [DAGs repo](https://github.com/digicatapult/bridgeAI-airflow-DAGs) **\ | ||
|
||
|
||
### Environment Variables | ||
|
||
The following environment variables need to be set for this repo. | ||
|
||
| Variable | Default Value | Description | | ||
|--------------------------|--------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------| | ||
| CONFIG_PATH | `./config.yaml` | File path to the model training and other configuration file | | ||
| LOG_LEVEL | `INFO` | The logging level for the application. Valid values are `DEBUG`, `INFO`, `WARNING`, `ERROR`, and `CRITICAL`. | | ||
| MLFLOW_TRACKING_URI | `http://localhost:5000` | MLFlow tracking URI. Use `http://host.docker.internal:5000` if the MLFlow is running within docker container. | | ||
| GITHUB_USERNAME | None | Githuib username. This is needed to pull the data form the dvc repo. | | ||
| GITHUB_PASSWORD | None | Githuib token. This is needed to pull the data form the dvc repo. | | ||
| DVC_ACCESS_KEY_ID | `admin` | Access key for dvc remote | | ||
| DVC_SECRET_ACCESS_KEY | `password` | secret access key for dvc remote | | ||
| DVC_REMOTE_NAME | `regression-model-remote` | A name assigned to the dvc remote | | ||
| DVC_REMOTE | `s3://artifacts` | DVC remote path (to s3/minio bucket) | | ||
| DVC_ENDPOINT_URL | `http://minio` | Endpoint url for dvc remote | | ||
| DATA_REPO | `https://github.com/digicatapult/bridgeAI-regression-model-data-ingestion.git` | data ingestion repo where the data is versioned with dvc | | ||
| HISTORICAL_DATA_VERSION | `data-v1.0.0` | the data version (dvc tagged version from the data ingestion repo) used for training the model | | ||
| NEW_DATA_VERSION | `data-v1.1.0` | the data version (dvc tagged version from the data ingestion repo) curresponding to the new data | | ||
| MODEL_ENDPOINT | `http://host.docker.internal:5001/invocations` | deployed model endpoint using which predictions can be made | | ||
|
||
|
||
### Running the tests | ||
|
||
Ensure that you have the project requirements already set up by following the [Data Ingestion and versioning](#data-ingestion-and-versioning) instructions | ||
- Ensure `pytest` is installed. `poetry install` will install it as a dependency. | ||
|
||
[//]: # (- - For integration tests, set up the dependencies (MLFlow) by running, `docker-compose up -d`) | ||
- Run the tests with `poetry run pytest ./tests` |