This repository shows how to train a YOLOv8 deep learning model on the Pyronear dataset. It uses DVC for data versioning, MLflow for model versioning and performance tracking, and cloud storage for the data.
Prerequisites:
- Python 3.x
- pip package manager
To install the required libraries, run:
pip install -r requirements.txt
- Download the dataset using the following command:
gdown --fuzzy https://drive.google.com/file/d/12gGuFd3aQmtPXP-cbBRjsciWLtpFNBB-/view?usp=sharing
- Unzip and organize the dataset:
mkdir datasets
unzip DS-18d12de1.zip -d datasets/
- Update the dataset path in data_configuration.yaml.
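The unzip and configuration steps above can be sketched in Python with only the standard library. This is a minimal sketch, not the repository's code: it assumes an Ultralytics-style data config with `path`, `train`, `val` and `names` keys, an `images/train` / `images/val` folder layout, and a single class named "smoke". The helper names `extract_dataset` and `write_data_config` are hypothetical; compare the output against the real data_configuration.yaml before using it.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path: str, dest: str = "datasets") -> Path:
    """Unzip the archive into `dest`, creating the folder first (like mkdir + unzip -d)."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return dest_dir


def write_data_config(dataset_root: str, config_path: str = "data_configuration.yaml") -> str:
    """Write a minimal Ultralytics-style data config whose `path` points at `dataset_root`.

    Hypothetical layout: images/train and images/val sub-folders and a single
    class named "smoke". Check these against the real data_configuration.yaml.
    """
    root = Path(dataset_root).resolve().as_posix()
    text = (
        f"path: {root}\n"
        "train: images/train\n"
        "val: images/val\n"
        "names:\n"
        "  0: smoke\n"
    )
    Path(config_path).write_text(text, encoding="utf-8")
    return text
```

For example, `extract_dataset("DS-18d12de1.zip")` followed by `write_data_config("datasets/<extracted_folder>")`, where the name of the extracted folder depends on how the archive is packaged. Writing an absolute `path` avoids surprises when the training script is launched from a different working directory.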
The dataset contains 596 training images and 148 validation images of forest landscapes with smoke. Each image is 640x480 pixels and is annotated with a bounding box, stored in a corresponding .txt file, that marks the smoke area.
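YOLOv8 reads one label .txt file per image. Assuming these files follow the standard YOLO format (one line per box: class id, then box center x, center y, width and height, all normalized to the 0 to 1 range by the image size), a minimal sketch of converting a label line to pixel corners for the 640x480 images could look like this. The function names and the use of class id 0 for smoke are illustrative assumptions.

```python
from typing import List, Tuple

IMG_W, IMG_H = 640, 480  # image size stated in the dataset description

Box = Tuple[int, float, float, float, float]


def parse_yolo_label(line: str) -> Box:
    """Split one label line 'class xc yc w h' (coordinates normalized to 0..1)."""
    cls, xc, yc, w, h = line.split()
    return int(cls), float(xc), float(yc), float(w), float(h)


def to_pixel_box(line: str, img_w: int = IMG_W, img_h: int = IMG_H) -> Box:
    """Convert a normalized YOLO box to pixel corners: (class, x_min, y_min, x_max, y_max)."""
    cls, xc, yc, w, h = parse_yolo_label(line)
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return cls, x_min, y_min, x_max, y_max


def read_label_file(text: str) -> List[Box]:
    """Convert every non-empty line of a label file's contents."""
    return [to_pixel_box(line) for line in text.splitlines() if line.strip()]
```

For example, `to_pixel_box("0 0.5 0.5 0.25 0.5")` returns `(0, 240.0, 120.0, 400.0, 360.0)`: a box 160 pixels wide and 240 pixels tall, centered in the image. Because the coordinates are normalized, the same label stays valid if the image is resized.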
DVC is listed in the same requirements file, so the pip install command above also installs it.
- Initialize DVC in your workspace (run this from the root of your Git repository):
dvc init
- Set up remote storage (e.g., AWS S3, Google Cloud Storage); the -d flag makes it the default remote:
dvc remote add -d remote_storage path/to/your/dvc_remote
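- Track data and configuration files with DVC, then commit the generated .dvc file and the updated .gitignore to Git. DVC writes the .dvc file next to the tracked target (not inside the .dvc/ directory) and prints the exact git add command to run:
dvc add <file_or_directory>
git add <file_or_directory>.dvc .gitignore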
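The idea behind `dvc add` is content addressing: Git versions a small metadata file holding a hash of the data, while the data itself lives in the DVC cache and remote. The sketch below only illustrates that idea for a single file; it is not DVC's implementation. Real .dvc files are written by `dvc add`, may contain extra fields, and directories are hashed differently. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images or archives need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dvc_style_entry(path: str) -> str:
    """Render a simplified, .dvc-like YAML entry for a single file (illustration only)."""
    p = Path(path)
    return (
        "outs:\n"
        f"- md5: {file_md5(path)}\n"
        f"  size: {p.stat().st_size}\n"
        f"  path: {p.name}\n"
    )
```

Because the entry changes whenever the file's bytes change, committing it to Git pins an exact version of the data without storing the data in Git.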
MLflow is used for experiment tracking and model management. The key tracked values are the number of epochs, accuracy, and loss.
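How the training script logs these values is not shown here, so the following is only a minimal sketch: it records the epoch count as a parameter and each per-epoch metric with `mlflow.log_param` and `mlflow.log_metric(key, value, step=...)`. The helper `log_training_history` and the metric names in the history dictionaries are hypothetical; the logging functions can be injected so the helper runs without MLflow installed.

```python
from typing import Callable, Dict, List, Optional


def log_training_history(
    history: List[Dict[str, float]],
    log_param: Optional[Callable[..., None]] = None,
    log_metric: Optional[Callable[..., None]] = None,
) -> int:
    """Log the epoch count as a parameter and every per-epoch metric with its step.

    Defaults to mlflow.log_param / mlflow.log_metric; pass other functions to
    log elsewhere (for example in tests).
    """
    if log_param is None or log_metric is None:
        import mlflow  # lazy import so the helper works without MLflow when functions are injected

        log_param = log_param or mlflow.log_param
        log_metric = log_metric or mlflow.log_metric
    log_param("epochs", len(history))
    for epoch, metrics in enumerate(history):
        for name, value in metrics.items():
            # step=epoch lets the MLflow UI plot each metric as a curve over epochs
            log_metric(name, value, step=epoch)
    return len(history)
```

Typical usage would wrap the call in `with mlflow.start_run():` so that all values land in the same run.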
- Start the MLflow UI (by default it is served at http://127.0.0.1:5000):
mlflow ui
- (Optional) Specify a custom port:
mlflow ui --port <port_number>
Run the training script with the data and model configuration files:
python3 train_yolo.py --data_config data_configuration.yaml --model_config model_configuration.yaml
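The command-line interface of train_yolo.py can be sketched as follows. This is a hypothetical reconstruction, not the repository's script: only the `--data_config` and `--model_config` flags come from the command above. The sketch hands the data config to the Ultralytics `YOLO(...).train(data=..., epochs=..., imgsz=...)` API, and the weights file, epoch count and image size are placeholder assumptions standing in for whatever model_configuration.yaml actually specifies.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the two configuration paths used in the training command."""
    parser = argparse.ArgumentParser(description="Train a YOLOv8 model on the Pyronear dataset.")
    parser.add_argument("--data_config", required=True, help="Path to data_configuration.yaml")
    parser.add_argument("--model_config", required=True, help="Path to model_configuration.yaml")
    return parser.parse_args(argv)


def train(data_config: str, weights: str = "yolov8n.pt", epochs: int = 50, imgsz: int = 640):
    """Launch Ultralytics training; the default values here are placeholders."""
    from ultralytics import YOLO  # lazy import: only needed when training actually runs

    model = YOLO(weights)
    return model.train(data=data_config, epochs=epochs, imgsz=imgsz)


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    # args.model_config is parsed but not used in this sketch: its schema is not
    # documented, so the defaults of train() stand in for its contents.
    train(args.data_config)
```

A script built this way would end with `if __name__ == "__main__": main()`. Keeping argument parsing separate from training makes the flags easy to check without starting a training run.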
Add your AWS credentials to the S3 client created in the training script:
import boto3
s3 = boto3.client("s3", aws_access_key_id="your_access_key_id", aws_secret_access_key="your_secret_access_key")
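Hard-coded keys are easy to commit to Git by accident. A safer variant, sketched below under the assumption that the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables are set, reads the credentials at runtime and fails early with a clear message when they are missing. The helper names are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def s3_client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build boto3.client keyword arguments from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


def make_s3_client():
    """Create the S3 client with credentials taken from the environment."""
    import boto3  # lazy import so the credential check can run without boto3

    return boto3.client("s3", **s3_client_kwargs())
```

boto3 also picks up these environment variables on its own when `boto3.client("s3")` is called without explicit credentials; the explicit check mainly gives a clearer error message.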
You've successfully set up and run the Pyronear machine learning pipeline for wildfire detection.