This project provides a set of tools for predicting air raid alerts in Ukraine. It uses historical data to train a gradient boosting machine (GBM) that forecasts the likelihood of future air raid alerts. The scripts are interactive: they prompt the user for the parameters of both the training and prediction processes.
The project consists of three Python scripts:
- predict.py: For making predictions using a pre-trained model.
- train.py: For training the model on historical air raid alert data.
- erase-models.py: For deleting all pre-trained models from the system.
Additionally, there is a models/ folder intended to store the trained models.
- Clone the repository to your local machine.
- Ensure you have Python 3.6+ installed.
- Install the required dependencies by running pip install -r requirements.txt.
- Run train.py to train a new model based on the latest data.
- Use predict.py to make predictions with the trained model.
- If needed, run erase-models.py to remove all models from the models/ directory.
Before running the scripts, you must have the following packages installed:
- pandas
- joblib
- tqdm
- xgboost
- requests
You can install these packages using pip:
pip install pandas joblib tqdm xgboost requests
Run train.py from the command line to start the training process. The script prompts for a start date for the training data, a region to focus on, and a filename for the model. After training, it saves the model to the models/ directory under the provided filename or, if none is given, a randomly generated one.
Execute predict.py to use a trained model for predictions. You will be asked to provide the model filename and other parameters such as the current date and the number of days to predict. The script will then display the predicted high-probability alert times.
When you run erase-models.py, it will ask for confirmation before deleting all files within the models/ directory. This is useful for clearing out old models.
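The confirm-then-delete behaviour described above can be sketched as follows. This is a minimal illustration, not the code in erase-models.py: the function name erase_models, its parameters, and the prompt wording are hypothetical, and it assumes the models are plain files directly inside the directory.

```python
from pathlib import Path


def erase_models(models_dir="models", confirm=input):
    """Delete every regular file in models_dir after the user confirms.

    Returns the number of files deleted. `confirm` is a hypothetical hook
    so the prompt can be replaced; by default it is the built-in input().
    """
    directory = Path(models_dir)
    if not directory.is_dir():
        return 0
    files = [p for p in directory.iterdir() if p.is_file()]
    if not files:
        return 0
    answer = confirm(f"Delete {len(files)} file(s) in {directory}? [y/N] ")
    # Anything other than an explicit "y" leaves the directory untouched.
    if answer.strip().lower() != "y":
        return 0
    for path in files:
        path.unlink()
    return len(files)
```

Defaulting to "no" means an accidental Enter keypress does not remove any models.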
- predict.py: Contains the logic for loading a trained model and making predictions.
- train.py: Handles the downloading of the dataset, preprocessing, training the model, and saving it.
- erase-models.py: Provides a simple script to clear all trained models.
- models/: A directory intended to store trained model files.
ARAPS operates by utilizing historical air raid alert data to train a machine learning model, which can then predict the likelihood of future alerts. Here's a breakdown of how each component of the system contributes to its overall operation:
The system begins with the acquisition of historical air raid data. This data is obtained via a URL pointing to a dataset, typically in CSV format, which contains records of past air raid alerts. The train.py script is responsible for downloading this dataset.
Once downloaded, the data is preprocessed to be suitable for machine learning. This involves:
- Converting timestamps to a datetime format.
- Resampling the data to an hourly frequency to count the number of alerts within each hour.
- Filling in missing values to ensure a continuous timeline.
- Extracting time-related features such as the hour of the day and the day of the week, which are crucial for the model to identify patterns.
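The preprocessing steps above can be sketched with pandas. This is an illustration under assumptions rather than the code in train.py: the function name preprocess_alerts, the input column name started_at, and the output column names are all hypothetical.

```python
import pandas as pd


def preprocess_alerts(df, time_col="started_at"):
    """Turn raw alert records into an hourly table with time features.

    `time_col` is a hypothetical name for the column holding each
    alert's start time; the real dataset may use a different one.
    """
    # Convert timestamps to a datetime format.
    times = pd.to_datetime(df[time_col])
    # One entry per alert, indexed by its start time.
    events = pd.Series(1, index=pd.DatetimeIndex(times)).sort_index()
    # Count alerts per hour; hours without alerts sum to 0, which
    # yields a continuous timeline with no gaps.
    hourly = events.resample(pd.offsets.Hour()).sum()
    features = pd.DataFrame({"alerts": hourly})
    # Time-related features: hour of day (0-23), day of week (Monday=0).
    features["hour"] = features.index.hour
    features["day_of_week"] = features.index.dayofweek
    return features
```

For example, two alerts starting at 10:15 and 10:45 and one at 13:05 on the same day produce four hourly rows with counts 2, 0, 0, 1.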
The preprocessed data is used to train a Gradient Boosting Machine (GBM) model, specifically an implementation provided by the XGBoost library. This model is chosen for its effectiveness in handling tabular data and its ability to capture complex nonlinear relationships.
The training process involves:
- Splitting the data into features (the time-related aspects) and the target variable (the number of alerts).
- Feeding this data into the XGBoost model, which learns to predict the target variable based on the features.
- Minimizing the difference between the model's predictions and the actual number of alerts by adjusting the model's internal parameters.
With a trained model, the predict.py script is used to make predictions:
- The user specifies the current date and the number of days into the future for which to predict air raid alerts.
- The script generates features for the specified time period (the next 7 days by default), considering the hour and the day of the week for each time slot.
- The trained model uses these features to predict the likelihood of an air raid alert for each hour in the prediction range.
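Generating the features for the prediction range can be sketched as follows. The function name build_future_features and the column names are hypothetical; the only assumption carried over from the text is that each hourly slot is described by its hour and day of the week.

```python
import pandas as pd


def build_future_features(start, days=7):
    """Return one row per hour for `days` days beginning at `start`."""
    hours = pd.date_range(start=start, periods=days * 24, freq=pd.offsets.Hour())
    features = pd.DataFrame(index=hours)
    features["hour"] = hours.hour              # 0-23
    features["day_of_week"] = hours.dayofweek  # Monday=0 ... Sunday=6
    return features
```

A trained model would then score these rows with something like model.predict(features[["hour", "day_of_week"]]), where model stands for whatever object was loaded from the models/ directory.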
To determine which times are most likely to have an air raid alert, a threshold is applied:
- Predictions are filtered by a threshold, which is set by default at the 90th percentile of the predicted values. This means only the top 10% of predictions, those with the highest likelihood, are reported as potential alerts.
- Users can adjust this threshold to be more or less conservative based on their needs.
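The percentile threshold can be illustrated with a short pandas sketch. The function name high_probability_slots is hypothetical, and treating values exactly equal to the cutoff as alerts is an assumption; predict.py may handle ties differently.

```python
import pandas as pd


def high_probability_slots(scores, percentile=90):
    """Keep the time slots whose score is at or above the percentile cutoff.

    `scores` is a pandas Series of model outputs indexed by time slot.
    """
    cutoff = scores.quantile(percentile / 100)
    # Assumption: scores equal to the cutoff are kept.
    return scores[scores >= cutoff]
```

Lowering the percentile reports more time slots, while raising it keeps only the strongest predictions.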
This project is licensed under the MIT License - see the LICENSE file for details.
- This project was created by Artem.
- Special thanks to Vadym Klymenko for the Ukrainian air raid sirens dataset, which is used to train the models.
For support, please open an issue on the project's GitHub repository.
Contributions are welcome. Please read CONTRIBUTING.md for details on how to contribute to this project.
For any queries or further assistance, please contact Artem at [email protected].
Copyright (c) 2023 Artem Curious. All rights reserved.