This project aims to build an end-to-end SageMaker pipeline to classify whether an asteroid is hazardous or not using NASA's Near Earth Object data and the XGBoost algorithm.
The pipeline runs on AWS SageMaker and labels each asteroid as hazardous or non-hazardous based on data provided by NASA. The classifier is XGBoost, a scalable gradient tree boosting algorithm.
To run this project, you will need the following:
- An AWS account with access to SageMaker
- Python 3.7 or higher
- Boto3 and AWS CLI configured with your AWS credentials
- Necessary Python libraries (pandas, numpy, sagemaker, etc.)
You can install the required libraries using pip:
pip install -r requirements.txt
The dataset used in this project is the NASA Near Earth Object data. It contains information about various asteroids, including their size, velocity, distance from Earth, and whether they are classified as hazardous.
You can download the dataset from NASA's official repository.
The pipeline consists of the following steps:
- Data Preprocessing: Cleaning and preparing the data for training.
- Feature Engineering: Creating relevant features for the model.
- Model Training: Training the XGBoost model on the preprocessed data.
- Model Evaluation: Evaluating the model's performance using appropriate metrics.
- Deployment: Deploying the trained model to an endpoint for inference.
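As an illustration of the preprocessing step, here is a minimal sketch that drops incomplete rows and writes the data in the layout the SageMaker built-in XGBoost algorithm expects for CSV input: no header row, with the label in the first column. The column names (`hazardous`, `est_diameter_max`, `relative_velocity`, `miss_distance`) and the function name `to_xgboost_csv` are assumptions made for this example, not names taken from the project; adjust them to match the actual dataset header.

```python
import csv
import io

# Hypothetical column names (assumptions); change them to match the dataset.
LABEL_COLUMN = "hazardous"
FEATURE_COLUMNS = ["est_diameter_max", "relative_velocity", "miss_distance"]


def to_xgboost_csv(raw_csv: str) -> str:
    """Convert headered CSV text into label-first, headerless CSV text."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        label_text = (row.get(LABEL_COLUMN) or "").strip().lower()
        if label_text not in {"true", "false", "1", "0"}:
            continue  # skip rows with a missing or unrecognized label
        try:
            features = [float((row.get(col) or "").strip()) for col in FEATURE_COLUMNS]
        except ValueError:
            continue  # skip rows with a missing or non-numeric feature
        label = 1 if label_text in {"true", "1"} else 0
        writer.writerow([label] + features)
    return out.getvalue()
```

The returned text can be written to a file and uploaded to S3 as a training or validation channel. Feature engineering (for example, derived columns) would happen before the rows are written.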
The model is trained using the XGBoost algorithm. The training process includes:
- Loading the dataset into a SageMaker-compatible format.
- Defining the XGBoost estimator with appropriate hyperparameters.
- Fitting the model on the training data.
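The training steps above could look roughly like the following sketch, which uses the SageMaker Python SDK (v2) with the built-in XGBoost container. The hyperparameter values, instance type, container version, and the function name `train_xgboost` are illustrative assumptions, not the project's actual settings. Calling the function requires valid AWS credentials, an IAM role, and CSV data already uploaded to S3.

```python
# Illustrative hyperparameters (assumptions); tune them for the real dataset.
XGB_HYPERPARAMETERS = {
    "objective": "binary:logistic",  # outputs a probability of "hazardous"
    "eval_metric": "auc",
    "num_round": 100,
    "max_depth": 5,
    "eta": 0.2,
    "subsample": 0.8,
}


def train_xgboost(role_arn, train_s3_uri, validation_s3_uri, output_s3_uri):
    """Define and fit a built-in XGBoost estimator, then return it."""
    # Imported lazily so this module loads even where the SDK is not installed.
    import sagemaker
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    # Look up the built-in XGBoost container image for the session's region.
    image_uri = image_uris.retrieve(
        framework="xgboost", region=session.boto_region_name, version="1.7-1"
    )
    estimator = Estimator(
        image_uri=image_uri,
        role=role_arn,
        instance_count=1,
        instance_type="ml.m5.xlarge",  # assumed instance type
        output_path=output_s3_uri,
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(**XGB_HYPERPARAMETERS)
    # Both channels are expected to be headerless CSV with the label first.
    estimator.fit({
        "train": TrainingInput(train_s3_uri, content_type="text/csv"),
        "validation": TrainingInput(validation_s3_uri, content_type="text/csv"),
    })
    return estimator
```

With `binary:logistic` as the objective, the model returns a probability per asteroid, so a decision threshold (commonly 0.5) is needed to turn predictions into hazardous or non-hazardous labels.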
The model's performance is evaluated using metrics such as accuracy, precision, recall, and F1 score. Confusion matrices and ROC curves are also generated to provide a detailed analysis of the model's performance.
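To make these metrics concrete, the sketch below computes the confusion matrix counts, accuracy, precision, recall, and F1 score from binary labels using only the standard library. The function name `classification_metrics` is hypothetical, and the project itself may well rely on a library such as scikit-learn instead.

```python
def classification_metrics(y_true, y_pred):
    """Return confusion counts and summary metrics for 0/1 labels (1 = hazardous)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }
```

For hazardous asteroid detection, recall on the hazardous class is usually the metric to watch, since a missed hazardous object (a false negative) is costlier than a false alarm.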
The results of the model, including performance metrics and visualizations, are documented in this section. The model's predictions are compared against the actual labels to determine its effectiveness in classifying hazardous asteroids.
Contributions to this project are welcome. If you have suggestions for improvements or new features, please submit a pull request or open an issue.