Skip to content

SageMaker pipeline to classify hazardous asteroids using NASA Near Earth Object data and XGBoost.

Notifications You must be signed in to change notification settings

curio25/nasa-neo-watch

Repository files navigation

NASA Asteroid Hazard Classification with SageMaker and XGBoost

This project aims to build an end-to-end SageMaker pipeline to classify whether an asteroid is hazardous or not using NASA's Near Earth Object data and the XGBoost algorithm.

Project Description

This project leverages AWS SageMaker to create a machine learning pipeline for classifying asteroids as hazardous or non-hazardous based on data provided by NASA. The model used for classification is XGBoost, a powerful and scalable tree boosting algorithm.

Table of Contents

Installation

To run this project, you will need the following:

  • An AWS account with access to SageMaker

  • Python 3.7 or higher

  • Boto3 and AWS CLI configured with your AWS credentials

  • Necessary Python libraries (pandas, numpy, sagemaker, etc.)

You can install the required libraries using pip:

    pip install -r requirements.txt  

Dataset

The dataset used in this project is the NASA Near Earth Object data. It contains information about various asteroids, including their size, velocity, distance from Earth, and whether they are classified as hazardous.

You can download the dataset from NASA's official repository.

Pipeline Overview

The pipeline consists of the following steps:

  1. Data Preprocessing: Cleaning and preparing the data for training.

  2. Feature Engineering: Creating relevant features for the model.

  3. Model Training: Training the XGBoost model on the preprocessed data.

  4. Model Evaluation: Evaluating the model's performance using appropriate metrics.

  5. Deployment: Deploying the trained model to an endpoint for inference.

Model Training

The model is trained using the XGBoost algorithm. The training process includes:

  1. Loading the dataset into a SageMaker-compatible format.

  2. Defining the XGBoost estimator with appropriate hyperparameters.

  3. Fitting the model on the training data.

Evaluation

The model's performance is evaluated using metrics such as accuracy, precision, recall, and F1 score. Confusion matrices and ROC curves are also generated to provide a detailed analysis of the model's performance.

Results

The results of the model, including performance metrics and visualizations, are documented in this section. The model's predictions are compared against the actual labels to determine its effectiveness in classifying hazardous asteroids.

Contributing

Contributions to this project are welcome. If you have suggestions for improvements or new features, please submit a pull request or open an issue.

About

SageMaker pipeline to classify hazardous asteroids using NASA Near Earth Object data and XGBoost.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published