NASA Asteroid Hazard Classification with SageMaker and XGBoost

This project aims to build an end-to-end SageMaker pipeline that classifies asteroids as hazardous or non-hazardous, using NASA's Near Earth Object data and the XGBoost algorithm.

Project Description

The pipeline runs on Amazon SageMaker and is trained on data provided by NASA. The classifier is XGBoost, a scalable gradient-boosted decision tree algorithm.

Installation

To run this project, you will need the following:

An AWS account with access to SageMaker
Python 3.7 or higher
Boto3 and AWS CLI configured with your AWS credentials
The Python libraries listed in requirements.txt (pandas, numpy, sagemaker, and others)

You can install the required libraries using pip:

    pip install -r requirements.txt

Dataset

The dataset used in this project is the NASA Near Earth Object data. It contains information about various asteroids, including their size, velocity, distance from Earth, and whether they are classified as hazardous.

You can download the dataset from NASA's official repository.
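
Before training, it can help to check how imbalanced the labels are, since hazardous objects are usually the minority class. The sketch below is illustrative only: the inline sample rows are made up, and the column names (including `hazardous`) are assumptions about the file layout, not confirmed by this project.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for the real file; column names are assumptions.
SAMPLE_CSV = """name,est_diameter_max,relative_velocity,miss_distance,hazardous
A,0.12,48000.5,5.4e7,False
B,0.85,71000.0,1.2e7,True
C,0.05,23000.1,6.9e7,False
D,0.40,55000.9,3.3e7,False
"""


def class_balance(csv_text, label_column="hazardous"):
    """Count how many rows fall into each label value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column].strip().lower() for row in reader)


counts = class_balance(SAMPLE_CSV)
total = sum(counts.values())
for label, n in sorted(counts.items()):
    print(f"{label}: {n} ({n / total:.0%})")
```

With the real dataset, the same function could be pointed at the downloaded file's contents instead of `SAMPLE_CSV`.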

Pipeline Overview

The pipeline consists of the following steps:

Data Preprocessing: Cleaning and preparing the data for training.
Feature Engineering: Creating relevant features for the model.
Model Training: Training the XGBoost model on the preprocessed data.
Model Evaluation: Evaluating the model's performance using appropriate metrics.
Deployment: Deploying the trained model to an endpoint for inference.
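
The first two steps above, plus a train/test split, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the input column names (`est_diameter_min`, `est_diameter_max`, `relative_velocity`, `miss_distance`, `hazardous`), the derived `diameter_mean` feature, and the helper names are all hypothetical.

```python
import random


def preprocess(rows):
    """Drop malformed rows and convert fields to numbers (assumed column names)."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({
                "diameter_min": float(row["est_diameter_min"]),
                "diameter_max": float(row["est_diameter_max"]),
                "velocity": float(row["relative_velocity"]),
                "miss_distance": float(row["miss_distance"]),
                "label": 1 if str(row["hazardous"]).strip().lower() == "true" else 0,
            })
        except (KeyError, ValueError, TypeError):
            continue  # skip incomplete or non-numeric rows
    return cleaned


def add_features(records):
    """Add a mean-diameter feature derived from the min/max estimates."""
    for r in records:
        r["diameter_mean"] = (r["diameter_min"] + r["diameter_max"]) / 2
    return records


def stratified_split(records, test_fraction=0.2, seed=0):
    """Split records so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in records if r["label"] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Made-up rows for illustration; the second one is malformed and gets dropped.
raw_rows = [
    {"est_diameter_min": "0.1", "est_diameter_max": "0.3",
     "relative_velocity": "48000", "miss_distance": "5.4e7", "hazardous": "False"},
    {"est_diameter_min": "", "est_diameter_max": "0.9",
     "relative_velocity": "71000", "miss_distance": "1.2e7", "hazardous": "True"},
]
records = add_features(preprocess(raw_rows))
print(len(records), "usable record(s)")
```

Splitting per class matters here because a purely random split of a small, imbalanced dataset can leave very few hazardous examples in the test set.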

Model Training

The model is trained using the XGBoost algorithm. The training process includes:

Loading the dataset into a SageMaker-compatible format.
Defining the XGBoost estimator with appropriate hyperparameters.
Fitting the model on the training data.
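
The three steps above might look roughly like the following sketch. It assumes version 2 of the SageMaker Python SDK and the built-in XGBoost container, which for CSV training data expects no header row and the label in the first column. The feature names, hyperparameter values, instance type, and the `train` helper are illustrative assumptions, not the settings used by this project.

```python
import csv
import io

# Hypothetical feature names; the real pipeline may use different ones.
FEATURES = ["diameter_mean", "velocity", "miss_distance"]

# Illustrative starting values, not tuned results.
HYPERPARAMETERS = {
    "objective": "binary:logistic",  # outputs a probability of "hazardous"
    "eval_metric": "auc",
    "num_round": 200,
    "max_depth": 5,
    "eta": 0.1,
    "subsample": 0.8,
}


def to_sagemaker_csv(records, features=FEATURES):
    """Serialize records as header-less CSV with the label in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for r in records:
        writer.writerow([r["label"]] + [r[f] for f in features])
    return buffer.getvalue()


def train(role_arn, train_s3_uri, validation_s3_uri, output_s3_uri):
    """Fit a built-in XGBoost estimator; requires AWS credentials and the SDK."""
    # Imported lazily so the helpers above work without the SDK installed.
    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    image_uri = sagemaker.image_uris.retrieve(
        "xgboost", session.boto_region_name, version="1.7-1"
    )
    estimator = Estimator(
        image_uri=image_uri,
        role=role_arn,
        instance_count=1,
        instance_type="ml.m5.xlarge",  # assumed instance type
        output_path=output_s3_uri,
        sagemaker_session=session,
        hyperparameters=HYPERPARAMETERS,
    )
    estimator.fit({
        "train": TrainingInput(train_s3_uri, content_type="text/csv"),
        "validation": TrainingInput(validation_s3_uri, content_type="text/csv"),
    })
    return estimator


print(to_sagemaker_csv([
    {"label": 1, "diameter_mean": 0.5, "velocity": 100.0, "miss_distance": 2.0},
]), end="")
```

The CSV text would be uploaded to S3 before calling `train`. The returned estimator could then be deployed to an endpoint with its `deploy` method, which corresponds to the deployment step of the pipeline.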

Evaluation

The model's performance is evaluated using metrics such as accuracy, precision, recall, and F1 score. Confusion matrices and ROC curves are also generated to provide a detailed analysis of the model's performance.
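
To make the metric definitions concrete, here is a small standard-library sketch that derives them from confusion-matrix counts. It assumes the model returns a probability per asteroid, thresholded at 0.5; the labels and probabilities are made-up values for illustration, not results from this project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means hazardous."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For hazard detection, recall on the hazardous class is often the metric to watch, since a missed hazardous asteroid (a false negative) is usually costlier than a false alarm. Lowering the threshold trades precision for recall.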

Results

The results of the model, including performance metrics and visualizations, are documented in this section. The model's predictions are compared against the actual labels to determine its effectiveness in classifying hazardous asteroids.

Contributing

Contributions to this project are welcome. If you have suggestions for improvements or new features, please submit a pull request or open an issue.
