Uses ML to predict if patients are at-risk for Diabetes based on Health Indicator data

joseph-curtis/data-science-diabetes-classifier

At-Risk Diabetes Classifier

This project uses machine learning to identify individuals at risk of diabetes from their health indicators. Using a dataset of health measures and demographic information, we train a supervised machine learning model to predict the likelihood that a person has diabetes.

Documentation

See the project wiki or the Computer Science Capstone write-up.

Project Overview

The At-Risk Diabetes Classifier predicts diabetes status from a dataset of health indicators, including BMI, age, smoking status, and physical activity. Our approach involves preprocessing the data, selecting relevant features, splitting the dataset into training and testing sets, and then training and evaluating a machine learning model.
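
The splitting step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code: the function name `stratified_split`, the 20% test fraction, and the toy records are all assumptions made for the example. It splits each class separately so that the share of positive cases stays similar in the training and testing sets.

```python
import random
from collections import defaultdict


def stratified_split(rows, labels, test_fraction=0.2, seed=42):
    """Split (row, label) pairs into train/test lists, class by class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group row indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train = [(rows[i], labels[i]) for i in sorted(train_idx)]
    test = [(rows[i], labels[i]) for i in sorted(test_idx)]
    return train, test


# Hypothetical toy records: (BMI, age, smoker flag); label 1 = diabetes.
rows = [(22.0 + i % 15, 30 + i % 40, i % 2) for i in range(100)]
labels = [1 if i % 5 == 0 else 0 for i in range(100)]

train, test = stratified_split(rows, labels)
print(len(train), len(test))
```

In practice, a library routine such as scikit-learn's `train_test_split` with its `stratify` argument does the same job.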

Key Features

  • Data Preprocessing: Cleaning and preparing data for modeling, including handling missing values and encoding categorical variables.
  • Feature Engineering: Selecting existing features and, where useful, creating new ones to improve model performance.
  • Model Selection: Evaluating several machine learning algorithms to identify the most effective model.
  • Evaluation: Assessing the model's performance using metrics such as accuracy, precision, recall, and F1 score.
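
The four evaluation metrics named above can be computed from the counts of true and false positives and negatives. The sketch below is illustrative only: the function name `classification_metrics` and the example labels are assumptions, and label 1 is assumed to mean "has diabetes".

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical true labels and predictions for eight patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here precision is the share of predicted positives that are correct, and recall is the share of actual positives that the model finds; F1 is their harmonic mean.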

Model

We experiment with various machine learning models, including Logistic Regression and Random Forest, to identify the most suitable model based on performance metrics.
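
One way to structure such a comparison is to fit every candidate on the same training data and score it on the same test data. The sketch below is a hedged illustration, not the project's code: the two baseline classifiers, the BMI cutoff of 30, the toy data, and the choice of recall as the deciding metric are all assumptions. The baselines stand in for real models so that the example runs with the standard library alone.

```python
class MajorityClassifier:
    """Baseline that always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class BmiThresholdClassifier:
    """Hypothetical rule: predict diabetes when BMI (column 0) exceeds a cutoff."""

    def __init__(self, cutoff=30.0):
        self.cutoff = cutoff

    def fit(self, X, y):
        return self  # a fixed rule has nothing to learn

    def predict(self, X):
        return [1 if row[0] > self.cutoff else 0 for row in X]


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return tp / actual_pos if actual_pos else 0.0


def compare_models(candidates, X_train, y_train, X_test, y_test, score):
    """Fit each candidate, score it on the test set, and return the best name."""
    results = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        results[name] = score(y_test, model.predict(X_test))
    best = max(results, key=results.get)
    return best, results


# Hypothetical single-feature data: each row holds one BMI value.
X_train = [(24.0,), (33.5,), (27.1,), (36.2,), (22.8,), (31.0,)]
y_train = [0, 1, 0, 1, 0, 0]
X_test = [(35.0,), (23.0,), (29.0,), (38.4,)]
y_test = [1, 0, 1, 0]

candidates = {
    "majority": MajorityClassifier(),
    "bmi_threshold": BmiThresholdClassifier(),
}
best, results = compare_models(candidates, X_train, y_train, X_test, y_test, recall)
print(best, results)
```

Because `compare_models` relies only on `fit` and `predict`, scikit-learn estimators such as `LogisticRegression` and `RandomForestClassifier` should be usable in the same dictionary.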

Usage

Options to view and interact with the notebook:

  1. Run this inside Google's Colaboratory environment: Open In Colaboratory

  2. Run this project using Binder: Launch Binder

  3. Visit the Kaggle project page (static output only): Open in Kaggle

  4. Download this repo and run it on your local machine.

Contributing

Contributions are always welcome! Please feel free to submit pull requests or open issues to discuss improvements or additions to the project.

License

This project is open-sourced under the Apache-2.0 License.
