This project uses machine learning to identify individuals at risk of diabetes from their health indicators. Using a dataset of health measures and demographic information, we train a supervised classifier to predict the likelihood that a person has diabetes.
See the project wiki or the Computer Science Capstone write-up for details.
The At-Risk Diabetes Classifier uses a dataset of health indicators, including BMI, age, smoking status, and physical activity, to predict diabetes status. Our approach is to preprocess the data, select relevant features, split the dataset into training and testing sets, and then train and evaluate a machine learning model.
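As a rough illustration of the preprocessing and splitting steps, here is a minimal sketch using pandas and scikit-learn. The toy rows and the column names (`bmi`, `age`, `smoker`, `diabetes`) are assumptions made for the example and may not match the project's actual dataset or notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the real dataset's columns and values may differ.
df = pd.DataFrame({
    "bmi": [22.5, 31.0, np.nan, 27.4, 35.2, 24.1, 29.8, 33.3],
    "age": [34, 58, 45, 62, 51, 29, 47, 66],
    "smoker": ["no", "yes", "no", "yes", "no", "no", "yes", "yes"],
    "diabetes": [0, 1, 0, 1, 1, 0, 0, 1],
})

X = df.drop(columns="diabetes")
y = df["diabetes"]

# Split first so the preprocessing statistics come from training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Numeric columns: fill missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode; sparse_threshold=0 forces a dense array.
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["bmi", "age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["smoker"]),
    ],
    sparse_threshold=0,
)

X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)
```

Fitting the transformer on the training split and only applying it to the test split keeps information from the test set out of the imputation and scaling statistics.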
- Data Preprocessing: Cleaning and preparing data for modeling, including handling missing values and encoding categorical variables.
- Feature Engineering: Selecting relevant features and, where useful, creating new ones to improve model performance.
- Model Selection: Evaluating several machine learning algorithms to identify the most effective model.
- Evaluation: Assessing the model's performance using metrics such as accuracy, precision, recall, and F1 score.
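To make the evaluation metrics concrete, the sketch below computes accuracy, precision, recall, and F1 score from binary labels in plain Python. The function name `classification_metrics` and the sample labels are hypothetical and chosen only for illustration. In practice, scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` compute the same quantities.

```python
def classification_metrics(y_true, y_pred):
    """Compute binary classification metrics, treating 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = has diabetes, 0 = does not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

For a screening task like this one, recall is often worth watching closely, since a false negative means an at-risk person goes unflagged.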
We compare several machine learning models, including Logistic Regression and Random Forest, and select the one that performs best on the evaluation metrics.
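A comparison of this kind might look like the following sketch. It runs on synthetic data from scikit-learn's `make_classification` rather than the project's dataset, and the hyperparameters (`max_iter`, `n_estimators`, the class weights) are assumptions for the example, not the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the health-indicator dataset (assumption).
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=5,
    weights=[0.8, 0.2],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate models; hyperparameters here are illustrative only.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Scoring every candidate on the same held-out split keeps the comparison fair. Cross-validation would give more stable estimates at the cost of extra training time.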
Options to view and interact with the notebook:
Contributions are always welcome! Please feel free to submit pull requests or open issues to discuss improvements or additions to the project.
This project is open-sourced under the Apache-2.0 License.