Skip to content

MariiaAleksandrovych/ML-Hospital-Readmissions-Pred

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 

Repository files navigation

Machine Learning Multi-Modal Modeling Project

Mariia Aleksandrovych

Introduction

This GitHub repository is dedicated to a machine learning project that focuses on multi-modal modeling using various data types, including text, categorical, and numerical data. The primary objective of this project is to predict the "readmitted" status of patients using the features provided in the "8k_diabetes_v2.csv" dataset.

Project Structure

The project is divided into several parts, as outlined in the project description:

Part A: Model Code and Exploration

  1. Exploratory Data Analysis (EDA): In this part, we analyze the dataset to understand its characteristics, identify patterns, and gain insights into the data. EDA plays a crucial role in shaping our approach to modeling.

  2. Pre-processing Categorical Data: Categorical data often require encoding and transformations before they can be used in machine learning models. This section explains how we pre-process categorical data and the justification behind the chosen methods.

  3. Pre-processing Numerical Data: Handling missing data and other preprocessing steps for numerical features is discussed in this section. Different algorithms may require distinct treatment, and the reasons for these choices are explained.

  4. Text Data Modeling with tf-idf: We implement a model to make predictions using text data, particularly the "diag_desc_combined" field, by employing the Term Frequency-Inverse Document Frequency (tf-idf) method.

  5. Model Stacking: This section covers the incorporation of tf-idf predictions for the text field into downstream algorithms, utilizing model stacking techniques.

  6. Experimentation with Multiple Algorithms: We explore multiple modeling algorithms, each chosen with specific reasons, and conduct experiments to assess their performance.

  7. Final Model Selection: The final choice of the model is discussed, along with an analysis of its strengths and weaknesses. We also address where the model may not perform well and potential areas for improvement.

Getting Started

To get started with this project, follow these steps:

  1. Clone this repository to your local machine.
  2. Download the dataset "8k_diabetes_v2.csv" from the provided source and place it in the project directory.
  3. Refer to the Jupyter notebooks or Python scripts in this repository to explore the code and models.
  4. Follow the instructions within each notebook or script to run the code and perform your own analyses.

Dependencies

Make sure you have the following Python libraries installed to run the code in this project:

  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • Scikit-learn
  • NLTK
  • TensorFlow
  • XGBoost (or other preferred machine learning libraries)

Contributions

If you'd like to contribute to this project, feel free to fork the repository and submit pull requests. We welcome improvements, bug fixes, or additional features.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published