This repository contains a machine learning project focused on multi-modal modeling with text, categorical, and numerical data. The objective is to predict the "readmitted" status of patients from the features in the "8k_diabetes_v2.csv" dataset.
The project is divided into several parts, as outlined in the project description:
- Exploratory Data Analysis (EDA): In this part, we analyze the dataset to understand its characteristics, identify patterns, and gain insights into the data. EDA plays a crucial role in shaping our approach to modeling.
- Pre-processing Categorical Data: Categorical features require encoding and other transformations before they can be used in machine learning models. This section explains how we pre-process the categorical data and the justification for the chosen methods; a combined preprocessing sketch for the categorical and numerical features follows this list.
- Pre-processing Numerical Data: This section discusses handling missing values and other preprocessing steps for the numerical features. Different algorithms may require different treatment, and the reasons behind these choices are explained (see the preprocessing sketch after this list).
- Text Data Modeling with tf-idf: We build a model that makes predictions from the text data, specifically the "diag_desc_combined" field, using Term Frequency-Inverse Document Frequency (tf-idf) features; a baseline sketch appears after this list.
- Model Stacking: This section covers feeding the tf-idf predictions for the text field into downstream algorithms via model stacking, as sketched below.
- Experimentation with Multiple Algorithms: We explore several modeling algorithms, each chosen for specific reasons, and run experiments to compare their performance; a comparison loop is sketched below.
- Final Model Selection: The final choice of the model is discussed, along with an analysis of its strengths and weaknesses. We also address where the model may not perform well and potential areas for improvement.
To get started with this project, follow these steps:
- Clone this repository to your local machine.
- Download the dataset "8k_diabetes_v2.csv" from the provided source and place it in the project directory.
- Refer to the Jupyter notebooks or Python scripts in this repository to explore the code and models.
- Follow the instructions within each notebook or script to run the code and perform your own analyses.
Make sure you have the following Python libraries installed to run the code in this project:
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
- NLTK
- TensorFlow
- XGBoost (or other preferred machine learning libraries)
If you'd like to contribute to this project, feel free to fork the repository and submit pull requests. We welcome improvements, bug fixes, or additional features.