I cleaned and explored a Kaggle stroke dataset to apply machine learning techniques (random forests and cross validation) to create a model to predict strokes using a patient’s past medical history data. An article was written explaining findings including likelihood of stroke and contributing factors. Spoiler alert: this model predicted a stroke with 81% precision and recall.
- Pandas
- Python
I want to work on deploying this to a web app where users could enter their health data and predict the likelyhood of a stroke. A more robust dataset would be useful as well.