The Tanzanian government has worked with for-profit and non-profit organizations to build water pumps across Tanzania that provide its residents with potable water. These pumps need ongoing maintenance to keep operating, and before repair teams can be dispatched, the government needs to know which pumps require repair or replacement. Sending teams to inspect every water pump to find the non-functional ones is expensive and time-consuming. The purpose of this analysis is to use data from the Taarifa waterpoints dashboard to build a machine learning model that predicts which pumps are non-functional, saving the Tanzanian government and its partner organizations time and money.
The data comes from a private Kaggle competition that BloomTech held for its DS36 Data Science cohort; it mirrors the data used in the community Kaggle competition. Per Kaggle and BloomTech guidelines, the data files are not included in this repository.
Water_Pump_Classifier.ipynb is the most recent iteration of the project. It works through the project from start to finish using the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology.
Pump_Classifier.py is a deployment file for the model created in Water_Pump_Classifier.ipynb.
feature_descriptions.json contains the feature descriptions from Kaggle.
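To look up what a column means while exploring the data, the descriptions can be loaded with Python's standard `json` module. The sketch below assumes (this is an assumption, not confirmed by the repository) that the file holds a single JSON object mapping each feature name to a description string; the helper names `load_feature_descriptions` and `describe` are hypothetical, and the sample entries in the demo are illustrative only.

```python
import json
from pathlib import Path


def load_feature_descriptions(path):
    """Load feature descriptions from a JSON file.

    Assumes (hypothetically) that the file is one JSON object mapping
    feature names to description strings.
    """
    with open(path, encoding="utf-8") as f:
        descriptions = json.load(f)
    if not isinstance(descriptions, dict):
        raise ValueError("expected a JSON object mapping feature names to descriptions")
    return descriptions


def describe(descriptions, feature):
    """Return the description for a feature, or a placeholder if it is missing."""
    return descriptions.get(feature, "(no description available)")


if __name__ == "__main__":
    import tempfile

    # Demo with a small stand-in file; the real file's layout may differ.
    sample = {
        "funder": "Who funded the well",
        "installer": "Organization that installed the well",
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "feature_descriptions.json"
        path.write_text(json.dumps(sample), encoding="utf-8")
        descriptions = load_feature_descriptions(path)
    print(describe(descriptions, "funder"))  # prints: Who funded the well
```

If the real file turns out to be structured differently (for example, a list of records), only `load_feature_descriptions` would need to change.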
requirements.txt lists the packages required to run the files in this repository locally.
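The usual way to install these packages is `pip install -r requirements.txt`. To check afterwards which listed packages are still missing, a minimal sketch using only the standard library (`re` and `importlib.metadata`) might look like the following. The parser is deliberately simplified and is an assumption about the file's contents: it keeps only the leading package name on each line, skips comments and pip options such as `-r`, and does not handle URL or VCS requirements. The function names are hypothetical.

```python
import re
from importlib import metadata

# Leading distribution name: a letter or digit, then letters, digits, '.', '_' or '-'.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def parse_requirement_names(lines):
    """Extract bare package names from requirements.txt-style lines (simplified)."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop inline comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options like -r
            continue
        match = _NAME_PATTERN.match(line)  # stops at version specifiers, extras, markers
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names whose distributions are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    from pathlib import Path

    req = Path("requirements.txt")
    if req.exists():
        names = parse_requirement_names(req.read_text(encoding="utf-8").splitlines())
        missing = missing_packages(names)
        print("Missing:", ", ".join(missing) if missing else "none")
    else:
        print("requirements.txt not found in the current directory")
```

This only reports absent packages; it does not check that installed versions satisfy the pinned specifiers, which pip itself does during installation.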
DS36_Unit_2_Kaggle_Project_Notebook.ipynb contains the analysis done during the Kaggle competition. Because the notebook was built on the fly, its structure is difficult for readers to follow.
Tanzania_Water_Pump_Analysis.ipynb is the first attempt at presenting the work from the Kaggle project notebook as a clear narrative. This file is incomplete.