This repository contains a wealth of data cleaning methodologies.
Are you looking to improve your data cleaning skills? This is a project designed to help you master Data Cleaning.
According to a poll, data science professionals say they spend 80% of their time on data cleaning. There is no one-size-fits-all approach to cleaning; however, practicing with as many datasets as you can find sets you in the right direction.
Of course, data comes in different formats. If you practice with as many of them as possible, however, you expose yourself to a wide range of manipulation techniques. Learning as many of the available techniques as you can prepares you to deal with any dataset.
Some of the datasets are Excel sheets containing both the cleaned version and the dirty version. The scripts to clean the data are available in Python and R. If you want to clean the datasets using other languages, feel free to do so.
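To give a feel for what a cleaning script can look like, here is a minimal sketch in Python. It is not taken from any script in this repository: the sample data, the column names, and the `clean_rows` function are all made up for illustration, and it uses only the standard library so it runs without extra installs. It shows three common steps: trimming whitespace, normalizing text case, and dropping duplicate rows, while keeping missing values explicit.

```python
import csv
import io

# Hypothetical "dirty" data, invented for this illustration only.
DIRTY_CSV = """name,age,city
 Alice ,30,lagos
BOB,,Accra
 Alice ,30,lagos
carol,27, nairobi
"""


def clean_rows(text):
    """Return cleaned rows: trimmed, title-cased text, integer ages, no duplicates."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        # Keep a missing age as None instead of guessing a value.
        age = int(age_text) if age_text else None
        key = (name, age, city)
        if key in seen:
            # Skip rows that are exact duplicates after normalization.
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age, "city": city})
    return cleaned


if __name__ == "__main__":
    for row in clean_rows(DIRTY_CSV):
        print(row)
```

Real datasets usually need more than this (type checks, date parsing, outlier handling), and a library such as pandas may be more convenient for larger files; treat the sketch as a starting point only.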
We aim to populate this repository with as many cleaning projects as possible. If you have datasets you have previously cleaned, you're welcome to send a pull request. If you decide to include a script as a guide, make sure the code works and is well documented. Send a PR containing the dataset and the script, and it will be merged once it has been properly reviewed. You will be added to the contributors list once your PR has been merged.
- Make sure your dataset is in a folder, alongside a demo script, image, or description that shows what the cleaned data looks like.
- Edit the contributors list in the README to include your social media or LinkedIn account in case anyone wants to reach you (optional).