
A repository for data science enthusiasts who want to hone their data cleaning skills


simmieyungie/Master-Data-Cleaning


Data-Cleaning

This repository collects a wide range of data cleaning methods.

Overview

Are you looking to improve your data cleaning skills? This project is designed to help you master data cleaning.

According to one poll, data science professionals say they spend about 80% of their time on data cleaning. There is no one-size-fits-all approach to cleaning; however, practicing with as many datasets as you can find sets you off in the right direction.

Of course, data comes in many different formats. If you practice with as many of them as possible, you expose yourself to a wide range of manipulation techniques, and learning those techniques prepares you to deal with almost any dataset.

Datasets and Scripts

Some of the datasets are Excel workbooks containing both the dirty version and the cleaned version. The scripts that clean the data are available in Python and R. If you want to clean the datasets using other languages, feel free to do so.
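To make the dirty/cleaned pairing concrete, here is a minimal sketch of the kind of script a dataset folder might contain. It is not taken from any script in this repository: the column names, sample values, missing-value markers, and the `clean_rows` helper are all hypothetical, and it reads an inline CSV string with the standard library instead of an Excel workbook so it runs on its own. It trims stray whitespace, normalises casing, converts placeholder values such as `n/a` to `None`, coerces the age column to integers, drops exact duplicates, and then checks the result against an expected "clean" version.

```python
import csv
import io

# Hypothetical dirty sample; a real dataset here would come from the "dirty" sheet.
DIRTY_CSV = """name,age,city
  Alice ,29,lagos
BOB,n/a,Abuja
  Alice ,29,lagos
carol, 41 ,  ABUJA
"""

# Hypothetical "clean" reference, standing in for the cleaned sheet.
EXPECTED = [
    {"name": "Alice", "age": 29, "city": "Lagos"},
    {"name": "Bob", "age": None, "city": "Abuja"},
    {"name": "Carol", "age": 41, "city": "Abuja"},
]

# Strings treated as missing values (an assumption for this sample).
MISSING = {"", "n/a", "na", "null", "none"}


def clean_rows(text):
    """Trim whitespace, normalise casing, coerce age to int, drop exact duplicates."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    cleaned = []
    for row in reader:
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        raw_age = row["age"].strip().lower()
        age = None if raw_age in MISSING else int(raw_age)
        key = (name, age, city)
        if key in seen:  # skip rows identical to one already kept
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age, "city": city})
    return cleaned


if __name__ == "__main__":
    result = clean_rows(DIRTY_CSV)
    print("matches clean version:", result == EXPECTED)
```

With real workbooks, pandas' `read_excel(path, sheet_name=None)` returns every sheet as a dictionary of DataFrames, so the same steps can be run on the dirty sheet and the output compared with the cleaned sheet.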

Pull Request

We aim to populate this repository with as many cleaning projects as possible. If you have datasets you have previously cleaned, you're welcome to send a pull request, but make sure the code works and is well documented (in case you decide to leave a script as a guide). Submit the dataset and the script together in one PR; it will be merged once it has been properly reviewed. You will be added to the contributors list once your PR has been merged.

Contributors List

- Foresight BI

Contributor Submission Guide

  • Make sure your dataset is in its own folder, alongside a demo script, image, or description that shows what the cleaned data looks like.
  • Edit the Contributors List in the README to include your social media or LinkedIn account in case anyone wants to reach you (optional).

https://github.com/erykml/medium_articles
