Training material and references for Data Science and analytics with Python
This repository contains training material in the form of references, example notebooks, and some challenging exercises. The exercises cover machine learning basics such as linear and logistic regression models, as well as classification through natural language processing.
The example notebooks cover a range of topics:
- Supervised and unsupervised learning
- Time series forecasting
- Code snippets for basic clustering, correlation, A/B testing, heatmaps, and more
- Weather classification through decision trees
- Applying machine learning to a diabetes use case
- and more!
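As a taste of the basics mentioned above, simple linear regression can be sketched in a few lines of plain Python. This is only an illustrative sketch and is not taken from the notebooks; the function name `fit_line` and the toy data are made up for the example, and it assumes a single input feature.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data that lies exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

In practice a library such as scikit-learn would handle this, along with multiple features; the hand-written version is only meant to show what the fit computes.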
If you are new to machine learning, or to machine learning with Python, we recommend the learn machine learning course. It covers data science, statistics, and math, all explained using Python.
If you have some basic Python skills and know a bit of statistics, then you can jump into exercise 1 right away and follow the tutorial that comes with it.
While creating this repository, we came across many tutorials that explain things in a clear and methodical way.
This repository would not exist without the work of many great minds. In particular, we would like to acknowledge:
- Kaggle for being awesome with their contests and varied data sets
- DrivenData for also having very cool competitions
- Towards Data Science for having great blogs and tutorials, and for not offering a straight copy-paste way of learning
- EliteDataScience for providing fundamental data science knowledge in a very understandable way
- Data Science and Analytics with Python by Jesus Rogel-Salazar, an incredible book that we recommend to any budding data scientist
This repository is solely meant for training purposes. We tried to include links to the original sources where possible. People are free to fork this repository and add more exercises, links, tutorials, and examples.