A collection of small projects in the field of data science. Each project is independent and serves mostly to demonstrate a concept and help me understand it better. This repository is mostly for keeping track of my studies, but hopefully it can be useful to someone else too. :)
I recently finished Stanford's Machine Learning course by Andrew Ng on Coursera. Sadly, sharing the programming assignments is against the "Honor Code", which I do respect. Therefore I will not upload them here - if you are looking for copy-paste material, this is not the place. To study the topics from the course further, I plan to implement the concepts in Python. I won't translate the Octave code into Python; instead, I'll try to improve the prediction models and maybe find more interesting datasets.
Besides the Coursera-inspired mini-projects, I plan to add my notes and homework for the Sofia University FMI course on machine learning with Python, as well as some big data processing programs in Spark + Scala (some of them related to Advanced Analytics with Spark).
Last but not least, I'll start my own data-related projects here and keep them here until they prove worthy of their own repository.
I hope this repository can help someone else on their learning path. Either way, if you have feedback, please tweet me @mightypixel or open a PR on GitHub.
- Fundamentals:
  - Linear regression
  - Logistic regression
    - [Exam Scores](/Fundamentals/logistic_regression/Logistic\ Regression.ipynb)
  - Multi-class classification with neural network
  - SVM
  - K-Means clustering and PCA
  - Anomaly Detection
  - Recommendation systems
- Dataset exploration:
  - Titanic
  - Exoplanet Hunting in Deep Space (TODO: https://www.kaggle.com/keplersmachines/kepler-labelled-time-series-data)
  - Uber Drives (TODO: https://www.kaggle.com/zusmani/uberdrives)
- Big data with Spark