This repository contains implementations of several machine learning models for large datasets. They are briefly summarized below:
- NaiveBayes MapReduce for Large Data - Implements training a Naive Bayes classifier for document classification
- GuineaPIG Document Classification for Large Data - Uses GuineaPIG, a MapReduce abstraction, to train a Naive Bayes document classifier
- Multinomial Regression for Large Data - Implements the lazy sparse SGD method for classifying a very large set of documents
- Automatic Differentiation (MLP) - Implements automatic differentiation (autograd) using Wengert lists for a multilayer perceptron
- Automatic Differentiation (LSTM) - Implements automatic differentiation (autograd) using Wengert lists for an LSTM
- LDA for Large Data - Implements LDA using Gibbs sampling and trains on large datasets using the JBosen parameter server
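
For readers unfamiliar with the Wengert lists mentioned in the two Automatic Differentiation entries, the idea can be sketched as follows. This is a minimal illustration and not the code in this repository: the tape format, the op names (`add`, `mul`, `tanh`), and the helpers `run_forward` and `run_backward` are all hypothetical. A Wengert list records each intermediate assignment in order, so gradients can be obtained by replaying the list in reverse.

```python
import math

# Hypothetical minimal op set: forward functions keyed by op name.
FORWARD = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "tanh": lambda a: math.tanh(a),
}

# For each op, the partial derivatives of the output w.r.t. each input,
# given the output value followed by the input values.
BACKWARD = {
    "add": lambda out, a, b: (1.0, 1.0),
    "mul": lambda out, a, b: (b, a),
    "tanh": lambda out, a: (1.0 - out * out,),
}


def run_forward(tape, inputs):
    """Evaluate the Wengert list in order; return every named value."""
    values = dict(inputs)
    for out, op, args in tape:
        values[out] = FORWARD[op](*(values[a] for a in args))
    return values


def run_backward(tape, values, target):
    """Walk the list in reverse, accumulating d(target)/d(name)."""
    grads = {name: 0.0 for name in values}
    grads[target] = 1.0
    for out, op, args in reversed(tape):
        partials = BACKWARD[op](values[out], *(values[a] for a in args))
        for arg, partial in zip(args, partials):
            grads[arg] += grads[out] * partial
    return grads


# Wengert list for y = tanh(w * x + b): (output, op, inputs) per line.
tape = [
    ("z1", "mul", ("w", "x")),
    ("z2", "add", ("z1", "b")),
    ("y", "tanh", ("z2",)),
]

values = run_forward(tape, {"w": 0.5, "x": 2.0, "b": -1.0})
grads = run_backward(tape, values, "y")
print(values["y"], grads["w"], grads["x"], grads["b"])
```

In this sketch each operation is a scalar function; an MLP or LSTM implementation would presumably use the same forward and reverse passes with array-valued operations.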