Machine Learning for Large Datasets

This repository contains implementations of several machine learning models for large datasets. They are briefly summarized below:

Naive Bayes MapReduce for Large Data - Trains a Naive Bayes classifier for document classification using Hadoop MapReduce
GuineaPIG Document Classification for Large Data - Uses the MapReduce abstraction GuineaPIG to train a Naive Bayes document classifier
Multinomial Regression for Large Data - Implements the lazy sparse SGD method for classifying a very large set of documents
Automatic Differentiation (MLP) - Implements automatic differentiation (autograd) using Wengert lists for a multilayer perceptron
Automatic Differentiation (LSTM) - Implements automatic differentiation (autograd) using Wengert lists for an LSTM
LDA for Large Data - Implements latent Dirichlet allocation (LDA) using Gibbs sampling and trains on large datasets using the JBosen parameter server
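
For readers unfamiliar with Wengert lists, here is a minimal sketch of the idea behind the two automatic differentiation projects. It is not the code from this repository: the `OPS` table, the `forward` and `backward` helpers, and the scalar example are all hypothetical and chosen only for illustration. A Wengert list records a computation as a sequence of single-operation assignments, so gradients can be obtained by replaying that sequence in reverse and applying the chain rule.

```python
import math

# Hypothetical primitive table: name -> (forward function, partial derivatives
# with respect to each argument). Not taken from the repository.
OPS = {
    "add": (lambda a, b: a + b, [lambda a, b: 1.0, lambda a, b: 1.0]),
    "mul": (lambda a, b: a * b, [lambda a, b: b, lambda a, b: a]),
    "tanh": (lambda a: math.tanh(a), [lambda a: 1.0 - math.tanh(a) ** 2]),
}


def forward(wengert_list, inputs):
    """Evaluate each (output, op, args) step in order, keeping every intermediate value."""
    values = dict(inputs)
    for out, op, args in wengert_list:
        fn, _ = OPS[op]
        values[out] = fn(*(values[a] for a in args))
    return values


def backward(wengert_list, values, target):
    """Walk the list in reverse, accumulating d(target)/d(variable) via the chain rule."""
    grads = {name: 0.0 for name in values}
    grads[target] = 1.0
    for out, op, args in reversed(wengert_list):
        _, partials = OPS[op]
        arg_values = [values[a] for a in args]
        for arg, partial in zip(args, partials):
            grads[arg] += grads[out] * partial(*arg_values)
    return grads


# Wengert list for y = tanh(w * x + b), a single scalar "neuron".
steps = [
    ("z1", "mul", ["w", "x"]),
    ("z2", "add", ["z1", "b"]),
    ("y", "tanh", ["z2"]),
]

values = forward(steps, {"w": 0.5, "x": 2.0, "b": -1.0})
grads = backward(steps, values, "y")
print(values["y"], grads["w"], grads["x"], grads["b"])
```

The same two passes extend to an MLP or LSTM by replacing the scalar primitives with matrix operations; the list structure and the reverse traversal stay the same.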

