
samridhishree/Machine-Learning-for-Large-Datasets


Machine Learning for Large Datasets

This repository contains implementations of several machine learning models for large datasets. They are briefly summarized below:

  1. NaiveBayes MapReduce for Large Data - Trains a Naive Bayes classifier for document classification using MapReduce.
  2. GuineaPIG Document Classification for Large Data - Uses GuineaPIG, a MapReduce abstraction, to train a Naive Bayes document classifier.
  3. Multinomial Regression for Large Data - Implements the lazy sparse SGD method for classifying a very large set of documents.
  4. Automatic Differentiation (MLP) - Implements automatic differentiation (autograd) using Wengert lists for a multilayer perceptron.
  5. Automatic Differentiation (LSTM) - Implements automatic differentiation (autograd) using Wengert lists for an LSTM.
  6. LDA for Large Data - Implements LDA using Gibbs sampling and trains on large datasets using the JBosen parameter server.
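Naive Bayes training for document classification reduces to counting, which is why it maps naturally onto MapReduce. The sketch below is a minimal, in-memory illustration of that idea, not the code in this repository: the function names `map_phase`, `reduce_phase` and `classify`, the key layout `("Y", label)` / `("Y,W", label, word)`, and the Laplace smoothing parameter `alpha` are all hypothetical choices made for this example. Sorting and grouping the emitted pairs stands in for the shuffle step a real MapReduce job would perform across machines.

```python
import math
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_phase(docs):
    """Map step (hypothetical): emit (key, 1) per class label and per (label, word)."""
    for label, text in docs:
        yield ("Y", label), 1
        for word in text.split():
            yield ("Y,W", label, word), 1


def reduce_phase(pairs):
    """Shuffle + reduce stand-in: sort pairs by key, then sum counts per key."""
    return {
        key: sum(v for _, v in group)
        for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))
    }


def classify(counts, text, alpha=1.0):
    """Pick the label with the highest Laplace-smoothed log posterior."""
    labels = [k[1] for k in counts if k[0] == "Y"]
    vocab = {k[2] for k in counts if k[0] == "Y,W"}
    total_docs = sum(counts[("Y", y)] for y in labels)
    words_per_label = defaultdict(int)
    for k, v in counts.items():
        if k[0] == "Y,W":
            words_per_label[k[1]] += v
    best, best_score = None, -math.inf
    for y in labels:
        score = math.log(counts[("Y", y)] / total_docs)  # log prior
        for w in text.split():
            c = counts.get(("Y,W", y, w), 0)
            score += math.log((c + alpha) / (words_per_label[y] + alpha * len(vocab)))
        if score > best_score:
            best, best_score = y, score
    return best


# Toy corpus of (label, text) pairs, made up for illustration.
docs = [
    ("sports", "ball game team"),
    ("sports", "team win"),
    ("tech", "code compiler bug"),
    ("tech", "code review"),
]
counts = reduce_phase(map_phase(docs))
print(classify(counts, "team game"))
```

Because the model is nothing more than the count table, the same counts could be produced by any number of mappers and merged by summation, which is what makes the approach scale.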
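The key trick behind lazy sparse SGD is that L2 regularization shrinks every weight at every step, but a sparse document only touches a few features. Instead of decaying all weights each step, one can record when each weight was last updated and apply all the skipped decays at once the next time that feature appears. The sketch below shows this for a single binary logistic regression model; it is an illustrative assumption, not the repository's implementation. The class name `LazySparseLR`, the learning rate `lr`, and the regularization strength `mu` are hypothetical, and a multinomial or one-vs-rest setup would keep one such weight table per class.

```python
import math
from collections import defaultdict


class LazySparseLR:
    """Binary logistic regression trained by SGD with lazy L2 decay (sketch)."""

    def __init__(self, lr=0.5, mu=0.1):
        self.lr, self.mu = lr, mu
        self.w = defaultdict(float)   # weight per feature, created on first use
        self.last = defaultdict(int)  # step at which each weight was last decayed
        self.k = 0                    # number of training examples seen so far

    def _catch_up(self, j):
        # Apply the (k - last[j]) skipped decay factors (1 - 2*lr*mu) in one shot.
        skipped = self.k - self.last[j]
        if skipped:
            self.w[j] *= (1.0 - 2.0 * self.lr * self.mu) ** skipped
        self.last[j] = self.k

    def predict_proba(self, features):
        z = sum(self.w[j] * x for j, x in features.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep math.exp from overflowing
        return 1.0 / (1.0 + math.exp(-z))

    def train_step(self, features, y):
        """One SGD step on a sparse example {feature: value} with label y in {0, 1}."""
        self.k += 1
        for j in features:            # only the active features are touched
            self._catch_up(j)
        p = self.predict_proba(features)
        for j, x in features.items():
            self.w[j] += self.lr * (y - p) * x

    def finalize(self):
        # Before using the model, bring every weight up to date with owed decay.
        for j in list(self.w):
            self._catch_up(j)


model = LazySparseLR()
data = [({"good": 1.0, "fun": 1.0}, 1), ({"bad": 1.0, "dull": 1.0}, 0)] * 20
for x, y in data:
    model.train_step(x, y)
model.finalize()
```

The cost of each step is proportional to the number of non-zero features in the example rather than the size of the vocabulary, which is what makes regularized SGD practical on very large, sparse document collections.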
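A Wengert list records a computation as a flat sequence of assignments, each applying one primitive operation to earlier variables. Reverse-mode automatic differentiation evaluates the list forward, then walks it backward, accumulating gradients with the chain rule. The sketch below illustrates this on scalars for y = tanh(w*x + b); it is a simplified assumption, not the repository's code. The `OPS` table, the `(output, op, inputs)` tuple layout, and the functions `forward` and `backward` are hypothetical, and the MLP and LSTM implementations described above would operate on vectors and matrices rather than scalars.

```python
import math

# Hypothetical primitive table: op name -> (forward fn, backward fn).
# Each backward fn takes the op's input values plus the upstream gradient g
# and returns one partial-gradient contribution per input.
OPS = {
    "add": (lambda a, b: a + b, lambda a, b, g: (g, g)),
    "mul": (lambda a, b: a * b, lambda a, b, g: (g * b, g * a)),
    "tanh": (lambda a: math.tanh(a), lambda a, g: (g * (1 - math.tanh(a) ** 2),)),
}


def forward(wengert, inputs):
    """Evaluate each (output, op, inputs) entry in order, storing every value."""
    vals = dict(inputs)
    for out, op, args in wengert:
        fn, _ = OPS[op]
        vals[out] = fn(*(vals[a] for a in args))
    return vals


def backward(wengert, vals, output):
    """Walk the list in reverse, summing gradient contributions per variable."""
    grads = {output: 1.0}
    for out, op, args in reversed(wengert):
        g = grads.get(out, 0.0)
        _, dfn = OPS[op]
        for a, da in zip(args, dfn(*(vals[a] for a in args), g)):
            grads[a] = grads.get(a, 0.0) + da  # += handles variables used twice
    return grads


# y = tanh(w * x + b), written as a Wengert list.
wengert = [
    ("z1", "mul", ["w", "x"]),
    ("z2", "add", ["z1", "b"]),
    ("y", "tanh", ["z2"]),
]
vals = forward(wengert, {"w": 0.5, "x": 2.0, "b": -1.0})
grads = backward(wengert, vals, "y")
print(vals["y"], grads["w"], grads["x"], grads["b"])
```

Accumulating with `+=` rather than overwriting is the design choice that makes fan-out correct: if a variable feeds several later operations (as hidden states do in an LSTM), its gradient is the sum of the contributions from each use.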