Nihal Soans edited this page Feb 2, 2018 · 2 revisions

Welcome to the Team Andromeda wiki!


Goal

Our goal was to design a large-scale document classifier in Apache Spark that maximizes classification accuracy on a testing dataset. The training dataset comes in three sizes (vsmall, small, and large), ranging from 1 KB to 1 GB. Using this Reuters dataset, we train a Bayesian classifier to distinguish between the labels.
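
To make the idea concrete, here is a minimal sketch of how such a classifier scores a document. It assumes the Bayesian classifier is a multinomial naive Bayes model with add-one smoothing, which this page does not state, and it runs in plain Python rather than Spark so it stays self-contained. The labels `markets` and `sports`, the toy documents, and the function names `train` and `predict` are all hypothetical and are not taken from the Reuters data or from our code.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count documents per label and words per label.

    docs is a list of (label, text) pairs. Tokenization here is a plain
    lowercase whitespace split, which is an assumption for illustration.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        # Add-one smoothing: every vocabulary word gets one extra count.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not drawn from the Reuters dataset.
toy_docs = [
    ("markets", "stocks fell as markets closed lower"),
    ("markets", "shares and stocks rallied"),
    ("sports", "the team won the match"),
    ("sports", "the coach praised the team"),
]
model = train(toy_docs)
print(predict(model, "stocks rallied"))
print(predict(model, "team match"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together, which matters once documents get long. In a Spark version, the counting in `train` would presumably be distributed across the cluster, but the scoring logic would stay the same.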