Course work for Big Data Analytics with Spark

jramakr/pyspark

pyspark

Course work for Big Data Analytics with Spark

This repository contains assignments from the edX course Big Data Analysis with Spark, completed in December 2018.

Collinear Points

Given an input file containing an arbitrary set of coordinates, the task is to write a Python 3 program that uses PySpark library functions to determine whether three or more of the points are collinear.
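The idea can be sketched in plain Python, which runs without a Spark installation. This is a minimal illustration, not the assignment's actual solution: it assumes integer coordinates, and the helper names `line_key` and `collinear_sets` are hypothetical. Every pair of points is mapped to a canonical key for the line through it, and points sharing a key are grouped. In PySpark the same steps would roughly correspond to RDD operations such as `cartesian`, `map`, and `groupByKey`.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def line_key(p, q):
    """Return a hashable key identifying the line through points p and q.

    Assumes integer coordinates so that Fraction gives exact slopes.
    """
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        # Vertical lines have no finite slope; key them by their x value.
        return ("vertical", x1)
    slope = Fraction(y2 - y1, x2 - x1)
    intercept = y1 - slope * x1
    return (slope, intercept)


def collinear_sets(points):
    """Return the sets of three or more points that lie on a common line."""
    lines = defaultdict(set)
    # Deduplicate first so a repeated point cannot form a degenerate pair.
    for p, q in combinations(sorted(set(points)), 2):
        lines[line_key(p, q)].update((p, q))
    return {tuple(sorted(pts)) for pts in lines.values() if len(pts) >= 3}


if __name__ == "__main__":
    sample = [(0, 0), (1, 1), (2, 2), (0, 1), (3, 3), (1, 0)]
    print(collinear_sets(sample))
```

Exact rational slopes avoid the floating-point comparisons that would otherwise make two pairs on the same line produce slightly different keys.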

Classification CoverType

This project attempts to classify geographical locations according to their predicted tree cover using Gradient Boosting and Random Forest classifiers.

KMeans using Spark

This project estimates intrinsic dimensions by calculating the mean squared distance from each point in the dataset to its representative center. The representative centers are found with the K-Means API in Spark.
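The mean squared distance itself is simple to state in code. The sketch below is plain Python and only illustrates the quantity being measured: the function names are hypothetical, the centers are passed in directly rather than fitted, and each point is assumed to be represented by its nearest center. How the project turns this value into a dimension estimate is not shown here.

```python
def squared_distance(point, center):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(point, center))


def mean_squared_distance(points, centers):
    """Average, over all points, of the squared distance to the nearest center."""
    total = sum(
        min(squared_distance(p, c) for c in centers)
        for p in points
    )
    return total / len(points)


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(mean_squared_distance(data, [(0, 1), (10, 1)]))
    print(mean_squared_distance(data, [(5, 1)]))
```

With two centers each sample point is one unit from its center, so the value is small. With a single center placed between the two groups it grows sharply, which is the kind of change the project tracks as the number of centers varies.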

Twitter Data Analysis

This assignment covers a set of steps to analyze Twitter feed data.

  1. Parsing JSON strings to JSON objects
  2. Number of posts from each user partition
  3. Tokens that are relatively popular in each user partition
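The three steps above can be sketched in plain Python. Several details are assumptions made for illustration only: each input line is taken to be a JSON object with flat `user` and `text` fields, `partition_of` is a hypothetical mapping from user to partition id, tokens are split on whitespace, and "relatively popular" is taken to mean the base-2 log of a token's frequency within a partition divided by its frequency across all partitions. The assignment's own definitions may differ.

```python
import json
import math
from collections import Counter, defaultdict


def parse_tweets(lines):
    """Step 1: parse JSON strings into objects, skipping malformed lines."""
    tweets = []
    for line in lines:
        try:
            tweets.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return tweets


def posts_per_partition(tweets, partition_of):
    """Step 2: count posts in each user partition."""
    return Counter(partition_of[t["user"]] for t in tweets)


def relative_popularity(tweets, partition_of):
    """Step 3: score tokens by log2(partition frequency / overall frequency).

    This scoring rule is an assumption, not taken from the assignment text.
    """
    overall = Counter()
    per_partition = defaultdict(Counter)
    for t in tweets:
        tokens = t["text"].lower().split()
        overall.update(tokens)
        per_partition[partition_of[t["user"]]].update(tokens)

    total_all = sum(overall.values())
    scores = {}
    for part, counts in per_partition.items():
        total_part = sum(counts.values())
        scores[part] = {
            tok: math.log2((n / total_part) / (overall[tok] / total_all))
            for tok, n in counts.items()
        }
    return scores


if __name__ == "__main__":
    raw = [
        '{"user": "a", "text": "spark spark"}',
        '{"user": "b", "text": "spark java"}',
        "not json",
    ]
    groups = {"a": 0, "b": 1}  # hypothetical user-to-partition mapping
    tweets = parse_tweets(raw)
    print(posts_per_partition(tweets, groups))
    print(relative_popularity(tweets, groups))
```

A positive score means a token is used more often in that partition than in the data as a whole, and a negative score means it is used less often.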

TensorFlow using Neural Networks

TensorFlow code to distinguish between a signal process that produces Higgs bosons and a background process that does not. The task is modeled as binary classification.
