pyspark

Course work for Big Data Analytics with Spark

This repository contains assignments completed in Dec '18 as part of the edX course Big Data Analysis with Spark.

Collinear Points

Given an input file containing an arbitrary set of coordinates, the task is to write a Python 3 program that uses PySpark library functions to determine whether three or more of the points are collinear.
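The collinearity test itself can be sketched in plain Python. This is only an illustration, not the notebook's PySpark implementation: the function names and sample points are hypothetical, and integer 2-D coordinates are assumed so that the comparison with zero is exact.

```python
from itertools import combinations


def are_collinear(p, q, r):
    """Return True if three 2-D points lie on one line (zero cross product)."""
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    return (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1) == 0


def collinear_triples(points):
    """Brute-force check of every 3-point combination (fine for small inputs)."""
    return [t for t in combinations(points, 3) if are_collinear(*t)]


# Hypothetical sample data: three points on the line y = x, plus one off it.
points = [(0, 0), (1, 1), (2, 2), (1, 0)]
triples = collinear_triples(points)
print(triples)
```

The cross-product form avoids dividing by zero on vertical lines, which a slope comparison would have to handle separately.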

Classification CoverType

This project attempts to classify geographical locations according to their predicted tree cover using Gradient Boosting and Random Forest classifiers.

KMeans using Spark

This project estimates the intrinsic dimension of a dataset by computing the mean squared distance from the points in the dataset to their representative centers. The representative centers are found with the K-Means API in Spark.
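As a rough sketch of the quantity involved, the mean squared distance can be computed in plain Python once the centers are known. The helper names and sample data are hypothetical, and assigning each point to its nearest center is an assumption; in the notebook the centers come from Spark's K-Means rather than being hard-coded.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_squared_distance(points, centers):
    """Average, over all points, of the squared distance to the nearest center."""
    total = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return total / len(points)


# Hypothetical data: two tight clusters, with centers assumed to be given.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
msd = mean_squared_distance(points, centers)
print(msd)
```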

Twitter Data Analysis

This assignment covers a set of steps to analyze Twitter feed data.

Parsing JSON strings to JSON objects
Counting the number of posts from each user partition
Finding tokens that are relatively popular in each user partition
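The first two steps above might look roughly like this in plain Python. Everything here is an assumption made for illustration: the sample tweets, the `user`/`id_str`/`text` field names, and the `user_partition` mapping are hypothetical, and the notebook performs these steps with PySpark instead. How relative popularity is scored is not stated here, so the sketch stops at raw token counts per partition.

```python
import json
from collections import Counter, defaultdict

# Hypothetical raw feed: one JSON string per line, including one broken record.
raw_lines = [
    '{"user": {"id_str": "u1"}, "text": "spark is fast"}',
    '{"user": {"id_str": "u2"}, "text": "spark streaming"}',
    'not valid json',
    '{"user": {"id_str": "u1"}, "text": "big data"}',
]

# Hypothetical mapping from user id to partition id.
user_partition = {"u1": 0, "u2": 1}


def safe_parse(line):
    """Parse one JSON string; return None for malformed input."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


tweets = [t for t in map(safe_parse, raw_lines) if t is not None]

# Number of posts from each user partition.
posts_per_partition = Counter(
    user_partition[t["user"]["id_str"]] for t in tweets
)

# Raw token counts per partition (whitespace tokenization is an assumption).
tokens_per_partition = defaultdict(Counter)
for t in tweets:
    partition = user_partition[t["user"]["id_str"]]
    tokens_per_partition[partition].update(t["text"].split())

print(dict(posts_per_partition))
```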

Tensorflow using Neural Networks

TensorFlow code to distinguish between a signal process that produces Higgs bosons and a background process that does not. The task is modeled as a binary classification problem.

