Writing dummy snippets of code to read, manipulate, and build a simple ML model with PySpark.
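A minimal sketch of that read/manipulate/model flow, assuming an illustrative data.csv with numeric feature columns f1 and f2 and a numeric label column (all names are placeholders, not taken from the original notebooks):

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

# Read: load the CSV into a DataFrame, inferring column types.
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Manipulate: keep complete rows and assemble the features into a vector.
df = df.dropna(subset=["label", "f1", "f2"])
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

# Model: fit a simple logistic regression and inspect a few predictions.
model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
model.transform(test).select("label", "prediction").show(5)
```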
Given a set of documents and a minimum required similarity threshold, find the number of document pairs whose similarity exceeds the threshold.
This notebook contains detailed code for Spark, machine learning, and Databricks.
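A minimal sketch of one way to count such pairs, using Jaccard similarity via MinHash LSH from pyspark.ml.feature; the toy documents and the 0.5 threshold are illustrative, the join is approximate, and this is not necessarily the method used in the original project:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml.feature import Tokenizer, CountVectorizer, MinHashLSH

spark = SparkSession.builder.getOrCreate()
docs = spark.createDataFrame(
    [(0, "big data with spark"), (1, "big data with pyspark"), (2, "pandas tutorial")],
    ["id", "text"],
)
sim_threshold = 0.5  # minimum required Jaccard similarity

# Turn each document into a binary term-count vector.
tokens = Tokenizer(inputCol="text", outputCol="words").transform(docs)
cv = CountVectorizer(inputCol="words", outputCol="features", binary=True).fit(tokens)
vectors = cv.transform(tokens)

# Approximate self-join on Jaccard distance = 1 - Jaccard similarity.
lsh = MinHashLSH(inputCol="features", outputCol="hashes", numHashTables=5).fit(vectors)
pairs = lsh.approxSimilarityJoin(vectors, vectors, 1.0 - sim_threshold, distCol="dist")

# Keep each unordered pair once and drop self-matches before counting.
n_pairs = pairs.filter(F.col("datasetA.id") < F.col("datasetB.id")).count()
print(n_pairs)
```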
A laboratory to carry out experiments with PySpark
Experimenting with a best-practice Apache Spark working environment for robust data pipelines.
An academic project carried out for the Distributed Data Analysis and Mining course (academic year 2022/2023).
Treat Spark like pandas.
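A minimal sketch of that idea using the pandas API on Spark (pyspark.pandas, Spark 3.2+); the sales.csv file and the price/qty/category columns are illustrative:

```python
import pyspark.pandas as ps

pdf = ps.read_csv("sales.csv")                # distributed, but pandas-like
pdf["total"] = pdf["price"] * pdf["qty"]      # column arithmetic as in pandas
top = pdf.groupby("category")["total"].sum().sort_values(ascending=False)
print(top.head(10))

sdf = pdf.to_spark()                          # hand back a regular Spark DataFrame
```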
The current assignment is to write Python scripts for Apache Spark. The tasks are divided into three parts: WordCount, to count the occurrences of words in a book on a per-book basis and compare the results with those of Assignment 1; pyspark.ml.feature, to compute the TF-IDF values for unigrams and bigrams using the pyspark.ml.feat…
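A minimal sketch of the two tasks visible above (a per-book RDD word count and unigram/bigram TF-IDF via pyspark.ml.feature), with an illustrative file name and toy documents standing in for the real data:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, NGram, HashingTF, IDF

spark = SparkSession.builder.getOrCreate()

# WordCount: occurrences of each word in one book.
counts = (
    spark.sparkContext.textFile("book.txt")
    .flatMap(lambda line: line.lower().split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)
print(counts.takeOrdered(10, key=lambda wc: -wc[1]))

# TF-IDF for unigrams and bigrams with pyspark.ml.feature.
docs = spark.createDataFrame([(0, "spark makes big data simple"),
                              (1, "pyspark brings spark to python")], ["id", "text"])
words = Tokenizer(inputCol="text", outputCol="unigrams").transform(docs)
words = NGram(n=2, inputCol="unigrams", outputCol="bigrams").transform(words)

for col in ("unigrams", "bigrams"):
    tf = HashingTF(inputCol=col, outputCol=f"{col}_tf", numFeatures=1 << 12).transform(words)
    words = IDF(inputCol=f"{col}_tf", outputCol=f"{col}_tfidf").fit(tf).transform(tf)

words.select("id", "unigrams_tfidf", "bigrams_tfidf").show(truncate=False)
```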
PySpark code for machine learning and big data.
Data visualization and prediction for TripAdvisor's dataset at the Brown Datathon event.
Machine Learning Task implemented in PySpark to parallelise K-Fold Cross Validation
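A minimal sketch of parallelised k-fold cross-validation via pyspark.ml.tuning.CrossValidator and its parallelism argument; the estimator, parameter grid, and the train_df mentioned in the final comment are illustrative, not taken from the original project:

```python
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression(featuresCol="features", labelCol="label")
grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1, 1.0]).build()

cv = CrossValidator(
    estimator=lr,
    estimatorParamMaps=grid,
    evaluator=BinaryClassificationEvaluator(labelCol="label"),
    numFolds=5,
    parallelism=4,   # number of candidate models evaluated concurrently
)
# cv_model = cv.fit(train_df)   # train_df: a DataFrame with "features" and "label"
```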
Recommendation engine built with Apache Spark (PySpark) and Python, based on network theory.
A UDF to evaluate a Spark MLlib classification model using PySpark.
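A minimal sketch of what such a UDF-based evaluation could look like, assuming a predictions DataFrame with "label" and "prediction" columns; the tiny example frame below stands in for real model output (e.g. model.transform(test)):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()
predictions = spark.createDataFrame(
    [(1.0, 1.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)], ["label", "prediction"]
)

@F.udf(returnType=IntegerType())
def is_correct(label, prediction):
    # 1 if the predicted class matches the true label, else 0.
    return int(float(label) == float(prediction))

# The average of the 0/1 flags is the overall accuracy.
(predictions
    .withColumn("correct", is_correct("label", "prediction"))
    .agg(F.avg("correct").alias("accuracy"))
    .show())
```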
Kaggle's house prices competition