pyspark
Here are 3,700 public repositories matching this topic...
Simple and Distributed Machine Learning
Updated Nov 19, 2024 - Scala
State of the Art Natural Language Processing
Updated Nov 21, 2024 - Scala
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Updated Nov 19, 2024 - Java
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Updated Dec 2, 2023 - Python
A curated list of awesome Apache Spark packages and resources.
Updated Oct 24, 2024 - Shell
Implementing best practices for PySpark ETL jobs and applications.
Updated Jan 1, 2023 - Python
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Updated Mar 16, 2024 - Jupyter Notebook
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Updated Nov 18, 2024 - Python
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
Updated Jul 18, 2022 - Jupyter Notebook
PySpark-Tutorial provides basic algorithms using PySpark
Updated Jan 20, 2023 - Jupyter Notebook
Hopsworks - Data-Intensive AI platform with a Feature Store
Updated Nov 4, 2024 - Java
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Updated Oct 14, 2024 - Java
Sparkling Water provides H2O functionality inside Spark cluster
Updated Nov 19, 2024 - Scala
Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Updated Mar 14, 2024 - Vue
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Updated Aug 10, 2022 - JavaScript
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Updated Oct 8, 2024 - Python