Do you need Spark to create a data stack? #187

Open
3 tasks done
souravsingh opened this issue Nov 13, 2024 · 0 comments
Labels
talk-proposal New talk of Python Pune meetup

Title of the talk

Do you need Spark to create a data stack?

Description

Spark has long been regarded by data engineers around the world as a mature and reliable data processing framework. As the data engineering landscape has evolved, however, new tools and frameworks have become available.

This talk will focus on using MinIO as an object store, DuckDB as a data warehouse, and dbt for defining transformations. We will also look at Polars for data processing.

The purpose of this talk is not to declare Spark obsolete as a data processing library, but rather to suggest alternatives that data engineers may find useful and better suited to specific situations.

Table of contents

  1. Introduction
  2. Background behind Spark
  3. Current outlook of data engineering
  4. MinIO as local object store
  5. DuckDB as data warehouse
  6. Using dbt to define transforms

Duration (including Q&A)

30-35 mins

Prerequisites

No response

Speaker bio

My LinkedIn profile: https://www.linkedin.com/in/sourav-singh-8124b6267

The talk/workshop speaker agrees to

@souravsingh souravsingh added the talk-proposal New talk of Python Pune meetup label Nov 13, 2024