Workshop: Build a Portable Data Lake

In this hands-on course, you'll learn how to create a basic yet functional portable data lake that sidesteps traditional cloud vendor lock-in.

With open-source technologies such as Iceberg, Delta, and DuckDB at the forefront, we'll explore portable data runtimes, embedded catalogs, and cloud-agnostic compute.

We'll evaluate the alternatives, discuss current industry limitations, and explain why we chose the solution we implement.

We will then walk you through building a portable data lake from scratch, examining the trade-offs of using open-source tools in real-world scenarios along the way.

What's covered?

In this workshop, you'll get hands-on experience with several open-source tools for building your own data lake.

  • You'll learn about the current state of the industry and how to work around its limitations.
  • We will compare building with Iceberg, Delta, or entirely different stacks.
  • Next, we will choose a stack that is not currently vendor-locked and build a functional portable data lake.
  • With dlt, Parquet, and DuckDB, we will manage data loading and storage.
  • We will explore using Ibis as an embedded catalog and the benefits of this approach.
  • We will look at how Polars fits into this stack to accelerate data exploration.
  • Finally, we will show how to make this data accessible to other compute engines.

Materials