In this hands-on course, you'll learn how to create a basic yet functional portable data lake that sidesteps traditional cloud vendor lock-in.
With open-source technologies like Iceberg, Delta, and DuckDB at the forefront, we'll explore the power of portable data runtimes, embedded catalogs, and cloud-agnostic compute solutions.
We’ll evaluate the alternatives, discuss existing industry limitations, and explain why we chose the solution we implemented.
We will then walk you through building a portable data lake from scratch and examine the trade-offs of using open-source tools in real-world scenarios.
In this workshop, you'll work directly with a range of open-source tools that you can use to build your own data lake.
- You'll learn about the current state of the industry and how to sidestep its limitations.
- We will compare our options: building on Iceberg, on Delta, or on a different stack altogether.
- We will then choose a stack that is not currently vendor-locked and build a functional portable data lake.
- With dlt, Parquet, and DuckDB, we will manage our data loading and storage.
- We will explore using Ibis as an embedded catalog and the benefits of this approach.
- We will explore how Polars fits into this stack to accelerate data exploration.
- Finally, we will explore how to make this data accessible to other compute engines.