split roadmap vs voda support
lostmygithubaccount committed Feb 2, 2024
1 parent 6dcedd6 commit 9c7ccc9
Showing 5 changed files with 209 additions and 171 deletions.
180 changes: 9 additions & 171 deletions docs/posts/roadmap-2024-H1/index.qmd
@@ -2,7 +2,7 @@
title: "Ibis project 2024 roadmap"
author: "Cody"
date: "2024-02-05"
thumbnail: commits.png
image: commits.png
categories:
- blog
- roadmap
@@ -11,173 +11,10 @@ categories:

## Overview

The Ibis project is an [independently governed open-source
community](https://github.com/ibis-project/governance) that builds the portable
Python dataframe library. It is primarily backed by [Voltron
Data](https://voltrondata.com) but has
[contributors](https://github.com/ibis-project/ibis/graphs/contributors) across
a range of data companies and institutions.

This is the first public roadmap for the Ibis project. Thus, there's a bit of
backstory to explain how we got here and set up the context for where we're
going. Feel free to [skip ahead to the roadmap](#roadmap) if you're already
familiar with Ibis.

## Background

The Ibis project was started in 2015 by [Wes McKinney](https://wesmckinney.com),
the creator of pandas, as a pandas-like interface to Apache Impala. It received
improvements and support over the years, but really took off under the
stewardship of [Phillip Cloud](https://github.com/cpcloud) and the [current Ibis
team at Voltron Data](#who-are-the-core-contributors). It now supports 20+
backends and is improving rapidly. There has never been a
better time to get involved with Ibis.

You can see the inflection point in the number of commits to the repository in
early 2022:

![Ibis commits over time](commits.png)

### Who am I?

My name is Cody and I'm employed by Voltron Data to work on Ibis full-time as a
Technical Product Manager. I am an Ibis committer and have contributed Delta
Lake table input/output methods, helped move the documentation over to
[Quarto](https://quarto.org), and created the [Zulip
chat](https://ibis-project.zulipchat.com) for the community.

My job is to help the Ibis community grow and thrive. I have a background in ML
(especially MLOps) and data products. Ibis solves many challenges I've seen in
the data space and I'm excited to help increase its adoption as a standard
Python frontend for dozens of data backends to reduce friction in the data
ecosystem.

### Why does Voltron Data (VoDa) support Ibis?

Why does Voltron Data employ a Technical Product Manager to work on Ibis
full-time? Why does Voltron Data employ five software engineers to work on Ibis
full-time? Great questions!

::: {.callout-note}
To understand Voltron Data -- or if you're generally interested in learning
about the composable data ecosystem -- check out [The Composable
Codex by Voltron Data](https://voltrondata.com/codex).
:::

Voltron Data is a for-profit company building
[Theseus](https://voltrondata.com/theseus), an accelerator-native query engine
for the composable data ecosystem. On its own, Theseus is fairly bare bones -- it
needs storage below it, a frontend above it, and a bunch of other components to
connect things together. Voltron Data's founders and engineers are very
experienced with open-source software and pioneers of the composable data
ecosystem. Voltron Data supports Ibis because it is the Python frontend for
Theseus and can act as a standard Python dataframe API for **any** backend,
whether you're querying a CSV file on your laptop, running thousand-node Spark
jobs on a cluster in the cloud, or doing cutting-edge work on-premises with GPU
clusters. With Ibis, you can write your experimentation code for your laptop and
scale up to Theseus (or any other backend) seamlessly.

::: {.callout-note collapse="true" title="Why not the pandas API?"}
This is a great, and natural, question -- if Voltron Data wants a standard
Python dataframe API, why not just use pandas? The reason is relatively simple:
the pandas API inherently does not scale. This is largely due to the expectation
of ordered results and the index. pandas is implemented for single-threaded
execution and has a lot of baggage when it comes to distributed execution. While
projects like Modin and pandas on Spark (formerly Koalas) attempt to scale the
pandas API, any project that attempts the feat is doomed to a dubious support
matrix of operations.

Instead, Wes McKinney envisioned Ibis as a portable Python dataframe where the
API is decoupled from the execution engine. Ibis code scales to the backend it
is connected to. Any other Python dataframe library locks you into its execution
engine. While they may claim to be easy to migrate to, this is rarely the case.
The founders of Voltron Data experienced these pains with the pandas API
themselves in previous efforts, including cuDF. For Theseus and as an
open-source standard, we believe Ibis is the right approach.

Instead of using Snowpark Python for Snowflake, you can use Ibis on Snowflake.
Instead of using PySpark or pandas on Spark, you can use Ibis on Spark. Instead
of using the pandas API on BigQuery (built on top of Ibis), you can use Ibis on
BigQuery. Instead of using PyStarburst on Starburst Galaxy, you can use Ibis on
Starburst Galaxy. Instead of using the Polars Python API on the Polars execution
engine, you can use Ibis on Polars. Instead of using DataFusion Python on
the DataFusion execution engine, you can use Ibis on DataFusion. Instead of
executing SQL strings on DuckDB through the Python client, you can use Ibis on
DuckDB. And so on...

Ibis brings a Python dataframe interface to data platforms that only have SQL,
and brings a standard Python dataframe interface to data platforms that have
their own Python dataframe interface. It is the only portable Python dataframe
that can serve as a standard across the data ecosystem.
:::
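
To make the idea of decoupling the API from the execution engine concrete, here is a minimal sketch using only the Python standard library. It is not the Ibis API: the `SumBy` class and the `run_in_python` and `run_in_sqlite` functions are hypothetical names invented for illustration. One small expression object is evaluated by two different "backends", a plain Python loop and SQLite, and both return the same answer.

```python
import sqlite3
from dataclasses import dataclass


# Hypothetical toy expression: "sum of `value` grouped by `key`".
# This is an illustration only and not part of Ibis.
@dataclass(frozen=True)
class SumBy:
    table: str
    key: str
    value: str


def run_in_python(expr, tables):
    """Backend 1: evaluate the expression over in-memory rows (list of dicts)."""
    totals = {}
    for row in tables[expr.table]:
        totals[row[expr.key]] = totals.get(row[expr.key], 0) + row[expr.value]
    return sorted(totals.items())


def run_in_sqlite(expr, con):
    """Backend 2: compile the same expression to SQL and let SQLite run it."""
    sql = (
        f'SELECT "{expr.key}", SUM("{expr.value}") FROM "{expr.table}" '
        f'GROUP BY "{expr.key}" ORDER BY "{expr.key}"'
    )
    return con.execute(sql).fetchall()


rows = [
    {"species": "adelie", "count": 2},
    {"species": "gentoo", "count": 5},
    {"species": "adelie", "count": 3},
]

# Load the same rows into an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE penguins ("species" TEXT, "count" INTEGER)')
con.executemany("INSERT INTO penguins VALUES (:species, :count)", rows)

# The expression is written once and handed to either backend.
expr = SumBy(table="penguins", key="species", value="count")
python_result = run_in_python(expr, {"penguins": rows})
sqlite_result = run_in_sqlite(expr, con)
print(python_result == sqlite_result)
```

The point of the sketch is that `expr` never mentions how or where it will be executed, so swapping the engine means changing only the function it is passed to. Ibis applies the same principle to a full dataframe API and a much larger set of backends.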

Voltron Data supports Ibis for the same reason it supports Apache Arrow.
Open-source standards make it easier to build the composable data ecosystem and
reduce friction for data teams looking to swap out components as their needs
change. If you're already using Ibis as your frontend, Apache Arrow for data
interchange, and Substrait for intermediate representation, you can swap in
Theseus with ease!

::: {.callout-important}
[Ibis is independently governed](https://github.com/ibis-project/governance) and
not owned by Voltron Data. While currently four out of five members of the
steering committee are employed by Voltron Data (the fifth being at Alphabet
working on Google BigQuery), we expect this to change over time as more
organizations join the project. **Ibis will never be solely available for VoDa
products.**

Voltron Data also welcomes this dilution of power and influence! A healthy
open-source project is one that is not controlled by a single entity. This is
true of [Apache Arrow](https://arrow.apache.org) and other open-source projects
that Voltron Data employees have been instrumental in building.
:::

### Who are the core contributors?

The core contributors working full-time on Ibis are employed at Voltron Data,
with deep experience on successful open-source projects including pandas,
Apache Arrow, Dask, and more. Everything in the Ibis project is made possible by
their hard work! They are:

- [**Gil Forsyth**](https://github.com/gforsyth): long-time Ibis contributor and
primary maintainer of the `ibis-substrait` package
- [**Jim Crist-Harif**](https://github.com/jcrist): the engineering manager for
the Ibis team at VoDa
- [**Krisztián Szűcs**](https://github.com/kszucs): long-time Ibis contributor
and primary author of the precursor to [the big refactor](#the-big-refactor)
- [**Naty Clementi**](https://github.com/ncclementi): newest member of the Ibis
team at VoDa, recently focusing on [geospatial support in
DuckDB](#geospatial-improvements)
- [**Phillip Cloud**](https://github.com/cpcloud): the tech lead for the Ibis
team at VoDa

If you're interacting with us on GitHub or Zulip, you'll definitely run into at
least one of them! They make the Ibis project the delightful software it is
today and are always happy to help.

### Who else supports Ibis?

Anybody who contributes to Ibis is a supporter of Ibis! You can contribute by
[opening an issue](https://github.com/ibis-project/ibis/issues), [submitting a
pull request](https://github.com/ibis-project/ibis/pulls), [using Ibis in your
project](https://github.com/ibis-project/ibis/network/dependents), or [joining
the Zulip chat](https://ibis-project.zulipchat.com) to discuss problems or
ideas.

Notable organizations that support Ibis include:

- [**Claypot AI**](https://www.claypot.ai/): contributing the Apache Flink
backend
- [**Exasol**](https://www.exasol.com/): contributing the Exasol backend
- [**Google's BigQuery
DataFrames**](https://github.com/googleapis/python-bigquery-dataframes): a
pandas API for BigQuery built on top of Ibis
- [**RisingWave**](https://risingwave.com/): contributing the RisingWave backend
- [**Starburst
Galaxy**](https://www.starburst.io/blog/introducing-python-dataframes/):
supporting Ibis alongside their native PyStarburst dataframes
- [**SuperDuperDB**](https://github.com/SuperDuperDB/superduperdb): bringing AI
to any database Ibis supports
Welcome to the first public roadmap for the Ibis project! If you aren't familiar
with the background of Ibis or who supports it nowadays, we recommend reading
[why Voltron Data supports Ibis](../why-voda-supports-ibis/index.qmd) before the
roadmap below.

## 2024 roadmap

@@ -200,9 +37,10 @@ disorganization as we get it up and running efficiently. In general, we have:
- **Label-specific views**: consisting of issues for specific labels, like
documentation or a large refactor

Right now, the team at Voltron Data sets the roadmap and priorities. Over time
as more contributors and organizations join the project, we expect this process
to diversify and become more community-driven.
Right now, [the team at Voltron Data](../why-voda-supports-ibis/index.qmd) sets
the roadmap and priorities. Over time as more contributors and organizations
join the project, we expect this process to diversify and become more
community-driven.

### Overall themes

Binary file added docs/posts/why-voda-supports-ibis/commits.png