feat(athena): add amazon athena backend
cpcloud committed Dec 31, 2024
1 parent bb0f354 commit 4f7404e
Showing 55 changed files with 1,927 additions and 147 deletions.
14 changes: 13 additions & 1 deletion .github/workflows/ibis-backends-cloud.yml
@@ -46,6 +46,10 @@ jobs:
title: Databricks
extras:
- --extra databricks
- name: athena
title: Amazon Athena
extras:
- --extra athena
include:
- python-version: "3.10"
backend:
@@ -78,7 +82,7 @@ jobs:
# rate limit while restricting GITHUB_TOKEN permissions elsewhere
permissions:
contents: "read"
# required for GCP workload identity federation
# required for workload identity federation
id-token: "write"

steps:
@@ -127,6 +131,7 @@ jobs:
run: just download-data

- uses: google-github-actions/auth@v2
if: matrix.backend.name == 'bigquery'
with:
project_id: "ibis-gbq"
workload_identity_provider: "${{ vars.WIF_PROVIDER_NAME }}"
@@ -164,6 +169,13 @@ jobs:
SNOWFLAKE_SCHEMA: ${{ secrets.SNOWFLAKE_SCHEMA }}
SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}

- name: setup aws credentials
if: matrix.backend.name == 'athena'
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: us-east-2
role-to-assume: arn:aws:iam::070284473168:role/ibis-project-athena

- name: enable snowpark testing
if: matrix.backend.key == 'snowpark'
run: echo "SNOWFLAKE_SNOWPARK=1" >> "$GITHUB_ENV"
117 changes: 117 additions & 0 deletions docs/backends/athena.qmd
@@ -0,0 +1,117 @@
# Amazon Athena

[https://aws.amazon.com/athena/](https://aws.amazon.com/athena/)

## Install

Install Ibis and dependencies for the Athena backend:

::: {.panel-tabset}

## `pip`

Install with the `athena` extra:

```{.bash}
pip install 'ibis-framework[athena]'
```

And connect:

```{.python}
import ibis
con = ibis.athena.connect(s3_staging_dir="s3://...") # <1>
```

::: {.callout-note}
## At a **minimum**, the `s3_staging_dir` argument must be provided.

This argument tells the underlying driver library (`pyathena`) and ultimately
Athena itself where to write query results.
:::

1. Adjust other connection parameters as needed.

## `conda`

Install for Athena:

```{.bash}
conda install -c conda-forge ibis-athena
```

```{.python}
import ibis
con = ibis.athena.connect(s3_staging_dir="s3://...") # <1>
```

::: {.callout-note}
## At a **minimum**, the `s3_staging_dir` argument must be provided.

This argument tells the underlying driver library (`pyathena`) and ultimately
Athena itself where to write query results.
:::

1. Adjust other connection parameters as needed.

## `mamba`

Install for Athena:

```{.bash}
mamba install -c conda-forge ibis-athena
```

```{.python}
import ibis
con = ibis.athena.connect(s3_staging_dir="s3://my-bucket/") # <1>
```

::: {.callout-note}
## At a **minimum**, the `s3_staging_dir` argument must be provided.

This argument tells the underlying driver library (`pyathena`) and ultimately
Athena itself where to write query results.
:::

1. Adjust other connection parameters as needed.

:::

## Connect

### `ibis.athena.connect`

```python
con = ibis.athena.connect(
    s3_staging_dir="s3://my-bucket/",
)
```

::: {.callout-note}
## At a **minimum**, the `s3_staging_dir` argument must be provided.

This argument tells the underlying driver library
([`pyathena`](https://laughingman7743.github.io/PyAthena/)) and ultimately
Athena itself where to write query results.
:::
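Once connected, the Athena backend supports the usual Ibis table API. A minimal
usage sketch follows; the bucket name, the `orders` table, and its `status`
column are hypothetical, and valid AWS credentials with Athena access are
assumed:

```python
import ibis

# Connect; s3_staging_dir tells Athena where to write query results.
con = ibis.athena.connect(s3_staging_dir="s3://my-bucket/")  # hypothetical bucket

# Inspect what's available, then run a small aggregation.
print(con.list_tables())

t = con.table("orders")                       # hypothetical table
expr = t.group_by("status").agg(n=t.count())  # row count per status
print(expr.execute())                         # materializes as a pandas DataFrame
```

As with other Ibis backends, expressions are built lazily and only compiled and
sent to Athena when `execute()` (or a similar materializing method) is called.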

### Connection Parameters

```{python}
#| echo: false
#| output: asis
from _utils import render_do_connect
render_do_connect("athena")
```

```{python}
#| echo: false
BACKEND = "Athena"
```

{{< include ./_templates/api.qmd >}}
28 changes: 19 additions & 9 deletions docs/backends/support/cloud_support_policy.qmd
@@ -1,16 +1,26 @@
# Cloud backend support policy

Ibis supports a number of proprietary cloud backends.
Ibis supports and tests some of the major proprietary cloud backends:

Snowflake and Databricks cost money to test in our continuous integration test suite.
- Google BigQuery
- Snowflake
- Databricks Cloud
- Amazon Athena

If funding dries up, support for these backends will become best effort.
**All of these services cost money to run in our continuous integration test suite.**

If you're interested in ensuring continued support for a cloud backend, please
consider sponsoring Ibis development.
If funding for these services dries up, support for these backends will move to
best effort.

The cost at the time of writing (2024-10-31) is about **$5,000 USD per year**
split evenly between the Snowflake and Databricks backend.
If you or your company are interested in ensuring continued maintenance for one
of these backends, please get in touch with us on
[Zulip](https://ibis-project.zulipchat.com/), or [open a GitHub
issue](https://github.com/ibis-project/ibis/issues/new) and consider sponsoring
Ibis development.

Google has very generously supported the entire cost of testing the BigQuery
backend for a number of years.
## Current cost estimates

- Google BigQuery: free, donated very generously by Google
- Snowflake: $2500 USD per year
- Databricks Cloud: $2500 USD per year
- Amazon Athena: ?
1 change: 1 addition & 0 deletions flake.nix
@@ -143,6 +143,7 @@
inherit shellHook;

PYSPARK_PYTHON = "${env}/bin/python";
AWS_PROFILE = "ibis-testing";

# needed for mssql+pyodbc
ODBCSYSINI = pkgs.writeTextDir "odbcinst.ini" ''