docs: Clarify S3 and ODBC docs (#646)
peasee authored Nov 20, 2024
1 parent c82314e commit 5fe59a8
Showing 2 changed files with 13 additions and 14 deletions.
23 changes: 11 additions & 12 deletions spiceaidocs/docs/components/data-connectors/odbc.md
@@ -4,8 +4,7 @@ sidebar_label: 'ODBC Data Connector'
description: 'ODBC Data Connector Documentation'
---

ODBC (Open Database Connectivity) is a standard API that allows applications to connect to and interact with various database management systems using a common interface. To connect to any ODBC database for federated/accelerated SQL queries, specify `odbc` as the selector in the `from` value for the dataset. The `odbc_connection_string` parameter is required.
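
A minimal dataset definition following this pattern might look like the sketch below (the table path, dataset name, and DSN are placeholders):

```yaml
datasets:
  - from: odbc:path.to.my_table          # `odbc` selector followed by the table path
    name: my_dataset                     # table name exposed within Spice
    params:
      odbc_connection_string: dsn=my_odbc_connection  # required; references a configured DSN
```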

:::warning

@@ -14,11 +13,13 @@ Spice must be [built with the `odbc` feature](#building-spice-with-odbc), and th
Alternatively, use the official Spice Docker image. To use the official Spice Docker image from [DockerHub](https://hub.docker.com/r/spiceai/spiceai):

# Pull the latest official Spice image

```bash
docker pull spiceai/spiceai:latest
```

# Pull the official v0.20.0-beta Spice image

```bash
docker pull spiceai/spiceai:0.20.0-beta
```
@@ -91,6 +92,7 @@ The `from` field takes the form `odbc:path.to.my.dataset` where `path.to.my.data
The dataset name. This will be used as the table name within Spice.

Example:

```yaml
datasets:
- from: odbc:my.cool.table
```

@@ -113,15 +115,13 @@ SELECT COUNT(*) FROM cool_dataset;

### `params`

The following [arrow_odbc builder parameters](https://docs.rs/arrow-odbc/latest/arrow_odbc/struct.OdbcReaderBuilder.html) are exposed as params:

| Parameter | Type | Description |
| ----------------------------- | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sql_dialect` | string | Override what SQL dialect is used for the ODBC connection. Supports `postgresql`, `mysql`, `sqlite`, `athena` or `databricks` values. Default is unset (auto-detected). |
| `odbc_max_bytes_per_batch`    | number (bytes) | Maximum number of bytes transferred in each query record batch. A lower value may improve performance on low-memory systems. Default is `512_000_000`. |
| `odbc_max_num_rows_per_batch` | number (rows)  | Maximum number of rows transferred in each query record batch. A higher value may speed up query results, but requires more memory in conjunction with `odbc_max_bytes_per_batch`. Default is `65536`. |
| `odbc_max_text_size`          | number (bytes) | A limit for the maximum size of text columns transmitted between the ODBC driver and the Runtime. Default is unset (allocates driver-reported max column size). |
| `odbc_max_binary_size`        | number (bytes) | A limit for the maximum size of binary columns transmitted between the ODBC driver and the Runtime. Default is unset (allocates driver-reported max column size). |
| `odbc_connection_string`      | string         | Connection string to use to connect to the ODBC server. |

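As a rough sketch of how these parameters combine (the DSN, dataset path, and values below are illustrative only), a low-memory host might reduce the batch limits and pin the dialect explicitly:

```yaml
datasets:
  - from: odbc:my.cool.table
    name: cool_dataset
    params:
      odbc_connection_string: dsn=my_odbc_connection
      sql_dialect: postgresql             # skip auto-detection and force the PostgreSQL dialect
      odbc_max_bytes_per_batch: 256000000 # smaller transit buffer for constrained memory
      odbc_max_num_rows_per_batch: 32768  # fewer rows per batch to stay under the byte limit
```
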
@@ -134,9 +134,9 @@ datasets:

## Selecting SQL Dialect

The default SQL dialect may not be supported by every ODBC connection. The `sql_dialect` parameter supports overriding the selected SQL dialect for a specified connection.

The runtime will attempt to detect the dialect to use for a connection based on the contents of `Driver=` in the `odbc_connection_string`. The runtime will detect the correct SQL dialect for the following connection types, when set up with a standard driver configuration:

- PostgreSQL
- MySQL
@@ -169,7 +169,6 @@ docker pull spiceai/spiceai:latest

```bash
docker pull spiceai/spiceai:0.20.0-beta
```


## Baking an image with ODBC Support

There are many dozens of ODBC adapters; this recipe covers making a custom image and configuring it to work with Spice.
@@ -331,4 +330,4 @@ See the [ODBC Quickstart](https://github.com/spiceai/quickstarts/blob/trunk/odbc

## Secrets

Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
4 changes: 2 additions & 2 deletions spiceaidocs/docs/components/data-connectors/s3.md
@@ -6,7 +6,7 @@ description: 'S3 Data Connector Documentation'

The S3 Data Connector enables federated SQL querying on files stored in S3 or S3-compatible systems (e.g., MinIO, Cloudflare R2).

If a folder path is specified as the dataset source, all files within the folder will be loaded.

File formats are specified using the `file_format` parameter, as described in [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats).

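As an illustrative sketch (the bucket name and folder path are placeholders), a dataset over a folder of Parquet files might be declared like this:

```yaml
datasets:
  - from: s3://my-bucket/path/to/data/   # folder path: all files within the folder are loaded
    name: cool_dataset
    params:
      file_format: parquet               # required when the format cannot be inferred from the path
      s3_region: us-east-1
```
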
@@ -56,7 +56,7 @@ SELECT COUNT(*) FROM cool_dataset;

| Parameter Name | Description |
| --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `file_format`               | Specifies the data format. Required if it cannot be inferred from the object URI. Options: `parquet`, `csv`, `json`.                             |
| `s3_endpoint` | S3 endpoint URL (e.g., for MinIO). Default is the region endpoint. E.g. `s3_endpoint: https://my.minio.server` |
| `s3_region` | S3 bucket region. Default: `us-east-1`. |
| `client_timeout` | Timeout for S3 operations. Default: `30s`. |
