From 5fe59a8a39706e974fdf3a6b9b3daedc6fee53a8 Mon Sep 17 00:00:00 2001
From: peasee <98815791+peasee@users.noreply.github.com>
Date: Wed, 20 Nov 2024 16:01:49 +1000
Subject: [PATCH] docs: Clarify S3 and ODBC docs (#646)

---
 .../docs/components/data-connectors/odbc.md | 23 +++++++++----------
 .../docs/components/data-connectors/s3.md   |  4 ++--
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/spiceaidocs/docs/components/data-connectors/odbc.md b/spiceaidocs/docs/components/data-connectors/odbc.md
index dfee8fc1..50beb3cb 100644
--- a/spiceaidocs/docs/components/data-connectors/odbc.md
+++ b/spiceaidocs/docs/components/data-connectors/odbc.md
@@ -4,8 +4,7 @@ sidebar_label: 'ODBC Data Connector'
 description: 'ODBC Data Connector Documentation'
 ---

-ODBC (Open Database Connectivity) is a standard API that allows applications to connect to and interact with various database management systems using a common interface. To connect to any ODBC database for federated/accelerated SQL queries, specify `odbc` as the selector in the `from` value for the dataset. The `odbc_connection_string` parameter is required.
-
+ODBC (Open Database Connectivity) is a standard API that allows applications to connect to and interact with various database management systems using a common interface. To connect to any ODBC database for federated/accelerated SQL queries, specify `odbc` as the selector in the `from` value for the dataset. The `odbc_connection_string` parameter is required.

 :::warning

 Spice must be [built with the `odbc` feature](#building-spice-with-odbc), and the host must have a platform-specific ODBC driver installed.

 Alternatively, use the official Spice Docker image. To use the official Spice Docker image from [DockerHub](https://hub.docker.com/r/spiceai/spiceai):

 # Pull the latest official Spice image
+
 ```bash
 docker pull spiceai/spiceai:latest
 ```

 # Pull the official v0.20.0-beta Spice image
+
 ```bash
 docker pull spiceai/spiceai:0.20.0-beta
 ```
@@ -91,6 +92,7 @@ The `from` field takes the form `odbc:path.to.my.dataset` where `path.to.my.data

 The dataset name. This will be used as the table name within Spice.

 Example:
+
 ```yaml
 datasets:
   - from: odbc:my.cool.table
@@ -113,15 +115,13 @@
 SELECT COUNT(*) FROM cool_dataset;
 ```

 ### `params`

-The following [arrow_odbc builder parameters](https://docs.rs/arrow-odbc/latest/arrow_odbc/struct.OdbcReaderBuilder.html) are exposed as params:
-
 | Parameter                     | Type           | Description |
 | ----------------------------- | -------------- | ----------- |
 | `sql_dialect`                 | string         | Override what SQL dialect is used for the ODBC connection. Supports `postgresql`, `mysql`, `sqlite`, `athena` or `databricks` values. Default is unset (auto-detected). |
-| `odbc_max_bytes_per_batch`    | number (bytes) | Upper allocation limit for transit buffer. Default is `512_000_000`. |
-| `odbc_max_num_rows_per_batch` | number (rows)  | Upper limit for number of rows fetched for one batch. Default is `65536`. |
-| `odbc_max_text_size`          | number (bytes) | Upper limit for value buffers bound to columns with text values. Default is unset (allocates driver-reported max column size). |
-| `odbc_max_binary_size`        | number (bytes) | Upper limit for value buffers bound to columns with binary values. Default is unset (allocates driver-reported max column size). |
+| `odbc_max_bytes_per_batch`    | number (bytes) | Maximum number of bytes transferred in each query record batch. A lower value may improve performance on low-memory systems. Default is `512_000_000`. |
+| `odbc_max_num_rows_per_batch` | number (rows)  | Maximum number of rows transferred in each query record batch. A higher value may speed up query results, but requires more memory in conjunction with `odbc_max_bytes_per_batch`. Default is `65536`. |
+| `odbc_max_text_size`          | number (bytes) | Maximum size of text columns transmitted between the ODBC driver and the Runtime. Default is unset (allocates driver-reported max column size). |
+| `odbc_max_binary_size`        | number (bytes) | Maximum size of binary columns transmitted between the ODBC driver and the Runtime. Default is unset (allocates driver-reported max column size). |
 | `odbc_connection_string`      | string         | Connection string to use to connect to the ODBC server |

 ```yaml
 datasets:
   - from: odbc:my.cool.table
     name: cool_dataset
     params:
       odbc_connection_string: dsn=my_dsn
 ```

 ## Selecting SQL Dialect

-The default SQL dialect may not be supported by every ODBC connection. The `sql_dialect` parameter allows overriding the selected SQL dialect for a specified connection.
+The default SQL dialect may not be supported by every ODBC connection. The `sql_dialect` parameter supports overriding the selected SQL dialect for a specified connection.

-The runtime will attempt to detect the dialect to use for a connection based on the contents of `Driver=` in the `odbc_connection_string`. The runtime will usually detect the correct SQL dialect for the following connection types:
+The runtime will attempt to detect the dialect to use for a connection based on the contents of `Driver=` in the `odbc_connection_string`. The runtime will detect the correct SQL dialect for the following connection types, when set up with a standard driver configuration:

 - PostgreSQL
 - MySQL

 docker pull spiceai/spiceai:latest
 docker pull spiceai/spiceai:0.20.0-beta
 ```

-
 ## Baking an image with ODBC Support

 There are many dozens of ODBC adapters; this recipe covers making a custom image and configuring it to work with Spice.

 See the [ODBC Quickstart](https://github.com/spiceai/quickstarts/blob/trunk/odbc

 ## Secrets

-Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
\ No newline at end of file
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
diff --git a/spiceaidocs/docs/components/data-connectors/s3.md b/spiceaidocs/docs/components/data-connectors/s3.md
index 89caf702..1dd7d636 100644
--- a/spiceaidocs/docs/components/data-connectors/s3.md
+++ b/spiceaidocs/docs/components/data-connectors/s3.md
@@ -6,7 +6,7 @@ description: 'S3 Data Connector Documentation'

 The S3 Data Connector enables federated SQL querying on files stored in S3 or S3-compatible systems (e.g., MinIO, Cloudflare R2).

-If a folder is provided, all child files will be loaded.
+If a folder path is specified as the dataset source, all files within the folder will be loaded.

 File formats are specified using the `file_format` parameter, as described in [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats).
@@ -56,7 +56,7 @@ SELECT COUNT(*) FROM cool_dataset;

 | Parameter Name              | Description |
 | --------------------------- | ----------- |
-| `file_format`               | Specifies the data format. Required if not inferrable from from. Options: `parquet`, `csv`, `json`. |
+| `file_format`               | Specifies the data format. Required if it cannot be inferred from the object URI. Options: `parquet`, `csv`, `json`. |
 | `s3_endpoint`               | S3 endpoint URL (e.g., for MinIO). Default is the region endpoint. E.g. `s3_endpoint: https://my.minio.server` |
 | `s3_region`                 | S3 bucket region. Default: `us-east-1`. |
 | `client_timeout`            | Timeout for S3 operations. Default: `30s`. |
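
For illustration, the parameters in the patched table combine into a dataset definition like the following sketch. The bucket name, folder path, and MinIO endpoint here are hypothetical, not taken from the docs; a folder-path `from` means the format cannot be inferred from the URI, so `file_format` must be set explicitly:

```yaml
datasets:
  # Hypothetical bucket and folder prefix; all files within the folder are loaded
  - from: s3://my-company-bucket/reports/
    name: reports
    params:
      file_format: parquet                  # required: no file extension in the URI to infer from
      s3_endpoint: https://my.minio.server  # hypothetical S3-compatible (MinIO) endpoint
      s3_region: us-east-1                  # the default region
      client_timeout: 60s                   # extend the 30s default for slower links
```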