docs: Clarify S3 and ODBC docs (#646)
peasee authored Nov 20, 2024
1 parent c82314e commit 5fe59a8
Showing 2 changed files with 13 additions and 14 deletions.
23 changes: 11 additions & 12 deletions spiceaidocs/docs/components/data-connectors/odbc.md
@@ -4,8 +4,7 @@ sidebar_label: 'ODBC Data Connector'
description: 'ODBC Data Connector Documentation'
---

ODBC (Open Database Connectivity) is a standard API that allows applications to connect to and interact with various database management systems using a common interface. To connect to any ODBC database for federated/accelerated SQL queries, specify `odbc` as the selector in the `from` value for the dataset. The `odbc_connection_string` parameter is required.
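
A minimal dataset definition following this pattern might look like the sketch below (the table path, dataset name, and DSN are placeholders):

```yaml
datasets:
  - from: odbc:path.to.my_table          # `odbc` selector followed by the table path
    name: my_dataset                     # table name exposed within Spice
    params:
      odbc_connection_string: dsn=my_odbc_connection  # required; references a configured DSN
```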

:::warning

@@ -14,11 +13,13 @@ Spice must be [built with the `odbc` feature](#building-spice-with-odbc), and th
Alternatively, use the official Spice Docker image. To use the official Spice Docker image from [DockerHub](https://hub.docker.com/r/spiceai/spiceai):

# Pull the latest official Spice image

```bash
docker pull spiceai/spiceai:latest
```

# Pull the official v0.20.0-beta Spice image

```bash
docker pull spiceai/spiceai:0.20.0-beta
```
@@ -91,6 +92,7 @@ The `from` field takes the form `odbc:path.to.my.dataset` where `path.to.my.data
The dataset name. This will be used as the table name within Spice.

Example:

```yaml
datasets:
- from: odbc:my.cool.table
```

@@ -113,15 +115,13 @@ SELECT COUNT(*) FROM cool_dataset;

### `params`

The following [arrow_odbc builder parameters](https://docs.rs/arrow-odbc/latest/arrow_odbc/struct.OdbcReaderBuilder.html) are exposed as params:

| Parameter | Type | Description |
| ----------------------------- | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sql_dialect` | string | Override what SQL dialect is used for the ODBC connection. Supports `postgresql`, `mysql`, `sqlite`, `athena` or `databricks` values. Default is unset (auto-detected). |
| `odbc_max_bytes_per_batch`    | number (bytes) | Maximum number of bytes transferred in each query record batch. A lower value may improve performance on low-memory systems. Default is `512_000_000`. |
| `odbc_max_num_rows_per_batch` | number (rows)  | Maximum number of rows transferred in each query record batch. A higher value may speed up query results, but requires more memory in conjunction with `odbc_max_bytes_per_batch`. Default is `65536`. |
| `odbc_max_text_size`          | number (bytes) | A limit for the maximum size of text columns transmitted between the ODBC driver and the Runtime. Default is unset (allocates driver-reported max column size). |
| `odbc_max_binary_size`        | number (bytes) | A limit for the maximum size of binary columns transmitted between the ODBC driver and the Runtime. Default is unset (allocates driver-reported max column size). |
| `odbc_connection_string`      | string         | Connection string to use to connect to the ODBC server. |

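As a rough sketch of how these parameters combine (the DSN, dataset path, and values below are illustrative only), a low-memory host might reduce the batch limits and pin the dialect explicitly:

```yaml
datasets:
  - from: odbc:my.cool.table
    name: cool_dataset
    params:
      odbc_connection_string: dsn=my_odbc_connection
      sql_dialect: postgresql             # skip auto-detection and force the PostgreSQL dialect
      odbc_max_bytes_per_batch: 256000000 # smaller transit buffer for constrained memory
      odbc_max_num_rows_per_batch: 32768  # fewer rows per batch to stay under the byte limit
```
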
@@ -134,9 +134,9 @@ datasets:

## Selecting SQL Dialect

The default SQL dialect may not be supported by every ODBC connection. The `sql_dialect` parameter supports overriding the selected SQL dialect for a specified connection.

The runtime will attempt to detect the dialect to use for a connection based on the contents of `Driver=` in the `odbc_connection_string`. The runtime will detect the correct SQL dialect for the following connection types, when set up with a standard driver configuration:

- PostgreSQL
- MySQL
@@ -169,7 +169,6 @@ docker pull spiceai/spiceai:latest

```bash
docker pull spiceai/spiceai:0.20.0-beta
```


## Baking an image with ODBC Support

There are many dozens of ODBC adapters; this recipe covers making a custom image and configuring it to work with Spice.
@@ -331,4 +330,4 @@ See the [ODBC Quickstart](https://github.com/spiceai/quickstarts/blob/trunk/odbc

## Secrets

Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
4 changes: 2 additions & 2 deletions spiceaidocs/docs/components/data-connectors/s3.md
@@ -6,7 +6,7 @@ description: 'S3 Data Connector Documentation'

The S3 Data Connector enables federated SQL querying on files stored in S3 or S3-compatible systems (e.g., MinIO, Cloudflare R2).

If a folder path is specified as the dataset source, all files within the folder will be loaded.

File formats are specified using the `file_format` parameter, as described in [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats).

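As an illustrative sketch (the bucket name and folder path are placeholders), a dataset over a folder of Parquet files might be declared like this:

```yaml
datasets:
  - from: s3://my-bucket/path/to/data/   # folder path: all files within the folder are loaded
    name: cool_dataset
    params:
      file_format: parquet               # required when the format cannot be inferred from the path
      s3_region: us-east-1
```
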
@@ -56,7 +56,7 @@ SELECT COUNT(*) FROM cool_dataset;

| Parameter Name | Description |
| --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `file_format`               | Specifies the data format. Required if it cannot be inferred from the object URI. Options: `parquet`, `csv`, `json`.                             |
| `s3_endpoint` | S3 endpoint URL (e.g., for MinIO). Default is the region endpoint. E.g. `s3_endpoint: https://my.minio.server` |
| `s3_region` | S3 bucket region. Default: `us-east-1`. |
| `client_timeout` | Timeout for S3 operations. Default: `30s`. |
