docs: Release DuckDB RC #629

Merged 2 commits on Nov 12, 2024
2 changes: 2 additions & 0 deletions spiceaidocs/docs/components/data-connectors/duckdb.md
@@ -24,6 +24,8 @@ The DuckDB data connector can be configured by providing the following `params`:

Configuration `params` are provided either in the top level `dataset` for a dataset source, or in the `acceleration` section for a data store.

The DuckDB data connector supports specifying an [`invalid_type_action` dataset parameter](../../reference/spicepod/datasets.md#invalid_type_action), which modifies the Runtime's behavior when it encounters a data type the connector does not support.

A generic example of DuckDB data connector configuration is shown below.

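A minimal sketch, assuming the `duckdb_open` connection parameter and illustrative identifiers:

```yaml
datasets:
  - from: duckdb:sample_db.my_table # illustrative database.table identifier
    name: my_table
    params:
      duckdb_open: ./sample.duckdb # path to a local DuckDB database file
```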
2 changes: 1 addition & 1 deletion spiceaidocs/docs/components/data-connectors/index.md
@@ -13,6 +13,7 @@ Currently supported Data Connectors include:

| Name | Description | Status | Protocol/Format | Refresh Modes | Supports [Ingestion][ingestion] | Supports Documents |
| --------------- | ------------------------- | ----------------- | ----------------------------------- | --------------------------- | ------------------------------- | ------------------ |
| `duckdb` | DuckDB | Release Candidate | | `append`, `full` | ❌ | ❌ |
| `github` | GitHub | Release Candidate | GraphQL, REST | `append`, `full` | ❌ | ❌ |
| `mysql` | MySQL | Release Candidate | | `append`, `full` | Roadmap | ❌ |
| `postgres` | PostgreSQL | Release Candidate | | `append`, `full` | Roadmap | ❌ |
@@ -26,7 +27,6 @@ Currently supported Data Connectors include:
| `clickhouse` | Clickhouse | Alpha | | `append`, `full` | ❌ | ❌ |
| `debezium` | Debezium | Alpha | CDC, Kafka | `append`, `full`, `changes` | ❌ | ❌ |
| `dremio` | Dremio | Alpha | Arrow Flight SQL | `append`, `full` | ❌ | ❌ |
| `file` | File | Alpha | Parquet, CSV | `append`, `full` | Roadmap | ✅ |
| `ftp`, `sftp` | FTP/SFTP | Alpha | Parquet, CSV | `append`, `full` | ❌ | ✅ |
| `graphql` | GraphQL | Alpha | GraphQL | `append`, `full` | ❌ | ❌ |
24 changes: 22 additions & 2 deletions spiceaidocs/docs/reference/spicepod/datasets.md
@@ -144,6 +144,25 @@ Spice emits a warning if the `time_column` from the data source is incompatible
:::warning[Limitations]

- String-based columns are assumed to be in ISO 8601 format.

:::

## `invalid_type_action`

Optional. Specifies the action the Runtime takes when it encounters a data type the data connector does not support. A configuration sketch follows the list of values below.

The following values are supported:

- `error` - Default. Return an error when an unsupported data type is encountered.
- `warn` - Log a warning and ignore the column containing the unsupported data type.
- `ignore` - Log nothing and ignore the column containing the unsupported data type.
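
For example, a minimal sketch of a DuckDB dataset that logs a warning and drops unsupported columns (the table identifier and file path are illustrative, and `duckdb_open` is assumed as the connection parameter):

```yaml
datasets:
  - from: duckdb:sample_db.my_table # illustrative database.table identifier
    name: my_table
    params:
      duckdb_open: ./sample.duckdb # assumed path to a local DuckDB file
      invalid_type_action: warn # log a warning and ignore unsupported columns
```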

:::warning[Limitations]

Not all connectors support specifying an `invalid_type_action`. When it is specified on a connector that does not support it, the connector fails to register. The following connectors support `invalid_type_action`:

- [DuckDB](../../components/data-connectors/duckdb.md)

:::

## `acceleration`
Expand Down Expand Up @@ -196,6 +215,7 @@ Must be of the form `SELECT * FROM {name} WHERE {refresh_filter}`. `{name}` is t
- The refresh SQL only supports filtering data from the current dataset - joining across other datasets is not supported.
- Selecting a subset of columns isn't supported - the refresh SQL needs to start with `SELECT * FROM {name}`.
- Queries for data that have been filtered out will not fall back to querying against the federated table.

:::
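
A sketch of a refresh SQL configuration, with an illustrative dataset and filter column:

```yaml
datasets:
  - from: spice.ai/eth.recent_blocks # illustrative source
    name: eth_recent_blocks
    acceleration:
      enabled: true
      refresh_sql: |
        SELECT * FROM eth_recent_blocks WHERE number > 19500000
```

Note the query selects all columns from the dataset name itself, matching the required `SELECT * FROM {name}` form.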

## `acceleration.refresh_data_window`
Expand Down Expand Up @@ -230,8 +250,8 @@ Optional. Defines the maximum number of retry attempts when refresh retries are

Supports one of two values:

- `on_registration`: Mark the dataset as ready immediately, and queries on this table will fall back to the underlying source directly until the initial acceleration is complete
- `on_load`: Mark the dataset as ready only after the initial acceleration. Queries against the dataset will return an error before the load has been completed.

```yaml
datasets:
  - from: spice.ai/eth.recent_blocks # illustrative source
    name: eth_recent_blocks
    acceleration:
      enabled: true
      ready_state: on_registration # illustrative; queries fall back to the source until the first load completes
```