From 0be040543bb3419abd8f71990fa08b28e266a922 Mon Sep 17 00:00:00 2001
From: Luke Kim <80174+lukekim@users.noreply.github.com>
Date: Mon, 7 Oct 2024 16:14:51 -0700
Subject: [PATCH] Add Arrow Data Accelerator documentation (#443)

* Add Arrow Data Accelerator documentation

Fixes #223

Add documentation for the In-Memory Arrow Data Accelerator.
---
 .../components/data-accelerators/arrow.md | 33 +++++++++++++++++++
 .../components/data-accelerators/index.md |  8 ++---
 2 files changed, 37 insertions(+), 4 deletions(-)
 create mode 100644 spiceaidocs/docs/components/data-accelerators/arrow.md

diff --git a/spiceaidocs/docs/components/data-accelerators/arrow.md b/spiceaidocs/docs/components/data-accelerators/arrow.md
new file mode 100644
index 000000000..17e2e2503
--- /dev/null
+++ b/spiceaidocs/docs/components/data-accelerators/arrow.md
@@ -0,0 +1,33 @@
+---
+title: 'In-Memory Arrow Data Accelerator'
+sidebar_label: 'In-Memory Arrow Data Accelerator'
+description: 'In-Memory Arrow Data Accelerator Documentation'
+sidebar_position: 1
+---
+
+The In-Memory Arrow Data Accelerator is the default data accelerator in Spice. It uses Apache Arrow to store data in memory for fast access and query performance.
+
+## Configuration
+
+To use the In-Memory Arrow Data Accelerator, set `arrow` as the `engine` in the dataset's `acceleration` configuration.
+
+```yaml
+datasets:
+  - from: spice.ai:path.to.my_dataset
+    name: my_dataset
+    acceleration:
+      engine: arrow
+```
+
+## Limitations
+
+- The In-Memory Arrow Data Accelerator does not support persistent storage. Data is held only in memory and is lost when the Spice runtime stops.
+- The In-Memory Arrow Data Accelerator does not support `Decimal256` (up to 76 digits of precision); the maximum supported decimal precision is 38 digits (`Decimal128`).
+
+:::warning[Memory Considerations]
+
+When accelerating a dataset with the In-Memory Arrow Data Accelerator, some or all of the dataset is loaded into memory. Ensure sufficient memory is available, including overhead for queries and the runtime, especially when running concurrent queries.
+
+To reduce memory pressure, acceleration data can instead be stored on disk, which the [`duckdb`](./duckdb.md) and [`sqlite`](./sqlite.md) accelerators support by setting `mode: file`.
+
+:::
diff --git a/spiceaidocs/docs/components/data-accelerators/index.md b/spiceaidocs/docs/components/data-accelerators/index.md
index b84eef92c..52a4fb078 100644
--- a/spiceaidocs/docs/components/data-accelerators/index.md
+++ b/spiceaidocs/docs/components/data-accelerators/index.md
@@ -30,10 +30,10 @@ Supported Data Accelerators include:
 
 | Engine Name                       | Description             | Status | Engine Modes     |
 | --------------------------------- | ----------------------- | ------ | ---------------- |
-| `arrow`                           | In-Memory Arrow Records | Alpha  | `memory`         |
-| [`duckdb`](./duckdb.md)           | Embedded DuckDB         | Alpha  | `memory`, `file` |
-| [`sqlite`](./sqlite.md)           | Embedded SQLite         | Alpha  | `memory`, `file` |
-| [`postgres`](./postgres/index.md) | Attached PostgreSQL     | Alpha  |                  |
+| [`arrow`](./arrow.md)             | In-Memory Arrow Records | Beta   | `memory`         |
+| [`duckdb`](./duckdb.md)           | Embedded DuckDB         | Beta   | `memory`, `file` |
+| [`sqlite`](./sqlite.md)           | Embedded SQLite         | Beta   | `memory`, `file` |
+| [`postgres`](./postgres/index.md) | Attached PostgreSQL     | Beta   |                  |
 
 ## Data Types
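
The on-disk alternative named in the memory warning can be sketched as a dataset configuration. This is a minimal sketch, not taken from the patch itself: it reuses the placeholder `spice.ai:path.to.my_dataset` source from the Arrow example, swaps the engine to `duckdb`, and assumes that `mode` sits next to `engine` under `acceleration` (the patch only states that `mode: file` is the setting). Per the warning, `sqlite` should accept the same `mode: file` setting; any file-path parameters are left out because the patch does not document them.

```yaml
datasets:
  - from: spice.ai:path.to.my_dataset   # placeholder source, as in the Arrow example
    name: my_dataset
    acceleration:
      engine: duckdb   # on-disk capable engine (sqlite is the other option named)
      mode: file       # assumed placement: persist acceleration data on disk instead of in memory
```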