From a8fe1bd7112305f291ad97ae6dbd238aa31b8f3f Mon Sep 17 00:00:00 2001
From: Phillip LeBlanc
Date: Tue, 29 Oct 2024 01:29:51 +0900
Subject: [PATCH 1/4] Add localpod docs

---
 .../components/data-connectors/localpod.md | 39 +++++++++++++++++++
 1 file changed, 39 insertions(+)
 create mode 100644 spiceaidocs/docs/components/data-connectors/localpod.md

diff --git a/spiceaidocs/docs/components/data-connectors/localpod.md b/spiceaidocs/docs/components/data-connectors/localpod.md
new file mode 100644
index 00000000..0039aaa2
--- /dev/null
+++ b/spiceaidocs/docs/components/data-connectors/localpod.md
@@ -0,0 +1,39 @@
+---
+title: 'Localpod Data Connector'
+sidebar_label: 'Localpod Data Connector'
+description: 'Localpod Data Connector Documentation'
+pagination_prev: null
+---
+
+The Localpod Data Connector enables setting up a parent/child relationship between datasets in the current Spicepod. This is useful for configuring multiple/tiered accelerations for a single dataset, and ensuring that the data is only downloaded once from the remote source.
+
+The dataset created by the `localpod` connector will logically have the same data as the parent dataset.
+
+## Synchronized Refreshes
+
+The `localpod` connector supports synchronized refreshes, which ensures that the child dataset is refreshed from the same data as the parent dataset. Synchronized refreshes require that both the parent and child datasets are accelerated with `refresh_mode: full` (which is the default).
+
+When synchronization is enabled, the following logs will be emitted:
+
+```bash
+2024-10-28T15:45:24.220665Z INFO runtime::datafusion: Localpod dataset test_local synchronizing refreshes with parent table test
+```
+
+### Examples
+
+```yaml
+datasets:
+- from: postgres:cleaned_sales_data
+  name: test
+  params:
+    ...
+  acceleration:
+    enabled: true # This dataset will be accelerated into a DuckDB file
+    engine: duckdb
+    mode: file
+    refresh_check_interval: 10s
+- from: localpod:test
+  name: test_local
+  acceleration:
+    enabled: true # This dataset accelerates the parent `test` dataset into in-memory Arrow records and is synchronized with the parent
+```

From b84a12be9f6d0612139dcbc4ced94dbc1a77d4dd Mon Sep 17 00:00:00 2001
From: peasee <98815791+peasee@users.noreply.github.com>
Date: Wed, 30 Oct 2024 13:29:47 +1000
Subject: [PATCH 2/4] docs: Add localpod to connectors table

---
 spiceaidocs/docs/components/data-connectors/index.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/spiceaidocs/docs/components/data-connectors/index.md b/spiceaidocs/docs/components/data-connectors/index.md
index ce0d2ce9..97c47855 100644
--- a/spiceaidocs/docs/components/data-connectors/index.md
+++ b/spiceaidocs/docs/components/data-connectors/index.md
@@ -19,7 +19,7 @@ Currently supported Data Connectors include:
 | `debezium` | Debezium | Alpha | CDC, Kafka | `append`, `full`, `changes` | ❌ | ❌ |
 | `delta_lake` | Delta Lake | Beta | Delta Lake | `append`, `full` | Roadmap | ❌ |
 | `dremio` | Dremio | Alpha | Arrow Flight SQL | `append`, `full` | ❌ | ❌ |
-| `file` | File | Alpha | Parquet, CSV | `append`, `full` | Roadmap | ✅ |
+| `file` | File | Alpha | Parquet, CSV | `append`, `full` | Roadmap | ✅ |
 | `flightsql` | FlightSQL | Beta | Arrow Flight SQL | `append`, `full` | ❌ | ❌ |
 | `ftp`, `sftp` | FTP/SFTP | Alpha | Parquet, CSV | `append`, `full` | ❌ | ✅ |
 | `github` | GitHub | Beta | GraphQL, REST | `append`, `full` | ❌ | ❌ |
@@ -27,13 +27,14 @@ Currently supported Data Connectors include:
 | `http`, `https` | HTTP(s) | Alpha | Parquet, CSV | `append`, `full` | ❌ | ❌ |
 | `mssql` | MS SQL Server | Alpha | Tabular Data Stream (TDS) | `append`, `full` | ❌ | ❌ |
 | `mysql` | MySQL | Beta | | `append`, `full` | Roadmap | ❌ |
-| `odbc` | ODBC | Beta | | `append`, `full` | ❌ | ❌ |
 | `postgres` | PostgreSQL | Beta | | `append`, `full` | Roadmap | ❌ |
 | `s3` | S3 | Beta | Parquet, CSV | `append`, `full` | Roadmap | ✅ |
 | `sharepoint` | SharePoint | Alpha | | `append`, `full` | ❌ | ✅ |
 | `snowflake` | Snowflake | Alpha | Arrow | `append`, `full` | Roadmap | ❌ |
 | `spiceai` | Spice.ai | Beta | Arrow Flight | `append`, `full` | ✅ | ❌ |
 | `spark` | Spark | Alpha | Spark Connect | `append`, `full` | ❌ | ❌ |
+| `localpod` | Local dataset replication | Alpha | | `append`, `full` | ❌ | ✅ |
 
 ## Object Store File Formats

From f447d6542d7aa3d37dd0d83317a9865023e16228 Mon Sep 17 00:00:00 2001
From: peasee <98815791+peasee@users.noreply.github.com>
Date: Wed, 30 Oct 2024 13:31:13 +1000
Subject: [PATCH 3/4] docs: Re-order localpod position in table

---
 spiceaidocs/docs/components/data-connectors/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spiceaidocs/docs/components/data-connectors/index.md b/spiceaidocs/docs/components/data-connectors/index.md
index 742de8fe..83e2b57d 100644
--- a/spiceaidocs/docs/components/data-connectors/index.md
+++ b/spiceaidocs/docs/components/data-connectors/index.md
@@ -30,11 +30,11 @@ Currently supported Data Connectors include:
 | `ftp`, `sftp` | FTP/SFTP | Alpha | Parquet, CSV | `append`, `full` | ❌ | ✅ |
 | `graphql` | GraphQL | Alpha | GraphQL | `append`, `full` | ❌ | ❌ |
 | `http`, `https` | HTTP(s) | Alpha | Parquet, CSV | `append`, `full` | ❌ | ❌ |
+| `localpod` | Local dataset replication | Alpha | | `append`, `full` | ❌ | ✅ |
 | `mssql` | MS SQL Server | Alpha | Tabular Data Stream (TDS) | `append`, `full` | ❌ | ❌ |
 | `sharepoint` | SharePoint | Alpha | | `append`, `full` | ❌ | ✅ |
 | `snowflake` | Snowflake | Alpha | Arrow | `append`, `full` | Roadmap | ❌ |
 | `spark` | Spark | Alpha | Spark Connect | `append`, `full` | ❌ | ❌ |
-| `localpod` | Local dataset replication | Alpha | | `append`, `full` | ❌ | ✅ |
 
 ## Object Store File Formats

From 6b13accc874d287a016005795f60d570f66f1d93 Mon Sep 17 00:00:00 2001
From: Phillip LeBlanc
Date: Thu, 31 Oct 2024 22:44:01 +0900
Subject: [PATCH 4/4] wip

---
 spiceaidocs/docs/components/data-connectors/localpod.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spiceaidocs/docs/components/data-connectors/localpod.md b/spiceaidocs/docs/components/data-connectors/localpod.md
index 0039aaa2..83aec516 100644
--- a/spiceaidocs/docs/components/data-connectors/localpod.md
+++ b/spiceaidocs/docs/components/data-connectors/localpod.md
@@ -5,7 +5,7 @@ description: 'Localpod Data Connector Documentation'
 pagination_prev: null
 ---
 
-The Localpod Data Connector enables setting up a parent/child relationship between datasets in the current Spicepod. This is useful for configuring multiple/tiered accelerations for a single dataset, and ensuring that the data is only downloaded once from the remote source.
+The Localpod Data Connector enables setting up a parent/child relationship between datasets in the current Spicepod. This can be used for configuring multiple/tiered accelerations for a single dataset, and ensuring that the data is only downloaded once from the remote source. For example, you can use the `localpod` connector to create a child dataset that is accelerated in-memory, while the parent dataset is accelerated to a file.
 
 The dataset created by the `localpod` connector will logically have the same data as the parent dataset.