Improve structure for v0.9 release (#122)
* Restructure for v0.9.1 release

* Additional content

* Tweak what is spice

* Fix links

* Update Twitter

* Fix links
lukekim authored Mar 20, 2024
1 parent 2704cb2 commit 4507c9e
Showing 23 changed files with 325 additions and 199 deletions.
2 changes: 1 addition & 1 deletion spiceaidocs/config.toml
@@ -125,7 +125,7 @@ footer_about_disable = false
# End user relevant links. These will show up on left side of footer and in the community page if you have one.
[[params.links.developer]]
name ="Twitter"
url = "https://twitter.com/SpiceAIHQ"
url = "https://twitter.com/spice_ai"
icon = "fab fa-twitter"
desc = "Follow us on Twitter to get the latest news!"
# Developer relevant links. These will show up on right side of footer and in the community page if you have one.
9 changes: 0 additions & 9 deletions spiceaidocs/content/en/Connectors/_index.md

This file was deleted.

3 changes: 2 additions & 1 deletion spiceaidocs/content/en/_index.md
@@ -4,9 +4,10 @@ no_list: true
---

# Spice

## What is Spice?

**Spice** is a small, portable runtime that provides developers with a unified SQL query interface to locally accelerate and query data tables sourced from any database, data warehouse, or data lake.
**Spice** is a small, portable runtime that provides developers with a unified SQL query interface to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.

Spice makes it easy to build data-driven and data-intensive applications by streamlining the use of data and machine learning (ML) in software.

168 changes: 84 additions & 84 deletions spiceaidocs/content/en/acknowledgements/_index.md

Large diffs are not rendered by default.

23 changes: 11 additions & 12 deletions spiceaidocs/content/en/cli/_index.md
@@ -1,9 +1,9 @@
---
type: docs
title: "Spice.ai CLI documentation"
linkTitle: "CLI"
weight: 60
description: "Detailed documentation on the Spice.ai CLI"
title: 'Spice.ai CLI documentation'
linkTitle: 'CLI'
weight: 100
description: 'Detailed documentation on the Spice.ai CLI'
---

The Spice.ai CLI is a set of commands to create and manage Spice.ai pods and interact with the Spice.ai runtime.
@@ -45,14 +45,13 @@ spice add spiceai/quickstart

Common commands are:

| Command | Description |
| ----------------- | ------------------------------------------------------------------- |
| spice add | Add Pod - adds a pod to the project |
| spice run | Run Spice - starts the Spice runtime, installing if necessary |
| spice version | Spice CLI version |
| spice help | Help about any command |
| spice upgrade | Upgrades the Spice CLI to the latest release |

| Command | Description |
| ------------- | ------------------------------------------------------------- |
| spice add | Add Pod - adds a pod to the project |
| spice run | Run Spice - starts the Spice runtime, installing if necessary |
| spice version | Spice CLI version |
| spice help | Help about any command |
| spice upgrade | Upgrades the Spice CLI to the latest release |

See [Spice CLI command reference]({{<ref "cli/reference">}}) for the full list of available commands.

7 changes: 7 additions & 0 deletions spiceaidocs/content/en/clients/_index.md
@@ -0,0 +1,7 @@
---
type: docs
title: 'Clients and Tools'
linkTitle: 'Clients and Tools'
weight: 110
description: 'Clients and tools'
---
File renamed without changes.
33 changes: 33 additions & 0 deletions spiceaidocs/content/en/data-accelerators/_index.md
@@ -0,0 +1,33 @@
---
type: docs
title: 'Data Accelerators'
linkTitle: 'Data Accelerators'
description: ''
weight: 80
---

Data sourced by Data Connectors can be locally materialized and accelerated using a Data Accelerator.

Acceleration is enabled on a dataset by setting the acceleration configuration. For example:

```yaml
datasets:
- name: accelerated_dataset
acceleration:
enabled: true
```

For the complete reference specification see [datasets]({{<ref "reference/spicepod/datasets">}}).

By default, datasets will be locally materialized using in-memory Arrow records.

Data Accelerators using DuckDB, SQLite, or PostgreSQL engines can be used to materialize data in files or attached databases.
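
For example, a minimal sketch of accelerating a dataset into a DuckDB file instead of the default in-memory Arrow records; the `mode` field is an assumption based on the Engine Modes column in the table below:

```yaml
datasets:
  - name: accelerated_dataset
    acceleration:
      enabled: true
      engine: duckdb # embedded DuckDB engine rather than the default Arrow records
      mode: file     # materialize to a file (assumed field name for the engine mode)
```
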
Currently supported Data Accelerators include:
| Engine Name | Description | Status | Engine Modes |
| ---------------------------------------------------- | ----------------------- | ------ | ---------------- |
| `arrow` | In-Memory Arrow Records | Alpha | `memory` |
| `duckdb` | Embedded DuckDB | Alpha | `memory`, `file` |
| `sqlite` | Embedded SQLite | Alpha | `memory`, `file` |
| [`postgres`]({{<ref "data-accelerators/postgres">}}) | Attached PostgreSQL | Alpha | |
66 changes: 66 additions & 0 deletions spiceaidocs/content/en/data-accelerators/postgres/_index.md
@@ -0,0 +1,66 @@
---
type: docs
title: 'PostgreSQL Data Accelerator'
linkTitle: 'PostgreSQL Data Accelerator'
description: 'PostgreSQL Data Accelerator Documentation'
---

To use PostgreSQL as a Data Accelerator, specify `postgres` as the `engine` for acceleration.

```yaml
datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
acceleration:
engine: postgres
```

## Configuration

The connection to PostgreSQL can be configured by providing the following `params`:

- `pg_host`: The hostname of the PostgreSQL server.
- `pg_port`: The port of the PostgreSQL server.
- `pg_db`: The name of the database to connect to.
- `pg_user`: The username to connect with.
- `pg_pass_key`: The secret key containing the password to connect with.
- `pg_pass`: The plain-text password to connect with, ignored if `pg_pass_key` is provided.

Configuration `params` are provided either at the top level of the `dataset` (for a dataset source or federated SQL query) or in the `acceleration` section (for a data store).

```yaml
datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
acceleration:
engine: postgres
params:
pg_host: localhost
pg_port: 5432
pg_db: my_database
pg_user: my_user
pg_pass_key: my_secret
```

Additionally, an `engine_secret` may be provided when configuring a PostgreSQL data store, allowing a different secret store to specify the password when PostgreSQL is used as both the data source and the data store.

```yaml
datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
params:
pg_host: localhost
pg_port: 5432
pg_db: data_store
pg_user: my_user
pg_pass_key: my_secret
acceleration:
engine: postgres
engine_secret: pg_backend
params:
pg_host: localhost
pg_port: 5433
pg_db: data_store
pg_user: my_user
pg_pass_key: my_secret
```
22 changes: 22 additions & 0 deletions spiceaidocs/content/en/data-connectors/_index.md
@@ -0,0 +1,22 @@
---
type: docs
title: 'Data Connectors'
linkTitle: 'Data Connectors'
description: ''
weight: 70
---

Data Connectors provide connections to databases, data warehouses, and data lakes for federated SQL queries and data replication.

Currently supported Data Connectors include:

| Name | Description | Status | Protocol/Format | Refresh Modes | Supports Inserts |
| ------------ | ----------- | ------------ | ---------------- | ---------------- | ---------------- |
| `databricks` | Databricks | Alpha | Delta Lake | `full` ||
| `postgres` | PostgreSQL | Alpha | | `full` ||
| `spiceai` | Spice.ai | Alpha | Arrow Flight | `append`, `full` ||
| `s3` | S3 | Alpha | Parquet | `full` ||
| `dremio` | Dremio | Alpha | Arrow Flight SQL | `full` ||
| `snowflake` | Snowflake | Coming soon! | Arrow Flight SQL | `full` ||
| `bigquery` | BigQuery | Coming soon! | Arrow Flight SQL | `full` ||
| `mysql` | MySQL | Coming soon! | | `full` ||
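
As a minimal sketch drawn from the connector examples elsewhere in this commit, the Data Connector for a dataset is selected by the prefix of its `from` value:

```yaml
datasets:
  # PostgreSQL Data Connector (configured via pg_* params)
  - from: postgres:path.to.my_dataset
    name: my_dataset
  # S3 Data Connector reading a Parquet file
  - from: s3://s3-bucket-name/path/to/parquet/cool_dataset.parquet
    name: cool_dataset
```
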
@@ -1,12 +1,10 @@
---
type: docs
title: "PostgreSQL"
linkTitle: "PostgreSQL"
description: 'PostgreSQL reference'
title: 'PostgreSQL Data Connector'
linkTitle: 'PostgreSQL Data Connector'
description: 'PostgreSQL Data Connector Documentation'
---

PostgreSQL can be used by the Spice runtime as a dataset source, a data store, or for federated SQL query.

## Dataset Source/Federated SQL Query

To use PostgreSQL as a dataset source or for federated SQL query, specify `postgres` as the selector in the `from` value for the dataset.
@@ -17,18 +15,6 @@ datasets:
name: my_dataset
```
## Data Store
To use PostgreSQL as a data store for dataset acceleration, specify `postgres` as the `engine` for the dataset.

```yaml
datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
acceleration:
engine: postgres
```

## Configuration
The connection to PostgreSQL can be configured by providing the following `params`:
@@ -42,34 +28,16 @@ The connection to PostgreSQL can be configured by providing the following `param

Configuration `params` are provided either in the top level `dataset` for a dataset source and federated SQL query, or in the `acceleration` section for a data store.

### Dataset Source/Federated SQL Query

```yaml
datasets:
- from: postgres:path.to.my_dataset
name: my_dataset
params:
pg_host: localhost
pg_port: 5432
pg_db: my_database
pg_user: my_user
pg_pass_key: my_secret
```

### Data Store

```yaml
datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
acceleration:
engine: postgres
params:
pg_host: localhost
pg_port: 5432
pg_db: my_database
pg_user: my_user
pg_pass_key: my_secret
pg_host: localhost
pg_port: 5432
pg_db: my_database
pg_user: my_user
pg_pass_key: my_secret
```

Additionally, an `engine_secret` may be provided when configuring a PostgreSQL data store to allow for using a different secret store to specify the password for a dataset using PostgreSQL as both the data source and data store.
Expand All @@ -79,18 +47,18 @@ datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
params:
pg_host: localhost
pg_port: 5432
pg_db: data_store
pg_user: my_user
pg_pass_key: my_secret
acceleration:
engine: postgres
engine_secret: pg_backend
params:
pg_host: localhost
pg_port: 5432
pg_port: 5433
pg_db: data_store
pg_user: my_user
pg_pass_key: my_secret
acceleration:
engine: postgres
engine_secret: pg_backend
params:
pg_host: localhost
pg_port: 5433
pg_db: data_store
pg_user: my_user
pg_pass_key: my_secret
```
```
@@ -1,41 +1,44 @@
---
type: docs
title: "S3 Data Connector"
linkTitle: "S3 Data Connector"
description: 'S3 Data Connector YAML reference'
title: 'S3 Data Connector'
linkTitle: 'S3 Data Connector'
description: 'S3 Data Connector Documentation'
---

The S3 Data Connector enables federated SQL query across Parquet files stored in S3, or S3-compatible storage solutions (e.g. Minio, Cloudflare R2).

## `params`

- `endpoint`: The S3 endpoint, or equivalent (e.g. Minio endpoint), for the S3-compatible storage.
- `region`: Region of the S3 bucket, if region specific.
- `endpoint`: The S3 endpoint, or equivalent (e.g. Minio endpoint), for the S3-compatible storage.
- `region`: Region of the S3 bucket, if region specific.

## `auth`

Check [Secrets]({{<ref "secrets">}}).
Check [Secrets Stores]({{<ref "secret-stores">}}).

Required attributes:

- `key`: The access key authorised to access the S3 data (e.g. `AWS_ACCESS_KEY_ID` for AWS)
- `secret`: The secret key authorised to access the S3 data (e.g. `AWS_SECRET_ACCESS_KEY` for AWS)


## Example

### Minio

```yaml
- from: s3://s3-bucket-name/path/to/parquet/cool_dataset.parquet
name: cool_dataset
params:
endpoint: https://my.minio.server
region: "us-east-1" # Best practice for Minio
region: 'us-east-1' # Best practice for Minio
```
### S3
```yaml
- from: s3://my-startups-data/path/to/parquet/cool_dataset.parquet
name: cool_dataset
params:
endpoint: http://my-startups-data.s3.amazonaws.com
region: "ap-southeast-2"
```
region: 'ap-southeast-2'
```
13 changes: 13 additions & 0 deletions spiceaidocs/content/en/data-ingestion/_index.md
@@ -0,0 +1,13 @@
---
type: docs
title: 'Data Ingestion'
linkTitle: 'Data Ingestion'
description: ''
weight: 40
---

Data can be ingested by the Spice runtime for replication to a Data Connector, like PostgreSQL or the Spice.ai Cloud platform.

By default, the runtime exposes an [OpenTelemetry](https://opentelemetry.io) (OTEL) endpoint at `grpc://127.0.0.1:50052` for data ingestion.

OTEL metrics will be inserted into datasets with matching names (metric name = dataset name) and optionally replicated to the dataset source.
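
A hypothetical sketch of a dataset that receives OTEL metrics and replicates them to its source; the `replication` block and the `from` path are illustrative assumptions, not taken from this commit:

```yaml
datasets:
  - from: spiceai:path.to.my_metric # dataset name must match the OTEL metric name
    name: my_metric
    replication:
      enabled: true # assumed field; replicate ingested data back to the dataset source
```
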
7 changes: 7 additions & 0 deletions spiceaidocs/content/en/federated-queries/_index.md
@@ -0,0 +1,7 @@
---
type: docs
title: 'Federated Queries'
linkTitle: 'Federated Queries'
description: ''
weight: 20
---