Improve structure for v0.9 release #122

Merged · 10 commits · Mar 20, 2024
2 changes: 1 addition & 1 deletion spiceaidocs/config.toml
@@ -125,7 +125,7 @@ footer_about_disable = false
# End user relevant links. These will show up on left side of footer and in the community page if you have one.
[[params.links.developer]]
name ="Twitter"
url = "https://twitter.com/SpiceAIHQ"
url = "https://twitter.com/spice_ai"
icon = "fab fa-twitter"
desc = "Follow us on Twitter to get the latest news!"
# Developer relevant links. These will show up on right side of footer and in the community page if you have one.
9 changes: 0 additions & 9 deletions spiceaidocs/content/en/Connectors/_index.md

This file was deleted.

3 changes: 2 additions & 1 deletion spiceaidocs/content/en/_index.md
@@ -4,9 +4,10 @@ no_list: true
---

# Spice

## What is Spice?

**Spice** is a small, portable runtime that provides developers with a unified SQL query interface to locally accelerate and query data tables sourced from any database, data warehouse, or data lake.
**Spice** is a small, portable runtime that provides developers with a unified SQL query interface to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.

Spice makes it easy to build data-driven and data-intensive applications by streamlining the use of data and machine learning (ML) in software.

168 changes: 84 additions & 84 deletions spiceaidocs/content/en/acknowledgements/_index.md

Large diffs are not rendered by default.

23 changes: 11 additions & 12 deletions spiceaidocs/content/en/cli/_index.md
@@ -1,9 +1,9 @@
---
type: docs
title: "Spice.ai CLI documentation"
linkTitle: "CLI"
weight: 60
description: "Detailed documentation on the Spice.ai CLI"
title: 'Spice.ai CLI documentation'
linkTitle: 'CLI'
weight: 100
description: 'Detailed documentation on the Spice.ai CLI'
---

The Spice.ai CLI is a set of commands to create and manage Spice.ai pods and interact with the Spice.ai runtime.
@@ -45,14 +45,13 @@ spice add spiceai/quickstart

Common commands are:

| Command | Description |
| ----------------- | ------------------------------------------------------------------- |
| spice add | Add Pod - adds a pod to the project |
| spice run | Run Spice - starts the Spice runtime, installing if necessary |
| spice version | Spice CLI version |
| spice help | Help about any command |
| spice upgrade | Upgrades the Spice CLI to the latest release |

| Command | Description |
| ------------- | ------------------------------------------------------------- |
| spice add | Add Pod - adds a pod to the project |
| spice run | Run Spice - starts the Spice runtime, installing if necessary |
| spice version | Spice CLI version |
| spice help | Help about any command |
| spice upgrade | Upgrades the Spice CLI to the latest release |

See [Spice CLI command reference]({{<ref "cli/reference">}}) for the full list of available commands.

7 changes: 7 additions & 0 deletions spiceaidocs/content/en/clients/_index.md
@@ -0,0 +1,7 @@
---
type: docs
title: 'Clients and Tools'
linkTitle: 'Clients and Tools'
weight: 110
description: 'Clients and tools'
---
33 changes: 33 additions & 0 deletions spiceaidocs/content/en/data-accelerators/_index.md
@@ -0,0 +1,33 @@
---
type: docs
title: 'Data Accelerators'
linkTitle: 'Data Accelerators'
description: ''
weight: 80
---

Data sourced by Data Connectors can be locally materialized and accelerated using a Data Accelerator.

Acceleration is enabled on a dataset by setting its `acceleration` configuration. For example:

```yaml
datasets:
- name: accelerated_dataset
acceleration:
enabled: true
```

For the complete reference specification, see [datasets]({{<ref "reference/spicepod/datasets">}}).

By default, datasets will be locally materialized using in-memory Arrow records.

Data Accelerators using DuckDB, SQLite, or PostgreSQL engines can be used to materialize data in files or attached databases.
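
For example, a dataset can be materialized to a DuckDB file rather than held in memory. A minimal sketch, assuming the engine mode is selected with a `mode` field alongside `engine` (per the engine modes in the table below):

```yaml
datasets:
  - name: accelerated_dataset
    acceleration:
      enabled: true
      engine: duckdb # embedded DuckDB engine
      mode: file # assumed field: materialize to a DuckDB file instead of memory
```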

Currently supported Data Accelerators include:

| Engine Name | Description | Status | Engine Modes |
| ---------------------------------------------------- | ----------------------- | ------ | ---------------- |
| `arrow` | In-Memory Arrow Records | Alpha | `memory` |
| `duckdb` | Embedded DuckDB | Alpha | `memory`, `file` |
| `sqlite` | Embedded SQLite | Alpha | `memory`, `file` |
| [`postgres`]({{<ref "data-accelerators/postgres">}}) | Attached PostgreSQL | Alpha | |
66 changes: 66 additions & 0 deletions spiceaidocs/content/en/data-accelerators/postgres/_index.md
@@ -0,0 +1,66 @@
---
type: docs
title: 'PostgreSQL Data Accelerator'
linkTitle: 'PostgreSQL Data Accelerator'
description: 'PostgreSQL Data Accelerator Documentation'
---

To use PostgreSQL as a Data Accelerator, specify `postgres` as the `engine` for acceleration.

```yaml
datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
acceleration:
engine: postgres
```

## Configuration

The connection to PostgreSQL can be configured by providing the following `params`:

- `pg_host`: The hostname of the PostgreSQL server.
- `pg_port`: The port of the PostgreSQL server.
- `pg_db`: The name of the database to connect to.
- `pg_user`: The username to connect with.
- `pg_pass_key`: The secret key containing the password to connect with.
- `pg_pass`: The plain-text password to connect with, ignored if `pg_pass_key` is provided.

Configuration `params` are provided either at the top level of the `dataset` for a dataset source and federated SQL query, or in the `acceleration` section for a data store.

```yaml
datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
acceleration:
engine: postgres
params:
pg_host: localhost
pg_port: 5432
pg_db: my_database
pg_user: my_user
pg_pass_key: my_secret
```

Additionally, an `engine_secret` may be provided when configuring the PostgreSQL data store, so that a dataset using PostgreSQL as both the data source and the data store can specify the password through a different secret store.

```yaml
datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
params:
pg_host: localhost
pg_port: 5432
pg_db: data_store
pg_user: my_user
pg_pass_key: my_secret
acceleration:
engine: postgres
engine_secret: pg_backend
params:
pg_host: localhost
pg_port: 5433
pg_db: data_store
pg_user: my_user
pg_pass_key: my_secret
```
22 changes: 22 additions & 0 deletions spiceaidocs/content/en/data-connectors/_index.md
@@ -0,0 +1,22 @@
---
type: docs
title: 'Data Connectors'
linkTitle: 'Data Connectors'
description: ''
weight: 70
---

Data Connectors provide connections to databases, data warehouses, and data lakes for federated SQL queries and data replication.
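
A dataset is bound to a connector by using the connector's name as the selector in its `from` value. A minimal sketch, mirroring the PostgreSQL Data Connector documentation below:

```yaml
datasets:
  - from: postgres:path.to.my_dataset
    name: my_dataset
```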

Currently supported Data Connectors include:

| Name | Description | Status | Protocol/Format | Refresh Modes | Supports Inserts |
| ------------ | ----------- | ------------ | ---------------- | ---------------- | ---------------- |
| `databricks` | Databricks | Alpha | Delta Lake | `full` | ❌ |
| `postgres` | PostgreSQL | Alpha | | `full` | ✅ |
| `spiceai` | Spice.ai | Alpha | Arrow Flight | `append`, `full` | ✅ |
| `s3` | S3 | Alpha | Parquet | `full` | ❌ |
| `dremio` | Dremio | Alpha | Arrow Flight SQL | `full` | ❌ |
| `snowflake` | Snowflake | Coming soon! | Arrow Flight SQL | `full` | ❌ |
| `bigquery` | BigQuery | Coming soon! | Arrow Flight SQL | `full` | ❌ |
| `mysql` | MySQL | Coming soon! | | `full` | ❌ |
@@ -1,12 +1,10 @@
---
type: docs
title: "PostgreSQL"
linkTitle: "PostgreSQL"
description: 'PostgreSQL reference'
title: 'PostgreSQL Data Connector'
linkTitle: 'PostgreSQL Data Connector'
description: 'PostgreSQL Data Connector Documentation'
---

PostgreSQL can be used by the Spice runtime as a dataset source, a data store, or for federated SQL query.

## Dataset Source/Federated SQL Query

To use PostgreSQL as a dataset source or for federated SQL query, specify `postgres` as the selector in the `from` value for the dataset.
@@ -17,18 +15,6 @@ datasets:
name: my_dataset
```

## Data Store

To use PostgreSQL as a data store for dataset acceleration, specify `postgres` as the `engine` for the dataset.

```yaml
datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
acceleration:
engine: postgres
```

## Configuration

The connection to PostgreSQL can be configured by providing the following `params`:
@@ -42,34 +28,16 @@ The connection to PostgreSQL can be configured by providing the following `param

Configuration `params` are provided either at the top level of the `dataset` for a dataset source and federated SQL query, or in the `acceleration` section for a data store.

### Dataset Source/Federated SQL Query

```yaml
datasets:
- from: postgres:path.to.my_dataset
name: my_dataset
params:
pg_host: localhost
pg_port: 5432
pg_db: my_database
pg_user: my_user
pg_pass_key: my_secret
```

### Data Store

```yaml
datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
acceleration:
engine: postgres
params:
pg_host: localhost
pg_port: 5432
pg_db: my_database
pg_user: my_user
pg_pass_key: my_secret
pg_host: localhost
pg_port: 5432
pg_db: my_database
pg_user: my_user
pg_pass_key: my_secret
```

Additionally, an `engine_secret` may be provided when configuring the PostgreSQL data store, so that a dataset using PostgreSQL as both the data source and the data store can specify the password through a different secret store.
@@ -79,18 +47,18 @@ datasets:
- from: spiceai:path.to.my_dataset
name: my_dataset
params:
pg_host: localhost
pg_port: 5432
pg_db: data_store
pg_user: my_user
pg_pass_key: my_secret
acceleration:
engine: postgres
engine_secret: pg_backend
params:
pg_host: localhost
pg_port: 5432
pg_port: 5433
pg_db: data_store
pg_user: my_user
pg_pass_key: my_secret
acceleration:
engine: postgres
engine_secret: pg_backend
params:
pg_host: localhost
pg_port: 5433
pg_db: data_store
pg_user: my_user
pg_pass_key: my_secret
```
```
@@ -1,41 +1,44 @@
---
type: docs
title: "S3 Data Connector"
linkTitle: "S3 Data Connector"
description: 'S3 Data Connector YAML reference'
title: 'S3 Data Connector'
linkTitle: 'S3 Data Connector'
description: 'S3 Data Connector Documentation'
---

The S3 Data Connector enables federated SQL query across Parquet files stored in S3, or in S3-compatible storage solutions (e.g. Minio, Cloudflare R2).

## `params`

- `endpoint`: The S3 endpoint, or equivalent (e.g. Minio endpoint), for the S3-compatible storage.
- `region`: Region of the S3 bucket, if region specific.

## `auth`

Check [Secrets]({{<ref "secrets">}}).
Check [Secrets Stores]({{<ref "secret-stores">}}).

Required attributes:

- `key`: The access key authorised to access the S3 data (e.g. `AWS_ACCESS_KEY_ID` for AWS)
- `secret`: The secret key authorised to access the S3 data (e.g. `AWS_SECRET_ACCESS_KEY` for AWS)


## Example

### Minio

```yaml
- from: s3://s3-bucket-name/path/to/parquet/cool_dataset.parquet
name: cool_dataset
params:
endpoint: https://my.minio.server
region: "us-east-1" # Best practice for Minio
region: 'us-east-1' # Best practice for Minio
```

### S3

```yaml
- from: s3://my-startups-data/path/to/parquet/cool_dataset.parquet
name: cool_dataset
params:
endpoint: http://my-startups-data.s3.amazonaws.com
region: "ap-southeast-2"
```
region: 'ap-southeast-2'
```
13 changes: 13 additions & 0 deletions spiceaidocs/content/en/data-ingestion/_index.md
@@ -0,0 +1,13 @@
---
type: docs
title: 'Data Ingestion'
linkTitle: 'Data Ingestion'
description: ''
weight: 40
---

Data can be ingested by the Spice runtime for replication to a Data Connector, like PostgreSQL or the Spice.ai Cloud platform.

By default, the runtime exposes an [OpenTelemetry](https://opentelemetry.io) (OTEL) endpoint at `grpc://127.0.0.1:50052` for data ingestion.

OTEL metrics will be inserted into datasets with matching names (metric name = dataset name) and optionally replicated to the dataset source.
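
For example, a dataset declared to capture a metric and replicate it to its source might look like the following. A minimal sketch: the metric name is hypothetical, and it assumes replication is toggled with a `replication.enabled` field:

```yaml
datasets:
  - from: spiceai:requests_per_second
    name: requests_per_second # must match the OTEL metric name
    replication:
      enabled: true # assumed field: replicate ingested data to the dataset source
```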
7 changes: 7 additions & 0 deletions spiceaidocs/content/en/federated-queries/_index.md
@@ -0,0 +1,7 @@
---
type: docs
title: 'Federated Queries'
linkTitle: 'Federated Queries'
description: ''
weight: 20
---