Update spice.ai connector docs and examples with spiceai/quickstart/datasets/taxi_trips #679

Merged 3 commits on Dec 16, 2024
28 changes: 28 additions & 0 deletions spiceaidocs/docs/cli/reference/connect.md
@@ -0,0 +1,28 @@
---
title: "connect"
sidebar_label: "connect"
pagination_prev: null
pagination_next: null
---

Connect to a Spice.ai Cloud Platform app.

### Usage

```shell
spice connect [flags]
```

#### Flags

- `-h`, `--help` Print this help message

### Examples

```shell
spice connect spiceai/quickstart
```

```shell
spice connect spiceai/tpch
```
26 changes: 20 additions & 6 deletions spiceaidocs/docs/cli/reference/datasets.md
@@ -23,17 +23,31 @@ spice datasets [flags]
```shell
>>> spice datasets

FROM NAME REPLICATION ACCELERATION DEPENDSON STATUS
spice.ai/eth.beacon.recent_slots eth_beacon_recent_slotsss false false Ready
spice.ai/eth.recent_blocks eth_rec_blocks false false Initializing
FROM NAME REPLICATION ACCELERATION STATUS PROPERTIES
spice.ai/spiceai/quickstart/datasets/taxi_trips taxi_trips false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.customer tpch.customer false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.lineitem tpch.lineitem false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.nation tpch.nation false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.orders tpch.orders false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.part tpch.part false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.partsupp tpch.partsupp false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.region tpch.region false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.supplier tpch.supplier false false Ready map[]
```

### Additional Example

```shell
>>> spice datasets --tls-root-certificate-file /path/to/cert.pem

FROM NAME REPLICATION ACCELERATION DEPENDSON STATUS
spice.ai/eth.beacon.recent_slots eth_beacon_recent_slotsss false false Ready
spice.ai/eth.recent_blocks eth_rec_blocks false false Initializing
FROM NAME REPLICATION ACCELERATION STATUS PROPERTIES
spice.ai/spiceai/quickstart/datasets/taxi_trips taxi_trips false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.customer tpch.customer false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.lineitem tpch.lineitem false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.nation tpch.nation false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.orders tpch.orders false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.part tpch.part false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.partsupp tpch.partsupp false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.region tpch.region false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.supplier tpch.supplier false false Ready map[]
```
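
For scripting against this listing, a minimal Python sketch follows. It assumes the output is whitespace-separated with the six columns in the updated header, and that no value contains spaces (a non-empty `PROPERTIES` map might break that). `parse_datasets`, `not_ready`, and `SAMPLE` are illustrative names, not part of the Spice CLI.

```python
from typing import Dict, List

# Column names taken from the updated `spice datasets` header above.
COLUMNS = ["from", "name", "replication", "acceleration", "status", "properties"]


def parse_datasets(output: str) -> List[Dict[str, str]]:
    """Parse whitespace-separated `spice datasets` output into dicts.

    Assumption: every value is free of spaces, as in the sample above.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and rows with an unexpected shape.
        if len(fields) != len(COLUMNS) or fields[0] == "FROM":
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def not_ready(rows: List[Dict[str, str]]) -> List[str]:
    """Return the names of datasets whose status is not `Ready`."""
    return [row["name"] for row in rows if row["status"] != "Ready"]


# Two rows copied from the listing above, used as sample input.
SAMPLE = """\
FROM NAME REPLICATION ACCELERATION STATUS PROPERTIES
spice.ai/spiceai/quickstart/datasets/taxi_trips taxi_trips false false Ready map[]
spice.ai/spiceai/tpch/datasets/tpch.customer tpch.customer false false Ready map[]
"""

datasets = parse_datasets(SAMPLE)
print([d["name"] for d in datasets])
print(not_ready(datasets))
```

In practice the `SAMPLE` string would be replaced by the captured output of `spice datasets`.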
18 changes: 9 additions & 9 deletions spiceaidocs/docs/components/data-accelerators/data-refresh.md
@@ -187,15 +187,15 @@ Example:

```yaml
datasets:
- from: spice.ai/eth.recent_blocks
name: eth_recent_blocks
- from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
acceleration:
enabled: true
refresh_mode: full
refresh_check_interval: 10s
```

This configuration will refresh `eth.recent_blocks` data every 10 seconds.
This configuration will refresh `taxi_trips` data every 10 seconds.

## Refresh On-Demand

@@ -242,8 +242,8 @@ Example: Disable retries

```yaml
datasets:
- from: spice.ai/eth.recent_blocks
name: eth_recent_blocks
- from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
acceleration:
refresh_retry_enabled: false
refresh_check_interval: 30s
@@ -253,8 +253,8 @@ Example: Limit retries to a maximum of 10 attempts

```yaml
datasets:
- from: spice.ai/eth.recent_blocks
name: eth_recent_blocks
- from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
acceleration:
refresh_retry_max_attempts: 10
refresh_check_interval: 30s
@@ -278,8 +278,8 @@ Example:

```yaml
datasets:
- from: spice.ai/eth.recent_blocks
name: eth_recent_blocks
- from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
acceleration:
refresh_check_interval: 10s
refresh_jitter_enabled: true
12 changes: 7 additions & 5 deletions spiceaidocs/docs/components/data-connectors/spiceai.md
@@ -18,13 +18,15 @@ Secrets will be written to a `.env` file by using the `spice login` command and

### Parameters

- `from`: The Spice.ai dataset ID. For instance `spice.ai/eth.recent_blocks` or `spice.ai/eth.recent_traces`. To query a dataset in a shared Spicepod, use the format `spice.ai/<org>/<app>/datasets/<dataset_id>`.
#### `from`

The Spice.ai Cloud Platform dataset URI. To query a dataset in a public Spice.ai App, use the format `spice.ai/<org>/<app>/datasets/<dataset_name>`.
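
To make the URI layout concrete, here is a small Python sketch that splits a `from` value into its parts. It only recognizes the `spice.ai/<org>/<app>/datasets/<dataset_name>` format described above; `DatasetRef` and `parse_from` are hypothetical helper names, not part of Spice.

```python
from typing import NamedTuple


class DatasetRef(NamedTuple):
    org: str
    app: str
    dataset_name: str


def parse_from(uri: str) -> DatasetRef:
    """Split `spice.ai/<org>/<app>/datasets/<dataset_name>` into its parts."""
    parts = uri.split("/")
    # Expect exactly: spice.ai, org, app, the literal "datasets", dataset name.
    if len(parts) != 5 or parts[0] != "spice.ai" or parts[3] != "datasets":
        raise ValueError(f"unexpected dataset URI: {uri!r}")
    if not (parts[1] and parts[2] and parts[4]):
        raise ValueError(f"empty segment in dataset URI: {uri!r}")
    return DatasetRef(org=parts[1], app=parts[2], dataset_name=parts[4])


print(parse_from("spice.ai/spiceai/quickstart/datasets/taxi_trips"))
```

A value that does not follow this layout raises `ValueError`, which makes a malformed `from` entry easy to spot before starting the runtime.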

## Example

```yaml
- from: spice.ai/eth.recent_blocks
name: eth_recent_blocks
- from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
```

```yaml
@@ -35,8 +37,8 @@ Secrets will be written to a `.env` file by using the `spice login` command and
## Full Configuration Example

```yaml
- from: spice.ai/eth.recent_blocks
name: eth_recent_blocks
- from: spice.ai/spiceai/tpch/datasets/customer
name: tpch.customer
params:
spiceai_api_key: ${secrets:spiceai_api_key}
acceleration:
4 changes: 2 additions & 2 deletions spiceaidocs/docs/components/secret-stores/keyring/index.md
@@ -34,8 +34,8 @@ And the secret can be referenced in parameters:

```yaml
datasets:
- from: spice.ai:eth.recent_blocks
name: blocks
- from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
params:
spiceai_api_key: ${keyring:spiceai_api_key} # ${secrets:spiceai_api_key} can also be used
```
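
As a conceptual illustration of the `${store:key}` syntax, the Python sketch below substitutes placeholders from in-memory dictionaries. It is not the runtime's implementation: the `resolve` function, the `stores` layout, and the rule that `${secrets:key}` searches every store in `order` are all assumptions made for this example.

```python
import re
from typing import Dict, List

# Matches placeholders of the form ${store:key}, e.g. ${keyring:spiceai_api_key}.
PLACEHOLDER = re.compile(r"\$\{\s*(\w+)\s*:\s*(\w+)\s*\}")


def resolve(value: str, stores: Dict[str, Dict[str, str]], order: List[str]) -> str:
    """Replace each ${store:key} in `value` with the matching secret.

    Assumption for this sketch: `${secrets:key}` means "try every store
    listed in `order` and use the first one that has the key".
    """

    def lookup(match: re.Match) -> str:
        store, key = match.group(1), match.group(2)
        candidates = order if store == "secrets" else [store]
        for name in candidates:
            if key in stores.get(name, {}):
                return stores[name][key]
        raise KeyError(f"secret {key!r} not found in {candidates}")

    return PLACEHOLDER.sub(lookup, value)


# Hypothetical stores; a real keyring would be queried through the OS.
stores = {"keyring": {"spiceai_api_key": "example-key"}}
print(resolve("${keyring:spiceai_api_key}", stores, order=["env", "keyring"]))
print(resolve("${secrets:spiceai_api_key}", stores, order=["env", "keyring"]))
```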
4 changes: 2 additions & 2 deletions spiceaidocs/docs/components/secret-stores/kubernetes/index.md
@@ -19,8 +19,8 @@ And the secret can be referenced in parameters:

```yaml
datasets:
- from: spice.ai:eth.recent_blocks
name: blocks
- from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
params:
spiceai_api_key: ${k8s:spiceai_api_key} # ${secrets:spiceai_api_key} can also be used to fallback to other secret stores
```
12 changes: 6 additions & 6 deletions spiceaidocs/docs/features/data-acceleration/index.md
@@ -26,35 +26,35 @@ Data Security: Assess data sensitivity and secure network connections between th

## Example

### Locally Accelerating eth.recent_blocks
### Locally Accelerating taxi_trips

- Start Spice with the following dataset:

```yaml
datasets:
- from: spice.ai/eth.recent_blocks
name: eth_recent_blocks
- from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
acceleration:
enabled: true
refresh_mode: full
refresh_check_interval: 10s
```

- The dataset `eth.recent_blocks` will be accelerated locally by the Spice runtime. The data will be refreshed every 10 seconds.
- The dataset `taxi_trips` will be accelerated locally by the Spice runtime. The data will be refreshed every 10 seconds.

- Compare query times against the Spice platform:

```bash
curl \
--url 'https://data.spiceai.io/v1/sql?api_key=[API_KEY]' \
--data 'select * from eth.recent_blocks'
--data 'select * from taxi_trips'
```

And the locally accelerated dataset:

```bash
spice sql
select * from eth_recent_blocks
select * from taxi_trips;
```
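
The comparison can also be scripted. The Python sketch below times one SQL request against each endpoint. The platform URL is the one from the `curl` command above, with its `[API_KEY]` placeholder left in place; `LOCAL_URL` is an assumed address for the runtime's HTTP endpoint and may differ in your setup. `timed` and `post_sql` are illustrative helpers.

```python
import time
import urllib.request
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# The platform URL comes from the curl command above; replace [API_KEY].
PLATFORM_URL = "https://data.spiceai.io/v1/sql?api_key=[API_KEY]"
# Assumed local runtime HTTP address; adjust to match your runtime.
LOCAL_URL = "http://localhost:8090/v1/sql"


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run `fn` once and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def post_sql(url: str, sql: str) -> bytes:
    """POST a SQL string as the request body, like `curl --data`, and return the raw response."""
    request = urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

For example, `timed(lambda: post_sql(LOCAL_URL, "select * from taxi_trips limit 100"))` returns the response bytes and the elapsed seconds; running the same call with `PLATFORM_URL` gives the number to compare against. A `limit` keeps the response small enough that transfer time does not dominate.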

[Learn more about Data Accelerators](/components/data-accelerators) for faster access.
64 changes: 32 additions & 32 deletions spiceaidocs/docs/getting-started/spiceai.md
@@ -20,7 +20,7 @@ After logging in, create an app in order to get an API key.

![create_app-1](https://github.com/spiceai/spiceai/assets/112157037/d2446406-1f06-40fb-8373-1b6d692cb5f7)

This quickstart will use the [`eth.recent_blocks`](https://docs.spice.ai/reference/sql-query-tables/ethereum/core-tables) dataset.
This quickstart uses the `taxi_trips` dataset from the [spiceai/quickstart](https://spice.ai/spiceai/quickstart) Spice.ai app.

**Step 1.** Initialize a new project:

@@ -58,19 +58,19 @@ spice dataset configure
Enter a dataset name that will be used to reference the dataset in queries. This name does not need to match the name in the dataset source.

```bash
dataset name: (spice_app) eth_recent_blocks
dataset name: (spice_app) taxi_trips
```

Enter the description of the dataset:

```
description: Recent Ethereum blocks
description: Taxi trips dataset
```

Enter the location of the dataset:

```bash
from: spice.ai/eth.recent_blocks
from: spice.ai/spiceai/quickstart/datasets/taxi_trips
```

Select `y` when prompted whether to accelerate the data:
@@ -82,9 +82,9 @@ Locally accelerate (y/n)? y
You should see the following output from your runtime terminal:

```bash
2024-08-05T13:09:08.342450Z INFO runtime: Dataset eth_recent_blocks registered (spice.ai/eth.recent_blocks), acceleration (arrow, 10s refresh), results cache enabled.
2024-08-05T13:09:08.343641Z INFO runtime::accelerated_table::refresh_task: Loading data for dataset eth_recent_blocks
2024-08-05T13:09:09.575822Z INFO runtime::accelerated_table::refresh_task: Loaded 146 rows (6.36 MiB) for dataset eth_recent_blocks in 1s 232ms.
2024-12-16T05:12:45.803694Z INFO runtime::init::dataset: Dataset taxi_trips registered (spice.ai/spiceai/quickstart/datasets/taxi_trips), acceleration (arrow, 10s refresh), results cache enabled.
2024-12-16T05:12:45.805494Z INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips
2024-12-16T05:13:24.218345Z INFO runtime::accelerated_table::refresh_task: Loaded 2,964,624 rows (8.41 GiB) for dataset taxi_trips in 38s 412ms.
```
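
To turn the `Loaded ...` log line into a throughput figure, a Python sketch like the following could be used. It assumes the duration is printed as an optional `<n>s` followed by `<n>ms`, as in the lines above; other formats (for example minutes) are not handled. `load_rate` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "Loaded 2,964,624 rows (8.41 GiB) for dataset taxi_trips in 38s 412ms."
LOADED = re.compile(
    r"Loaded (?P<rows>[\d,]+) rows \([^)]*\) for dataset (?P<name>\S+) "
    r"in (?:(?P<s>\d+)s )?(?P<ms>\d+)ms"
)


def load_rate(line: str) -> Optional[Tuple[str, int, float]]:
    """Return (dataset, rows, rows_per_second) for a "Loaded ..." line, else None."""
    match = LOADED.search(line)
    if match is None:
        return None
    rows = int(match.group("rows").replace(",", ""))
    seconds = int(match.group("s") or 0) + int(match.group("ms")) / 1000
    if seconds == 0:
        return None  # avoid dividing by zero on a 0ms load
    return match.group("name"), rows, rows / seconds


line = (
    "2024-12-16T05:13:24.218345Z INFO runtime::accelerated_table::refresh_task: "
    "Loaded 2,964,624 rows (8.41 GiB) for dataset taxi_trips in 38s 412ms."
)
name, rows, rate = load_rate(line)
print(f"{name}: {rows} rows at {rate:,.0f} rows/s")
```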

**Step 5.** In a new terminal window, use the Spice SQL REPL to query the dataset
@@ -94,47 +94,47 @@ spice sql
```

```bash
SELECT number, size, gas_used from eth_recent_blocks LIMIT 10;
SELECT tpep_pickup_datetime, passenger_count, trip_distance from taxi_trips LIMIT 10;
```

The output displays the results of the query along with the query execution time:

```bash
+----------+--------+----------+
| number | size | gas_used |
+----------+--------+----------+
| 20462425 | 32466 | 6705045 |
| 20462435 | 262114 | 29985196 |
| 20462427 | 138376 | 29989452 |
| 20462444 | 40541 | 9480363 |
| 20462431 | 78505 | 16994166 |
| 20462461 | 110372 | 21987571 |
| 20462441 | 51089 | 11136440 |
| 20462428 | 327660 | 29998593 |
| 20462429 | 133518 | 20159194 |
| 20462422 | 61461 | 13389415 |
+----------+--------+----------+

Time: 0.008562625 seconds. 10 rows.
+----------------------+-----------------+---------------+
| tpep_pickup_datetime | passenger_count | trip_distance |
+----------------------+-----------------+---------------+
| 2024-01-11T12:55:12 | 1 | 0.0 |
| 2024-01-11T12:55:12 | 1 | 0.0 |
| 2024-01-11T12:04:56 | 1 | 0.63 |
| 2024-01-11T12:18:31 | 1 | 1.38 |
| 2024-01-11T12:39:26 | 1 | 1.01 |
| 2024-01-11T12:18:58 | 1 | 5.13 |
| 2024-01-11T12:43:13 | 1 | 2.9 |
| 2024-01-11T12:05:41 | 1 | 1.36 |
| 2024-01-11T12:20:41 | 1 | 1.11 |
| 2024-01-11T12:37:25 | 1 | 2.04 |
+----------------------+-----------------+---------------+

Time: 0.00538925 seconds. 10 rows.
```

To compare against non-accelerated query times, change the acceleration setting from `true` to `false` in the datasets.yaml file and rerun the query.

### Additional Example

```bash
# Query to display the average gas used in recent Ethereum blocks
SELECT AVG(gas_used) FROM eth_recent_blocks;
# Query to display the average trip distance
SELECT AVG(trip_distance) FROM taxi_trips;
```

The output displays the average trip distance:

```bash
+------------------+
| avg |
+------------------+
| 15000000.1234567 |
+------------------+
+-------------------------------+
| avg(taxi_trips.trip_distance) |
+-------------------------------+
| 3.652169178958276 |
+-------------------------------+

Time: 0.005678123 seconds. 1 row.
Time: 0.031145625 seconds. 1 rows.
```
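
To see what `AVG(trip_distance)` computes without a running Spice instance, the Python sketch below runs the same aggregate over the ten sample `trip_distance` values from the `LIMIT 10` output above. It uses Python's built-in SQLite as a stand-in engine, which is an assumption of this example; the result covers only those ten rows, so it differs from the full-dataset average shown above.

```python
import sqlite3

# trip_distance values copied from the ten sample rows shown above.
SAMPLE_DISTANCES = [0.0, 0.0, 0.63, 1.38, 1.01, 5.13, 2.9, 1.36, 1.11, 2.04]

# In-memory SQLite database standing in for the accelerated dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxi_trips (trip_distance REAL)")
conn.executemany(
    "INSERT INTO taxi_trips (trip_distance) VALUES (?)",
    [(d,) for d in SAMPLE_DISTANCES],
)

# Same aggregate as the quickstart query, over the sample rows only.
(average,) = conn.execute("SELECT AVG(trip_distance) FROM taxi_trips").fetchone()
print(f"{average:.3f}")
conn.close()
```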
16 changes: 8 additions & 8 deletions spiceaidocs/docs/reference/spicepod/datasets.md
@@ -12,8 +12,8 @@ Inline example:

```yaml
datasets:
- from: spice.ai/eth.beacon.eigenlayer
name: strategy_manager_deposits
- from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
acceleration:
enabled: true
mode: memory # / file
@@ -44,14 +44,14 @@ Relative path example:

```yaml
datasets:
- ref: datasets/eth_recent_transactions
- ref: datasets/taxi_trips
```

`datasets/eth_recent_transactions/dataset.yaml`
`datasets/taxi_trips/dataset.yaml`

```yaml
from: spice.ai/eth.recent_transactions
name: eth_recent_transactions
from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
type: overwrite
acceleration:
enabled: true
@@ -105,8 +105,8 @@ An alternative to adding the dataset definition inline in the `spicepod.yaml` fi
**dataset.yaml**

```yaml
from: spice.ai/eth.recent_transactions
name: eth_recent_transactions
from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
type: overwrite
acceleration:
enabled: true
4 changes: 2 additions & 2 deletions spiceaidocs/docs/reference/spicepod/index.md
@@ -252,8 +252,8 @@ A dataset defined inline.

```yaml
datasets:
- from: spice.ai/eth.recent_blocks
name: eth_blocks
- from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
acceleration:
enabled: true
refresh_mode: full