Update datasets spicepod spec
ewgenius committed Mar 25, 2024
1 parent 5cda75c commit 95b2fab
Showing 1 changed file with 27 additions and 35 deletions.
62 changes: 27 additions & 35 deletions spiceaidocs/content/en/reference/Spicepod/datasets.md
@@ -15,7 +15,7 @@ Inline example:
 `spicepod.yaml`
 ```yaml
 datasets:
-  - from: spice.ai/eth/beacon/eigenlayer
+  - from: spiceai:spice.ai/eth/beacon/eigenlayer
     name: strategy_manager_deposits
     params:
       app: goerli-app
Expand All @@ -31,7 +31,7 @@ datasets:
`spicepod.yaml`
```yaml
datasets:
- from: databricks.com/spiceai/datasets
- from: databricks:databricks.com/spiceai/datasets
name: uniswap_eth_usd
params:
environment: prod
Expand All @@ -44,63 +44,55 @@ datasets:
retention: 30m
```

`spicepod.yaml`
```yaml
datasets:
- from: local/Users/phillip/data/test.parquet
name: test
acceleration:
enabled: true
mode: inmemory # / file
engine: arrow # / duckdb
refresh_interval: 1h
refresh_mode: full / append # update / incremental
retention: 30m
```

 Relative path example:

 `spicepod.yaml`
 ```yaml
 datasets:
-  - from: datasets/uniswap_v2_eth_usdc
+  - from: datasets/eth_recent_transactions
 ```

-`datasets/uniswap_v2_eth_usdc/dataset.yaml`
+`datasets/eth_recent_transactions/dataset.yaml`
 ```yaml
-name: spiceai.uniswap_v2_eth_usdc
+from: spiceai:spice.ai/eth.recent_transactions
+name: eth_recent_transactions
 type: overwrite
-source: spice.ai
-auth: spice.ai
 acceleration:
   enabled: true
   refresh: 1h
 ```

-## `name`
+## `from`

-The name of the dataset. This is used to reference the dataset in the pod manifest, as well as in external data sources.
+The `from` field is a string that represents the Uniform Resource Identifier (URI) for the dataset. This URI is composed of two parts: a prefix indicating the source of the dataset, and the actual link to the dataset.

-## `type`
+The syntax for the `from` field is as follows:

-The type of dataset. The following types are supported:
+```yaml
+from: <source>:<link>
+```

-- `overwrite` - Overwrites the dataset with the contents of the dataset source.
-- `append` - Appends new data from dataset source to the dataset.
+Where:
+
+- `<source>`: The source of the dataset

-## `source`
+Currently supported sources:
+- `spiceai`
+- `dremio`
+- `databricks`

-The source of the dataset. The following sources are supported:
+- `<link>`: The actual link to the dataset.

-- `spice.ai`
-- `dremio` (coming soon)
-- `databricks` (coming soon)
+## `name`

-## `auth`
+The name of the dataset. This is used to reference the dataset in the pod manifest, as well as in external data sources.

-Optional. The authentication profile to use to connect to the dataset source. Use `spice login` to create a new authentication profile.
+## `type`

-If not specified, the default profile for the data source is used.
+The type of dataset. The following types are supported:

+- `overwrite` - Overwrites the dataset with the contents of the dataset source.
+- `append` - Appends new data from dataset source to the dataset.
+
 ## `acceleration`

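To illustrate the `<source>:<link>` syntax that this commit documents for the `from` field, the fragment below reuses the two updated `from` values from the inline examples and annotates each with its two parts. This is a minimal sketch: the split shown in the comments is a reading of the documented syntax rather than output from Spice itself, the indentation is assumed, and no `dremio` entry is shown because the commit does not include one.

```yaml
datasets:
  # <source> = spiceai
  # <link>   = spice.ai/eth/beacon/eigenlayer
  - from: spiceai:spice.ai/eth/beacon/eigenlayer
    name: strategy_manager_deposits

  # <source> = databricks
  # <link>   = databricks.com/spiceai/datasets
  - from: databricks:databricks.com/spiceai/datasets
    name: uniswap_eth_usd
```

In both entries the text before the first colon in the value is the source prefix, and everything after it is the link.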
