Update dataset spicepod reference (#102)
* Update dataset spicepod reference

* Update datasets.md
phillipleblanc authored Feb 23, 2024
1 parent 29edece commit 14607f6
Showing 10 changed files with 138 additions and 530 deletions.
32 changes: 0 additions & 32 deletions spiceaidocs/content/en/concepts/_index.md
@@ -25,35 +25,3 @@ A `Pod` is a package of configuration and data used to train and deploy Spice.ai
A `Pod manifest` is a YAML file that describes how to connect data with a learning environment.

A Pod is constructed from the following components:

### Dataspace

A [dataspace]({{<ref "concepts/dataspaces">}}) is a specification of how the Spice.ai runtime and AI engine load, process, and interact with data from a single source. A dataspace may contain a single data connector and data processor. There may be multiple dataspace definitions within a pod. The fields specified in the union of dataspaces are used as inputs to the neural networks that Spice.ai trains.

A dataspace that doesn't contain a data connector/processor means that the observation data for this dataspace will be provided by calling [POST /pods/{pod}/observations]({{<ref api>}}).

### Data Connector

A [data connector]({{<ref "reference/pod#data-connector">}}) is a reusable component that contains logic to fetch or ingest data from an external source. Spice.ai provides a general interface that anyone can implement to create a data connector; see the [data-components-contrib](https://github.com/spiceai/data-components-contrib/tree/trunk/dataconnectors) repo for more information.

### Data Processor

A [data processor]({{<ref "reference/pod#data-processor">}}) is a reusable component, composable with a data connector, that contains logic to process raw connector data into [observations]({{<ref "api#observations">}}) and state Spice.ai can use.

Spice.ai provides a general interface that anyone can implement to create a data processor; see the [data-components-contrib](https://github.com/spiceai/data-components-contrib/tree/trunk/dataprocessors) repo for more information.

### Actions

[Actions]({{<ref "reference/pod#actions">}}) are the set of actions the Spice.ai runtime can recommend for a pod.

### Recommendations

To intelligently adapt its behavior, an application should query the Spice.ai runtime for which [action]({{<ref "reference/pod#actions">}}) it recommends to take given a specified time. The result of this query is a [recommendation]({{<ref "concepts/recommendations">}}).

If a time is not specified, the resulting recommendation query time will default to the time of the most recently ingested observation.

### Training Rewards

[Training Rewards]({{<ref "reference/pod#rewards">}}) are code definitions in Python that tell the Spice.ai AI Engine how to train the neural networks to achieve the desired goal. A reward is defined for each action specified in the pod.

In the future we will expand the languages we support for writing the reward functions in. [Let us know](mailto:[email protected]) which language you want to be able to write your reward functions in!
72 changes: 0 additions & 72 deletions spiceaidocs/content/en/concepts/rewards/_index.md

This file was deleted.

71 changes: 0 additions & 71 deletions spiceaidocs/content/en/concepts/rewards/external.md

This file was deleted.

2 changes: 0 additions & 2 deletions spiceaidocs/content/en/concepts/time/_index.md
@@ -39,8 +39,6 @@ params:
If not provided in the manifest, Spicepods will default to a period of **3 days**, intervals of **1 min**, and granularity of **10 seconds**. The period epoch will default to a dynamic epoch of the current time minus the period. In this mode, the period becomes a sliding window over time.
See reference documentation for [Spicepod params]({{<ref "reference/pod#params">}}).
### Period
The `period` defines the entire timespan the Spicepod will use for learning and decision-making.
14 changes: 13 additions & 1 deletion spiceaidocs/content/en/reference/Spicepod/_index.md
@@ -41,7 +41,7 @@ metadata:
## `datasets`

A Spicepod can contain one or more [datasets](https://docs.spice.ai/reference/specifications/dataset-and-view-yaml-specification) referenced by relative path.
A Spicepod can contain one or more [datasets]({{<ref "reference/Spicepod/datasets">}}) referenced by relative path.

**Example**

@@ -60,6 +60,18 @@ datasets:
dependsOn: datasets/uniswap_eth_usdc
```

A dataset defined inline.

```yaml
datasets:
  - name: spiceai.uniswap_v2_eth_usdc
    type: overwrite
    source: spice.ai
    acceleration:
      enabled: true
      refresh: 1h
```

## `functions`

A Spicepod can contain one or more [functions](https://docs.spice.ai/reference/specifications/spice-functions-yaml-specification) referenced by relative path.
125 changes: 125 additions & 0 deletions spiceaidocs/content/en/reference/Spicepod/datasets.md
@@ -0,0 +1,125 @@
---
type: docs
title: "Datasets"
linkTitle: "Datasets"
description: 'Datasets YAML reference'
weight: 80
---

A Spicepod can contain one or more datasets referenced by relative path, or defined inline.

# `datasets`

Inline example:

`spicepod.yaml`
```yaml
datasets:
  - from: spice.ai/eth/beacon/eigenlayer
    name: strategy_manager_deposits
    params:
      app: goerli-app
    acceleration:
      enabled: true
      mode: inmemory # / file
      engine: arrow # / duckdb
      refresh_interval: 1h
      refresh_mode: full / append # update / incremental
      retention: 30m
```

`spicepod.yaml`
```yaml
datasets:
  - from: databricks.com/spiceai/datasets
    name: uniswap_eth_usd
    params:
      environment: prod
    acceleration:
      enabled: true
      mode: inmemory # / file
      engine: arrow # / duckdb
      refresh_interval: 1h
      refresh_mode: full / append # update / incremental
      retention: 30m
```

`spicepod.yaml`
```yaml
datasets:
  - from: local/Users/phillip/data/test.parquet
    name: test
    acceleration:
      enabled: true
      mode: inmemory # / file
      engine: arrow # / duckdb
      refresh_interval: 1h
      refresh_mode: full / append # update / incremental
      retention: 30m
```

Relative path example:

`spicepod.yaml`
```yaml
datasets:
  - from: datasets/uniswap_v2_eth_usdc
```

`datasets/uniswap_v2_eth_usdc/dataset.yaml`
```yaml
name: spiceai.uniswap_v2_eth_usdc
type: overwrite
source: spice.ai
auth: spice.ai
acceleration:
  enabled: true
  refresh: 1h
```

## `name`

The name of the dataset. This is used to reference the dataset in the pod manifest, as well as in external data sources.

## `type`

The type of dataset. The following types are supported:

- `overwrite` - Overwrites the dataset with the contents of the dataset source.
- `append` - Appends new data from the dataset source to the dataset.

## `source`

The source of the dataset. The following sources are supported:

- `spice.ai`
- `dremio` (coming soon)
- `databricks` (coming soon)

## `auth`

Optional. The authentication profile to use to connect to the dataset source. Use `spice login` to create a new authentication profile.

If not specified, the default profile for the data source is used.

## `acceleration`

Optional. Accelerate queries to the dataset by caching data locally.

## `acceleration.enabled`

Optional. Enable or disable acceleration.

## `acceleration.refresh`

Optional. The interval at which to refresh the dataset's data when the dataset type is `overwrite`. Specified as a [duration literal]({{<ref "reference/duration">}}), e.g. `1h` for 1 hour, `1m` for 1 minute, `1s` for 1 second.

For `append` datasets, the refresh interval is not used.

## `acceleration.retention`

Optional. Only supported for `append` datasets. Specifies how long to retain data updates from the data source before they are deleted. Specified as a [duration literal]({{<ref "reference/duration">}}).

If not specified, the default retention is to keep all data.
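
As a rough illustration of how `type: append` and `acceleration.retention` fit together, a `dataset.yaml` might look like the sketch below. It only combines fields described in this reference; the dataset name and the `30m` value are placeholders, not taken from a real deployment.

```yaml
# Hypothetical dataset.yaml; the name and retention value are placeholders.
name: spiceai.example_append_dataset
type: append          # new data is appended rather than overwriting the dataset
source: spice.ai
acceleration:
  enabled: true
  retention: 30m      # retain appended data for 30 minutes before it is deleted
```

`acceleration.refresh` is left out here because, as noted above, it is not used for `append` datasets.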