Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update dataset spicepod reference (#102)
* Update dataset spicepod reference * Update datasets.md
1 parent 29edece, commit 14607f6
Showing 10 changed files with 138 additions and 530 deletions.
@@ -25,35 +25,3 @@ A `Pod` is a package of configuration and data used to train and deploy Spice.ai
A `Pod manifest` is a YAML file that describes how to connect data with a learning environment.

A Pod is constructed from the following components:

### Dataspace

A [dataspace]({{<ref "concepts/dataspaces">}}) is a specification of how the Spice.ai runtime and AI engine load, process, and interact with data from a single source. A dataspace may contain a single data connector and data processor. There may be multiple dataspace definitions within a pod. The fields specified in the union of dataspaces are used as inputs to the neural networks that Spice.ai trains.

If a dataspace does not contain a data connector or data processor, its observation data must be provided by calling [POST /pods/{pod}/observations]({{<ref api>}}).

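As a rough illustration of supplying observations directly, the sketch below builds (but does not send) a `POST /pods/{pod}/observations` request with Python's standard library. Only the path comes from the text above; the base URL, the CSV payload format, and the pod name `trader` are assumptions made for the example.

```python
from urllib.request import Request

# Assumption: the runtime's HTTP API is reachable at this base URL.
BASE_URL = "http://localhost:8000/api/v0.1"


def build_observations_request(pod: str, csv_body: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a POST request to /pods/{pod}/observations."""
    url = f"{base_url}/pods/{pod}/observations"
    return Request(
        url,
        data=csv_body.encode("utf-8"),          # assumption: observations are sent as CSV
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


# Hypothetical pod name and observation rows.
request = build_observations_request("trader", "time,price\n1605312000,16339.56\n")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it to a running runtime.
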
### Data Connector

A [data connector]({{<ref "reference/pod#data-connector">}}) is a reusable component that contains logic to fetch or ingest data from an external source. Spice.ai provides a general interface that anyone can implement to create a data connector; see the [data-components-contrib](https://github.com/spiceai/data-components-contrib/tree/trunk/dataconnectors) repo for more information.

### Data Processor

A [data processor]({{<ref "reference/pod#data-processor">}}) is a reusable component, composable with a data connector, that contains logic to process raw connector data into [observations]({{<ref "api#observations">}}) and state that Spice.ai can use.

Spice.ai provides a general interface that anyone can implement to create a data processor; see the [data-components-contrib](https://github.com/spiceai/data-components-contrib/tree/trunk/dataprocessors) repo for more information.

### Actions

[Actions]({{<ref "reference/pod#actions">}}) are the set of actions the Spice.ai runtime can recommend for a pod.

### Recommendations

To intelligently adapt its behavior, an application should query the Spice.ai runtime for the [action]({{<ref "reference/pod#actions">}}) it recommends taking at a specified time. The result of this query is a [recommendation]({{<ref "concepts/recommendations">}}).

If a time is not specified, the query time defaults to the time of the most recently ingested observation.

### Training Rewards

[Training Rewards]({{<ref "reference/pod#rewards">}}) are code definitions in Python that tell the Spice.ai AI Engine how to train the neural networks to achieve the desired goal. A reward is defined for each action specified in the pod.

In the future, we will support additional languages for writing reward functions. [Let us know](mailto:[email protected]) which language you would like to use!
@@ -0,0 +1,125 @@
---
type: docs
title: "Datasets"
linkTitle: "Datasets"
description: 'Datasets YAML reference'
weight: 80
---

A Spicepod can contain one or more datasets referenced by relative path, or defined inline.

# `datasets`

Inline examples:

`spicepod.yaml`
```yaml
datasets:
  - from: spice.ai/eth/beacon/eigenlayer
    name: strategy_manager_deposits
    params:
      app: goerli-app
    acceleration:
      enabled: true
      mode: inmemory # / file
      engine: arrow # / duckdb
      refresh_interval: 1h
      refresh_mode: full # / append # update / incremental
      retention: 30m
```

`spicepod.yaml`
```yaml
datasets:
  - from: databricks.com/spiceai/datasets
    name: uniswap_eth_usd
    params:
      environment: prod
    acceleration:
      enabled: true
      mode: inmemory # / file
      engine: arrow # / duckdb
      refresh_interval: 1h
      refresh_mode: full # / append # update / incremental
      retention: 30m
```

`spicepod.yaml`
```yaml
datasets:
  - from: local/Users/phillip/data/test.parquet
    name: test
    acceleration:
      enabled: true
      mode: inmemory # / file
      engine: arrow # / duckdb
      refresh_interval: 1h
      refresh_mode: full # / append # update / incremental
      retention: 30m
```

Relative path example:

`spicepod.yaml`
```yaml
datasets:
  - from: datasets/uniswap_v2_eth_usdc
```

`datasets/uniswap_v2_eth_usdc/dataset.yaml`
```yaml
name: spiceai.uniswap_v2_eth_usdc
type: overwrite
source: spice.ai
auth: spice.ai
acceleration:
  enabled: true
  refresh: 1h
```

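The difference between the two styles can be sketched in Python. This is only an illustration under a stated assumption, not the runtime's actual resolution logic: an entry holding nothing but `from` is treated as a relative-path reference to a directory containing `dataset.yaml`, and any other entry as an inline definition. The function name `resolve_dataset_entry` is hypothetical.

```python
from pathlib import PurePosixPath


def resolve_dataset_entry(entry: dict) -> dict:
    """Classify a `datasets` entry as inline or as a relative-path reference.

    Assumption: an entry with only a `from` key points at a directory that
    contains `dataset.yaml`; an entry with more keys is defined inline.
    """
    if set(entry) == {"from"}:
        manifest = PurePosixPath(entry["from"]) / "dataset.yaml"
        return {"kind": "reference", "manifest": str(manifest)}
    return {"kind": "inline", "name": entry.get("name")}


# Entries as they would look after the YAML above has been parsed into dicts.
entries = [
    {"from": "datasets/uniswap_v2_eth_usdc"},
    {"from": "databricks.com/spiceai/datasets", "name": "uniswap_eth_usd"},
]
resolved = [resolve_dataset_entry(entry) for entry in entries]
print(resolved)
```
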
## `name`

The name of the dataset. This is used to reference the dataset in the pod manifest, as well as in external data sources.

## `type`

The type of dataset. The following types are supported:

- `overwrite` - Overwrites the dataset with the contents of the dataset source.
- `append` - Appends new data from the dataset source to the dataset.

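The two types can be pictured with a small Python sketch that models a dataset as a list of rows. The function `apply_refresh` and the row shape are hypothetical and stand in for whatever the runtime does internally.

```python
def apply_refresh(current: list, incoming: list, dataset_type: str) -> list:
    """Return the dataset contents after new data arrives from the source."""
    if dataset_type == "overwrite":
        # The source contents replace whatever was held before.
        return list(incoming)
    if dataset_type == "append":
        # New rows are added after the existing ones.
        return current + list(incoming)
    raise ValueError(f"unsupported dataset type: {dataset_type!r}")


cached = [{"block": 1}, {"block": 2}]
fresh = [{"block": 3}]

overwritten = apply_refresh(cached, fresh, "overwrite")
appended = apply_refresh(cached, fresh, "append")
print(overwritten, appended)
```
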
## `source`

The source of the dataset. The following sources are supported:

- `spice.ai`
- `dremio` (coming soon)
- `databricks` (coming soon)

## `auth`

Optional. The authentication profile to use to connect to the dataset source. Use `spice login` to create a new authentication profile.

If not specified, the default profile for the data source is used.

## `acceleration`

Optional. Accelerate queries to the dataset by caching data locally.

## `acceleration.enabled`

Optional. Enable or disable acceleration.

## `acceleration.refresh`

Optional. The interval at which the data for the dataset is refreshed if the dataset type is `overwrite`. Specified as a [duration literal]({{<ref "reference/duration">}}), e.g. `1h` for 1 hour, `1m` for 1 minute, `1s` for 1 second.

For `append` datasets, the refresh interval is not used.

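To make the duration literals concrete, here is a minimal Python sketch that converts them to `timedelta` values. It assumes a literal is a whole number followed by a single unit (`s`, `m`, or `h`), which covers the examples above; the full grammar in the duration reference may accept more. The name `parse_duration` is hypothetical.

```python
import re
from datetime import timedelta

# Assumption: only the units shown in the examples (seconds, minutes, hours).
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}
_PATTERN = re.compile(r"(\d+)([smh])")


def parse_duration(literal: str) -> timedelta:
    """Convert a literal such as `1h` or `30m` into a timedelta."""
    match = _PATTERN.fullmatch(literal.strip())
    if match is None:
        raise ValueError(f"not a recognised duration literal: {literal!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


print(parse_duration("1h"), parse_duration("30m"), parse_duration("1s"))
```
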
## `acceleration.retention`

Optional. Only supported for `append` datasets. Specifies how long to retain data updates from the data source before they are deleted. Specified as a [duration literal]({{<ref "reference/duration">}}).

If not specified, the default retention is to keep all data.
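
One way to picture retention is as a time-based filter over appended rows. The sketch below is an illustration only: it assumes each row carries an `ingested_at` timestamp (a hypothetical field) and that retention is measured from that timestamp, neither of which is stated in this reference.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def apply_retention(rows: list, retention: Optional[timedelta], now: datetime) -> list:
    """Drop rows older than the retention window; keep everything if unset."""
    if retention is None:
        # Default behaviour described above: keep all data.
        return list(rows)
    cutoff = now - retention
    return [row for row in rows if row["ingested_at"] >= cutoff]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"id": 1, "ingested_at": now - timedelta(minutes=45)},  # older than 30m
    {"id": 2, "ingested_at": now - timedelta(minutes=10)},  # within 30m
]

kept = apply_retention(rows, timedelta(minutes=30), now)
kept_all = apply_retention(rows, None, now)
print([row["id"] for row in kept], [row["id"] for row in kept_all])
```
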