Machine Learning section improvements #263

Merged
merged 5 commits into from
Jun 3, 2024
13 changes: 8 additions & 5 deletions spiceaidocs/docs/machine-learning/index.md
@@ -8,11 +8,14 @@ pagination_prev: null

:::warning[Early Preview]

The Spice ML runtime is in its early preview phase and is subject to modifications.
Machine Learning (ML) is in preview and is subject to modifications.

:::

Machine learning models can be added to the Spice runtime similarly to datasets. The Spice runtime will load it, just like a dataset.
ML models can be defined similarly to datasets. The runtime will load the model for inference.

Example:

```yaml
name: my_spicepod
version: v1beta1
@@ -33,6 +36,6 @@ datasets:
- from: spice.ai/eth.recent_blocks
name: eth_recent_blocks
acceleration:
enabled: true
refresh_mode: append
```
78 changes: 47 additions & 31 deletions spiceaidocs/docs/machine-learning/inference/index.md
@@ -1,6 +1,6 @@
---
title: 'Machine Learning Inference'
sidebar_label: 'Machine Learning Inference'
title: 'Machine Learning Predictions'
sidebar_label: 'Machine Learning Predictions'
description: ''
sidebar_position: 2
pagination_prev: 'machine-learning/model-deployment/index'
@@ -10,17 +10,24 @@ pagination_next: null
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

The Spice ML runtime currently supports prediction via an API in the Spice runtime.
Spice includes dedicated prediction APIs.

## GET `/v1/models/:name/predict`

Make a prediction using a specific [deployed model](../model-deployment/index.md).

Example:

### GET `/v1/models/:name/predict`
```shell
curl "http://localhost:3000/v1/models/my_model_name/predict"
```
Where:
- `name`: References the name provided in the `spicepod.yaml`.

Parameters:

- `name`: References the model name defined in the `spicepod.yaml`.
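A minimal Python sketch of calling this endpoint with only the standard library, assuming the runtime listens on `http://localhost:3000` as in the curl example. The helper names `predict_url` and `get_prediction` are hypothetical, and because the full response schema is not reproduced here, the sketch simply returns the decoded JSON.

```python
import json
import urllib.parse
import urllib.request


def predict_url(base_url: str, model_name: str) -> str:
    """Build the GET /v1/models/:name/predict URL for a model named in spicepod.yaml."""
    # Percent-encode the name so unusual characters cannot break the URL path.
    quoted = urllib.parse.quote(model_name, safe="")
    return f"{base_url.rstrip('/')}/v1/models/{quoted}/predict"


def get_prediction(base_url: str, model_name: str, timeout: float = 10.0) -> dict:
    """Call the predict endpoint and return the decoded JSON body."""
    with urllib.request.urlopen(predict_url(base_url, model_name), timeout=timeout) as resp:
        # json.load accepts the binary response stream directly.
        return json.load(resp)
```

For example, `get_prediction("http://localhost:3000", "my_model_name")` issues the same request as the curl command above; `urlopen` raises `urllib.error.HTTPError` for non-2xx responses.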

### Response

#### Response
<Tabs>
<TabItem value="Success" label="Success" default>
```json
@@ -58,8 +65,12 @@ Where:
</TabItem>
</Tabs>

### POST `/v1/predict`
It's also possible to run multiple prediction models in parallel, useful for ensembling or A/B testing.
## POST `/v1/predict`

Make predictions using multiple loaded forecasting models in parallel, which is useful for ensembling or A/B testing.

Example:

```shell
curl --request POST \
--url http://localhost:3000/v1/predict \
@@ -74,34 +85,39 @@ curl --request POST \
]
}'
```
Where:
- Each `model_name` provided references a model `name` in the Spicepod.

####
Parameters:

- `model_name`: References a model name defined in the `spicepod.yaml`.

```json
{
"duration_ms": 81,
"predictions": [{
"status": "Success",
"model_name": "drive_stats_a",
"model_version": "1.0",
"lookback": 30,
"prediction": [0.45, 0.5, 0.55],
"duration_ms": 42
}, {
"status": "Success",
"model_name": "drive_stats_b",
"model_version": "1.0",
"lookback": 30,
"prediction": [0.43, 0.51, 0.53],
"duration_ms": 42
}]
"duration_ms": 81,
"predictions": [
{
"status": "Success",
"model_name": "drive_stats_a",
"model_version": "1.0",
"lookback": 30,
"prediction": [0.45, 0.5, 0.55],
"duration_ms": 42
},
{
"status": "Success",
"model_name": "drive_stats_b",
"model_version": "1.0",
"lookback": 30,
"prediction": [0.43, 0.51, 0.53],
"duration_ms": 42
}
]
}
```
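The `/v1/predict` response lends itself to simple ensembling on the client side. A minimal Python sketch, assuming the response shape shown above (a `predictions` list whose entries carry `status`, `model_name`, and an equal-length `prediction` array): it averages the forecasts of every model whose `status` is `"Success"`. The `ensemble_mean` helper and the equal-weight averaging are illustrative choices, not something the Spice runtime does for you.

```python
import json
from typing import List


def ensemble_mean(response: dict) -> List[float]:
    """Average the `prediction` arrays of all successful models, step by step."""
    forecasts = [
        p["prediction"] for p in response.get("predictions", []) if p.get("status") == "Success"
    ]
    if not forecasts:
        raise ValueError("no successful predictions in response")
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("models returned forecasts of different lengths")
    # zip(*forecasts) groups the i-th forecast step of every model together.
    return [sum(step) / len(step) for step in zip(*forecasts)]


# The sample response from this page.
SAMPLE = json.loads("""
{
  "duration_ms": 81,
  "predictions": [
    {"status": "Success", "model_name": "drive_stats_a", "model_version": "1.0",
     "lookback": 30, "prediction": [0.45, 0.5, 0.55], "duration_ms": 42},
    {"status": "Success", "model_name": "drive_stats_b", "model_version": "1.0",
     "lookback": 30, "prediction": [0.43, 0.51, 0.53], "duration_ms": 42}
  ]
}
""")
```

`ensemble_mean(SAMPLE)` returns approximately `[0.44, 0.505, 0.54]`. A weighted average, or comparing the two models side by side for A/B testing, would start from the same filtered `forecasts` list.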

:::warning[Limitations]
- Univariate predictions only
- Multiple covariates

- Univariate predictions only.
- Multiple covariates.
- Covariate and output variate must have a fixed time frequency.
- No support for discrete or exogenous variables.
:::
17 changes: 17 additions & 0 deletions spiceaidocs/docs/machine-learning/model-deployment/filesystem.md
@@ -0,0 +1,17 @@
---
title: 'Filesystem'
sidebar_label: 'Filesystem'
sidebar_position: 3
---

To use a model hosted on a filesystem, specify the file path in `from`.

Example:

```yaml
models:
- from: file://absolute/path/to/my/model.onnx
name: local_fs_model
datasets:
- taxi_trips
```
18 changes: 12 additions & 6 deletions spiceaidocs/docs/machine-learning/model-deployment/huggingface.md
@@ -1,12 +1,13 @@
---
title: "Huggingface"
sidebar_label: "Huggingface"
title: 'HuggingFace'
sidebar_label: 'HuggingFace'
sidebar_position: 1
---

To define a model component from HuggingFace, specify it in the `from` key.
To use a model hosted on HuggingFace, specify the `huggingface.co` path in the `from` key.

### Example

```yaml
models:
- from: huggingface:huggingface.co/spiceai/darts:latest
@@ -18,22 +19,27 @@ models:
```

### `from` Format

The `from` key must match the following regex:

```regex
\A(huggingface:)(huggingface\.co\/)?(?<org>[\w\-]+)\/(?<model>[\w\-]+)(:(?<revision>[\w\d\-\.]+))?\z
```

#### Examples

- `huggingface:username/modelname`: Implies the latest version of `modelname` hosted by `username`.
- `huggingface:huggingface.co/username/modelname:revision`: Specifies a particular `revision` of `modelname` by `username`, including the optional domain.

#### Specification

1. **Prefix:** The value must start with `huggingface:`.
2. **Domain (Optional):** Optionally includes `huggingface.co/` immediately after the prefix. Currently, no other HuggingFace-compatible services are supported.
3. **Organization/User:** The HuggingFace organization or user (`org`).
4. **Model Name:** After a `/`, the model name (`model`).
5. **Revision (Optional):** A colon (`:`) followed by the git-like revision identifier (`revision`).
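To check a `from` value before adding it to a Spicepod, the regex above can be applied directly. A minimal Python sketch: Python's `re` module spells named groups `(?P<name>...)` and the end-of-string anchor `\Z`, so those tokens are translated, and the pattern is otherwise unchanged. The `parse_hf_from` helper is hypothetical and not part of Spice.

```python
import re
from typing import Optional

# The documented pattern, with (?<name>...) rewritten as (?P<name>...) and \z as \Z for Python's re.
HF_FROM = re.compile(
    r"\A(huggingface:)(huggingface\.co\/)?(?P<org>[\w\-]+)\/(?P<model>[\w\-]+)"
    r"(:(?P<revision>[\w\d\-\.]+))?\Z"
)


def parse_hf_from(value: str) -> Optional[dict]:
    """Return org, model, and revision (None when omitted), or None if the value does not match."""
    match = HF_FROM.match(value)
    if match is None:
        return None
    # groupdict() contains only the named groups: org, model, revision.
    return match.groupdict()
```

For example, `parse_hf_from("huggingface:huggingface.co/spiceai/darts:latest")` yields `{"org": "spiceai", "model": "darts", "revision": "latest"}`, while a value without the `huggingface:` prefix returns `None`.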


:::warning[Limitations]

- ONNX format support only
:::
30 changes: 16 additions & 14 deletions spiceaidocs/docs/machine-learning/model-deployment/index.md
@@ -1,28 +1,30 @@
---
title: 'ML Model Deployment'
sidebar_label: 'ML Model Deployment'
title: 'Model Deployment'
sidebar_label: 'Model Deployment'
description: ''
sidebar_position: 1
pagination_next: 'machine-learning/inference/index'
---

Models can be loaded from a variety of sources:
- Local filesystem: Local ONNX files.
- HuggingFace: Models Hosted on HuggingFace.
- SpiceAI: Models trained on the Spice.AI Cloud Platform
Models can be loaded from:

A model component, within a Spicepod, has the following format.
- **Filesystem**: [ONNX](https://onnx.ai) models.
- **HuggingFace**: ONNX and GGUF models hosted on [HuggingFace](https://huggingface.co).
- **Spice Cloud Platform**: Models hosted on the [Spice Cloud Platform](https://docs.spice.ai).

Defined in the `spicepod.yml`, a `model` component has the following format.

| Field             | Description                                                   |
| ----------------- | ------------------------------------------------------------- |
| `name`            | Unique, readable name for the model within the Spicepod.      |
| `from`            | Source-specific address that uniquely identifies the model.   |
| `datasets`        | Datasets that the model depends on for inference.             |
| `files` (HF only) | An individual file within the HuggingFace repository to use.  |


For more detail, refer to the `model` [reference specification](../../reference/spicepod/models.md).
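The fields in the table above can be mirrored in a small data structure, for example to lint a list of `models:` entries (already parsed from YAML) before starting the runtime. A minimal Python sketch: the `ModelComponent` class and `load_models` helper are hypothetical, the shape of `files` is not spelled out here so it is left untyped, and the two checks (unique `name`, `files` only for `huggingface:` sources) follow the table rather than the runtime's actual validation.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ModelComponent:
    """One entry under `models:` in a Spicepod, mirroring the documented fields."""

    name: str
    from_: str  # `from` is a Python keyword, hence the trailing underscore
    datasets: List[str] = field(default_factory=list)
    files: Optional[Any] = None  # HuggingFace sources only; shape left unspecified

    def __post_init__(self) -> None:
        if self.files is not None and not self.from_.startswith("huggingface:"):
            raise ValueError(f"model {self.name!r}: `files` is only valid for HuggingFace sources")


def load_models(entries: List[dict]) -> List[ModelComponent]:
    """Convert raw `models:` entries into ModelComponent objects, enforcing unique names."""
    models: List[ModelComponent] = []
    seen = set()
    for entry in entries:
        model = ModelComponent(
            name=entry["name"],
            from_=entry["from"],
            datasets=list(entry.get("datasets", [])),
            files=entry.get("files"),
        )
        if model.name in seen:
            raise ValueError(f"duplicate model name {model.name!r}")
        seen.add(model.name)
        models.append(model)
    return models
```

Raising on the first problem keeps the sketch short; collecting every error before reporting would be friendlier for large Spicepods.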

## Model Source Docs

import DocCardList from '@theme/DocCardList';

<DocCardList />
16 changes: 0 additions & 16 deletions spiceaidocs/docs/machine-learning/model-deployment/local.md

This file was deleted.

29 changes: 18 additions & 11 deletions spiceaidocs/docs/machine-learning/model-deployment/spiceai.md
@@ -1,11 +1,13 @@
---
title: "SpiceAI"
sidebar_label: "SpiceAI"
title: 'Spice Cloud Platform'
sidebar_label: 'Spice Cloud Platform'
sidebar_position: 2
---

### Example
To run a model trained on the Spice.AI platform, specify it in the `from` key.
To use a model hosted on the Spice Cloud Platform, specify the `spice.ai` path in the `from` key.

Example:

```yaml
models:
- from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats
@@ -14,33 +16,38 @@ models:
- drive_stats_inferencing
```

This configuration allows for specifying models hosted by Spice AI, including their versions or specific training run IDs.
Specific versions can be used by referencing a version label or Training Run ID.

```yaml
models:
- from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats:latest # Git-like tagging
- from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats:latest # Label
name: drive_stats_a
datasets:
- drive_stats_inferencing

- from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats:60cb80a2-d59b-45c4-9b68-0946303bdcaf # Specific training run ID
- from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats:60cb80a2-d59b-45c4-9b68-0946303bdcaf # Training Run ID
name: drive_stats_b
datasets:
- drive_stats_inferencing
```

### `from` Format

The `from` key must match the following regex:

```regex
\A(?:spice\.ai\/)?(?<org>[\w\-]+)\/(?<app>[\w\-]+)(?:\/models)?\/(?<model>[\w\-]+):(?<version>[\w\d\-\.]+)\z
```

#### Examples
Examples:

- `spice.ai/lukekim/smart/models/drive_stats:latest`: Refers to the latest version of the `drive_stats` model in the `smart` application owned by the user or organization `lukekim`.
- `spice.ai/lukekim/smart/drive_stats:60cb80a2-d59b-45c4-9b68-0946303bdcaf`: Specifies a model with a unique training run ID.

#### Specification
### Specification

1. **Prefix (Optional):** The value may start with `spice.ai/`.
1. **Organization/User:** The name of the organization or user (`org`) hosting the model.
1. **Application Name:** The name of the application (`app`) to which the model belongs.
4. **Model Name:** The name of the model (`model`).
5. **Version (Optional):** A colon (`:`) followed by the version identifier (`version`), which could be a semantic version, `latest` for the most recent version, or a specific training run ID.
1. **Model Name:** The name of the model (`model`).
1. **Version (Optional):** A colon (`:`) followed by the version identifier (`version`), which could be a semantic version, `latest` for the most recent version, or a specific training run ID.
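The Spice Cloud Platform regex can be checked the same way. A minimal Python sketch, with the named groups rewritten as `(?P<name>...)` and `\z` as `\Z` for Python's `re`; the `parse_spice_from` helper is hypothetical. Note that, as written, the pattern requires the `:version` suffix, so a value without one (such as `spice.ai/taxi_tech_co/taxi_drives/models/drive_stats`) does not match even though the specification lists the version as optional.

```python
import re
from typing import Optional

# The documented pattern, translated to Python's (?P<name>...) and \Z syntax.
SPICE_FROM = re.compile(
    r"\A(?:spice\.ai\/)?(?P<org>[\w\-]+)\/(?P<app>[\w\-]+)(?:\/models)?"
    r"\/(?P<model>[\w\-]+):(?P<version>[\w\d\-\.]+)\Z"
)


def parse_spice_from(value: str) -> Optional[dict]:
    """Split a Spice Cloud Platform `from` value into org, app, model, and version."""
    match = SPICE_FROM.match(value)
    return match.groupdict() if match else None
```

Both examples above parse to `org="lukekim"`, `app="smart"`, `model="drive_stats"`; the optional `/models` segment is consumed by the non-capturing group, so it never ends up in `app` or `model`.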