Machine Learning section improvements (#263)
lukekim authored Jun 3, 2024
1 parent 1137f08 commit 45b0ff3
Showing 7 changed files with 118 additions and 83 deletions.
13 changes: 8 additions & 5 deletions spiceaidocs/docs/machine-learning/index.md

:::warning[Early Preview]

Machine Learning (ML) is in preview and is subject to modifications.

:::

ML models can be defined similarly to [Datasets](../reference/spicepod/datasets.md). The runtime will load the model for inference.

Example:

```yaml
name: my_spicepod
version: v1beta1
datasets:
- from: spice.ai/eth.recent_blocks
name: eth_recent_blocks
acceleration:
      enabled: true
      refresh_mode: append
```
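
The example above shows the dataset side. A minimal sketch that also defines a model might look like the following; the model source path and the names `drive_stats` and `drive_stats_inferencing` are illustrative, borrowed from the deployment examples later in these docs.

```yaml
name: my_spicepod
version: v1beta1

models:
  - from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats # illustrative source path
    name: drive_stats
    datasets:
      - drive_stats_inferencing # name of a dataset defined elsewhere in the Spicepod
```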
78 changes: 47 additions & 31 deletions spiceaidocs/docs/machine-learning/inference/index.md
---
title: 'Machine Learning Predictions'
sidebar_label: 'Machine Learning Predictions'
description: ''
sidebar_position: 2
pagination_prev: 'machine-learning/model-deployment/index'
pagination_next: null
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Spice includes dedicated prediction APIs.

## GET `/v1/models/:name/predict`

Make a prediction using a specific [deployed model](../model-deployment/index.md).

Example:

```shell
curl "http://localhost:3000/v1/models/my_model_name/predict"
```

Parameters:

- `name`: References the model name defined in the `spicepod.yaml`.

### Response

<Tabs>
<TabItem value="Success" label="Success" default>
```json
```
</TabItem>
</Tabs>
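
As a rough illustration only, a successful response for a single model might resemble the sketch below; the field set (`status`, `model_name`, `model_version`, `lookback`, `prediction`, `duration_ms`) is assumed to mirror the `/v1/predict` response shown later on this page.

```json
{
  "status": "Success",
  "model_name": "my_model_name",
  "model_version": "1.0",
  "lookback": 30,
  "prediction": [0.45, 0.5, 0.55],
  "duration_ms": 42
}
```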

## POST `/v1/predict`

Make predictions using all loaded forecasting models in parallel. This is useful for ensembling or A/B testing.

Example:

```shell
curl --request POST \
--url http://localhost:3000/v1/predict \
]
}'
```
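
For reference, a complete request might look like the sketch below; the `Content-Type` header and the `predictions`/`model_name` body shape are assumptions, with model names mirroring the response example that follows.

```shell
curl --request POST \
  --url http://localhost:3000/v1/predict \
  --header 'Content-Type: application/json' \
  --data '{
    "predictions": [
      { "model_name": "drive_stats_a" },
      { "model_name": "drive_stats_b" }
    ]
  }'
```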

Parameters:

- `model_name`: References a model name defined in the `spicepod.yaml`.

### Response

```json
{
"duration_ms": 81,
"predictions": [{
"status": "Success",
"model_name": "drive_stats_a",
"model_version": "1.0",
"lookback": 30,
"prediction": [0.45, 0.5, 0.55],
"duration_ms": 42
}, {
"status": "Success",
"model_name": "drive_stats_b",
"model_version": "1.0",
"lookback": 30,
"prediction": [0.43, 0.51, 0.53],
"duration_ms": 42
}]
"duration_ms": 81,
"predictions": [
{
"status": "Success",
"model_name": "drive_stats_a",
"model_version": "1.0",
"lookback": 30,
"prediction": [0.45, 0.5, 0.55],
"duration_ms": 42
},
{
"status": "Success",
"model_name": "drive_stats_b",
"model_version": "1.0",
"lookback": 30,
"prediction": [0.43, 0.51, 0.53],
"duration_ms": 42
}
]
}
```

:::warning[Limitations]

- Univariate predictions only.
- Multiple covariates.
- Covariate and output variate must have a fixed time frequency.
- No support for discrete or exogenous variables.

:::
17 changes: 17 additions & 0 deletions spiceaidocs/docs/machine-learning/model-deployment/filesystem.md
---
title: 'Filesystem'
sidebar_label: 'Filesystem'
sidebar_position: 3
---

To use a model hosted on a filesystem, specify the file path in `from`.

Example:

```yaml
models:
- from: file://absolute/path/to/my/model.onnx
name: local_fs_model
datasets:
- taxi_trips
```
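
Once the Spicepod loads, the model can be exercised through the predictions API described in [Machine Learning Predictions](../inference/index.md); for example, using the `localhost:3000` endpoint shown in the prediction examples:

```shell
curl "http://localhost:3000/v1/models/local_fs_model/predict"
```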
18 changes: 12 additions & 6 deletions spiceaidocs/docs/machine-learning/model-deployment/huggingface.md
---
title: "Huggingface"
sidebar_label: "Huggingface"
title: 'HuggingFace'
sidebar_label: 'HuggingFace'
sidebar_position: 1
---

To use a model hosted on HuggingFace, specify the `huggingface.co` path in the `from` key.

### Example

```yaml
models:
- from: huggingface:huggingface.co/spiceai/darts:latest
```
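
A fuller component will typically also set a `name` and the `datasets` it depends on; a sketch, with illustrative values, might look like:

```yaml
models:
  - from: huggingface:huggingface.co/spiceai/darts:latest
    name: hf_model # illustrative name
    datasets:
      - taxi_trips # illustrative dataset dependency
```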
### `from` Format

The `from` key must conform to the following regex format:

```regex
\A(huggingface:)(huggingface\.co\/)?(?<org>[\w\-]+)\/(?<model>[\w\-]+)(:(?<revision>[\w\d\-\.]+))?\z
```

#### Examples

- `huggingface:username/modelname`: Implies the latest version of `modelname` hosted by `username`.
- `huggingface:huggingface.co/username/modelname:revision`: Specifies a particular `revision` of `modelname` by `username`, including the optional domain.

#### Specification

1. **Prefix:** The value must start with `huggingface:`.
2. **Domain (Optional):** Optionally includes `huggingface.co/` immediately after the prefix. Currently no other HuggingFace-compatible services are supported.
3. **Organization/User:** The HuggingFace organization (`org`).
4. **Model Name:** After a `/`, the model name (`model`).
5. **Revision (Optional):** A colon (`:`) followed by the git-like revision identifier (`revision`).


:::warning[Limitations]

- ONNX format support only
:::
30 changes: 16 additions & 14 deletions spiceaidocs/docs/machine-learning/model-deployment/index.md
---
title: 'Model Deployment'
sidebar_label: 'Model Deployment'
description: ''
sidebar_position: 1
pagination_next: 'machine-learning/inference/index'
---

Models can be loaded from:

- **Filesystem**: [ONNX](https://onnx.ai) models.
- **HuggingFace**: ONNX and [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) models hosted on [HuggingFace](https://huggingface.co).
- **Spice Cloud Platform**: Models hosted on the [Spice Cloud Platform](https://docs.spice.ai).

Defined in the `spicepod.yaml`, a `model` component has the following format.

| field | Description |
| ----------------- | -------------------------------------------------------------------- |
| `name`            | Unique, readable name for the model within the Spicepod.              |
| `from`            | Source-specific address to uniquely identify a model                  |
| `datasets`        | Datasets that the model depends on for inference                      |
| `files` (HF only) | Specify an individual file within the HuggingFace repository to use   |

For more detail, refer to the `model` [reference specification](../../reference/spicepod/models.md).
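
To illustrate the fields together, a HuggingFace-backed component might be sketched as below; the model path, file name, and dataset are illustrative, and the exact shape of the `files` entry is an assumption.

```yaml
models:
  - from: huggingface:huggingface.co/spiceai/darts:latest
    name: drive_stats
    files:
      - model.onnx # assumed: a single file within the HuggingFace repository
    datasets:
      - drive_stats_inferencing
```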

## Model Source Docs

import DocCardList from '@theme/DocCardList';

<DocCardList />
16 changes: 0 additions & 16 deletions spiceaidocs/docs/machine-learning/model-deployment/local.md

This file was deleted.

29 changes: 18 additions & 11 deletions spiceaidocs/docs/machine-learning/model-deployment/spiceai.md
---
title: "SpiceAI"
sidebar_label: "SpiceAI"
title: 'Spice Cloud Platform'
sidebar_label: 'Spice Cloud Platform'
sidebar_position: 2
---

To use a model hosted on the Spice Cloud Platform, specify the `spice.ai` path in the `from` key.

Example:

```yaml
models:
- from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats
    name: drive_stats
    datasets:
- drive_stats_inferencing
```
Specific versions can be used by referencing a version label or Training Run ID.

```yaml
models:
- from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats:latest # Label
name: drive_stats_a
datasets:
- drive_stats_inferencing

- from: spice.ai/taxi_tech_co/taxi_drives/models/drive_stats:60cb80a2-d59b-45c4-9b68-0946303bdcaf # Training Run ID
name: drive_stats_b
datasets:
- drive_stats_inferencing
```
### `from` Format

The `from` key must conform to the following regex format:

```regex
\A(?:spice\.ai\/)?(?<org>[\w\-]+)\/(?<app>[\w\-]+)(?:\/models)?\/(?<model>[\w\-]+):(?<version>[\w\d\-\.]+)\z
```

Examples:

- `spice.ai/lukekim/smart/models/drive_stats:latest`: Refers to the latest version of the drive_stats model in the smart application by the user or organization lukekim.
- `spice.ai/lukekim/smart/drive_stats:60cb80a2-d59b-45c4-9b68-0946303bdcaf`: Specifies a model with a unique training run ID.

### Specification

1. **Prefix (Optional):** The value must start with `spice.ai/`.
1. **Organization/User:** The name of the organization or user (`org`) hosting the model.
1. **Application Name:** The name of the application (`app`) which the model belongs to.
1. **Model Name:** The name of the model (`model`).
1. **Version (Optional):** A colon (`:`) followed by the version identifier (`version`), which could be a semantic version, `latest` for the most recent version, or a specific training run ID.
