From 94b2581ee523ae49ee4e0dea0b1358ceccfadce2 Mon Sep 17 00:00:00 2001
From: Advayp <69655599+Advayp@users.noreply.github.com>
Date: Thu, 9 Jan 2025 14:07:47 -0800
Subject: [PATCH 1/9] Improve `spice dataset` documentation

---
 spiceaidocs/docs/cli/reference/dataset.md | 40 ++++++++++++++++-------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/spiceaidocs/docs/cli/reference/dataset.md b/spiceaidocs/docs/cli/reference/dataset.md
index 25d7780bc..5c6587147 100644
--- a/spiceaidocs/docs/cli/reference/dataset.md
+++ b/spiceaidocs/docs/cli/reference/dataset.md
@@ -1,22 +1,38 @@
 ---
-title: "dataset"
-sidebar_label: "dataset"
-pagination_prev: null
-pagination_next: null
----
-
-Dataset operations
+ title: 'dataset'
+ sidebar_label: 'dataset'
+ pagination_prev: null
+ pagination_next: null
+ ---
+Perform operations relating to Spice datasets.
 ### Usage
-
 ```shell
 spice dataset [command]
 ```
-Available `command`s:
+ Available `command`s:
+
+ - `configure`: Create/configure a dataset directly from the command-line, including customizing components such as Data Connector (`from` in a Spicepod) and acceleration along with other metadata.
+
+ #### Flags
+
+ - `-h`, `--help` Print this help message
+
+ ### Sample Output
-- `configure`: Configure a dataset
+ #### Output from Configure
-#### Flags
+ ```bash
+> spice dataset configure
-- `-h`, `--help` Print this help message
\ No newline at end of file
+ 2024/12/18 01:06:32 INFO dataset name: sample-project
+ taxi_trips # Input 1: Name of dataset
+ 2024/12/18 01:06:59 WARN Dataset names with hyphens should be quoted in queries:
+ i.e. SELECT * FROM "remote-source"
+ description: Taxi trips in s3 # Input 2: Description
+ from: s3://spiceai-demo-datasets/taxi_trips/2024/ # Input 3: Source
+ 2024/12/18 01:07:25 INFO locally accelerate (y/n)?
(y)
+ n # Input 4: Acceleration
+ 2024/12/18 01:07:32 INFO Saved datasets/remote-source/dataset.yaml
+ ```
\ No newline at end of file

From 75dff3f359f495b2d6753200001813492666ed07 Mon Sep 17 00:00:00 2001
From: Advayp <69655599+Advayp@users.noreply.github.com>
Date: Thu, 9 Jan 2025 15:55:19 -0800
Subject: [PATCH 2/9] Add detailed explanation of output and results of `spice dataset configure`

---
 spiceaidocs/docs/cli/reference/dataset.md | 58 +++++++++++++++++++----
 1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/spiceaidocs/docs/cli/reference/dataset.md b/spiceaidocs/docs/cli/reference/dataset.md
index 5c6587147..695f6cc9d 100644
--- a/spiceaidocs/docs/cli/reference/dataset.md
+++ b/spiceaidocs/docs/cli/reference/dataset.md
@@ -4,7 +4,7 @@
 pagination_prev: null
 pagination_next: null
 ---
-Perform operations relating to Spice datasets.
+Configure a Spice dataset.
 ### Usage
 ```shell
@@ -15,24 +15,62 @@ spice dataset [command]
 - `configure`: Create/configure a dataset directly from the command-line, including customizing components such as Data Connector (`from` in a Spicepod) and acceleration along with other metadata.
+ **Note**: In order to run `spice dataset configure`, there *must* be a `spicepod.yaml` file in the root of your project directory. To create this file, see [`spice init`](/cli/reference/init).
+
 #### Flags
 - `-h`, `--help` Print this help message
- ### Sample Output
+ ### Examples
- #### Output from Configure
+ When running `spice dataset configure`, Spice will prompt for four inputs:
+ 1. The name of the dataset, labelled by `(1)` below.
+ 2. The description of the dataset, labelled by `(2)` below.
+ 3. The source of the dataset, labelled by `(3)` below. Consult [Spice's supported data connectors](/components/data-connectors) to see possible values for this field.
+ 4. Whether or not to enable acceleration for this dataset, labelled by `(4)`. The default value for this input is `y`, enabling acceleration for this dataset.
Learn more about acceleration in the [dataset acceleration reference](/components/data-accelerators).
- ```bash
+ ```shell
 > spice dataset configure
 2024/12/18 01:06:32 INFO dataset name: sample-project
- taxi_trips # Input 1: Name of dataset
+ taxi-trips # (1)
 2024/12/18 01:06:59 WARN Dataset names with hyphens should be quoted in queries:
 i.e. SELECT * FROM "remote-source"
- description: Taxi trips in s3 # Input 2: Description
- from: s3://spiceai-demo-datasets/taxi_trips/2024/ # Input 3: Source
- 2024/12/18 01:07:25 INFO locally accelerate (y/n)? (y)
- n # Input 4: Acceleration
+ description: Taxi trips in s3 # (2)
+ from: s3://spiceai-demo-datasets/taxi_trips/2024/ # (3)
+ 2024/12/18 01:075 INFO locally accelerate (y/n)? (y)
+ n # (4)
 2024/12/18 01:07:32 INFO Saved datasets/remote-source/dataset.yaml
- ```
\ No newline at end of file
+ ```
+
+After execution, the directory structure looks like this for the above example:
+ ```
+ ├── datasets
+ │ ├── taxi-trips
+ │ ├── dataset.yaml
+ ├── spicepod.yaml
+ └── ...
+ ```
+
+ The datasets folder includes the datasets for your project configured by using `spice dataset configure` or added manually.
+
+The `dataset.yaml` file in `./datasets/taxi-trips` is configured as defined by the inputs provided to `spice dataset configure`. For this example, the `datatset.yaml` file looks as follows:
+
+```yaml
+from: s3://spiceai-demo-datasets/taxi_trips/2024/
+name: taxi-trips
+description: Taxi trips in s3
+acceleration:
+ enabled: false
+```
+
+The command additionally updates the root `spicepod.yaml` file to include the configured dataset as a reference (`ref`). For this example, `spicepod.yaml` would include the following:
+```yaml
+version: v1
+kind: Spicepod
+name: Taxi Trips with Spice
+datasets:
+ - ref: datasets/taxi-trips
+```
+
+To learn more about Spice datasets and Spicepods, visit the [Spice dataset reference](/reference/spicepod/datasets) and [Spicepod reference](/reference/spicepod).
\ No newline at end of file

From 8029c382f9bf8d0cb71dfd62bd408d0f71329b90 Mon Sep 17 00:00:00 2001
From: Advayp <69655599+Advayp@users.noreply.github.com>
Date: Thu, 9 Jan 2025 15:56:28 -0800
Subject: [PATCH 3/9] Change tense to reflect number of examples

---
 spiceaidocs/docs/cli/reference/dataset.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spiceaidocs/docs/cli/reference/dataset.md b/spiceaidocs/docs/cli/reference/dataset.md
index 695f6cc9d..a6500c35e 100644
--- a/spiceaidocs/docs/cli/reference/dataset.md
+++ b/spiceaidocs/docs/cli/reference/dataset.md
@@ -21,7 +21,7 @@ spice dataset [command]
 - `-h`, `--help` Print this help message
- ### Examples
+ ### Example
 When running `spice dataset configure`, Spice will prompt for four inputs:
 1. The name of the dataset, labelled by `(1)` below.

From e0524492c76b218b6a4388e0e7ead676eb6804ac Mon Sep 17 00:00:00 2001
From: Advayp <69655599+Advayp@users.noreply.github.com>
Date: Thu, 9 Jan 2025 16:11:40 -0800
Subject: [PATCH 4/9] Update spiceaidocs/docs/cli/reference/dataset.md

Co-authored-by: Jack Eadie
---
 spiceaidocs/docs/cli/reference/dataset.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spiceaidocs/docs/cli/reference/dataset.md b/spiceaidocs/docs/cli/reference/dataset.md
index a6500c35e..694b42006 100644
--- a/spiceaidocs/docs/cli/reference/dataset.md
+++ b/spiceaidocs/docs/cli/reference/dataset.md
@@ -13,7 +13,7 @@ spice dataset [command]
 Available `command`s:
- - `configure`: Create/configure a dataset directly from the command-line, including customizing components such as Data Connector (`from` in a Spicepod) and acceleration along with other metadata.
+ - `configure`: Create/configure a dataset directly from the command-line, including customizing components such as whether to add acceleration to the connector.
 **Note**: In order to run `spice dataset configure`, there *must* be a `spicepod.yaml` file in the root of your project directory.
To create this file, see [`spice init`](/cli/reference/init).

From 733c3b361a149b436d48347c5281f65d674f36db Mon Sep 17 00:00:00 2001
From: Advayp <69655599+Advayp@users.noreply.github.com>
Date: Thu, 9 Jan 2025 16:13:20 -0800
Subject: [PATCH 5/9] Update previously incorrect styling

---
 spiceaidocs/docs/cli/reference/dataset.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/spiceaidocs/docs/cli/reference/dataset.md b/spiceaidocs/docs/cli/reference/dataset.md
index 694b42006..3f90f165a 100644
--- a/spiceaidocs/docs/cli/reference/dataset.md
+++ b/spiceaidocs/docs/cli/reference/dataset.md
@@ -1,6 +1,6 @@
 ---
- title: 'dataset'
- sidebar_label: 'dataset'
+ title: "dataset"
+ sidebar_label: "dataset"
 pagination_prev: null
 pagination_next: null
 ---

From 122eadb5525e841896f3dbe2d52604b8421aca61 Mon Sep 17 00:00:00 2001
From: Advayp <69655599+Advayp@users.noreply.github.com>
Date: Fri, 10 Jan 2025 17:39:18 -0800
Subject: [PATCH 6/9] Address feedback

---
 spiceaidocs/docs/cli/reference/dataset.md | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/spiceaidocs/docs/cli/reference/dataset.md b/spiceaidocs/docs/cli/reference/dataset.md
index 3f90f165a..8a910724b 100644
--- a/spiceaidocs/docs/cli/reference/dataset.md
+++ b/spiceaidocs/docs/cli/reference/dataset.md
@@ -26,21 +26,18 @@ spice dataset [command]
 When running `spice dataset configure`, Spice will prompt for four inputs:
 1. The name of the dataset, labelled by `(1)` below.
 2. The description of the dataset, labelled by `(2)` below.
- 3. The source of the dataset, labelled by `(3)` below. Consult [Spice's supported data connectors](/components/data-connectors) to see possible values for this field.
+ 3. The source of the dataset, labelled by `(3)` below. Consult [Spice's supported data connectors](/components/data-connectors) to see possible values for this field. Note: Spice may prompt for a file format if necessary, as shown in the example below.
 4.
Whether or not to enable acceleration for this dataset, labelled by `(4)`. The default value for this input is `y`, enabling acceleration for this dataset. Learn more about acceleration in the [dataset acceleration reference](/components/data-accelerators).
 ```shell
 > spice dataset configure
- 2024/12/18 01:06:32 INFO dataset name: sample-project
- taxi-trips # (1)
- 2024/12/18 01:06:59 WARN Dataset names with hyphens should be quoted in queries:
- i.e. SELECT * FROM "remote-source"
- description: Taxi trips in s3 # (2)
- from: s3://spiceai-demo-datasets/taxi_trips/2024/ # (3)
- 2024/12/18 01:075 INFO locally accelerate (y/n)? (y)
- n # (4)
- 2024/12/18 01:07:32 INFO Saved datasets/remote-source/dataset.yaml
+dataset name: (spiceai) taxi-trips # (1)
+description: Taxi Trips in S3 # (2)
+from: s3://spiceai-demo-datasets/taxi_trips/2024/ # (3)
+file_format (parquet/csv) (parquet) parquet
+locally accelerate (y/n)? (y) y # (4)
+2025/01/10 14:07:46 INFO Saved datasets/test/dataset.yaml
 ```
 After execution, the directory structure looks like this for the above example:

From 381d1380913632fc4503324326f54624b5eb49f3 Mon Sep 17 00:00:00 2001
From: Advay Patil
Date: Fri, 10 Jan 2025 22:32:09 -0800
Subject: [PATCH 7/9] Fix Vercel build issue

---
 spiceaidocs/docs/cli/reference/dataset.md | 65 +++++++++++++----------
 1 file changed, 36 insertions(+), 29 deletions(-)

diff --git a/spiceaidocs/docs/cli/reference/dataset.md b/spiceaidocs/docs/cli/reference/dataset.md
index 8a910724b..21911baa6 100644
--- a/spiceaidocs/docs/cli/reference/dataset.md
+++ b/spiceaidocs/docs/cli/reference/dataset.md
@@ -1,35 +1,40 @@
 ---
- title: "dataset"
- sidebar_label: "dataset"
- pagination_prev: null
- pagination_next: null
- ---
+
+title: "dataset"
+sidebar_label: "dataset"
+pagination_prev: null
+pagination_next: null
+
+---
+
 Configure a Spice dataset.
 ### Usage
+
 ```shell
 spice dataset [command]
 ```
- Available `command`s:
+Available `command`s:
+
+- `configure`: Create/configure a dataset directly from the command-line, including customizing components such as whether to add acceleration to the connector.
- - `configure`: Create/configure a dataset directly from the command-line, including customizing components such as whether to add acceleration to the connector.
+**Note**: In order to run `spice dataset configure`, there _must_ be a `spicepod.yaml` file in the root of your project directory. To create this file, see [`spice init`](/cli/reference/init).
- **Note**: In order to run `spice dataset configure`, there *must* be a `spicepod.yaml` file in the root of your project directory. To create this file, see [`spice init`](/cli/reference/init).
+#### Flags
- #### Flags
+- `-h`, `--help` Print this help message
- - `-h`, `--help` Print this help message
+### Example
- ### Example
+When running `spice dataset configure`, Spice will prompt for four inputs:
- When running `spice dataset configure`, Spice will prompt for four inputs:
- 1. The name of the dataset, labelled by `(1)` below.
- 2. The description of the dataset, labelled by `(2)` below.
- 3. The source of the dataset, labelled by `(3)` below. Consult [Spice's supported data connectors](/components/data-connectors) to see possible values for this field. Note: Spice may prompt for a file format if necessary, as shown in the example below.
- 4. Whether or not to enable acceleration for this dataset, labelled by `(4)`. The default value for this input is `y`, enabling acceleration for this dataset. Learn more about acceleration in the [dataset acceleration reference](/components/data-accelerators).
+1. The name of the dataset, labelled by `(1)` below.
+2. The description of the dataset, labelled by `(2)` below.
+3. The source of the dataset, labelled by `(3)` below.
Consult [Spice's supported data connectors](/components/data-connectors) to see possible values for this field. Note: Spice may prompt for a file format if necessary, as shown in the example below.
+4. Whether or not to enable acceleration for this dataset, labelled by `(4)`. The default value for this input is `y`, enabling acceleration for this dataset. Learn more about acceleration in the [dataset acceleration reference](/components/data-accelerators).
- ```shell
+```shell
 > spice dataset configure
 dataset name: (spiceai) taxi-trips # (1)
@@ -38,18 +43,19 @@ from: s3://spiceai-demo-datasets/taxi_trips/2024/ # (3)
 file_format (parquet/csv) (parquet) parquet
 locally accelerate (y/n)? (y) y # (4)
 2025/01/10 14:07:46 INFO Saved datasets/test/dataset.yaml
- ```
+```
 After execution, the directory structure looks like this for the above example:
- ```
- ├── datasets
- │ ├── taxi-trips
- │ ├── dataset.yaml
- ├── spicepod.yaml
- └── ...
- ```
- The datasets folder includes the datasets for your project configured by using `spice dataset configure` or added manually.
+```
+├── datasets
+│ ├── taxi-trips
+│ ├── dataset.yaml
+├── spicepod.yaml
+└── ...
+```
+
+The datasets folder includes the datasets for your project configured by using `spice dataset configure` or added manually.
 The `dataset.yaml` file in `./datasets/taxi-trips` is configured as defined by the inputs provided to `spice dataset configure`. For this example, the `datatset.yaml` file looks as follows:
@@ -58,16 +64,17 @@ from: s3://spiceai-demo-datasets/taxi_trips/2024/
 name: taxi-trips
 description: Taxi trips in s3
 acceleration:
- enabled: false
+ - enabled: false
 ```
 The command additionally updates the root `spicepod.yaml` file to include the configured dataset as a reference (`ref`).
For this example, `spicepod.yaml` would include the following:
+
 ```yaml
 version: v1
 kind: Spicepod
 name: Taxi Trips with Spice
 datasets:
- ref: datasets/taxi-trips
+ - ref: datasets/taxi-trips
 ```
-To learn more about Spice datasets and Spicepods, visit the [Spice dataset reference](/reference/spicepod/datasets) and [Spicepod reference](/reference/spicepod).
\ No newline at end of file
+To learn more about Spice datasets and Spicepods, visit the [Spice dataset reference](/reference/spicepod/datasets) and [Spicepod reference](/reference/spicepod).

From c4e97f7c3858742c31c7cb017497bb93f6c46a64 Mon Sep 17 00:00:00 2001
From: Advayp <69655599+Advayp@users.noreply.github.com>
Date: Mon, 13 Jan 2025 16:56:14 -0800
Subject: [PATCH 8/9] Fix typo

Co-authored-by: Jack Eadie
---
 website/docs/cli/reference/dataset.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/cli/reference/dataset.md b/website/docs/cli/reference/dataset.md
index 21911baa6..bbe5a1ed8 100644
--- a/website/docs/cli/reference/dataset.md
+++ b/website/docs/cli/reference/dataset.md
@@ -57,7 +57,7 @@ After execution, the directory structure looks like this for the above example:
 The datasets folder includes the datasets for your project configured by using `spice dataset configure` or added manually.
-The `dataset.yaml` file in `./datasets/taxi-trips` is configured as defined by the inputs provided to `spice dataset configure`. For this example, the `datatset.yaml` file looks as follows:
+The `dataset.yaml` file in `./datasets/taxi-trips` is configured as defined by the inputs provided to `spice dataset configure`.
For this example, the `dataset.yaml` file looks as follows:
 ```yaml
 from: s3://spiceai-demo-datasets/taxi_trips/2024/

From 20652b8bf2ea59cb584d3a28d0239f4611e54262 Mon Sep 17 00:00:00 2001
From: Advay Patil
Date: Mon, 13 Jan 2025 17:05:04 -0800
Subject: [PATCH 9/9] Fix build issue

---
 website/docs/cli/reference/dataset.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/website/docs/cli/reference/dataset.md b/website/docs/cli/reference/dataset.md
index bbe5a1ed8..2e5022993 100644
--- a/website/docs/cli/reference/dataset.md
+++ b/website/docs/cli/reference/dataset.md
@@ -19,7 +19,7 @@ Available `command`s:
 - `configure`: Create/configure a dataset directly from the command-line, including customizing components such as whether to add acceleration to the connector.
-**Note**: In order to run `spice dataset configure`, there _must_ be a `spicepod.yaml` file in the root of your project directory. To create this file, see [`spice init`](/cli/reference/init).
+**Note**: In order to run `spice dataset configure`, there _must_ be a `spicepod.yaml` file in the root of your project directory. To create this file, see [`spice init`](/docs/cli/reference/init).
 #### Flags
@@ -31,8 +31,8 @@ When running `spice dataset configure`, Spice will prompt for four inputs:
 1. The name of the dataset, labelled by `(1)` below.
 2. The description of the dataset, labelled by `(2)` below.
-3. The source of the dataset, labelled by `(3)` below. Consult [Spice's supported data connectors](/components/data-connectors) to see possible values for this field. Note: Spice may prompt for a file format if necessary, as shown in the example below.
-4. Whether or not to enable acceleration for this dataset, labelled by `(4)`. The default value for this input is `y`, enabling acceleration for this dataset. Learn more about acceleration in the [dataset acceleration reference](/components/data-accelerators).
+3. The source of the dataset, labelled by `(3)` below.
Consult [Spice's supported data connectors](/docs/components/data-connectors) to see possible values for this field. Note: Spice may prompt for a file format if necessary, as shown in the example below.
+4. Whether or not to enable acceleration for this dataset, labelled by `(4)`. The default value for this input is `y`, enabling acceleration for this dataset. Learn more about acceleration in the [dataset acceleration reference](/docs/components/data-accelerators).
 ```shell
 > spice dataset configure
@@ -77,4 +77,4 @@ datasets:
 - ref: datasets/taxi-trips
 ```
-To learn more about Spice datasets and Spicepods, visit the [Spice dataset reference](/reference/spicepod/datasets) and [Spicepod reference](/reference/spicepod).
+To learn more about Spice datasets and Spicepods, visit the [Spice dataset reference](/docs/reference/spicepod/datasets) and [Spicepod reference](/docs/reference/spicepod).