From a9451e6d07ae90428e879db8951edde4cb7d42f2 Mon Sep 17 00:00:00 2001 From: Luke Kim <80174+lukekim@users.noreply.github.com> Date: Wed, 27 Nov 2024 18:23:04 -0800 Subject: [PATCH 1/8] Documentation improvements for RC.1 --- .github/copilot-instructions.md | 5 ++ .../docs/components/embeddings/huggingface.md | 10 ++-- .../docs/components/embeddings/index.md | 11 +++-- .../docs/components/embeddings/local.md | 13 ++++-- .../docs/components/embeddings/openai.md | 22 ++++----- .../docs/components/models/filesystem.md | 13 ++++-- spiceaidocs/docs/components/models/openai.md | 9 ++-- spiceaidocs/docs/components/models/spiceai.md | 2 +- spiceaidocs/docs/components/views/index.md | 8 ++-- spiceaidocs/docs/features/cdc/index.md | 14 +++--- .../features/data-acceleration/constraints.md | 36 +++++++-------- .../docs/features/data-acceleration/index.md | 12 ++--- .../features/data-acceleration/indexes.md | 2 +- .../docs/features/data-ingestion/index.md | 18 ++++---- .../docs/features/federated-queries/index.md | 8 ++-- .../features/large-language-models/index.md | 4 +- .../features/large-language-models/memory.md | 11 +++-- .../parameter_overrides.md | 11 +++-- .../large-language-models/runtime_tools.md | 30 +++++++----- .../ml-model-serving/index.md | 2 +- spiceaidocs/docs/features/search/index.md | 2 +- .../docs/features/semantic-model/index.md | 8 ++-- spiceaidocs/docs/index.md | 12 ++--- .../docs/intelligent-applications/index.md | 46 +++++++++++++++++-- .../docs/reference/spicepod/catalogs.md | 4 +- .../docs/reference/spicepod/embeddings.md | 10 ++-- spiceaidocs/docs/reference/spicepod/index.md | 6 +-- spiceaidocs/docs/reference/spicepod/models.md | 22 +++++---- spiceaidocs/docs/use-cases/data-mesh/index.md | 8 ++-- .../docs/use-cases/database-cdn/index.md | 10 ++-- .../docs/use-cases/enterprise-search/index.md | 6 +-- spiceaidocs/docs/use-cases/rag/index.md | 4 +- 32 files changed, 218 insertions(+), 161 deletions(-) diff --git a/.github/copilot-instructions.md 
b/.github/copilot-instructions.md index cf5fdae3..7319ef74 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -6,13 +6,18 @@ Remember to be concise, but do not omit useful information. Pay attention to det Use plain, clear, simple, easy-to-understand language. Do not use hyperbole or hype. +Avoid "allows" to describe functionality. + Always provide references and citations with links. Adhere to the instructions in CONTRIBUTING.md. Never use the words: +- delve - seamlessly - empower / empowering - supercharge - countless +- enhance / enhancing +- allow / allowing diff --git a/spiceaidocs/docs/components/embeddings/huggingface.md b/spiceaidocs/docs/components/embeddings/huggingface.md index 086c772d..0955fdb1 100644 --- a/spiceaidocs/docs/components/embeddings/huggingface.md +++ b/spiceaidocs/docs/components/embeddings/huggingface.md @@ -4,7 +4,10 @@ sidebar_label: 'HuggingFace' sidebar_position: 2 --- -To run an embedding model from HuggingFace, specify the `huggingface` path in `from`. This will handle downloading and running the embedding model locally. +To use an embedding model from HuggingFace with Spice, specify the `huggingface` path in the `from` field of your configuration. The model and its related files will be automatically downloaded, loaded, and served locally by Spice. + +Here is an example configuration in `spicepod.yaml`: + ```yaml embeddings: - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2 @@ -12,11 +15,12 @@ embeddings: ``` Supported models include: - - All models tagged as [text-embeddings-inference](https://huggingface.co/models?other=text-embeddings-inference) on Huggingface - - Any Huggingface repository with the correct files to be loaded as a [local embedding model](/components/embeddings/local.md). 
+- All models tagged as [text-embeddings-inference](https://huggingface.co/models?other=text-embeddings-inference) on HuggingFace +- Any HuggingFace repository with the correct files to be loaded as a [local embedding model](/components/embeddings/local.md). With the same semantics as [language models](/components/models/huggingface#access-tokens), `spice` can run private HuggingFace embedding models: + ```yaml embeddings: - from: huggingface:huggingface.co/secret-company/awesome-embedding-model diff --git a/spiceaidocs/docs/components/embeddings/index.md b/spiceaidocs/docs/components/embeddings/index.md index 092c7712..c2dc21b3 100644 --- a/spiceaidocs/docs/components/embeddings/index.md +++ b/spiceaidocs/docs/components/embeddings/index.md @@ -7,12 +7,12 @@ pagination_prev: null pagination_next: null --- -Embedding models are used to convert raw text into a numerical representation that can be used by machine learning models. - -Spice supports running embedding models locally, or use remote services such as OpenAI, or [la Plateforme](https://console.mistral.ai/). +Embedding models convert raw text into numerical representations that can be used by machine learning models. Spice supports running embedding models locally or using remote services such as OpenAI or [la Plateforme](https://console.mistral.ai/). Embedding models are defined in the `spicepod.yaml` file as top-level components. +Example configuration in `spicepod.yaml`: + ```yaml embeddings: - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2 @@ -31,5 +31,6 @@ embeddings: ``` Embedding models can be used either by: - - An OpenAI-compatible [endpoint](/api/http/embeddings.md) - - By augmenting a dataset with column-level [embeddings](/reference/spicepod/datasets.md#embeddings), to provide vector-based [search functionality](/features/search/index.md#vector-search). 
+ +- An OpenAI-compatible [endpoint](/api/http/embeddings.md) +- By augmenting a dataset with column-level [embeddings](/reference/spicepod/datasets.md#embeddings), to provide vector-based [search functionality](/features/search/index.md#vector-search). diff --git a/spiceaidocs/docs/components/embeddings/local.md b/spiceaidocs/docs/components/embeddings/local.md index 910e9277..56518275 100644 --- a/spiceaidocs/docs/components/embeddings/local.md +++ b/spiceaidocs/docs/components/embeddings/local.md @@ -4,7 +4,11 @@ sidebar_label: 'Local' sidebar_position: 3 --- -Embedding models can be run with files stored locally. +Embedding models can be run with files stored locally. This method is useful for using models that are not hosted on remote services. + +### Example Configuration + +To configure an embedding model using local files, you can specify the details in the `spicepod.yaml` file as shown below: ```yaml embeddings: @@ -16,6 +20,7 @@ embeddings: ``` ## Required Files - - Model file, one of: `model.safetensors`, `pytorch_model.bin`. - - A tokenizer file with the filename `tokenizer.json`. - - A config file with the filename `config.json`. + +- Model file, one of: `model.safetensors`, `pytorch_model.bin`. +- A tokenizer file with the filename `tokenizer.json`. +- A config file with the filename `config.json`. diff --git a/spiceaidocs/docs/components/embeddings/openai.md b/spiceaidocs/docs/components/embeddings/openai.md index 6d25d530..6a4fd5d6 100644 --- a/spiceaidocs/docs/components/embeddings/openai.md +++ b/spiceaidocs/docs/components/embeddings/openai.md @@ -4,20 +4,18 @@ sidebar_label: 'OpenAI' sidebar_position: 1 --- -To use a hosted OpenAI (or compatible) embedding model, specify the `openai` path in `from`. +To use a hosted OpenAI (or compatible) embedding model, specify the `openai` path in the `from` field of your configuration. If you want to use a specific model, include its model ID in the `from` field. 
If no model ID is specified, it defaults to `"text-embedding-3-small"`. -For a specific model, include it as the model ID in `from` (see example below). Defaults to `"text-embedding-3-small"`. -These parameters are specific to OpenAI models: +The following parameters are specific to OpenAI models: -| Parameter | Description | Default | -| ----- | ----------- | ------- | -| `openai_api_key` | The OpenAI API key. | - | -| `openai_org_id` | The OpenAI organization id. | - | -| `openai_project_id` | The OpenAI project id. | - | -| `endpoint` | The OpenAI API base endpoint. | `https://api.openai.com/v1` | +| Parameter | Description | Default | +| ------------------- | ------------------------------------- | --------------------------- | +| `openai_api_key` | The API key for accessing OpenAI. | - | +| `openai_org_id` | The organization ID for OpenAI. | - | +| `openai_project_id` | The project ID for OpenAI. | - | +| `endpoint` | The base endpoint for the OpenAI API. | `https://api.openai.com/v1` | - -Example: +Below is an example configuration in `spicepod.yaml`: ```yaml models: @@ -31,4 +29,4 @@ models: params: endpoint: https://api.mistral.ai/v1 api_key: ${ secrets:SPICE_MISTRAL_API_KEY } -``` \ No newline at end of file +``` diff --git a/spiceaidocs/docs/components/models/filesystem.md b/spiceaidocs/docs/components/models/filesystem.md index c0915893..11d155aa 100644 --- a/spiceaidocs/docs/components/models/filesystem.md +++ b/spiceaidocs/docs/components/models/filesystem.md @@ -5,7 +5,7 @@ sidebar_label: 'Filesystem' sidebar_position: 3 --- -To use a model hosted on a filesystem, specify the path to the model file in `from`. +To use a model hosted on a filesystem, specify the path to the model file in the `from` field. Supported formats include ONNX for traditional machine learning models and GGUF, GGML, and SafeTensor for large language models (LLMs). 
@@ -50,15 +50,17 @@ models: ``` ### Example: Loading from a directory + ```yaml models: - name: hello from: file:models/llms/llama3.2-1b-instruct/ ``` -Note: The folder provided should contain all the expected files (see examples above) to load a model in the base level. +Note: The folder provided should contain all the expected files (see examples above) at the base level to load a model. ### Example: Overriding the Chat Template + Chat templates convert the OpenAI compatible chat messages (see [format](https://platform.openai.com/docs/api-reference/chat/create#chat-create-messages)) and other components of a request into a stream of characters for the language model. It follows Jinja3 templating [syntax](https://jinja.palletsprojects.com/en/3.1.x/templates/). @@ -81,6 +83,7 @@ models: ``` #### Templating Variables - - `messages`: List of chat messages, in the OpenAI [format](https://platform.openai.com/docs/api-reference/chat/create#chat-create-messages). - - `add_generation_prompt`: Boolean flag whether to add a [generation prompt](https://huggingface.co/docs/transformers/main/chat_templating#what-are-generation-prompts). - - `tools`: List of callable tools, in the OpenAI [format](https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools). + +- `messages`: List of chat messages, in the OpenAI [format](https://platform.openai.com/docs/api-reference/chat/create#chat-create-messages). +- `add_generation_prompt`: Boolean flag whether to add a [generation prompt](https://huggingface.co/docs/transformers/main/chat_templating#what-are-generation-prompts). +- `tools`: List of callable tools, in the OpenAI [format](https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools). 
diff --git a/spiceaidocs/docs/components/models/openai.md b/spiceaidocs/docs/components/models/openai.md index 0ec53d19..e71769dc 100644 --- a/spiceaidocs/docs/components/models/openai.md +++ b/spiceaidocs/docs/components/models/openai.md @@ -5,16 +5,17 @@ sidebar_label: 'OpenAI' sidebar_position: 4 --- -To use a language model hosted on OpenAI (or compatible), specify the `openai` path in `from`. +To use a language model hosted on OpenAI (or compatible), specify the `openai` path in the `from` field. + +For a specific model, include it as the model ID in the `from` field (see example below). The default model is `"gpt-3.5-turbo"`. -For a specific model, include it as the model ID in `from` (see example below). Defaults to `"gpt-3.5-turbo"`. These parameters are specific to OpenAI models: | Param | Description | Default | | ------------------- | ----------------------------- | --------------------------- | | `openai_api_key` | The OpenAI API key. | - | -| `openai_org_id` | The OpenAI organization id. | - | -| `openai_project_id` | The OpenAI project id. | - | +| `openai_org_id` | The OpenAI organization ID. | - | +| `openai_project_id` | The OpenAI project ID. | - | | `endpoint` | The OpenAI API base endpoint. | `https://api.openai.com/v1` | Example: diff --git a/spiceaidocs/docs/components/models/spiceai.md b/spiceaidocs/docs/components/models/spiceai.md index a64507dd..f886e3f3 100644 --- a/spiceaidocs/docs/components/models/spiceai.md +++ b/spiceaidocs/docs/components/models/spiceai.md @@ -5,7 +5,7 @@ sidebar_label: 'Spice Cloud Platform' sidebar_position: 2 --- -To use a model hosted on the [Spice Cloud Platform](https://docs.spice.ai/building-blocks/spice-models), specify the `spice.ai` path in `from`. +To use a model hosted on the [Spice Cloud Platform](https://docs.spice.ai/building-blocks/spice-models), specify the `spice.ai` path in the `from` field. 
Example: diff --git a/spiceaidocs/docs/components/views/index.md b/spiceaidocs/docs/components/views/index.md index 38e75347..d0f8ad93 100644 --- a/spiceaidocs/docs/components/views/index.md +++ b/spiceaidocs/docs/components/views/index.md @@ -1,18 +1,20 @@ --- title: 'Views' sidebar_label: 'Views' -description: 'Documentation for defining Views' +description: 'Documentation for defining Views in Spice' sidebar_position: 7 --- -Views in Spice are virtual tables defined by SQL queries. They simplify complex queries and support reuse across applications. +Views in Spice are virtual tables defined by SQL queries. They help simplify complex queries and promote reuse across different applications by encapsulating query logic in a single, reusable entity. ## Defining a View -To define a view in `spicepod.yaml`, specify the `views` section. Each view requires a `name` and a `sql` field. +To define a view in the `spicepod.yaml` configuration file, specify the `views` section. Each view definition must include a `name` and a `sql` field. ### Example +The following example demonstrates how to define a view named `rankings` that lists the top five products based on the total count of orders: + ```yaml views: - name: rankings diff --git a/spiceaidocs/docs/features/cdc/index.md b/spiceaidocs/docs/features/cdc/index.md index 030e3a6c..0c71557c 100644 --- a/spiceaidocs/docs/features/cdc/index.md +++ b/spiceaidocs/docs/features/cdc/index.md @@ -7,33 +7,33 @@ pagination_prev: null pagination_next: null --- -Change Data Capture (CDC) is a technique that captures changed rows from a database's transaction log and delivers them to consumers with low latency. Leveraging this technique enables Spice to keep [locally accelerated](../data-acceleration/index.md) datasets up-to-date in real time with the source data, and it is highly efficient as it only transfers the changed rows instead of re-fetching the entire dataset on refresh. 
+Change Data Capture (CDC) captures changed rows from a database's transaction log and delivers them to consumers with low latency. This technique enables Spice to keep [locally accelerated](../data-acceleration/index.md) datasets up-to-date in real time with the source data. It is efficient because it only transfers the changed rows instead of re-fetching the entire dataset. ## Benefits -Leveraging locally accelerated datasets configured with CDC enables Spice to provide a solution that combines high-performance accelerated queries and efficient real-time delta updates. +Using locally accelerated datasets configured with CDC enables Spice to provide high-performance accelerated queries and efficient real-time updates. ## Example Use Case -Consider a fraud detection application that needs to determine whether a pending transaction is likely fraudulent. The application queries a Spice-accelerated real-time updated table of recent transactions to check if a pending transaction resembles known fraudulent ones. Using CDC, the table is kept up-to-date, allowing the application to quickly identify potential fraud. +Consider a fraud detection application that needs to determine whether a pending transaction is likely fraudulent. The application queries a Spice-accelerated, real-time updated table of recent transactions to check if a pending transaction resembles known fraudulent ones. With CDC, the table is kept up-to-date, enabling the application to quickly identify potential fraud. ## Considerations When configuring datasets to be accelerated with CDC, ensure that the [data connector](/components/data-connectors) supports CDC and can return a stream of row-level changes. See the [Supported Data Connectors](#supported-data-connectors) section for more information. -The startup time for CDC-accelerated datasets may be longer than that for non-CDC-accelerated datasets due to the initial synchronization of the dataset. 
+The startup time for CDC-accelerated datasets may be longer than for non-CDC-accelerated datasets due to the initial synchronization. :::tip -It's recommended to use CDC-accelerated datasets with persistent data accelerator configurations (i.e. `file` mode for [`DuckDB`](/components/data-accelerators/duckdb.md)/[`SQLite`](/components/data-accelerators/sqlite.md) or [`PostgreSQL`](/components/data-accelerators/postgres/index.md)). This ensures that when Spice restarts, it can resume from the last known state of the dataset instead of re-fetching the entire dataset. +It is recommended to use CDC-accelerated datasets with persistent data accelerator configurations (i.e., `file` mode for [`DuckDB`](/components/data-accelerators/duckdb.md)/[`SQLite`](/components/data-accelerators/sqlite.md) or [`PostgreSQL`](/components/data-accelerators/postgres/index.md)). This ensures that when Spice restarts, it can resume from the last known state of the dataset instead of re-fetching the entire dataset. ::: ## Supported Data Connectors -Enabling CDC via setting `refresh_mode: changes` in the acceleration settings requires support from the data connector to provide a stream of row-level changes. +Enabling CDC by setting `refresh_mode: changes` in the acceleration settings requires support from the data connector to provide a stream of row-level changes. -At present, the only supported data connector is [Debezium](/components/data-connectors/debezium.md).. +Currently, the only supported data connector is [Debezium](/components/data-connectors/debezium.md). 
## Example diff --git a/spiceaidocs/docs/features/data-acceleration/constraints.md b/spiceaidocs/docs/features/data-acceleration/constraints.md index adaeeb73..d5c89718 100644 --- a/spiceaidocs/docs/features/data-acceleration/constraints.md +++ b/spiceaidocs/docs/features/data-acceleration/constraints.md @@ -5,11 +5,11 @@ sidebar_position: 2 description: 'Learn how to add/configure constraints on local acceleration tables in Spice.' --- -Constraints are rules that enforce data integrity in a database. Spice supports constraints on locally accelerated tables to ensure data quality, as well as configuring the behavior for inserting data updates that violate constraints. +Constraints enforce data integrity in a database. Spice supports constraints on locally accelerated tables to ensure data quality and configure behavior for data updates that violate constraints. Constraints are specified using [column references](#column-references) in the Spicepod via the `primary_key` field in the acceleration configuration. Additional unique constraints are specified via the [`indexes`](./indexes.md) field with the value `unique`. Data that violates these constraints will result in a [conflict](#handling-conflicts). -If there are multiple rows in the incoming data that violate any constraint, the entire incoming batch of data will be dropped. +If multiple rows in the incoming data violate any constraint, the entire incoming batch of data will be dropped. 
Example Spicepod: @@ -72,9 +72,8 @@ datasets: :::danger[Invalid] - ```yaml - datasets: - + ```yaml + datasets: - from: spice.ai/eth.recent_blocks name: eth.recent_blocks acceleration: @@ -82,11 +81,11 @@ datasets: engine: sqlite primary_key: hash indexes: - "(number, timestamp)": unique + '(number, timestamp)': unique on_conflict: hash: upsert - "(number, timestamp)": upsert - ``` + '(number, timestamp)': upsert + ``` ::: @@ -94,9 +93,8 @@ datasets: :::tip[Valid] - ```yaml - datasets: - + ```yaml + datasets: - from: spice.ai/eth.recent_blocks name: eth.recent_blocks acceleration: @@ -104,20 +102,20 @@ datasets: engine: sqlite primary_key: hash indexes: - "(number, timestamp)": unique + '(number, timestamp)': unique on_conflict: hash: drop - "(number, timestamp)": drop - ``` + '(number, timestamp)': drop + ``` ::: The following Spicepod is invalid because it specifies multiple `on_conflict` targets with `upsert` and `drop`: :::danger[Invalid] - ```yaml - datasets: + ```yaml + datasets: - from: spice.ai/eth.recent_blocks name: eth.recent_blocks acceleration: @@ -125,11 +123,11 @@ datasets: engine: sqlite primary_key: hash indexes: - "(number, timestamp)": unique + '(number, timestamp)': unique on_conflict: hash: upsert - "(number, timestamp)": drop - ``` + '(number, timestamp)': drop + ``` ::: diff --git a/spiceaidocs/docs/features/data-acceleration/index.md b/spiceaidocs/docs/features/data-acceleration/index.md index de44141e..92cd53dc 100644 --- a/spiceaidocs/docs/features/data-acceleration/index.md +++ b/spiceaidocs/docs/features/data-acceleration/index.md @@ -6,23 +6,23 @@ sidebar_position: 2 pagination_prev: null --- -Datasets can be locally accelerated by the Spice runtime, pulling data from any [Data Connector](/components/data-connectors) and storing it locally in a [Data Accelerator](/components/data-accelerators) for faster access. 
Additionally, the data can be configured to be kept up-to-date in realtime or on a refresh schedule, so you always have the latest data locally for querying. +Datasets can be locally accelerated by the Spice runtime, pulling data from any [Data Connector](/components/data-connectors) and storing it locally in a [Data Accelerator](/components/data-accelerators) for faster access. The data can be kept up-to-date in real-time or on a refresh schedule, ensuring you always have the latest data locally for querying. ## Benefits -When a dataset is locally accelerated by the Spice runtime, the data is stored alongside your application, providing much faster query times by cutting out network latency to make the request. This benefit is accentuated when the result of a query is large because the data does not need to be transferred over the network. Depending on the [Acceleration Engine](/components/data-accelerators) chosen, the locally accelerated data can also be stored in-memory, further reducing query times. [Indexes](./indexes.md) can also be applied, further speeding up certain types of queries. +Local data acceleration stores data alongside your application, providing faster query times by eliminating network latency. This is especially beneficial for large query results, as data transfer over the network is avoided. Depending on the [Acceleration Engine](/components/data-accelerators) used, data can also be stored in-memory, further reducing query times. [Indexes](./indexes.md) can be applied to speed up certain queries. -Locally accelerated datasets can also have [primary key constraints](./constraints.md) applied. This feature comes with the ability to specify what should happen when a constraint is violated, either drop the specific row that violates the constraint or upsert that row into the accelerated table. +Locally accelerated datasets can also have [primary key constraints](./constraints.md) applied. 
This feature lets you specify actions when a constraint is violated, such as dropping the violating row or upserting it into the accelerated table. ## Example Use Case -Consider a high volume e-trading frontend application backed by an AWS RDS database containing a table of trades. In order to retrieve all trades over the last 24 hours, the application would need to query the remote database for all trades in the last 24 hours and then transfer the data over the network. By accelerating the trades table locally using the [AWS RDS Data Connector](https://github.com/spiceai/quickstarts/tree/trunk/rds-aurora-mysql), we can bring the data to the application, saving the round trip time to the database and the time to transfer the data over the network. +Consider a high-volume e-trading frontend application backed by an AWS RDS database containing a table of trades. To retrieve all trades over the last 24 hours, the application would need to query the remote database and transfer the data over the network. By accelerating the trades table locally using the [AWS RDS Data Connector](https://github.com/spiceai/quickstarts/tree/trunk/rds-aurora-mysql), the data is brought to the application, saving round trip time and data transfer time. ## Considerations -Data Storage: Ensure that the local storage has enough capacity to store the accelerated data. The amount and type (i.e. Disk or RAM) of storage required will depend on the size of the dataset and the acceleration engine used. +Data Storage: Ensure local storage has enough capacity for the accelerated data. The required storage type (Disk or RAM) and amount depend on the dataset size and the acceleration engine used. -Data Security: Assess data sensitivity and secure network connections between edge and data connector when replicating data for further usage. Assess the security of any Data Accelerator that is external to the Spice runtime and connected to the Spice runtime. 
Implement encryption, access controls, and secure protocols. +Data Security: Assess data sensitivity and secure network connections between the edge and data connector when replicating data. Secure any external Data Accelerator connected to the Spice runtime with encryption, access controls, and secure protocols. ## Example diff --git a/spiceaidocs/docs/features/data-acceleration/indexes.md b/spiceaidocs/docs/features/data-acceleration/indexes.md index 34b17930..917e3017 100644 --- a/spiceaidocs/docs/features/data-acceleration/indexes.md +++ b/spiceaidocs/docs/features/data-acceleration/indexes.md @@ -5,7 +5,7 @@ sidebar_position: 1 description: 'Learn how to add indexes to local acceleration tables in Spice.' --- -Database indexes are an essential tool for optimizing the performance of queries. Learn how to add indexes to the tables that Spice creates to accelerate data locally. +Database indexes are essential for optimizing query performance. This document explains how to add indexes to tables created by Spice for local data acceleration. Example Spicepod: diff --git a/spiceaidocs/docs/features/data-ingestion/index.md b/spiceaidocs/docs/features/data-ingestion/index.md index a3a7b313..12f20400 100644 --- a/spiceaidocs/docs/features/data-ingestion/index.md +++ b/spiceaidocs/docs/features/data-ingestion/index.md @@ -7,31 +7,31 @@ pagination_prev: null pagination_next: null --- -Data can be ingested by the Spice runtime for replication to a Data Connector, like PostgreSQL or the Spice.ai Cloud platform. +Data can be ingested by the Spice runtime for replication to a Data Connector, such as PostgreSQL or the Spice.ai Cloud platform. -By default, the runtime exposes an [OpenTelemety](https://opentelemetry.io) (OTEL) endpoint at grpc://127.0.0.1:50052 for data ingestion. +By default, the runtime exposes an [OpenTelemetry](https://opentelemetry.io) (OTEL) endpoint at grpc://127.0.0.1:50052 for data ingestion. 
OTEL metrics will be inserted into datasets with matching names (metric name = dataset name) and optionally replicated to the dataset source. ## Benefits -Spice.ai OSS incorporates built-in data ingestion support, enabling the collection of the latest data from edge nodes for use in subsequent queries. This capability avoids the need for additional ETL pipelines, while also enhancing the speed of the feedback loop. +Spice.ai OSS includes built-in data ingestion support, enabling the collection of the latest data from edge nodes for use in subsequent queries. This feature eliminates the need for additional ETL pipelines and improves the speed of the feedback loop. -As an example, consider CPU usage anomaly detection. When CPU metrics are sent to the Spice OpenTelemetry endpoint, the loaded machine learning model can utilize the most recent observations for inferencing and provide recommendations to the edge node. This process occurs rapidly on the edge itself, within milliseconds and without generating network traffic. +For example, consider CPU usage anomaly detection. When CPU metrics are sent to the Spice OpenTelemetry endpoint, the loaded machine learning model can use the most recent observations for inferencing and provide recommendations to the edge node. This process occurs quickly on the edge itself, within milliseconds, and without generating network traffic. -Additional, Spice will replicate the data periodically to the data connector for further usage. +Additionally, Spice will periodically replicate the data to the data connector for further use. ## Considerations -Data Quality: Leverage Spice SQL capabilities to transform and cleanse ingested edge data, ensuring high-quality inputs. +Data Quality: Use Spice SQL capabilities to transform and cleanse ingested edge data, ensuring high-quality inputs. -Data Security: Assess data sensitivity and secure network connections between edge and data connector when replicating data for further usage. 
Implement encryption, access controls, and secure protocols. +Data Security: Evaluate data sensitivity and secure network connections between the edge and data connector when replicating data for further use. Implement encryption, access controls, and secure protocols. ## Example ### [Disk SMART](https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology) -- Start Spice with the following dataset: +Start Spice with the following dataset: ```yaml datasets: @@ -44,7 +44,7 @@ datasets: enabled: true ``` -- Start telegraf with the following config: +Start telegraf with the following config: ``` [[inputs.smart]] diff --git a/spiceaidocs/docs/features/federated-queries/index.md b/spiceaidocs/docs/features/federated-queries/index.md index 6bf3e542..2d8e4709 100644 --- a/spiceaidocs/docs/features/federated-queries/index.md +++ b/spiceaidocs/docs/features/federated-queries/index.md @@ -7,17 +7,15 @@ pagination_prev: null pagination_next: null --- -Spice provides a powerful federated query feature that allows you to join and combine data from multiple data sources and perform complex queries. This feature enables you to leverage the full potential of your data by aggregating and analyzing information wherever it is stored. - -Spice supports federated query across databases (PostgreSQL, MySQL, etc.), data warehouses (Databricks, Snowflake, BigQuery, etc.), and data lakes (S3, MinIO, etc.). See [Data Connectors](/components/data-connectors/index.md) for the full list of supported sources. +Spice supports federated queries, enabling you to join and combine data from multiple sources, including databases (PostgreSQL, MySQL), data warehouses (Databricks, Snowflake, BigQuery), and data lakes (S3, MinIO). For a full list of supported sources, see [Data Connectors](/components/data-connectors/index.md). 
### Getting Started -To get started with federated queries using Spice, follow these steps: +To start using federated queries in Spice, follow these steps: **Step 1.** Install Spice by following the [installation instructions](/getting-started/index.md). -**Step 2.** Clone the [Spice Quickstarts repo](https://github.com/spiceai/quickstarts) and navigate to the `federation` directory. +**Step 2.** Clone the Spice Quickstarts repository and navigate to the `federation` directory. ```bash git clone https://github.com/spiceai/quickstarts.git diff --git a/spiceaidocs/docs/features/large-language-models/index.md b/spiceaidocs/docs/features/large-language-models/index.md index e59a7e4b..7c0eb5d1 100644 --- a/spiceaidocs/docs/features/large-language-models/index.md +++ b/spiceaidocs/docs/features/large-language-models/index.md @@ -7,9 +7,7 @@ pagination_prev: null pagination_next: null --- -Spice provides a high-performance, OpenAI API-compatible AI Gateway optimized for managing and scaling large language models (LLMs). - -Additionally, Spice offers tools for Enterprise Retrieval-Augmented Generation (RAG), such as SQL query across federated datasets and an advanced search feature (see [Search](/features/search)). +Spice provides a high-performance, OpenAI API-compatible AI Gateway optimized for managing and scaling large language models (LLMs). Additionally, Spice offers tools for Enterprise Retrieval-Augmented Generation (RAG), such as SQL query across federated datasets and an advanced search feature (see [Search](/features/search)). Spice also supports **full OpenTelemetry observability**, enabling detailed tracking of data flows and requests for full transparency and easier debugging. 
diff --git a/spiceaidocs/docs/features/large-language-models/memory.md b/spiceaidocs/docs/features/large-language-models/memory.md index 51fcf262..95501cf3 100644 --- a/spiceaidocs/docs/features/large-language-models/memory.md +++ b/spiceaidocs/docs/features/large-language-models/memory.md @@ -13,9 +13,9 @@ Spice provides memory persistence tools that allow language models to store and ## Enabling Memory Tools -To enable memory tools for Spice models you need to: - 1. Define a `store` [memory](/components/data-connectors/memory.md) dataset. - 2. Specify `memory` in the model's `tools` parameter. +To enable memory tools for Spice models, define a `store` [memory](/components/data-connectors/memory.md) dataset and specify `memory` in the model's `tools` parameter. + +### Example: Enabling Memory Tools ```yaml datasets: @@ -31,5 +31,6 @@ models: ``` ## Available Tools - - `store_memory`: Store important information for future reference - - `load_memory`: Retrieve previously stored memories from the last time period. + +- `store_memory`: Store important information for future reference +- `load_memory`: Retrieve previously stored memories from the last time period. diff --git a/spiceaidocs/docs/features/large-language-models/parameter_overrides.md b/spiceaidocs/docs/features/large-language-models/parameter_overrides.md index e41750f2..8424f36d 100644 --- a/spiceaidocs/docs/features/large-language-models/parameter_overrides.md +++ b/spiceaidocs/docs/features/large-language-models/parameter_overrides.md @@ -8,21 +8,26 @@ pagination_next: null --- ### Chat Completion Parameter Overrides -[`v1/chat/completion`](/api/http/chat-completions) is an OpenAI compatible endpoint. -It supports all request body parameters defined in the [OpenAI reference documentation](https://platform.openai.com/docs/api-reference/chat/create). Spice can configure different defaults for these request parameters. +[`v1/chat/completion`](/api/http/chat-completions) is an OpenAI-compatible endpoint. 
It supports all request body parameters defined in the [OpenAI reference documentation](https://platform.openai.com/docs/api-reference/chat/create). Spice can configure different defaults for these request parameters. + +### Example: Setting Default Overrides + ```yaml models: - name: pirate-haikus from: openai:gpt-4o params: openai_temperature: 0.1 - openai_response_format: { "type": "json_object" } + openai_response_format: { 'type': 'json_object' } ``` + To specify a default override for a parameter, use the `openai_` prefix followed by the parameter name. For example, to set the `temperature` parameter to `0.1`, use `openai_temperature: 0.1`. ### System Prompt + In addition to any system prompts provided in message dialogue, or added by model providers, Spice can configure an additional system prompt. + ```yaml models: - name: pirate-haikus diff --git a/spiceaidocs/docs/features/large-language-models/runtime_tools.md b/spiceaidocs/docs/features/large-language-models/runtime_tools.md index 1a2af77f..fe2835d7 100644 --- a/spiceaidocs/docs/features/large-language-models/runtime_tools.md +++ b/spiceaidocs/docs/features/large-language-models/runtime_tools.md @@ -7,7 +7,10 @@ pagination_prev: null pagination_next: null --- -Spice provides a set of tools that let LLMs interact with the runtime. To provide these tools to a Spice model, specify them in its `params.tools`. +Spice provides tools that enable LLMs to interact with the runtime. To provide these tools to a Spice model, specify them in its `params.tools`. + +### Example: Specifying Tools for a Model + ```yaml models: - name: sql-model @@ -22,6 +25,7 @@ models: ``` To use all builtin tools with additional tools, use the `builtin` tool group. + ```yaml models: - name: full-runtime @@ -31,21 +35,23 @@ models: ``` ### Tool Recursion Limit + When a model requests to call a runtime tool, Spice runs the tool internally and feeds it back to the model. 
The `tool_recursion_limit` parameter limits the depth of internal recursion Spice will undertake. By default, Spice can infinitely recurse if the model requests to do so. ```yaml models: - - name: my-model - from: openai - params: - tool_recursion_limit: 3 + - name: my-model + from: openai + params: + tool_recursion_limit: 3 ``` ## Available tools - - `list_datasets`: List all available datasets in the runtime. - - `sql`: Execute SQL queries on the runtime. - - `table_schema`: Get the schema of a specific SQL table. - - `document_similarity`: For datasets with an embedding column, retrieve documents based on an input query. It is equivalent to [/v1/search](/api/http/search). - - `sample_distinct_columns`: For a dataset, generate a synthetic sample of data whereby each column has at least a number of distinct values. - - `random_sample`: Sample random rows from a table. - - `top_n_sample`: Sample the top N rows from a table based on a specified ordering. + +- `list_datasets`: List all available datasets in the runtime. +- `sql`: Execute SQL queries on the runtime. +- `table_schema`: Get the schema of a specific SQL table. +- `document_similarity`: For datasets with an embedding column, retrieve documents based on an input query. It is equivalent to [/v1/search](/api/http/search). +- `sample_distinct_columns`: For a dataset, generate a synthetic sample of data whereby each column has at least a number of distinct values. +- `random_sample`: Sample random rows from a table. +- `top_n_sample`: Sample the top N rows from a table based on a specified ordering. 
diff --git a/spiceaidocs/docs/features/machine-learning-models/ml-model-serving/index.md b/spiceaidocs/docs/features/machine-learning-models/ml-model-serving/index.md index 933e77f4..917c047e 100644 --- a/spiceaidocs/docs/features/machine-learning-models/ml-model-serving/index.md +++ b/spiceaidocs/docs/features/machine-learning-models/ml-model-serving/index.md @@ -9,7 +9,7 @@ pagination_next: null Spice supports loading and serving ONNX models and GGUF LLMs from various sources for embeddings and inference, including local filesystems, Hugging Face, and the Spice Cloud platform. -Example `spicepod.yml` loading a LLM from HuggingFace: +### Example: Loading a LLM from Hugging Face ```yaml models: diff --git a/spiceaidocs/docs/features/search/index.md b/spiceaidocs/docs/features/search/index.md index c1982768..1771997e 100644 --- a/spiceaidocs/docs/features/search/index.md +++ b/spiceaidocs/docs/features/search/index.md @@ -53,7 +53,7 @@ datasets: columns: - name: body embeddings: - - from: local_embedding_model # Embedding model used for this column + - from: local_embedding_model # Embedding model used for this column ``` By defining embeddings on the `body` column, Spice is now configured to execute similarity searches on the dataset. diff --git a/spiceaidocs/docs/features/semantic-model/index.md b/spiceaidocs/docs/features/semantic-model/index.md index 29a777cd..17a3b7c3 100644 --- a/spiceaidocs/docs/features/semantic-model/index.md +++ b/spiceaidocs/docs/features/semantic-model/index.md @@ -7,19 +7,17 @@ pagination_prev: null pagination_next: null --- -Semantic data models in Spice are defined using the `datasets[*].columns` configuration. - -Structured and meaningful data representations can be added to datasets, beneficial for both AI large language models (LLMs) and traditional data analysis. +Semantic data models in Spice are defined using the `datasets[*].columns` configuration. 
These models provide structured and meaningful data representations, which are beneficial for both AI large language models (LLMs) and traditional data analysis. ## Use-Cases ### Large Language Models (LLMs) -The semantic model will automatically be used by [Spice Models](/reference/spicepod/models.md) as context to produce more accurate and context-aware AI responses. +The semantic model is automatically used by [Spice Models](/reference/spicepod/models.md) as context to produce more accurate and context-aware AI responses. ## Defining a Semantic Model -Semantic data models are defined within the `spicepod.yaml` file, specifically under the `datasets` section. Each dataset supports `description`, `metadata` and a `columns` field where individual columns are described with metadata and features for utility and clarity. +Semantic data models are defined within the `spicepod.yaml` file, specifically under the `datasets` section. Each dataset supports `description`, `metadata`, and a `columns` field where individual columns are described with metadata and features for utility and clarity. ### Example Configuration diff --git a/spiceaidocs/docs/index.md b/spiceaidocs/docs/index.md index a9679220..5084c5b6 100644 --- a/spiceaidocs/docs/index.md +++ b/spiceaidocs/docs/index.md @@ -10,19 +10,19 @@ import ThemeBasedImage from '@site/src/components/ThemeBasedImage'; ## What is Spice? -**Spice** is a portable runtime offering developers a unified SQL interface to materialize, accelerate, and query data from any database, data warehouse, or data lake. +**Spice** is a portable runtime written in Rust that offers developers a unified SQL interface to materialize, accelerate, and query data from any database, data warehouse, or data lake. 📣 Read the [Spice.ai OSS announcement blog post](https://blog.spiceai.org/posts/2024/03/28/adding-spice-the-next-generation-of-spice.ai-oss/). 
Spice connects, fuses, and delivers data to applications, machine-learning models, and AI-backends, functioning as an application-specific, tier-optimized Database CDN. -The Spice runtime, written in Rust, is built-with industry leading technologies such as [Apache DataFusion](https://datafusion.apache.org), Apache Arrow, Apache Arrow Flight, SQLite, and DuckDB. +Spice is built with industry-leading technologies such as [Apache DataFusion](https://datafusion.apache.org), Apache Arrow, Apache Arrow Flight, SQLite, and DuckDB. ## Why Spice? -Spice makes it easy and fast to query data from one or more sources using SQL. You can co-locate a managed dataset with your application or machine learning model, and accelerate it with Arrow in-memory, SQLite/DuckDB, or with attached PostgreSQL for fast, high-concurrency, low-latency queries. Accelerated engines give you flexibility and control over query cost and performance. +Spice makes it fast and easy to query data from one or more sources using SQL. You can co-locate a managed dataset with your application or machine learning model, and accelerate it with Arrow in-memory, SQLite/DuckDB, or with attached PostgreSQL for fast, high-concurrency, low-latency queries. Accelerated engines give you flexibility and control over query cost and performance. @@ -40,7 +40,7 @@ Spice makes it easy and fast to query data from one or more sources using SQL.
Y | | Spice | Trino/Presto | Dremio | Clickhouse | | -------------------------- | ---------------------------------- | -------------------------------- | -------------------------------- | ----------------------- | -| Primary Use-Case | Data & AI Applications | Big Data Analytics | Interative Analytics | Real-Time Analytics | +| Primary Use-Case | Data & AI Applications | Big Data Analytics | Interactive Analytics | Real-Time Analytics | | Typical Deployment | Colocated with application | Cloud Cluster | Cloud Cluster | On-Prem/Cloud Cluster | | Application-to-Data System | One-to-One/Many | Many-to-One | Many-to-One | Many-to-One | | Query Federation | Native with query push-down | Supported with push-down | Supported with limited push-down | Limited | @@ -64,10 +64,6 @@ Spice makes it easy and fast to query data from one or more sources using SQL. Y - **Is Spice a CDN for databases?** Yes, you can think of Spice like a CDN for different data sources. Using CDN concepts, Spice enables you to ship (load) a working set of your database (or data lake, or data warehouse) where it's most frequently accessed, like from a data application or for AI-inference. -:::warning[DEVELOPER PREVIEW] -Spice is under active **alpha** stage development and is not intended to be used in production until its **1.0-stable** release. If you are interested in running Spice in production, please get in touch below so we can support you. -::: - ### Intelligent Applications Spice enables developers to build both data _and_ AI-driven applications by co-locating data _and_ ML models with applications. Read more about the vision to enable the development of [intelligent AI-driven applications](./intelligent-applications/index.md). 
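To make the acceleration model described above concrete, a minimal `spicepod.yaml` sketch might pin a working set of a federated table into a local DuckDB engine. The connector, table name, and refresh interval below are illustrative assumptions:

```yaml
datasets:
  - from: postgres:public.orders
    name: orders
    acceleration:
      enabled: true
      engine: duckdb
      refresh_check_interval: 10s
```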
diff --git a/spiceaidocs/docs/intelligent-applications/index.md b/spiceaidocs/docs/intelligent-applications/index.md index 5c4cac56..c42c2e2a 100644 --- a/spiceaidocs/docs/intelligent-applications/index.md +++ b/spiceaidocs/docs/intelligent-applications/index.md @@ -7,14 +7,50 @@ pagination_prev: null pagination_next: null --- -As described in the blog post [Making Apps That Learn and Adapt](https://blog.spiceai.org/posts/2021/11/05/making-apps-that-learn-and-adapt/) the long-term vision for Spice.ai is to enable developers to easily build, deploy, and operate intelligent data and AI-driven applications. +## Building Data-Driven AI Applications with Spice.ai -With Spice.ai OSS, federated data and ML models are colocated with applications, creating lightweight, high-performance, AI-copilot sidecars. +Spice.ai represents a paradigm shift in how intelligent applications are developed, deployed, and managed. As outlined in the blog post [Making Apps That Learn and Adapt](https://blog.spiceai.org/posts/2021/11/05/making-apps-that-learn-and-adapt/), the goal of Spice.ai is to eliminate the technical complexity that often hampers developers when building AI-powered solutions. By colocating federated data and machine learning models with applications, Spice.ai provides lightweight, high-performance, and highly scalable AI copilot sidecars. These sidecars streamline application workflows, significantly improving both speed and efficiency. -A Spice.ai Intelligent Application +At its core, Spice.ai addresses the fragmented nature of traditional AI infrastructure. Data often resides in multiple systems: modern cloud-based warehouses, legacy databases, or unstructured formats like files on FTP servers. Integrating these disparate sources into a unified application pipeline typically requires extensive engineering effort. Spice.ai simplifies this process by federating data across all these sources, materializing it locally for low-latency access, and offering a unified SQL API.
This eliminates the need for complex and costly ETL pipelines or federated query engines that operate with high latency. + +Spice.ai also colocates machine learning models with the application runtime. This approach reduces the data transfer overhead that occurs when sending data to external inference services. By performing inference locally, applications can respond faster and operate more reliably, even in environments with intermittent network connectivity. The result is an infrastructure that enables developers to focus on building value-driven features rather than wrestling with data and deployment complexities. + +--- ## The Intelligent Application Workflow -Dataset definitions and ML Models are packaged as Spicepods and can be published and distributed through the [Spicerack.org](https://spicerack.org) Spicepod registry. Federated datasets are locally materialized, accelerated, and provided to colocated Models. Applications call high-performance, low-latency ML inference APIs for AI generation, insights, recommendations, and forecasts ultimately to make intelligent, AI-driven decisions. Contextual application and environmental data is ingested and replicated back to cloud-scale compute clusters where improved versions of Models are trained and fined-tuned. New versioned Models are automatically deployed to the runtime and are A/B tested and flighted by the application in realtime. +The workflow for creating intelligent applications with Spice.ai is designed to provide developers with a straightforward, efficient path from data to decision-making. It begins with the creation of `Spicepods`, self-contained packages that define datasets and machine learning models. These packages can be distributed through [Spicerack.org](https://spicerack.org), a registry where developers can publish, share, and reuse datasets and models for various applications. + +Once deployed, federated datasets are materialized locally within the Spice runtime.
Materialization involves prefetching and precomputing data, storing it in high-performance local stores like DuckDB or SQLite. This approach ensures that queries are executed with minimal latency, offering high concurrency and predictable performance. Accelerated access is made possible through advanced caching and query optimization techniques, enabling applications to perform even complex operations without relying on remote databases. + +Applications interact with the Spice runtime through high-performance APIs, calling machine learning models for inference tasks such as predictions, recommendations, or anomaly detection. These models are colocated with the runtime, letting them use the same locally materialized datasets. For example, an e-commerce application could use this infrastructure to provide real-time product recommendations based on user behavior, or a manufacturing system could detect equipment failures before they happen by analyzing time-series sensor data. + +As the application runs, contextual and environmental data, such as user actions or external sensor readings, is ingested into the runtime. This data is replicated back to centralized compute clusters where machine learning models are retrained and fine-tuned to improve accuracy and performance. The updated models are automatically versioned and deployed to the runtime, where they can be A/B tested in real time. This continuous feedback loop ensures that applications evolve and improve without manual intervention, reducing time to value while maintaining model relevance. + +![Spice.ai Intelligent Application Workflow](https://github.com/spiceai/docs/assets/80174/22b02c5e-5fcb-4856-b79d-911ac5d084c6) + +--- + +## Why Spice.ai Is the Future of Intelligent Applications + +Spice.ai introduces an entirely new way of thinking about application infrastructure by making intelligent applications accessible to developers of all skill levels.
It replaces the need for custom integrations and fragmented tools with a unified runtime optimized for AI and data-driven applications. Unlike traditional architectures that rely heavily on centralized databases or cloud-based inference engines, Spice.ai focuses on tier-optimized deployments. This means that data and computation are colocated wherever the application runs, whether in the cloud, on-prem, or at the edge. + +Federation and materialization are at the heart of Spice.ai’s architecture. Instead of querying remote data sources directly, Spice.ai materializes working datasets locally. For example, a logistics application might materialize only the last seven days of shipment data from a cloud data lake, ensuring that 99% of queries are served locally while retaining the ability to fall back to the full dataset as needed. This reduces both latency and costs while improving the user experience. + +Machine learning models benefit from the same localized efficiency. Because the models are colocated with the runtime, inference happens in milliseconds rather than seconds, even for complex operations. This is critical for use cases like fraud detection, where split-second decisions can save businesses millions, or real-time personalization, where user engagement depends on instant feedback. + +Spice.ai also shines in its ability to integrate with diverse infrastructures. It supports modern cloud-native systems like Snowflake and Databricks, legacy databases like SQL Server, and even unstructured sources like files stored on FTP servers. With support for industry-standard APIs like JDBC, ODBC, and Arrow Flight, it integrates directly into existing applications without requiring extensive refactoring. + +In addition to data acceleration and model inference, Spice.ai provides comprehensive observability and monitoring.
Every query, inference, and data flow can be tracked and audited, ensuring that applications meet enterprise standards for security, compliance, and reliability. This makes Spice.ai particularly well-suited for industries such as healthcare, finance, and manufacturing, where data privacy and traceability are paramount. + +--- + +## Getting Started with Spice.ai + +Developers can start building intelligent applications with Spice.ai by installing the open-source runtime from [GitHub](https://github.com/spiceai/spiceai). The installation process is simple, and the runtime can be deployed across cloud, on-premises, or edge environments. Once installed, developers can create and manage `Spicepods` to define datasets and machine learning models. These `Spicepods` serve as the building blocks for their applications, streamlining data and model integration. + +For those looking to accelerate development, [Spicerack.org](https://spicerack.org) provides a curated library of reusable datasets and models. By using these pre-built components, developers can reduce time to deployment while focusing on the unique features of their applications. + +The Spice.ai community is an essential resource for new and experienced developers alike. Through forums, documentation, and hands-on support, the community helps developers unlock the full potential of intelligent applications. Whether you’re building real-time analytics systems, AI-driven enterprise tools, or edge-based IoT applications, Spice.ai provides the infrastructure you need to succeed. -OGP +In a world where intelligent applications are increasingly becoming the norm, Spice.ai is a strong platform for building fast, scalable, and secure AI-driven solutions. Its unified approach to data, computation, and machine learning sets a new standard for how applications are developed and deployed.
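A minimal `Spicepod` combining a dataset and a model, as described in the workflow above, could look like the following sketch. The bucket, paths, and names are illustrative assumptions:

```yaml
version: v1beta1
kind: Spicepod
name: intelligent-app
datasets:
  - from: s3://my-bucket/events/
    name: events
    acceleration:
      enabled: true
models:
  - name: recommender
    from: file:models/recommender.onnx
    datasets:
      - events
```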
diff --git a/spiceaidocs/docs/reference/spicepod/catalogs.md b/spiceaidocs/docs/reference/spicepod/catalogs.md index 5006b7ec..c9dfb703 100644 --- a/spiceaidocs/docs/reference/spicepod/catalogs.md +++ b/spiceaidocs/docs/reference/spicepod/catalogs.md @@ -17,7 +17,7 @@ catalogs: - from: spice.ai name: spiceai include: - - "tpch.*" # Include only the "tpch" tables. + - 'tpch.*' # Include only the "tpch" tables. ``` ## `from` @@ -54,7 +54,7 @@ An alternative to adding the catalog definition inline in the `spicepod.yaml` fi from: spice.ai name: spiceai include: - - "tpch.*" # Include only the "tpch" tables. + - 'tpch.*' # Include only the "tpch" tables. ``` **ref used in spicepod.yaml** diff --git a/spiceaidocs/docs/reference/spicepod/embeddings.md b/spiceaidocs/docs/reference/spicepod/embeddings.md index a60c2f2a..eb302e64 100644 --- a/spiceaidocs/docs/reference/spicepod/embeddings.md +++ b/spiceaidocs/docs/reference/spicepod/embeddings.md @@ -4,13 +4,11 @@ sidebar_label: 'Embeddings' description: 'Embeddings YAML reference' --- -# Embeddings - -Embeddings allow you to convert text or other data into vector representations, which can be used for various machine learning and natural language processing tasks. +Embeddings convert text or other data into vector representations for machine learning and natural language processing tasks. ## `embeddings` -The `embeddings` section in your configuration allows you to specify one or more embedding models to be used with your datasets. +The `embeddings` section in your configuration specifies one or more embedding models for your datasets. Example: @@ -19,7 +17,7 @@ embeddings: - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2:latest name: text_embedder params: - max_length: "128" + max_length: '128' datasets: - my_text_dataset ``` @@ -54,4 +52,4 @@ Optional. A map of key-value pairs for additional parameters specific to the emb ### `dependsOn` -Optional. 
A list of dependencies that must be loaded and available before this embedding model. \ No newline at end of file +Optional. A list of dependencies that must be loaded and available before this embedding model. diff --git a/spiceaidocs/docs/reference/spicepod/index.md b/spiceaidocs/docs/reference/spicepod/index.md index 68741b66..ba6c9826 100644 --- a/spiceaidocs/docs/reference/spicepod/index.md +++ b/spiceaidocs/docs/reference/spicepod/index.md @@ -7,7 +7,7 @@ description: 'Detailed documentation on the Spicepod manifest syntax (spicepod.y ## About YAML syntax for Spicepod manifests (spicepod.yaml) -Spicepod manifests use YAML syntax and must be named `spicepod.yaml` or `spicepod.yml`. If you're new to YAML and want to learn more, see "[Learn YAML in Y minutes](https://learnxinyminutes.com/docs/yaml/)." +Spicepod manifests use YAML syntax and must be named `spicepod.yaml` or `spicepod.yml`. If you are new to YAML and want to learn more, see "[Learn YAML in Y minutes](https://learnxinyminutes.com/docs/yaml/)." Spicepod manifest files are stored in the root directory of your application code. @@ -25,7 +25,7 @@ The name of the Spicepod. ## `secrets` -The secrets section in the Spicepod manifest is optional and is used to configure how secrets are stored and accessed by the Spicepod. [Learn more](/components/secret-stores). +The secrets section in the Spicepod manifest is optional and is used to configure how secrets are stored and accessed by the Spicepod. For more information, see [Secret Stores](/components/secret-stores). ### `secrets.from` @@ -215,7 +215,7 @@ Example: runtime: cors: enabled: true - allowed_origins: ["https://example.com"] + allowed_origins: ['https://example.com'] ``` This configuration allows requests from the `https://example.com` origin only. 
diff --git a/spiceaidocs/docs/reference/spicepod/models.md b/spiceaidocs/docs/reference/spicepod/models.md index 26fe5d56..843bd046 100644 --- a/spiceaidocs/docs/reference/spicepod/models.md +++ b/spiceaidocs/docs/reference/spicepod/models.md @@ -1,6 +1,6 @@ --- -title: "Models" -sidebar_label: "Models" +title: 'Models' +sidebar_label: 'Models' description: 'Models YAML reference' pagination_next: null --- @@ -15,7 +15,6 @@ The model specifications are in early preview and are subject to change. Spice supports both traditional machine learning (ML) models and language models (LLMs). The configuration allows you to specify either type from a variety of sources. The model type is automatically determined based on the model source and files. - | field | Description | | ------------- | ----------------------------------------------------------------------- | | `name` | Unique, readable name for the model within the Spicepod. | @@ -43,7 +42,7 @@ models: - path: tokenizer.json type: tokenizer params: - max_length: "128" + max_length: '128' datasets: - my_text_dataset ``` @@ -72,10 +71,10 @@ The `` suffix of the `from` field is a unique (per source) identifier - For Spice AI: Supports only ML models. Represents the full path to the model in the Spice AI repository. Supports a version suffix (default to `latest`). - Example: `lukekim/smart/models/drive_stats:60cb80a2-d59b-45c4-9b68-0946303bdcaf` - For Hugging Face: A repo_id and, optionally, revision hash or tag. - - `Qwen/Qwen1.5-0.5B` (no revision) - - `meta-llama/Meta-Llama-3-8B:cd892e8f4da1043d4b01d5ea182a2e8412bf658f` (with revision hash) + - `Qwen/Qwen1.5-0.5B` (no revision) + - `meta-llama/Meta-Llama-3-8B:cd892e8f4da1043d4b01d5ea182a2e8412bf658f` (with revision hash) - For local files: Represents the absolute or relative path to the model weights file on the local file system. See [below](#files) for the accepted model weight types and formats. -- For OpenAI: Only supports LMs. 
For OpenAI models, valid IDs can be found in their model [documentation](https://platform.openai.com/docs/models/continuous-model-upgrades). For OpenAI compatible providers, specify the value required in their `v1/chat/completion` [payload](https://platform.openai.com/docs/api-reference/chat/create#chat-create-model). +- For OpenAI: Only supports LMs. For OpenAI models, valid IDs can be found in their model [documentation](https://platform.openai.com/docs/models/continuous-model-upgrades). For OpenAI compatible providers, specify the value required in their `v1/chat/completion` [payload](https://platform.openai.com/docs/api-reference/chat/create#chat-create-model). ### `name` @@ -94,16 +93,20 @@ Optional. A list of files associated with this model. Each file has: - `type`: Optional. The type of the file (automatically determined if not specified) File types include: + - `weights`: Model weights + - For ML models: typically `.onnx` files - For LLMs: `.gguf`, `.ggml`, `.safetensors`, or `pytorch_model.bin` files - These files contain the trained parameters of the model - `config`: Model configuration + - Usually a `config.json` file - Contains model architecture and hyperparameters - `tokenizer`: Tokenizer file + - Usually a `tokenizer.json` file - Defines how input text is converted into tokens for the model @@ -118,8 +121,9 @@ The system attempts to automatically determine the file type based on the file n Optional. A map of key-value pairs for additional parameters specific to the model. Example uses include: - - Setting default OpenAI request parameters for language models, see [parameter overrides](/features/large-language-models/parameter_overrides.md). - - Allowing Language models to perform actions against spice (e.g. making SQL queries), via language model tool use, see [runtime tools](/features/large-language-models/runtime_tools.md). 
+ +- Setting default OpenAI request parameters for language models, see [parameter overrides](/features/large-language-models/parameter_overrides.md). +- Letting language models perform actions against Spice (e.g., making SQL queries) via language model tool use, see [runtime tools](/features/large-language-models/runtime_tools.md). ### `datasets` diff --git a/spiceaidocs/docs/use-cases/data-mesh/index.md b/spiceaidocs/docs/use-cases/data-mesh/index.md index ba46e017..2348c37b 100644 --- a/spiceaidocs/docs/use-cases/data-mesh/index.md +++ b/spiceaidocs/docs/use-cases/data-mesh/index.md @@ -7,10 +7,10 @@ pagination_prev: null pagination_next: null --- -## Accessing data across many, disparate data sources +## Accessing data across multiple, disparate data sources -[Federated SQL query](/features/federated-queries) across databases, data warehouses, and data lakes using [Data Connectors](/components/data-connectors). +Perform [federated SQL queries](/features/federated-queries) across databases, data warehouses, and data lakes using [Data Connectors](/components/data-connectors). -## Migrations from legacy data systems +## Migrating from legacy data systems -A drop-in solution to provides a single, unified endpoint to many data systems without changes to the application. +Spice provides a drop-in solution that offers a single, unified endpoint to multiple data systems without requiring changes to the application. diff --git a/spiceaidocs/docs/use-cases/database-cdn/index.md b/spiceaidocs/docs/use-cases/database-cdn/index.md index 6ca2b293..4d2859c0 100644 --- a/spiceaidocs/docs/use-cases/database-cdn/index.md +++ b/spiceaidocs/docs/use-cases/database-cdn/index.md @@ -7,18 +7,18 @@ pagination_prev: null pagination_next: null --- -## Slow data applications +## Improving data application performance Colocate a local working set of hot data with data applications and frontends to serve more concurrent requests and users with faster page loads and data updates.
[Try the CQRS sample app](https://github.com/spiceai/samples/tree/trunk/acceleration#local-materialization-and-acceleration-cqrs-sample) -## Fragile data applications +## Increasing application resilience -Keep local replicas of data with the application for significantly higher application resilience and availability. +Maintain local replicas of data with the application to significantly improve application resilience and availability. -## Slow dashboards, analytics, and BI +## Improving dashboard, analytics, and BI performance -Create a materialization layer for visualization products like Power BI, Tableau, or Superset for faster, more responsive dashboards without massive compute costs. +Create a materialization layer for visualization tools like Power BI, Tableau, or Superset to achieve faster, more responsive dashboards without incurring massive compute costs. [Watch the Apache Superset demo](https://github.com/spiceai/samples/blob/trunk/sales-bi/README.md) diff --git a/spiceaidocs/docs/use-cases/enterprise-search/index.md b/spiceaidocs/docs/use-cases/enterprise-search/index.md index 402cce23..2c946e91 100644 --- a/spiceaidocs/docs/use-cases/enterprise-search/index.md +++ b/spiceaidocs/docs/use-cases/enterprise-search/index.md @@ -7,8 +7,8 @@ pagination_prev: null pagination_next: null --- -## Vector similarily search across disparate and legacy data systems +## Vector similarity search across disparate and legacy data systems -Enterprises face a new challenge when using AI. They now need to access data from disparate and legacy systems so AI has full-knowledge for context. It needs to be fast to be useful. +Enterprises face the challenge of accessing data from various disparate and legacy systems to provide AI with comprehensive context. Speed is crucial for this process to be effective. -Spice is a blazingly fast knowledge index into structured and unstructured data.
+Spice offers a fast knowledge index into both structured and unstructured data, enabling efficient vector similarity search across multiple data sources. This ensures that AI applications have access to the necessary data for accurate and timely responses. diff --git a/spiceaidocs/docs/use-cases/rag/index.md b/spiceaidocs/docs/use-cases/rag/index.md index 5b504e3e..c564ca46 100644 --- a/spiceaidocs/docs/use-cases/rag/index.md +++ b/spiceaidocs/docs/use-cases/rag/index.md @@ -7,6 +7,6 @@ pagination_prev: null pagination_next: null --- -Use Spice to access data across disparate datasources for Retrieval-Augmented-Generation (RAG). +Use Spice to access data across various data sources for Retrieval-Augmented-Generation (RAG). -Spice helps developers combine structured data via SQL query, and unstructured data recommended by built-in vector similarility search, to feed to large-language-models (LLMs) through a native AI-gateway. +Spice enables developers to combine structured data via SQL queries and unstructured data through built-in vector similarity search. This combined data can then be fed to large language models (LLMs) through a native AI gateway, improving the models' ability to generate accurate and contextually relevant responses.
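The RAG flow described in that patch — structured data via SQL, unstructured data via vector similarity search, and an LLM behind the AI gateway — can be sketched as a minimal `spicepod.yaml`. This is an illustrative sketch, not part of the patched docs: the source table, component names, and secret key are all hypothetical.

```yaml
embeddings:
  # Local embedding model used for vector similarity search
  - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
    name: local_embed

datasets:
  - from: postgres:public.support_tickets # hypothetical source table
    name: support_tickets
    columns:
      - name: body
        embeddings:
          - from: local_embed # vector search runs over this column

models:
  - from: openai:gpt-4o-mini # reached through the OpenAI-compatible AI gateway
    name: assistant
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }
```

With a pod like this, `POST /v1/search` retrieves similar `body` rows, `SELECT` queries run against `support_tickets` directly, and `POST /v1/chat/completions` with `"model": "assistant"` reaches the LLM through the same runtime.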
From 2c9bc9a005041c39b3c334a1205928b025f1e451 Mon Sep 17 00:00:00 2001 From: Luke Kim <80174+lukekim@users.noreply.github.com> Date: Wed, 27 Nov 2024 18:27:27 -0800 Subject: [PATCH 2/8] Add API updates --- spiceaidocs/docs/acknowledgements/index.md | 336 +++++++++--------- spiceaidocs/docs/api/adbc/index.md | 2 +- .../docs/api/arrow-flight-sql/index.md | 4 +- spiceaidocs/docs/api/http/catalogs.md | 4 +- spiceaidocs/docs/api/http/chat-completions.md | 2 +- spiceaidocs/docs/api/http/datasets.md | 4 +- spiceaidocs/docs/api/http/embeddings.md | 2 +- spiceaidocs/docs/api/http/index.md | 4 +- spiceaidocs/docs/api/http/ml-predict.md | 4 +- spiceaidocs/docs/api/http/refresh.md | 7 +- spiceaidocs/docs/api/http/search.md | 54 +-- spiceaidocs/docs/api/jdbc/index.md | 13 +- spiceaidocs/docs/api/odbc/index.md | 16 +- 13 files changed, 227 insertions(+), 225 deletions(-) diff --git a/spiceaidocs/docs/acknowledgements/index.md b/spiceaidocs/docs/acknowledgements/index.md index 91a01502..4f7d0aa1 100644 --- a/spiceaidocs/docs/acknowledgements/index.md +++ b/spiceaidocs/docs/acknowledgements/index.md @@ -9,8 +9,6 @@ pagination_next: null Spice AI acknowledges the following open source projects for making this project possible: - - ## Go Modules github.com/AzureAD/microsoft-authentication-library-for-go/apps, https://github.com/AzureAD/microsoft-authentication-library-for-go/blob/v1.3.2/LICENSE, MIT @@ -81,424 +79,424 @@ gopkg.in/yaml.v3, https://github.com/go-yaml/yaml/blob/v3.0.1/LICENSE, MIT ## Rust Crates -- ansi_term 0.12.1, MIT +- ansi_term 0.12.1, MIT
https://github.com/ogham/rust-ansi-term -- anyhow 1.0.93, Apache-2.0 OR MIT +- anyhow 1.0.93, Apache-2.0 OR MIT
https://github.com/dtolnay/anyhow -- arrow 53.2.0, Apache-2.0 +- arrow 53.2.0, Apache-2.0
https://github.com/apache/arrow-rs -- arrow-buffer 53.3.0, Apache-2.0 +- arrow-buffer 53.3.0, Apache-2.0
https://github.com/apache/arrow-rs -- arrow-csv 53.3.0, Apache-2.0 +- arrow-csv 53.3.0, Apache-2.0
https://github.com/apache/arrow-rs -- arrow-flight 53.2.0, Apache-2.0 +- arrow-flight 53.2.0, Apache-2.0
https://github.com/apache/arrow-rs -- arrow-ipc 53.2.0, Apache-2.0 +- arrow-ipc 53.2.0, Apache-2.0
https://github.com/apache/arrow-rs -- arrow-json 53.1.0, Apache-2.0 +- arrow-json 53.1.0, Apache-2.0
https://github.com/apache/arrow-rs -- arrow-json 53.2.0, Apache-2.0 +- arrow-json 53.2.0, Apache-2.0
https://github.com/apache/arrow-rs -- arrow-odbc 11.2.0, MIT +- arrow-odbc 11.2.0, MIT
https://github.com/pacman82/arrow-odbc -- arrow-schema 53.3.0, Apache-2.0 +- arrow-schema 53.3.0, Apache-2.0
https://github.com/apache/arrow-rs -- async-graphql 7.0.11, Apache-2.0 OR MIT +- async-graphql 7.0.11, Apache-2.0 OR MIT
https://github.com/async-graphql/async-graphql -- async-graphql-axum 7.0.11, Apache-2.0 OR MIT +- async-graphql-axum 7.0.11, Apache-2.0 OR MIT
https://github.com/async-graphql/async-graphql -- async-openai 0.24.1, MIT +- async-openai 0.24.1, MIT
https://github.com/64bit/async-openai -- async-stream 0.3.6, MIT +- async-stream 0.3.6, MIT
https://github.com/tokio-rs/async-stream -- async-trait 0.1.83, Apache-2.0 OR MIT +- async-trait 0.1.83, Apache-2.0 OR MIT
https://github.com/dtolnay/async-trait -- aws-config 1.5.10, Apache-2.0 +- aws-config 1.5.10, Apache-2.0
https://github.com/smithy-lang/smithy-rs -- aws-sdk-secretsmanager 1.53.0, Apache-2.0 +- aws-sdk-secretsmanager 1.53.0, Apache-2.0
https://github.com/awslabs/aws-sdk-rust -- aws-sdk-sts 1.50.0, Apache-2.0 +- aws-sdk-sts 1.50.0, Apache-2.0
https://github.com/awslabs/aws-sdk-rust -- axum 0.7.9, MIT +- axum 0.7.9, MIT
https://github.com/tokio-rs/axum -- axum-extra 0.9.4, MIT +- axum-extra 0.9.4, MIT
https://github.com/tokio-rs/axum -- azure_core 0.21.0, MIT +- azure_core 0.21.0, MIT
https://github.com/azure/azure-sdk-for-rust -- azure_storage 0.21.0, MIT +- azure_storage 0.21.0, MIT
https://github.com/azure/azure-sdk-for-rust -- azure_storage_blobs 0.21.0, MIT +- azure_storage_blobs 0.21.0, MIT
https://github.com/azure/azure-sdk-for-rust -- backoff 0.4.0, Apache-2.0 OR MIT +- backoff 0.4.0, Apache-2.0 OR MIT
https://github.com/ihrwein/backoff -- base64 0.13.1, Apache-2.0 OR MIT +- base64 0.13.1, Apache-2.0 OR MIT
https://github.com/marshallpierce/rust-base64 -- base64 0.21.7, Apache-2.0 OR MIT +- base64 0.21.7, Apache-2.0 OR MIT
https://github.com/marshallpierce/rust-base64 -- base64 0.22.1, Apache-2.0 OR MIT +- base64 0.22.1, Apache-2.0 OR MIT
https://github.com/marshallpierce/rust-base64 -- bb8 0.8.6, MIT +- bb8 0.8.6, MIT
https://github.com/djc/bb8 -- bigdecimal 0.4.6, Apache-2.0 OR MIT +- bigdecimal 0.4.6, Apache-2.0 OR MIT
https://github.com/akubera/bigdecimal-rs -- bollard 0.18.1, Apache-2.0 +- bollard 0.18.1, Apache-2.0
https://github.com/fussybeaver/bollard -- byte-unit 5.1.6, MIT +- byte-unit 5.1.6, MIT
https://github.com/magiclen/byte-unit -- bytes 1.8.0, MIT +- bytes 1.8.0, MIT
https://github.com/tokio-rs/bytes -- chrono 0.4.38, Apache-2.0 OR MIT +- chrono 0.4.38, Apache-2.0 OR MIT
https://github.com/chronotope/chrono -- chrono-tz 0.8.6, Apache-2.0 OR MIT +- chrono-tz 0.8.6, Apache-2.0 OR MIT
https://github.com/chronotope/chrono-tz -- chrono-tz 0.10.0, Apache-2.0 OR MIT +- chrono-tz 0.10.0, Apache-2.0 OR MIT
https://github.com/chronotope/chrono-tz -- clap 4.5.21, Apache-2.0 OR MIT +- clap 4.5.21, Apache-2.0 OR MIT
https://github.com/clap-rs/clap -- clickhouse-rs 1.1.0-alpha.1, MIT +- clickhouse-rs 1.1.0-alpha.1, MIT
https://github.com/suharev7/clickhouse-rs -- csv 1.3.1, MIT OR Unlicense +- csv 1.3.1, MIT OR Unlicense
https://github.com/BurntSushi/rust-csv -- dashmap 6.1.0, MIT +- dashmap 6.1.0, MIT
https://github.com/xacrimon/dashmap -- datafusion 43.0.0, Apache-2.0 +- datafusion 43.0.0, Apache-2.0
https://github.com/apache/datafusion -- datafusion-federation 0.1.6, Apache-2.0 +- datafusion-federation 0.1.6, Apache-2.0
-- datafusion-federation-sql 0.1.6, Apache-2.0 +- datafusion-federation-sql 0.1.6, Apache-2.0
-- datafusion-functions-json 0.43.0, Apache-2.0 +- datafusion-functions-json 0.43.0, Apache-2.0
https://github.com/datafusion-contrib/datafusion-functions-json/ - datafusion-table-providers 0.1.0,
https://github.com/datafusion-contrib/datafusion-table-providers -- delta_kernel 0.4.1, Apache-2.0 +- delta_kernel 0.4.1, Apache-2.0
https://github.com/delta-incubator/delta-kernel-rs -- dirs 5.0.1, Apache-2.0 OR MIT +- dirs 5.0.1, Apache-2.0 OR MIT
https://github.com/soc/dirs-rs -- docx-rs 0.4.17, MIT +- docx-rs 0.4.17, MIT
https://github.com/bokuweb/docx-rs -- dotenvy 0.15.7, MIT +- dotenvy 0.15.7, MIT
https://github.com/allan2/dotenvy -- duckdb 1.1.3, MIT +- duckdb 1.1.3, MIT
https://github.com/duckdb/duckdb-rs -- dyn-clone 1.0.17, Apache-2.0 OR MIT +- dyn-clone 1.0.17, Apache-2.0 OR MIT
https://github.com/dtolnay/dyn-clone -- either 1.13.0, Apache-2.0 OR MIT +- either 1.13.0, Apache-2.0 OR MIT
https://github.com/rayon-rs/either -- fundu 2.0.1, MIT +- fundu 2.0.1, MIT
https://github.com/fundu-rs/fundu -- futures 0.3.31, Apache-2.0 OR MIT +- futures 0.3.31, Apache-2.0 OR MIT
https://github.com/rust-lang/futures-rs -- globset 0.4.15, MIT OR Unlicense +- globset 0.4.15, MIT OR Unlicense
https://github.com/BurntSushi/ripgrep/tree/master/crates/globset -- graph-rs-sdk 2.0.1, MIT +- graph-rs-sdk 2.0.1, MIT
https://github.com/sreeise/graph-rs-sdk -- graphql-parser 0.4.0, Apache-2.0 OR MIT +- graphql-parser 0.4.0, Apache-2.0 OR MIT
-- headers-accept 0.1.4, MIT +- headers-accept 0.1.4, MIT
https://github.com/maxcountryman/headers-accept -- hf-hub 0.3.2, Apache-2.0 +- hf-hub 0.3.2, Apache-2.0
https://github.com/huggingface/hf-hub -- hostname 0.3.1, MIT +- hostname 0.3.1, MIT
https://github.com/svartalf/hostname -- hostname 0.4.0, MIT +- hostname 0.4.0, MIT
https://github.com/svartalf/hostname -- http 0.2.12, Apache-2.0 OR MIT +- http 0.2.12, Apache-2.0 OR MIT
https://github.com/hyperium/http -- http 1.1.0, Apache-2.0 OR MIT +- http 1.1.0, Apache-2.0 OR MIT
https://github.com/hyperium/http -- http-body-util 0.1.2, MIT +- http-body-util 0.1.2, MIT
https://github.com/hyperium/http-body -- humantime 2.1.0, Apache-2.0 OR MIT +- humantime 2.1.0, Apache-2.0 OR MIT
https://github.com/tailhook/humantime -- hyper 0.14.31, MIT +- hyper 0.14.31, MIT
https://github.com/hyperium/hyper -- hyper 1.5.1, MIT +- hyper 1.5.1, MIT
https://github.com/hyperium/hyper -- hyper-util 0.1.10, MIT +- hyper-util 0.1.10, MIT
https://github.com/hyperium/hyper-util -- indexmap 1.9.3, Apache-2.0 OR MIT +- indexmap 1.9.3, Apache-2.0 OR MIT
https://github.com/bluss/indexmap -- indexmap 2.6.0, Apache-2.0 OR MIT +- indexmap 2.6.0, Apache-2.0 OR MIT
https://github.com/indexmap-rs/indexmap -- insta 1.41.1, Apache-2.0 +- insta 1.41.1, Apache-2.0
https://github.com/mitsuhiko/insta -- itertools 0.10.5, Apache-2.0 OR MIT +- itertools 0.10.5, Apache-2.0 OR MIT
https://github.com/rust-itertools/itertools -- itertools 0.11.0, Apache-2.0 OR MIT +- itertools 0.11.0, Apache-2.0 OR MIT
https://github.com/rust-itertools/itertools -- itertools 0.12.1, Apache-2.0 OR MIT +- itertools 0.12.1, Apache-2.0 OR MIT
https://github.com/rust-itertools/itertools -- itertools 0.13.0, Apache-2.0 OR MIT +- itertools 0.13.0, Apache-2.0 OR MIT
https://github.com/rust-itertools/itertools -- jsonpath-rust 0.7.3, MIT +- jsonpath-rust 0.7.3, MIT
https://github.com/besok/jsonpath-rust -- jsonwebtoken 9.3.0, MIT +- jsonwebtoken 9.3.0, MIT
https://github.com/Keats/jsonwebtoken -- keyring 3.6.1, Apache-2.0 OR MIT +- keyring 3.6.1, Apache-2.0 OR MIT
https://github.com/hwchen/keyring-rs.git -- lazy_static 1.5.0, Apache-2.0 OR MIT +- lazy_static 1.5.0, Apache-2.0 OR MIT
https://github.com/rust-lang-nursery/lazy-static.rs -- logos 0.14.2, Apache-2.0 OR MIT +- logos 0.14.2, Apache-2.0 OR MIT
https://github.com/maciejhirsz/logos -- lopdf 0.34.0, MIT +- lopdf 0.34.0, MIT
https://github.com/J-F-Liu/lopdf.git -- mediatype 0.19.18, MIT +- mediatype 0.19.18, MIT
https://github.com/picoHz/mediatype -- mistralrs 0.3.2, MIT +- mistralrs 0.3.2, MIT
https://github.com/EricLBuehler/mistral.rs -- mistralrs-core 0.3.2, MIT +- mistralrs-core 0.3.2, MIT
https://github.com/EricLBuehler/mistral.rs -- moka 0.12.8, Apache-2.0 OR MIT +- moka 0.12.8, Apache-2.0 OR MIT
https://github.com/moka-rs/moka -- mysql_async 0.34.2, Apache-2.0 OR MIT +- mysql_async 0.34.2, Apache-2.0 OR MIT
https://github.com/blackbeam/mysql_async -- ndarray 0.15.6, Apache-2.0 OR MIT +- ndarray 0.15.6, Apache-2.0 OR MIT
https://github.com/rust-ndarray/ndarray -- ndarray 0.16.1, Apache-2.0 OR MIT +- ndarray 0.16.1, Apache-2.0 OR MIT
https://github.com/rust-ndarray/ndarray -- notify 7.0.0, CC0-1.0 +- notify 7.0.0, CC0-1.0
https://github.com/notify-rs/notify.git -- object_store 0.11.1, Apache-2.0 OR MIT +- object_store 0.11.1, Apache-2.0 OR MIT
https://github.com/apache/arrow-rs/tree/master/object_store -- odbc-api 8.1.2, MIT +- odbc-api 8.1.2, MIT
https://github.com/pacman82/odbc-api -- once_cell 1.20.2, Apache-2.0 OR MIT +- once_cell 1.20.2, Apache-2.0 OR MIT
https://github.com/matklad/once_cell -- opentelemetry 0.26.0, Apache-2.0 +- opentelemetry 0.26.0, Apache-2.0
https://github.com/open-telemetry/opentelemetry-rust -- opentelemetry 0.27.0, Apache-2.0 +- opentelemetry 0.27.0, Apache-2.0
https://github.com/open-telemetry/opentelemetry-rust -- opentelemetry-http 0.26.0, Apache-2.0 +- opentelemetry-http 0.26.0, Apache-2.0
https://github.com/open-telemetry/opentelemetry-rust -- opentelemetry-prometheus 0.17.0, Apache-2.0 +- opentelemetry-prometheus 0.17.0, Apache-2.0
https://github.com/open-telemetry/opentelemetry-rust -- opentelemetry-proto 0.27.0, Apache-2.0 +- opentelemetry-proto 0.27.0, Apache-2.0
https://github.com/open-telemetry/opentelemetry-rust/tree/main/opentelemetry-proto -- opentelemetry-zipkin 0.26.0, Apache-2.0 +- opentelemetry-zipkin 0.26.0, Apache-2.0
https://github.com/open-telemetry/opentelemetry-rust/tree/main/opentelemetry-zipkin -- opentelemetry_sdk 0.26.0, Apache-2.0 +- opentelemetry_sdk 0.26.0, Apache-2.0
https://github.com/open-telemetry/opentelemetry-rust -- opentelemetry_sdk 0.27.0, Apache-2.0 +- opentelemetry_sdk 0.27.0, Apache-2.0
https://github.com/open-telemetry/opentelemetry-rust -- parquet 53.2.0, Apache-2.0 +- parquet 53.2.0, Apache-2.0
https://github.com/apache/arrow-rs -- paste 1.0.15, Apache-2.0 OR MIT +- paste 1.0.15, Apache-2.0 OR MIT
https://github.com/dtolnay/paste -- pin-project 1.1.7, Apache-2.0 OR MIT +- pin-project 1.1.7, Apache-2.0 OR MIT
https://github.com/taiki-e/pin-project -- pkcs8 0.10.2, Apache-2.0 OR MIT +- pkcs8 0.10.2, Apache-2.0 OR MIT
https://github.com/RustCrypto/formats/tree/master/pkcs8 -- prometheus 0.13.4, Apache-2.0 +- prometheus 0.13.4, Apache-2.0
https://github.com/tikv/rust-prometheus -- prometheus-parse 0.2.5, Apache-2.0 +- prometheus-parse 0.2.5, Apache-2.0
https://github.com/ccakes/prometheus-parse-rs -- prost 0.11.9, Apache-2.0 +- prost 0.11.9, Apache-2.0
https://github.com/tokio-rs/prost -- prost 0.12.6, Apache-2.0 +- prost 0.12.6, Apache-2.0
https://github.com/tokio-rs/prost -- prost 0.13.3, Apache-2.0 +- prost 0.13.3, Apache-2.0
https://github.com/tokio-rs/prost -- pulldown-cmark 0.12.2, MIT +- pulldown-cmark 0.12.2, MIT
https://github.com/raphlinus/pulldown-cmark -- rand 0.7.3, Apache-2.0 OR MIT +- rand 0.7.3, Apache-2.0 OR MIT
https://github.com/rust-random/rand -- rand 0.8.5, Apache-2.0 OR MIT +- rand 0.8.5, Apache-2.0 OR MIT
https://github.com/rust-random/rand -- rdkafka 0.37.0, MIT +- rdkafka 0.37.0, MIT
https://github.com/fede1024/rust-rdkafka -- regex 1.11.1, Apache-2.0 OR MIT +- regex 1.11.1, Apache-2.0 OR MIT
https://github.com/rust-lang/regex -- reqwest 0.11.27, Apache-2.0 OR MIT +- reqwest 0.11.27, Apache-2.0 OR MIT
https://github.com/seanmonstar/reqwest -- reqwest 0.12.9, Apache-2.0 OR MIT +- reqwest 0.12.9, Apache-2.0 OR MIT
https://github.com/seanmonstar/reqwest -- reqwest-eventsource 0.6.0, Apache-2.0 OR MIT +- reqwest-eventsource 0.6.0, Apache-2.0 OR MIT
https://github.com/jpopesculian/reqwest-eventsource -- rusqlite 0.31.0, MIT +- rusqlite 0.31.0, MIT
https://github.com/rusqlite/rusqlite -- rustls 0.21.12, Apache-2.0 OR ISC OR MIT +- rustls 0.21.12, Apache-2.0 OR ISC OR MIT
https://github.com/rustls/rustls -- rustls 0.23.18, Apache-2.0 OR ISC OR MIT +- rustls 0.23.18, Apache-2.0 OR ISC OR MIT
https://github.com/rustls/rustls -- rustls-native-certs 0.6.3, Apache-2.0 OR ISC OR MIT +- rustls-native-certs 0.6.3, Apache-2.0 OR ISC OR MIT
https://github.com/ctz/rustls-native-certs -- rustls-native-certs 0.8.1, Apache-2.0 OR ISC OR MIT +- rustls-native-certs 0.8.1, Apache-2.0 OR ISC OR MIT
https://github.com/rustls/rustls-native-certs -- rustls-pemfile 1.0.4, Apache-2.0 OR ISC OR MIT +- rustls-pemfile 1.0.4, Apache-2.0 OR ISC OR MIT
https://github.com/rustls/pemfile -- rustls-pemfile 2.2.0, Apache-2.0 OR ISC OR MIT +- rustls-pemfile 2.2.0, Apache-2.0 OR ISC OR MIT
https://github.com/rustls/pemfile -- rustyline 15.0.0, MIT +- rustyline 15.0.0, MIT
https://github.com/kkawakam/rustyline -- schemars 0.8.21, MIT +- schemars 0.8.21, MIT
https://github.com/GREsau/schemars -- scopeguard 1.2.0, Apache-2.0 OR MIT +- scopeguard 1.2.0, Apache-2.0 OR MIT
https://github.com/bluss/scopeguard -- secrecy 0.8.0, Apache-2.0 OR MIT +- secrecy 0.8.0, Apache-2.0 OR MIT
https://github.com/iqlusioninc/crates/tree/main/secrecy -- serde 1.0.215, Apache-2.0 OR MIT +- serde 1.0.215, Apache-2.0 OR MIT
https://github.com/serde-rs/serde -- serde-value 0.7.0, MIT +- serde-value 0.7.0, MIT
https://github.com/arcnmx/serde-value -- serde_json 1.0.132, Apache-2.0 OR MIT +- serde_json 1.0.132, Apache-2.0 OR MIT
https://github.com/serde-rs/json -- serde_yaml 0.9.34+deprecated, Apache-2.0 OR MIT +- serde_yaml 0.9.34+deprecated, Apache-2.0 OR MIT
https://github.com/dtolnay/serde-yaml -- sha2 0.10.8, Apache-2.0 OR MIT +- sha2 0.10.8, Apache-2.0 OR MIT
https://github.com/RustCrypto/hashes -- snafu 0.8.5, Apache-2.0 OR MIT +- snafu 0.8.5, Apache-2.0 OR MIT
https://github.com/shepmaster/snafu -- snmalloc-rs 0.3.6, MIT +- snmalloc-rs 0.3.6, MIT
https://github.com/SchrodingerZhu/snmalloc-rs -- snowflake-api 0.9.0, Apache-2.0 +- snowflake-api 0.9.0, Apache-2.0
https://github.com/mycelial/snowflake-rs -- spark-connect-rs 0.0.1-beta.4, Apache-2.0 +- spark-connect-rs 0.0.1-beta.4, Apache-2.0
https://github.com/sjrusso8/spark-connect-rs -- ssh2 0.9.4, Apache-2.0 OR MIT +- ssh2 0.9.4, Apache-2.0 OR MIT
https://github.com/alexcrichton/ssh2-rs -- suppaftp 5.4.0, Apache-2.0 +- suppaftp 5.4.0, Apache-2.0
https://github.com/veeso/suppaftp -- tempfile 3.14.0, Apache-2.0 OR MIT +- tempfile 3.14.0, Apache-2.0 OR MIT
https://github.com/Stebalien/tempfile - text-embeddings-backend 1.5.0, @@ -513,86 +511,86 @@ gopkg.in/yaml.v3, https://github.com/go-yaml/yaml/blob/v3.0.1/LICENSE, MIT - text-embeddings-core 1.5.0,
-- text-splitter 0.18.1, MIT +- text-splitter 0.18.1, MIT
https://github.com/benbrandt/text-splitter -- tiberius 0.12.3, Apache-2.0 OR MIT +- tiberius 0.12.3, Apache-2.0 OR MIT
https://github.com/prisma/tiberius -- tiktoken-rs 0.6.0, MIT +- tiktoken-rs 0.6.0, MIT
https://github.com/zurawiki/tiktoken-rs -- tokenizers 0.20.3, Apache-2.0 +- tokenizers 0.20.3, Apache-2.0
https://github.com/huggingface/tokenizers -- tokio 1.41.1, MIT +- tokio 1.41.1, MIT
https://github.com/tokio-rs/tokio -- tokio-postgres 0.7.12, Apache-2.0 OR MIT +- tokio-postgres 0.7.12, Apache-2.0 OR MIT
https://github.com/sfackler/rust-postgres -- tokio-rusqlite 0.5.1, MIT +- tokio-rusqlite 0.5.1, MIT
https://github.com/programatik29/tokio-rusqlite -- tokio-rustls 0.24.1, Apache-2.0 OR MIT +- tokio-rustls 0.24.1, Apache-2.0 OR MIT
https://github.com/rustls/tokio-rustls -- tokio-rustls 0.26.0, Apache-2.0 OR MIT +- tokio-rustls 0.26.0, Apache-2.0 OR MIT
https://github.com/rustls/tokio-rustls -- tokio-stream 0.1.16, MIT +- tokio-stream 0.1.16, MIT
https://github.com/tokio-rs/tokio -- tokio-util 0.7.12, MIT +- tokio-util 0.7.12, MIT
https://github.com/tokio-rs/tokio -- tonic 0.12.3, MIT +- tonic 0.12.3, MIT
https://github.com/hyperium/tonic -- tonic-health 0.12.3, MIT +- tonic-health 0.12.3, MIT
https://github.com/hyperium/tonic -- tower 0.4.13, MIT +- tower 0.4.13, MIT
https://github.com/tower-rs/tower -- tower 0.5.1, MIT +- tower 0.5.1, MIT
https://github.com/tower-rs/tower -- tower-http 0.6.2, MIT +- tower-http 0.6.2, MIT
https://github.com/tower-rs/tower-http -- tracing 0.1.40, MIT +- tracing 0.1.40, MIT
https://github.com/tokio-rs/tracing -- tracing-futures 0.2.5, MIT +- tracing-futures 0.2.5, MIT
https://github.com/tokio-rs/tracing -- tracing-opentelemetry 0.27.0, MIT +- tracing-opentelemetry 0.27.0, MIT
https://github.com/tokio-rs/tracing-opentelemetry -- tracing-subscriber 0.3.18, MIT +- tracing-subscriber 0.3.18, MIT
https://github.com/tokio-rs/tracing -- tract-core 0.21.7, Apache-2.0 OR MIT +- tract-core 0.21.7, Apache-2.0 OR MIT
https://github.com/snipsco/tract -- tract-onnx 0.21.7, Apache-2.0 OR MIT +- tract-onnx 0.21.7, Apache-2.0 OR MIT
https://github.com/snipsco/tract -- trust-dns-resolver 0.23.2, Apache-2.0 OR MIT +- trust-dns-resolver 0.23.2, Apache-2.0 OR MIT
https://github.com/bluejekyll/trust-dns -- url 2.5.4, Apache-2.0 OR MIT +- url 2.5.4, Apache-2.0 OR MIT
https://github.com/servo/rust-url -- uuid 0.8.2, Apache-2.0 OR MIT +- uuid 0.8.2, Apache-2.0 OR MIT
https://github.com/uuid-rs/uuid -- uuid 1.11.0, Apache-2.0 OR MIT +- uuid 1.11.0, Apache-2.0 OR MIT
https://github.com/uuid-rs/uuid -- winver 1.0.0, MIT +- winver 1.0.0, MIT
https://github.com/rhysd/winver -- x509-certificate 0.23.1, MPL-2.0 +- x509-certificate 0.23.1, MPL-2.0
https://github.com/indygreg/cryptography-rs.git diff --git a/spiceaidocs/docs/api/adbc/index.md b/spiceaidocs/docs/api/adbc/index.md index e3c44f02..1387d194 100644 --- a/spiceaidocs/docs/api/adbc/index.md +++ b/spiceaidocs/docs/api/adbc/index.md @@ -17,7 +17,7 @@ Get started with ADBC using Python. ### Installation -Start Python Envionment. +Start a Python environment. ```shell python diff --git a/spiceaidocs/docs/api/arrow-flight-sql/index.md b/spiceaidocs/docs/api/arrow-flight-sql/index.md index 85f45c66..43cafaa1 100644 --- a/spiceaidocs/docs/api/arrow-flight-sql/index.md +++ b/spiceaidocs/docs/api/arrow-flight-sql/index.md @@ -7,10 +7,10 @@ description: 'Query Spice using JDBC/ODBC/ADBC' [Arrow Flight SQL](https://arrow.apache.org/docs/format/FlightSql.html) is a protocol for interacting with SQL databases using the Arrow in-memory format and the Flight RPC framework. -Spice implements the Flight SQL protocol which enables querying the datasets configured in Spice via tools that support connecting via one of the Arrow Flight SQL drivers, such as [DBeaver](https://dbeaver.io), [Tableau](https://www.tableau.com/), or [Power BI](https://www.microsoft.com/en-us/power-platform/products/power-bi). +Spice implements the Flight SQL protocol, enabling querying of the datasets configured in Spice via tools that support connecting via one of the Arrow Flight SQL drivers, such as [DBeaver](https://dbeaver.io), [Tableau](https://www.tableau.com/), or [Power BI](https://www.microsoft.com/en-us/power-platform/products/power-bi). arrow flight and spice ## Authentication -API Key authentication is supported for the Arrow Flight SQL endpoint. See [API Key Authentication](../../api/auth/index.md) for more details. +API Key authentication is supported for the Arrow Flight SQL endpoint. For more details, see [API Key Authentication](../../api/auth/index.md). 
diff --git a/spiceaidocs/docs/api/http/catalogs.md b/spiceaidocs/docs/api/http/catalogs.md index 1dc1bde7..859e0933 100644 --- a/spiceaidocs/docs/api/http/catalogs.md +++ b/spiceaidocs/docs/api/http/catalogs.md @@ -5,9 +5,9 @@ description: 'Fetch catalogs' sidebar_position: 5 --- -Returns the list of configured [catalogs](/components/catalogs) +The `GET /v1/catalogs` endpoint returns a list of configured [catalogs](/components/catalogs). -Example: +Example request: ```bash curl --request GET \ diff --git a/spiceaidocs/docs/api/http/chat-completions.md b/spiceaidocs/docs/api/http/chat-completions.md index 1192ddef..7d6f1804 100644 --- a/spiceaidocs/docs/api/http/chat-completions.md +++ b/spiceaidocs/docs/api/http/chat-completions.md @@ -5,7 +5,7 @@ description: '' sidebar_position: 6 --- -Chat completions is an OpenAI compatible endpoint. +The `POST /v1/chat/completions` endpoint is an OpenAI-compatible endpoint for generating chat completions. Specify the model by providing the component name in the `model` key. For example: diff --git a/spiceaidocs/docs/api/http/datasets.md b/spiceaidocs/docs/api/http/datasets.md index aeabd6d7..fd9e4bc9 100644 --- a/spiceaidocs/docs/api/http/datasets.md +++ b/spiceaidocs/docs/api/http/datasets.md @@ -5,9 +5,9 @@ description: 'Fetch datasets' sidebar_position: 2 --- -Returns a the list of configured datasets. +The `GET /v1/datasets` endpoint returns a list of configured datasets. -Example: +Example request: ```bash curl --request GET \ diff --git a/spiceaidocs/docs/api/http/embeddings.md b/spiceaidocs/docs/api/http/embeddings.md index 85d049d3..9a640087 100644 --- a/spiceaidocs/docs/api/http/embeddings.md +++ b/spiceaidocs/docs/api/http/embeddings.md @@ -5,7 +5,7 @@ description: '' sidebar_position: 7 --- -Chat completions is an OpenAI compatible endpoint. +The `POST /v1/embeddings` endpoint is an OpenAI-compatible endpoint for generating embeddings. Specify the embedding model by providing the component name in the `model` key.
For example: diff --git a/spiceaidocs/docs/api/http/index.md b/spiceaidocs/docs/api/http/index.md index 9f682ccd..4e165736 100644 --- a/spiceaidocs/docs/api/http/index.md +++ b/spiceaidocs/docs/api/http/index.md @@ -13,11 +13,11 @@ import DocCardList from '@theme/DocCardList'; ## Authentication -API Key authentication is supported for all HTTP routes. See [API Key Authentication](../../api/auth/index.md) for more details. +API Key authentication is supported for all HTTP routes. For more details, see [API Key Authentication](../../api/auth/index.md). ## Cross-Origin Resource Sharing (CORS) -CORS is disabled by default. Enable for all origins with: +CORS is disabled by default. To enable CORS for all origins, add the following configuration to your `spicepod.yaml` file: ```yaml runtime: diff --git a/spiceaidocs/docs/api/http/ml-predict.md b/spiceaidocs/docs/api/http/ml-predict.md index a87d1d3c..feff2bb5 100644 --- a/spiceaidocs/docs/api/http/ml-predict.md +++ b/spiceaidocs/docs/api/http/ml-predict.md @@ -8,9 +8,9 @@ sidebar_position: 10 import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; -Make predictions using all loaded forecasting models in parallel, useful for ensembling or A/B testing. +The `POST /v1/predict` endpoint is used to make predictions using all loaded forecasting models in parallel. This is useful for ensembling or A/B testing different models. -Example: +Example request: ```shell curl --request POST \ diff --git a/spiceaidocs/docs/api/http/refresh.md b/spiceaidocs/docs/api/http/refresh.md index a9f49229..596d4f2b 100644 --- a/spiceaidocs/docs/api/http/refresh.md +++ b/spiceaidocs/docs/api/http/refresh.md @@ -10,10 +10,11 @@ pagination_next: null Performs an on-demand refresh for an accelerated dataset. On-demand refresh applies only to `full` and `append` refresh modes (not `changes`). 
Request Body: - - `refresh_sql` (String, Optional): Refresh SQL to use, see [Refresh SQL docs](/components/data-accelerators/data-refresh.md#refresh-sql). Defaults to the `refresh_sql` specified in the spicepod. - - `refresh_mode` (String, Optional): Refresh mode to use, see [Refresh Modes docs](/components/data-accelerators/data-refresh.md#refresh-modes). Defaults to `refresh_mode` specified in the spicepod. - Example: +- `refresh_sql` (String, Optional): Refresh SQL to use, see [Refresh SQL docs](/components/data-accelerators/data-refresh.md#refresh-sql). Defaults to the `refresh_sql` specified in the spicepod. +- `refresh_mode` (String, Optional): Refresh mode to use, see [Refresh Modes docs](/components/data-accelerators/data-refresh.md#refresh-modes). Defaults to `refresh_mode` specified in the spicepod. + +Example: ```bash curl -i -XPOST 127.0.0.1:8090/v1/datasets/taxi_trips/acceleration/refresh \ diff --git a/spiceaidocs/docs/api/http/search.md b/spiceaidocs/docs/api/http/search.md index 9b3e54c4..30a6ed65 100644 --- a/spiceaidocs/docs/api/http/search.md +++ b/spiceaidocs/docs/api/http/search.md @@ -10,15 +10,17 @@ pagination_next: null Performs a basic vector similarity search across one or more datasets. Request Body - - `datasets` (array of strings, Optional): Names of the dataset components to perform the similarity search on. Each dataset must have exactly one column augmented with an embedding. If None, all available datasets are used. - - `text` (string): Query plaintext used to retrieve similar rows from the underlying datasets listed in the `from` request key. - - `limit` (integer): The number of rows to return, per `from` dataset. Default: 3. - - `where` (string): An SQL filter predicate to apply within the search. - - `additional_columns` (array of strings): Additional columns, from the datasets, to return in the response (under `.matches[*].metadata`). 
+
+- `datasets` (array of strings, Optional): Names of the dataset components to perform the similarity search on. Each dataset must have exactly one column augmented with an embedding. If omitted, all available datasets are used. +- `text` (string): Query plaintext used to retrieve similar rows from the underlying datasets listed in the `datasets` request key. +- `limit` (integer): The number of rows to return per dataset in `datasets`. Default: 3. +- `where` (string): An SQL filter predicate to apply within the search. +- `additional_columns` (array of strings): Additional columns, from the datasets, to return in the response (under `.matches[*].metadata`). #### Example Spicepod + ```yaml embeddings: - name: embedding_maker @@ -34,6 +36,7 @@ datasets: ``` Request + ```shell curl -XPOST http://localhost:3000/v1/search \ -d '{ @@ -46,27 +49,30 @@ curl -XPOST http://localhost:3000/v1/search \ ``` Response + ```json { - "matches": [{ - "value": "I booked use some tickets", - "dataset": "app_messages", - "primary_key": {"id": "6fd5a215-0881-421d-ace0-b293b83452b5"}, - "metadata": {"timestamp": 1724716542} - }, - { - "value": "direct to Narata", - "dataset": "app_messages", - "primary_key": {"id": "8a25595f-99fb-4404-8c82-e1046d8f4c4b"}, - "metadata": {"timestamp": 1724715881} - }, - { - "value": "Yes, we're sitting together", - "dataset": "app_messages", - "primary_key": {"id": "8421ed84-b86d-4b10-b4da-7a432e8912c0"}, - "metadata": {"timestamp": 1724716123} - }], - "duration_ms": 42, + "matches": [ + { + "value": "I booked use some tickets", + "dataset": "app_messages", + "primary_key": { "id": "6fd5a215-0881-421d-ace0-b293b83452b5" }, + "metadata": { "timestamp": 1724716542 } + }, + { + "value": "direct to Narata", + "dataset": "app_messages", + "primary_key": { "id": "8a25595f-99fb-4404-8c82-e1046d8f4c4b" }, + "metadata": { "timestamp": 1724715881 } + }, + { + "value": "Yes, we're sitting together", + "dataset": "app_messages", + "primary_key": { "id":
"8421ed84-b86d-4b10-b4da-7a432e8912c0" }, + "metadata": { "timestamp": 1724716123 } + } + ], + "duration_ms": 42 } ``` diff --git a/spiceaidocs/docs/api/jdbc/index.md b/spiceaidocs/docs/api/jdbc/index.md index 1ee2514c..a15fec95 100644 --- a/spiceaidocs/docs/api/jdbc/index.md +++ b/spiceaidocs/docs/api/jdbc/index.md @@ -16,17 +16,14 @@ Spice supports JDBC clients through a JDBC driver implementation based on the [F ### Download the Flight SQL JDBC driver - Find the appropriate [Flight SQL JDBC driver](https://central.sonatype.com/artifact/org.apache.arrow/flight-sql-jdbc-driver/versions) version. -- Click **Browse** next to the version you want to download +- Click **Browse** next to the version you want to download - Click the `flight-sql-jdbc-driver-XX.XX.XX.jar` file (with only the `.jar` file extension) from the list of files to download the driver jar file ### Add the driver to your application Follow the instructions specific to your application for adding a custom JDBC driver. Examples: -**Tableau**: - - Windows: `C:\Program Files\Tableau\Drivers` - - Mac: `~/Library/Tableau/Drivers` - - Linux: `/opt/tableau/tableau_driver/jdbc` - Start or restart Tableau +**Tableau**: - Windows: `C:\Program Files\Tableau\Drivers` - Mac: `~/Library/Tableau/Drivers` - Linux: `/opt/tableau/tableau_driver/jdbc` - Start or restart Tableau [Full instruction](/clients/tableau) @@ -51,8 +48,8 @@ Follow the instructions specific to your application for adding a custom JDBC dr - **URL**: `jdbc:arrow-flight-sql://{host}:{port}` - **Dialect**: `PostgreSQL` - For example: - +For example: + 1. **Ensure Spice is running** 1. Click **Connect** @@ -75,4 +72,4 @@ Replace `` with the API key value. 
The `user` and `password` In the configured application, run a sample query, such as `SELECT * FROM taxi_trips;` -![Query Results](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/0e9f3c0f-2e03-47f9-8d5e-65e078d7e900/public "Query Results") +![Query Results](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/0e9f3c0f-2e03-47f9-8d5e-65e078d7e900/public 'Query Results') diff --git a/spiceaidocs/docs/api/odbc/index.md b/spiceaidocs/docs/api/odbc/index.md index 31a65b09..93932df0 100644 --- a/spiceaidocs/docs/api/odbc/index.md +++ b/spiceaidocs/docs/api/odbc/index.md @@ -81,14 +81,14 @@ Spice supports ODBC clients through an ODBC driver implementation based on the [ ### ODBC Connection Parameters -| Name | Type | Description | -|-|-|-| -| host | string | The IP address or hostname for the Spice runtime. | -| port | integer | The Spice runtime Arrow Flight endpoint port number | -| useEncryption | integer | Configures the driver to use an SSL-encrypted connection. Accepted values: `true` (default) - The client communicates with the Spice runtime only using SSL encryption and `false` - SSL encryption is disabled. | -| disableCertificateVerification | integer | Specifies whether the driver should verify the host certificate against the trust store. Default is `false`| -| useSystemTrustStore | integer | Controls whether to use a CA certificate from the system's trust store, or from a specified .pem file. If `true` - The driver verifies the connection using a certificate in the system trust store. IF `false` - The driver verifies the connection using the .pem file specified by the `trustedCerts` parameter. `true` on Windows and macOS, `false` on Linux by default | -| trustedCerts | string | The full path of the .pem file containing certificates trusted by a CA, for the purpose of verifying the server. If this option is not set, then the driver defaults to using the trusted CA certificates .pem file installed by the driver. 
| +| Name | Type | Description | +| ------------------------------ | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| host | string | The IP address or hostname for the Spice runtime. | +| port | integer | The Spice runtime Arrow Flight endpoint port number | +| useEncryption | integer | Configures the driver to use an SSL-encrypted connection. Accepted values: `true` (default) - The client communicates with the Spice runtime only using SSL encryption and `false` - SSL encryption is disabled. | +| disableCertificateVerification | integer | Specifies whether the driver should verify the host certificate against the trust store. Default is `false` | +| useSystemTrustStore | integer | Controls whether to use a CA certificate from the system's trust store, or from a specified .pem file. If `true` - The driver verifies the connection using a certificate in the system trust store. IF `false` - The driver verifies the connection using the .pem file specified by the `trustedCerts` parameter. `true` on Windows and macOS, `false` on Linux by default | +| trustedCerts | string | The full path of the .pem file containing certificates trusted by a CA, for the purpose of verifying the server. If this option is not set, then the driver defaults to using the trusted CA certificates .pem file installed by the driver. | :::note The ODBC driver for Arrow Flight SQL does not support password-protected `.pem/.crt` files or multiple `.crt` certificates in a single `.pem/.crt` file. 
From d77fed497584a03b89f446a153a13bbcde145d6d Mon Sep 17 00:00:00 2001 From: Luke Kim <80174+lukekim@users.noreply.github.com> Date: Wed, 27 Nov 2024 18:29:28 -0800 Subject: [PATCH 3/8] Tweak --- .github/copilot-instructions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 7319ef74..180fe998 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -6,7 +6,7 @@ Remember to be concise, but do not omit useful information. Pay attention to det Use plain, clear, simple, easy-to-understand language. Do not use hyperbole or hype. -Avoid "allows" to describe functionality. +Avoid "allows" to describe functionality. Use "helps" where it makes sense. Always provide references and citations with links. From 40bb50a022848ccc9b30c9bb00450e678c104e95 Mon Sep 17 00:00:00 2001 From: Luke Kim <80174+lukekim@users.noreply.github.com> Date: Wed, 27 Nov 2024 18:32:01 -0800 Subject: [PATCH 4/8] Add client improvements --- spiceaidocs/docs/clients/DBeaver/index.md | 51 ++++++----- spiceaidocs/docs/clients/grafana/index.md | 91 +++++++++---------- .../docs/clients/jetbrains-datagrip/index.md | 54 +++++------ spiceaidocs/docs/clients/superset/index.md | 7 +- spiceaidocs/docs/clients/tableau/index.md | 15 ++- 5 files changed, 113 insertions(+), 105 deletions(-) diff --git a/spiceaidocs/docs/clients/DBeaver/index.md b/spiceaidocs/docs/clients/DBeaver/index.md index 5c9672b2..da257add 100644 --- a/spiceaidocs/docs/clients/DBeaver/index.md +++ b/spiceaidocs/docs/clients/DBeaver/index.md @@ -1,6 +1,6 @@ --- -title: "DBeaver" -sidebar_label: "DBeaver" +title: 'DBeaver' +sidebar_label: 'DBeaver' description: 'Configure DBeaver to query Spice via JDBC' sidebar_position: 2 pagination_prev: 'clients/index' @@ -13,57 +13,64 @@ pagination_next: null 3. 
Download the [Apache Arrow Flight SQL JDBC driver](https://search.maven.org/search?q=a:flight-sql-jdbc-driver) - choose the "jar" option. -4. Launch DBeaver +4. Launch DBeaver 5. In the DBeaver application menu bar, open the "Database" menu and choose: "Driver Manager": -![Driver manager menu option](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/691d1f83-c1d0-4ad8-ec8d-d8f37ccc9d00/public "Driver manager menu option") + ![Driver manager menu option](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/691d1f83-c1d0-4ad8-ec8d-d8f37ccc9d00/public 'Driver manager menu option') 6. Click the "New" button on the right: -![Driver manager new button](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/5783d944-daae-4735-99e9-976f974bc100/public "Driver manager new button") + ![Driver manager new button](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/5783d944-daae-4735-99e9-976f974bc100/public 'Driver manager new button') 7. Add the JDBC jar file: + 1. Click the "Libraries" tab 1. Click the: "Add File" button 1. Choose the "flight-sql-jdbc-driver-15.0.1.jar" jar file (the file downloaded in step 3 above) - and click "Open" - ![Select jar file](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/19900f7a-f00f-473d-780e-4a28c2ecd800/public "Select jar file") + ![Select jar file](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/19900f7a-f00f-473d-780e-4a28c2ecd800/public 'Select jar file') 1. Close the Driver editor window with the blue "OK" button on the lower-right 8. Enter the driver settings: + 1. Click the "Settings" tab - 1. In the "Driver Name" field - enter: ```Apache Arrow Flight SQL``` - 1. In the "URL Template" field - enter: ```jdbc:arrow-flight-sql://{host}:{port}?useEncryption=false&disableCertificateVerification=true``` - - If [API key authentication](../../api/auth/index.md) is enabled, the URL template should be: ```jdbc:arrow-flight-sql://{host}:{port}?useEncryption=false&disableCertificateVerification=true&user=&password=``` - where `` is the API key value + 1. 
In the "Driver Name" field - enter: `Apache Arrow Flight SQL` + 1. In the "URL Template" field - enter: `jdbc:arrow-flight-sql://{host}:{port}?useEncryption=false&disableCertificateVerification=true` + + - If [API key authentication](../../api/auth/index.md) is enabled, the URL template should be: `jdbc:arrow-flight-sql://{host}:{port}?useEncryption=false&disableCertificateVerification=true&user=&password=` - where `` is the API key value + 1. In the "Driver Type" drop-down box - choose: "SQLite" 1. Select "No authentication" - - This should be selected even if API key authentication is enabled in the runtime, as the API key is supplied via the URL template above. + + - This should be selected even if API key authentication is enabled in the runtime, as the API key is supplied via the URL template above. + 1. The driver manager "Edit Driver" window should look like this: - ![Driver Manager completed](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/20348c42-117b-4763-80d2-6e615b23ae00/public "Driver Manager completed") + ![Driver Manager completed](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/20348c42-117b-4763-80d2-6e615b23ae00/public 'Driver Manager completed') 1. Click the blue "OK" button on the lower-right to save the driver 1. Close the "Driver Manager" window by clicking the blue "Close" button on the lower-right. 9. Create a new Database Connection: + 1. In the DBeaver application menu bar, open the "Database" menu and choose: "New Database Connection": - ![New Database Connection](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/acdf7251-4238-44ee-9639-0c557518da00/public "New Database Connection") - 1. In the "Connect to a database" window - type: ```Flight``` in the search bar - 1. 
Choose the ```Apache Arrow Flight SQL``` driver - the window should look like this: - ![Connect to a database window](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/61cee5fe-dc75-4ac1-e558-eea3aff4c100/public "Connect to a database window") + ![New Database Connection](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/acdf7251-4238-44ee-9639-0c557518da00/public 'New Database Connection') + 1. In the "Connect to a database" window - type: `Flight` in the search bar + 1. Choose the `Apache Arrow Flight SQL` driver - the window should look like this: + ![Connect to a database window](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/61cee5fe-dc75-4ac1-e558-eea3aff4c100/public 'Connect to a database window') 1. Click the blue "Next >" button on the bottom of the window 1. On the next screen, the JDBC URL should be filled out already - just supply the Host (`localhost`) and Port (`50051`) values for the Spice runtime. The window should look like this: - ![Connect to a database window 2](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/2a2b2fdc-00db-49d3-5359-059b12342b00/public "Connect to a database window 2") + ![Connect to a database window 2](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/2a2b2fdc-00db-49d3-5359-059b12342b00/public 'Connect to a database window 2') 1. Click the "Test Connection" button - the window should look like this: - ![Test Connection results](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/a3fc5f5f-a39f-47ce-7955-4b384ec1ae00/public "Test Connection results") + ![Test Connection results](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/a3fc5f5f-a39f-47ce-7955-4b384ec1ae00/public 'Test Connection results') 1. Click the blue "OK" button to close the Connection test window 1. Click the "Connection details (name, type, ...)" button on the right 1. In the "General" section, enter: `Spice Runtime` for the "Connection name". 
It should look like this: - ![Name the Database Connection](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/f6d04fe1-92a1-4082-d4ea-e9daacaca200/public) + ![Name the Database Connection](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/f6d04fe1-92a1-4082-d4ea-e9daacaca200/public) 1. Click the blue "Finish" button to save the connection 10. Run a query: 1. Right-click on the Database Connection on the left - choose: "SQL Editor", and then: "Open SQL Console" as shown here: - ![Open SQL Console](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/642a5885-9e3f-4dd7-ef43-72bfce27bb00/public "Open SQL Console") - 1. In the Console window - run a query - something like: ```SELECT * FROM taxi_trips;``` + ![Open SQL Console](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/642a5885-9e3f-4dd7-ef43-72bfce27bb00/public 'Open SQL Console') + 1. In the Console window - run a query - something like: `SELECT * FROM taxi_trips;` 1. Click the triangle button to execute the SQL statement - as shown below (or use keyboard shortcut: Ctrl+Enter): - ![Execute SQL](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/2134e47b-a066-47e9-1d48-06352675f400/public "Execute SQL") + ![Execute SQL](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/2134e47b-a066-47e9-1d48-06352675f400/public 'Execute SQL') 1. See the query results as shown in this screenshot: - ![Query Results](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/0e9f3c0f-2e03-47f9-8d5e-65e078d7e900/public "Query Results") + ![Query Results](https://imagedelivery.net/HyTs22ttunfIlvyd6vumhQ/0e9f3c0f-2e03-47f9-8d5e-65e078d7e900/public 'Query Results') 1. DBeaver is now configured to query the Spice runtime using SQL! 
🎉 diff --git a/spiceaidocs/docs/clients/grafana/index.md b/spiceaidocs/docs/clients/grafana/index.md index 7bf8372e..cb9dc7df 100644 --- a/spiceaidocs/docs/clients/grafana/index.md +++ b/spiceaidocs/docs/clients/grafana/index.md @@ -1,6 +1,6 @@ --- -title: "Grafana & Prometheus" -sidebar_label: "Grafana & Prometheus" +title: 'Grafana & Prometheus' +sidebar_label: 'Grafana & Prometheus' description: 'Monitoring Spice instances with Grafana & Prometheus' pagination_prev: 'clients/index' pagination_next: null @@ -39,65 +39,64 @@ global: ## Local Quickstart -This tutorial creates and configures Grafana and Prometheus locally to scrape and display metrics from several Spice instances. It assumes: - - Two Spice runtimes, `spiced-main` and `spiced-edge`, are running on `127.0.0.1:9091` and `127.0.0.1:9092` respectively. +This tutorial creates and configures Grafana and Prometheus locally to scrape and display metrics from several Spice instances. It assumes: - Two Spice runtimes, `spiced-main` and `spiced-edge`, are running on `127.0.0.1:9091` and `127.0.0.1:9092` respectively. 1. Create a `compose.yaml`: - ```yaml - version: "3" - services: - prometheus: - image: prom/prometheus:latest - volumes: - - ./prometheus.yaml:/etc/prometheus/prometheus.yml - ports: - - 9090:9090 - network_mode: "host" - grafana: - image: grafana/grafana:latest - volumes: - - ./.grafana/provisioning:/etc/grafana/provisioning - ports: - - 3000:3000 - network_mode: "host" - ``` + ```yaml + version: '3' + services: + prometheus: + image: prom/prometheus:latest + volumes: + - ./prometheus.yaml:/etc/prometheus/prometheus.yml + ports: + - 9090:9090 + network_mode: 'host' + grafana: + image: grafana/grafana:latest + volumes: + - ./.grafana/provisioning:/etc/grafana/provisioning + ports: + - 3000:3000 + network_mode: 'host' + ``` 1. 
Create a `prometheus.yaml` to - ```yaml - global: - scrape_interval: 1s - scrape_configs: - - job_name: spiced-main - static_configs: - - targets: ['127.0.0.1:9091'] - - job_name: spiced-edge - static_configs: - - targets: ['127.0.0.1:9092'] - ``` + ```yaml + global: + scrape_interval: 1s + scrape_configs: + - job_name: spiced-main + static_configs: + - targets: ['127.0.0.1:9091'] + - job_name: spiced-edge + static_configs: + - targets: ['127.0.0.1:9092'] + ``` 1. Add a prometheus as a source to grafana. Create a `.grafana/provisioning/datasources/prometheus.yml` - ```yaml - apiVersion: 1 + ```yaml + apiVersion: 1 - datasources: - - name: Prometheus - type: prometheus - access: proxy - url: http://localhost:9090 - isDefault: true - ``` + datasources: + - name: Prometheus + type: prometheus + access: proxy + url: http://localhost:9090 + isDefault: true + ``` 1. Run the Docker Compose - ```bash - docker-compose up - ``` + ```bash + docker-compose up + ``` 1. Go to `http://localhost:3000/dashboard/import` and add the JSON from [monitoring/grafana-dashboard.json](https://github.com/spiceai/spiceai/blob/trunk/monitoring/grafana-dashboard.json). 1. The dashboard will have data from the Spice runtimes. - \ No newline at end of file + diff --git a/spiceaidocs/docs/clients/jetbrains-datagrip/index.md b/spiceaidocs/docs/clients/jetbrains-datagrip/index.md index 5a4240b1..71f65eb3 100644 --- a/spiceaidocs/docs/clients/jetbrains-datagrip/index.md +++ b/spiceaidocs/docs/clients/jetbrains-datagrip/index.md @@ -1,6 +1,6 @@ --- -title: "JetBrains DataGrip" -sidebar_label: "JetBrains DataGrip" +title: 'JetBrains DataGrip' +sidebar_label: 'JetBrains DataGrip' description: 'Configure JetBrains Datagrip to query Spice via JDBC' sidebar_position: 3 pagination_prev: 'clients/index' @@ -16,35 +16,37 @@ pagination_next: null 4. Launch DataGrip 5. 
In Database Explorer menu, select "+" and choose "Driver" - ![Data Sources and Drivers menu option](./img/datagrip-1.png "Data Sources and Drivers menu option") + ![Data Sources and Drivers menu option](./img/datagrip-1.png 'Data Sources and Drivers menu option') 6. Add the JDBC jar file: - 1. Click the "+" button in "Driver Files" selection - 1. Click the "Custom JARs" button - 1. Choose the `flight-sql-jdbc-driver-.jar` jar file (the file downloaded in step 3 above) - and click "Open" - 1. Click the "Class:" selector - 1. Select `org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver` - ![Driver Class selector](./img/datagrip-3.png "Driver Class selector") - -8. Enter the driver settings: - 1. In the "Name" field - enter: ```Apache Arrow Flight SQL``` - 1. Add "URL Template" Default: `jdbc:arrow-flight-sql://{host}:{port}\?useEncryption=false&disableCertificateVerification=true` - 1. Click "Ok" - ![Driver creation window](./img/datagrip-4.png "Driver creation window") - -9. Create a new Database Connection: - 1. In Database Explorer menu, select "+", choose "Data Source" > "Arrow Flight JDBC" - 2. Set the host to `localhost` and the port to `50051` - 3. In "Authentication" select "No auth" - 4. Click "Test Connection" to verify - -![New Data Source](./img/datagrip-5.png "New Data Source") + + 1. Click the "+" button in "Driver Files" selection + 1. Click the "Custom JARs" button + 1. Choose the `flight-sql-jdbc-driver-.jar` jar file (the file downloaded in step 3 above) - and click "Open" + 1. Click the "Class:" selector + 1. Select `org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver` + ![Driver Class selector](./img/datagrip-3.png 'Driver Class selector') + +7. Enter the driver settings: + + 1. In the "Name" field - enter: `Apache Arrow Flight SQL` + 1. Add "URL Template" Default: `jdbc:arrow-flight-sql://{host}:{port}\?useEncryption=false&disableCertificateVerification=true` + 1. 
Click "Ok"
 + ![Driver creation window](./img/datagrip-4.png 'Driver creation window') + +8. Create a new Database Connection: + 1. In Database Explorer menu, select "+", choose "Data Source" > "Arrow Flight JDBC" + 2. Set the host to `localhost` and the port to `50051` + 3. In "Authentication" select "No auth" + 4. Click "Test Connection" to verify + +![New Data Source](./img/datagrip-5.png 'New Data Source') 9. Run a query: 1. Right-click on the connection in Database Explorer and choose "New" > "Query Console" - ![Create new Query Console](./img/datagrip-6.png "Create new Query Console") - 1. In the Console window - add a query - something like: ```SELECT * FROM taxi_trips;``` and click the triangle button to execute the SQL statement + ![Create new Query Console](./img/datagrip-6.png 'Create new Query Console') + 1. In the Console window - add a query - something like: `SELECT * FROM taxi_trips;` and click the triangle button to execute the SQL statement 1. See the query results: - ![Query Results](./img/datagrip-7.png "Query Results") + ![Query Results](./img/datagrip-7.png 'Query Results') DataGrip is now configured to query the Spice runtime using SQL! 🎉 diff --git a/spiceaidocs/docs/clients/superset/index.md b/spiceaidocs/docs/clients/superset/index.md index 967087e5..7fdb4fcc 100644 --- a/spiceaidocs/docs/clients/superset/index.md +++ b/spiceaidocs/docs/clients/superset/index.md @@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem'; Use [Apache Superset](https://superset.apache.org/) to query and visualize datasets loaded in Spice. > Apache Superset is a modern, enterprise-ready business intelligence web application. It is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple pie charts to highly detailed deck.gl geospatial charts.
-> +> > – [Apache Superset documentation](https://superset.apache.org/docs/intro/) ## Start Apache Superset with Flight SQL & DataFusion SQL Dialect support @@ -42,6 +42,7 @@ Select the appropriate tab based on whether you are experimenting with this feat Log into Apache Superset at [http://localhost:8088](http://localhost:8088) with the username and password `admin/admin`. Follow the below steps to configure a database connection to Spice manually, or run `make import-dashboards` to automatically configure the connection and create a sample dashboard. + ## Generic / Virtual Machine @@ -68,7 +69,7 @@ Select the appropriate tab based on whether you are experimenting with this feat ## Temporary Docker Container Modification It's possible to modify a running Docker container to install the library, but the change will be lost on container restart. - + ```bash docker exec -u root -it superset /bin/bash @@ -104,4 +105,4 @@ Click `Test Connection` to verify the connection. Click `Connect` to save the connection. -Start exploring the datasets loaded in Spice by creating a new dataset in Apache Superset to match one of the existing tables. \ No newline at end of file +Start exploring the datasets loaded in Spice by creating a new dataset in Apache Superset to match one of the existing tables. 
diff --git a/spiceaidocs/docs/clients/tableau/index.md b/spiceaidocs/docs/clients/tableau/index.md index c9a428f8..3fa76ce5 100644 --- a/spiceaidocs/docs/clients/tableau/index.md +++ b/spiceaidocs/docs/clients/tableau/index.md @@ -25,7 +25,7 @@ Download and install [Tableau Desktop](https://www.tableau.com/products/desktop/ - Visit the [Flight SQL JDBC driver](https://central.sonatype.com/artifact/org.apache.arrow/flight-sql-jdbc-driver/) page - Select the **Versions** tab -- Click **Browse** next to the version you want to download +- Click **Browse** next to the version you want to download - Click the `flight-sql-jdbc-driver-XX.XX.XX.jar` file (with only the `.jar` file extension) from the list of files to download the driver jar file 2. **Copy the downloaded jar file into the following directory based on your operating system** @@ -39,13 +39,15 @@ Download and install [Tableau Desktop](https://www.tableau.com/products/desktop/ 1. Open **Tableau** 2. In the **Connect** column, under **To a Server**, select **Other Databases (JDBC)**. 3. Provide the following configuration: - - **URL**: `jdbc:arrow-flight-sql://127.0.0.1:50051?useEncryption=false` - - **Dialect**: `PostgreSQL` - + +- **URL**: `jdbc:arrow-flight-sql://127.0.0.1:50051?useEncryption=false` +- **Dialect**: `PostgreSQL` + + 4. Ensure Spice is running 5. Click **Sign In** -## Working with Spice datasets +## Working with Spice datasets Once connected, Spice datasets will be listed under the `datafusion.public` schema. @@ -60,6 +62,3 @@ Tableau support is currently in alpha, and not all functionality is supported. 
U - - - From 70846c6124e854f2d7287eca425583889b46b916 Mon Sep 17 00:00:00 2001 From: Luke Kim <80174+lukekim@users.noreply.github.com> Date: Wed, 27 Nov 2024 18:37:13 -0800 Subject: [PATCH 5/8] Update FAQ --- spiceaidocs/docs/faq/index.md | 56 ++++++++++++++++++++++++----------- 1 file changed, 38 insertions(+), 18 deletions(-) diff --git a/spiceaidocs/docs/faq/index.md b/spiceaidocs/docs/faq/index.md index 67a6ff57..67dcedf2 100644 --- a/spiceaidocs/docs/faq/index.md +++ b/spiceaidocs/docs/faq/index.md @@ -9,42 +9,62 @@ sidebar_position: 2 ## 1. What is Spice? -Spice is a portable runtime that offers developers a unified SQL interface to materialize, accelerate, and query data from any database, data warehouse, or data lake. It functions as an application-specific, tier-optimized Database CDN. +Spice is a portable runtime written in Rust that provides a unified SQL interface for developers to materialize, accelerate, and query data from various sources, including databases, data warehouses, and data lakes. It acts as an application-specific, tier-optimized Database CDN, bringing data closer to applications for faster and more efficient access. ## 2. Why should I use Spice? -Spice makes it easy and fast to query data from one or more sources using SQL. You can co-locate a managed dataset with your application or machine learning model and accelerate it with Arrow in-memory, SQLite/DuckDB, or PostgreSQL for fast, high-concurrency, low-latency queries. +Spice simplifies querying data from one or more sources by enabling developers to co-locate datasets with their applications or machine learning models. With support for in-memory Arrow records, SQLite/DuckDB, and PostgreSQL, Spice accelerates queries with high concurrency and low latency. This makes it ideal for use cases requiring fast, reliable, and cost-efficient data access. ## 3. How is Spice different? 
-- **Application-focused:** Designed to integrate at the application level with a flexible 1:1 or 1:N application-to-Spice mapping. -- **Dual-Engine Acceleration:** Supports OLAP and OLTP databases at the dataset level. -- **Separation of Materialization and Storage/Compute:** Separates storage and compute for optimal data placement. -- **Edge to Cloud Native:** Deployable anywhere from standalone instances to Kubernetes containers and public clouds. +- **Application-Centric Design:** Spice is designed for 1:1 or 1:N mappings between applications and Spice instances, making it flexible for tenant-specific or customer-specific configurations. Unlike traditional databases designed for many applications sharing one data system, Spice often runs one instance per application or tenant. +- **Dual-Engine Acceleration:** Spice supports both OLAP (DuckDB/Arrow) and OLTP (SQLite/PostgreSQL) databases at the dataset level, providing flexibility for various query workloads. +- **Separation of Materialization and Storage/Compute:** Spice enables data to remain close to its source while materializing working sets for fast access, reducing data movement and query latency. +- **Deployment Flexibility:** Deployable across infrastructure tiers, including edge, on-prem, and cloud environments. Spice can run as a standalone instance, sidecar, microservice, or cluster. ## 4. Is Spice a cache? -No, but you can think of Spice data materialization as an active cache or data prefetcher. Unlike a cache that fetches data on a miss, Spice prefetches and materializes filtered data on an interval or as new data is available. +Not solely. Spice can be thought of as an active cache or working dataset prefetcher. Unlike traditional caches that fetch data on a miss, Spice prefetches and materializes data based on filters or intervals, ensuring data is ready and optimized for querying. ## 5. Is Spice a CDN for databases? -Yes, Spice acts like a CDN for different data sources. 
It allows you to load a working set of your database where it's most frequently accessed, such as from a data application or for AI inference. +Yes, Spice functions like a CDN for databases by loading and materializing working datasets where they are most frequently accessed. This reduces latency and improves efficiency for applications, particularly in scenarios involving frequent queries or AI inference. ## 6. How does Spice differ from Trino/Presto and Dremio? -Spice is designed for data and AI applications, while systems like Trino/Presto and Dremio are optimized for big data and real-time analytics. Spice specializes in high-concurrency, low-latency access and data materialization close to the application. +Spice is purpose-built for data and AI applications, emphasizing low-latency access, materialization, and proximity to the application. Trino/Presto and Dremio are primarily optimized for big data analytics and rely on centralized cloud clusters. -A key differentiator of Spice is its single-node distributed nature, which sets it apart from bulky, centralized Cloud Data Warehouses (CDW). Instead of consolidating data access into a central hub, Spice facilitates bringing working sets of use-case/application-specific data closer to where it's actually queried and used. This architecture provides several advantages: +Spice’s decentralized approach brings working datasets closer to their point of use, offering several key benefits: -- **Proximity:** By co-locating data with applications, Spice reduces latency and improves performance for frequent data access patterns. -- **Flexibility:** You can quickly spin up multiple, lightweight instances of Spice tailored for different datasets and use cases. -- **Scalability:** Decentralized materialization allows for better resource control and optimization, as each standalone instance can scale independently based on its workload requirements. 
-- **Efficiency:** Reduces the need for massive data movement operations across the network, lowering bandwidth consumption and speeding up query times. +- **Proximity to Applications:** Materialized datasets reduce latency and boost performance for frequent queries. +- **Lightweight Deployments:** Single-node runtime enables flexible scaling and avoids the need for large, centralized clusters. +- **Improved Efficiency:** Reduces data movement across networks, cutting costs and speeding up access times. -### 7. How does Spice compare to Spark? +## 7. How does Spice compare to Spark? -Spark is primarily designed for large-scale data processing and batch-processing pipelines with its distributed computing engine. In contrast, Spice is focused on accelerating data access and query speeds for applications through materialization and tier-optimized storage strategies. +While Spark excels at distributed batch processing and large-scale data transformations, Spice focuses on enabling real-time, low-latency data access for applications. By materializing data locally and supporting tiered storage, Spice accelerates query performance in use cases where fast access and high concurrency are essential. -### 8. How does Spice compare to DuckDB? +## 8. How does Spice compare to DuckDB? -DuckDB is an embedded database designed for OLAP queries on large datasets. Spice integrates DuckDB to accelerate queries and as a data connector, meaning you can use Spice to access any data DuckDB can access and query. +DuckDB is an embedded analytics database optimized for OLAP queries on large datasets. Spice integrates DuckDB as part of its runtime for materialization and acceleration. This means developers can leverage DuckDB’s capabilities within Spice while also benefiting from Spice’s broader data federation, multi-engine support, and deployment flexibility. + +## 9. Can Spice handle federated queries? 
+ +Yes, Spice natively supports federated queries across diverse data sources with advanced query push-down capabilities. Spice can execute portions of a query directly on the source database, reducing the amount of data transferred and improving query performance. + +## 10. What AI capabilities does Spice provide? + +Spice offers a unified API for both data and AI/ML workflows. It includes endpoints for model inference, embeddings, and an AI gateway supporting popular providers like OpenAI and Anthropic. While Spice emphasizes data readiness as the first step in AI workflows, it also accelerates AI applications by co-locating data and inference engines for real-time performance. + +## 11. What deployment options does Spice support? + +Spice is highly flexible and supports multiple deployment configurations: + +- **Standalone Binary:** Lightweight and easy to set up locally or in production environments. + +- **Sidecar or Microservice:** Ideal for colocating with specific applications. +- **Cluster Deployments:** Scalable setups for large workloads. +- **Infrastructure Tiers:** Deployable across edge, on-prem, and cloud environments, enabling tiered data access optimization. + +## 12. How can I get started? + +Visit the [Spice.ai Quickstart Guide](https://docs.spiceai.org/quickstart/) to set up the runtime, connect to data sources, and start querying in minutes. Comprehensive examples and step-by-step instructions are available to help you get the most out of Spice.
From 521aeda5f27cdc619c2f7f87c0f3cafd3993486b Mon Sep 17 00:00:00 2001 From: Evgenii Khramkov Date: Thu, 28 Nov 2024 12:30:59 +0900 Subject: [PATCH 6/8] Update spiceaidocs/docs/api/jdbc/index.md --- spiceaidocs/docs/api/jdbc/index.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/spiceaidocs/docs/api/jdbc/index.md b/spiceaidocs/docs/api/jdbc/index.md index a15fec95..2eea7881 100644 --- a/spiceaidocs/docs/api/jdbc/index.md +++ b/spiceaidocs/docs/api/jdbc/index.md @@ -23,7 +23,12 @@ Spice supports JDBC clients through a JDBC driver implementation based on the [F Follow the instructions specific to your application for adding a custom JDBC driver. Examples: -**Tableau**: - Windows: `C:\Program Files\Tableau\Drivers` - Mac: `~/Library/Tableau/Drivers` - Linux: `/opt/tableau/tableau_driver/jdbc` - Start or restart Tableau +**Tableau**: + +- Windows: `C:\Program Files\Tableau\Drivers` +- Mac: `~/Library/Tableau/Drivers` +- Linux: `/opt/tableau/tableau_driver/jdbc` +- Start or restart Tableau [Full instruction](/clients/tableau) From d806d5efebd8158676302b3e57f7f0b43fee39f1 Mon Sep 17 00:00:00 2001 From: Evgenii Khramkov Date: Thu, 28 Nov 2024 12:32:49 +0900 Subject: [PATCH 7/8] Update spiceaidocs/docs/clients/grafana/index.md Co-authored-by: Phillip LeBlanc --- spiceaidocs/docs/clients/grafana/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/spiceaidocs/docs/clients/grafana/index.md b/spiceaidocs/docs/clients/grafana/index.md index cb9dc7df..a0da464f 100644 --- a/spiceaidocs/docs/clients/grafana/index.md +++ b/spiceaidocs/docs/clients/grafana/index.md @@ -39,7 +39,8 @@ global: ## Local Quickstart -This tutorial creates and configures Grafana and Prometheus locally to scrape and display metrics from several Spice instances. It assumes: - Two Spice runtimes, `spiced-main` and `spiced-edge`, are running on `127.0.0.1:9091` and `127.0.0.1:9092` respectively. 
+This tutorial creates and configures Grafana and Prometheus locally to scrape and display metrics from several Spice instances. It assumes: + - Two Spice runtimes, `spiced-main` and `spiced-edge`, are running on `127.0.0.1:9091` and `127.0.0.1:9092` respectively. 1. Create a `compose.yaml`: From 000f85c5b759999266fb528d297705409f142755 Mon Sep 17 00:00:00 2001 From: Evgenii Khramkov Date: Thu, 28 Nov 2024 12:34:22 +0900 Subject: [PATCH 8/8] Update spiceaidocs/docs/intelligent-applications/index.md Co-authored-by: Phillip LeBlanc --- spiceaidocs/docs/intelligent-applications/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spiceaidocs/docs/intelligent-applications/index.md b/spiceaidocs/docs/intelligent-applications/index.md index c42c2e2a..8d5c093c 100644 --- a/spiceaidocs/docs/intelligent-applications/index.md +++ b/spiceaidocs/docs/intelligent-applications/index.md @@ -25,7 +25,7 @@ Once deployed, federated datasets are materialized locally within the Spice runt Applications interact with the Spice runtime through high-performance APIs, calling machine learning models for inference tasks such as predictions, recommendations, or anomaly detection. These models are colocated with the runtime, allowing them to leverage the same locally materialized datasets. For example, an e-commerce application could use this infrastructure to provide real-time product recommendations based on user behavior, or a manufacturing system could detect equipment failures before they happen by analyzing time-series sensor data. -As the application runs, contextual and environmental data—such as user actions or external sensor readings—is ingested into the runtime. This data is replicated back to centralized compute clusters where machine learning models are retrained and fine-tuned to improve accuracy and performance. The updated models are automatically versioned and deployed to the runtime, where they can be A/B tested in real time. 
This continuous feedback loop ensures that applications evolve and improve without manual intervention, reducing time to value while maintaining model relevance.
+As the application runs, contextual and environmental data, such as user actions or external sensor readings, is ingested into the runtime. This data is replicated back to centralized compute clusters where machine learning models are retrained and fine-tuned to improve accuracy and performance. The updated models are automatically versioned and deployed to the runtime, where they can be A/B tested in real time. This continuous feedback loop ensures that applications evolve and improve without manual intervention, reducing time to value while maintaining model relevance.
 
 ![Spice.ai Intelligent Application Workflow](https://github.com/spiceai/docs/assets/80174/22b02c5e-5fcb-4856-b79d-911ac5d084c6)
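As a sketch of the colocated data-plus-model setup described in this workflow, a `spicepod.yaml` might pair an accelerated dataset with a model definition. The source table, model, and names below are hypothetical, and field names should be checked against the Spicepod reference.

```yaml
version: v1beta1
kind: Spicepod
name: recommendations-app
datasets:
  # Hypothetical events table materialized next to the model,
  # so inference reads from local, low-latency data.
  - from: postgres:public.user_events
    name: user_events
    acceleration:
      enabled: true
models:
  # Served through the runtime's inference API alongside the data.
  - from: openai:gpt-4o-mini
    name: recommender
```

Because the dataset and model are defined in the same Spicepod, the runtime can serve predictions from the locally materialized `user_events` data without a round trip to the source database.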