diff --git a/spiceaidocs/docs/features/embeddings/index.md b/spiceaidocs/docs/features/embeddings/index.md
index 2d55bde2..7789ed5d 100644
--- a/spiceaidocs/docs/features/embeddings/index.md
+++ b/spiceaidocs/docs/features/embeddings/index.md
@@ -38,7 +38,7 @@ See [Embedding components](/components/embeddings/) for more information on embe
 
 ## Embedding Methods
 
-### Pass-through Embeddings
+### Passthrough Embeddings
 
 Datasets that already include embeddings can utilize the same functionalities (e.g., vector search) as those augmented with embeddings using Spice. To ensure compatibility, these table columns must adhere to the following constraints:
 
@@ -126,7 +126,7 @@ datasets:
 3. **Embeddings Column Data Type:**
    - The embeddings column must have the following [Arrow data type](reference/datatypes.md) when loaded into Spice:
      1. `FixedSizeList[Float32 or Float64, N]`, where `N` is the dimension (size) of the embedding vector. `FixedSizeList` is used for efficient storage and processing of fixed-size vectors.
-     2. If the column is [**chunked**](#chunking-support), use `List[FixedSizeList[Float32 or Float64, N]]`.
+     2. If the column is [**chunked**](#chunking), use `List[FixedSizeList[Float32 or Float64, N]]`.
 
 4. **Offset Column for Chunked Data:**
    - If the underlying column is chunked, there must be an additional offset column named `_offsets` with the following Arrow data type:
@@ -200,7 +200,7 @@ datasets:
               target_chunk_size: 512
 ```
 
-The `body` column will be divided into chunks of approximately 512 tokens, while maintaining structural and semantic integrity (e.g. not splitting sentences). See the [API reference](/reference/spicepod/datasets.md#columns-embeddings-chunking) for full details.
+The `body` column will be divided into chunks of approximately 512 tokens, while maintaining structural and semantic integrity (e.g. not splitting sentences). See the [API reference](/reference/spicepod/datasets#columns-embeddings-chunking) for full details.
 
 #### Row Identifiers
 
diff --git a/spiceaidocs/docs/features/search/index.md b/spiceaidocs/docs/features/search/index.md
index 9e8ad881..b40dce18 100644
--- a/spiceaidocs/docs/features/search/index.md
+++ b/spiceaidocs/docs/features/search/index.md
@@ -72,7 +72,7 @@ curl -XPOST http://localhost:8090/v1/search \
 
 For more details, see the [API reference for /v1/search](/api/http/search).
 
-Spice also supports vector search on datasets with preexisting embeddings. See [below](/features/embeddings/index.md#passthrough-embeddings) for compatibility details.
+Spice also supports vector search on datasets with preexisting embeddings. See [below](/features/embeddings#passthrough-embeddings) for compatibility details.
 
 ### Document Retrieval
 
@@ -136,7 +136,7 @@ Datasets that already include embeddings can utilize the same functionalities (e
 
    - The embeddings column must have the following [Arrow data type](reference/datatypes.md) when loaded into Spice:
      1. `FixedSizeList[Float32 or Float64, N]`, where `N` is the dimension (size) of the embedding vector. `FixedSizeList` is used for efficient storage and processing of fixed-size vectors.
-     2. If the column is [**chunked**](/features/embeddings/index.md#chunking), use `List[FixedSizeList[Float32 or Float64, N]]`.
+     2. If the column is [**chunked**](/features/embeddings#chunking), use `List[FixedSizeList[Float32 or Float64, N]]`.
 
 4. **Offset Column for Chunked Data:**
    - If the underlying column is chunked, there must be an additional offset column named `_offsets` with the following Arrow data type:
diff --git a/spiceaidocs/docs/reference/spicepod/datasets.md b/spiceaidocs/docs/reference/spicepod/datasets.md
index c14c5af2..18f21dba 100644
--- a/spiceaidocs/docs/reference/spicepod/datasets.md
+++ b/spiceaidocs/docs/reference/spicepod/datasets.md
@@ -422,7 +422,7 @@ Optional. For datasets without a primary key, used to explicitly specify column(
 
 Specifying a `row_id` enables unique identifier lookups for datasets from external systems that may not have a primary key.
 
-## `columns[*].embeddings[*].chunking`
+## `columns[*].embeddings[*].chunking` {#columns-embeddings-chunking}
 
 Optional. The configuration to enable and define the chunking strategy for the embedding column.
 
@@ -501,7 +501,7 @@ Optional. The number of tokens to overlap between chunks. Defaults to `0`.
 
 Optional. If enabled, the content of each chunk will be trimmed to remove leading and trailing whitespace. Defaults to `true`.
 
-## `metdata`
+## `metdata` {#metadata}
 
 Optional. Additional key-value metadata for the dataset. Used as part of the [Semantic Data Model](/features/semantic-model/index.md).