linking
Jeadie committed Dec 21, 2024
1 parent bfb525d commit 2948c36
Showing 3 changed files with 7 additions and 7 deletions.
6 changes: 3 additions & 3 deletions spiceaidocs/docs/features/embeddings/index.md
@@ -38,7 +38,7 @@ See [Embedding components](/components/embeddings/) for more information on embe
## Embedding Methods
-### Pass-through Embeddings
+### Passthrough Embeddings
Datasets that already include embeddings can utilize the same functionalities (e.g., vector search) as those augmented with embeddings using Spice. To ensure compatibility, these table columns must adhere to the following constraints:
@@ -126,7 +126,7 @@ datasets:
3. **Embeddings Column Data Type:**
- The embeddings column must have the following [Arrow data type](reference/datatypes.md) when loaded into Spice:
1. `FixedSizeList[Float32 or Float64, N]`, where `N` is the dimension (size) of the embedding vector. `FixedSizeList` is used for efficient storage and processing of fixed-size vectors.
-  2. If the column is [**chunked**](#chunking-support), use `List[FixedSizeList[Float32 or Float64, N]]`.
+  2. If the column is [**chunked**](#chunking), use `List[FixedSizeList[Float32 or Float64, N]]`.

4. **Offset Column for Chunked Data:**
- If the underlying column is chunked, there must be an additional offset column named `<column_name>_offsets` with the following Arrow data type:
@@ -200,7 +200,7 @@ datasets:
target_chunk_size: 512
```

-The `body` column will be divided into chunks of approximately 512 tokens, while maintaining structural and semantic integrity (e.g. not splitting sentences). See the [API reference](/reference/spicepod/datasets.md#columns-embeddings-chunking) for full details.
+The `body` column will be divided into chunks of approximately 512 tokens, while maintaining structural and semantic integrity (e.g. not splitting sentences). See the [API reference](/reference/spicepod/datasets#columns-embeddings-chunking) for full details.
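
Editor's illustration (not part of the commit, and not Spice's actual chunker): the behaviour described above, chunks of roughly `target_chunk_size` tokens that never split a sentence, can be sketched as a greedy packer. The function name `chunk_text` is hypothetical, sentence detection is a naive regex, and whitespace-separated words stand in for tokens, which is an assumption made only to keep the sketch self-contained.

```python
import re


def chunk_text(text: str, target_chunk_size: int = 512) -> list[str]:
    """Greedily pack whole sentences into chunks of about target_chunk_size tokens.

    Hypothetical helper for illustration only. "Tokens" are approximated by
    whitespace-separated words, and sentences are split on ., ! or ? followed
    by whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    current_len = 0

    for sentence in sentences:
        n_tokens = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the target.
        if current and current_len + n_tokens > target_chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_tokens

    # Flush the final, possibly short, chunk.
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine."
    for chunk in chunk_text(sample, target_chunk_size=6):
        print(chunk)
```

In this sketch a single sentence longer than the target becomes its own oversized chunk rather than being split, which is one way to read "approximately 512 tokens".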

#### Row Identifiers

4 changes: 2 additions & 2 deletions spiceaidocs/docs/features/search/index.md
@@ -72,7 +72,7 @@ curl -XPOST http://localhost:8090/v1/search \

For more details, see the [API reference for /v1/search](/api/http/search).

-Spice also supports vector search on datasets with preexisting embeddings. See [below](/features/embeddings/index.md#passthrough-embeddings) for compatibility details.
+Spice also supports vector search on datasets with preexisting embeddings. See [below](/features/embeddings#passthrough-embeddings) for compatibility details.

### Document Retrieval

@@ -136,7 +136,7 @@ Datasets that already include embeddings can utilize the same functionalities (e

- The embeddings column must have the following [Arrow data type](reference/datatypes.md) when loaded into Spice:
1. `FixedSizeList[Float32 or Float64, N]`, where `N` is the dimension (size) of the embedding vector. `FixedSizeList` is used for efficient storage and processing of fixed-size vectors.
-  2. If the column is [**chunked**](/features/embeddings/index.md#chunking), use `List[FixedSizeList[Float32 or Float64, N]]`.
+  2. If the column is [**chunked**](/features/embeddings#chunking), use `List[FixedSizeList[Float32 or Float64, N]]`.

4. **Offset Column for Chunked Data:**
- If the underlying column is chunked, there must be an additional offset column named `<column_name>_offsets` with the following Arrow data type:
4 changes: 2 additions & 2 deletions spiceaidocs/docs/reference/spicepod/datasets.md
@@ -422,7 +422,7 @@ Optional. For datasets without a primary key, used to explicitly specify column(

Specifying a `row_id` enables unique identifier lookups for datasets from external systems that may not have a primary key.

-## `columns[*].embeddings[*].chunking`
+## `columns[*].embeddings[*].chunking` {#columns-embeddings-chunking}

Optional. The configuration to enable and define the chunking strategy for the embedding column.

@@ -501,7 +501,7 @@ Optional. The number of tokens to overlap between chunks. Defaults to `0`.

Optional. If enabled, the content of each chunk will be trimmed to remove leading and trailing whitespace. Defaults to `true`.

-## `metdata`
+## `metdata` {#metadata}

Optional. Additional key-value metadata for the dataset. Used as part of the [Semantic Data Model](/features/semantic-model/index.md).
