This repository has been archived by the owner on Mar 21, 2024. It is now read-only.


Vector Search - EXPERIMENTAL (#248)
* Init spec

* Fix the vector store fields

* Add information about the invalid_search_vector error code

* Add information about the invalid_vectors_field error codes

* Define the new max_vector_size analytic

* Update the open-api file with vector capabilities

* Apply suggestions from code review

* Update open-api.yaml

Co-authored-by: Maria Craig <[email protected]>

* Update open-api.yaml

Co-authored-by: Maria Craig <[email protected]>

* Update text/0118-search-api.md

Co-authored-by: Maria Craig <[email protected]>

* Update text/0061-error-format-and-definitions.md

Co-authored-by: Maria Craig <[email protected]>

---------

Co-authored-by: Kerollmops <[email protected]>
Co-authored-by: Maria Craig <[email protected]>
3 people authored Jul 31, 2023
1 parent d82e976 commit 1a3f49d
Showing 5 changed files with 193 additions and 69 deletions.
9 changes: 9 additions & 0 deletions open-api.yaml
@@ -711,6 +711,11 @@ components:
description: Query string.
default: '""'
example: '"Back to the future"'
vector:
type: array
description: Query vector.
default: 'null'
example: '[0.8, 0.145, 0.26, 0.3]'
attributesToRetrieve:
type: array
description: 'Array of attributes whose fields will be present in the returned documents. Defaults to the [displayedAttributes list](https://docs.meilisearch.com/reference/features/settings.html#displayed-attributes) which contains by default all attributes found in the documents.'
@@ -1754,6 +1759,8 @@ paths:
> info
> Use the reserved `_geo` object to add geo coordinates to a document. `_geo` is an object made of `lat` and `lng` fields.
>
> Use the reserved `_vectors` field to add embeddings (arrays of floats) to a document. `_vectors` is either a single array of floats or an outer array containing multiple arrays of floats.
tags:
- Documents
security:
@@ -1808,6 +1815,8 @@ paths:
> info
> Use the reserved `_geo` object to add geo coordinates to a document. `_geo` is an object made of `lat` and `lng` fields.
>
> Use the reserved `_vectors` field to add embeddings (arrays of floats) to a document. `_vectors` is either a single array of floats or an outer array containing multiple arrays of floats.
tags:
- Documents
security:
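
The payloads below sketch the two `_vectors` shapes described in the info blocks above, plus a search request carrying the new `vector` parameter. They are plain JSON bodies built in Python; the ids, titles and float values are made up for illustration, and sending them to an index's documents and search routes is not shown.

```python
import json

# Hypothetical documents: ids, titles and vector values are illustrative only.
doc_single_vector = {
    "id": 1,
    "title": "Back to the Future",
    # One embedding: `_vectors` is a flat array of floats.
    "_vectors": [0.8, 0.145, 0.26, 0.3],
}

doc_multiple_vectors = {
    "id": 2,
    "title": "Back to the Future Part II",
    # Several embeddings: `_vectors` is an outer array of float arrays.
    "_vectors": [
        [0.7, 0.2, 0.25, 0.31],
        [0.6, 0.1, 0.4, 0.2],
    ],
}

# Search request body combining a text query with a query vector that has
# the same number of dimensions (4) as the documents' embeddings.
search_body = {
    "q": "Back to the future",
    "vector": [0.8, 0.145, 0.26, 0.3],
}

documents_payload = json.dumps([doc_single_vector, doc_multiple_vectors])
search_payload = json.dumps(search_body)
print(search_payload)
```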
2 changes: 2 additions & 0 deletions text/0034-telemetry-policies.md
@@ -123,6 +123,7 @@ The collected data is sent to [Segment](https://segment.com/). Segment is a plat
| `filter.with_geoBoundingBox` | `true` if the filter rule `_geoBoundingBox` was used in this batch, otherwise `false`| false | `Documents Searched POST`, `Documents Searched GET` |
| `filter.most_used_syntax` | Most used filter syntax among all requests containing the `filter` parameter in this batch | string | `Documents Searched POST`, `Documents Searched GET` |
| `q.max_terms_number` | Highest number of terms given for the `q` parameter in this batch | 5 | `Documents Searched POST`, `Documents Searched GET` |
| `vector.max_vector_size` | Highest number of dimensions given for the `vector` parameter in this batch | 1536 | `Documents Searched POST`, `Documents Searched GET`, `Documents Searched by Multi-Search POST` |
| `pagination.max_limit` | Highest value given for the `limit` parameter in this batch | 60 | `Documents Searched POST`, `Documents Searched GET`, `Documents Fetched GET`, `Documents Fetched POST` |
| `pagination.max_offset` | Highest value given for the `offset` parameter in this batch | 1000 | `Documents Searched POST`, `Documents Searched GET`, `Documents Fetched GET`, `Documents Fetched POST` |
| `pagination.most_used_navigation` | Most used search results navigation among all search requests in this batch. `estimated` / `exhaustive` | `estimated` | `Documents Searched POST`, `Documents Searched GET` |
@@ -273,6 +274,7 @@ This property allows us to gather essential information to better understand on
| filter.avg_criteria_number | The average number of filter criteria among all the requests containing the `filter` parameter in the aggregated event. `"filter": []` counts as `0`, while requests that do not send `filter` do not influence the average in the aggregated event. | `4` |
| filter.most_used_syntax | The most used filter syntax among all the requests containing the `filter` parameter in the aggregated event. `string` / `array` / `mixed` | `mixed` |
| q.max_terms_number | The maximum number of terms for the `q` parameter among all requests in the aggregated event. | `5` |
| vector.max_vector_size | The maximum number of dimensions for the `vector` parameter among all requests in the aggregated event. | `1536` |
| pagination.max_limit | The maximum limit encountered among all requests in the aggregated event. | `20` |
| pagination.max_offset | The maximum offset encountered among all requests in the aggregated event. | `1000` |
| pagination.most_used_navigation | Most used search results navigation among all requests in the aggregated event. `estimated` / `exhaustive` | `estimated` |
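
As an illustration of how `vector.max_vector_size` could be computed for a batch, the hypothetical helper below takes the length of every `vector` parameter and keeps the largest. The policy does not say how requests without `vector` are counted; this sketch assumes they are skipped and that a batch with no vector at all reports `0`.

```python
from typing import Any, Iterable


def max_vector_size(search_requests: Iterable[dict[str, Any]]) -> int:
    """Highest number of dimensions among the `vector` parameters of a batch.

    Assumption: requests without `vector`, or with `vector: null`, are ignored,
    and an empty result defaults to 0.
    """
    sizes = [
        len(request["vector"])
        for request in search_requests
        if request.get("vector") is not None
    ]
    return max(sizes, default=0)


# Hypothetical batch: one text-only search and two vector searches.
batch = [
    {"q": "Back to the future"},
    {"vector": [0.1] * 768},
    {"q": "doc", "vector": [0.2] * 1536},
]
print(max_vector_size(batch))  # 1536
```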
57 changes: 57 additions & 0 deletions text/0061-error-format-and-definitions.md
@@ -1474,6 +1474,31 @@ HTTP Code: `400 Bad Request`

---

## invalid_search_vector

`Synchronous`

### Context

This error occurs if:
- a value other than an `Array of Float` or `null` is specified for `vector`.
- the length of `vector` differs from the length of the documents' `_vectors`.

### Error Definition

HTTP Code: `400 Bad Request`

```json
{
"message": "`:deserr_helper`",
"code": "invalid_search_vector",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_vector"
}
```
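
A minimal sketch of the two checks listed under Context, returning an error object shaped like the definition above. The function names and message texts are hypothetical (the real message comes from the deserialization helper); only `code`, `type` and `link` are taken from the definition. It also assumes integer JSON numbers are accepted as floats and that the expected dimension count is already known from the indexed `_vectors`.

```python
from typing import Any, Optional

ERROR_LINK = "https://docs.meilisearch.com/errors#invalid_search_vector"


def invalid_search_vector(message: str) -> dict[str, str]:
    # Placeholder message; code, type and link follow the error definition.
    return {
        "message": message,
        "code": "invalid_search_vector",
        "type": "invalid_request",
        "link": ERROR_LINK,
    }


def validate_search_vector(
    vector: Any, documents_dimensions: int
) -> Optional[dict[str, str]]:
    """Return None when `vector` is acceptable, otherwise an error object."""
    if vector is None:
        return None  # `null` is an accepted value
    # Booleans are excluded because bool is a subclass of int in Python.
    is_float_array = isinstance(vector, list) and all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in vector
    )
    if not is_float_array:
        return invalid_search_vector("`vector` must be an array of floats or null.")
    if len(vector) != documents_dimensions:
        return invalid_search_vector(
            f"`vector` has {len(vector)} dimensions, expected {documents_dimensions}."
        )
    return None
```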

---

## invalid_search_offset

`Synchronous`
@@ -1937,6 +1962,38 @@ These errors occurs when the `_geo` field of a document payload is not valid. Ei

---

## invalid_document_vectors_field

`Asynchronous`

### Context

This error occurs when the `_vectors` field of a document payload is not valid, either because of its type or because its number of dimensions is incorrect.

### Error Definition

#### Variant: `_vectors` field value type is invalid

```json
{
"message": "The `_vectors` field in the document with the id: `:documentId` is not an array. Was expecting an array of floats or an array of arrays of floats but instead got `:field`",
"code": "invalid_document_vectors_type",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_vectors_type"
}
```

#### Variant: Number of dimensions is not correct

```json
{
"message": "Invalid vector dimensions: expected: `:expected`, found: `:found`.",
...
}
```
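
The sketch below shows one way the two variants could be detected while indexing: `_vectors` is normalized into a list of embeddings, then each embedding's length is compared with the expected dimension count. The helper names, the `id` primary key and the `expected_dimensions` parameter are hypothetical; the messages reuse the templates above with their placeholders filled in, and the error code of the dimension variant is not reproduced because it is elided in the definition.

```python
import json
from typing import Any


def _is_float_array(value: Any) -> bool:
    # Assumption: integer JSON numbers are accepted as floats; bools are not.
    return isinstance(value, list) and all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in value
    )


def extract_vectors(
    document: dict[str, Any], expected_dimensions: int
) -> list[list[float]]:
    """Normalize `_vectors` into a list of embeddings and check their size.

    Raises ValueError carrying the message of the matching variant.
    Assumes the document's primary key is `id` (hypothetical).
    """
    raw = document["_vectors"]
    if _is_float_array(raw):
        vectors = [raw]  # a single embedding
    elif isinstance(raw, list) and all(_is_float_array(v) for v in raw):
        vectors = raw  # several embeddings in an outer array
    else:
        raise ValueError(
            f"The `_vectors` field in the document with the id: `{document['id']}` "
            "is not an array. Was expecting an array of floats or an array of "
            f"arrays of floats but instead got `{json.dumps(raw)}`"
        )
    for vector in vectors:
        if len(vector) != expected_dimensions:
            raise ValueError(
                f"Invalid vector dimensions: expected: `{expected_dimensions}`, "
                f"found: `{len(vector)}`."
            )
    return vectors
```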

---

## payload_too_large

`Synchronous`
70 changes: 51 additions & 19 deletions text/0118-search-api.md
@@ -51,6 +51,7 @@ If a master key is used to secure a Meilisearch instance, the auth layer returns
| [`cropMarker`](#3115-cropmarker) | String | False |
| [`showMatchesPosition`](#3116-showmatchesposition) | Boolean | False |
| [`matchingStrategy`](#3117-matchingStrategy) | String | False |
| [`vector`](#3118-vector-experimental) `EXPERIMENTAL` | Array of Float | False |


#### 3.1.1. `q`
@@ -912,22 +913,34 @@ The documents containing ALL the query words (i.e. in the `q` parameter) are ret

Only the documents containing ALL the query words (i.e. in the `q` parameter) are returned by Meilisearch. If Meilisearch doesn't have enough documents to fit the requested `limit`, it returns the documents found without trying to match more documents.

#### 3.1.18. `vector` `EXPERIMENTAL`

- Type: Array of Float
- Required: False
- Default: []

Requests the documents nearest to the given query vector embedding.

- 🔴 Sending a value other than an `Array of Float` or `null` for `vector` returns an [invalid_search_vector](0061-error-format-and-definitions.md#invalid_search_vector) error.
- 🔴 Sending a `vector` whose length differs from the length of the documents' `_vectors` returns an [invalid_search_vector](0061-error-format-and-definitions.md#invalid_search_vector) error.
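
To illustrate what "nearest documents" means, the sketch below ranks documents with an exhaustive dot-product scan and attaches each score as `_semanticScore`. This is a swapped-in brute-force illustration, not how Meilisearch stores or searches vectors (a search engine would normally use an approximate nearest-neighbor structure). Keeping a document's best-scoring embedding when it has several, skipping documents without `_vectors`, and the default `limit` of 20 are assumptions.

```python
from typing import Any


def dot(a: list[float], b: list[float]) -> float:
    # Assumes both vectors were already validated to have the same length.
    return sum(x * y for x, y in zip(a, b))


def nearest_documents(
    documents: list[dict[str, Any]], query: list[float], limit: int = 20
) -> list[dict[str, Any]]:
    """Rank documents by their best dot product with `query` (exhaustive scan)."""
    hits = []
    for doc in documents:
        raw = doc.get("_vectors")
        if not raw:
            continue  # documents without embeddings are not ranked (assumption)
        # Accept both `_vectors` shapes: one flat array or an array of arrays.
        vectors = raw if isinstance(raw[0], list) else [raw]
        score = max(dot(v, query) for v in vectors)
        hits.append({**doc, "_semanticScore": score})
    hits.sort(key=lambda hit: hit["_semanticScore"], reverse=True)
    return hits[:limit]


docs = [
    {"id": 1, "_vectors": [1.0, 0.0]},
    {"id": 2, "_vectors": [[0.0, 1.0], [0.6, 0.8]]},
    {"id": 3, "_vectors": [-1.0, 0.0]},
]
print([hit["id"] for hit in nearest_documents(docs, [1.0, 0.0])])  # [1, 2, 3]
```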

### 3.2. Search Response Properties

| Field | Type | Required |
|-------------------------------------------------|----------------|-----------|
| [`hits`](#321-hits) | Array[Hit] | True |
| [`limit`](#322-limit) | Integer | False |
| [`offset`](#323-offset) | Integer | False |
| [`estimatedTotalHits`](#324-estimatedTotalHits) | Integer | False |
| [`page`](#325-page) | Integer | False |
| [`hitsPerPage`](#326-hitsperpage) | Integer | False |
| [`totalPages`](#327-totalpages) | Integer | False |
| [`totalHits`](#328-totalhits) | Integer | False |
| [`facetDistribution`](#329-facetdistribution) | Object | False |
| [`facetStats`](#3210-facetstats) | Object | False |
| [`processingTimeMs`](#3211-processingtimems) | Integer | True |
| [`query`](#3212-query) | String | True |
| [`vector`](#3213-vector-experimental) `EXPERIMENTAL` | Array of Float | False |

#### 3.2.1. `hits`

@@ -944,11 +957,12 @@ A search result can contain special properties. See [3.2.1.1. `hit` Special Prop

##### 3.2.1.1. `hit` Special Properties

| Field | Type | Required |
|-------------------------------------------------------------------|---------|----------|
| [`_geoDistance`](#32111-geodistance) | Integer | False |
| [`_formatted`](#32112-formatted) | Object | False |
| [`_matchesPosition`](#32113-matchesposition) | Object | False |
| [`_semanticScore`](#32114-semanticscore) `EXPERIMENTAL` | Float | False |

###### 3.2.1.1.1. `_geoDistance`

@@ -1155,6 +1169,15 @@ The beginning of a matching term within a field is indicated by `start`, and its

> See [3.1.16. `showMatchesPosition`](#3116-showmatchesposition) section.

###### 3.2.1.1.4. `_semanticScore` `EXPERIMENTAL`

- Type: Float
- Required: False

Contains the semantic similarity score between the document and the query `vector`, computed as a dot product. Only present when `vector` is provided in the search request.

> See [3.1.18. `vector`](#3118-vector-experimental) section.

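
Because `_semanticScore` is a raw dot product, its range depends on the magnitudes of the embeddings. The sketch below shows that once both vectors are normalized to unit length, the dot product equals cosine similarity; whether embeddings are expected to be normalized before indexing is not stated in this specification, and the vector values are made up.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    # Scale to unit length (undefined for the zero vector).
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


query = [0.8, 0.145, 0.26, 0.3]
document = [0.7, 0.2, 0.25, 0.31]

raw_score = dot(query, document)  # depends on both vectors' magnitudes
unit_score = dot(normalize(query), normalize(document))
print(math.isclose(unit_score, cosine(query, document)))  # True
```
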
#### 3.2.2. `limit`

- Type: Integer
@@ -1271,6 +1294,15 @@ Query originating the response. Equals to the `q` search parameter.

> See [3.1.1. `q`](#311-q) section.
#### 3.2.13. `vector` `EXPERIMENTAL`

- Type: Array of Float
- Required: False

Vector query embedding originating the response. Equal to the `vector` search parameter, if specified.

> See [3.1.18. `vector`](#3118-vector-experimental) section.

## 2. Technical Details
n/a
