[DOCS] Add sparse-vector field type to docs, changed references (#100348)
elastic · Oct 6, 2023 · f2dfbfe
1 parent 7cffacb
commit f2dfbfe
Showing 5 changed files with 178 additions and 139 deletions.
diff --git a/docs/reference/mapping/types.asciidoc b/docs/reference/mapping/types.asciidoc
@@ -83,6 +83,7 @@ as-you-type completion.
 ==== Document ranking types
 
 <<dense-vector,`dense_vector`>>::   Records dense vectors of float values.
+<<sparse-vector,`sparse_vector`>>:: Records sparse vectors of float values.
 <<rank-feature,`rank_feature`>>::   Records a numeric feature to boost hits at
                                     query time.
 <<rank-features,`rank_features`>>:: Records numeric features to boost hits at
@@ -179,6 +180,8 @@ include::types/search-as-you-type.asciidoc[]
 
 include::types/shape.asciidoc[]
 
+include::types/sparse-vector.asciidoc[]
+
 include::types/text.asciidoc[]
 
 include::types/token-count.asciidoc[]

diff --git a/docs/reference/mapping/types/sparse-vector.asciidoc b/docs/reference/mapping/types/sparse-vector.asciidoc
@@ -0,0 +1,36 @@
+[[sparse-vector]]
+=== Sparse vector field type
+++++
+<titleabbrev>Sparse vector</titleabbrev>
+++++
+
+A `sparse_vector` field indexes features and their weights so that documents can later be
+searched with a <<query-dsl-text-expansion-query,`text_expansion`>> query.
+
+`sparse_vector` is the field type that should be used with <<elser-mappings, ELSER mappings>>.
+
+[source,console]
+--------------------------------------------------
+PUT my-index
+{
+  "mappings": {
+    "properties": {
+      "text.tokens": {
+        "type": "sparse_vector"
+      }
+    }
+  }
+}
+--------------------------------------------------
+
+See <<semantic-search-elser, semantic search with ELSER>> for a complete example of adding documents
+to a `sparse_vector` mapped field using ELSER.
+
+NOTE: `sparse_vector` fields only support single-valued fields and strictly positive
+values. Multi-valued fields and negative values will be rejected.
+
+NOTE: `sparse_vector` fields do not support sorting or aggregating, and the only query that
+can target them is the <<query-dsl-text-expansion-query,`text_expansion`>> query.
+
+NOTE: `sparse_vector` fields preserve only 9 significant bits of precision, which
+translates to a relative error of about 0.4%.
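
Review note on the precision claim in `sparse-vector.asciidoc`: the figure of about 0.4% follows from the 9 significant bits, since rounding a value down to 9 significant bits loses less than 2^-8 = 0.00390625 of it. The sketch below illustrates that arithmetic only. It assumes the value is a positive, normal IEEE 754 single-precision number truncated to 8 explicit mantissa bits plus the implicit leading 1; the diff does not say how {es} encodes the weights, and the helper names are hypothetical.

```python
import struct


def keep_9_significant_bits(value: float) -> float:
    """Round a positive float toward zero, keeping 9 significant bits.

    Assumption: this models the precision loss as float32 truncation.
    It is an illustration, not the actual Elasticsearch encoding.
    """
    if value <= 0:
        raise ValueError("sparse_vector weights must be strictly positive")
    # Reinterpret the value as IEEE 754 single-precision bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    # Float32 has 23 explicit mantissa bits. Clearing the low 15 leaves
    # 8 explicit bits plus the implicit leading 1: 9 significant bits.
    bits &= 0xFFFF8000
    (approx,) = struct.unpack(">f", struct.pack(">I", bits))
    return approx


def relative_error(value: float) -> float:
    """Relative error introduced by the truncation above."""
    return abs(value - keep_9_significant_bits(value)) / value


# Upper bound on the relative error for normal float32 values.
WORST_CASE = 2 ** -8  # 0.00390625, about 0.4%

for weight in (0.5, 1.0039, 3.14159, 1234.5):
    print(weight, keep_9_significant_bits(weight), relative_error(weight))
```

The error is largest for values just below the next representable step (for example 1.0039, which truncates to 1.0) and is zero for values that already fit in 9 significant bits.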
diff --git a/docs/reference/query-dsl/text-expansion-query.asciidoc b/docs/reference/query-dsl/text-expansion-query.asciidoc
@@ -4,9 +4,9 @@
 <titleabbrev>Text expansion</titleabbrev>
 ++++
 
-The text expansion query uses a {nlp} model to convert the query text into a 
-list of token-weight pairs which are then used in a query against a 
-<<rank-features,rank features field>>.
+The text expansion query uses a {nlp} model to convert the query text into a
+list of token-weight pairs which are then used in a query against a
+<<sparse-vector,sparse vector>> or <<rank-features,rank features>> field.
 
 [discrete]
 [[text-expansion-query-ex-request]]
@@ -19,7 +19,7 @@ GET _search
 {
    "query":{
       "text_expansion":{
-         "<rank_features_field>":{
+         "<sparse_vector_field>":{
             "model_id":"the model to produce the token weights",
             "model_text":"the query string"
          }
@@ -33,33 +33,33 @@ GET _search
 [[text-expansion-query-params]]
 === Top level parameters for `text_expansion`
 
-`<rank_features_field>`:::
+`<sparse_vector_field>`:::
 (Required, object)
-The name of the field that contains the token-weight pairs the NLP model created 
+The name of the field that contains the token-weight pairs the NLP model created
 based on the input text.
 
 [discrete]
 [[text-expansion-rank-feature-field-params]]
-=== Top level parameters for `<rank_features_field>`
+=== Top level parameters for `<sparse_vector_field>`
 
 `model_id`::::
 (Required, string)
-The ID of the model to use to convert the query text into token-weight pairs. It 
-must be the same model ID that was used to create the tokens from the input 
+The ID of the model to use to convert the query text into token-weight pairs. It
+must be the same model ID that was used to create the tokens from the input
 text.
 
 `model_text`::::
 (Required, string)
-The query text you want to use for search. 
+The query text you want to use for search.
 
 
 [discrete]
 [[text-expansion-query-example]]
 === Example
 
-The following is an example of the `text_expansion` query that references the 
-ELSER model to perform semantic search. For a more detailed description of how 
-to perform semantic search by using ELSER and the `text_expansion` query, refer 
+The following is an example of the `text_expansion` query that references the
+ELSER model to perform semantic search. For a more detailed description of how
+to perform semantic search by using ELSER and the `text_expansion` query, refer
 to <<semantic-search-elser,this tutorial>>.
 
 [source,console]
@@ -82,25 +82,25 @@ GET my-index/_search
 [[optimizing-text-expansion]]
 === Optimizing the search performance of the text_expansion query
 
-https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand[Max WAND] 
-is an optimization technique used by {es} to skip documents that cannot score 
-competitively against the current best matching documents. However, the tokens 
-generated by the ELSER model don't work well with the Max WAND optimization. 
-Consequently, enabling Max WAND can actually increase query latency for 
-`text_expansion`. For datasets of a significant size, disabling Max 
+https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand[Max WAND]
+is an optimization technique used by {es} to skip documents that cannot score
+competitively against the current best matching documents. However, the tokens
+generated by the ELSER model don't work well with the Max WAND optimization.
+Consequently, enabling Max WAND can actually increase query latency for
+`text_expansion`. For datasets of a significant size, disabling Max
 WAND leads to lower query latencies.
 
 Max WAND is controlled by the
-<<track-total-hits, track_total_hits>> query parameter. Setting track_total_hits 
-to true forces {es} to consider all documents, resulting in lower query 
-latencies for the `text_expansion` query. However, other {es} queries run slower 
+<<track-total-hits, track_total_hits>> query parameter. Setting track_total_hits
+to true forces {es} to consider all documents, resulting in lower query
+latencies for the `text_expansion` query. However, other {es} queries run slower
 when Max WAND is disabled.
 
-If you are combining the `text_expansion` query with standard text queries in a 
-compound search, it is recommended to measure the query performance before 
+If you are combining the `text_expansion` query with standard text queries in a
+compound search, it is recommended to measure the query performance before
 deciding which setting to use.
 
-NOTE: The `track_total_hits` option applies to all queries in the search request 
-and may be optimal for some queries but not for others. Take into account the 
-characteristics of all your queries to determine the most suitable 
+NOTE: The `track_total_hits` option applies to all queries in the search request
+and may be optimal for some queries but not for others. Take into account the
+characteristics of all your queries to determine the most suitable
 configuration.
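
Review note on the `track_total_hits` guidance: a request body that follows it could be assembled as in the sketch below. The field name `text.tokens` is taken from the mapping example in `sparse-vector.asciidoc`; the function name, the model ID `my-elser-model`, and the query text are hypothetical placeholders. The sketch only builds and prints the JSON body and does not contact a cluster.

```python
import json


def build_text_expansion_search(field, model_id, model_text, track_total_hits=True):
    """Build a search body with a text_expansion query.

    track_total_hits sits at the top level of the body, so it applies to
    every query in the request, not only to text_expansion.
    """
    return {
        "track_total_hits": track_total_hits,
        "query": {
            "text_expansion": {
                field: {
                    "model_id": model_id,      # hypothetical model ID
                    "model_text": model_text,  # the query string
                }
            }
        },
    }


body = build_text_expansion_search(
    field="text.tokens",
    model_id="my-elser-model",
    model_text="How is the weather in Jamaica?",
)
print(json.dumps(body, indent=2))
```

Because the option is request-wide, building one body with `track_total_hits=True` and one with `track_total_hits=False` and timing both against a representative compound search is a simple way to take the measurement the text recommends.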