[DOCS] Rewrite terms query (#42889)
jrodewig committed Jun 6, 2019
1 parent 27b7f10 commit cd55995
Showing 2 changed files with 212 additions and 81 deletions.
1 change: 1 addition & 0 deletions docs/reference/index-modules.asciidoc
@@ -183,6 +183,7 @@ specific index module:
This setting is only applicable when highlighting is requested on a text that was indexed without offsets or term vectors.
By default this setting is unset in 6.x, defaults to `-1`.

[[index-max-terms-count]]
`index.max_terms_count`::

The maximum number of terms that can be used in Terms Query.
292 changes: 211 additions & 81 deletions docs/reference/query-dsl/terms-query.asciidoc
@@ -1,125 +1,255 @@
[[query-dsl-terms-query]]
=== Terms Query

Filters documents that have fields that match any of the provided terms
(*not analyzed*). For example:
Returns documents that contain one or more *exact* terms in a provided field.

The `terms` query is the same as the <<query-dsl-term-query, `term` query>>,
except you can search for multiple values.

[[terms-query-ex-request]]
==== Example request

The following search returns documents where the `user` field contains `kimchy`
or `elasticsearch`.

[source,js]
--------------------------------------------------
----
GET /_search
{
"query": {
"terms" : { "user" : ["kimchy", "elasticsearch"]}
"query" : {
"terms" : {
"user" : ["kimchy", "elasticsearch"],
"boost" : 1.0
}
}
}
--------------------------------------------------
----
// CONSOLE

NOTE: Highlighting `terms` queries is best-effort only, so terms of a `terms`
query might not be highlighted depending on the highlighter implementation that
is selected and on the number of terms in the `terms` query.
[[terms-top-level-params]]
==== Top-level parameters for `terms`
`<field>`::
+
--
Field you wish to search.

The value of this parameter is an array of terms you wish to find in the
provided field. To return a document, one or more terms must exactly match a
field value, including whitespace and capitalization.

By default, {es} limits the `terms` query to a maximum of 65,536
terms. You can change this limit using the <<index-max-terms-count,
`index.max_terms_count`>> setting.

[NOTE]
To use the field values of an existing document as search terms, use the
<<query-dsl-terms-lookup, terms lookup>> parameters.
--

`boost`::
+
--
Floating point number used to decrease or increase the
<<query-filter-context, relevance scores>> of a query. Default is `1.0`.
Optional.

You can use the `boost` parameter to adjust relevance scores for searches
containing two or more queries.

Boost values are relative to the default value of `1.0`. A boost value between
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
increases the relevance score.
--
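
As a rough sketch of how `boost` shifts ranking when a search combines two queries, the following request wraps two `terms` queries in a `bool` query. The `tags` field and its values are hypothetical and are used only for illustration.

[source,js]
----
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "terms": { "user": ["kimchy", "elasticsearch"], "boost": 2.0 } }, <1>
        { "terms": { "tags": ["search", "lucene"], "boost": 0.5 } } <2>
      ]
    }
  }
}
----
// NOTCONSOLE

<1> A boost greater than `1.0` increases the contribution of this clause to the relevance score.
<2> Hypothetical field. A boost between `0` and `1.0` decreases the contribution of this clause.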

[[terms-query-notes]]
==== Notes

[[query-dsl-terms-query-highlighting]]
===== Highlighting `terms` queries
<<search-request-highlighting,Highlighting>> is best-effort only. {es} may not
return highlight results for `terms` queries depending on:

* Highlighter type
* Number of terms in the query

[float]
[[query-dsl-terms-lookup]]
===== Terms lookup mechanism
===== Terms lookup
Terms lookup fetches the field values of an existing document. {es} then uses
those values as search terms. This can be helpful when searching for a large set
of terms.

When it's needed to specify a `terms` filter with a lot of terms it can
be beneficial to fetch those term values from a document in an index. A
concrete example would be to filter tweets tweeted by your followers.
Potentially the amount of user ids specified in the terms filter can be
a lot. In this scenario it makes sense to use the terms filter's terms
lookup mechanism.
Because terms lookup fetches values from a document, the <<mapping-source-field,
`_source`>> mapping field must be enabled to use terms lookup. The `_source`
field is enabled by default.

The terms lookup mechanism supports the following options:
[NOTE]
By default, {es} limits the `terms` query to a maximum of 65,536
terms. This includes terms fetched using terms lookup. You can change
this limit using the <<index-max-terms-count, `index.max_terms_count`>> setting.
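
A minimal sketch of raising the limit for one index is shown below. It assumes the setting can be changed on an existing index through the update index settings API; the index name `my_index` and the value `100000` are illustrative only.

[source,js]
----
PUT /my_index/_settings
{
  "index": {
    "max_terms_count": 100000 <1>
  }
}
----
// NOTCONSOLE

<1> Illustrative value. Each additional term requires extra processing and memory, so raise the limit with care.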

[horizontal]
To perform a terms lookup, use the following parameters.

[[query-dsl-terms-lookup-params]]
====== Terms lookup parameters
`index`::
The index to fetch the term values from.
Name of the index from which to fetch field values.

`type`::
The type to fetch the term values from.

`id`::
The id of the document to fetch the term values from.
<<mapping-id-field,ID>> of the document from which to fetch field values.

`path`::
The field specified as path to fetch the actual values for the
`terms` filter.
+
--
Name of the field from which to fetch field values. {es} uses
these values as search terms for the query.

If the field values include an array of nested inner objects, you can access
those objects using dot notation syntax.
--

`routing`::
A custom routing value to be used when retrieving the
external terms doc.

The values for the `terms` filter will be fetched from a field in a
document with the specified id in the specified type and index.
Internally a get request is executed to fetch the values from the
specified path. At the moment for this feature to work the `_source`
needs to be stored.

Also, consider using an index with a single shard and fully replicated
across all nodes if the "reference" terms data is not large. The lookup
terms filter will prefer to execute the get request on a local node if
possible, reducing the need for networking.

[WARNING]
Executing a Terms Query request with a lot of terms can be quite slow,
as each additional term demands extra processing and memory.
To safeguard against this, the maximum number of terms that can be used
in a Terms Query both directly or through lookup has been limited to `65536`.
This default maximum can be changed for a particular index with the index setting
`index.max_terms_count`.

[float]
===== Terms lookup twitter example
At first we index the information for user with id 2, specifically, its
followers, then index a tweet from user with id 1. Finally we search on
all the tweets that match the followers of user 2.
Custom <<mapping-routing-field, routing value>> of the document from which to
fetch term values. If a custom routing value was provided when the document was
indexed, this parameter is required.
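
The `path` and `routing` parameters can be sketched together as follows. The `users` and `tweets` indices and the `followers` field come from the earlier version of this page; the routing value `user2` is hypothetical.

[source,js]
----
PUT /users/_doc/2?routing=user2 <1>
{
  "followers": [
    { "id": "1" },
    { "id": "2" }
  ]
}

GET /tweets/_search
{
  "query": {
    "terms": {
      "user": {
        "index": "users",
        "id": "2",
        "path": "followers.id", <2>
        "routing": "user2" <3>
      }
    }
  }
}
----
// NOTCONSOLE

<1> Hypothetical custom routing value supplied when the lookup document is indexed.
<2> Dot notation reaches the `id` values inside the array of `followers` objects.
<3> The same routing value must be repeated here so the lookup document can be found.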

[[query-dsl-terms-lookup-example]]
====== Terms lookup example

To see how terms lookup works, try the following example.

. Create an index with a `keyword` field named `color`.
+
--

[source,js]
--------------------------------------------------
PUT /users/_doc/2
----
PUT my_index
{
"followers" : ["1", "3"]
"mappings" : {
"properties" : {
"color" : { "type" : "keyword" }
}
}
}
----
// CONSOLE
--

PUT /tweets/_doc/1
. Index a document with an ID of 1 and values of `["blue", "green"]` in the
`color` field.
+
--

[source,js]
----
PUT my_index/_doc/1
{
"user" : "1"
"color": ["blue", "green"]
}
----
// CONSOLE
// TEST[continued]
--

GET /tweets/_search
. Index another document with an ID of 2 and value of `blue` in the `color`
field.
+
--

[source,js]
----
PUT my_index/_doc/2
{
"query" : {
"terms" : {
"user" : {
"index" : "users",
"type" : "_doc",
"id" : "2",
"path" : "followers"
}
}
}
"color": "blue"
}
--------------------------------------------------
----
// CONSOLE
// TEST[continued]
--

. Use the `terms` query with terms lookup parameters to find documents
containing one or more of the same terms as document 2. Include the `pretty`
parameter so the response is more readable.
+
--

////
[source,js]
----
POST my_index/_refresh
----
// CONSOLE
// TEST[continued]
The structure of the external terms document can also include an array of
inner objects, for example:
////

[source,js]
--------------------------------------------------
PUT /users/_doc/2
----
GET my_index/_search?pretty
{
"followers" : [
{
"id" : "1"
},
{
"id" : "2"
}
]
"query": {
"terms": {
"color" : {
"index" : "my_index",
"id" : "2",
"path" : "color"
}
}
}
}
--------------------------------------------------
----
// CONSOLE
// TEST[continued]

Because document 2 and document 1 both contain `blue` as a value in the `color`
field, {es} returns both documents.

In which case, the lookup path will be `followers.id`.
[source,js]
----
{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "color" : [
            "blue",
            "green"
          ]
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "color" : "blue"
        }
      }
    ]
  }
}
----
// TESTRESPONSE[s/"took" : 17/"took" : $body.took/]
--
