Merge pull request elastic#10985 from jpountz/enhancement/remove_filters
Query DSL: Remove filter parsers.
jpountz committed May 7, 2015
2 parents 6dd8434 + a0af88e commit e7540e9
Showing 329 changed files with 2,722 additions and 9,464 deletions.
@@ -57,7 +57,7 @@ Response:

==== High-precision requests

When requesting detailed buckets (typically for displaying a "zoomed in" map) a filter like <<query-dsl-geo-bounding-box-filter,geo_bounding_box>> should be applied to narrow the subject area otherwise potentially millions of buckets will be created and returned.
When requesting detailed buckets (typically for displaying a "zoomed in" map) a filter like <<query-dsl-geo-bounding-box-query,geo_bounding_box>> should be applied to narrow the subject area otherwise potentially millions of buckets will be created and returned.

[source,js]
--------------------------------------------------
4 changes: 2 additions & 2 deletions docs/reference/api-conventions.asciidoc
@@ -157,7 +157,7 @@ can be specified as a whole number representing time in milliseconds, or as a ti
=== Distance Units

Wherever distances need to be specified, such as the `distance` parameter in
the <<query-dsl-geo-distance-filter>>), the default unit if none is specified is
the <<query-dsl-geo-distance-query>>), the default unit if none is specified is
the meter. Distances can be specified in other units, such as `"1km"` or
`"2mi"` (2 miles).

@@ -174,7 +174,7 @@ Centimeter:: `cm` or `centimeters`
Millimeter:: `mm` or `millimeters`
Nautical mile:: `NM`, `nmi` or `nauticalmiles`

The `precision` parameter in the <<query-dsl-geohash-cell-filter>> accepts
The `precision` parameter in the <<query-dsl-geohash-cell-query>> accepts
distances with the above units, but if no unit is specified, then the
precision is interpreted as the length of the geohash.

13 changes: 4 additions & 9 deletions docs/reference/getting-started.asciidoc
@@ -865,12 +865,9 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '

In the previous section, we skipped over a little detail called the document score (`_score` field in the search results). The score is a numeric value that is a relative measure of how well the document matches the search query that we specified. The higher the score, the more relevant the document is, the lower the score, the less relevant the document is.

All queries in Elasticsearch trigger computation of the relevance scores. In cases where we do not need the relevance scores, Elasticsearch provides another query capability in the form of <<query-dsl-filters,filters>. Filters are similar in concept to queries except that they are optimized for much faster execution speeds for two primary reasons:
But queries do not always need to produce scores, in particular when they are only used for "filtering" the document set. Elasticsearch detects these situations and automatically optimizes query execution in order not to compute useless scores.

* Filters do not score so they are faster to execute than queries
* Filters can be http://www.elastic.co/blog/all-about-elasticsearch-filter-bitsets/[cached in memory] allowing repeated search executions to be significantly faster than queries

To understand filters, let's first introduce the <<query-dsl-filtered-query,`filtered` query>>, which allows you to combine a query (like `match_all`, `match`, `bool`, etc.) together with a filter. As an example, let's introduce the <<query-dsl-range-filter,`range` filter>>, which allows us to filter documents by a range of values. This is generally used for numeric or date filtering.
To understand filters, let's first introduce the <<query-dsl-filtered-query,`filtered` query>>, which allows you to combine a query (like `match_all`, `match`, `bool`, etc.) together with another query which is only used for filtering. As an example, let's introduce the <<query-dsl-range-query,`range` query>>, which allows us to filter documents by a range of values. This is generally used for numeric or date filtering.

This example uses a filtered query to return all accounts with balances between 20000 and 30000, inclusive. In other words, we want to find accounts with a balance that is greater than or equal to 20000 and less than or equal to 30000.

@@ -894,11 +891,9 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
}'
--------------------------------------------------

Dissecting the above, the filtered query contains a `match_all` query (the query part) and a `range` filter (the filter part). We can substitute any other query into the query part as well as any other filter into the filter part. In the above case, the range filter makes perfect sense since documents falling into the range all match "equally", i.e., no document is more relevant than another.

In general, the easiest way to decide whether you want a filter or a query is to ask yourself if you care about the relevance score or not. If relevance is not important, use filters, otherwise, use queries. If you come from a SQL background, queries and filters are similar in concept to the `SELECT WHERE` clause, although more so for filters than queries.
Dissecting the above, the filtered query contains a `match_all` query (the query part) and a `range` query (the filter part). We can substitute any other queries into the query and the filter parts. In the above case, the range query makes perfect sense since documents falling into the range all match "equally", i.e., no document is more relevant than another.
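
As a sketch of that substitution, the query body below swaps `match_all` for a `match` query in the query part and keeps a `range` query in the filter part. The field names `address` and `balance` and the search term `mill` are assumptions made for illustration, not values taken from this change.

[source,js]
--------------------------------------------------
{
  "query": {
    "filtered": {
      "query": { "match": { "address": "mill" } },
      "filter": {
        "range": {
          "balance": { "gte": 20000, "lte": 30000 }
        }
      }
    }
  }
}
--------------------------------------------------

Only the `match` query contributes to the `_score` here; the `range` query merely decides which documents are kept.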

In addition to the `match_all`, `match`, `bool`, `filtered`, and `range` queries, there are a lot of other query/filter types that are available and we won't go into them here. Since we already have a basic understanding of how they work, it shouldn't be too difficult to apply this knowledge in learning and experimenting with the other query/filter types.
In addition to the `match_all`, `match`, `bool`, `filtered`, and `range` queries, there are a lot of other query types that are available and we won't go into them here. Since we already have a basic understanding of how they work, it shouldn't be too difficult to apply this knowledge in learning and experimenting with the other query types.

=== Executing Aggregations

2 changes: 1 addition & 1 deletion docs/reference/mapping/types/geo-point-type.asciidoc
@@ -52,7 +52,7 @@ length (eg `12`, the default) or to a distance (eg `1km`).
More usefully, set the `geohash_prefix` option to `true` to not only index
the geohash value, but all the enclosing cells as well. For instance, a
geohash of `u30` will be indexed as `[u,u3,u30]`. This option can be used
by the <<query-dsl-geohash-cell-filter>> to find geopoints within a
by the <<query-dsl-geohash-cell-query>> to find geopoints within a
particular cell very efficiently.

[float]
4 changes: 1 addition & 3 deletions docs/reference/mapping/types/geo-shape-type.asciidoc
@@ -7,9 +7,7 @@ used when either the data being indexed or the queries being executed
contain shapes other than just points.

You can query documents using this type using
<<query-dsl-geo-shape-filter,geo_shape Filter>>
or <<query-dsl-geo-shape-query,geo_shape
Query>>.
<<query-dsl-geo-shape-query,geo_shape Query>>.

[[geo-shape-mapping-options]]
[float]
3 changes: 1 addition & 2 deletions docs/reference/mapping/types/nested-type.asciidoc
@@ -64,8 +64,7 @@ By keeping each nested object separate, the association between the
smith` would *not* match this document.

Searching on nested docs can be done using either the
<<query-dsl-nested-query,nested query>> or
<<query-dsl-nested-filter,nested filter>>.
<<query-dsl-nested-query,nested query>>.

==== Mapping

2 changes: 1 addition & 1 deletion docs/reference/migration/migrate_1_4.asciidoc
@@ -20,7 +20,7 @@ two ways to make sure that a field mapping exist:
[float]
=== Aliases

<<indices-aliases,Aliases>> can include <<query-dsl-filters,filters>> which
<<indices-aliases,Aliases>> can include <<query-dsl,filters>> which
are automatically applied to any search performed via the alias.
<<filtered,Filtered aliases>> created with version `1.4.0` or later can only
refer to field names which exist in the mappings of the index (or indices)
14 changes: 13 additions & 1 deletion docs/reference/migration/migrate_2_0.asciidoc
@@ -108,7 +108,11 @@ in addition to the actual HTTP status code. We removed `status` field in json re

=== Java API

Some query builders have been removed or renamed:
`org.elasticsearch.index.query.FilterBuilders` has been removed as part of the merge of
queries and filters. These filters are now available in `QueryBuilders` with the same name.
All methods that used to accept a `FilterBuilder` now accept a `QueryBuilder` instead.

In addition some query builders have been removed or renamed:

* `commonTerms(...)` renamed to `commonTermsQuery(...)`
* `queryString(...)` renamed to `queryStringQuery(...)`
@@ -436,6 +440,14 @@ ignored. Instead filters are always used as their own cache key and elasticsearc
makes decisions by itself about whether it should cache filters based on how
often they are used.

==== Query/filter merge

Elasticsearch no longer distinguishes between queries and filters in the
DSL; it detects when scores are not needed and automatically optimizes the
query to skip score computation, optionally caching the result.

As a consequence the `query` filter serves no purpose anymore and is deprecated.
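
To illustrate the deprecation, a filter clause that used to wrap a query in the `query` filter can now hold that query directly. The sketch below assumes a `filtered` query and a hypothetical `title` field; neither the field nor the search term comes from this change.

[source,js]
--------------------------------------------------
{
  "filtered": {
    "query": { "match_all": {} },
    "filter": {
      "match": { "title": "search" }
    }
  }
}
--------------------------------------------------

Before the merge, the filter part would have been written as `"filter": { "query": { "match": { "title": "search" } } }`.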

=== Snapshot and Restore

The obsolete parameters `expand_wildcards_open` and `expand_wildcards_close` are no longer
36 changes: 14 additions & 22 deletions docs/reference/query-dsl.asciidoc
@@ -8,35 +8,27 @@ queries. In general, there are basic queries such as
<<query-dsl-term-query,term>> or
<<query-dsl-prefix-query,prefix>>. There are
also compound queries like the
<<query-dsl-bool-query,bool>> query. Queries can
also have filters associated with them such as the
<<query-dsl-bool-query,bool>> query.

While queries have scoring capabilities, in some contexts they will
only be used to filter the result set, such as in the
<<query-dsl-filtered-query,filtered>> or
<<query-dsl-constant-score-query,constant_score>>
queries, with specific filter queries.
queries.

Think of the Query DSL as an AST of queries. Certain queries can contain
other queries (like the
<<query-dsl-bool-query,bool>> query), others can
contain filters (like the
<<query-dsl-constant-score-query,constant_score>>),
and some can contain both a query and a filter (like the
<<query-dsl-filtered-query,filtered>>). Each of
those can contain *any* query of the list of queries or *any* filter
from the list of filters, resulting in the ability to build quite
Think of the Query DSL as an AST of queries.
Some queries can be used by themselves like the
<<query-dsl-term-query,term>> query but other queries can contain
queries (like the <<query-dsl-bool-query,bool>> query), and each
of these composite queries can contain *any* query of the list of
queries, resulting in the ability to build quite
complex (and interesting) queries.

Both queries and filters can be used in different APIs. For example,
Queries can be used in different APIs. For example,
within a <<search-request-query,search query>>, or
as an <<search-aggregations-bucket-filter-aggregation,aggregation filter>>.
This section explains the components (queries and filters) that can form the
AST one can use.

Filters are very handy since they perform an order of magnitude better
than plain queries since no scoring is performed and they are
automatically cached.
This section explains the queries that can form the AST one can use.

--

include::query-dsl/queries.asciidoc[]

include::query-dsl/filters.asciidoc[]
include::query-dsl/index.asciidoc[]
@@ -1,10 +1,10 @@
[[query-dsl-and-filter]]
=== And Filter
[[query-dsl-and-query]]
== And Query

deprecated[2.0.0, Use the `bool` filter instead]
deprecated[2.0.0, Use the `bool` query instead]

A filter that matches documents using the `AND` boolean operator on other
filters. Can be placed within queries that accept a filter.
A query that matches documents using the `AND` boolean operator on other
queries.

[source,js]
--------------------------------------------------
@@ -1,5 +1,5 @@
[[query-dsl-bool-query]]
=== Bool Query
== Bool Query

A query that matches documents matching boolean combinations of other
queries. The bool query maps to Lucene `BooleanQuery`. It is built using
@@ -22,6 +22,9 @@ parameter.
documents.
|=======================================================================

IMPORTANT: If this query is used in a filter context and it has `should`
clauses then at least one `should` clause is required to match.
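
A minimal sketch of that rule, assuming hypothetical `user` and `tag` fields: the `bool` query below sits in the filter context of a `constant_score` query, so a document has to satisfy the `must` clause and at least one of the two `should` clauses in order to match.

[source,js]
--------------------------------------------------
{
  "constant_score": {
    "filter": {
      "bool": {
        "must": { "term": { "user": "kimchy" } },
        "should": [
          { "term": { "tag": "wow" } },
          { "term": { "tag": "elasticsearch" } }
        ]
      }
    }
  }
}
--------------------------------------------------

In a scoring context the same `should` clauses would be optional because a `must` clause is present, and would only influence the score.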

The bool query also supports `disable_coord` parameter (defaults to
`false`). Basically the coord similarity computes a score factor based
on the fraction of all query terms that a document contains. See Lucene
@@ -1,5 +1,5 @@
[[query-dsl-boosting-query]]
=== Boosting Query
== Boosting Query

The `boosting` query can be used to effectively demote results that
match a given query. Unlike the "NOT" clause in bool query, this still
@@ -1,12 +1,12 @@
[[query-dsl-common-terms-query]]
=== Common Terms Query
== Common Terms Query

The `common` terms query is a modern alternative to stopwords which
improves the precision and recall of search results (by taking stopwords
into account), without sacrificing performance.

[float]
==== The problem
=== The problem

Every term in a query has a cost. A search for `"The brown fox"`
requires three term queries, one for each of `"the"`, `"brown"` and
@@ -25,7 +25,7 @@ and `"not happy"`) and we lose recall (eg text like `"The The"` or
`"To be or not to be"` would simply not exist in the index).

[float]
==== The solution
=== The solution

The `common` terms query divides the query terms into two groups: more
important (ie _low frequency_ terms) and less important (ie _high
@@ -63,7 +63,7 @@ site, common terms like `"clip"` or `"video"` will automatically behave
as stopwords without the need to maintain a manual list.

[float]
==== Examples
=== Examples

In this example, words that have a document frequency greater than 0.1%
(eg `"this"` and `"is"`) will be treated as _common terms_.
18 changes: 18 additions & 0 deletions docs/reference/query-dsl/constant-score-query.asciidoc
@@ -0,0 +1,18 @@
[[query-dsl-constant-score-query]]
== Constant Score Query

A query that wraps another query and simply returns a
constant score equal to the query boost for every document in the
filter. Maps to Lucene `ConstantScoreQuery`.

[source,js]
--------------------------------------------------
{
    "constant_score" : {
        "filter" : {
            "term" : { "user" : "kimchy"}
        },
        "boost" : 1.2
    }
}
--------------------------------------------------
@@ -1,5 +1,5 @@
[[query-dsl-dis-max-query]]
=== Dis Max Query
== Dis Max Query

A query that generates the union of documents produced by its
subqueries, and that scores each document with the maximum score for
@@ -1,5 +1,5 @@
[[query-dsl-exists-filter]]
=== Exists Filter
[[query-dsl-exists-query]]
== Exists Query

Returns documents that have at least one non-`null` value in the original field:

@@ -14,7 +14,7 @@ Returns documents that have at least one non-`null` value in the original field:
}
--------------------------------------------------

For instance, these documents would all match the above filter:
For instance, these documents would all match the above query:

[source,js]
--------------------------------------------------
@@ -28,7 +28,7 @@ For instance, these documents would all match the above filter:
<2> Even though the `standard` analyzer would emit zero tokens, the original field is non-`null`.
<3> At least one non-`null` value is required.

These documents would *not* match the above filter:
These documents would *not* match the above query:

[source,js]
--------------------------------------------------
@@ -1,14 +1,9 @@
[[query-dsl-filtered-query]]
=== Filtered Query
== Filtered Query

The `filtered` query is used to combine another query with any
<<query-dsl-filters,filter>>. Filters are usually faster than queries because:

* they don't have to calculate the relevance `_score` for each document --
the answer is just a boolean ``Yes, the document matches the filter'' or
``No, the document does not match the filter''.
* the results from most filters can be cached in memory, making subsequent
executions faster.
The `filtered` query is used to combine a query which will be used for
scoring with another query which will only be used for filtering the result
set.

TIP: Exclude as many documents as you can with a filter, then query just the
documents that remain.
@@ -50,7 +45,7 @@ curl -XGET localhost:9200/_search -d '
<1> The `filtered` query is passed as the value of the `query`
parameter in the search request.

==== Filtering without a query
=== Filtering without a query

If a `query` is not specified, it defaults to the
<<query-dsl-match-all-query,`match_all` query>>. This means that the
@@ -77,7 +72,7 @@ curl -XGET localhost:9200/_search -d '
==== Multiple filters

Multiple filters can be applied by wrapping them in a
<<query-dsl-bool-filter,`bool` filter>>, for example:
<<query-dsl-bool-query,`bool` query>>, for example:

[source,js]
--------------------------------------------------
@@ -98,9 +93,6 @@ Multiple filters can be applied by wrapping them in a
}
--------------------------------------------------

Similarly, multiple queries can be combined with a
<<query-dsl-bool-query,`bool` query>>.

==== Filter strategy

You can control how the filter and query are executed with the `strategy`