Allow format sort values of date fields (#70357)
If a search_after request targets multiple indices and a sort field has
type `date` in one index but `date_nanos` in another, Elasticsearch
can't interpret the search_after parameter correctly in every target
index. By default, the sort value of a `date` field is a long of
milliseconds since the epoch, while the sort value of a `date_nanos`
field is a long of nanoseconds since the epoch.
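
To make the ambiguity concrete, here is a minimal sketch that uses only `java.time`. It is not Elasticsearch code, and the class and method names are hypothetical. It computes both numeric representations of one instant and shows what a nanosecond-resolution reader would make of a millisecond value.

```java
import java.time.Instant;

public class SortValueResolution {
    // Default sort value of a `date` field: milliseconds since the epoch.
    static long epochMillis(Instant instant) {
        return instant.toEpochMilli();
    }

    // Default sort value of a `date_nanos` field: nanoseconds since the epoch.
    static long epochNanos(Instant instant) {
        long seconds = Math.multiplyExact(instant.getEpochSecond(), 1_000_000_000L);
        return Math.addExact(seconds, instant.getNano());
    }

    // What a nanosecond-resolution reader sees if it is handed a millisecond value.
    static Instant misreadMillisAsNanos(long millis) {
        return Instant.ofEpochSecond(0L, millis);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2021-05-20T05:30:04.832Z");
        System.out.println("date sort value:       " + epochMillis(instant));
        System.out.println("date_nanos sort value: " + epochNanos(instant));
        System.out.println("millis read as nanos:  " + misreadMillisAsNanos(epochMillis(instant)));
    }
}
```

The two longs differ by a factor of one million, so a bare number carries no hint of which resolution it uses. The misread value lands within the first hour of 1970.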

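This commit introduces a `format` parameter for sort fields, so the
sort values of `date` and `date_nanos` fields are returned in the search
response using the given date format.

The examples below show how to use the new parameter: the first request
sorts with a `format`, and the second passes a formatted sort value back
in `search_after`.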

```js
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "timestamp": { 
                "order": "asc",
                "format": "strict_date_optional_time_nanos"
            }
        }
    ]
}
```

```js
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "timestamp": { 
                "order": "asc",
                "format": "strict_date_optional_time_nanos"
            }
        }
    ],
    "search_after": [
        "2015-01-01T12:10:30.123456789Z" // in `strict_date_optional_time_nanos` format
    ]
}
```
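
The formatted `search_after` value does not depend on the resolution of the underlying field. As a rough illustration (again plain `java.time` with hypothetical names, not Elasticsearch's date parser), the same string can be turned into the native sort value of either field type:

```java
import java.time.Instant;

public class SearchAfterValue {
    // Native sort value for a `date` field: epoch milliseconds.
    // Digits below one millisecond are dropped.
    static long asDateSortValue(String formatted) {
        return Instant.parse(formatted).toEpochMilli();
    }

    // Native sort value for a `date_nanos` field: epoch nanoseconds.
    static long asDateNanosSortValue(String formatted) {
        Instant instant = Instant.parse(formatted);
        long seconds = Math.multiplyExact(instant.getEpochSecond(), 1_000_000_000L);
        return Math.addExact(seconds, instant.getNano());
    }

    public static void main(String[] args) {
        // The search_after value from the request above.
        String searchAfter = "2015-01-01T12:10:30.123456789Z";
        System.out.println("as date:       " + asDateSortValue(searchAfter));
        System.out.println("as date_nanos: " + asDateNanosSortValue(searchAfter));
    }
}
```

Because each target can derive its own numeric value from the string, one `search_after` array can be used across indices with mixed field types.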

Closes #69192
dnhatn authored Mar 17, 2021
1 parent fec4ee5 commit 8b5aa84
Showing 9 changed files with 396 additions and 26 deletions.
@@ -81,6 +81,12 @@ NOTE: Search after requests have optimizations that make them faster when the so
order is `_shard_doc` and total hits are not tracked. If you want to iterate over all documents regardless of the
order, this is the most efficient option.

IMPORTANT: If the `sort` field is a <<date,`date`>> in some target data streams or indices
but a <<date_nanos,`date_nanos`>> field in other targets, use the `numeric_type` parameter
to convert the values to a single resolution and the `format` parameter to specify a
<<mapping-date-format, date format>> for the `sort` field. Otherwise, {es} won't interpret
the search after parameter correctly in each request.

[source,console]
----
GET /_search
@@ -96,7 +102,7 @@ GET /_search
"keep_alive": "1m"
},
"sort": [ <2>
{"@timestamp": "asc"}
{"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type" : "date_nanos" }}
]
}
----
@@ -107,7 +113,7 @@ GET /_search

The search response includes an array of `sort` values for each hit. If you used
a PIT, a tiebreaker is included as the last `sort` values for each hit.
This tiebreaker called `_shard_doc` is added automically on every search requests that use a PIT.
This tiebreaker called `_shard_doc` is added automatically on every search requests that use a PIT.
The `_shard_doc` value is the combination of the shard index within the PIT and the Lucene's internal doc ID,
it is unique per document and constant within a PIT.
You can also add the tiebreaker explicitly in the search request to customize the order:
@@ -127,7 +133,7 @@ GET /_search
"keep_alive": "1m"
},
"sort": [ <2>
{"@timestamp": "asc"},
{"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}},
{"_shard_doc": "desc"}
]
}
@@ -156,7 +162,7 @@ GET /_search
"_score" : null,
"_source" : ...,
"sort" : [ <2>
4098435132000,
"2021-05-20T05:30:04.832Z",
4294967298 <3>
]
}
@@ -190,10 +196,10 @@ GET /_search
"keep_alive": "1m"
},
"sort": [
{"@timestamp": "asc"}
{"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}}
],
"search_after": [ <2>
4098435132000,
"2021-05-20T05:30:04.832Z",
4294967298
],
"track_total_hits": false <3>
@@ -31,7 +31,7 @@ PUT /my-index-000001
GET /my-index-000001/_search
{
"sort" : [
{ "post_date" : {"order" : "asc"}},
{ "post_date" : {"order" : "asc", "format": "strict_date_optional_time_nanos"}},
"user",
{ "name" : "desc" },
{ "age" : "desc" },
@@ -51,8 +51,25 @@ should sort by `_doc`. This especially helps when <<scroll-search-results,scroll
[discrete]
=== Sort Values

The sort values for each document returned are also returned as part of
the response.
The search response includes `sort` values for each document. Use the `format`
parameter to specify a <<built-in-date-formats,date format>> for the `sort`
values of <<date,`date`>> and <<date_nanos,`date_nanos`>> fields. The following
search returns `sort` values for the `post_date` field in the
`strict_date_optional_time_nanos` format.

[source,console]
--------------------------------------------------
GET /my-index-000001/_search
{
"sort" : [
{ "post_date" : {"format": "strict_date_optional_time_nanos"}}
],
"query" : {
"term" : { "user" : "kimchy" }
}
}
--------------------------------------------------
// TEST[continued]

[discrete]
=== Sort Order
@@ -242,3 +242,134 @@
          size: 1
          sort: ["_shard_doc"]
          search_after: [ 0L ]

---
"Format sort values":
  - skip:
      version: " - 7.99.99"
      reason: Format sort output is introduced in 8.0

  - do:
      indices.create:
        index: test
        body:
          mappings:
            properties:
              timestamp:
                type: date
                format: yyyy-MM-dd HH:mm:ss.SSS
  - do:
      indices.create:
        index: test_nanos
        body:
          mappings:
            properties:
              timestamp:
                type: date_nanos
                format: dd/MM/yyyy HH:mm:ss.SSS
  - do:
      bulk:
        refresh: true
        index: test
        body: |
          {"index":{}}
          {"timestamp":"2021-10-13 00:30:04.828"}
          {"index":{}}
          {"timestamp":"2021-06-11 04:30:04.828"}
          {"index":{}}
          {"timestamp":"2021-02-11 08:30:04.828"}
  - do:
      bulk:
        refresh: true
        index: test_nanos
        body: |
          {"index":{}}
          {"timestamp":"21/08/2021 03:30:04.732"}
          {"index":{}}
          {"timestamp":"20/05/2021 05:30:04.832"}
          {"index":{}}
          {"timestamp":"15/04/2021 06:30:04.821"}
  - do:
      search:
        index: test
        body:
          size: 1
          sort: [{timestamp: {"order" : "asc", "format": "strict_date_optional_time_nanos"}}]
  - match: {hits.total.value: 3 }
  - length: {hits.hits: 1 }
  - match: {hits.hits.0._source.timestamp: "2021-02-11 08:30:04.828" }
  - match: {hits.hits.0.sort: ["2021-02-11T08:30:04.828Z"] }

  - do:
      search:
        index: test
        body:
          size: 1
          sort: [{timestamp: {"order" : "asc", "format": "strict_date_optional_time_nanos"}}]
          search_after: ["2021-02-11T08:30:04.828Z"]
  - match: {hits.total.value: 3 }
  - length: {hits.hits: 1 }
  - match: {hits.hits.0._source.timestamp: "2021-06-11 04:30:04.828" }
  - match: {hits.hits.0.sort: ["2021-06-11T04:30:04.828Z"] }

  # mismatch format
  - do:
      catch: /failed to parse date field/
      search:
        index: test
        body:
          size: 1
          sort: [{ timestamp: {"order" : "asc", "format": "yyyy-MM-dd HH:mm:ss.SSS"}}]
          search_after: [ "2021-02-11T08:30:04.828Z" ]
  - do:
      catch: /failed to parse date field/
      search:
        index: test
        body:
          size: 1
          sort: [ { timestamp: { "order": "asc", "format": "epoch_millis" } } ]
          search_after: [ "2021-02-11T08:30:04.828Z" ]
  - do:
      search:
        index: test
        body:
          size: 1
          sort: [{timestamp: {"order" : "asc", "format": "yyyy-MM-dd | HH:mm:ss.SSS"}}]
          search_after: ["2021-02-11 | 08:30:04.828"]
  - match: {hits.total.value: 3 }
  - length: {hits.hits: 1 }
  - match: {hits.hits.0._source.timestamp: "2021-06-11 04:30:04.828" }
  - match: {hits.hits.0.sort: ["2021-06-11 | 04:30:04.828"] }

  # Mixed two types with numeric
  - do:
      search:
        index: tes*
        body:
          size: 2
          sort: [ { timestamp: { "order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type": "date_nanos" } } ]
  - match: { hits.total.value: 6 }
  - length: { hits.hits: 2 }
  - match: { hits.hits.0._index: test }
  - match: { hits.hits.0._source.timestamp: "2021-02-11 08:30:04.828" }
  - match: { hits.hits.0.sort: [ "2021-02-11T08:30:04.828Z" ] }
  - match: { hits.hits.1._index: test_nanos }
  - match: { hits.hits.1._source.timestamp: "15/04/2021 06:30:04.821" }
  - match: { hits.hits.1.sort: [ "2021-04-15T06:30:04.821Z" ] }

  - do:
      search:
        index: test*
        body:
          size: 2
          sort: [ { timestamp: { "order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type": "date" } } ]
          search_after: [ "2021-04-15T06:30:04.821Z" ]
  - match: { hits.total.value: 6 }
  - length: { hits.hits: 2 }
  - match: { hits.hits.0._index: test_nanos }
  - match: { hits.hits.0._source.timestamp: "20/05/2021 05:30:04.832" }
  - match: { hits.hits.0.sort: [ "2021-05-20T05:30:04.832Z" ] }
  - match: { hits.hits.1._index: test }
  - match: { hits.hits.1._source.timestamp: "2021-06-11 04:30:04.828" }
  - match: { hits.hits.1.sort: [ "2021-06-11T04:30:04.828Z" ] }
@@ -9,14 +9,20 @@
package org.elasticsearch.search.searchafter;

import org.elasticsearch.action.admin.indices.create.CreateIndexRequestBuilder;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexRequestBuilder;
import org.elasticsearch.action.search.SearchPhaseExecutionException;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.ShardSearchFailure;
import org.elasticsearch.action.support.WriteRequest;
import org.elasticsearch.cluster.metadata.IndexMetadata;
import org.elasticsearch.common.UUIDs;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.test.ESIntegTestCase;
import org.hamcrest.Matchers;
@@ -30,6 +36,9 @@
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
import static org.elasticsearch.index.query.QueryBuilders.matchAllQuery;
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertFailures;
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertNoFailures;
import static org.hamcrest.Matchers.arrayContaining;
import static org.hamcrest.Matchers.containsString;
import static org.hamcrest.Matchers.equalTo;

@@ -182,6 +191,80 @@ public void testWithSimpleTypes() throws Exception {
        assertSearchFromWithSortValues(INDEX_NAME, documents, reqSize);
    }

    public void testWithCustomFormatSortValueOfDateField() throws Exception {
        final XContentBuilder mappings = jsonBuilder();
        mappings.startObject().startObject("properties");
        {
            mappings.startObject("start_date");
            mappings.field("type", "date");
            mappings.field("format", "yyyy-MM-dd");
            mappings.endObject();
        }
        {
            mappings.startObject("end_date");
            mappings.field("type", "date");
            mappings.field("format", "yyyy-MM-dd");
            mappings.endObject();
        }
        mappings.endObject().endObject();
        assertAcked(client().admin().indices().prepareCreate("test")
            .setSettings(Settings.builder().put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, between(1, 3)))
            .setMapping(mappings));


        client().prepareBulk().setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE)
            .add(new IndexRequest("test").id("1").source("start_date", "2019-03-24", "end_date", "2020-01-21"))
            .add(new IndexRequest("test").id("2").source("start_date", "2018-04-23", "end_date", "2021-02-22"))
            .add(new IndexRequest("test").id("3").source("start_date", "2015-01-22", "end_date", "2022-07-23"))
            .add(new IndexRequest("test").id("4").source("start_date", "2016-02-21", "end_date", "2024-03-24"))
            .add(new IndexRequest("test").id("5").source("start_date", "2017-01-20", "end_date", "2025-05-28"))
            .get();

        SearchResponse resp = client().prepareSearch("test")
            .addSort(SortBuilders.fieldSort("start_date").setFormat("dd/MM/yyyy"))
            .addSort(SortBuilders.fieldSort("end_date").setFormat("yyyy-MM-dd"))
            .setSize(2)
            .get();
        assertNoFailures(resp);
        assertThat(resp.getHits().getHits()[0].getSortValues(), arrayContaining("22/01/2015", "2022-07-23"));
        assertThat(resp.getHits().getHits()[1].getSortValues(), arrayContaining("21/02/2016", "2024-03-24"));

        resp = client().prepareSearch("test")
            .addSort(SortBuilders.fieldSort("start_date").setFormat("dd/MM/yyyy"))
            .addSort(SortBuilders.fieldSort("end_date").setFormat("yyyy-MM-dd"))
            .searchAfter(new String[]{"21/02/2016", "2024-03-24"})
            .setSize(2)
            .get();
        assertNoFailures(resp);
        assertThat(resp.getHits().getHits()[0].getSortValues(), arrayContaining("20/01/2017", "2025-05-28"));
        assertThat(resp.getHits().getHits()[1].getSortValues(), arrayContaining("23/04/2018", "2021-02-22"));

        resp = client().prepareSearch("test")
            .addSort(SortBuilders.fieldSort("start_date").setFormat("dd/MM/yyyy"))
            .addSort(SortBuilders.fieldSort("end_date")) // it's okay because end_date has the format "yyyy-MM-dd"
            .searchAfter(new String[]{"21/02/2016", "2024-03-24"})
            .setSize(2)
            .get();
        assertNoFailures(resp);
        assertThat(resp.getHits().getHits()[0].getSortValues(), arrayContaining("20/01/2017", 1748390400000L));
        assertThat(resp.getHits().getHits()[1].getSortValues(), arrayContaining("23/04/2018", 1613952000000L));

        SearchRequestBuilder searchRequest = client().prepareSearch("test")
            .addSort(SortBuilders.fieldSort("start_date").setFormat("dd/MM/yyyy"))
            .addSort(SortBuilders.fieldSort("end_date").setFormat("epoch_millis"))
            .searchAfter(new Object[]{"21/02/2016", 1748390400000L})
            .setSize(2);
        assertNoFailures(searchRequest.get());

        searchRequest = client().prepareSearch("test")
            .addSort(SortBuilders.fieldSort("start_date").setFormat("dd/MM/yyyy"))
            .addSort(SortBuilders.fieldSort("end_date").setFormat("epoch_millis")) // wrong format
            .searchAfter(new Object[]{"21/02/2016", "23/04/2018"})
            .setSize(2);
        assertFailures(searchRequest, RestStatus.BAD_REQUEST,
            containsString("failed to parse date field [23/04/2018] with format [epoch_millis]"));
    }

    private static class ListComparator implements Comparator<List> {
        @Override
        public int compare(List o1, List o2) {