Formatted sort values for search_after #69192

Closed
jimczi opened this issue Feb 18, 2021 · 3 comments · Fixed by #70357
Assignees
Labels
>feature :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team


@jimczi
Contributor

jimczi commented Feb 18, 2021

Today the sort values used to rank each hit in the response are exposed as raw values in an array (`hits.hits[0].sort` in the response).
These values are meant to be copied into a search_after request in order to paginate efficiently over a set of results.

By default, the sort value for date and date_nanos fields is represented as a long, which is the internal representation that we use for these fields. Leaking this internal representation is problematic because the returned value cannot be interpreted without context: date returns the number of milliseconds since the epoch, while date_nanos returns the number of nanoseconds. To fix this discrepancy, we'd like to gradually introduce formatted sort values.
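To make the ambiguity concrete, here is a small Python sketch using only the standard library. The helper names (`epoch_millis`, `epoch_nanos`) are illustrative and not part of Elasticsearch; the sketch only assumes the two raw representations described above. It computes both values for one instant and shows what happens when a millisecond value is read as if it were nanoseconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_millis(dt: datetime) -> int:
    """Raw value in the style of a `date` sort value: milliseconds since the epoch."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000 + delta.microseconds // 1_000

def epoch_nanos(dt: datetime, extra_nanos: int = 0) -> int:
    """Raw value in the style of a `date_nanos` sort value: nanoseconds since the epoch.

    `extra_nanos` carries the sub-microsecond digits that `datetime` cannot hold.
    """
    delta = dt - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * 1_000 + extra_nanos

# 2015-01-01T12:10:30.123456789Z, the instant used in the examples of this thread.
instant = datetime(2015, 1, 1, 12, 10, 30, 123456, tzinfo=timezone.utc)

millis = epoch_millis(instant)                # 1420114230123
nanos = epoch_nanos(instant, extra_nanos=789)  # 1420114230123456789

# Reading the millisecond value as nanoseconds lands a few minutes after the epoch.
misread = EPOCH + timedelta(microseconds=millis // 1_000)

print(millis, nanos, misread.isoformat())
```

Both numbers are valid longs, so nothing in the value itself tells a client (or another index) which unit was meant.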

As a first step, we'd like to add a format option to any sort value in a search request. Setting a format there would ensure that the sort values in the response are formatted accordingly:

{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "field": "timestamp",
            "format": "strict_date_optional_time_nanos"
        }
    ]
}

The same format would also be used to parse the search_after value so that copying the sort values directly in search_after continues to work:

{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "field": "timestamp",
            "format": "strict_date_optional_time_nanos"
        }
    ],
    "search_after": [
        "2015-01-01T12:10:30.123456789Z"
    ]
}
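The copy-and-paste workflow described above can be sketched on the client side as follows. This is a minimal Python illustration on plain dictionaries, with no Elasticsearch client involved; `next_page_request` and the mocked `response` are hypothetical, and the sort clause uses the `"timestamp": {"format": ...}` shape that appears in the commit message later in this thread.

```python
import copy
from typing import Optional

def next_page_request(request: dict, response: dict) -> Optional[dict]:
    """Build the request for the next page by copying the last hit's sort values.

    Returns None when the response has no hits, i.e. there is nothing left to page over.
    """
    hits = response["hits"]["hits"]
    if not hits:
        return None
    following = copy.deepcopy(request)  # leave the caller's request untouched
    following["search_after"] = hits[-1]["sort"]
    return following

request = {
    "query": {"match_all": {}},
    "sort": [
        {"timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}}
    ],
}

# Hypothetical response fragment: with a format set, the sort value is a string.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "sort": ["2015-01-01T12:10:30.123456789Z"]},
        ]
    }
}

following = next_page_request(request, response)
print(following["search_after"])
```

Because the same format is used for both output and parsing, the client never needs to know whether the underlying field stores milliseconds or nanoseconds.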

It would be nice to also apply the formatter of the field by default if no format is specified. That would solve the leaking of internal representation entirely but would have more impact on users.

@jimczi jimczi added >feature :Search/Search Search-related issues that do not fall into other categories labels Feb 18, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Feb 18, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@mayya-sharipova
Contributor

mayya-sharipova commented Feb 19, 2021

@jimczi Thanks for the proposal, Jim.

This proposal is also relevant and useful for unsigned_long, which returns sort values of type long or BigInteger depending on the value; there was also a request to return unsigned_long values as String.

I am also wondering about having a format for other parts of a search request, e.g. fields, docvalue_fields, etc. Maybe in that case it makes sense to have format as a separate section of the search request so that it can be applied to every part (sort, fields, ...), e.g.:
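A short Python sketch of why the type of an unsigned_long sort value depends on its magnitude. The helper names are illustrative. The point about 64-bit floats is an added assumption about client-side JSON parsers, not something stated in this thread.

```python
LONG_MAX = 2**63 - 1           # largest value a signed 64-bit long can hold
UNSIGNED_LONG_MAX = 2**64 - 1  # largest value an unsigned 64-bit field can hold

def fits_in_signed_long(value: int) -> bool:
    """True if the value can be returned as a plain signed long."""
    return 0 <= value <= LONG_MAX

def survives_double(value: int) -> bool:
    """True if the value round-trips through a 64-bit float.

    Assumption: some JSON parsers read every number as a double.
    """
    return int(float(value)) == value

def as_string_sort_value(value: int) -> str:
    """String form, which carries every digit regardless of the parser."""
    return str(value)

for value in (42, LONG_MAX, LONG_MAX + 1, UNSIGNED_LONG_MAX):
    print(value, fits_in_signed_long(value), survives_double(value))
```

Values above `LONG_MAX` need a wider type, and a string form avoids both the type switch and any loss of digits on the client.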

{
  "query": {
    "match_all": {}
  },
  "formats": [
    {
      "field": "timestamp",
      "format": "strict_date_optional_time_nanos"
    },
    {
      "field": "my_unsigned_long",
      "format": "string"
    }
  ],
  "sort": ["timestamp", "my_unsigned_long"],
  "fields": ["timestamp", "my_unsigned_long"]
}

@dnhatn
Member

dnhatn commented Mar 13, 2021

@mayya-sharipova While your proposal avoids specifying the format multiple times, I think the implementation would be a bit more complicated, and BWC is also an issue. Also, if a user wants the highest resolution for sort but a lower resolution for fields, that's not possible with a single format section.

I've opened #70357 for this. It would be great if you could take a look.
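As a sketch of the per-section alternative, the Python snippet below builds a request body in which the sort clause and the `fields` option each carry their own format. The sort syntax follows the commit message later in this thread; the index and field names are placeholders, and the body is only serialized here, not sent anywhere.

```python
import json

request = {
    "query": {"match_all": {}},
    # Highest resolution for the sort values, which are fed back into search_after.
    "sort": [
        {"timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}}
    ],
    # Lower resolution for the values returned for display.
    "fields": [
        {"field": "timestamp", "format": "strict_date_optional_time"}
    ],
}

body = json.dumps(request, indent=2)
print(body)
```

With a single shared `formats` section, both places would have to use the same format for `timestamp`.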

jimczi added a commit to jimczi/elasticsearch that referenced this issue Mar 16, 2021
This commit updates the default format of date_nanos field
on existing and new indices to use `strict_date_optional_time_nanos` instead of
`strict_date_optional_time`.
Using `strict_date_optional_time` as the default format for date_nanos doesn't
make sense because it accepts and parses dates with nanosecond precision
but drops the nanoseconds when formatting.
The change should be transparent for users because both formats accept the same input.

Relates elastic#69192
Closes elastic#67063
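The loss described in this commit message can be illustrated with a small Python sketch. The two helpers are hypothetical stand-ins that only mimic a millisecond-precision and a nanosecond-precision ISO-8601 output; they are not Elasticsearch's formatters.

```python
from datetime import datetime, timezone

def _base(seconds: int) -> str:
    """Date and time down to whole seconds, in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")

def format_with_millis(epoch_nanos: int) -> str:
    """Millisecond-precision output: digits below the millisecond are dropped."""
    seconds, nanos = divmod(epoch_nanos, 1_000_000_000)
    return f"{_base(seconds)}.{nanos // 1_000_000:03d}Z"

def format_with_nanos(epoch_nanos: int) -> str:
    """Nanosecond-precision output: all nine fractional digits are kept."""
    seconds, nanos = divmod(epoch_nanos, 1_000_000_000)
    return f"{_base(seconds)}.{nanos:09d}Z"

stored = 1420114230123456789  # nanoseconds since the epoch

print(format_with_millis(stored))  # 2015-01-01T12:10:30.123Z
print(format_with_nanos(stored))   # 2015-01-01T12:10:30.123456789Z
```

A value formatted with millisecond precision and then parsed back no longer identifies the original nanosecond instant, which matters when the formatted value is reused as a search_after key.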
dnhatn added a commit that referenced this issue Mar 17, 2021
If a search_after request targets multiple indices and one of its sort
fields has type `date` in one index but `date_nanos` in other indices,
then Elasticsearch won't interpret the search_after parameter correctly
in every target index. By default, the sort value of a date field is a
long of milliseconds since the epoch, while that of a date_nanos field
is a long of nanoseconds.

This commit introduces the `format` parameter in the sort field so that
the sort value of a date or date_nanos field is formatted using a date
format in the search response.

The examples below illustrate how to use this new parameter.

```js
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "timestamp": {
                "order": "asc",
                "format": "strict_date_optional_time_nanos"
            }
        }
    ]
}
```

```js
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "timestamp": { 
                "order": "asc",
                "format": "strict_date_optional_time_nanos"
            }
        }
    ],
    "search_after": [
        "2015-01-01T12:10:30.123456789Z" // in `strict_date_optional_time_nanos` format
    ]
}
```

Closes #69192
jimczi added a commit that referenced this issue Mar 17, 2021
jimczi added a commit that referenced this issue Mar 17, 2021
jimczi added a commit that referenced this issue Mar 18, 2021
dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Mar 20, 2021
dnhatn added a commit that referenced this issue Mar 21, 2021

4 participants