Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sorting descending on datefield with nanosecond-precision fails for missing values #73763

Closed
EmilBode opened this issue Jun 4, 2021 · 12 comments · Fixed by #74760
Closed

Sorting descending on datefield with nanosecond-precision fails for missing values #73763

EmilBode opened this issue Jun 4, 2021 · 12 comments · Fixed by #74760
Assignees
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@EmilBode
Copy link

EmilBode commented Jun 4, 2021

Elasticsearch version: 7.13.1 (newest)

Plugins installed: None

JVM version: ES builtin,
openjdk 16 2021-03-16
OpenJDK Runtime Environment AdoptOpenJDK (build 16+36)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 16+36, mixed mode, sharing)

OS version: Windows 10, 64bit, version 2004, build 19041.985

Steps to reproduce:

POST _bulk
{"index":{"_index":"test","_id":"1"}}
{"field1":"value1", "timestamp": "2021-06-04T13:50:00.000000001"}
{"index":{"_index":"test","_id":"2"}}
{"field1":"value2"}

And the query

GET test/_search
{
  "sort": {
    "timestamp": {
      "order": "desc",
      "format": "strict_date_optional_time_nanos",
      "numeric_type": "date_nanos"
    }
  }
}

Expected behaviour
I'd expect both docs to be returned without errors (first doc1, then doc2 with a missing timestamp)

Actual behaviour

{
  "error" : {
    "root_cause" : [ ],
    "type" : "search_phase_execution_exception",
    "reason" : "",
    "phase" : "fetch",
    "grouped" : true,
    "failed_shards" : [ ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "nanoseconds [-9223372036854775808] are before the epoch in 1970 and cannot be processed in nanosecond resolution"
    }
  },
  "status" : 400
}

Observations

My guess would be that missing values are filled in with a value "before any possible value" (to be first for ascending order, or last in descending order), but that this subsequently fails when showing as a formatted date.

@priyamounica
Copy link

Hi, Can I work on this issue?

I have good hands-on experience as a backend developer in Java and expertise in Spring Boot and love to work on this.

As a first-time contributor to open source projects, I'm excitingly looking for some interesting stuff to start with and found this suitable for me as I have previously worked in REST APIs involving Sorting, Search, Pagination, etc.

@EmilBode
Copy link
Author

EmilBode commented Jun 5, 2021

@priyamounica You would have my gratitude, :-)
Though I don't think I'd be the right person to give you permission. If there's anyone from the ES-team who could chime in, that'd be best.

@cbuescher cbuescher added :Search/Search Search-related issues that do not fall into other categories v7.13.2 v8.0.0 and removed needs:triage Requires assignment of a team area label labels Jun 7, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jun 7, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@cbuescher
Copy link
Member

I was able to reproduce on 7.13. Note that the "format" in the query seems to causes the error, not the numeric_type. This works for me though:

PUT test
{
  "mappings": {
    "properties": {
    "field1" : {
      "type" : "text"
    },
    "timestamp" : {
      "type" : "date_nanos",
      "format": "strict_date_optional_time_nanos"
    }
  }}
}

POST _bulk
{"index":{"_index":"test","_id":"1"}}
{"field1":"value1", "timestamp": "2021-06-04T13:50:00.000000001"}
{"index":{"_index":"test","_id":"2"}}
{"field1":"value2"}
{"index":{"_index":"test","_id":"3"}}
{"field1":"value1", "timestamp": "2021-06-04T13:50:00.000000002"}

GET test/_search
{
  "sort": {
    "timestamp": {
      "order": "desc",
      "numeric_type": "date_nanos"
    }
  }
}

@cbuescher
Copy link
Member

My guess would be that missing values are filled in with a value "before any possible value" (to be first for ascending order, or last in descending order), but that this subsequently fails when showing as a formatted date.

This is right, for dates (which are internally represented as long) we use Long.MIN_Value, which is negative. Nanosecond dates cannot be negative though, so we should probably use another default in this case. In the meantime using "missing": 0 should also be a good workaround since that will resolve to "1970-01-01T00:00:00.000Z"

@cbuescher
Copy link
Member

Note that both "order": "asc", "missing" : "_first" and "order": "desc", "missing" : "_last" will be problematic. Without a "missing" value asc works as expected, desc is the problematic case described above.

@cbuescher cbuescher self-assigned this Jun 7, 2021
@cbuescher
Copy link
Member

@priyamounica if you want to take this one up I'd suggest you to look at the resolution of the "missing" field in FieldSortBuilder#build. I think we should manually set it to the right value for data_nanos. We would also need tests for all these cases in FieldSortBuilderTests I think.

@priyamounica
Copy link

@priyamounica You would have my gratitude, :-)
Though I don't think I'd be the right person to give you permission. If there's anyone from the ES-team who could chime in, that'd be best.

Thank you so much @EmilBode

@priyamounica
Copy link

@priyamounica if you want to take this one up I'd suggest you to look at the resolution of the "missing" field in FieldSortBuilder#build. I think we should manually set it to the right value for data_nanos. We would also need tests for all these cases in FieldSortBuilderTests I think.

@cbuescher Sure, Can I start working on this?

@cbuescher
Copy link
Member

Can I start working on this?

Sure, please let me know if you need more pointers to what needs to be done, but I already pointed out some of the cases that should adressed and a potential idea where a fix might be possible. I haven't checked the details yet though. Also please include tests for the above scenarios in a PR. Let me know when you need more info.

@cbuescher
Copy link
Member

@priyamounica I just wanted to check if you had any chance looking at this issue. It would be nice getting this fixed in the next minor release I think. If you cannot work on it in the next few days please let me know, I think we'd like to take it over then.

@priyamounica
Copy link

@cbuescher Please take it over

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Jun 30, 2021
For missing values on date fields we use Long.MIN_VALUE by default. This is okay
when the resolution of the field is milliseconds. For nanoseconds though,
negative values can lead to IllegalArgumentExpections when we try to format them
internally.
This change fixes this by explicitely setting the minimum value to 0L (which
corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other
explicit missing value is defined and the target numeric type is a nanosecond
type (this is true for nanosecond fields and when "numeric_type" is explicitely
set). This way we correct the behaviour for single typed indices and cases where
we are sorting across multiple indices with mixed "date" and "date_nanos" type
where "numeric_type" is set in the sort definition.

Closes elastic#73763
cbuescher pushed a commit that referenced this issue Jul 7, 2021
For missing values on date fields we use Long.MIN_VALUE by default. This is okay
when the resolution of the field is milliseconds. For nanoseconds though,
negative values can lead to IllegalArgumentExpections when we try to format them
internally.
This change fixes this by explicitely setting the minimum value to 0L (which
corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other
explicit missing value is defined and the target numeric type is a nanosecond
type (this is true for nanosecond fields and when "numeric_type" is explicitely
set). This way we correct the behaviour for single typed indices and cases where
we are sorting across multiple indices with mixed "date" and "date_nanos" type
where "numeric_type" is set in the sort definition.

Closes #73763
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this issue Jul 7, 2021
For missing values on date fields we use Long.MIN_VALUE by default. This is okay
when the resolution of the field is milliseconds. For nanoseconds though,
negative values can lead to IllegalArgumentExpections when we try to format them
internally.
This change fixes this by explicitely setting the minimum value to 0L (which
corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other
explicit missing value is defined and the target numeric type is a nanosecond
type (this is true for nanosecond fields and when "numeric_type" is explicitely
set). This way we correct the behaviour for single typed indices and cases where
we are sorting across multiple indices with mixed "date" and "date_nanos" type
where "numeric_type" is set in the sort definition.

Closes elastic#73763
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this issue Jul 7, 2021
For missing values on date fields we use Long.MIN_VALUE by default. This is okay
when the resolution of the field is milliseconds. For nanoseconds though,
negative values can lead to IllegalArgumentExpections when we try to format them
internally.
This change fixes this by explicitely setting the minimum value to 0L (which
corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other
explicit missing value is defined and the target numeric type is a nanosecond
type (this is true for nanosecond fields and when "numeric_type" is explicitely
set). This way we correct the behaviour for single typed indices and cases where
we are sorting across multiple indices with mixed "date" and "date_nanos" type
where "numeric_type" is set in the sort definition.

Closes elastic#73763
cbuescher pushed a commit that referenced this issue Jul 7, 2021
#75064)

For missing values on date fields we use Long.MIN_VALUE by default. This is okay
when the resolution of the field is milliseconds. For nanoseconds though,
negative values can lead to IllegalArgumentExpections when we try to format them
internally.
This change fixes this by explicitely setting the minimum value to 0L (which
corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other
explicit missing value is defined and the target numeric type is a nanosecond
type (this is true for nanosecond fields and when "numeric_type" is explicitely
set). This way we correct the behaviour for single typed indices and cases where
we are sorting across multiple indices with mixed "date" and "date_nanos" type
where "numeric_type" is set in the sort definition.

Closes #73763
cbuescher pushed a commit that referenced this issue Jul 7, 2021
…#75065)

For missing values on date fields we use Long.MIN_VALUE by default. This is okay
when the resolution of the field is milliseconds. For nanoseconds though,
negative values can lead to IllegalArgumentExpections when we try to format them
internally.
This change fixes this by explicitely setting the minimum value to 0L (which
corresponds to 1970-01-01T00:00:00.000 for nanosecond resolution) when no other
explicit missing value is defined and the target numeric type is a nanosecond
type (this is true for nanosecond fields and when "numeric_type" is explicitely
set). This way we correct the behaviour for single typed indices and cases where
we are sorting across multiple indices with mixed "date" and "date_nanos" type
where "numeric_type" is set in the sort definition.

Closes #73763
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants