
Fields API can lose precision when fetching dates #70085

Open
nik9000 opened this issue Mar 8, 2021 · 10 comments
Labels
>bug · priority:normal (A label for assessing bug priority to be used by ES engineers) · :Search Foundations/Search (Catch all for Search Foundations) · Team:Search Foundations (Meta label for the Search Foundations team in Elasticsearch) · v7.9.0

Comments

@nik9000
Member

nik9000 commented Mar 8, 2021

We allow folks to specify dates like:

"d": 6123.123

Which you should read as 6123123000 nanoseconds since epoch. There are a lot of layers, but when we see this format we pretty much shift the decimal point six places to the right and parse the whole thing as nanoseconds. Then we use the standard java time utilities to convert that nanosecond precision instant into whatever the date's native format is. This all works fine on parsing.
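The decimal-shift parse can be sketched roughly like this (a Python illustration of the idea, not the actual Java code in Elasticsearch; the function name is made up):

```python
from decimal import Decimal

def decimal_date_to_nanos(text: str) -> int:
    """Illustrative sketch: read a date like "6123.123" (milliseconds since
    epoch with a decimal fraction) by shifting the decimal point six places
    to the right and treating the result as whole nanoseconds."""
    shifted = Decimal(text).scaleb(6)  # multiply by 10**6 without going through a float
    return int(shifted)

# "6123.123" milliseconds -> 6123123000 nanoseconds since epoch
assert decimal_date_to_nanos("6123.123") == 6123123000
```

Doing the shift in decimal (or on the raw string) is what keeps this lossless; the bug below is what happens when the same number takes a detour through a binary double instead.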

But in a few places we let jackson deserialize the _source without type hints. When it sees a number like 6123.123 it automatically converts it to a double precision floating point value. This will lose precision for big dates. With date it's mostly OK because the largest integer a double can represent exactly is 9,007,199,254,740,992, which as milliseconds since the epoch is something like the year 287396. If I live that long I'll be glad to fix the bugs. I say "mostly" because there are some rounding issues, so we might end up off by a millisecond. Sadly, with date_nanos we lose precision on any date after about 6am, April 15th, 1970 UTC.
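Both cutoffs fall straight out of IEEE-754 double arithmetic; a quick Python check (Python floats are the same 64-bit doubles jackson uses):

```python
from datetime import datetime, timezone

# The largest integer a double represents exactly is 2**53.
max_exact = 2 ** 53
assert max_exact == 9_007_199_254_740_992
assert float(max_exact) + 1 == float(max_exact)  # the next integer is not representable

# Interpreted as nanoseconds since the epoch, 2**53 is only ~104 days in,
# which is where date_nanos starts losing precision:
cutoff = datetime.fromtimestamp(max_exact / 1_000_000_000, tz=timezone.utc)
print(cutoff)  # roughly 6am, April 15th, 1970 UTC
```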

So, you rightly ask, when do we let jackson deserialize the _source? Well, when scripts access _source via params._source. And when we perform an _update action. And in the fields API. And maybe other places I haven't found yet.

Note: This describes the state we'll be in if we merge #70040.

@nik9000 nik9000 added >bug discuss :Search/Search Search-related issues that do not fall into other categories v7.9.0 labels Mar 8, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Mar 8, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@nik9000
Member Author

nik9000 commented Mar 8, 2021

I pulled that big number from the wikipedia page on double precision floating point numbers. It's a good read.

@nik9000
Member Author

nik9000 commented Mar 8, 2021

OOOOOOH! Another place where we just fetch the source and let jackson do whatever it likes is _source fetching. But only when you add pretty. If you leave it off you get back exactly what you originally sent us:

curl -XDELETE -uelastic:password -HContent-Type:application/json localhost:9200/test
curl -XPUT -uelastic:password -HContent-Type:application/json localhost:9200/test -d'{
  "mappings": {
    "properties": {
      "d": {
        "type": "date"
      }
    }
  }
}'

curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/test/_doc?refresh&pretty' -d'{"d": 1615237777000.123456}'
curl -XPOST -uelastic:password -HContent-Type:application/json localhost:9200/test/_search?pretty -d'{
  "fields": [
    {"field": "d"}
  ]
}'
#         "_source" : {
#           "d" : 1.6152377770001235E12 <----- That ain't what I sent!
#         },
#         "fields" : {
#           "d" : [
#             "2021-03-08T21:09:37.000Z"
#           ]
#         }


curl -XPOST -uelastic:password -HContent-Type:application/json localhost:9200/test/_search -d'{
  "fields": [
    {"field": "d"}
  ]
}'
# "_source":{"d": 1615237777000.123456}      <---- Ok!

curl -XPOST -uelastic:password -HContent-Type:application/json localhost:9200/test/_search -d'{
  "fields": [
    {"field": "d"}
  ]
}' | jq .
#         "_source": {
#           "d": 1615237777000.1235         <---- JQ breaks it too!
#         },
#         "fields": {
#           "d": [
#             "2021-03-08T21:09:37.000Z"
#           ]
#         }


@jpountz
Contributor

jpountz commented Mar 29, 2021

Would this benefit from #46531?

@nik9000
Member Author

nik9000 commented Mar 29, 2021 via email

@nik9000
Member Author

nik9000 commented Mar 29, 2021 via email

@nik9000
Member Author

nik9000 commented Apr 12, 2021

A bunch of folks got together and thought about this for a while. We decided to do three things "real soon":

  • Update the docs for date field to explain that they are based on strings and to advise against sending numbers with decimal points. (Advise against dates with decimal points #71578)
  • Make sure the source filtering API doesn't lose precision on these numbers.
  • Make sure the fields API doesn't lose precision on these dates.

To well and truly fix this issue would require a bunch of breaking changes that we think would be rough on folks. Or it'd require a bunch of advancements in scripts that aren't "soon" things.

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Apr 12, 2021
We accept dates with a decimal point like `2113413.13241324` and parse
them *somehow*. But there are cases where we'll lose precision on those
dates, see elastic#70085. This advises folks not to use that format. We'll
continue to accept those dates for backwards compatibility but you
should avoid using them.
nik9000 added a commit that referenced this issue Apr 13, 2021

Co-authored-by: Adrien Grand <[email protected]>
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Apr 13, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Apr 13, 2021
nik9000 added a commit that referenced this issue Apr 13, 2021
nik9000 added a commit that referenced this issue Apr 13, 2021
@jpountz jpountz removed the discuss label Jul 19, 2021
@skumarp7

When can we expect the below action items to be addressed?

  1. Make sure the source filtering API doesn't lose precision on these numbers.
  2. Make sure the fields API doesn't lose precision on these dates.

@javanna
Member

javanna commented Jun 13, 2022

This issue stalled over time, I believe due to the breaking nature of some of the needed changes. We have been discussing, though, adding support for loading from stored fields as well as doc_values. That is not a direct fix for the precision loss, but it could help if it let us avoid loading from _source in this specific case. It sounds like a hack, but I thought I would bring it up to see what others think.

@benwtrent benwtrent added the priority:normal A label for assessing bug priority to be used by ES engineers label Jul 10, 2024
@javanna javanna added :Search Foundations/Search Catch all for Search Foundations and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 17, 2024
@elasticsearchmachine elasticsearchmachine removed the Team:Search Meta label for search team label Jul 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

7 participants