Patch up fetching dates from source #70040

nik9000 · 2021-03-05T21:30:41Z

This fixes an issue that `fields` has with dates sent in the
`###.######` format.

If you send a date in the format `#####.######` we'll parse the bit
before the decimal point as the number of milliseconds since epoch and
the bit after the decimal point as the number of nanoseconds since the
start of that millisecond. This works and is convenient for some folks.
Sadly, the code that backs the `fields` API for dates doesn't work with
the string format in this case; it works with a `double`. `double` is
bad for two reasons:

1. Its default string representation is scientific notation and our
   parsers don't know how to deal with that.
2. It loses precision relative to the string representation. `double`
   only has 52 explicitly stored mantissa bits (53 bits of precision),
   which can precisely store the number of nanoseconds until about 6am
   on April 15th, 1970. After that it starts to lose precision.
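A standalone Java sketch of both problems, assuming the JSON parser hands the value to the fetcher as a `double`. The sample timestamp and the `DoubleDateDemo` class are made up for illustration; this is not Elasticsearch code.

```java
import java.time.Instant;

final class DoubleDateDemo {
    // Parse "<millis>.<nanos-of-milli>" the way a parser that produces doubles would.
    static double parse(String source) {
        return Double.parseDouble(source);
    }

    public static void main(String[] args) {
        double parsed = parse("1614981041123.456789");

        // Issue 1: Double.toString uses scientific notation for magnitudes >= 10^7,
        // so the string no longer looks like "millis.nanos" to a date parser.
        System.out.println(Double.toString(parsed));

        // Issue 2: at this magnitude adjacent doubles are 2^-12 ms (about 244 ns)
        // apart, so individual nanoseconds cannot survive the trip through double.
        System.out.println(Math.ulp(parsed) == Math.scalb(1.0, -12)); // true

        // 53 bits of precision cover every integer nanosecond count only up to
        // 2^53 ns after the epoch; the next integer already collapses onto 2^53.
        System.out.println(Instant.ofEpochSecond(0, 1L << 53)); // 1970-04-15T05:59:59.254740992Z
        System.out.println((double) ((1L << 53) + 1) == (double) (1L << 53)); // true
    }
}
```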

This fixes the first issue, getting us the correct string
representation in a "quick and dirty" way: it just converts the
`double` back to a string. But we still lose precision. Fixing that
would require a larger change.
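The change in `DateFieldMapper` formats numbers with a `NUMBER_FORMAT` constant whose definition isn't shown in this conversation. As a hedged sketch of the same "quick and dirty" idea, the hypothetical `toDateString` below swaps in `BigDecimal.valueOf(double).toPlainString()`, which reuses `Double.toString`'s digits but never emits an exponent. It cannot bring back digits the `double` already lost.

```java
import java.math.BigDecimal;

final class SourceDateString {
    // Hypothetical stand-in for turning a _source value into a date string.
    // Floating-point numbers are rendered without an exponent; everything
    // else (strings, longs) keeps its normal toString() form.
    static String toDateString(Object value) {
        if (value instanceof Double || value instanceof Float) {
            return BigDecimal.valueOf(((Number) value).doubleValue()).toPlainString();
        }
        return value.toString();
    }

    public static void main(String[] args) {
        // Plain "millis.fraction" string, but the nanos were already rounded by the double.
        System.out.println(toDateString(1614981041123.456789));
        // Strings pass through unchanged.
        System.out.println(toDateString("2021-03-05T21:30:41Z"));
    }
}
```

`BigDecimal` was picked for the sketch over `java.text.DecimalFormat` because `DecimalFormat` instances aren't synchronized and, with the default constructor, follow the JVM's default locale for the decimal separator.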

Closes #69382


nik9000 · 2021-03-05T21:31:56Z

I am 10000000% sure this is the wrong way to fix this. But it stops us from ignoring dates sent as fixed-point numbers. We just sometimes get rounding errors.

pgomulka

I wonder if the loss of precision is acceptable (although I also cannot come up with a better solution).
We actually allow 19 digits for epoch seconds and 9 digits for nanoseconds.

A wild idea: are we able to configure Jackson to parse floats as `BigDecimal`s or strings? There is something like `DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS`.
Or even something more complex, like a custom JSON deserializer that forces parsing doubles as strings only for date fields (I have no idea if that is possible).
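A `double` carries only about 15 to 17 significant decimal digits, far short of 19 + 9. If the fetcher could see the original text (or a `BigDecimal`, which is what `USE_BIG_DECIMAL_FOR_FLOATS` asks Jackson's databind module to produce), the millis/nanos split could be done exactly. A plain-JDK sketch, assuming at most six fractional digits that count nanoseconds within the millisecond; `ExactMillisNanos.parse` is hypothetical:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.Instant;

final class ExactMillisNanos {
    // Hypothetical helper: parse "<millis>.<nanos-of-milli>" straight from the
    // text, never going through double.
    static Instant parse(String text) {
        BigDecimal value = new BigDecimal(text);
        BigDecimal millis = value.setScale(0, RoundingMode.FLOOR);
        // Shift the fraction of a millisecond into whole nanoseconds;
        // intValueExact throws ArithmeticException if it isn't a whole number.
        int nanosOfMilli = value.subtract(millis).movePointRight(6).intValueExact();
        return Instant.ofEpochMilli(millis.longValueExact()).plusNanos(nanosOfMilli);
    }

    public static void main(String[] args) {
        Instant exact = parse("1614981041123.456789");
        System.out.println(exact.getEpochSecond() + " s + " + exact.getNano() + " ns"); // 1614981041 s + 123456789 ns
    }
}
```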

nik9000 · 2021-03-08T12:10:41Z

I imagine the loss of precision isn't acceptable, but maybe it's acceptable in the short term. I think changing `SourceLookup` to return `BigDecimal` would be kind of breaky: scripts get those numbers, update gets them, and I imagine others do too. I was thinking it might be better to try to move off of the "to map" parsing so we can share more code with the from-XContent parsing. But that's hard! The code doesn't really want to be shared and the callers don't expect it.
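A minimal illustration of why switching `SourceLookup` to `BigDecimal` could be breaky, assuming a caller that casts source values to `Double`. Both methods are hypothetical, not real script or update code.

```java
import java.math.BigDecimal;
import java.util.Map;

final class CallerCompat {
    // Hypothetical caller written against today's behaviour, where JSON floats
    // in the source map arrive as Double.
    static double strictCaller(Map<String, Object> source) {
        return (Double) source.get("timestamp"); // ClassCastException for a BigDecimal
    }

    // Hypothetical caller that only relies on Number, so it survives the switch.
    static double tolerantCaller(Map<String, Object> source) {
        return ((Number) source.get("timestamp")).doubleValue();
    }

    public static void main(String[] args) {
        Map<String, Object> source = Map.of("timestamp", new BigDecimal("1614981041123.456789"));
        System.out.println(tolerantCaller(source));
        try {
            strictCaller(source);
        } catch (ClassCastException e) {
            System.out.println("strict caller broke: " + e.getClass().getSimpleName());
        }
    }
}
```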


nik9000 · 2021-03-08T16:14:59Z

I've split up the tests for easier failure diagnosis and to make it easier to add `@Repeat` to individual tests. I believe I've worked through the randomized failures as they stand now. In the future I'd like to force all subclasses of `MapperTestCase` to implement this sort of method, but I worry that it'd make the change too large to backport as far as we might like, and that it'll fail randomly in old branches and be a nuisance. So I'd like to do that work in a follow-up.

pgomulka · 2021-03-08T17:49:59Z

@elasticmachine update branch

pgomulka

LGTM, thanks for the fix. I don't think we can come up with anything better than losing precision in the short term.

pgomulka · 2021-03-08T18:19:08Z

server/src/main/java/org/elasticsearch/index/mapper/DateFieldMapper.java

```diff
@@ -355,7 +373,8 @@ public ValueFetcher valueFetcher(SearchExecutionContext context, String format)
            return new SourceValueFetcher(name(), context, nullValue) {
                @Override
                public String parseSourceValue(Object value) {
-                    String date = value.toString();
+                    String date = value instanceof Number ? NUMBER_FORMAT.format(value) : value.toString();
+                    // TODO can we emit a warning if we're losing precision here? I'm not sure we can.
```


This will be called for each document in the request, so just using `HeaderWarning.addWarning(formattedMessage);` would probably spam the response headers.
I am not sure we have a convenient way of doing this. The code could look like this:

```java
// Dedicated logger whose messages the appender below turns into response header warnings.
static Logger l = LogManager.getLogger("dateFieldMapperHeader");

static {
    final LoggerContext context = (LoggerContext) LogManager.getContext(false);
    final Configuration configuration = context.getConfiguration();
    // Rate-limit what reaches the headers so per-document calls don't spam them.
    RateLimitingFilter rateLimitingFilter = new RateLimitingFilter();
    HeaderWarningAppender dateFieldMapperHeaderAppender =
        new HeaderWarningAppender("dateFieldMapperHeaderAppender", rateLimitingFilter);
    Loggers.addAppender(LogManager.getLogger("dateFieldMapperHeader"), dateFieldMapperHeaderAppender);
}
```

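And then a usage:

```java
public String parseSourceValue(Object value) {
    // someCleverSearchRequestUUID is a placeholder for a per-request key used by the rate limiter.
    l.info(new ESLogMessage("someMessage").with(DeprecatedMessage.KEY_FIELD_NAME, someCleverSearchRequestUUID));
    // ...
}
```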

Yeah! I was more thinking about how to detect that we're losing precision. I think it's probably safest to leave it for a follow-up change.
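One hedged way to approach both the detection and the header-spam concern, sketched with plain JDK classes rather than the log4j appender from the review: treat a `double` whose neighbouring values are more than 1 ns (1e-6 ms) apart as possibly rounded, and collect warnings in a per-request object that keeps each distinct message once. `PrecisionWarnings`, its methods, and the message text are all hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

final class PrecisionWarnings {
    // Warnings collected for one request; the set drops repeats so a
    // per-document code path adds each distinct message at most once.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final List<String> warnings = new CopyOnWriteArrayList<>();

    // Heuristic: if adjacent doubles around `millis` are more than one
    // nanosecond (1e-6 ms) apart, a nanosecond-resolution value may have been rounded.
    static boolean mayHaveLostNanos(double millis) {
        return Math.ulp(millis) > 1e-6;
    }

    // Called once per fetched source value.
    void checkSourceValue(String field, Object value) {
        if (value instanceof Double && mayHaveLostNanos((Double) value)) {
            String message = "date field [" + field + "] was read from _source as a double"
                + " and may have lost sub-millisecond precision";
            if (seen.add(message)) {
                warnings.add(message);
            }
        }
    }

    List<String> warnings() {
        return List.copyOf(warnings);
    }
}
```

This is only a heuristic: it can't tell whether the original source had sub-millisecond digits at all, and the `Math.ulp` check is true for every value from 2^33 ms (around April 10th, 1970) onward, so in practice it would flag nearly every `double` date. That is why deduplicating per request matters.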

elasticmachine · 2021-03-08T20:56:30Z

Pinging @elastic/es-search (Team:Search)


…tic#70117)


nik9000 requested review from javanna, romseygeek, jtibshirani and pgomulka March 5, 2021 21:30

pgomulka reviewed Mar 8, 2021


nik9000 mentioned this pull request Mar 8, 2021

Fields API can lose precision when fetching dates #70085

Open

nik9000 marked this pull request as ready for review March 8, 2021 16:11

nik9000 added 8 commits March 8, 2021 11:15

help1

2503060

MORe

681ea6e

Javadoc

a0be373

More why

66c059f

Drop repeat

c862c4d

Sort and unique

48c277d

Explain

e57869f

Merge branch 'master' into ohnodateswhyyougottabelikethis

e64dd81

nik9000 mentioned this pull request Mar 8, 2021

failed to parse date field after upgrade to 7.11.x from 7.10.2 #69382

Closed

nik9000 requested a review from pgomulka March 8, 2021 17:45

Merge branch 'master' into ohnodateswhyyougottabelikethis

e6b3b8c

pgomulka approved these changes Mar 8, 2021


nik9000 merged commit f2e19c1 into elastic:master Mar 8, 2021

nik9000 added backport pending v7.11.3 v7.12.0 v7.13.0 v8.0.0 labels Mar 8, 2021

nik9000 added the :Search/Search Search-related issues that do not fall into other categories label Mar 8, 2021

elasticmachine added the Team:Search Meta label for search team label Mar 8, 2021

nik9000 added the >bug label Mar 8, 2021

nik9000 removed the backport pending label Mar 11, 2021

pgomulka mentioned this pull request Apr 12, 2021

Floating-point value accepted for a date field during indexing, but fails later on update/re-index #71311

Open

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
