Patch up fetching dates from source #70040
Conversation
This fixes an issue that `fields` has with dates sent in the `###.######` format. If you send a date in the format `#####.######` we'll parse the bit before the decimal place as the number of milliseconds since epoch and the bit after the decimal as the number of nanoseconds since the start of that millisecond. This works and is convenient for some folks. Sadly, the code that backs the `fields` API for dates doesn't work with the string format in this case - it works with a `double`. `double` is bad for two reasons:
1. Its default string representation is scientific notation and our parsers don't know how to deal with that.
2. It loses precision relative to the string representation. `double` only has 52 bits of mantissa, which can precisely store the number of nanoseconds until about 6am on April 15th, 1970. After that it starts to lose precision.
This fixes the first issue, getting us the correct string representation in a "quick and dirty" way: it just converts the `double` back to a string. But we still lose precision. Fixing that would require a larger change.
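As a minimal, standalone sketch of both problems (hypothetical demo code, not part of this change), assuming a date sent as `millis.nanos`:

import java.math.BigDecimal;

public class DoubleDateDemo {
    public static void main(String[] args) {
        // A date sent as "millis.nanos": roughly 2021-03-04, with sub-millisecond precision.
        String sent = "1614873600000.123456";
        double asDouble = Double.parseDouble(sent);

        // Problem 1: the default string form is scientific notation, which the
        // date parsers don't accept. Prints something like 1.6148736000001235E12.
        System.out.println(Double.toString(asDouble));

        // Problem 2: a double only has 52 bits of mantissa, so the low digits of
        // the nanosecond part are already gone. This prints the exact value the
        // double actually holds, which no longer ends in ...123456.
        System.out.println(new BigDecimal(asDouble).toPlainString());
    }
}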
I am 10000000% sure this is the wrong way to fix this. But it stops us from ignoring dates sent as fixed point numbers. We just sometimes get rounding errors.....
I wonder if the loss of precision is acceptable (although I also can not come up with a better solution)..
We actually allow 19 digits for epoch seconds and 9 digits for nanoseconds..
A wild idea: are we able to configure Jackson to parse floats as BigDecimals or strings? There is something like DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS.
Or even something more complex, like a custom JSON deserialiser that forces parsing doubles as strings only for date fields (totally no idea if that is possible).
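For reference, a minimal sketch of that Jackson feature in isolation (a plain ObjectMapper, not wired into Elasticsearch's XContent layer, so how it would interact with our parsing is an open question):

import java.math.BigDecimal;
import java.util.Map;

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class BigDecimalFloatsDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Ask Jackson to materialize JSON floating point numbers as BigDecimal
        // instead of double, preserving every digit of the source text.
        mapper.enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS);

        Map<?, ?> doc = mapper.readValue("{\"date\": 1614873600000.123456}", Map.class);
        Object date = doc.get("date");
        System.out.println(date.getClass());                     // class java.math.BigDecimal
        System.out.println(((BigDecimal) date).toPlainString()); // 1614873600000.123456
    }
}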
I imagine the loss of precision isn't acceptable. But maybe it's acceptable in the short term. I think changing SourceLookup to return BigDecimal would be kind of breaky. Scripts get those numbers. Update gets them. I imagine others do too.
I was thinking it might be better to try and move off of the "to map" parsing so we can share more code with the from-xcontent parsing. But that's hard! The code doesn't really want to be shared and the callers don't expect it.
I've split up the tests for easier failure diagnosis and to make it easier to add more.
@elasticmachine update branch
LGTM, thanks for the fix. I don't think we can come up with anything better than losing precision in the short term.
@@ -355,7 +373,8 @@ public ValueFetcher valueFetcher(SearchExecutionContext context, String format)
         return new SourceValueFetcher(name(), context, nullValue) {
             @Override
             public String parseSourceValue(Object value) {
-                String date = value.toString();
+                String date = value instanceof Number ? NUMBER_FORMAT.format(value) : value.toString();
+                // TODO can we emit a warning if we're losing precision here? I'm not sure we can.
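The hunk doesn't show how NUMBER_FORMAT is defined, so the following is only a hedged sketch of the general idea (the actual formatter in the change may differ): a plain-notation formatter avoids the scientific notation that Double.toString produces, though digits the double has already dropped can't be recovered.

import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class PlainDoubleFormatDemo {
    public static void main(String[] args) {
        double value = 1614873600000.123456;

        // Double.toString falls back to scientific notation for large magnitudes,
        // which the date parsers cannot read. Prints something like 1.6148736000001235E12.
        System.out.println(Double.toString(value));

        // A plain-notation formatter keeps the "millis.nanos" shape instead.
        // Prints something like 1614873600000.123535 - note the low digits have
        // already been lost to the double itself.
        DecimalFormat plain = new DecimalFormat("0.0#####", DecimalFormatSymbols.getInstance(Locale.ROOT));
        System.out.println(plain.format(value));
    }
}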
This will run once per document in the request, so just calling HeaderWarning.addWarning(formattedMessage) would probably spam the response headers.
I am not sure we have a convenient way of doing this. The code could look like this...

static Logger l = LogManager.getLogger("dateFieldMapperHeader");
static {
    // Route this logger through a HeaderWarningAppender behind a RateLimitingFilter,
    // so the warning reaches the response headers without repeating for every document.
    RateLimitingFilter rateLimitingFilter = new RateLimitingFilter();
    HeaderWarningAppender dateFieldMapperHeaderAppender =
        new HeaderWarningAppender("dateFieldMapperHeaderAppender", rateLimitingFilter);
    Loggers.addAppender(LogManager.getLogger("dateFieldMapperHeader"), dateFieldMapperHeaderAppender);
}

and then a usage

public String parseSourceValue(Object value) {
    // The key passed via DeprecatedMessage.KEY_FIELD_NAME is what the rate limiter
    // de-duplicates on, so this fires once per key rather than once per document.
    l.info(new ESLogMessage("someMessage").with(DeprecatedMessage.KEY_FIELD_NAME, someCleverSearchRequestUUID));
Yeah! I was more thinking about how to detect that we're losing precision. I think it's probably safest to leave it for a follow up change.
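Part of the difficulty: by the time parseSourceValue runs we've already got a double, so the original digits are gone. If we did still have the original text, a check along these lines (a hypothetical helper, not part of this change) could flag the loss:

import java.math.BigDecimal;

public class PrecisionLossCheck {
    // Hypothetical helper: does round-tripping the original "millis.nanos" text
    // through a double change the numeric value it represents?
    static boolean losesPrecision(String original) {
        BigDecimal exact = new BigDecimal(original);
        // BigDecimal.valueOf uses the double's shortest decimal representation,
        // so any digits the double could not hold will no longer match.
        BigDecimal viaDouble = BigDecimal.valueOf(Double.parseDouble(original));
        return exact.compareTo(viaDouble) != 0;
    }

    public static void main(String[] args) {
        System.out.println(losesPrecision("1000.123456"));          // false: fits in a double
        System.out.println(losesPrecision("1614873600000.123456")); // true: low digits dropped
    }
}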
Pinging @elastic/es-search (Team:Search)
Closes #69382