Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch up fetching dates from source #70040

Merged
merged 10 commits into from
Mar 8, 2021

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Mar 5, 2021

This fixes an issue that fields has with dates sent in the
###.###### format.

If you send a date in the format #####.###### we'll parse the bit
before the decimal place as the number of milliseconds since epoch and
we'll parse the bit after the decimal as the number of nanoseconds since
the start of that millisecond. This works and is convenient for some
folks. Sadly, the code that back the fields API for dates doesn't work
with the string format in this case - it works with a double. double
is bad for two reasons:

  1. It's default string representation is scientific notation and our
    parsers don't know how to deal with that.
  2. It loses precision relative to the string representation. double
    only has 52 bits of mantissa which can precisely store the number of
    nanoseconds until about 6am on April 15th, 1970. After that it starts
    to lose precision.

This fixed the first issue, getting us the correct string
representation is a "quick and dirty" way. It just converts the double
back to a string. But we still lose precision. Fixing that would require
a larger change.....

Closes #69382

This fixes an issue that `fields` has with dates sent in the
`###.######` format.

If you send a date in the format `#####.######` we'll parse the bit
before the decimal place as the number of milliseconds since epoch and
we'll parse the bit after the decimal as the number of nanoseconds since
the start of that millisecond. This works and is convenient for some
folks. Sadly, the code that back the `fields` API for dates doesn't work
with the string format in this case - it works with a `double`. `double`
is bad for two reasons:
1. It's default string representation is scientific notation and our
   parsers don't know how to deal with that.
2. It loses precision relative to the string representation. `double`
   only has 52 bits of mantissa which can precisely store the number of
   nanoseconds until about 6am on April 15th, 1970. After that it starts
   to lose precision.

This fixed the first issue, getting us the correct string
representation is a "quick and dirty" way. It just converts the `double`
back to a string. But we still lose precision. Fixing that would require
a larger change.....
@nik9000
Copy link
Member Author

nik9000 commented Mar 5, 2021

I am 10000000% sure this is the wrong way to fix this. But it stops us from ignoring dates sent as fixed point numbers. We just sometimes get rounding errors.....

Copy link
Contributor

@pgomulka pgomulka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the loose of precision is acceptable (although I also can not come up with a better solution)..
we actually allow 19digis for epoch seconds and 9digits for nanoseconds..

a wild idea.. Are we able to configure jackson to parse floats as bigdecimals or strings? there is something like DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS
Or even something more complex like custom json deserialiser and force parsing doubles as strings only for date fields (totally no idea if that is possible)

@nik9000
Copy link
Member Author

nik9000 commented Mar 8, 2021 via email

@nik9000 nik9000 marked this pull request as ready for review March 8, 2021 16:11
@nik9000
Copy link
Member Author

nik9000 commented Mar 8, 2021

I've split up the tests for easier failure diagnosis and to make it easier to add @Repeat to individual tests. I believe I've worked through the randomized failures as they stand now. In the future I'd like to force all subclasses of MapperTestCase to implement this sort of method but I worry that it'd make the change too large to backport as far as we might like. And I worry that it'll fail randomly in old branches and be a nuisance. So I'd like to do that work in a follow up.

@pgomulka
Copy link
Contributor

pgomulka commented Mar 8, 2021

@elasticmachine update branch

Copy link
Contributor

@pgomulka pgomulka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix. I don't think we can come up with anything better loosing precision short term

@@ -355,7 +373,8 @@ public ValueFetcher valueFetcher(SearchExecutionContext context, String format)
return new SourceValueFetcher(name(), context, nullValue) {
@Override
public String parseSourceValue(Object value) {
String date = value.toString();
String date = value instanceof Number ? NUMBER_FORMAT.format(value) : value.toString();
// TODO can we emit a warning if we're losing precision here? I'm not sure we can.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be used per each document in request, so just using HeaderWarning.addWarning(formattedMessage); would probably spam response headers.
I am not sure we have a convenient way of doing this. THe code could look like this...

 static Logger l = LogManager.getLogger("dateFieldMapperHeader");

    static {
        final LoggerContext context = (LoggerContext) LogManager.getContext(false);
        final Configuration configuration = context.getConfiguration();


        RateLimitingFilter rateLimitingFilter = new RateLimitingFilter();
        HeaderWarningAppender dateFieldMapperHeaderAppender = new HeaderWarningAppender("dateFieldMapperHeaderAppender", rateLimitingFilter);
        Loggers.addAppender(LogManager.getLogger("dateFieldMapperHeader"), dateFieldMapperHeaderAppender);


    }

and then a usage

                public String parseSourceValue(Object value) {
                    l.info(new ESLogMessage("someMessage").with(DeprecatedMessage.KEY_FIELD_NAME, someCleverSearchRequestUUID));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! I was more thinking about how to detect that we're losing precision. I think it's probably safest to leave it for a follow up change.

@nik9000 nik9000 added the :Search/Search Search-related issues that do not fall into other categories label Mar 8, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Mar 8, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@nik9000 nik9000 added the >bug label Mar 8, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 8, 2021
This fixes an issue that `fields` has with dates sent in the
`###.######` format.

If you send a date in the format `#####.######` we'll parse the bit
before the decimal place as the number of milliseconds since epoch and
we'll parse the bit after the decimal as the number of nanoseconds since
the start of that millisecond. This works and is convenient for some
folks. Sadly, the code that back the `fields` API for dates doesn't work
with the string format in this case - it works with a `double`. `double`
is bad for two reasons:
1. It's default string representation is scientific notation and our
   parsers don't know how to deal with that.
2. It loses precision relative to the string representation. `double`
   only has 52 bits of mantissa which can precisely store the number of
   nanoseconds until about 6am on April 15th, 1970. After that it starts
   to lose precision.

This fixed the first issue, getting us the correct string
representation is a "quick and dirty" way. It just converts the `double`
back to a string. But we still lose precision. Fixing that would require
a larger change.....
nik9000 added a commit that referenced this pull request Mar 9, 2021
This fixes an issue that `fields` has with dates sent in the
`###.######` format.

If you send a date in the format `#####.######` we'll parse the bit
before the decimal place as the number of milliseconds since epoch and
we'll parse the bit after the decimal as the number of nanoseconds since
the start of that millisecond. This works and is convenient for some
folks. Sadly, the code that back the `fields` API for dates doesn't work
with the string format in this case - it works with a `double`. `double`
is bad for two reasons:
1. It's default string representation is scientific notation and our
   parsers don't know how to deal with that.
2. It loses precision relative to the string representation. `double`
   only has 52 bits of mantissa which can precisely store the number of
   nanoseconds until about 6am on April 15th, 1970. After that it starts
   to lose precision.

This fixed the first issue, getting us the correct string
representation is a "quick and dirty" way. It just converts the `double`
back to a string. But we still lose precision. Fixing that would require
a larger change.....
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 9, 2021
…tic#70117)

This fixes an issue that `fields` has with dates sent in the
`###.######` format.

If you send a date in the format `#####.######` we'll parse the bit
before the decimal place as the number of milliseconds since epoch and
we'll parse the bit after the decimal as the number of nanoseconds since
the start of that millisecond. This works and is convenient for some
folks. Sadly, the code that back the `fields` API for dates doesn't work
with the string format in this case - it works with a `double`. `double`
is bad for two reasons:
1. It's default string representation is scientific notation and our
   parsers don't know how to deal with that.
2. It loses precision relative to the string representation. `double`
   only has 52 bits of mantissa which can precisely store the number of
   nanoseconds until about 6am on April 15th, 1970. After that it starts
   to lose precision.

This fixed the first issue, getting us the correct string
representation is a "quick and dirty" way. It just converts the `double`
back to a string. But we still lose precision. Fixing that would require
a larger change.....
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 9, 2021
This fixes an issue that `fields` has with dates sent in the
`###.######` format.

If you send a date in the format `#####.######` we'll parse the bit
before the decimal place as the number of milliseconds since epoch and
we'll parse the bit after the decimal as the number of nanoseconds since
the start of that millisecond. This works and is convenient for some
folks. Sadly, the code that back the `fields` API for dates doesn't work
with the string format in this case - it works with a `double`. `double`
is bad for two reasons:
1. It's default string representation is scientific notation and our
   parsers don't know how to deal with that.
2. It loses precision relative to the string representation. `double`
   only has 52 bits of mantissa which can precisely store the number of
   nanoseconds until about 6am on April 15th, 1970. After that it starts
   to lose precision.

This fixed the first issue, getting us the correct string
representation is a "quick and dirty" way. It just converts the `double`
back to a string. But we still lose precision. Fixing that would require
a larger change.....
nik9000 added a commit that referenced this pull request Mar 9, 2021
This fixes an issue that `fields` has with dates sent in the
`###.######` format.

If you send a date in the format `#####.######` we'll parse the bit
before the decimal place as the number of milliseconds since epoch and
we'll parse the bit after the decimal as the number of nanoseconds since
the start of that millisecond. This works and is convenient for some
folks. Sadly, the code that back the `fields` API for dates doesn't work
with the string format in this case - it works with a `double`. `double`
is bad for two reasons:
1. It's default string representation is scientific notation and our
   parsers don't know how to deal with that.
2. It loses precision relative to the string representation. `double`
   only has 52 bits of mantissa which can precisely store the number of
   nanoseconds until about 6am on April 15th, 1970. After that it starts
   to lose precision.

This fixed the first issue, getting us the correct string
representation is a "quick and dirty" way. It just converts the `double`
back to a string. But we still lose precision. Fixing that would require
a larger change.....
nik9000 added a commit that referenced this pull request Mar 11, 2021
This fixes an issue that `fields` has with dates sent in the
`###.######` format.

If you send a date in the format `#####.######` we'll parse the bit
before the decimal place as the number of milliseconds since epoch and
we'll parse the bit after the decimal as the number of nanoseconds since
the start of that millisecond. This works and is convenient for some
folks. Sadly, the code that back the `fields` API for dates doesn't work
with the string format in this case - it works with a `double`. `double`
is bad for two reasons:
1. It's default string representation is scientific notation and our
   parsers don't know how to deal with that.
2. It loses precision relative to the string representation. `double`
   only has 52 bits of mantissa which can precisely store the number of
   nanoseconds until about 6am on April 15th, 1970. After that it starts
   to lose precision.

This fixed the first issue, getting us the correct string
representation is a "quick and dirty" way. It just converts the `double`
back to a string. But we still lose precision. Fixing that would require
a larger change.....
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v7.11.3 v7.12.0 v7.13.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

failed to parse date field after upgrade to 7.11.x from 7.10.2
4 participants