Core: Add new nanosecond supporting field mapper #32601
Pinging @elastic/es-core-infra
Relates #10005
Another alternative is to use a double with "days since epoch" or similar. Dates around the current time/epoch have the highest precision, but datetimes far away from today lose precision. We use this approach for scientific data, as it is obvious that exact date times only make sense around now, not for stuff far away.

```java
import java.time.Instant;
import java.time.temporal.ChronoField;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAccessor;

/** Nanoseconds per day, as a double so the division below stays floating-point. */
private static final double NANOS_PER_DAY = 24.0 * 60.0 * 60.0 * 1_000_000_000.0;

/** Converts a TemporalAccessor to a double (days since epoch). Includes time, if available. */
public static double temporalToDouble(TemporalAccessor accessor) {
    double r = accessor.getLong(ChronoField.EPOCH_DAY);
    if (accessor.isSupported(ChronoField.NANO_OF_DAY)) {
        r += accessor.getLong(ChronoField.NANO_OF_DAY) / NANOS_PER_DAY;
    }
    return r;
}

/** Converts a double with the days since epoch to an Instant. */
public static Instant doubleToInstant(double epochDouble) {
    final long epochDays = (long) epochDouble;
    return Instant.EPOCH.plus(epochDays, ChronoUnit.DAYS)
        .plusNanos(Math.round((epochDouble - epochDays) * NANOS_PER_DAY));
}
```

Just ideas! (This code is untested, I just converted it from millis to nanos; maybe there is a sign problem somewhere, but I think it was also tested for dates before the epoch.)
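For illustration (my addition, not part of the original comment), a round trip through the helpers above shows the precision trade-off: a double has roughly 15-16 significant decimal digits, so a present-day date survives the conversion to within a few hundred nanoseconds, while dates centuries away from the epoch would only keep microsecond-level precision.

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Illustrative round trip through the double representation sketched above;
// assumes temporalToDouble and doubleToInstant from the previous snippet are in scope.
Instant original = Instant.parse("2018-08-02T12:34:56.123456789Z");
double asDouble = temporalToDouble(original.atOffset(ZoneOffset.UTC));
Instant roundTripped = doubleToInstant(asDouble);
// roundTripped differs from original by only a few hundred nanoseconds for a 2018 date;
// the error grows roughly linearly with the distance from the epoch.
```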
As far as I remember, PostgreSQL internally uses the same data type for SQL timestamps.
This change adds an option to the `FieldSortBuilder` that allows transforming the type of a numeric field into another. Possible values for this option are `long`, which transforms the source field into an integer, and `double`, which transforms the source field into a floating point. This new option is useful for cross-index search when the sort field is mapped differently on some indices. For instance, if a field is mapped as a floating point in one index and as an integer in another, it is possible to align the type for both indices using the `numeric_type` option:

```
{
  "sort": {
    "field": "my_field",
    "numeric_type": "double" <1>
  }
}
```

<1> Ensure that values for this field are transformed to a floating point if needed.

Only `long` and `double` are supported at the moment, but the goal is to also handle `date` and `date_nanos` when elastic#32601 is merged.
This new field mapper should support nanosecond timestamps. There are two ideas for adding support for this. You could come up with a new data structure that supports any date with nanosecond resolution, which means you need a different data structure than the long value we currently use for dates. This also implies that indexing and querying will be more expensive.

The other alternative would be to use a long and store the nanoseconds since the epoch. This limits our dates to the range from 1677 to 2262, meaning we cannot store the birthdays of many people in Wikipedia. However, when you need nanosecond resolution it is usually about log files and not about birth dates, and those log files usually fit into the above-mentioned date range.
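As a side note (my sketch, not from the issue text), the 1677-2262 range falls directly out of the signed 64-bit nanosecond counter:

```java
import java.time.Instant;

public class NanosRange {
    public static void main(String[] args) {
        // A signed long holds +/- 2^63 nanoseconds around the epoch, about +/- 292 years.
        System.out.println(Instant.EPOCH.plusNanos(Long.MIN_VALUE)); // 1677-09-21T00:12:43.145224192Z
        System.out.println(Instant.EPOCH.plusNanos(Long.MAX_VALUE)); // 2262-04-11T23:47:16.854775807Z
    }
}
```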
This issue suggests implementing a `timestamp` field mapper (names are just suggestions here) that stores dates in nanosecond resolution as a long. This mapper needs to reject any date that is out of the above range when indexing (which also means there is a query short circuit).
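A minimal sketch (not actual Elasticsearch code; the helper name is made up) of how that rejection at index time could work, using exact arithmetic so any overflow of the long turns into an error:

```java
import java.time.Instant;

// Hypothetical helper: convert an Instant to nanoseconds since the epoch for indexing,
// throwing for dates that cannot be represented in a signed long (~1677..2262).
// Note: this simple check also rejects the first fraction of a second of the
// theoretical range, because the multiplication overflows before the nanos are added.
static long toNanosSinceEpoch(Instant instant) {
    try {
        long nanos = Math.multiplyExact(instant.getEpochSecond(), 1_000_000_000L);
        return Math.addExact(nanos, instant.getNano());
    } catch (ArithmeticException e) {
        throw new IllegalArgumentException(
            "date [" + instant + "] is outside the range supported by nanosecond resolution", e);
    }
}
```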
Backwards compatibility
The most important part is being able to search across shards where one field is a long in milliseconds and another is a long in nanoseconds. Adrien came up with the idea of extending `org.elasticsearch.common.lucene.Lucene.readSortValue(StreamInput in)` and adding a special type to mark a sort on a timestamp as nanoseconds; this way merging of results will be possible by adapting the values before the merge. Something to keep in mind here: when mixing indices that have dates in nanos and dates in millis, and we convert to nanos, we cannot deal with dates outside of the nanosecond range. So we have to error out when such a query comes in, before doing flawed conversions.
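A rough illustration (my sketch, not the actual `readSortValue` change) of the widening conversion that would have to happen before merging, including the error-out behaviour for out-of-range values:

```java
// Hypothetical conversion applied to a millisecond sort value before merging it with
// nanosecond sort values from another index. Overflow means the date cannot be
// represented as nanoseconds in a long, so we fail instead of merging flawed values.
static long millisToNanos(long millis) {
    try {
        return Math.multiplyExact(millis, 1_000_000L);
    } catch (ArithmeticException e) {
        throw new IllegalArgumentException(
            "cannot convert [" + millis + "] millis since the epoch to nanos: value is outside the nanosecond range", e);
    }
}
```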
Note: If the long is treated as unsigned we could move the range (this would also require more different conversions when mixing with millis).
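To make that note concrete (again just a sketch, with assumptions about unsigned handling that are not in the issue), treating the long as unsigned would shift the representable range to roughly 1970-2554, at the cost of unsigned comparisons everywhere:

```java
import java.time.Instant;

public class UnsignedNanosRange {
    public static void main(String[] args) {
        long allBitsSet = -1L; // 2^64 - 1 when interpreted as unsigned
        Instant max = Instant.EPOCH
            .plusSeconds(Long.divideUnsigned(allBitsSet, 1_000_000_000L))
            .plusNanos(Long.remainderUnsigned(allBitsSet, 1_000_000_000L));
        System.out.println(max); // roughly the middle of the year 2554
        // Sorting and range queries would then need unsigned semantics:
        System.out.println(Long.compareUnsigned(-1L, 1L)); // > 0, since -1L is the largest unsigned value
    }
}
```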
Aggregations
Having nanosecond-resolution buckets would result in a lot of buckets, so I consider this a second step; it should not stop us from adding the field mapper as first preliminary support.
Relates #27330