Create a custom parser for parsing ISO8601 datetime variants (elastic#106486)

This adds a hand-written parser for parsing fixed ISO8601 datetime strings, for the `iso8601`, `strict_date_optional_time`, and `strict_date_optional_time_nanos` date formats. If the new parser fails to parse a string, the existing parsers are then tried, so existing behaviour is maintained. A new JVM option can force the use of the existing parsers if that is needed for any reason.
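The fallback strategy in the commit message (try the hand-written parser first, hand anything it rejects to the existing java.time-based parser, and let a JVM option skip the fast path entirely) can be sketched as below. This is a minimal sketch under stated assumptions, not the Elasticsearch code: the class `FallbackDateParser`, its fast path (which only understands a bare `yyyy-MM-dd` date) and the system property `example.force_java_time_parsers` are all hypothetical. The property only stands in for `es.datetime.java_time_parsers`; how Elasticsearch actually reads that option is not part of this diff.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/**
 * Sketch of "hand-written parser first, java.time parser as fallback".
 * All names are hypothetical; this is not the Elasticsearch implementation.
 */
class FallbackDateParser {
    // Hypothetical stand-in for a JVM option such as -Dexample.force_java_time_parsers=true.
    // Boolean.getBoolean returns true only if the system property exists and equals "true" (ignoring case).
    private static final boolean FORCE_JAVA_TIME = Boolean.getBoolean("example.force_java_time_parsers");

    /** Hand-written fast path: accepts exactly "yyyy-MM-dd" and returns null for anything else. */
    static LocalDate parseFast(CharSequence s) {
        if (s.length() != 10 || s.charAt(4) != '-' || s.charAt(7) != '-') return null;
        int year = digits(s, 0, 4), month = digits(s, 5, 7), day = digits(s, 8, 10);
        if (year < 0 || month < 1 || month > 12 || day < 1) return null;
        if (day > LocalDate.of(year, month, 1).lengthOfMonth()) return null;
        return LocalDate.of(year, month, day);
    }

    /** Parses s[from, to) as ASCII digits, or returns -1 if any character is not a digit. */
    private static int digits(CharSequence s, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return -1;
            value = value * 10 + (c - '0');
        }
        return value;
    }

    /** Tries the fast path first; anything it rejects is handed to the general java.time parser. */
    static LocalDate parse(CharSequence s) {
        if (!FORCE_JAVA_TIME) {
            LocalDate fast = parseFast(s);
            if (fast != null) return fast;
        }
        // ISO_DATE also accepts a trailing offset such as "+01:00", which parseFast rejects.
        // Input that neither parser accepts ends in a DateTimeParseException, as before.
        return DateTimeFormatter.ISO_DATE.parse(s, LocalDate::from);
    }

    public static void main(String[] args) {
        System.out.println(parse("2024-03-18"));       // 2024-03-18 (fast path)
        System.out.println(parse("2024-03-18+01:00")); // 2024-03-18 (java.time fallback)
    }
}
```

Returning `null` from the fast path, rather than throwing, keeps rejection cheap because no exception (and stack trace) is created before the fallback runs; whether the real parser is structured this way is not stated in the commit.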

Showing 8 changed files with 1,371 additions and 44 deletions.

@@ -0,0 +1,17 @@

```yaml
pr: 106486
summary: Create custom parser for ISO-8601 datetimes
area: Infra/Core
type: enhancement
issues:
 - 102063
highlight:
  title: New custom parser for ISO-8601 datetimes
  body: |-
    This introduces a new custom parser for ISO-8601 datetimes, for the `iso8601`, `strict_date_optional_time`, and
    `strict_date_optional_time_nanos` built-in date formats. This provides a performance improvement over the
    default Java date-time parsing. Whilst it maintains much of the same behaviour,
    the new parser does not accept nonsensical date-time strings that have multiple fractional seconds fields
    or multiple timezone specifiers. If the new parser fails to parse a string, it will then use the previous parser
    to parse it. If a large proportion of the input data consists of these invalid strings, this may cause
    a small performance degradation. If you wish to force the use of the old parsers regardless,
    set the JVM property `es.datetime.java_time_parsers=true` on all ES nodes.
```
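The highlight notes that the new parser rejects strings with more than one fractional-seconds field or more than one timezone specifier. A single-pass, hand-written parser gets this behaviour naturally: it reads at most one fraction and at most one offset, then requires the end of the input. The sketch below is hypothetical (`StrictIsoSketch` is not an Elasticsearch class). It handles only the full `yyyy-MM-ddTHH:mm:ss` form with an optional fraction of up to nine digits and an optional `Z` or `±HH:MM` offset, assumes UTC when no offset is given, and returns `null` on rejection so that a caller could fall back to a more general parser.

```java
import java.time.DateTimeException;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

/**
 * Hypothetical single-pass parser for "yyyy-MM-ddTHH:mm:ss[.fraction][Z|+HH:MM|-HH:MM]".
 * Returns null (rather than throwing) for anything it does not accept.
 */
class StrictIsoSketch {
    static OffsetDateTime parse(CharSequence s) {
        int len = s.length();
        if (len < 19 || s.charAt(4) != '-' || s.charAt(7) != '-' || s.charAt(10) != 'T'
                || s.charAt(13) != ':' || s.charAt(16) != ':') {
            return null;
        }
        int year = digits(s, 0, 4), month = digits(s, 5, 7), day = digits(s, 8, 10);
        int hour = digits(s, 11, 13), minute = digits(s, 14, 16), second = digits(s, 17, 19);
        if (year < 0 || month < 0 || day < 0 || hour < 0 || minute < 0 || second < 0) return null;
        int pos = 19;

        // At most ONE fractional-seconds field: '.' followed by 1 to 9 digits.
        int nanos = 0;
        if (pos < len && s.charAt(pos) == '.') {
            int start = ++pos;
            while (pos < len && pos - start < 9 && isDigit(s.charAt(pos))) {
                nanos = nanos * 10 + (s.charAt(pos) - '0');
                pos++;
            }
            int count = pos - start;
            if (count == 0) return null;
            for (int i = count; i < 9; i++) {
                nanos *= 10; // scale to nanoseconds, e.g. ".5" -> 500_000_000
            }
        }

        // At most ONE offset: 'Z', "+HH:MM" or "-HH:MM". No offset means UTC (an assumption of this sketch).
        ZoneOffset offset = ZoneOffset.UTC;
        if (pos < len) {
            char c = s.charAt(pos);
            if (c == 'Z') {
                pos++;
            } else if ((c == '+' || c == '-') && pos + 6 <= len && s.charAt(pos + 3) == ':') {
                int oh = digits(s, pos + 1, pos + 3), om = digits(s, pos + 4, pos + 6);
                if (oh < 0 || om < 0 || om > 59 || oh * 3600 + om * 60 > 18 * 3600) return null;
                int total = oh * 3600 + om * 60;
                offset = ZoneOffset.ofTotalSeconds(c == '-' ? -total : total);
                pos += 6;
            } else {
                return null;
            }
        }

        // Leftovers such as a second fraction or a second offset make the whole string invalid.
        if (pos != len) return null;
        try {
            return OffsetDateTime.of(year, month, day, hour, minute, second, nanos, offset);
        } catch (DateTimeException e) {
            return null; // out-of-range field, e.g. month 13 or 30 February
        }
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Parses s[from, to) as ASCII digits, or returns -1 if any character is not a digit. */
    private static int digits(CharSequence s, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (!isDigit(c)) return -1;
            value = value * 10 + (c - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("2024-03-18T12:00:00.5Z"));     // 2024-03-18T12:00:00.500Z
        System.out.println(parse("2024-03-18T12:00:00.5.6Z"));   // null: second fraction
        System.out.println(parse("2024-03-18T12:00:00Z+01:00")); // null: second offset
    }
}
```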

server/src/main/java/org/elasticsearch/common/time/CharSubSequence.java (68 additions, 0 deletions)

@@ -0,0 +1,68 @@

```java
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

package org.elasticsearch.common.time;

import java.util.stream.IntStream;

/**
 * A CharSequence that provides a subsequence of another CharSequence without allocating a new backing array (as String does)
 */
class CharSubSequence implements CharSequence {
    private final CharSequence wrapped;
    private final int startOffset; // inclusive
    private final int endOffset; // exclusive

    CharSubSequence(CharSequence wrapped, int startOffset, int endOffset) {
        if (startOffset < 0) throw new IllegalArgumentException();
        if (endOffset > wrapped.length()) throw new IllegalArgumentException();
        if (endOffset < startOffset) throw new IllegalArgumentException();

        this.wrapped = wrapped;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    @Override
    public int length() {
        return endOffset - startOffset;
    }

    @Override
    public char charAt(int index) {
        int adjustedIndex = index + startOffset;
        if (adjustedIndex < startOffset || adjustedIndex >= endOffset) throw new IndexOutOfBoundsException(index);
        return wrapped.charAt(adjustedIndex);
    }

    @Override
    public boolean isEmpty() {
        return startOffset == endOffset;
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        int adjustedStart = start + startOffset;
        int adjustedEnd = end + startOffset;
        if (adjustedStart < startOffset) throw new IndexOutOfBoundsException(start);
        if (adjustedEnd > endOffset) throw new IndexOutOfBoundsException(end);
        if (adjustedStart > adjustedEnd) throw new IndexOutOfBoundsException();

        return wrapped.subSequence(adjustedStart, adjustedEnd);
    }

    @Override
    public IntStream chars() {
        return wrapped.chars().skip(startOffset).limit(endOffset - startOffset);
    }

    @Override
    public String toString() {
        return wrapped.subSequence(startOffset, endOffset).toString();
    }
}
```
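`CharSubSequence` lets a slice of a larger character sequence be passed around as a `CharSequence` without first copying it into a new `String`. The JDK has a comparable read-only wrapper, `java.nio.CharBuffer.wrap(CharSequence, int, int)`, which the sketch below uses as a stand-in (it is not the Elasticsearch class, and the helper name `view` is hypothetical). As with `CharSubSequence.charAt`, indices on the result are relative to the start of the slice, and because `java.time`'s `DateTimeFormatter.parse` accepts any `CharSequence`, the slice can be parsed directly.

```java
import java.nio.CharBuffer;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Illustrates the "slice as a CharSequence" idea using the JDK's CharBuffer as a stand-in. */
class SubSequenceViewDemo {
    /** Returns a read-only CharSequence covering text[start, end) that wraps, rather than copies, the text. */
    static CharSequence view(CharSequence text, int start, int end) {
        // The buffer's position is start and its limit is end, so charAt(0) maps to text.charAt(start).
        return CharBuffer.wrap(text, start, end);
    }

    public static void main(String[] args) {
        String line = "ts=2024-03-18;level=INFO";
        CharSequence date = view(line, 3, 13);
        System.out.println(date.length());  // 10
        System.out.println(date.charAt(0)); // 2
        // java.time parsers accept any CharSequence, so the slice can be parsed without substring().
        LocalDate parsed = DateTimeFormatter.ISO_LOCAL_DATE.parse(date, LocalDate::from);
        System.out.println(parsed);         // 2024-03-18
    }
}
```

As far as the diff shows, `CharSubSequence.subSequence` and `toString` delegate to the wrapped sequence, so for a `String` they still allocate; the savings come from `length`, `charAt`, `isEmpty` and `chars`, which read the wrapped sequence in place.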