Parse datetimes and timestamps with leading and/or trailing whitespace #5544
base: main
I wonder if, for Strptime patterns, we might not want to trim the parser spec as well: if the parser is %Y (this pattern only parses years, but that's not the point), it won't parse 2024 as valid because it will expect those spaces we just removed.
That makes sense. Let me add a commit on top of this PR.
@@ -36,22 +36,24 @@ pub fn parse_date_time_str(
    date_time_str: &str,
    date_time_formats: &[DateTimeInputFormat],
) -> Result<TantivyDateTime, String> {
    let trimmed_date_time_str = date_time_str.trim();
Suggested change:
-    let trimmed_date_time_str = date_time_str.trim();
+    let trimmed_date_time_str = date_time_str.trim_ascii();
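A minimal sketch of the trim-then-parse flow, under stated assumptions: `InputFormat` is a hypothetical, simplified stand-in for `DateTimeInputFormat` with two toy numeric formats, and the result is plain Unix seconds (`i64`) rather than `TantivyDateTime`. It trims once with `str::trim_ascii` (stable since Rust 1.80), as in the suggested change, and then tries each format in order.

```rust
/// Hypothetical, simplified stand-in for `DateTimeInputFormat`: only two toy
/// numeric formats are modeled, both producing Unix seconds.
#[derive(Debug, Clone, Copy)]
enum InputFormat {
    /// Integer seconds since the Unix epoch, e.g. "1700000000".
    UnixSeconds,
    /// Fractional seconds, e.g. "1700000000.5" (truncated to whole seconds).
    UnixSecondsFloat,
}

/// Parses an already-trimmed string with a single format. Rust's `i64` and
/// `f64` `FromStr` impls reject surrounding whitespace, so the caller trims.
fn parse_with_format(s: &str, input_format: InputFormat) -> Result<i64, String> {
    match input_format {
        InputFormat::UnixSeconds => s
            .parse::<i64>()
            .map_err(|err| format!("{input_format:?}: {err}")),
        InputFormat::UnixSecondsFloat => {
            let secs = s
                .parse::<f64>()
                .map_err(|err| format!("{input_format:?}: {err}"))?;
            // `f64` parsing accepts "inf" and "NaN"; reject those explicitly.
            if secs.is_finite() {
                Ok(secs.trunc() as i64)
            } else {
                Err(format!("{input_format:?}: non-finite value"))
            }
        }
    }
}

/// Trims ASCII whitespace once, then returns the first format that succeeds,
/// or an error listing every per-format failure.
fn parse_date_time_str(date_time_str: &str, formats: &[InputFormat]) -> Result<i64, String> {
    let trimmed = date_time_str.trim_ascii();
    let mut errors = Vec::new();
    for &input_format in formats {
        match parse_with_format(trimmed, input_format) {
            Ok(secs) => return Ok(secs),
            Err(err) => errors.push(err),
        }
    }
    Err(format!("failed to parse {date_time_str:?}: {}", errors.join("; ")))
}

fn main() {
    let formats = [InputFormat::UnixSeconds, InputFormat::UnixSecondsFloat];
    // Leading/trailing ASCII whitespace no longer causes a rejection.
    println!("{:?}", parse_date_time_str("  1700000000\n", &formats));
    println!("{:?}", parse_date_time_str("\t1700000000.5 ", &formats));
    // Without the trim, the integer parser rejects the same value.
    println!("{:?}", " 1700000000".parse::<i64>().is_err());
}
```

Trimming once, before the loop, keeps each per-format parser strict and avoids repeating the trim for every candidate format.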
It is probably minor, but trimming Unicode whitespace is considerably more expensive than trimming ASCII whitespace: it requires decoding UTF-8 and checking whether each individual char is whitespace.
I'd feel safer if we restricted ourselves to ASCII. The only downside is that we won't trim unusual whitespace such as the Japanese " ".
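To make the trade-off concrete, a small check (assuming Rust 1.80+ for `str::trim_ascii`) contrasts the two methods on the Japanese full-width space, U+3000 IDEOGRAPHIC SPACE, and on ordinary ASCII padding. It illustrates behavior only; it does not measure cost.

```rust
fn main() {
    // U+3000 IDEOGRAPHIC SPACE: the full-width space common in Japanese text.
    let ideographic = "\u{3000}2024\u{3000}";
    // `str::trim` strips any char with the Unicode White_Space property,
    // decoding UTF-8 as it goes, so U+3000 is removed.
    assert_eq!(ideographic.trim(), "2024");
    // `str::trim_ascii` operates on the underlying bytes and only strips
    // ASCII whitespace, so U+3000 is left in place.
    assert_eq!(ideographic.trim_ascii(), ideographic);

    // Ordinary ASCII padding (spaces, tabs, CR/LF) is removed by both.
    let ascii_padded = " \t2024\r\n";
    assert_eq!(ascii_padded.trim(), "2024");
    assert_eq!(ascii_padded.trim_ascii(), "2024");
    println!("ok");
}
```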
@guilload can you merge this?
and update tests to show pattern is already trimmed by default
It seems strptime is already lenient to whitespace inside the pattern (but not in the parsed string, which we trim ourselves anyway), so I updated the tests accordingly. The Java-compatible parser used for range requests doesn't have access to that trim code, I think, but it's not going to reject documents, so I don't think we necessarily care as much.
Description
Per title. Requested by Airmail. Supported by Elasticsearch (ES).
How was this PR tested?
make test-all