Allow parsing numbers w/ underscores (e.g. 1_000) from strings #868
src/input/input_json.rs
Outdated
JsonInput::String(str) => match str_as_int(self, str) {
    Ok(i) => Ok(i),
    Err(_) => str_as_int(self, &str.replace('_', "")),
},
The idea is to only eat the performance cost in the error case. This means that parsing 1_000 will be slower than 1000, but I think 1000 is more common so that should be okay.
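A minimal sketch of that fast-path / slow-path idea, using only the standard library. `parse_int_lax` is a hypothetical name and `i64` stands in for whatever integer type `str_as_int` actually produces; this is not the code from the PR.

```rust
/// Hypothetical helper (not from the PR): parse an integer and retry
/// without underscores only when the first attempt fails.
fn parse_int_lax(s: &str) -> Option<i64> {
    match s.parse::<i64>() {
        // Fast path: plain input such as "1000" never allocates.
        Ok(i) => Some(i),
        // Slow path: only inputs that already failed pay for the
        // allocation made by `replace`.
        Err(_) => s.replace('_', "").parse::<i64>().ok(),
    }
}
```

With this sketch, `parse_int_lax("1000")` takes the first arm, while `parse_int_lax("1_000")` fails once and then succeeds on the stripped copy. Note that this naive version also accepts inputs such as `5__0`, because every underscore is removed before the retry.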
To avoid duplication, should we put this inside str_as_int? It also has a strip_decimal_zeros path; we could combine the two stripping operations for efficiency's sake.
please review
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #868      +/-   ##
==========================================
+ Coverage   94.05%   94.06%   +0.01%
==========================================
  Files         102      102
  Lines       15028    15039      +11
  Branches       25       25
==========================================
+ Hits        14135    14147      +12
+ Misses        887      886       -1
  Partials        6        6

Continue to review full report in Codecov by Sentry.
I'd prefer to see these fixes moved inside of str_as_int (and a new one for float). I also have some concerns about edge cases.
src/input/input_json.rs
Outdated
- JsonInput::String(str) => str_as_int(self, str),
+ JsonInput::String(str) => match str_as_int(self, str) {
+     Ok(i) => Ok(i),
+     Err(_) => str_as_int(self, &str.replace('_', "")),
Some edge cases: trailing / leading underscores are not allowed in Python, nor is double-underscore.
>>> int("5__0")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '5__0'
>>> int("5_0_")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '5_0_'
>>> int("_5_0")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '_5_0'
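The three failures above all follow from one rule: in a base-10 string, an underscore is only accepted when it sits directly between two digits. A rough Rust sketch of that check follows; `underscores_well_placed` is a hypothetical name, not a function from this PR, and it only looks at underscore placement, not at whether the rest of the string is a valid number.

```rust
/// Hypothetical check (not from the PR): every '_' must sit directly
/// between two ASCII digits, which rules out leading, trailing, and
/// doubled underscores.
fn underscores_well_placed(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.iter().enumerate().all(|(i, &b)| {
        if b != b'_' {
            // Non-underscore bytes are left for the real parser to judge.
            return true;
        }
        let prev_is_digit = i > 0 && bytes[i - 1].is_ascii_digit();
        let next_is_digit = bytes.get(i + 1).map_or(false, |n| n.is_ascii_digit());
        prev_is_digit && next_is_digit
    })
}
```

Under this sketch `"1_000"` passes, while `"5__0"`, `"5_0_"`, and `"_5_0"` are all rejected, matching the `int()` behaviour shown above.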
Ugh annoying
src/input/input_json.rs
Outdated
Err(_) => match str.replace('_', "").parse::<f64>() {
    Ok(i) => Ok(EitherFloat::F64(i)),
    Err(_) => Err(ValError::new(ErrorTypeDefaults::FloatParsing, self)),
},
I think we should introduce str_as_float to avoid similar duplication.
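One possible shape for such a helper, as a standalone sketch. The name `str_as_float` comes from the comment above, but the signature and the `FloatParsingError` type are assumptions made for illustration; the real code would return the crate's own `ValError` and `EitherFloat` types.

```rust
/// Hypothetical stand-in for the crate's float parsing error; this is
/// not the real `ValError`.
#[derive(Debug, PartialEq)]
struct FloatParsingError;

/// Hypothetical `str_as_float`: a single entry point for string-to-float
/// parsing, so callers do not each repeat the underscore fallback.
fn str_as_float(s: &str) -> Result<f64, FloatParsingError> {
    match s.parse::<f64>() {
        // Plain input parses directly.
        Ok(f) => Ok(f),
        // Retry once with underscores removed; report a single error type.
        Err(_) => s
            .replace('_', "")
            .parse::<f64>()
            .map_err(|_| FloatParsingError),
    }
}
```

Each call site would then reduce to one call, for example `str_as_float("1_000.5")`, instead of a nested `match`.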
CodSpeed Performance Report
Merging #868 will degrade performances by 48.65%

| | Benchmark | main | parse-numbers-underscores | Change |
|---|---|---|---|---|
| ❌ | test_core_string_lax_wrong | 37.9 µs | 63.7 µs | -40.5% |
| ❌ | test_tagged_union_int_keys_python | 33 µs | 64.2 µs | -48.65% |
| 🆕 | test_decimal_from_string_pyd | N/A | 69.4 µs | N/A |
| 🆕 | test_decimal_from_string_core | N/A | 69.9 µs | N/A |
| 🆕 | test_decimal_from_string_limit | N/A | 23.6 µs | N/A |
For anyone wondering, the decision to add this is (as I understand it) because in V1 we used
please review
@davidhewitt are we ready to merge this?
Much better! Just two tiny nits. Also I'll try to fix the integration tests shortly.
/// and if it's not subsequent parsing will just fail
fn strip_underscores(s: &str) -> Option<String> {
    if s.starts_with('_') || s.ends_with('_') || !s.contains('_') || s.contains("__") {
        // no underscores to strip
It may be nice to add a comment here explaining why startswith / endswith / __ are rejected, because otherwise this comment is just a touch confusing.
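For illustration, the function could be completed with such a comment roughly as follows. Only the `if` condition and the signature come from the diff above; the `None` branch, the `replace` call, and the comment wording are assumptions about how the truncated snippet continues.

```rust
/// Hypothetical completion of `strip_underscores`; only the signature
/// and the condition are taken from the diff under review.
fn strip_underscores(s: &str) -> Option<String> {
    // Leading, trailing, and doubled underscores are not valid in Python
    // numeric strings, so those inputs are deliberately left untouched and
    // will fail in the normal parser. Inputs without any underscore have
    // nothing to strip.
    if s.starts_with('_') || s.ends_with('_') || !s.contains('_') || s.contains("__") {
        None
    } else {
        // Assumed behaviour: remove the remaining single underscores.
        Some(s.replace('_', ""))
    }
}
```

Returning `None` for both the "nothing to strip" and the "invalid placement" cases lets the caller skip the second parse attempt entirely.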
Co-authored-by: David Hewitt <[email protected]>
Fixes pydantic/pydantic#7053
Selected Reviewer: @samuelcolvin