Allow parsing numbers w/ underscores (e.g. 1_000) from strings #868

adriangb · 2023-08-09T14:08:33Z

adriangb · 2023-08-09T14:09:21Z

src/input/input_json.rs

+            JsonInput::String(str) => match str_as_int(self, str) {
+                Ok(i) => Ok(i),
+                Err(_) => str_as_int(self, &str.replace('_', "")),
+            },


The idea is to only eat the performance cost in the error case. This means that parsing 1_000 will be slower than 1000 but I think 1000 is more common so that should be okay.

To avoid duplication, should we put this inside str_as_int? It also has a strip_decimal_zeros path, so we could combine the two stripping operations for efficiency's sake.
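A minimal sketch of the "only pay on the error path" idea discussed above, moved into a single helper. `parse_int_lenient` is a hypothetical name, and plain `i64` and `Option` stand in for the crate's own value and error types, so this is an illustration rather than the code in the PR.

```rust
/// Hypothetical stand-in for a `str_as_int`-style helper (illustrative only).
fn parse_int_lenient(s: &str) -> Option<i64> {
    // Fast path: strings without underscores parse directly, with no allocation.
    if let Ok(i) = s.parse::<i64>() {
        return Some(i);
    }
    // Slow path, reached only after a failed parse: strip underscores and retry.
    // Note that this naive stripping also accepts inputs such as "5__0".
    s.replace('_', "").parse::<i64>().ok()
}
```

With this shape, "1000" never touches `replace`, while "1_000" costs one failed parse plus one allocation.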

adriangb · 2023-08-09T14:09:28Z

please review

codecov · 2023-08-09T14:12:12Z

Codecov Report

Merging #868 (8aece8a) into main (3f7df80) will increase coverage by 0.01%.
Report is 4 commits behind head on main.
The diff coverage is 100.00%.

❗ Current head 8aece8a differs from pull request most recent head fb7c5a6. Consider uploading reports for the commit fb7c5a6 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #868      +/-   ##
==========================================
+ Coverage   94.05%   94.06%   +0.01%     
==========================================
  Files         102      102              
  Lines       15028    15039      +11     
  Branches       25       25              
==========================================
+ Hits        14135    14147      +12     
+ Misses        887      886       -1     
  Partials        6        6

Files Changed	Coverage Δ
src/input/input_json.rs	`91.55% <100.00%> (+0.22%)`	⬆️
src/input/input_python.rs	`98.14% <100.00%> (-0.01%)`	⬇️
src/input/shared.rs	`96.59% <100.00%> (+0.81%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3f7df80...fb7c5a6.

davidhewitt

I'd prefer to see these fixes moved inside str_as_int (with a new equivalent for floats). I also have some concerns about edge cases.

davidhewitt · 2023-08-09T14:12:18Z

src/input/input_json.rs

-            JsonInput::String(str) => str_as_int(self, str),
+            JsonInput::String(str) => match str_as_int(self, str) {
+                Ok(i) => Ok(i),
+                Err(_) => str_as_int(self, &str.replace('_', "")),


Some edge cases: trailing / leading underscores are not allowed in Python, nor is double-underscore.

>>> int("5__0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '5__0'
>>> int("5_0_")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '5_0_'
>>> int("_5_0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '_5_0'

Ugh annoying
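One way to express these edge cases is a predicate that accepts an underscore only when it sits between two digits. This is a hedged sketch: `underscores_between_digits` is a hypothetical helper, not something in the PR, and requiring a digit on both sides is an assumption that is slightly stricter than the three cases quoted above.

```rust
/// Hypothetical helper: true when every underscore in `s` has an ASCII digit
/// immediately before and after it. Strings without underscores pass trivially.
fn underscores_between_digits(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.iter().enumerate().all(|(i, &b)| {
        if b != b'_' {
            return true;
        }
        // A leading underscore has no previous byte; a trailing one has no next byte.
        let prev_is_digit = i > 0 && bytes[i - 1].is_ascii_digit();
        let next_is_digit = bytes.get(i + 1).map_or(false, |n| n.is_ascii_digit());
        prev_is_digit && next_is_digit
    })
}
```

Under this rule "5_0" passes, while "5__0", "5_0_" and "_5_0" are all rejected, matching the Python session above.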

davidhewitt · 2023-08-09T14:13:08Z

src/input/input_json.rs

+            JsonInput::String(str) => match str_as_int(self, str) {
+                Ok(i) => Ok(i),
+                Err(_) => str_as_int(self, &str.replace('_', "")),
+            },


To avoid duplication, should we put this inside str_as_int? It also has a strip_decimal_zeros path, so we could combine the two stripping operations for efficiency's sake.

davidhewitt · 2023-08-09T14:14:06Z

src/input/input_json.rs

+                Err(_) => match str.replace('_', "").parse::<f64>() {
+                    Ok(i) => Ok(EitherFloat::F64(i)),
+                    Err(_) => Err(ValError::new(ErrorTypeDefaults::FloatParsing, self)),
+                },


I think we should introduce str_as_float to avoid similar duplication.
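As a rough illustration of how the int and float paths could share one fallback instead of repeating it, here is a sketch that is generic over the target type. `parse_with_underscore_fallback` is a hypothetical name, and returning `Option` instead of the crate's `ValError` is a simplification; the PR itself introduces a separate str_as_float rather than a generic helper.

```rust
use std::str::FromStr;

/// Hypothetical shared helper: try a direct parse first, and only strip
/// underscores and retry when that fails.
fn parse_with_underscore_fallback<T: FromStr>(s: &str) -> Option<T> {
    s.parse::<T>()
        .ok()
        .or_else(|| s.replace('_', "").parse::<T>().ok())
}
```

The same function then serves both call sites, for example `parse_with_underscore_fallback::<i64>("1_000")` and `parse_with_underscore_fallback::<f64>("1_000.5")`.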

codspeed-hq · 2023-08-09T14:16:59Z

CodSpeed Performance Report

Merging #868 will degrade performances by 48.65%

Comparing parse-numbers-underscores (fb7c5a6) with main (3f7df80)


Summary

❌ 2 regressions
✅ 133 untouched benchmarks

🆕 3 new benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`main`	`parse-numbers-underscores`	Change
❌	`test_core_string_lax_wrong`	37.9 µs	63.7 µs	-40.5%
❌	`test_tagged_union_int_keys_python`	33 µs	64.2 µs	-48.65%
🆕	`test_decimal_from_string_pyd`	N/A	69.4 µs	N/A
🆕	`test_decimal_from_string_core`	N/A	69.9 µs	N/A
🆕	`test_decimal_from_string_limit`	N/A	23.6 µs	N/A

davidhewitt · 2023-08-10T07:24:29Z

For anyone wondering, the decision to add this is (as I understand it) because in V1 we used int(s) and float(s) to parse strings, which do accept these underscores. So this is a regression from V1.

adriangb · 2023-08-10T14:44:06Z

please review

adriangb · 2023-08-10T14:44:16Z

@davidhewitt are we ready to merge this?

davidhewitt

Much better! Just two tiny nits. Also I'll try to fix the integration tests shortly.

davidhewitt · 2023-08-10T15:37:51Z

src/input/shared.rs

+/// and if it's not subsequent parsing will just fail
+fn strip_underscores(s: &str) -> Option<String> {
+    if s.starts_with('_') || s.ends_with('_') || !s.contains('_') || s.contains("__") {
+        // no underscores to strip


It may be nice to add a comment here explaining why startswith / endswith / __ are rejected, because otherwise this comment is just a touch confusing.
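A sketch of what the commented function might look like. The excerpt above cuts off after the condition, so the two branches below are an assumption about the rest of the body, and the name `strip_underscores_sketch` is deliberately different from the real function to make that clear.

```rust
/// Hypothetical completion of the excerpt; the branch bodies are assumed.
fn strip_underscores_sketch(s: &str) -> Option<String> {
    // Python rejects leading, trailing and doubled underscores, so such strings
    // are left untouched here and the caller's parse fails as it did before.
    // Strings with no underscore at all have nothing to strip either.
    if s.starts_with('_') || s.ends_with('_') || !s.contains('_') || s.contains("__") {
        None
    } else {
        Some(s.replace('_', ""))
    }
}
```

Returning `None` for both "nothing to strip" and "invalid placement" lets the caller skip the second parse attempt in either case.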

Allow parsing numbers w/ underscores (e.g. 1_000) from strings

9a52223

adriangb commented Aug 9, 2023

pydantic-hooky bot added the ready for review label Aug 9, 2023

pydantic-hooky bot assigned samuelcolvin Aug 9, 2023

davidhewitt requested changes Aug 9, 2023

pydantic-hooky bot added awaiting author revision and removed ready for review labels Aug 9, 2023

pydantic-hooky bot assigned adriangb and unassigned samuelcolvin Aug 9, 2023

adriangb added 2 commits August 9, 2023 11:06

Move into shared functions

815662a

switch order

8aece8a

pydantic-hooky bot added ready for review and removed awaiting author revision labels Aug 10, 2023

pydantic-hooky bot assigned samuelcolvin and unassigned adriangb Aug 10, 2023

davidhewitt requested changes Aug 10, 2023

pydantic-hooky bot added awaiting author revision and removed ready for review labels Aug 10, 2023

pydantic-hooky bot assigned adriangb and unassigned samuelcolvin Aug 10, 2023

adriangb and others added 2 commits August 10, 2023 10:40

Update src/input/shared.rs

a8230a8

Co-authored-by: David Hewitt <[email protected]>

Fix str_as_float

fb7c5a6

adriangb enabled auto-merge (squash) August 10, 2023 15:51

davidhewitt approved these changes Aug 10, 2023

adriangb merged commit 87b4789 into main Aug 10, 2023
28 of 29 checks passed

adriangb deleted the parse-numbers-underscores branch August 10, 2023 15:57
