Stricter validation of RFC 6901 array indices #80

asmello · 2024-10-18T21:45:01Z

RFC 6901 is very strict about what it allows as an array index. Leading zeros are disallowed, and only digit characters are valid.

I had previously added a check for the leading zeros, but as it turns out that wasn't sufficient. Rust permits + as a prefix in a stringified usize, so we're currently more permissive than RFC 6901.

I think with this explicit check we finally reach exact parity with RFC 6901 in this aspect. is_ascii_digit is documented to check for the U+0030 ‘0’ ..= U+0039 ‘9’. range exactly, which is what we need.

This is a bug fix, though technically breaking.

codecov-commenter · 2024-10-18T21:48:47Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.1%. Comparing base (9a15d91) to head (31c7965).

Additional details and impacted files

Files with missing lines	Coverage Δ
src/assign.rs	`99.6% <100.0%> (+<0.1%)`	⬆️
src/index.rs	`100.0% <100.0%> (ø)`

asmello · 2024-10-18T22:02:33Z

Unsurprisingly, this is failing the semver check since I added a new error variant. I don't expect this to happen again, so I don't think we should have #[non_exhaustive], and should instead just accept the break. After all, semantically this is a break anyway (we're failing inputs we weren't before).

I haven't used semver-checks before, so not sure what the usual procedure is, maybe we just let the check fail and merge anyway? Unsure if there's a way to tell it we acknowledge and accept the break so the job passes.

chanced · 2024-10-21T00:04:25Z

You know, at this point, I must have read that constraint at least 5 times. There was even a moment when I mused on their insistence on no leading zeroes.

At no point did my brain recognize that I should actually do something about it. Like it just didn't even register as a potential task.

I truly appreciate your help.

chanced · 2024-10-21T01:10:05Z

Yea, semver is getting bumped anyway. I'd much rather not go with #[non_exhaustive]. I seriously doubt it is going to change again. Even if there turns out to be something we missed, I think the variants on errors will provide enough cover. I do regret naming most errors redundantly though.

Awesome catch on the +!

asmello · 2024-10-21T16:11:04Z

Awesome catch on the +!

Can't take credit, this was a test on json-patch (awesome of them to catch that, but to be fair they have had lots more user testing).

You know, at this point, I must have read that constraint at least 5 times. There was even a moment when I mused on their insistence on no leading zeroes.

It's pretty subtle. I previously did consider doing an explicit check for all digits, but in my mind the combination of checking for leading zeros + parsing usize was absolutely equivalent, and I didn't want to pay the price of the extra check. But alas, didn't know Rust accepted + either (and who knows what else - oh wait, I guess it's specified here).

Now I'm wondering if the String containing the unique characters was overkill. It's very unlikely std will change the parsing logic, as that'd be a breaking change. What do you think? Shall we reconsider before the next release?

chanced · 2024-10-21T18:59:45Z

I'm currently leaning toward just keeping the string given its very likely on an error path at that point but then again, perhaps not.

asmello · 2024-10-21T19:06:24Z

Yeah, give it some thought. It's not terrible to keep the string, if we decide to. The error path is very unlikely so performance isn't really a concern. Just think it's best we make a conscious call now before committing to it in a release.

chanced · 2024-10-21T19:33:41Z

We could add the offset of the first invalid character from within the token. We do that with InvalidEncodingError. It'd help facilitate displaying the error to the user.

asmello · 2024-10-21T20:15:08Z

That's not a bad idea. It's generalisable when if we ever do need to support other failure modes. I like it.

asmello · 2024-10-21T21:21:44Z

Updated #94 to reflect this. I've now made the error more complex than before, but I think it's worth it.

BTW I wonder if we should explore integrating with miette for nicer diagnostic messages. It was designed for helping with designing errors that include a source and offset (which we often do). We probably don't need / shouldn't use the dynamic part though, we already got plenty good custom errors (miette would mostly help with rendering them nicely). Might be tricky to avoid std though...

chanced · 2024-10-22T11:59:15Z

Oh, I forgot all about miette! Reading up now.

asmello added 2 commits October 18, 2024 22:43

stricter indices

14a9067

unique chars

c4440b5

remove std usage

66c72fb

make codecov happy

5bc7a6e

asmello requested a review from chanced October 18, 2024 22:06

asmello mentioned this pull request Oct 18, 2024

[wip] take better advantage of jsonptr idubrov/json-patch#45

Draft

update changelog

7b734cc

Merge branch 'main' into am/digits

31c7965

chanced merged commit adfc513 into chanced:main Oct 21, 2024
19 of 20 checks passed

asmello deleted the am/digits branch October 21, 2024 16:11

asmello mentioned this pull request Oct 21, 2024

Improve ParseIndexError::InvalidCharacter error #94

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stricter validation of RFC 6901 array indices #80

Stricter validation of RFC 6901 array indices #80

asmello commented Oct 18, 2024

codecov-commenter commented Oct 18, 2024 •

edited

Loading

asmello commented Oct 18, 2024

chanced commented Oct 21, 2024 •

edited

Loading

chanced commented Oct 21, 2024

asmello commented Oct 21, 2024

chanced commented Oct 21, 2024

asmello commented Oct 21, 2024

chanced commented Oct 21, 2024 •

edited

Loading

asmello commented Oct 21, 2024

asmello commented Oct 21, 2024

chanced commented Oct 22, 2024

Stricter validation of RFC 6901 array indices #80

Stricter validation of RFC 6901 array indices #80

Conversation

asmello commented Oct 18, 2024

codecov-commenter commented Oct 18, 2024 • edited Loading

Codecov Report

asmello commented Oct 18, 2024

chanced commented Oct 21, 2024 • edited Loading

chanced commented Oct 21, 2024

asmello commented Oct 21, 2024

chanced commented Oct 21, 2024

asmello commented Oct 21, 2024

chanced commented Oct 21, 2024 • edited Loading

asmello commented Oct 21, 2024

asmello commented Oct 21, 2024

chanced commented Oct 22, 2024

codecov-commenter commented Oct 18, 2024 •

edited

Loading

chanced commented Oct 21, 2024 •

edited

Loading

chanced commented Oct 21, 2024 •

edited

Loading