Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stricter validation of RFC 6901 array indices #80

Merged
merged 6 commits into from
Oct 21, 2024

Conversation

asmello
Copy link
Collaborator

@asmello asmello commented Oct 18, 2024

RFC 6901 is very strict about what it allows as an array index. Leading zeros are disallowed, and only digit characters are valid.

I had previously added a check for the leading zeros, but as it turns out that wasn't sufficient. Rust permits + as a prefix in a stringified usize, so we're currently more permissive than RFC 6901.

I think with this explicit check we finally reach exact parity with RFC 6901 in this aspect. is_ascii_digit is documented to check for the U+0030 ‘0’ ..= U+0039 ‘9’. range exactly, which is what we need.

This is a bug fix, though technically breaking.

@codecov-commenter
Copy link

codecov-commenter commented Oct 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.1%. Comparing base (9a15d91) to head (31c7965).

Additional details and impacted files
Files with missing lines Coverage Δ
src/assign.rs 99.6% <100.0%> (+<0.1%) ⬆️
src/index.rs 100.0% <100.0%> (ø)

@asmello
Copy link
Collaborator Author

asmello commented Oct 18, 2024

Unsurprisingly, this is failing the semver check since I added a new error variant. I don't expect this to happen again, so I don't think we should have #[non_exhaustive], and should instead just accept the break. After all, semantically this is a break anyway (we're failing inputs we weren't before).

I haven't used semver-checks before, so not sure what the usual procedure is, maybe we just let the check fail and merge anyway? Unsure if there's a way to tell it we acknowledge and accept the break so the job passes.

@chanced
Copy link
Owner

chanced commented Oct 21, 2024

You know, at this point, I must have read that constraint at least 5 times. There was even a moment when I mused on their insistence on no leading zeroes.

At no point did my brain recognize that I should actually do something about it. Like it just didn't even register as a potential task.

I truly appreciate your help.

@chanced
Copy link
Owner

chanced commented Oct 21, 2024

Yea, semver is getting bumped anyway. I'd much rather not go with #[non_exhaustive]. I seriously doubt it is going to change again. Even if there turns out to be something we missed, I think the variants on errors will provide enough cover. I do regret naming most errors redundantly though.

Awesome catch on the +!

@chanced chanced merged commit adfc513 into chanced:main Oct 21, 2024
19 of 20 checks passed
@asmello
Copy link
Collaborator Author

asmello commented Oct 21, 2024

Awesome catch on the +!

Can't take credit, this was a test on json-patch (awesome of them to catch that, but to be fair they have had lots more user testing).

You know, at this point, I must have read that constraint at least 5 times. There was even a moment when I mused on their insistence on no leading zeroes.

It's pretty subtle. I previously did consider doing an explicit check for all digits, but in my mind the combination of checking for leading zeros + parsing usize was absolutely equivalent, and I didn't want to pay the price of the extra check. But alas, didn't know Rust accepted + either (and who knows what else - oh wait, I guess it's specified here).

Now I'm wondering if the String containing the unique characters was overkill. It's very unlikely std will change the parsing logic, as that'd be a breaking change. What do you think? Shall we reconsider before the next release?

@chanced
Copy link
Owner

chanced commented Oct 21, 2024

I'm currently leaning toward just keeping the string given its very likely on an error path at that point but then again, perhaps not.

@asmello
Copy link
Collaborator Author

asmello commented Oct 21, 2024

Yeah, give it some thought. It's not terrible to keep the string, if we decide to. The error path is very unlikely so performance isn't really a concern. Just think it's best we make a conscious call now before committing to it in a release.

@chanced
Copy link
Owner

chanced commented Oct 21, 2024

We could add the offset of the first invalid character from within the token. We do that with InvalidEncodingError. It'd help facilitate displaying the error to the user.

@asmello
Copy link
Collaborator Author

asmello commented Oct 21, 2024

That's not a bad idea. It's generalisable when if we ever do need to support other failure modes. I like it.

@asmello
Copy link
Collaborator Author

asmello commented Oct 21, 2024

Updated #94 to reflect this. I've now made the error more complex than before, but I think it's worth it.

BTW I wonder if we should explore integrating with miette for nicer diagnostic messages. It was designed for helping with designing errors that include a source and offset (which we often do). We probably don't need / shouldn't use the dynamic part though, we already got plenty good custom errors (miette would mostly help with rendering them nicely). Might be tricky to avoid std though...

@chanced
Copy link
Owner

chanced commented Oct 22, 2024

Oh, I forgot all about miette! Reading up now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants