Treat gc=No characters as numeric #51609
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Mark-Simulacrum (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
r? @SimonSapin |
#15283 changed from literally testing Changing the behavior now (even if back from a bug) after 4 years might be considered breaking, but we already do that when updating to a new version of Unicode. (Though only code points that were unassigned in the earlier version are affected.) @rust-lang/libs, any thoughts on the stability aspect? |
By the way, I’m very late for this conversation since this method has been stable since 1.0, but I wondered what it would be used for. In https://github.com/search?l=Rust&p=1&q=.is_numeric%28%29&type=Code most of the results seem to be one of:
If this method was proposed now I’d argue against including it in the standard library since it’s usually not what people are looking for. |
Should we just deprecate this entirely? I agree that is_ascii_digit is almost all of the time what you're actually looking for. |
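To illustrate the point about `is_ascii_digit`, here is a minimal sketch of the mismatch between `char::is_numeric` and the standard parsing routines. The helper `parse_ascii_number` is hypothetical and exists only for this example; the other calls are ordinary `std` methods.

```rust
/// Hypothetical helper for illustration: parse a string only if every
/// character is an ASCII digit ('0'..='9').
fn parse_ascii_number(s: &str) -> Option<u32> {
    if !s.is_empty() && s.chars().all(|c| c.is_ascii_digit()) {
        // `ok()` also turns an overflow error into `None`.
        s.parse().ok()
    } else {
        None
    }
}

fn main() {
    // U+0663 ARABIC-INDIC DIGIT THREE has general category Nd.
    let arabic_three = '\u{0663}';

    // Unicode-aware check says "numeric"...
    assert!(arabic_three.is_numeric());
    // ...but the ASCII check and the digit conversion both reject it.
    assert!(!arabic_three.is_ascii_digit());
    assert_eq!(arabic_three.to_digit(10), None);

    // Integer parsing in `std` accepts ASCII digits only.
    assert!("\u{0663}".parse::<u32>().is_err());

    assert_eq!(parse_ascii_number("123"), Some(123));
    assert_eq!(parse_ascii_number("\u{0661}\u{0662}\u{0663}"), None);
}
```

Code that filters input with `is_numeric` and then calls `parse` or `to_digit` can therefore accept characters that the later step rejects.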
I’d be ok with deprecating this and (some?) other Unicode character database-based Back to this PR: regardless of deprecation this function will stick around so we need to decide on its behavior. I think it’s clear that this PR is a bug fix that restores the behavior to what was intended (and initially implemented). However the buggy behavior has been around for 4 years, and even documented with examples. (This PR is changing the doctest.) |
I think that the doctests should be treated as more important documentation than the text itself. IMHO the text should be updated to match the doctest, not the other way around. I agree with deprecating, too. I honestly can't think of any specific reason why code which requires this specific definition of numeric wouldn't be using a crate like |
I would personally be in favor of landing this patch and continuing to keep these methods. While this contradicts the doctests it doesn't contradict the documentation nor the naive interpretation of this and other While this can be a footgun sometimes it doesn't mean that it's always a footgun. If we really want to remove the method then we could consider renaming it to |
Ping from triage @dscorbett, we haven't heard from you for a while. Will you have time to look into this PR? |
I agree with @alexcrichton: the documentation defines the function in terms of Nd, Nl, and No, so that is what the functions should do. The two contradictory cases in the doctests are not as important, because readers are, I think, more likely to read the main documentation carefully than the tests; but if they do read the tests, they might just conclude that Unicode doesn’t consider those two characters numeric. (In fact, I initially didn’t remove the test for Contra @SimonSapin, Unicode does not guarantee that general categories are stable, even for assigned code points. For example, in past versions, U+16EE RUNIC ARLAUG SYMBOL changed from No to Nl, U+2160 ROMAN NUMERAL ONE from So to Nl, and U+19DA NEW TAI LUE THAM DIGIT ONE from Nd to No. It is therefore a mistake for a client of these functions to rely on any particular behavior for any given code point. I searched for uses of |
Ping from triage! @alexcrichton / @SimonSapin: How should this PR move forward? |
Do others from @rust-lang/libs feel opposed to merging this? I've outlined above why I think it's fine to land this and why I don't think we're ready yet to deprecate this method |
Ok! Sounds like not a huge amount of thoughts, so let's... @bors: r+ |
📌 Commit 5150ff0 has been approved by |
Treat gc=No characters as numeric
[`char::is_numeric`](https://doc.rust-lang.org/std/primitive.char.html#method.is_numeric) and [`char::is_alphanumeric`](https://doc.rust-lang.org/std/primitive.char.html#method.is_alphanumeric) are documented to be defined “in terms of the Unicode General Categories 'Nd', 'Nl', 'No'”, but unicode.py does not group 'No' with the other 'N' categories. These functions therefore currently return `false` for characters like ⟨¾⟩ and ⟨①⟩.
☀️ Test successful - status-appveyor, status-travis |
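As a rough sketch of the behavior described in this PR, the following checks one character from each of the three documented categories. It assumes a toolchain that includes this change; on an older toolchain the two gc=No assertions would be expected to fail. The helper `numeric_flags` is hypothetical and only bundles the two method calls.

```rust
/// Hypothetical helper for illustration: returns
/// `(c.is_numeric(), c.is_alphanumeric())`.
fn numeric_flags(c: char) -> (bool, bool) {
    (c.is_numeric(), c.is_alphanumeric())
}

fn main() {
    // gc=Nd: DIGIT SEVEN
    assert_eq!(numeric_flags('7'), (true, true));
    // gc=Nl: U+2160 ROMAN NUMERAL ONE
    assert_eq!(numeric_flags('\u{2160}'), (true, true));
    // gc=No: U+00BE VULGAR FRACTION THREE QUARTERS
    assert_eq!(numeric_flags('\u{00BE}'), (true, true));
    // gc=No: U+2460 CIRCLED DIGIT ONE
    assert_eq!(numeric_flags('\u{2460}'), (true, true));
    // A letter is alphanumeric but not numeric.
    assert_eq!(numeric_flags('K'), (false, true));
    // A space is neither.
    assert_eq!(numeric_flags(' '), (false, false));
}
```

As noted in the discussion above, general categories are not guaranteed to stay stable across Unicode versions, so assertions on individual code points like these are illustrative rather than something to rely on.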