Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(Data/UInt): deprecate UInt8.isUpper etc #13521

Closed
wants to merge 2 commits into from
Closed

Conversation

kim-em
Copy link
Contributor

@kim-em kim-em commented Jun 4, 2024

I will be moving Char.toUInt8 and Char.ofUInt8 up to Lean shortly. (Done now in leanprover/lean4#4348.) This deprecates some functions that have bad semantics: if you want to treat your UInt8 as a Char, do that explicitly!

This will be a short deprecation cycle, as we'd like to get this stuff out of Mathlib asap.


Open in Gitpod

@digama0
Copy link
Member

digama0 commented Jun 4, 2024

This seems more semantically suspect than what is being deprecated. Either Char.isUpper is unicode aware, in which case UInt8.isUpper can do it faster, or it is not in which case it has a bad name and should be renamed to Char.isAsciiUpper (and UInt8.isUpper can probably still do it faster).

@digama0
Copy link
Member

digama0 commented Jun 4, 2024

Note that the original purpose of these functions is for being able to efficiently process large (validated) ASCII byte strings without the size or decoding cost of UTF-8 and unicode-aware string processing. Arguably this should be replaced by def AsciiChar := UInt8, but it seems difficult to make this play well with ByteArray. Also, Rust does it.

Mathlib/Data/Char.lean Outdated Show resolved Hide resolved
Co-authored-by: Richard Osborn <[email protected]>
@kim-em
Copy link
Contributor Author

kim-em commented Jun 5, 2024

I'm pretty frustrated here, to be honest. I thought making this PR would be a helpful compromise rather than just deleting this section outright, because you care about deprecating things before deletion.

@rosborn
Copy link
Collaborator

rosborn commented Jun 5, 2024

I agree that an Ascii type would be better than pretending UInt8 is Ascii, but are people needing to process large Ascii bytestrings? Can't they just assume its UTF-8?

@digama0
Copy link
Member

digama0 commented Jun 5, 2024

I'm pretty frustrated here, to be honest. I thought making this PR would be a helpful compromise rather than just deleting this section outright, because you care about deprecating things before deletion.

I don't really agree they should be deprecated at all, they serve a useful purpose. The rust function puts Ascii in the name of the function, and I would agree with adding it to the name of the function here, but deletion is not appropriate.

I agree that an Ascii type would be better than pretending UInt8 is Ascii, but are people needing to process large Ascii bytestrings? Can't they just assume its UTF-8?

This was added precisely because I had a project that was measurably improved by the use of this function. UTF-8 decoding isn't free, especially when it's done in the lean interpreter. And much more important than the 32->8 bit part is not having to handle multilingual case-folding.

@kim-em
Copy link
Contributor Author

kim-em commented Jun 5, 2024

@digama0, I agree these functions could well be useful to someone. However they should not live in Mathlib.

@kim-em
Copy link
Contributor Author

kim-em commented Jun 5, 2024

I would be content to change the deprecation message to instead not give a replacement suggestion, and to add a module doc advising that if any downstream users are using these that they should find a new home for them (possibly at the point of use) because they will be removed from Mathlib.

@Parcly-Taxel
Copy link
Collaborator

I'm pretty frustrated here, to be honest. I thought making this PR would be a helpful compromise rather than just deleting this section outright, because you care about deprecating things before deletion.

Go on, just delete the section outright.

@eric-wieser
Copy link
Member

Arguably this should be replaced by def AsciiChar := UInt8, but it seems difficult to make this play well with ByteArray.

Could you elaborate on this? Putting things in a new namespace seems like an easy way to get out of the way of core

@kbuzzard
Copy link
Member

kbuzzard commented Jun 5, 2024

I am confused. Isn't the point of this exercise to get the material which is only relevant for program verification and which doesn't involve mathlib notions and which is irrelevant for mathematics, out of mathlib? Why do we want to put it in a new namespace rather than putting it in the hands of the Lean developers? It's only there at all because of some historical coincidence.

@eric-wieser
Copy link
Member

I would be content to change the deprecation message to instead not give a replacement suggestion, and to add a module doc advising that if any downstream users are using these that they should find a new home for them (possibly at the point of use) because they will be removed from Mathlib.

This sounds better than the current state of this PR to me, but it would be good to wait to see what Mario thinks about it.

@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added the merge-conflict The PR has a merge conflict with master, and needs manual merging. (this label is managed by a bot) label Jun 7, 2024
@kim-em kim-em closed this Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-conflict The PR has a merge conflict with master, and needs manual merging. (this label is managed by a bot)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants