chore(Data/UInt): deprecate UInt8.isUpper etc #13521

kim-em · 2024-06-04T23:06:20Z

I will be moving Char.toUInt8 and Char.ofUInt8 up to Lean shortly. (Done now in leanprover/lean4#4348.) This deprecates some functions that have bad semantics: if you want to treat your UInt8 as a Char, do that explicitly!

This will be a short deprecation cycle, as we'd like to get this stuff out of Mathlib asap.

digama0 · 2024-06-04T23:37:39Z

This seems more semantically suspect than what is being deprecated. Either Char.isUpper is unicode aware, in which case UInt8.isUpper can do it faster, or it is not in which case it has a bad name and should be renamed to Char.isAsciiUpper (and UInt8.isUpper can probably still do it faster).

digama0 · 2024-06-04T23:42:03Z

Note that the original purpose of these functions is for being able to efficiently process large (validated) ASCII byte strings without the size or decoding cost of UTF-8 and unicode-aware string processing. Arguably this should be replaced by def AsciiChar := UInt8, but it seems difficult to make this play well with ByteArray. Also, Rust does it.

Mathlib/Data/Char.lean

Co-authored-by: Richard Osborn <[email protected]>

kim-em · 2024-06-05T00:19:59Z

I'm pretty frustrated here, to be honest. I thought making this PR would be a helpful compromise rather than just deleting this section outright, because you care about deprecating things before deletion.

rosborn · 2024-06-05T00:38:00Z

I agree that an Ascii type would be better than pretending UInt8 is Ascii, but are people needing to process large Ascii bytestrings? Can't they just assume its UTF-8?

digama0 · 2024-06-05T00:47:08Z

I'm pretty frustrated here, to be honest. I thought making this PR would be a helpful compromise rather than just deleting this section outright, because you care about deprecating things before deletion.

I don't really agree they should be deprecated at all, they serve a useful purpose. The rust function puts Ascii in the name of the function, and I would agree with adding it to the name of the function here, but deletion is not appropriate.

I agree that an Ascii type would be better than pretending UInt8 is Ascii, but are people needing to process large Ascii bytestrings? Can't they just assume its UTF-8?

This was added precisely because I had a project that was measurably improved by the use of this function. UTF-8 decoding isn't free, especially when it's done in the lean interpreter. And much more important than the 32->8 bit part is not having to handle multilingual case-folding.

kim-em · 2024-06-05T01:46:01Z

@digama0, I agree these functions could well be useful to someone. However they should not live in Mathlib.

kim-em · 2024-06-05T05:33:52Z

I would be content to change the deprecation message to instead not give a replacement suggestion, and to add a module doc advising that if any downstream users are using these that they should find a new home for them (possibly at the point of use) because they will be removed from Mathlib.

Parcly-Taxel · 2024-06-05T05:37:17Z

I'm pretty frustrated here, to be honest. I thought making this PR would be a helpful compromise rather than just deleting this section outright, because you care about deprecating things before deletion.

Go on, just delete the section outright.

eric-wieser · 2024-06-05T07:57:05Z

Arguably this should be replaced by def AsciiChar := UInt8, but it seems difficult to make this play well with ByteArray.

Could you elaborate on this? Putting things in a new namespace seems like an easy way to get out of the way of core

kbuzzard · 2024-06-05T09:01:50Z

I am confused. Isn't the point of this exercise to get the material which is only relevant for program verification and which doesn't involve mathlib notions and which is irrelevant for mathematics, out of mathlib? Why do we want to put it in a new namespace rather than putting it in the hands of the Lean developers? It's only there at all because of some historical coincidence.

eric-wieser · 2024-06-05T09:29:39Z

I would be content to change the deprecation message to instead not give a replacement suggestion, and to add a module doc advising that if any downstream users are using these that they should find a new home for them (possibly at the point of use) because they will be removed from Mathlib.

This sounds better than the current state of this PR to me, but it would be good to wait to see what Mario thinks about it.

chore(Data/UInt): deprecate UInt8.isUpper etc

10e3e32

kim-em added the awaiting-review label Jun 4, 2024

rosborn requested changes Jun 4, 2024

View reviewed changes

Mathlib/Data/Char.lean Outdated Show resolved Hide resolved

Update Mathlib/Data/Char.lean

4edb6b9

Co-authored-by: Richard Osborn <[email protected]>

rosborn approved these changes Jun 5, 2024

View reviewed changes

leanprover-community-mathlib4-bot added the merge-conflict The PR has a merge conflict with master, and needs manual merging. (this label is managed by a bot) label Jun 7, 2024

kim-em closed this Jun 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(Data/UInt): deprecate UInt8.isUpper etc #13521

chore(Data/UInt): deprecate UInt8.isUpper etc #13521

kim-em commented Jun 4, 2024 •

edited

Loading

digama0 commented Jun 4, 2024

digama0 commented Jun 4, 2024

kim-em commented Jun 5, 2024

rosborn commented Jun 5, 2024 •

edited

Loading

digama0 commented Jun 5, 2024

kim-em commented Jun 5, 2024

kim-em commented Jun 5, 2024

Parcly-Taxel commented Jun 5, 2024

eric-wieser commented Jun 5, 2024

kbuzzard commented Jun 5, 2024

eric-wieser commented Jun 5, 2024

chore(Data/UInt): deprecate UInt8.isUpper etc #13521

chore(Data/UInt): deprecate UInt8.isUpper etc #13521

Conversation

kim-em commented Jun 4, 2024 • edited Loading

digama0 commented Jun 4, 2024

digama0 commented Jun 4, 2024

kim-em commented Jun 5, 2024

rosborn commented Jun 5, 2024 • edited Loading

digama0 commented Jun 5, 2024

kim-em commented Jun 5, 2024

kim-em commented Jun 5, 2024

Parcly-Taxel commented Jun 5, 2024

eric-wieser commented Jun 5, 2024

kbuzzard commented Jun 5, 2024

eric-wieser commented Jun 5, 2024

kim-em commented Jun 4, 2024 •

edited

Loading

rosborn commented Jun 5, 2024 •

edited

Loading