Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

toLower and toUpper in Basic.lean are incomplete #741

Closed
1 task
lovettchris opened this issue Oct 26, 2021 · 8 comments
Closed
1 task

toLower and toUpper in Basic.lean are incomplete #741

lovettchris opened this issue Oct 26, 2021 · 8 comments

Comments

@lovettchris
Copy link
Contributor

Prerequisites

  • Put an X between the brackets on this line if you have done all of the following:
    • Checked that your issue isn't already filed.
    • Reduced the issue to a self-contained, reproducible test case.

Description

See ucase.cpp for full unicode consortium implementation of ucase_toFullLower.

Steps to Reproduce

#eval String.toUpper "α"

Expected behavior: [What you expect to happen]
Should return unicode value 913, which is the greek letter "Α"

Actual behavior: [What actually happens]
Instead returns unmodified string "α"

Reproduces how often: [What percentage of the time does it reproduce?]
100%

Versions

"leanprover/lean4:nightly-2021-10-22"

Additional Information

Any additional information, configuration or data that might be necessary to reproduce the issue.

@tydeu
Copy link
Member

tydeu commented Oct 29, 2021

@lovettchris toUpper/toLower are just ASCII uppercase and lowercase, not full Unicode. While Lean does support Unicode symbols, it does so only at a byte level. It does not link to a Unicode library like ICU, and thus it has no information about what codepoints belong to which character categories or the like.

From what I can tell, the support mostly exists for allowing exoteric mathematical symbols in source code, not to do string processing or to support coding in foreign languages (e.g., #698). While there are many users from countries where English is not the primary language, they thus far seem to have decided to stick with English (see the Zulip thread on the question of translating the mathlib docs for an example). That is, there does not seem to be sufficient demand to support other languages at the moment to justify the cost needed to do so.

@lovettchris
Copy link
Contributor Author

I'm just pointing out every other programming language that provides toUpper and toLower do it properly, for example, here's the Rust implementation .So it will confuse a lot of folks when they realize the Lean version isn't. So I would recommend either fixing it or removing it. Statically linking the official unicode library would not be hard to do. This also helps with "isWhitespace". So either Lean is trying to be a full programming language or it isn't... Sure the priority can be lower than other things right now, but I think it needs to get fixed at some point.

@tydeu
Copy link
Member

tydeu commented Oct 29, 2021

@lovettchris I mean, the toupper/tolower in C doesn't either? (And even in C++, it still wouldn't uppercase Greek unless you were in a Greek supporting locale.) So, I do not think lack this feature inhibits it from being a "full programming language". Regardless, once Lean gets a proper the standard library (e.g., #528), things like proper Unicode support can be in there. But, as I understand it, that is not a priority for now. I do agree, however, that this is a desirable feature in the future.

@Kha
Copy link
Member

Kha commented Oct 29, 2021

So either Lean is trying to be a full programming language or it isn't... Sure the priority can be lower than other things right now, but I think it needs to get fixed at some point.

It definitely should be. The short answer is that "trying to be a full programming language" is not the main focus right now.

@leodemoura
Copy link
Member

Regardless, once Lean gets a proper the standard library (e.g., #528), things like proper Unicode support can be in there. But, as I understand it, that is not a priority for now. I do agree, however, that this is a desirable feature in the future.

As @tydeu pointed out, this discussion should be part of the standard library design. I am closing this issue here since it is not blocking any Lean user.

@lovettchris
Copy link
Contributor Author

Leo where are the "standard library design" issues kept then so I can move it there?

@leodemoura
Copy link
Member

Leo where are the "standard library design" issues kept then so I can move it there?

We currently don't have a repo for the std library yet. It is not the current focus right now.

@leodemoura
Copy link
Member

Note that it may not even be a single repo, but a collection of repos.

ChrisHughes24 pushed a commit to ChrisHughes24/lean4 that referenced this issue Dec 2, 2022
TODO:
* Before merge
- [x] fix a bug in linarith in mathlib3 I just found ...
- [x] depends on: leanprover#519 
- [x] style lint
- [x] docs
- [X] move theory stubs to a separate PR, for easier tracking: leanprover#733 
- [x] failing to parse the `LinarithConfig` option

* Before or after merge?
- [ ] Implement the `removeNe` preprocessor.
- [ ] Add support for restricting to a single type. How to store a `Type` in `LinarithConfig`?

* After merge
- [ ] Teach `norm_num` to solve `example [LinearOrderedRing α] : (0 : α) < 37 := by norm_num`.
- [ ] Port `zify_proof` (plumbing for `zify`), and add the `natToInt` preprocessor.
  Mostly done now, but see leanprover#741 before all the tests will work.
- [ ] Port `cancel_denoms` tactic, and add the `cancelDenoms` preprocessor.
- [ ] Add the `nlinarith` preprocessor and front-end syntax.


Co-authored-by: Mario Carneiro <[email protected]>
Co-authored-by: Scott Morrison <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants