Problems with cherokee characters & toCaseFold #277

Closed
felixonmars opened this issue Feb 25, 2020 · 5 comments · Fixed by #402
Labels
question Requires more investigation

@felixonmars

Ref haskellari/binary-instances#7:

Cherokee letters should fold to upper case, but now they don't converge:

Prelude Data.Char Data.Text> toCaseFold "\43929"
"\5065"
Prelude Data.Char Data.Text> toCaseFold "\5065"
"\43929"

The docs say:

toCaseFold :: Text -> Text
O(n) Convert a string to folded case. Subject to fusion.

This function is mainly useful for performing caseless (also known as case insensitive) string comparisons.

A string x is a caseless match for a string y if and only if:

toCaseFold x == toCaseFold y
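
The definition above can be written as a small helper. This is only a sketch: `caselessMatch` is a hypothetical name, not something exported by `text`. With the behaviour reported in this issue, the Cherokee pair would not compare equal, because each letter folds to the other.

```haskell
import qualified Data.Text as T

-- Hypothetical helper (not part of the text API): caseless comparison
-- exactly as the toCaseFold documentation describes it.
caselessMatch :: T.Text -> T.Text -> Bool
caselessMatch x y = T.toCaseFold x == T.toCaseFold y

main :: IO ()
main = do
  -- Plain ASCII letters fold to the same string.
  print (caselessMatch (T.pack "Hello") (T.pack "hELLO"))
  -- The Cherokee pair from this report; the result depends on
  -- whether the installed text version still has the bug.
  print (caselessMatch (T.pack "\43929") (T.pack "\5065"))
```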

The Unicode FAQ at https://unicode.org/faq/casemap_charprop.html says:

Q: What happens if the uppercase letter is the one that is already encoded?
A: That situation is more complicated. When the existing encoded letter is an uppercase letter and the proposal is to encode a new lowercase letter case pair for it, that is normally disallowed. The case folding for the existing uppercase letter would change, and that is blocked by the requirement for case folding stability. In exceptional situations, if a lowercase letter must be added, it would need to be case-folded to the existing uppercase letter, rather than changing the case folding for that existing letter. Such an exceptional situation did, in fact, apply for the addition of Cherokee lowercase syllables in Version 8.0. Cherokee case folding rules were specified to map to the old uppercase syllables, to preserve case folding stability for them.

@hvr
Member

hvr commented Mar 25, 2020

Ok, so this is due to the rules from
https://github.com/haskell/text/blob/master/Data/Text/Internal/Fusion/CaseMapping.hs

specifically

foldMapping '\xab99' s = Yield '\x13c9' (CC s '\x0000' '\x0000')

and

foldMapping c s = Yield (toLower c) (CC s '\0' '\0') 
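
A minimal model of how these two rules interact, using only `base`. It is a sketch rather than the library's code: `foldModel`, `foldGuarded`, and `idempotentAt` are hypothetical names. The guard in `foldGuarded` assumes that the uppercase Cherokee block U+13A0..U+13F5 should fold to itself, as the Unicode FAQ quoted above describes; it is one possible approach, not necessarily the fix adopted upstream.

```haskell
import Data.Char (toLower)

-- Simplified model of the two quoted rules (not the library's code):
-- one explicit table entry, then a generic toLower fallback.
foldModel :: Char -> Char
foldModel '\xab99' = '\x13c9'
foldModel c        = toLower c

-- Hypothetical guarded variant: characters in the uppercase Cherokee
-- block are returned unchanged instead of being passed to toLower.
foldGuarded :: Char -> Char
foldGuarded '\xab99' = '\x13c9'
foldGuarded c
  | c >= '\x13a0' && c <= '\x13f5' = c
  | otherwise                      = toLower c

-- A fold is idempotent at c if folding twice equals folding once.
idempotentAt :: (Char -> Char) -> Char -> Bool
idempotentAt f c = f (f c) == f c

main :: IO ()
main = do
  let sample = ['\xab99', '\x13c9']
  -- Depends on the Unicode tables behind toLower in the GHC in use.
  print (map (idempotentAt foldModel) sample)
  -- Does not depend on toLower for these two characters.
  print (map (idempotentAt foldGuarded) sample)
```

When `toLower '\x13c9'` returns `'\xab99'`, `foldModel` swaps the two letters back and forth, which matches the GHCi session in the report.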

@trofi

trofi commented Aug 20, 2020

Also hit it in https://bugs.gentoo.org/736388

phadej added a commit to phadej/text that referenced this issue Aug 25, 2020
- Add property and regression test that toCaseFold should be idempotent
- Add scripts/tests.sh to run tests with all GHCs.
  There are plenty of setup commands to pass.
- Rework CaseFolding.hs so it considers that toLower behaves
  differently with different GHCs, and therefore fallbacks to it
  only when it behaves consistently.
  For that purpose a helper `scripts/Dump.hs` is added.

Note: `toLower`, `toUpper`, and `toTitle` would benefit from using
dumped database as well.  This commit is already quite big, so that is
left for a follow up.
phadej added a commit to phadej/text that referenced this issue Aug 25, 2020
- Add property and regression test that toCaseFold should be idempotent
- Add scripts/tests.sh to run tests with all GHCs.
  There are plenty of setup commands to pass.
- Rework CaseFolding.hs so it considers that toLower behaves
  differently with different GHCs, and therefore fallbacks to it
  only when it behaves consistently.
  For that purpose a helper `scripts/Dump.hs` is added.
  Also MurMurVariant is quick hash which works with all GHCs,
  it should notice if there are changes in the behaviour.

Note: `toLower`, `toUpper`, and `toTitle` would benefit from using
dumped database as well.  This commit is already quite big, so that is
left for a follow up.
@ezzieyguywuf

@phadej I see that three commits reference this issue - is this still open?

@phadej
Contributor

phadej commented Dec 29, 2020

Yes, #293

ezzieyguywuf added a commit to ezzieyguywuf/gentoo-haskell that referenced this issue Jan 7, 2021
test was previously restricted in 1.0.0.1 due to an upstream issue in
ghc/text[1]. While the issue remains open, I have successfully compiled
the test suite for binary-instances-1.0.1, and it seems to pass all
tests.

[1]: haskell/text#277

Signed-off-by: Wolfgang E. Sanyer <[email protected]>
trofi pushed a commit to gentoo-haskell/gentoo-haskell that referenced this issue Jan 14, 2021
test was previously restricted in 1.0.0.1 due to an upstream issue in
ghc/text[1]. While the issue remains open, I have successfully compiled
the test suite for binary-instances-1.0.1, and it seems to pass all
tests.

[1]: haskell/text#277

Signed-off-by: Wolfgang E. Sanyer <[email protected]>
Signed-off-by: Sergei Trofimovich <[email protected]>
@Lysxia Lysxia added the question Requires more investigation label Mar 7, 2021
@phadej
Contributor

phadej commented Jan 1, 2022

This issue still happens with text-2.0
