test suite failure, Georgian, GHC >= 8.8.x #541

Open
chessai opened this issue Oct 16, 2020 · 3 comments

chessai commented Oct 16, 2020

#540 (comment)

Possibly a bug introduced by a dependency. This needs further investigation, since GHC 8.8.3 will soon be the default compiler for this project.

chessai commented Nov 11, 2020

Narrowed the discrepancy down to matchFirstAnywhere. Comparing ghc865 and ghc884, the trace is exactly the same up to this point.

ghc865:

> concatMapM (matchFirst "ოცდაერთი" Stash.empty) []
Identity []
> concatMapM (matchFirstAnywhere "ოცდაერთი" Stash.empty) (rulesFor (makeLocale KA Nothing) (HashSet.fromList [Seal Numeral]))
Identity [("integer (0..19)",8,[Node {nodeRange = Range 4 8, token = Token RegexMatch (GroupMatch ["\4308\4320\4311\4312",""]), children = [], rule = Nothing}]),("integer (20..90)",4,[Node {nodeRange = Range 0 4, token = Token RegexMatch (GroupMatch ["\4317\4330\4307\4304"]), children = [], rule = Nothing}])]

ghc884:

> concatMapM (matchFirst "ოცდაერთი" Stash.empty) []
Identity []
> concatMapM (matchFirstAnywhere "ოცდაერთი" Stash.empty) (rulesFor (makeLocale KA Nothing) (HashSet.fromList [Seal Numeral]))

Going further, into matchFirstAnywhere' and lookupItemAnywhere:

ghc865:

> concatMapM (matchFirstAnywhere' (Document.fromText "ოცდაერთი") Stash.empty) (rulesFor (makeLocale KA Nothing) (HashSet.fromList [Seal Numeral]))
("integer (0..19)",8,[Node {nodeRange = Range 4 8, token = Token RegexMatch (GroupMatch ["\4308\4320\4311\4312",""]), children = [], rule = Nothing}])
("integer (20..90)",4,[Node {nodeRange = Range 0 4, token = Token RegexMatch (GroupMatch ["\4317\4330\4307\4304"]), children = [], rule = Nothing}])
[("integer (0..19)",8,[Node {nodeRange = Range 4 8, token = Token RegexMatch (GroupMatch ["\4308\4320\4311\4312",""]), children = [], rule = Nothing}]),("integer (20..90)",4,[Node {nodeRange = Range 0 4, token = Token RegexMatch (GroupMatch ["\4317\4330\4307\4304"]), children = [], rule = Nothing}])]

> concatMapM (\p -> lookupItemAnywhere "ოცდაერთი" p Stash.empty) $ map head $ map pattern $ rulesFor (makeLocale KA Nothing) (HashSet.fromList [Seal Numeral])
Identity [Node {nodeRange = Range 4 8, token = Token RegexMatch (GroupMatch ["\4308\4320\4311\4312",""]), children = [], rule = Nothing},Node {nodeRange = Range 0 4, token = Token RegexMatch (GroupMatch ["\4317\4330\4307\4304"]), children = [], rule = Nothing}]

ghc884:

> concatMapM (matchFirstAnywhere' (Document.fromText "ოცდაერთი") Stash.empty) (rulesFor (makeLocale KA Nothing) (HashSet.fromList [Seal Numeral]))
[]

> concatMapM (\p -> lookupItemAnywhere "ოცდაერთი" p Stash.empty) $ map head $ map pattern $ rulesFor (makeLocale KA Nothing) (HashSet.fromList [Seal Numeral])
Identity []

And in lookupRegexAnywhere/lookupRegexCommon:

ghc865:

> filter (not . null) $ map (runDuckling . lookupRegexAnywhere "ოცდაერთი") $ [re | r <- rulesFor (makeLocale KA Nothing) (HashSet.fromList [Seal Numeral]), Regex re <- pattern r]
[[Node {nodeRange = Range 4 8, token = Token RegexMatch (GroupMatch ["\4308\4320\4311\4312",""]), children = [], rule = Nothing}],[Node {nodeRange = Range 0 4, token = Token RegexMatch (GroupMatch ["\4317\4330\4307\4304"]), children = [], rule = Nothing}],[Node {nodeRange = Range 2 4, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}]]

> concatMap (\re -> runDuckling $ lookupRegexCommon "ოცდაერთი" re 0 Regex.matchAll) $ [re | r <- rulesFor (makeLocale KA Nothing) (HashSet.fromList [Seal Numeral]), Regex re <- pattern r]
[Node {nodeRange = Range 4 8, token = Token RegexMatch (GroupMatch ["\4308\4320\4311\4312",""]), children = [], rule = Nothing},Node {nodeRange = Range 0 4, token = Token RegexMatch (GroupMatch ["\4317\4330\4307\4304"]), children = [], rule = Nothing},Node {nodeRange = Range 2 4, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}]

ghc884:

> filter (not . null) $ map (runDuckling . lookupRegexAnywhere "ოცდაერთი") $ [re | r <- rulesFor (makeLocale KA Nothing) (HashSet.fromList [Seal Numeral]), Regex re <- pattern r]
[]

> concatMap (\re -> runDuckling $ lookupRegexCommon "ოცდაერთი" re 0 Regex.matchAll) $ [re | r <- rulesFor (makeLocale KA Nothing) (HashSet.fromList [Seal Numeral]), Regex re <- pattern r]
[]

Looks like the culprit is isRangeValid.

chessai commented Nov 11, 2020

Definitely a run-in with #420 (see also #442).

chessai commented Nov 11, 2020

This is due to https://gitlab.haskell.org/ghc/ghc/-/commit/14d88380ecb909e7032598aaad4efebb72561784

In particular, Data.Char.isLower now behaves correctly for more Unicode code points. We were relying on the previous, incorrect behaviour.
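
A quick way to check this on a given compiler is to ask base how it classifies each character of the failing input. The helper below is only a sketch; the name classify is made up for illustration and is not part of the project.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isLower, isUpper)

-- | Report how the base library of the running GHC classifies a character:
-- its Unicode general category, and the results of isLower and isUpper.
-- (Hypothetical helper, for comparing compilers side by side.)
classify :: Char -> (GeneralCategory, Bool, Bool)
classify c = (generalCategory c, isLower c, isUpper c)
```

Running `map classify "ოცდაერთი"` in GHCi under both ghc865 and ghc884 and diffing the results should show whether the Georgian letters are reported as lowercase on one compiler and not on the other.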

isRangeValid should definitely be fixed to be less janky. I wonder if it's possible to move away from recovering matches from the text.
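
To illustrate how a boundary check of this kind can end up depending on isLower, here is a simplified sketch. It is an assumption about the general shape of such a check, not a copy of the project's isRangeValid; the names charClass, isDifferent and isRangeValidSketch are hypothetical, and it works on a plain String rather than a Document.

```haskell
import Data.Char (isDigit, isLower, isUpper)

-- Collapse cased letters into one class and digits into another;
-- every other character stands for itself.
charClass :: Char -> Char
charClass c
  | isLower c || isUpper c = 'c'
  | isDigit c = 'd'
  | otherwise = c

-- Two neighbouring characters form a boundary when their classes differ.
isDifferent :: Char -> Char -> Bool
isDifferent a b = charClass a /= charClass b

-- | Hypothetical range check: the half-open range [start, end) is accepted
-- only if both of its edges sit on a class boundary or at the ends of the
-- input. Assumes 0 <= start < end <= length s.
isRangeValidSketch :: String -> Int -> Int -> Bool
isRangeValidSketch s start end =
  (start == 0 || isDifferent (s !! (start - 1)) (s !! start))
    && (end == length s || isDifferent (s !! (end - 1)) (s !! end))
```

With a check shaped like this, `isRangeValidSketch "ოცდაერთი" 0 4` depends entirely on how isLower treats the Georgian letters. If they are not considered cased, each letter is its own class and the range is accepted. If they are considered lowercase, the letters on both sides of index 4 fall into the same class and the range is rejected, which would be consistent with the empty results seen on ghc884.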

chessai changed the title from "test suite failure, Georgian, GHC 8.8.3 only" to "test suite failure, Georgian, GHC >= 8.8.x" on Nov 12, 2020