basic-bytestring wrapper does not work with left contexts #53

Open
jmoy opened this issue Dec 20, 2014 · 1 comment


jmoy commented Dec 20, 2014

The basic-bytestring wrapper does not work correctly with left contexts when provided with characters which are encoded as multiple bytes in UTF-8.

The following program prints True and then False, whereas I expect it to print True twice.

{
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.ByteString.Lazy.Char8 as B
}

%wrapper "basic-bytestring"

tokens :-
  a^b    {const True}
  a      {const True}
  ∃^∀    {const True}
  ∃      {const True}
  .      {const False}

{
main::IO ()
main = do
  print . and . alexScanTokens $ "ab"
  print . and . alexScanTokens $ "∃∀"
}

I think this is due to alexGetByte for this wrapper remembering the last byte rather than the last character.
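The suspected mechanism can be sketched without Alex at all. The snippet below is a simplified, hypothetical model (the name `prevCharFromBytes` is invented for illustration and is not part of the wrapper): it treats the "previous character" as the last consumed byte reinterpreted as a Char, and shows that after consuming the UTF-8 bytes of ∃ the remembered value is not ∃.

```haskell
module Main where

import Data.Char (chr, ord)
import Data.Word (Word8)

-- UTF-8 encoding of "∃∀" (U+2203 followed by U+2200), written out by hand.
utf8Input :: [Word8]
utf8Input = [0xE2, 0x88, 0x83, 0xE2, 0x88, 0x80]

-- Hypothetical model of "remember the last byte": the previous character
-- is simply the last consumed byte reinterpreted as a Char.
-- Assumes the list of consumed bytes is non-empty.
prevCharFromBytes :: [Word8] -> Char
prevCharFromBytes consumed = chr (fromIntegral (last consumed))

main :: IO ()
main = do
  let consumed = take 3 utf8Input          -- the three bytes that encode '∃'
      seen     = prevCharFromBytes consumed
  print (ord seen)       -- 131, i.e. the trailing byte 0x83
  print (seen == '∃')    -- False
```

If the wrapper behaves like this model, a left context written as ∃^ would be checked against '\x83' rather than against ∃, which would be consistent with the second rule never firing.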

Since converting input bytes to characters would impose unnecessary costs on users of this wrapper, maybe we should just not implement left contexts in this case?


abt8601 commented Nov 13, 2020

@jmoy Your program does not encode its input as UTF-8. The IsString instance (fromString) of Data.ByteString.Lazy.ByteString truncates each code point c to 8 bits, i.e. c `mod` 256, rather than UTF-8 encoding it. To encode correctly, you can build a Data.Text.Lazy.Text and pass it through Data.Text.Lazy.Encoding.encodeUtf8. The modified program is as follows.

{
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.Text.Lazy as T
import Data.Text.Lazy.Encoding
}

%wrapper "basic-bytestring"

tokens :-
  a^b    {const True}
  a      {const True}
  ∃^∀    {const True}
  ∃      {const True}
  .      {const False}

{
main::IO ()
main = do
  print . and . alexScanTokens . encodeUtf8 $ "ab"
  print . and . alexScanTokens . encodeUtf8 $ "∃∀"
}

However, the result is still True,False.
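For reference, the difference between the two inputs can be checked directly. This is a small sketch, assuming the OverloadedStrings literal in the original program behaves like Data.ByteString.Lazy.Char8.pack, which keeps only the low 8 bits of each code point; the helper names `packedBytes` and `lowBytes` are made up for the example.

```haskell
module Main where

import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BC
import Data.Char (ord)
import Data.Word (Word8)

-- Bytes produced by Char8-style packing, which keeps only the low 8 bits
-- of each code point.
packedBytes :: String -> [Word8]
packedBytes = BL.unpack . BC.pack

-- The same truncation written out by hand, for comparison.
lowBytes :: String -> [Word8]
lowBytes = map (fromIntegral . (`mod` 256) . ord)

main :: IO ()
main = do
  print (packedBytes "∃∀")                    -- [3,0]
  print (packedBytes "∃∀" == lowBytes "∃∀")   -- True
```

So the original program fed the lexer the two bytes 0x03 and 0x00, while the modified program feeds it the six-byte UTF-8 sequence E2 88 83 E2 88 80. Since the output is unchanged, the encoding of the input does not appear to be the cause of the False result.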
