
Ignored token still being captured #180

Closed
stoneRdev opened this issue Jul 26, 2021 · 1 comment


stoneRdev commented Jul 26, 2021

Hello, I have been working on string parsing, and can't seem to get the token ignore operator to work.

Here's my grammar:

String <- StringStart < StringContent > StringEnd
StringContent <- StringInnerContent*
StringInnerContent 
    <- !($tag / "\\") < . >
    / Escape
    / FullEscapeSequence
~StringStart <- $tag< StringTag >
~StringEnd <- $tag
StringTag <- "'" / '"'
FullEscapeSequence
    <- "\\n" | "\\r" | "\\t" | "\\b" | "\\f" | "\\v" | "\\0"
Escape 
    <- NonCapturingEscape < (StringTag / ("x" [0-9a-fA-F]{2} ) / "\\") >
~NonCapturingEscape <- "\\"

Test input:
"a \n \r \t \b \f \v \0 \' \" \\ \xff"
Output (from PEG Playground):

- String (a \n \r \t \b \f \v \0 \' \" \\ \xff)

Expected output:

- String (a \n \r \t \b \f \v \0 ' " \ xff)

I have tried a few variations, but this one seems to me to be the one most indicative of what I need.

I'm trying to parse strings with escape sequences, some of which keep the escape while others drop it. However, Escape still includes the backslashes, even though I'm using token boundaries and the ignore operator to exclude NonCapturingEscape.

Am I doing something wrong? Or is this a bug in the parser?

Thank you for your time!

-- As an aside, after I get this figured out I'm planning to make my grammar also accept escaped escape sequences, where input \\n becomes \n. If you have any pointers there I would greatly appreciate them! Thank you

yhirose (Owner) commented Jul 31, 2021

@stoneRdev, in short, cpp-peglib can't do what you are trying to do with AstBase::token, because the token is a string view into the original parsed text rather than a separate string, as shown below:

cpp-peglib/peglib.h

Lines 3852 to 3853 in 4109480

const bool is_token;
const std::string_view token;

In other words, resolving escape sequences isn't the parser's job...
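Since the token is one contiguous view of the input, characters in the middle of it cannot be dropped during parsing. One possible way to get the expected output is to post-process the captured text afterwards. The following is only a minimal sketch under that assumption: `drop_noncapturing_escapes` is a hypothetical helper, not part of cpp-peglib, and it does not validate the hex digits after `\x` the way the grammar does.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not a cpp-peglib API): post-processes the raw text
// found between the quotes. Sequences such as \n or \t are kept verbatim,
// while the backslash in front of a quote, a backslash, or an x is dropped.
std::string drop_noncapturing_escapes(std::string_view raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        const char c = raw[i];
        if (c == '\\' && i + 1 < raw.size()) {
            const char next = raw[i + 1];
            if (next == '\'' || next == '"' || next == '\\' || next == 'x') {
                out.push_back(next);  // drop the backslash, keep what follows
                ++i;                  // the escaped character is consumed
                continue;
            }
        }
        out.push_back(c);  // ordinary character, or part of a kept escape
    }
    return out;
}
```

Applied to the test input from the report (without the surrounding quotes), this yields the expected `a \n \r \t \b \f \v \0 ' " \ xff`. Because an escaped backslash collapses to a single backslash, the input `\\n` becomes `\n`, which may also cover the follow-up question about escaped escape sequences.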

If I were to do something similar to what you want, String wouldn't be a token rule but a regular rule that contains child character nodes. I have actually done something similar in my interpreter to implement interpolated strings. An interpolated string is expressed as a list of EXPRESSION and INTERPOLATED_CONTENT nodes.
https://github.com/yhirose/culebra/blob/cb52dd66a7a205ace5296d09a6ec27764d658d04/include/interpreter.h#L75-L76

The interpolated content nodes will be properly resolved later in the interpreter.
https://github.com/yhirose/culebra/blob/cb52dd66a7a205ace5296d09a6ec27764d658d04/include/interpreter.h#L1044-L1056
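The same idea can be sketched for the string grammar in this issue: keep String as a regular rule, let each character or escape become a child node, and build the final text while walking the children. The `Node` struct below is a hypothetical, simplified stand-in for an AST node (the real AstBase has more members), and the rule names and the assumption that an Escape token still includes its leading backslash are illustrative only.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified stand-in for an AST node.
struct Node {
    std::string name;         // rule name, e.g. "Char" or "Escape"
    std::string_view token;   // view into the original source text
    std::vector<Node> nodes;  // child nodes
};

// Builds the resolved string from the children of a String node.
// Assumption: an "Escape" child covers the backslash plus what follows,
// so the backslash is dropped here rather than in the grammar.
std::string resolve_string(const Node& str) {
    std::string out;
    for (const Node& child : str.nodes) {
        if (child.name == "Escape" && !child.token.empty()) {
            out.append(child.token.substr(1));  // skip the leading backslash
        } else {
            out.append(child.token);  // plain characters and kept escapes
        }
    }
    return out;
}
```

With this layout each token stays a contiguous view of the input, and the decision about which backslashes to keep lives in ordinary code after parsing.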

You may want to check #44 as well.

Hope the above explanation helps!

yhirose closed this as completed Jul 31, 2021