Commit

clarify yypushback behaviour with surrogate characters
see also issue #215
lsf37 committed Nov 3, 2017
1 parent 023ef9e commit 846d20c
Showing 1 changed file with 10 additions and 5 deletions.
15 changes: 10 additions & 5 deletions docs/md/lex-specs.md
@@ -1053,12 +1053,13 @@ Currently, the API consists of the following methods and member fields:

- `int yylength()`

returns the length of the matched input text region (does not
require a `String` object to be created)
returns the length of the matched input text region as a number of Java `char`s
(as opposed to Unicode code points). Does not require a `String` object to be
created.

- `char yycharat(int pos)`

returns the character at position `pos` from the matched text. It is
returns the Java `char` at position `pos` from the matched text. It is
equivalent to `yytext().charAt(pos)`, but faster. `pos` must be a
value from `0` to `yylength()-1`.

@@ -1126,9 +1127,10 @@ Currently, the API consists of the following methods and member fields:

- `void yypushback(int number)`

pushes `number` characters of the matched text back into the input
pushes `number` Java `char`s (as opposed to Unicode code points)
of the matched text back into the input
stream. They will be read again in the next call of the scanning
method. The number of characters to be read again must not be
method. The number of chars to be read again must not be
greater than the length of the matched text. The pushed back
characters will not be included in `yylength()` and `yytext()`. Note
that in Java strings are unchangeable, i.e. an action code like
@@ -1144,6 +1146,9 @@ Currently, the API consists of the following methods and member fields:

will return the matched text minus the last character.

Note that with Unicode surrogate characters it is possible that
expressions such as `[^]` match more than one `char`.

- `int yyline`

contains the current line of input (starting with 0, only active
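The distinction between Java `char`s and Unicode code points that this change documents can be illustrated with plain `String` operations. The following is a minimal sketch, not part of the commit and not tied to a generated scanner: the class `SurrogateDemo` and its method names are hypothetical, and the sample string stands in for whatever `yytext()` would return.

```java
// Hypothetical illustration; uses only java.lang, no generated scanner.
final class SurrogateDemo {

    /** Number of Java chars, the unit in which yylength() and yypushback() count. */
    static int charLength(String matched) {
        return matched.length();
    }

    /** Number of Unicode code points in the same text. */
    static int codePointLength(String matched) {
        return matched.codePointCount(0, matched.length());
    }

    /** Number of chars occupied by the last code point (1 or 2), or 0 for empty text. */
    static int charsInLastCodePoint(String matched) {
        if (matched.isEmpty()) {
            return 0;
        }
        return Character.charCount(matched.codePointBefore(matched.length()));
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which is encoded as a surrogate pair
        String matched = "a\uD83D\uDE00";
        System.out.println("chars:       " + charLength(matched));
        System.out.println("code points: " + codePointLength(matched));
        System.out.println("last cp len: " + charsInLastCodePoint(matched));
    }
}
```

Under these assumptions, an action that wants to push back one whole code point rather than one `char` could pass a value computed like `charsInLastCodePoint(yytext())` to `yypushback`, at the cost of creating the `String` object.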
