The lexer does not correctly handle input strings containing a Unicode escape sequence like 'Fran\u00E7ois': lexing fails with a token recognition error. Wrapping the input stream in a CaseInsensitiveInputStream makes it work, though.
Here is a unit test demo:
@Test
void testLexerUnicodeEscapes() {
    String s = "'Fran\\u00E7ois'";

    // Using a plain CodePointCharStream fails
    IllegalStateException exc = assertThrows(IllegalStateException.class, () -> {
        tryLexing(CharStreams.fromString(s));
    });
    assertEquals("Syntax error on line 1:0: token recognition error at: ''Fran\\u00E'.", exc.getMessage());

    // Wrapping it in a CaseInsensitiveInputStream makes it work. Why?
    CommonTokenStream tokens = tryLexing(new CaseInsensitiveInputStream(CharStreams.fromString(s)));
    assertEquals(2, tokens.size());
}

private CommonTokenStream tryLexing(CharStream stream) {
    ApexLexer lexer = new ApexLexer(stream);
    lexer.removeErrorListeners(); // Avoid distracting "token recognition error" stderr output
    lexer.addErrorListener(new BaseErrorListener() {
        @Override
        public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line,
                int charPositionInLine, String msg, RecognitionException e) {
            throw new IllegalStateException(String.format("Syntax error on line %d:%d: %s.",
                    line, charPositionInLine, msg));
        }
    });
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    tokens.fill();
    return tokens;
}
Is this by design or a bug? The Apex language is case-insensitive, but that shouldn't affect these string values.
Notes:
Upgrading ANTLR from 4.9.1 to 4.13.2 does not solve it, but it's still good practice
Lexing with CommonTokenStream works correctly for literal non-ASCII Unicode characters like 'François'
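One possible explanation, offered as a guess rather than a diagnosis: the error message stops right at the uppercase E in the escape, which would be consistent with a hex-digit rule that only lists lowercase letters and relies on the case-folding stream to normalize input before matching. The sketch below does not use ANTLR or the real grammar. The regex is a hypothetical stand-in for such a rule, and the lowercasing step is an assumed model of what a case-insensitive stream does for matching purposes.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class UnicodeEscapeCaseDemo {
    // Hypothetical stand-in for a lexer fragment that accepts a backslash, 'u',
    // and four hex digits, but only lists LOWERCASE hex letters.
    // This is an assumption for illustration, not the actual ApexLexer rule.
    private static final Pattern LOWERCASE_ONLY_ESCAPE = Pattern.compile("\\\\u[0-9a-f]{4}");

    /** Matches the escape exactly as written, like a plain char stream would present it. */
    public static boolean matchesEscape(String escape) {
        return LOWERCASE_ONLY_ESCAPE.matcher(escape).matches();
    }

    /**
     * Models a case-folding stream (assumed behavior): the matcher sees lowercased
     * characters, while the caller still holds the original, unmodified text.
     */
    public static boolean matchesEscapeCaseFolded(String escape) {
        return matchesEscape(escape.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String escape = "\\u00E7"; // six characters: backslash, u, 0, 0, E, 7
        System.out.println(matchesEscape(escape));           // prints false
        System.out.println(matchesEscapeCaseFolded(escape)); // prints true
    }
}
```

If the grammar does look like this, the fix would be to accept both cases in the hex-digit rule, so that lexing no longer depends on which stream is used.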