[RFC] Case Insensitivity Proof of Concept #1092
Fixed the exception caused by an invalid escape sequence: antlr#1077. Improved the STRING_LITERALS_AND_SETS_CANNOT_BE_EMPTY warning; it now also fires for empty sets ([]). Added a new CHARACTERS_COLLISION_IN_SET warning ([a-f][d-n], [aa-z], 'F'..'A', etc.). Added unit tests for these features.
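A collision check like the one behind CHARACTERS_COLLISION_IN_SET can be sketched as follows. This is a minimal illustration, not the ANTLR tool's implementation: it assumes the set has already been parsed into inclusive `(low, high)` character pairs (so `[aa-z]` becomes `('a', 'a'), ('a', 'z')`), and the function name `find_collisions` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: an inclusive (low, high) pair of single characters.
Range = Tuple[str, str]


def find_collisions(ranges: List[Range]) -> List[str]:
    """Return, in order of first repeat, the characters covered by more than one range."""
    seen = set()
    collisions: List[str] = []
    for low, high in ranges:
        for code in range(ord(low), ord(high) + 1):
            ch = chr(code)
            # A character already covered by an earlier range is a collision.
            if ch in seen and ch not in collisions:
                collisions.append(ch)
            seen.add(ch)
    return collisions


if __name__ == "__main__":
    print(find_collisions([("a", "a"), ("a", "z")]))  # the [aa-z] case
    print(find_collisions([("a", "f"), ("d", "n")]))  # overlapping ranges a-f and d-n
```

Expanding every range into individual characters keeps the sketch simple; a real tool would more likely compare interval endpoints to stay efficient on large Unicode ranges.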
Sorry, I did not run the runtime tests. I'll fix it.
…set and its tool code generation error). Fixed the runtime tests (removed the invalid escape sequence '\u').
…start position in full path). Added the -O=-inline option to fix the "Method too complex" exception.
I have no idea why the "Method L:_serializeATN () is too complex" exception now occurs for the C# runtime. 😒
Probably because the recursive string concatenation ("abc" + "bcd" + …) now exceeds the compiler's stack limit.
@ericvergnaud, no, this issue is related to the lexer. Maybe the method is simply too big. Local tests pass (Windows 7). I tried disabling the "inline" optimization as described here, but the test still fails.
The lexer also has its own serializedATN, and the failing method is TestHugeLexer.
@ericvergnaud, I understand, but in this test the serializedATN size should not change, because the caseInsensitive option is not enabled. Still, I'll look into it in more detail.
Any change to the meta-language requires deep thought on my part. I'm not sure when I can devote the time.
Hi. I'm going to close this, not because it isn't an excellent job, but because it's a fairly significant change and I'm nervous about unintended consequences.
There are currently quite a lot of case-insensitive languages: Pascal, PHP, SQLite, TSQL, and others. Usually this feature is implemented either via fragment rules (the code generation approach, see here for an example) or by overriding the LA method of the input stream (the runtime approach, as mentioned by @jimidle here).
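The runtime approach can be sketched as a stream wrapper whose lookahead reports upper-cased code points, so the grammar only needs to spell keywords in upper case. This is a minimal illustration under stated assumptions: `SimpleCharStream`, `CaseFoldingStream`, and `matches_keyword` are hypothetical stand-ins that mimic a 1-based `LA(offset)` lookahead; they are not classes or functions from any ANTLR runtime.

```python
EOF = -1


class SimpleCharStream:
    """Hypothetical minimal input stream (not the ANTLR API); only positive offsets are supported."""

    def __init__(self, text: str):
        self.text = text
        self.index = 0

    def LA(self, offset: int) -> int:
        # LA(1) is the current character, LA(2) the next one, and so on.
        pos = self.index + offset - 1
        if offset < 1 or pos >= len(self.text):
            return EOF
        return ord(self.text[pos])

    def consume(self) -> None:
        self.index += 1


class CaseFoldingStream:
    """Wraps a stream so lookahead sees upper-cased characters; the underlying text is unchanged."""

    def __init__(self, stream: SimpleCharStream):
        self.stream = stream

    def LA(self, offset: int) -> int:
        c = self.stream.LA(offset)
        if c == EOF:
            return c
        upper = chr(c).upper()
        # Some characters upper-case to several characters (e.g. 'ß' -> 'SS'); leave those as-is.
        return ord(upper) if len(upper) == 1 else c

    def consume(self) -> None:
        self.stream.consume()


def matches_keyword(stream, keyword: str) -> bool:
    """Check whether the upcoming characters spell `keyword`, which must be given in upper case."""
    return all(stream.LA(i + 1) == ord(ch) for i, ch in enumerate(keyword))
```

For example, `matches_keyword(CaseFoldingStream(SimpleCharStream("Select * from t")), "SELECT")` is true. Because only lookahead is folded, the original spelling stays available in the wrapped stream's text.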
I implemented a new option and syntax in ANTLR to support case insensitivity. With my approach it is possible to declare caseInsensitive and caseSensitive modes; this can be used, for example, in a grammar that parses both PHP code (case-insensitive keywords) and JavaScript code (case-sensitive keywords):
It is also possible to declare the caseInsensitive option for an entire grammar (combined grammars are also supported):
Moreover, it is possible to declare case-sensitive and case-insensitive tokens in the same mode, like this:
See the unit tests for details and other cases.
Case insensitivity is implemented via the code generation approach, but it can be replaced with the runtime approach if necessary. This feature improves grammar readability and makes grammars easier to write.
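The code generation approach can be illustrated by expanding a literal into per-character sets, the inline equivalent of the usual `fragment S : [sS];` style rules. This is only a sketch of the general technique, not the code generator from this pull request; `to_case_insensitive` and `lexer_rule` are hypothetical helpers that emit ANTLR lexer-rule text.

```python
def to_case_insensitive(literal: str) -> str:
    """Expand a literal into ANTLR lexer syntax matching it in any letter case."""
    parts = []
    for ch in literal:
        lower, upper = ch.lower(), ch.upper()
        if len(lower) == 1 and len(upper) == 1 and lower != upper:
            # A cased letter becomes a two-character set, e.g. s -> [sS].
            parts.append(f"[{lower}{upper}]")
        else:
            # Anything else stays a quoted literal, escaping backslash and quote.
            escaped = ch.replace("\\", "\\\\").replace("'", "\\'")
            parts.append(f"'{escaped}'")
    return "".join(parts)


def lexer_rule(name: str, literal: str) -> str:
    """Produce a complete case-insensitive lexer rule for a keyword."""
    return f"{name} : {to_case_insensitive(literal)} ;"


if __name__ == "__main__":
    print(lexer_rule("SELECT", "select"))
    print(lexer_rule("IS_NULL", "is_null"))
```

Generating the sets at tool time keeps the runtime untouched, which is the trade-off the code generation approach makes against overriding `LA`.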
If this pull request is good overall, I'll finish the remaining issues (separate lexer and parser grammars, and others). Otherwise, I'll try to split out the fixed-symbol issues into another pull request. Suggestions are welcome.