Increase ATN states size limit, simplify ATN serialization #3546
Clean up ATN serializer/deserializer code
…er.MAX_VALUE) fix antlr#840, fix antlr#1863, fix antlr#2732, fix antlr#3338
Can you describe what needs to change to handle 32-bit ints using 16-bit Unicode? I assume this is easy to fix for non-Java targets: just use int, not short. What is the effect on Java ATN code size? Ah, I see: runtime/Java/src/org/antlr/v4/runtime/atn/ATNDataWriter.java. Basically using high bits to encode, right? Hmm... again, I'm worried about a very rare case causing a big code change. Not that this approach is "wrong", but we always have to worry about breaking stuff. Also, wouldn't using the upper bit(s) hurt the common case?
We used a different encoding for 16-bit and 32-bit values: antlr4/runtime/Java/src/org/antlr/v4/runtime/atn/ATNSerializer.java Lines 166 to 201 in db8a483
Yes, two high bits are used for service purposes.
All big Unicode values are still encoded using no more than 2 integers in tests, as before; I've checked: https://github.com/antlr/antlr4/pull/3546/files#diff-45a1efdccdd2d1ab783a98c0d1dc54c09914461d0c4bb0237d9dbe17bd4168dcR39. 3 integers are used for very big values (>= 2^31) and for negative numbers except -1 (negative numbers aren't used for serialization at all):
All values within the range

    @Test public void testATNDataWriterReaderCompact() {
        IntegerList integerList = new IntegerList();
        ATNDataWriter writer = new ATNDataWriter(integerList, "Java");
        assertEquals(1, writer.write(0));
        assertEquals(1, writer.write(-1));
        assertEquals(1, writer.write(42));
        assertEquals(2, writer.write(1 << 14));
        assertEquals(2, writer.write(0xFFFF));
        assertEquals(3, writer.write(Integer.MAX_VALUE));
        assertEquals(3, writer.write(Integer.MIN_VALUE));
        assertEquals(13, integerList.size());
        char[] charArray = Utils.toCharArray(integerList);
        ATNDataReader reader = new ATNDataReader(charArray);
        assertEquals(0, reader.read());
        assertEquals(-1, reader.read());
        assertEquals(42, reader.read());
        assertEquals(1 << 14, reader.read());
        assertEquals(0xFFFF, reader.read());
        assertEquals(Integer.MAX_VALUE, reader.read());
        assertEquals(Integer.MIN_VALUE, reader.read());
    }

    @Test public void testATNDataWriterReaderRaw() {
        IntegerList integerList = new IntegerList();
        ATNDataWriter writer = new ATNDataWriter(integerList, "Java");
        writer.writeInt32(0);
        writer.writeInt32(-1);
        writer.writeInt32(42);
        writer.writeInt32(1 << 14);
        writer.writeInt32(0xFFFF);
        writer.writeInt32(Integer.MAX_VALUE);
        writer.writeInt32(Integer.MIN_VALUE);
        assertEquals(7 * 2, integerList.size());
        char[] charArray = Utils.toCharArray(integerList);
        ATNDataReader reader = new ATNDataReader(charArray);
        assertEquals(0, reader.readInt32());
        assertEquals(-1, reader.readInt32());
        assertEquals(42, reader.readInt32());
        assertEquals(1 << 14, reader.readInt32());
        assertEquals(0xFFFF, reader.readInt32());
        assertEquals(Integer.MAX_VALUE, reader.readInt32());
        assertEquals(Integer.MIN_VALUE, reader.readInt32());
    }

The ATN serializer format is already not backward compatible with previous versions (because of the UUID removal and the version change).
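To make the "two high bits" idea concrete, here is a minimal sketch of one possible variable-length scheme that reproduces the unit counts asserted in the compact test (1, 1, 1, 2, 2, 3, 3). It is an assumption for illustration only, not the PR's actual `ATNDataWriter`/`ATNDataReader`: the class name `CompactIntCodec`, the tag values, and the use of a `StringBuilder` as the output are all hypothetical.

```java
/** Hypothetical compact codec; NOT the PR's ATNDataWriter/ATNDataReader. */
final class CompactIntCodec {
    /** Appends value as 1 to 3 sixteen-bit units and returns how many were written. */
    static int write(StringBuilder out, int value) {
        if (value == -1) {                      // assumed: -1 gets a reserved single unit
            out.append((char) 0xFFFF);
            return 1;
        }
        if (value >= 0 && value < (1 << 14)) {  // tag 00: 14 payload bits in one unit
            out.append((char) value);
            return 1;
        }
        if (value >= 0 && value < (1 << 30)) {  // tag 01: 14 high bits + 16 low bits
            out.append((char) (0x4000 | (value >>> 16)));
            out.append((char) (value & 0xFFFF));
            return 2;
        }
        out.append((char) 0x8000);              // tag 10: raw 32 bits follow
        out.append((char) (value >>> 16));
        out.append((char) (value & 0xFFFF));
        return 3;
    }

    /** Reads one value starting at pos[0] and advances pos[0] past it. */
    static int read(char[] data, int[] pos) {
        int first = data[pos[0]++];
        int tag = first >>> 14;                 // the two "service" bits
        if (tag == 0) {
            return first;
        }
        if (tag == 1) {
            return ((first & 0x3FFF) << 16) | data[pos[0]++];
        }
        if (tag == 2) {
            int high = data[pos[0]++];
            int low = data[pos[0]++];
            return (high << 16) | low;
        }
        return -1;                              // tag 11 is only ever written as 0xFFFF
    }
}
```

Under these assumptions, values below 2^14 keep their one-unit cost, which is the "common case" the discussion worries about; only larger values pay for the tag bits.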
Hmm... I think @ericvergnaud had a concern about double encoding: first we encode to get larger integers, and then we have to use modified UTF-8. In principle that's okay, but it makes me nervous. I think Eric's concern was about all of the decoding on a phone. Also, 14 bits only gives us 16384 max, which I think is hitting the size of the lexer for SQL grammars. I don't remember where we put the numbers I had, but isn't it going to hurt grammar size for such large but 16-bit-conforming grammars? I see that we allow 32-bit values and sets, but I don't see how our current code allows 32-bit single-edge transitions in the ATN. Do you have any idea if that is the case?
In my changes, there is no need to encode small numbers first and big numbers second; natural order is preserved. It should not break decoding on a phone. Also, it looks like this Unicode encoding was introduced by @bhamiltoncx in fd4246c
It does not hit the size of the lexer for SQL grammars because the max value is
I've added the generated test getAtnStatesSizeMoreThan65535Descriptor, which fails on the previous version. Actually, the max is 31-bit transitions because int is a signed type.
Thanks for the link back to the empirical tests I did for the SQL grammar.
If this is true, it seems like we are doing some major surgery to support a tiny subset of the population. On the other hand, it sounds like there are a number of people who are finding this useful; you passed along some links to issues. We could minimize the size of this change by limiting it to only those targets that need to use strings to encode integer lists, right? Certainly that is Java. Can C# do static integer arrays properly? If so, then we don't need this for C# either. We should definitely not do this for any targets such as C++ that can directly encode static arrays. Does JavaScript use strings? Dang, yep, looks like JavaScript uses strings as well:
It looks like Java, C#, JS, Dart, PHP, Python2/3, Swift. So, everybody except Go and C++? Dang. We should definitely get away from strings for any language that allows it. Certainly this should be avoided for Python as it can simply do
Ok, I've spent a lot of time thinking about this today and went back over the arguments from before. Basically, I'd like to stick with my original suggestion:
It seems to me that we should simply get an IntegerList out of
Sounds like we still need to convert the Java ATNDeserializer to use a "reader", as you have suggested, so that it can read either 16 or 32 bits depending on what kind of data we have stored. Others seem to use something similar, but Python, for example, directly refers to readInt vs readInt32. We'll need a generic "read" that pulls 16 or 32 bits depending on the value following the version number.
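The generic "read" described here could look roughly like the sketch below. Everything about the layout is an assumption made for illustration: the class name `WidthAwareReader`, the idea that unit 0 holds the version and unit 1 holds the literal value 16 or 32, and the low-half-first ordering in 32-bit mode are hypothetical and do not come from the ANTLR code base.

```java
/** Hypothetical reader; the header layout and flag values are assumptions. */
final class WidthAwareReader {
    private final char[] data;
    private final boolean wide;   // true when each value occupies two 16-bit units
    private int p;                // index of the next unread unit
    final int version;

    WidthAwareReader(char[] data) {
        this.data = data;
        this.version = data[p++];        // assumed: unit 0 is the version number
        this.wide = data[p++] == 32;     // assumed: unit 1 is 16 or 32
    }

    /** Pulls one value, 16 or 32 bits wide depending on the header flag. */
    int read() {
        if (!wide) {
            return data[p++];
        }
        int low = data[p++];
        int high = data[p++];
        return low | (high << 16);
    }
}
```

With a reader like this, the deserializer body would call only `read()` and would not need to know which width the serialized data uses.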
How to represent
It also requires "bit twiddling" if 32-bit integer mode is activated. It's simple but it exists:

    public int read() {
        return data[p++] | (data[p++] << 16);
    }
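For completeness, a sketch of the matching writer side, assuming the low half is stored first as the one-line read implies (Java evaluates operands left to right, so the first `data[p++]` is the low half). The class and method names `Int32Units`, `split`, and `join` are hypothetical and used only for this illustration.

```java
/** Hypothetical helper showing the split/join pair for 32-bit values. */
final class Int32Units {
    /** Splits value into two 16-bit units, low half first. */
    static char[] split(int value) {
        return new char[] { (char) (value & 0xFFFF), (char) (value >>> 16) };
    }

    /** Rejoins two units starting at index p; mirrors the read() expression. */
    static int join(char[] data, int p) {
        return data[p] | (data[p + 1] << 16);
    }
}
```

Because `char` is unsigned, the low half never sign-extends, and shifting the high half by 16 restores the sign bit, so negative values such as -1 and Integer.MIN_VALUE survive the round trip.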
Yup, sounds like we need to limit to 0xFFFF - 1 for valid 16-bit values. You're right, shifting by 16 is twiddling :) but simple enough, I think.
It's quite a strange solution, especially considering ANTLR is also a binary parser and
I've tried to implement this idea but encountered the following problems:
Ok, let's postpone fixing the mentioned bugs, because I don't think implementing such an encoding is a very good idea. Actually, I think using
But feel free to use my commit with the writer/reader and implement your solution if you think it's correct.
Ok, sounds good. You raise valid points, but we can discuss them in the future. Thanks for all your efforts.
) * refactor serialize so we don't need comments * more cleanup during refactor * store language in serializer obj * A lexer rule token type should never be -1 (EOF). 0 is fragment but then must be > 0. * Go uses int not uint16 for ATN now. java/go/python3 pass * remove checks for 0xFFFF in Go. * C++ uint16_t to int for ATN. * add mac php dir; fix type on accept() for generated code to be mixed. * Add test from @KvanTTT. This PR fixes #3555 for non-Java targets. * cleanup and add big lexer from #3546 * increase mvn mem size to 2G * increase mvn mem size to 8G * turn off the big ATN lexer test as we have memory issues during testing. * Fixes #3592 * Revert "C++ uint16_t to int for ATN." This reverts commit 4d2ebbf. # Conflicts: # runtime/Cpp/runtime/src/atn/ATNSerializer.cpp # runtime/Cpp/runtime/src/tree/xpath/XPathLexer.cpp * C++ uint16_t to int32_t for ATN. * rm unnecessary include file, updating project file. get rid of the 0xFFFF does in the C++ deserialization * rm refs to 0xFFFF in swift * javascript tests were running as Node...added to ignore list. * don't distinguish between 16 and 32 bit char sets in serialization; Python2/3 updated to work with this change. * update C++ to deserialize only 32-bit sets * 0xFFFF -> -1 for C++ target. * get other targets to use 32-bit sets in serialization. tests pass locally. * refactor to reduce code size * add comment * oops. comment out call to writeSerializedATNIntegerHistogram(). I wonder if this is why it ran out of memory during testing? * all but Java, Node, PHP, Go work now for the huge lexer file; I have set them to ignore. note that the swift target takes over a minute to lex it. I've turned off Node but it does not seem to terminate but it could terminate eventually. * all but Java, Node, PHP, Go work now for the huge lexer file; I have set them to ignore. note that the swift target takes over a minute to lex it. I've turned off Node but it does not seem to terminate but it could terminate eventually. 
* Turn off this big lexer because we get memory errors during continuous integration * Intermediate commit where I have shuffled around all of the -1 flipping and bumping by two. work still needs to be done because the token stream rewriter stuff fails. and I assume the other decoding for human readability testing if doesn't work * convert decode to use int[]; remove dead code. don't use serializeAsChar stuff. more tests pass. * more tests passing. simplify. When copying atn, must run ATN through serializer to set some state flags. * 0xFFFD+ are not valid char * clean up. tests passing now * huge clean up. Got Java working with 32-bit ATNs!Still working on cleanup but I want to run the tests * Cleanup the hack I did earlier; everything still seems to work * Use linux DCO not our old contributors certificate of origin * remove bump-by-2 code * clean up per @KvanTTT. Can't test locally on this box. Will see what CI says. * tweak comment * Revert "Use linux DCO not our old contributors certificate of origin" This reverts commit b0f8551. * see if C++ works in CI for huge ATN
* Get rid of reflection in CodeGenerator * Rename TargetType -> Language * Remove TargetType enum, use String instead as it was before Create CodeGenerator only one time during grammar processing, refactor code * Add default branch to appendEscapedCodePoint for unofficial targets (Kotlin) * Remove getVersion() overrides from Targets since they return the same value * Remove getLanguage() overrides from Targets since common implementation returns correct value * [again] don't use "quiet" option for mvn tests...hard to figure out what's wrong when failed. * normalize targets to 80 char strings for ATN serialization, except Java which needs big strings for efficiency. * Update actions.md fixed a small typo * Rename `CodeGenerator.createCodeGenerator` to `CodeGenerator.create` * Replace constants on string literals in `appendEscapedCodePoint` * Restore API of Target getLanguage(): protected -> public as it was before appendUnicodeEscapedCodePoint(int codePoint, StringBuilder sb, boolean escape): protected -> private (it's a new helper method, no need for API now) Added comment for appendUnicodeEscapedCodePoint * Introduce caseInsensitive lexer rule option, fixes #3436 * don't ahead of time compile for DART. See 8ca8804#commitcomment-62642779 * Simplify test rig related to timeouts (#3445) * remove all -q quiet mvn options to see output on CI servers. * run the various unit test classes in parallel rather than each individual test method, all except for Swift at the moment: `-Dparallel=classes -DthreadCount=4` * use bigger machine at circleci * No more test groups like parser1, parser2. 
* simplify Swift like the other tests * fix whitespace issues * use 4.10 not 4.9.4 * improve releasing antlr doc * Add Support For Swift Package Manager (#3132) * Add Swift Package Manager Support * Swift Package Dynamic * 【fix】【test】Fix run process path Co-authored-by: Terence Parr <[email protected]> * use src 11 for tool, but 8 for plugin/runtime (#3450) * use src 11 for tool, but 8 for plugin/runtime/runtime-tests. * use 11 in CI builds * cpp/cmake: Fix library install directories (#3447) This installs DLLs in bin directory instead of lib. * Python local import fixes (#3232) * Fixed pygrun relative import issue * Added name to contributors.txt Co-authored-by: Terence Parr <[email protected]> * Update javadoc to 8 and 11 (#3454) * no need for plugin in runtime, always gen svg from dot for javadoc, gen 1.8 not 1.7 doc for runtime. Gen 11 for tool. * tweak doc for 1.8 runtime. Test rig should gen 1.8 not 1.7 * [Go] Fix (*BitSet).equals (#3455) * set tool version for testing * oops reversion tool version as it's not sync'd with runtime and not time to release yet. 
* Remove unused variable from generated code (#3459) * [C++] Fix bugs in UnbufferedCharStream (#3420) * Escape bad words during grammar generation (#3451) * Escape reserved words during grammar generation, fixes #1070 (for -> for_ but RULE_for) Deprecate USE_OF_BAD_WORD * Make name and escapedName consistent across tool and codegen classes Fix other pull request notes * Rename NamedActionChunk to SymbolRefChunk * try out windows runners * rename workflow * Update windows.yml Fix cmd line issue * fix maven issue on windows * use jdk 11 * remove arch arg * display Github status for windows * try testing python3 on windows * try new run for python3 windows * try new run for python3 windows (again) * try new run for python3 windows (again2) * try new run for python3 windows (again3) * try new run for python3 windows (again4) * try new run for python3 windows (again5) * try new run for python3 windows * try new run for python3 windows * try new run for python3 windows * ugh i give up. python won't install on github actions. 
* Update windows.yml try python 3 * Update windows.yml * Update run-tests-python3.cmd * Update run-tests-python3.cmd * Create run-tests-python2.cmd * Update windows.yml * Update run-tests-python2.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Create run-tests-javascript.cmd * Update run-tests-javascript.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Create run-tests-csharp.cmd * Update windows.yml * fix warnings in C# CI * Update windows.yml * Update windows.yml * Create run-tests-dart.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update run-tests-dart.cmd * Update run-tests-dart.cmd * Update run-tests-dart.cmd * Update run-tests-dart.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Create run-tests-go.cmd * Update windows.yml * Update windows.yml * Update windows.yml * GitHub action php (#3474) * Update windows.yml * Create run-tests-php.cmd * Update run-tests-php.cmd * Update run-tests-php.cmd * Update run-tests-php.cmd * Update run-tests-php.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Update run-tests-php.cmd * Update windows.yml * Cleanup ci (#3476) * Delete .appveyor directory * Delete .travis directory * Improve CI concurrency (#3477) * Update windows.yml * Update windows.yml * Update windows.yml * Optimize toArray replace toArray(new T[size]) with toArray(new T[0]) for better performance https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_conclusion * add contributor * resolve conflicts * fix-maven-concurrency (#3479) * fix-maven-concurrency * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update run-tests-python2.cmd * Update run-tests-python3.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update 
windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update run-tests-php.cmd * Update windows.yml * Update run-tests-dart.cmd * Update run-tests-csharp.cmd * Update run-tests-go.cmd * Update run-tests-java.cmd * Update run-tests-javascript.cmd * Update run-tests-php.cmd * Update run-tests-python2.cmd * Update run-tests-python3.cmd * increase Windows CI concurrency for all targets except Dart * Preserve line separators for input runtime tests data (#3483) * Preserve line separators for input data in runtime tests, fix test data Refactor and improve performance of BaseRuntimeTest * Add LineSeparator (\n, \r\n) tests * Set up .gitattributes for LineSeparator_LF.txt (eol=lf) and LineSeparator_CRLF.txt (eol=crlf) * Restore `\n` for all input in runtime tests, add extra LexerExec tests (LineSeparatorLf, LineSeparatorCrLf) * Add generated LargeLexer test, remove LargeLexer.txt descriptor * tweak name to be GeneratedLexerDescriptors * [JavaScript] Migrate from jest to jasmine * [C++] Fix Windows min/max macro collision * [C++] Update cmake README.md to C++17 * remove unnecessary comparisons. * Add useful function writeSerializedATNIntegerHistogram for writing out information concerning how many of each integer value appear in a serialized ATN. * fix comment indicating what goes in the serialized ATN. * move writeSerializedATNIntegerHistogram out of runtime. * follow guidelines * Fix .interp file parsing test for the Java runtime. Also includes separating the generation of the .interp file from writing it out so that we can use both independently. * Delete files no longer needed. 
Should have been part of #3520 * [C++] Optimizations and cleanups and const correctness, oh my * [C++] Optimize LL1Analyzer * [C++] Fix missing virtual destructors * Remove not used PROTECTED, PUBLIC, PRIVATE tokens from ANTLRLexer.g * Remove ANTLR 3 stuff from ANTLR grammars, deprecate ANTLR 3 errors * Remove not used imaginary tokens from ANTLRParser.g * Fix misprints in grammars * ATN serialized data: remove shifting by 2, remove UUID; fix #3515 Regenerate XPathLexer files * Disable native runtime tests (see #3521) * Implement Java-specific ATN data optimization (+-2 shift) * [C++] Remove now unused antlrcpp::Guid * pull new branch diagram from master * use dev not master branch for CI github * update doc from master * add back missing author * [C++] Fix const correctness in ATN and DFA * keep getSerializedATNSegmentLimit at max int * Fixes #3259 make InErrorRecoveryMode public for go * Change code gen template to capitalize InErrorRecoveryMode * [C++] Improve multithreaded performance, fix TSAN error, and fix profiling ATN simulator setup bug * Get rid of unnecessary allocations and calculations in SerializedATN * Get rid of excess char escaping in generated files, decrease size of output files Fix creation of excess fragments for Dart, Cpp, PHP runtimes * Swift: fix binary serialization and use instead of JSON * Fix targetCharValueEscape, make them final and static * [C++] Cleanup ATNDeserializer and remove related deprecated methods from ATNSimulator * Fix for #3557 (getting "go test" to work again). * Convert Python2/3 to use int arrays not strings for ATN encodings (#3561) * Convert Python2/3 to use int arrays not strings for ATN encodings. Also make target indicate int vs string. * rename and reverse ATNSerializedAsInts * add override * remove unneeded method * [C++] Drastically improve multi-threaded performance (#3550) Thanks guys. A major advancement. 
* [C++] Remove duplicate includes and remove unused includes (#3563) * [C++] Lazily deserialize ATN in generated code (#3562) * [Docs] Update Swift Docs (#3458) * Add Swift Package Manager Support * Swift Package Dynamic * 【fix】【test】Fix run process path * [Docs] [Swift] update link, remove expired descriptions Co-authored-by: Terence Parr <[email protected]> * Ascii only ATN serialization (#3566) * go back to generating pure ascii ATN serializations to avoid issues where target compilers might assume ascii vs utf-8. * forgot I had to change php on previous ATN serialization tweak. * change how we escapeChar() per target. * oops; gotta use escapeChar method * rm unneeded case * add @OverRide * use ints not chars for C# (#3567) * use ints not chars for C# * oops. remove 'quotes' * regen from XPathLexer.g4 * simplify ATN with bypass alts mechanism in Java. * Change string to int[] for serialized ATN for C#; removed unneeded `use System` from XPathLexer.g4; regen that grammar. * [C++] Use camel case name in generated lexers and parsers (#3565) * Change string to int array for serialized ATN for JavaScript (#3568) * perf: Add default implementation for Visit in ParseTreeVisitor. (#3569) * perf: Add default implementation for Visit in ParseTreeVisitor. 
Reference: https://github.com/antlr/antlr4/blob/ad29539cd2e94b2599e0281515f6cbb420d29f38/runtime/Java/src/org/antlr/v4/runtime/tree/AbstractParseTreeVisitor.java#L18 * doc: add contributor * Don't use utf decoding...these are just ints (#3573) * [Go] Cleanup and fix ATN deserialization verification (#3574) * [C++] Force generated static data type name to titlecase (#3572) * Use int array not string for ATN in Swift (#3575) * [C++] Fix generated Lexer static data constructor (#3576) * Use int array not string for ATN in Dart (#3578) * Fix PHP codegen to support int ATN serialization (#3579) * Update listener documentation to satisfy the discussion about improving exception handling: #3162 * tweak * [C++] Remove unused LexerATNSimulator::match_calls (#3570) * [C++] Remove unused LexerATNSimulator::match_calls * Remove match_calls from other targets * [Java] Preserve serialized ATN version 3 compatibility (#3583) * add jcking to the contributors list * Update releasing-antlr.md * [C++] Avoid using dynamic_cast where possible by using hand rolled RTTI (#3584) * Revert "[Java] Preserve serialized ATN version 3 compatibility (#3583)" This reverts commit 01bc811. * [C++] Add ANTLR4CPP_PUBLIC attributes to various symbols (#3588) * Update editorconfig for c++ (#3586) * Make it easier to contribute: Add c++ configuration for .editorconfig. Using the observed style with 2 indentation spaces. Signed-off-by: Henner Zeller <[email protected]> * Add hzeller to contributors.txt Signed-off-by: Henner Zeller <[email protected]> * Fix code style and typing to support PHP 8 (#3582) * [Go] Port locking algorithm from C++ to Go (#3571) * Use linux DCO not our old contributors certificate of origin * [C++] Fix bugs in SemanticContext (#3595) * [Go] Do not export Array2DHashSet which is an implementation detail (#3597) * Revert "Use linux DCO not our old contributors certificate of origin" This reverts commit b0f8551. 
* Use signed ints for ATN serialization not uint16, except for java (#3591) * refactor serialize so we don't need comments * more cleanup during refactor * store language in serializer obj * A lexer rule token type should never be -1 (EOF). 0 is fragment but then must be > 0. * Go uses int not uint16 for ATN now. java/go/python3 pass * remove checks for 0xFFFF in Go. * C++ uint16_t to int for ATN. * add mac php dir; fix type on accept() for generated code to be mixed. * Add test from @KvanTTT. This PR fixes #3555 for non-Java targets. * cleanup and add big lexer from #3546 * increase mvn mem size to 2G * increase mvn mem size to 8G * turn off the big ATN lexer test as we have memory issues during testing. * Fixes #3592 * Revert "C++ uint16_t to int for ATN." This reverts commit 4d2ebbf. # Conflicts: # runtime/Cpp/runtime/src/atn/ATNSerializer.cpp # runtime/Cpp/runtime/src/tree/xpath/XPathLexer.cpp * C++ uint16_t to int32_t for ATN. * rm unnecessary include file, updating project file. get rid of the 0xFFFF does in the C++ deserialization * rm refs to 0xFFFF in swift * javascript tests were running as Node...added to ignore list. * don't distinguish between 16 and 32 bit char sets in serialization; Python2/3 updated to work with this change. * update C++ to deserialize only 32-bit sets * 0xFFFF -> -1 for C++ target. * get other targets to use 32-bit sets in serialization. tests pass locally. * refactor to reduce code size * add comment * oops. comment out call to writeSerializedATNIntegerHistogram(). I wonder if this is why it ran out of memory during testing? * all but Java, Node, PHP, Go work now for the huge lexer file; I have set them to ignore. note that the swift target takes over a minute to lex it. I've turned off Node but it does not seem to terminate but it could terminate eventually. * all but Java, Node, PHP, Go work now for the huge lexer file; I have set them to ignore. note that the swift target takes over a minute to lex it. 
I've turned off Node but it does not seem to terminate but it could terminate eventually. * Turn off this big lexer because we get memory errors during continuous integration * Intermediate commit where I have shuffled around all of the -1 flipping and bumping by two. work still needs to be done because the token stream rewriter stuff fails. and I assume the other decoding for human readability testing if doesn't work * convert decode to use int[]; remove dead code. don't use serializeAsChar stuff. more tests pass. * more tests passing. simplify. When copying atn, must run ATN through serializer to set some state flags. * 0xFFFD+ are not valid char * clean up. tests passing now * huge clean up. Got Java working with 32-bit ATNs!Still working on cleanup but I want to run the tests * Cleanup the hack I did earlier; everything still seems to work * Use linux DCO not our old contributors certificate of origin * remove bump-by-2 code * clean up per @KvanTTT. Can't test locally on this box. Will see what CI says. * tweak comment * Revert "Use linux DCO not our old contributors certificate of origin" This reverts commit b0f8551. * see if C++ works in CI for huge ATN * Use linux DCO not our old contributors certificate of origin (#3598) * Use linux DCO not our old contributors certificate of origin * Revert "Use linux DCO not our old contributors certificate of origin" This reverts commit b0f8551. 
* use linux DCO * use linux DCO * Use linux DCO not our old contributors certificate of origin * update release documentation Signed-off-by: Terence Parr <[email protected]> * Equivalent of #3537 * clean up setup * clean up doc version * [Swift] improvements to equality functions (#3302) * fix default equality * equality cases * optional unwrapping * [Swift] Use for in loops (#3303) * common for in loops * reversed loop * drop first loop * for in with default BitSet * [Go] Fix symbol collision in generated lexers and parsers (#3603) * [C++] Refactor and optimize SemanticContext (#3594) * [C++] Devirtualize hand rolled RTTI for performance (#3609) * [C++] Add T::is for type hierarchy checks and remove some dynamic_cast (#3612) * [C++] Avoid copying statically generated serialized ATNs (#3613) * [C++] Refactor PredictionContext and yet more performance improvements (#3608) * [C++] Cleanup DFA, DFAState, LexerAction, and yet more performance improvements (#3615) * fix dependabot issues * [Swift] use stdlib (single pass) (#3602) * this was added to the stdlib in Swift 5 * &>> is defined as lhs >> (rhs % lhs.bitwidth) * the stdlib has these * reduce loops * use indices * append(contentsOf:) * Array literal init works for sets too! * inline and remove bit query functions * more optional handling (#3605) * [C++] Minor improvements to PredictionContext (#3616) * use php runtime dev branch to test dev * update doc to be more explicit about the interaction between lexer actions and semantic predicates; Fixes #3611. Fixes #3606. 
Signed-off-by: Terence Parr <[email protected]> * Refactor js runtime in preparation of future improvements * refactor, 1 file per class, use import, use module semantics, use webpack 5, use eslint * all tests pass * simplifications and alignment with standard js idioms * simplifications and alignment with standard js idioms * support reading legacy ATN * support both module and non-module imports * fix failing tests * fix failing tests * No longer necessary too generate sets or single atom transit that are bigger than 16bits. (#3620) * Updated getting started with Cpp documentation. (#3628) Included specific examples of using ANTLR4_TAG and ANTLR4_ZIP_REPOSITORY in the sample CMakeLists file. * [C++] Free ATNConfig lookup set in readonly ATNConfigSet (#3630) * [C++] Implement configurable PredictionContextMergeCache (#3627) * Allow to choose to switch off building tests in C++ (#3624) The new option to cmake ANTLR_BUILD_CPP_TESTS is default on (so the behavior is as before), but it provides a way to switch off if not needed. The C++ tests pull in an external dependency (googletests), which might conflict if ANTLR is used as a subproject in another cmake project. Signed-off-by: Henner Zeller <[email protected]> * Fix NPE for undefined label, fix #2788 * An interval ought to be a value Interval was a pointer to 2 Ints it ought to be just 2 Ints, which is smaller and more semantically correct, with no need for a cache. However, this technically breaks metadata and AnyObject conformance but people shouldn't be relying on those for an Interval. * [C++] Remove more dynamic_cast usage * [C++] Introduce version macros * add license prefix * Prep 4.10 (#3599) * Tweak doc * Swift was referring to hardcoded version * Start version update script. * add files to update * clean up setup * clean up setup * clean up setup * don't need file * don't need file * Fixes #3600. add instructions and associated code necessary to build the xpath lexers. 
* clean up version nums * php8 * php8 * php8 * php8 * php8 * php8 * php8 * php8 * tweak doc * ok, i give up. php won't bump up too v8 * tweak doc * version number bumped to 4.10 in runtime. * Change the doc for releasing and update to use latest ST 4.3.2 * fix dart version to 4.10.0 * cmd files Cannot use export bash command. * try fixing php ci again * working on deploy Signed-off-by: Terence Parr <[email protected]> * php8 always install. * set js to 4.10.0 not 4.10 * turn off apt update for php circleci * try w/o cimg/php * try setting branch * ok i give up * tweak * update docs for release. * php8 circleci * use 3.5.3 antlr * use 3.5.3-SNAPSHOT antlr * use full 3.5.3 antlr * [Swift] reduce Optionals in APIs (#3621) * ParserRuleContext.children see comment in removeLastChild * TokenStream.getText * Parser._parseListeners this might require changes to the code templates? * ATN {various} * make computeReachSet return empty, not nil * overrides refine optionality * BufferedTokenStream getHiddenTokensTo{Left, Right} return empty not nil * Update Swift.stg * avoid breakage by adding overload of `getText` in extension * tweak to kick off build Signed-off-by: Terence Parr <[email protected]> * try parallelism: 4 circleci * Revert "[Swift] reduce Optionals in APIs (#3621)" This reverts commit b5ccba0. * tweaks to doc * Improve the deploy script and tweak the released doc. 
* use 4.10 not Snapshot for scripts Co-authored-by: Ivan Kochurkin <[email protected]> Co-authored-by: Alexandr <[email protected]> Co-authored-by: 100mango <[email protected]> Co-authored-by: Biswapriyo Nath <[email protected]> Co-authored-by: Benjamin Spiegel <[email protected]> Co-authored-by: Justin King <[email protected]> Co-authored-by: Eric Vergnaud <[email protected]> Co-authored-by: Harry Chan <[email protected]> Co-authored-by: Ken Domino <[email protected]> Co-authored-by: chenquan <[email protected]> Co-authored-by: Marcos Passos <[email protected]> Co-authored-by: Henner Zeller <[email protected]> Co-authored-by: Dante Broggi <[email protected]> Co-authored-by: chris-miner <[email protected]>
…591)
* refactor serialize so we don't need comments
* more cleanup during refactor
* store language in serializer obj
* A lexer rule token type should never be -1 (EOF). 0 is fragment but then must be > 0.
* Go uses int not uint16 for ATN now. java/go/python3 pass
* remove checks for 0xFFFF in Go.
* C++ uint16_t to int for ATN.
* add mac php dir; fix type on accept() for generated code to be mixed.
* Add test from @KvanTTT. This PR fixes antlr/antlr4#3555 for non-Java targets.
* cleanup and add big lexer from antlr/antlr4#3546
* increase mvn mem size to 2G
* increase mvn mem size to 8G
* turn off the big ATN lexer test as we have memory issues during testing.
* Fixes #3592
* Revert "C++ uint16_t to int for ATN." This reverts commit 4d2ebbf5671a5b373d2ca3b5a05464ccb8b71b52.
  # Conflicts:
  # runtime/Cpp/runtime/src/atn/ATNSerializer.cpp
  # runtime/Cpp/runtime/src/tree/xpath/XPathLexer.cpp
* C++ uint16_t to int32_t for ATN.
* rm unnecessary include file, updating project file. get rid of the 0xFFFF does in the C++ deserialization
* rm refs to 0xFFFF in swift
* javascript tests were running as Node...added to ignore list.
* don't distinguish between 16 and 32 bit char sets in serialization; Python2/3 updated to work with this change.
* update C++ to deserialize only 32-bit sets
* 0xFFFF -> -1 for C++ target.
* get other targets to use 32-bit sets in serialization. tests pass locally.
* refactor to reduce code size
* add comment
* oops. comment out call to writeSerializedATNIntegerHistogram(). I wonder if this is why it ran out of memory during testing?
* all but Java, Node, PHP, Go work now for the huge lexer file; I have set them to ignore. note that the swift target takes over a minute to lex it. I've turned off Node but it does not seem to terminate but it could terminate eventually.
* all but Java, Node, PHP, Go work now for the huge lexer file; I have set them to ignore. note that the swift target takes over a minute to lex it. I've turned off Node but it does not seem to terminate but it could terminate eventually.
* Turn off this big lexer because we get memory errors during continuous integration
* Intermediate commit where I have shuffled around all of the -1 flipping and bumping by two. work still needs to be done because the token stream rewriter stuff fails. and I assume the other decoding for human readability testing if doesn't work
* convert decode to use int[]; remove dead code. don't use serializeAsChar stuff. more tests pass.
* more tests passing. simplify. When copying atn, must run ATN through serializer to set some state flags.
* 0xFFFD+ are not valid char
* clean up. tests passing now
* huge clean up. Got Java working with 32-bit ATNs! Still working on cleanup but I want to run the tests
* Cleanup the hack I did earlier; everything still seems to work
* Use linux DCO not our old contributors certificate of origin
* remove bump-by-2 code
* clean up per @KvanTTT. Can't test locally on this box. Will see what CI says.
* tweak comment
* Revert "Use linux DCO not our old contributors certificate of origin" This reverts commit b0f8551c9a674a0a1e045b9a710800df28e72c10.
* see if C++ works in CI for huge ATN
* Get rid of reflection in CodeGenerator * Rename TargetType -> Language * Remove TargetType enum, use String instead as it was before Create CodeGenerator only one time during grammar processing, refactor code * Add default branch to appendEscapedCodePoint for unofficial targets (Kotlin) * Remove getVersion() overrides from Targets since they return the same value * Remove getLanguage() overrides from Targets since common implementation returns correct value * [again] don't use "quiet" option for mvn tests...hard to figure out what's wrong when failed. * normalize targets to 80 char strings for ATN serialization, except Java which needs big strings for efficiency. * Update actions.md fixed a small typo * Rename `CodeGenerator.createCodeGenerator` to `CodeGenerator.create` * Replace constants on string literals in `appendEscapedCodePoint` * Restore API of Target getLanguage(): protected -> public as it was before appendUnicodeEscapedCodePoint(int codePoint, StringBuilder sb, boolean escape): protected -> private (it's a new helper method, no need for API now) Added comment for appendUnicodeEscapedCodePoint * Introduce caseInsensitive lexer rule option, fixes #3436 * don't ahead of time compile for DART. See antlr/antlr4@8ca8804#commitcomment-62642779 * Simplify test rig related to timeouts (#3445) * remove all -q quiet mvn options to see output on CI servers. * run the various unit test classes in parallel rather than each individual test method, all except for Swift at the moment: `-Dparallel=classes -DthreadCount=4` * use bigger machine at circleci * No more test groups like parser1, parser2. 
* simplify Swift like the other tests * fix whitespace issues * use 4.10 not 4.9.4 * improve releasing antlr doc * Add Support For Swift Package Manager (#3132) * Add Swift Package Manager Support * Swift Package Dynamic * 【fix】【test】Fix run process path Co-authored-by: Terence Parr <[email protected]> * use src 11 for tool, but 8 for plugin/runtime (#3450) * use src 11 for tool, but 8 for plugin/runtime/runtime-tests. * use 11 in CI builds * cpp/cmake: Fix library install directories (#3447) This installs DLLs in bin directory instead of lib. * Python local import fixes (#3232) * Fixed pygrun relative import issue * Added name to contributors.txt Co-authored-by: Terence Parr <[email protected]> * Update javadoc to 8 and 11 (#3454) * no need for plugin in runtime, always gen svg from dot for javadoc, gen 1.8 not 1.7 doc for runtime. Gen 11 for tool. * tweak doc for 1.8 runtime. Test rig should gen 1.8 not 1.7 * [Go] Fix (*BitSet).equals (#3455) * set tool version for testing * oops reversion tool version as it's not sync'd with runtime and not time to release yet. 
* Remove unused variable from generated code (#3459) * [C++] Fix bugs in UnbufferedCharStream (#3420) * Escape bad words during grammar generation (#3451) * Escape reserved words during grammar generation, fixes #1070 (for -> for_ but RULE_for) Deprecate USE_OF_BAD_WORD * Make name and escapedName consistent across tool and codegen classes Fix other pull request notes * Rename NamedActionChunk to SymbolRefChunk * try out windows runners * rename workflow * Update windows.yml Fix cmd line issue * fix maven issue on windows * use jdk 11 * remove arch arg * display Github status for windows * try testing python3 on windows * try new run for python3 windows * try new run for python3 windows (again) * try new run for python3 windows (again2) * try new run for python3 windows (again3) * try new run for python3 windows (again4) * try new run for python3 windows (again5) * try new run for python3 windows * try new run for python3 windows * try new run for python3 windows * ugh i give up. python won't install on github actions. 
* Update windows.yml try python 3 * Update windows.yml * Update run-tests-python3.cmd * Update run-tests-python3.cmd * Create run-tests-python2.cmd * Update windows.yml * Update run-tests-python2.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Create run-tests-javascript.cmd * Update run-tests-javascript.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Create run-tests-csharp.cmd * Update windows.yml * fix warnings in C# CI * Update windows.yml * Update windows.yml * Create run-tests-dart.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update run-tests-dart.cmd * Update run-tests-dart.cmd * Update run-tests-dart.cmd * Update run-tests-dart.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Create run-tests-go.cmd * Update windows.yml * Update windows.yml * Update windows.yml * GitHub action php (#3474) * Update windows.yml * Create run-tests-php.cmd * Update run-tests-php.cmd * Update run-tests-php.cmd * Update run-tests-php.cmd * Update run-tests-php.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Update run-tests-php.cmd * Update windows.yml * Cleanup ci (#3476) * Delete .appveyor directory * Delete .travis directory * Improve CI concurrency (#3477) * Update windows.yml * Update windows.yml * Update windows.yml * Optimize toArray replace toArray(new T[size]) with toArray(new T[0]) for better performance https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_conclusion * add contributor * resolve conflicts * fix-maven-concurrency (#3479) * fix-maven-concurrency * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update run-tests-python2.cmd * Update run-tests-python3.cmd * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update 
windows.yml * Update windows.yml * Update windows.yml * Update windows.yml * Update run-tests-php.cmd * Update windows.yml * Update run-tests-dart.cmd * Update run-tests-csharp.cmd * Update run-tests-go.cmd * Update run-tests-java.cmd * Update run-tests-javascript.cmd * Update run-tests-php.cmd * Update run-tests-python2.cmd * Update run-tests-python3.cmd * increase Windows CI concurrency for all targets except Dart * Preserve line separators for input runtime tests data (#3483) * Preserve line separators for input data in runtime tests, fix test data Refactor and improve performance of BaseRuntimeTest * Add LineSeparator (\n, \r\n) tests * Set up .gitattributes for LineSeparator_LF.txt (eol=lf) and LineSeparator_CRLF.txt (eol=crlf) * Restore `\n` for all input in runtime tests, add extra LexerExec tests (LineSeparatorLf, LineSeparatorCrLf) * Add generated LargeLexer test, remove LargeLexer.txt descriptor * tweak name to be GeneratedLexerDescriptors * [JavaScript] Migrate from jest to jasmine * [C++] Fix Windows min/max macro collision * [C++] Update cmake README.md to C++17 * remove unnecessary comparisons. * Add useful function writeSerializedATNIntegerHistogram for writing out information concerning how many of each integer value appear in a serialized ATN. * fix comment indicating what goes in the serialized ATN. * move writeSerializedATNIntegerHistogram out of runtime. * follow guidelines * Fix .interp file parsing test for the Java runtime. Also includes separating the generation of the .interp file from writing it out so that we can use both independently. * Delete files no longer needed. 
Should have been part of antlr/antlr4#3520 * [C++] Optimizations and cleanups and const correctness, oh my * [C++] Optimize LL1Analyzer * [C++] Fix missing virtual destructors * Remove not used PROTECTED, PUBLIC, PRIVATE tokens from ANTLRLexer.g * Remove ANTLR 3 stuff from ANTLR grammars, deprecate ANTLR 3 errors * Remove not used imaginary tokens from ANTLRParser.g * Fix misprints in grammars * ATN serialized data: remove shifting by 2, remove UUID; fix #3515 Regenerate XPathLexer files * Disable native runtime tests (see #3521) * Implement Java-specific ATN data optimization (+-2 shift) * [C++] Remove now unused antlrcpp::Guid * pull new branch diagram from master * use dev not master branch for CI github * update doc from master * add back missing author * [C++] Fix const correctness in ATN and DFA * keep getSerializedATNSegmentLimit at max int * Fixes #3259 make InErrorRecoveryMode public for go * Change code gen template to capitalize InErrorRecoveryMode * [C++] Improve multithreaded performance, fix TSAN error, and fix profiling ATN simulator setup bug * Get rid of unnecessary allocations and calculations in SerializedATN * Get rid of excess char escaping in generated files, decrease size of output files Fix creation of excess fragments for Dart, Cpp, PHP runtimes * Swift: fix binary serialization and use instead of JSON * Fix targetCharValueEscape, make them final and static * [C++] Cleanup ATNDeserializer and remove related deprecated methods from ATNSimulator * Fix for #3557 (getting "go test" to work again). * Convert Python2/3 to use int arrays not strings for ATN encodings (#3561) * Convert Python2/3 to use int arrays not strings for ATN encodings. Also make target indicate int vs string. * rename and reverse ATNSerializedAsInts * add override * remove unneeded method * [C++] Drastically improve multi-threaded performance (#3550) Thanks guys. A major advancement. 
* [C++] Remove duplicate includes and remove unused includes (#3563) * [C++] Lazily deserialize ATN in generated code (#3562) * [Docs] Update Swift Docs (#3458) * Add Swift Package Manager Support * Swift Package Dynamic * 【fix】【test】Fix run process path * [Docs] [Swift] update link, remove expired descriptions Co-authored-by: Terence Parr <[email protected]> * Ascii only ATN serialization (#3566) * go back to generating pure ascii ATN serializations to avoid issues where target compilers might assume ascii vs utf-8. * forgot I had to change php on previous ATN serialization tweak. * change how we escapeChar() per target. * oops; gotta use escapeChar method * rm unneeded case * add @OverRide * use ints not chars for C# (#3567) * use ints not chars for C# * oops. remove 'quotes' * regen from XPathLexer.g4 * simplify ATN with bypass alts mechanism in Java. * Change string to int[] for serialized ATN for C#; removed unneeded `use System` from XPathLexer.g4; regen that grammar. * [C++] Use camel case name in generated lexers and parsers (#3565) * Change string to int array for serialized ATN for JavaScript (#3568) * perf: Add default implementation for Visit in ParseTreeVisitor. (#3569) * perf: Add default implementation for Visit in ParseTreeVisitor. 
Reference: https://github.com/antlr/antlr4/blob/ad29539cd2e94b2599e0281515f6cbb420d29f38/runtime/Java/src/org/antlr/v4/runtime/tree/AbstractParseTreeVisitor.java#L18 * doc: add contributor * Don't use utf decoding...these are just ints (#3573) * [Go] Cleanup and fix ATN deserialization verification (#3574) * [C++] Force generated static data type name to titlecase (#3572) * Use int array not string for ATN in Swift (#3575) * [C++] Fix generated Lexer static data constructor (#3576) * Use int array not string for ATN in Dart (#3578) * Fix PHP codegen to support int ATN serialization (#3579) * Update listener documentation to satisfy the discussion about improving exception handling: antlr/antlr4#3162 * tweak * [C++] Remove unused LexerATNSimulator::match_calls (#3570) * [C++] Remove unused LexerATNSimulator::match_calls * Remove match_calls from other targets * [Java] Preserve serialized ATN version 3 compatibility (#3583) * add jcking to the contributors list * Update releasing-antlr.md * [C++] Avoid using dynamic_cast where possible by using hand rolled RTTI (#3584) * Revert "[Java] Preserve serialized ATN version 3 compatibility (#3583)" This reverts commit 01bc811557adad0de63e8db85b78ca8885480378. * [C++] Add ANTLR4CPP_PUBLIC attributes to various symbols (#3588) * Update editorconfig for c++ (#3586) * Make it easier to contribute: Add c++ configuration for .editorconfig. Using the observed style with 2 indentation spaces. 
Signed-off-by: Henner Zeller <[email protected]> * Add hzeller to contributors.txt Signed-off-by: Henner Zeller <[email protected]> * Fix code style and typing to support PHP 8 (#3582) * [Go] Port locking algorithm from C++ to Go (#3571) * Use linux DCO not our old contributors certificate of origin * [C++] Fix bugs in SemanticContext (#3595) * [Go] Do not export Array2DHashSet which is an implementation detail (#3597) * Revert "Use linux DCO not our old contributors certificate of origin" This reverts commit b0f8551c9a674a0a1e045b9a710800df28e72c10. * Use signed ints for ATN serialization not uint16, except for java (#3591) * refactor serialize so we don't need comments * more cleanup during refactor * store language in serializer obj * A lexer rule token type should never be -1 (EOF). 0 is fragment but then must be > 0. * Go uses int not uint16 for ATN now. java/go/python3 pass * remove checks for 0xFFFF in Go. * C++ uint16_t to int for ATN. * add mac php dir; fix type on accept() for generated code to be mixed. * Add test from @KvanTTT. This PR fixes antlr/antlr4#3555 for non-Java targets. * cleanup and add big lexer from antlr/antlr4#3546 * increase mvn mem size to 2G * increase mvn mem size to 8G * turn off the big ATN lexer test as we have memory issues during testing. * Fixes #3592 * Revert "C++ uint16_t to int for ATN." This reverts commit 4d2ebbf5671a5b373d2ca3b5a05464ccb8b71b52. # Conflicts: # runtime/Cpp/runtime/src/atn/ATNSerializer.cpp # runtime/Cpp/runtime/src/tree/xpath/XPathLexer.cpp * C++ uint16_t to int32_t for ATN. * rm unnecessary include file, updating project file. get rid of the 0xFFFF does in the C++ deserialization * rm refs to 0xFFFF in swift * javascript tests were running as Node...added to ignore list. * don't distinguish between 16 and 32 bit char sets in serialization; Python2/3 updated to work with this change. * update C++ to deserialize only 32-bit sets * 0xFFFF -> -1 for C++ target. 
* get other targets to use 32-bit sets in serialization. tests pass locally. * refactor to reduce code size * add comment * oops. comment out call to writeSerializedATNIntegerHistogram(). I wonder if this is why it ran out of memory during testing? * all but Java, Node, PHP, Go work now for the huge lexer file; I have set them to ignore. note that the swift target takes over a minute to lex it. I've turned off Node but it does not seem to terminate but it could terminate eventually. * all but Java, Node, PHP, Go work now for the huge lexer file; I have set them to ignore. note that the swift target takes over a minute to lex it. I've turned off Node but it does not seem to terminate but it could terminate eventually. * Turn off this big lexer because we get memory errors during continuous integration * Intermediate commit where I have shuffled around all of the -1 flipping and bumping by two. work still needs to be done because the token stream rewriter stuff fails. and I assume the other decoding for human readability testing if doesn't work * convert decode to use int[]; remove dead code. don't use serializeAsChar stuff. more tests pass. * more tests passing. simplify. When copying atn, must run ATN through serializer to set some state flags. * 0xFFFD+ are not valid char * clean up. tests passing now * huge clean up. Got Java working with 32-bit ATNs!Still working on cleanup but I want to run the tests * Cleanup the hack I did earlier; everything still seems to work * Use linux DCO not our old contributors certificate of origin * remove bump-by-2 code * clean up per @KvanTTT. Can't test locally on this box. Will see what CI says. * tweak comment * Revert "Use linux DCO not our old contributors certificate of origin" This reverts commit b0f8551c9a674a0a1e045b9a710800df28e72c10. 
* see if C++ works in CI for huge ATN * Use linux DCO not our old contributors certificate of origin (#3598) * Use linux DCO not our old contributors certificate of origin * Revert "Use linux DCO not our old contributors certificate of origin" This reverts commit b0f8551c9a674a0a1e045b9a710800df28e72c10. * use linux DCO * use linux DCO * Use linux DCO not our old contributors certificate of origin * update release documentation Signed-off-by: Terence Parr <[email protected]> * Equivalent of antlr/antlr4#3537 * clean up setup * clean up doc version * [Swift] improvements to equality functions (#3302) * fix default equality * equality cases * optional unwrapping * [Swift] Use for in loops (#3303) * common for in loops * reversed loop * drop first loop * for in with default BitSet * [Go] Fix symbol collision in generated lexers and parsers (#3603) * [C++] Refactor and optimize SemanticContext (#3594) * [C++] Devirtualize hand rolled RTTI for performance (#3609) * [C++] Add T::is for type hierarchy checks and remove some dynamic_cast (#3612) * [C++] Avoid copying statically generated serialized ATNs (#3613) * [C++] Refactor PredictionContext and yet more performance improvements (#3608) * [C++] Cleanup DFA, DFAState, LexerAction, and yet more performance improvements (#3615) * fix dependabot issues * [Swift] use stdlib (single pass) (#3602) * this was added to the stdlib in Swift 5 * &>> is defined as lhs >> (rhs % lhs.bitwidth) * the stdlib has these * reduce loops * use indices * append(contentsOf:) * Array literal init works for sets too! * inline and remove bit query functions * more optional handling (#3605) * [C++] Minor improvements to PredictionContext (#3616) * use php runtime dev branch to test dev * update doc to be more explicit about the interaction between lexer actions and semantic predicates; Fixes #3611. Fixes #3606. 
Signed-off-by: Terence Parr <[email protected]> * Refactor js runtime in preparation of future improvements * refactor, 1 file per class, use import, use module semantics, use webpack 5, use eslint * all tests pass * simplifications and alignment with standard js idioms * simplifications and alignment with standard js idioms * support reading legacy ATN * support both module and non-module imports * fix failing tests * fix failing tests * No longer necessary too generate sets or single atom transit that are bigger than 16bits. (#3620) * Updated getting started with Cpp documentation. (#3628) Included specific examples of using ANTLR4_TAG and ANTLR4_ZIP_REPOSITORY in the sample CMakeLists file. * [C++] Free ATNConfig lookup set in readonly ATNConfigSet (#3630) * [C++] Implement configurable PredictionContextMergeCache (#3627) * Allow to choose to switch off building tests in C++ (#3624) The new option to cmake ANTLR_BUILD_CPP_TESTS is default on (so the behavior is as before), but it provides a way to switch off if not needed. The C++ tests pull in an external dependency (googletests), which might conflict if ANTLR is used as a subproject in another cmake project. Signed-off-by: Henner Zeller <[email protected]> * Fix NPE for undefined label, fix antlr#2788 * An interval ought to be a value Interval was a pointer to 2 Ints it ought to be just 2 Ints, which is smaller and more semantically correct, with no need for a cache. However, this technically breaks metadata and AnyObject conformance but people shouldn't be relying on those for an Interval. * [C++] Remove more dynamic_cast usage * [C++] Introduce version macros * add license prefix * Prep 4.10 (#3599) * Tweak doc * Swift was referring to hardcoded version * Start version update script. * add files to update * clean up setup * clean up setup * clean up setup * don't need file * don't need file * Fixes #3600. add instructions and associated code necessary to build the xpath lexers. 
* clean up version nums * php8 * php8 * php8 * php8 * php8 * php8 * php8 * php8 * tweak doc * ok, i give up. php won't bump up too v8 * tweak doc * version number bumped to 4.10 in runtime. * Change the doc for releasing and update to use latest ST 4.3.2 * fix dart version to 4.10.0 * cmd files Cannot use export bash command. * try fixing php ci again * working on deploy Signed-off-by: Terence Parr <[email protected]> * php8 always install. * set js to 4.10.0 not 4.10 * turn off apt update for php circleci * try w/o cimg/php * try setting branch * ok i give up * tweak * update docs for release. * php8 circleci * use 3.5.3 antlr * use 3.5.3-SNAPSHOT antlr * use full 3.5.3 antlr * [Swift] reduce Optionals in APIs (#3621) * ParserRuleContext.children see comment in removeLastChild * TokenStream.getText * Parser._parseListeners this might require changes to the code templates? * ATN {various} * make computeReachSet return empty, not nil * overrides refine optionality * BufferedTokenStream getHiddenTokensTo{Left, Right} return empty not nil * Update Swift.stg * avoid breakage by adding overload of `getText` in extension * tweak to kick off build Signed-off-by: Terence Parr <[email protected]> * try parallelism: 4 circleci * Revert "[Swift] reduce Optionals in APIs (#3621)" This reverts commit b5ccba03c8fa9108975bf13044ce10caed6f579c. * tweaks to doc * Improve the deploy script and tweak the released doc. 
I'm returning to the work on increasing the ATN state count limit, and I've also simplified serialization (the part related to Unicode encoding).
The number of ATN states can now exceed 65535 and go up to 2^31-1.
Take a look at the C# code: it contains only small changes because it already has `Read` and `Write` methods (as do the other runtimes, except for Java).
If Java and C# are OK, I'll complete the other runtimes.
The writing and reading methods have comprehensive tests: `testATNDataWriterReaderCompact` and `testATNDataWriterReaderRaw`.
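To make the compact encoding idea concrete, here is a minimal sketch of how a 32-bit int could be packed into one to three 16-bit units, using the two high bits of the first unit as a tag. The class `CompactIntCodec`, its tag constants, and the size thresholds are assumptions made up for illustration. This is not the `ATNDataWriter`/`ATNDataReader` code from this PR, and the real encoding may differ. The thresholds were only chosen so that `0`, `-1`, and `42` take one unit and `1 << 14` takes two, as in the first assertions of `testATNDataWriterReaderCompact`.

```java
/** Hypothetical sketch; NOT the PR's ATNDataWriter/ATNDataReader. */
final class CompactIntCodec {
    // Assumed layout: the two high bits of the first 16-bit unit act as a tag.
    private static final int TAG_TWO_UNITS = 0x4000;   // top bits 01
    private static final int TAG_THREE_UNITS = 0x8000; // top bits 10
    private static final int MINUS_ONE = 0xFFFF;       // reserved single unit for -1

    private CompactIntCodec() {}

    /** Appends value as 16-bit units and returns how many units were written. */
    static int write(StringBuilder out, int value) {
        if (value == -1) {
            out.append((char) MINUS_ONE);
            return 1;
        }
        if (value >= 0 && value < (1 << 14)) {
            // Small values fit in the 14 payload bits of a single unit (tag 00).
            out.append((char) value);
            return 1;
        }
        if (value >= 0 && value < (1 << 30)) {
            // 14 high bits go into the tagged first unit, 16 low bits into the second.
            out.append((char) (TAG_TWO_UNITS | (value >>> 16)));
            out.append((char) (value & 0xFFFF));
            return 2;
        }
        // Everything else (>= 2^30, or negative other than -1): tag unit + full 32 bits.
        out.append((char) TAG_THREE_UNITS);
        out.append((char) (value >>> 16));
        out.append((char) (value & 0xFFFF));
        return 3;
    }

    /** Decodes one value starting at data[pos[0]] and advances pos[0] past it. */
    static int read(CharSequence data, int[] pos) {
        int first = data.charAt(pos[0]++);
        if (first == MINUS_ONE) {
            return -1;
        }
        switch (first >>> 14) {
            case 0:
                return first;
            case 1: {
                int low = data.charAt(pos[0]++);
                return ((first & 0x3FFF) << 16) | low;
            }
            case 2: {
                int high = data.charAt(pos[0]++);
                int low = data.charAt(pos[0]++);
                return (high << 16) | low;
            }
            default:
                throw new IllegalArgumentException("unexpected tag in unit " + first);
        }
    }
}
```

A round trip would write several values into one `StringBuilder` and then call `read` repeatedly with a shared `int[] pos = {0}` cursor. With this layout the common case of small non-negative values costs one unit, and only values outside the 14-bit range pay for the extra units.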