Introduce Eclipse CDT parser, create SymbolTable #1479
Hi @ivangalkin,
I like the last comment of "The art of parsing" discussion:
Unfortunately, we did not spend effort selecting another parser which solves the current issues.
Hi @ivangalkin, thanks for your proposal. I think there are several points we have to check:
Regards,
... and just another point: the parser must have error recovery: https://github.com/SonarOpenCommunity/sonar-cxx/wiki/Error-Recovery
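To make "error recovery" concrete, here is a minimal sketch of classic panic-mode recovery. It is not the sonar-cxx or Eclipse CDT implementation: the class `RecoveringParser` and its toy grammar (a declaration is `TYPE NAME ;`) are hypothetical. On a syntax error the parser records the error, discards tokens up to the next synchronization token (`;` or `}`) and resumes, so one broken declaration does not stop the analysis of the rest of the file.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy parser with panic-mode error recovery (hypothetical, for illustration only). */
final class RecoveringParser {
    private final List<String> tokens;
    private int pos = 0;
    final List<String> parsed = new ArrayList<>();   // successfully parsed declarations
    final List<Integer> errors = new ArrayList<>();  // token index where each error started

    RecoveringParser(List<String> tokens) { this.tokens = tokens; }

    private static boolean isIdent(String t) {
        return t.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    /** Parses "TYPE NAME ;" repeatedly; on an error skips to the next ';' or '}'. */
    void parseAll() {
        while (pos < tokens.size()) {
            int start = pos;
            if (pos + 2 < tokens.size()
                    && isIdent(tokens.get(pos))
                    && isIdent(tokens.get(pos + 1))
                    && tokens.get(pos + 2).equals(";")) {
                parsed.add(tokens.get(pos) + " " + tokens.get(pos + 1));
                pos += 3;
            } else {
                errors.add(start);
                // Panic mode: discard tokens until a synchronization token is found.
                while (pos < tokens.size()
                        && !tokens.get(pos).equals(";")
                        && !tokens.get(pos).equals("}")) {
                    pos++;
                }
                pos++; // consume the synchronization token itself
            }
        }
    }
}
```

For the token stream `int a ; int ( garbage ; double b ;` this sketch reports one error at token 3 and still parses both `int a` and `double b`.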
Thank you for your feedback! Although I see a big advantage of using Eclipse CDT (or another mature 3rd-party C/C++ parser) over the SSLR-based solution, I don't suggest replacing the current parsing completely. SSLR seems to work well for the (advanced) tokenization, which is enough for the highlighting and the simple CxxSquidSensor. However, creating a consistent symbol table requires much more than this. In fact, it requires complete syntactic and semantic analysis (e.g. type inference, ambiguity/overload resolution, template processing and much more). Even GCC/clang face frontend bugs from time to time, so unfortunately I just don't see any chance of solving US #1401 properly without an external parser. So for now, please consider the scope of the Eclipse CDT parser as a tool for symbol table creation only. Regarding Eclipse CDT (or more precisely org.eclipse.cdt.core)
I haven't found anything about other language extensions, but I think that even the combination of GNU C++14 and the robust parser covers the biggest part of the symbol table. I have no information about the performance/memory consumption yet. Using the API to extract symbol information is easy (please see my patch) and it works really well. Best regards,
As far as I understand, EPL and LGPL seem to be compatible. As long as we use the EPL code/component only, without changing it, we can put the result under LGPL again.
See also the advantage. But I don't like having two parsers in the plug-in. If we change the parser, we should switch completely.
@Bertk and I mainly use Visual Studio, and we both have legacy code with Microsoft extensions: C++/CLI and Attributed ATL. So the question is how to handle this.
That would be another advantage. The current preprocessor is deprecated.
That's important and would be a no-go if not. @Bertk / @jmecosta, any opinion on this point? Regards
Maybe we could merge the Eclipse CDT parser as part of the symbol table and try to rewrite the Squid visitors in terms of a new AST? I literally went through all existing visitors and there are no complex algorithms IMHO. So I am not sure whether this is an argument for or against introducing a new parser. But I am still 100% sure that building a symbol table on the basis of the current AST is a hell of a lot of work.
I believe there should be no big problem. Most probably it already works pretty well. Also, extending keywords, built-ins etc. is comparatively easy. I've even seen examples of extending syntactic constructs (e.g. GCC's labels extension).
From the source code perspective, precompiled header files are nothing special. So if all include paths are specified correctly, such [precompiled] headers will be preprocessed and parsed correctly. I don't believe that Eclipse CDT can cache and "inline" the ASTs of [precompiled] headers.
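Since a precompiled header is parsed like any other header, what matters is that include resolution finds it. A minimal sketch of the usual lookup order, assuming GCC-like behaviour (quoted includes search the including file's directory first, then the configured include paths in order); the class `IncludeResolver` is hypothetical and is neither CDT's nor sonar-cxx's resolver.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Simplified include resolution, roughly modeled on common compiler behaviour (hypothetical helper). */
final class IncludeResolver {
    private final List<Path> includePaths;

    IncludeResolver(List<Path> includePaths) { this.includePaths = includePaths; }

    /**
     * Resolves a header name. Quoted includes ("x.h") are first looked up next to the
     * including file; both quoted and angle-bracket includes then search the include paths in order.
     */
    Optional<Path> resolve(String header, boolean quoted, Path includingFile) {
        if (quoted && includingFile.getParent() != null) {
            Path local = includingFile.getParent().resolve(header);
            if (Files.isRegularFile(local)) return Optional.of(local);
        }
        for (Path dir : includePaths) {
            Path candidate = dir.resolve(header);
            if (Files.isRegularFile(candidate)) return Optional.of(candidate);
        }
        return Optional.empty(); // unresolved: the parser would then miss that header's declarations
    }
}
```

If an include path is missing from the configuration, `resolve` returns empty and every symbol declared in that header (precompiled or not) would be unknown to the symbol table.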
Replacing the parser will change the behavior of the plugin, and I am not sure how fast CDT will be adapted to a new C++ standard, e.g. C++20 or the Modules TS. I have more confidence in the Clang parser, but I do not know CDT.
My opinion is always to reuse a parser that gives us the most benefit in terms of community support, maintainability and features. Clang would always be the best option for this, and we could even reuse most of the checks Clang already provides. This would be in line with the C# plugin, which uses Roslyn, with the Java part simply consuming the metrics. CDT was used before SSLR, and back in the day the decision was made to deprecate it in favour of SSLR. I think that made sense then. Now I'm not sure: having both SSLR and CDT to maintain is not the way to go, I think, since it will make maintenance of the plugin harder. Also, the size of the plugin has jumped from
Hi @Bertk @jmecosta and @guwirth,
As for clang's C++ frontend: its philosophy is to follow the C++ standard as strictly as possible. That is good if you want to check the portability of your code or get more/better warnings. However, the clang parser (as far as I know it) is definitely not a superset of all possible dialects. It can be configured to accept GCC-conformant code (I mean GCC's sloppy interpretation of the standard w.r.t. templates etc.), but I am wondering about support for C++/CLI, Attributed ATL, CUDA etc. Does clang's parser support something like error recovery? The design idea of Eclipse CDT is the opposite: produce the most generic parser. It parses the code in the IDE while you type and is very robust.
Having a proper AST would benefit the SquidSensor and maybe other visitors. But, to be honest, most of them are written as apply-regexp-to-each-source-line or apply-regexp-to-each-comment-line, so even that benefit is arguable.
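To illustrate the "apply-regexp-to-each-comment-line" style described above, here is a minimal sketch of such a check. The class `TodoCommentCheck` is hypothetical (not an actual sonar-cxx visitor) and deliberately naive: it treats everything after the first `//` on a line as a comment and ignores string literals and block comments. The point is that it needs no AST at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Line-based comment check that needs no AST (hypothetical example of the pattern). */
final class TodoCommentCheck {
    private static final Pattern TODO = Pattern.compile("\\bTODO\\b");

    /** Returns 1-based line numbers whose "//" comment contains TODO (case-sensitive). */
    static List<Integer> findTodoLines(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            int idx = lines[i].indexOf("//");
            // Only the comment part of the line is matched, never the code before it.
            if (idx >= 0 && TODO.matcher(lines[i].substring(idx + 2)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```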
one more question w.r.t. clang
The C# side just dumps everything it has processed to the Java side using protocol buffers. So all checks and the AST are handled there and never reach the Java side. I think you can feed all that data to Sonar as you do for any metrics.
That simplifies the plugin a lot, since it is just writing metrics. And all the AST stuff is written natively in C# and Roslyn.
On Thu, 24 May 2018, 12:54 ivangalkin wrote:
one more question w.r.t. clang
1. Technical implementation: I searched for a way to get the AST from clang into Java and haven't found anything satisfying. JNI bridges (if they exist) require installing clang on the worker's machine. This is a disruptive change and could be a requirement that is hard to implement. A native Java port of the clang front-end (https://github.com/java-port/clank) seems to exist, but it isn't official and is therefore of unknown quality and support.
@jmecosta extracting the checks and highlighting into a C++ component would mean almost re-implementing the whole plugin from scratch IMHO. The only benefits you'd have are...
... but it comes at the cost of
I am not really sure clang is worth it.
@ivangalkin yes, I agree that it is a massive amount of work to port all of those to clang. I'm just sharing all available options and the pros and cons of each.
This would be a clear point preferring Eclipse CDT.
There are still some static code analysis checks, but I would like to remove them anyway. The other reason why we need the parser is to create metrics. At the moment, some users also use XPath on top of the AST.
I think we should stay within one technology. SQ is Java. @ivangalkin, is there a possibility to play around with CDT like with our sslr-cxx-toolkit?
@guwirth the existing metrics are different;
I haven't found any convenient way to play with the CDT parser/AST etc. yet. The code is open source, so it's easy to learn how the Eclipse C/C++ plugin is implemented. You can inspect the resulting AST in the Java debugger, and finally one can always install Eclipse CDT and get a good overview of its functionality.
@Bertk that's your favourite feature :-) How should we continue:
I do not like using two parsers for the AST in one plug-in, and losing the sslr-cxx-toolkit is a major drawback.
Just for the sake of accuracy: CDT adds 10 MB (25 MB vs 15 MB). This might sound gigantic if you measure the difference as a percentage (+66%), but on a local [corporate] network transferring 10 MB takes less than a second. It's not a major argument IMHO. W.r.t. a non-Eclipse-CDT solution: @Bertk, do you have any idea what it could be? I repeat myself, but I am certain that the current parser is a bad basis for implementing a symbol table. Maybe in the medium term it should even be replaced completely. E.g. because
So if we claim to implement a correct symbol table, we need to rework or replace the current parser and the AST IMHO. As an alternative, we could implement something similar to the highlighting of basic online editors (see e.g. http://ideone.com). That highlighting doesn't use any semantic analysis at all; it uses string matching only, and its limited scope is obvious to the user. I believe such an implementation is better than an erroneous self-written semantic analysis. And it's better than nothing.
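A minimal sketch of such purely lexical highlighting, assuming a small hand-picked keyword list (`LexicalHighlighter` and its output format are hypothetical). It classifies comments, string literals, numbers and keywords with regular expressions only; plain identifiers are left unhighlighted because nothing here knows what they refer to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Purely lexical highlighter: string matching only, no semantic analysis (illustrative sketch). */
final class LexicalHighlighter {
    static final Set<String> KEYWORDS = Set.of(
        "int", "char", "double", "void", "return", "if", "else", "for", "while",
        "class", "struct", "template", "typename", "namespace", "const", "auto");

    // Order matters: comments and strings are tried before identifiers and numbers.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<comment>//[^\\n]*|/\\*.*?\\*/)"
        + "|(?<string>\"(?:\\\\.|[^\"\\\\])*\")"
        + "|(?<word>[A-Za-z_][A-Za-z0-9_]*)"
        + "|(?<number>\\d+(?:\\.\\d+)?)",
        Pattern.DOTALL);

    /** Returns spans formatted as "kind:start-end" (end exclusive). */
    static List<String> highlight(String code) {
        List<String> spans = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            String kind;
            if (m.group("comment") != null) kind = "comment";
            else if (m.group("string") != null) kind = "string";
            else if (m.group("number") != null) kind = "number";
            else if (KEYWORDS.contains(m.group("word"))) kind = "keyword";
            else continue; // plain identifiers: no symbol resolution, so no highlighting
            spans.add(kind + ":" + m.start() + "-" + m.end());
        }
        return spans;
    }
}
```

Its errors are predictable (e.g. a user-defined type is never highlighted), which is the trade-off argued for above.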
I've updated the patch and it works even better now (macro expansions are excluded, which prevents indexing non-existing symbols). Please give it a try on your SonarQube installations.
One more update: use the newest
@guwirth @ivangalkin This PR is not a proper solution for the SymbolTable, because a second, independent AST parser is introduced only for the SymbolTable while the existing AST parser remains active.
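The idea behind excluding macro expansions can be sketched independently of CDT. In this minimal sketch, assuming the parser can report the source ranges of macro invocations, any symbol occurrence overlapping such a range is dropped, because the symbol was produced by the expansion rather than written at that position. The types here (`MacroExpansionFilter`, `Occurrence`, `{start, end}` pairs) are hypothetical and not the CDT API; note that this simplified version would also drop identifiers passed literally as macro arguments.

```java
import java.util.ArrayList;
import java.util.List;

/** Drops symbol occurrences that lie inside macro expansions (hypothetical types, not the CDT API). */
final class MacroExpansionFilter {
    /** A symbol occurrence at [offset, offset + length) in the original file. */
    static final class Occurrence {
        final String name;
        final int offset;
        final int length;

        Occurrence(String name, int offset, int length) {
            this.name = name;
            this.offset = offset;
            this.length = length;
        }
    }

    /** expansions holds {start, end} pairs (end exclusive) of macro invocations in the original file. */
    static List<Occurrence> filter(List<Occurrence> occurrences, List<int[]> expansions) {
        List<Occurrence> kept = new ArrayList<>();
        for (Occurrence o : occurrences) {
            boolean insideMacro = false;
            for (int[] e : expansions) {
                // Two half-open intervals overlap iff each starts before the other ends.
                if (o.offset < e[1] && e[0] < o.offset + o.length) {
                    insideMacro = true;
                    break;
                }
            }
            if (!insideMacro) kept.add(o);
        }
        return kept;
    }
}
```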
* Use the Eclipse CDT parser (incl. preprocessor) to create an AST
* Use it to retrieve information about all symbols (declarations and references) of the compilation unit

patchset log:
* rebase and conflict resolution
* use preprocessor (include paths used, defines are not)
* filter macro expansions, since these symbols are not placed in the original source code
* update org.eclipse.cdt.core to 6.5.0 (Photon), which claims to support C++14 & C++17 features
605272a to 6a98d3e
@ALL: This discussion is an older one. I'm still not sure in which direction we should go. I found the article below, which states at the end that CDT also wants to use Clang. Does anyone know the current state? https://www.eclipse.org/community/eclipse_newsletter/2017/april/article3.php
Hi @guwirth, @Bertk
Hi All!
This patch implements the symbol table for C++ files (US #1401). It uses the free and open-source Eclipse CDT parser (https://github.com/eclipse/cdt) to preprocess the source file and create an AST. I believe that the code of the Eclipse CDT project in general, and the design of its AST in particular, are of very high quality. I think it's a good idea to outsource the parser to a 3rd-party project, because it is a) very complex code, b) not a principal purpose of sonar-cxx, c) subject to constant change (C++11, C++14, C++17 etc.).
The parser is used for SymbolTable only and doesn't break any other visitors of sonar-cxx.
Regarding the code itself:
a) I tested the SymbolTable on the test from Bertk@78643d3#diff-399f59d3cf1cdb0fa3997373554e742f. I even added templates to make the parsing more complex. Testing is tricky, so as a first step I just added a kind of dump (see symbols.cc, symbols_declarations.cc and symbols_references.cc). In my opinion, the completeness is 99%.
b) I tested the SymbolTable on productive and very template-heavy code. It works very satisfactorily.
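The declarations/references split used in those dump files can be pictured with a minimal data structure. This is a sketch, not the patch's code: `SymbolTableSketch` is hypothetical, and plain strings such as `ns::f(int)` stand in for the resolved bindings that a real parser's semantic analysis would produce. Keying by resolved identity rather than by spelling is what keeps overloads apart.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal symbol table: one declaration per symbol plus its references (hypothetical structure). */
final class SymbolTableSketch {
    // Keyed by a resolved symbol identity, not the bare spelling,
    // so overloads or shadowed names do not collapse into one entry.
    private final Map<String, String> declarations = new LinkedHashMap<>();
    private final Map<String, List<String>> references = new LinkedHashMap<>();

    void declare(String symbolId, String location) {
        declarations.put(symbolId, location);
        references.computeIfAbsent(symbolId, k -> new ArrayList<>());
    }

    void reference(String symbolId, String location) {
        // References to symbols without a known declaration (e.g. unresolved headers) are skipped.
        List<String> refs = references.get(symbolId);
        if (refs != null) refs.add(location);
    }

    /** Text dump, one line per declared symbol, in declaration order. */
    String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> d : declarations.entrySet()) {
            sb.append(d.getKey()).append(" declared at ").append(d.getValue())
              .append(", referenced at ").append(references.get(d.getKey())).append('\n');
        }
        return sb.toString();
    }
}
```

With `f(int)` declared at `3:5`, `f(double)` at `4:5`, and calls at `10:2`, `11:2`, `12:2`, the dump lists `[10:2, 12:2]` for the first overload and `[11:2]` for the second.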
The pseudo-code works as follows...
Please take a look at the code. There is not much logic and it isn't complex. I tried to avoid too many SonarQube dependencies in the parser wrapper, so I had to introduce some glue code w.r.t. TextPoint and TextRange.
I'm looking forward to your feedback.
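Glue code of this kind typically converts the absolute character offsets that AST parsers report into line/column positions. A minimal sketch, assuming SonarQube's convention of 1-based line numbers and 0-based line offsets; `OffsetConverter` is a hypothetical name, not a class from the patch. Line starts are computed once, and each lookup is a binary search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Converts absolute character offsets into {line, lineOffset} pairs (1-based line, 0-based offset). */
final class OffsetConverter {
    private final int[] lineStarts; // offset of the first character of each line

    OffsetConverter(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') starts.add(i + 1);
        }
        lineStarts = starts.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Returns {line, lineOffset} for the given absolute offset. */
    int[] toPoint(int offset) {
        int idx = Arrays.binarySearch(lineStarts, offset);
        // If not found, binarySearch returns -(insertionPoint) - 1; the containing line is insertionPoint - 1.
        int lineIndex = idx >= 0 ? idx : -idx - 2;
        return new int[] {lineIndex + 1, offset - lineStarts[lineIndex]};
    }
}
```

A range is then just two such points, one for the start offset and one for `offset + length`.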