
Complexity metric / checks must be performed on original source code #1547

Closed
ivangalkin opened this issue Aug 27, 2018 · 7 comments

@ivangalkin
Contributor

  • the complexity metrics "Cognitive Complexity" and "Cyclomatic Complexity", as well as the metrics derived from cyclomatic complexity: "Complex Functions (NR)", "Complex Functions (%)", "Complex Functions Lines of Code", "Complex Functions Lines of Code (%)" and...
  • the complexity checks "FunctionCognitiveComplexityCheck", "FunctionComplexity", "ClassComplexity", "FileComplexity"...

...are calculated/performed on preprocessed code. The preprocessor falsifies the code complexity:

  • expansion of macros makes the code more complex
  • evaluation of preprocessor directives like #if, #ifdef, #else etc. makes the code less complex

Example of how macro expansion affects the FunctionComplexity check: the EXPECT_EQ macro adds 3 extra cyclomatic complexity points because it expands to a switch-case plus an if.

[screenshot: complexity_vs_macro]

Possible solutions:

  • keep information about which AST parts were added during preprocessing (marking some AST nodes as generated; helps only with macro expansions)
  • keep both the original and the preprocessed code for each compile unit (create two distinct ASTs)
  • run some visitors before the preprocessor and others after it (there might be more visitors spoiled by the preprocessor, e.g. the syntax-highlighting one)

Possible problems:

  • creating an AST for the non-preprocessed (original) source code might be tricky: while a macro expansion can be mapped to a function-invocation AST (?), not evaluating preprocessor directives can lead to unparseable code (?)
@guwirth
Collaborator

guwirth commented Aug 27, 2018

@ivangalkin thx for the investigation. Meanwhile I also tend to evaluate only "what you see later on in the source code view" (i.e. without preprocessing the code).

> perform some set of visitors before preprocessor and some set after it (there might be more visitors spoiled by preprocessor, e.g. the syntax-highlighting one)

@ALL: Where do you think we still need preprocessed code?

@jmecosta
Member

jmecosta commented Aug 27, 2018 via email

ivangalkin added a commit to ivangalkin/sonar-cxx that referenced this issue Sep 5, 2018
* partial solution for SonarOpenCommunity#1547

MOTIVATION:

* SonarOpenCommunity#1494 made visible the fact that the preprocessor affects the complexity
metric. E.g. if a function uses macros, the underlying (hidden) complexity
of these macros increases the complexity of the function. This is wrong
for many reasons:

1. Cognitive complexity:
Macros do not increase the cognitive complexity of the source code. To
the reader they look like function invocations.
Example: the ASSERT_XX/EXPECT_XX macros of GTest.

2. Cyclomatic complexity:
Similar logic applies to cyclomatic complexity: macro expansions
are similar to function calls (which become inlined). In this case
the cyclomatic complexity of the caller should not be increased;
the [cyclomatic] complexity of the callee must be calculated separately.

3. The preprocessor doesn't distinguish between macros that were
declared inside the analyzed project and macros included from an
external header. Obviously, external source code (e.g. again the very
complex GTest macros) should not affect the complexity of someone's
custom SonarQube project.

SOLUTION:

In general it's impossible to build a valid C/C++ AST without
performing the preprocessing step. The original source code might
be syntactically incorrect (e.g. if macros are used to "invent" some
domain-specific language on top of C/C++). All metrics of sonar-cxx
are however AST-based. (This is a very questionable decision,
e.g. because #if/#ifdef etc. might affect the resulting complexity or [NC]LOC
dramatically.) So we cannot skip the preprocessing step entirely and/or
perform metric calculation/checks on the non-preprocessed code.

Sonar SSLR offers the possibility to mark some tokens as generated.
This property (`Token::isGeneratedCode()`) is for example used
to ignore generated code during
* [NC]LOC counting (see `AbstractLineLengthCheck<>`)
* CPD analysis (see `CxxCpdVisitor`) or
* highlighting (see `CxxHighlighterVisitor`)

This patch ensures that
* all `AstNode`s that were created during macro expansion are marked as
generated
* all complexity-related visitors ignore generated `AstNode`s

RESULT:

PRO:
1. The complexity of "callers" is no longer affected by macros.
   This makes the complexity check look logical again.

CON:
1. The complexity of "callees" is not calculated at all. This
is correct in the case of external macros, but wrong w.r.t.
project-internal macros. Possible solution: the preprocessor must try
to extract function-like macro declarations into some special AST subtree.
Afterwards, complexity metrics/checks will be able to calculate/check
macros as a separate language construct.

By the way: it should be analyzed if and how macros affect other metrics,
for example LOC etc.

2. This patch might change the complexity-related metrics. Depending
on how many macros are used in the analyzed project, the reduction
can be radical.
@ivangalkin
Contributor Author

@guwirth I came to the conclusion that the current visitors (all metric calculators and checks are AST visitors) cannot be applied to the original (non-preprocessed) code. The original code might not be parseable at all (there are far too many tricks one can play with macros and compiler directives).

I suggest marking macro expansion nodes as generated and letting all complexity-related visitors ignore them. Please review #1552

@guwirth
Collaborator

guwirth commented Sep 5, 2018

@ivangalkin if you test this on your own code base: how big is the difference in 'Measures/Complexity'?

@ivangalkin
Contributor Author

@guwirth we have very GTest-intensive code and the complexity decrease is about 30%.
Previously each test case with 3 assertions was considered too complex.

| Metric                              | Before  | After   |
| ----------------------------------- | ------- | ------- |
| Cyclomatic Complexity               | 154 538 | 101 610 |
| Cognitive Complexity                | 106 200 | 55 515  |
| Complex Functions                   | 1 484   | 614     |
| Complex Functions (%)               | 2.4%    | 1.0%    |
| Complex Functions Lines of Code     | 71 494  | 54 767  |
| Complex Functions Lines of Code (%) | 23.0%   | 17.7%   |

@ivangalkin
Contributor Author

Hi All,

update w.r.t. this issue:

MACRO

COMPILER DIRECTIVES

  • Previously I saw an issue in the metric calculation for code like this:

```cpp
#ifdef DEBUG
    <150 NCLOCs>
    <+50 complexity points>
#else
    <250 NCLOCs>
    <+150 complexity points>
#endif
```

Depending on the -D settings, the NCLOC/complexity value was 150/50 or 250/150 respectively. This is quite wrong in my opinion, because it again measures the NCLOC/complexity of the AST rather than the metric of the source code.

Meanwhile I tend to change my mind:

  1. metric calculation for such branches is impossible in general, since they might contain arbitrary text. Our analyzers are AST-based, and parsing a valid AST from non-preprocessed code is very tricky (if possible at all).
  2. users are able to pass the -Ds as analysis parameters, so one can create separate projects if metrics for some specific compile variants are required, e.g. one for the debug and one for the release code.

What do you think? Can we close the issue?

@guwirth
Collaborator

guwirth commented Oct 6, 2018

Close it with #1552
