-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] CPD: Add total number of tokens to XML reports #4021
Conversation
Generated by 🚫 Danger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
I'm not sure, whether it's a good idea to add this additional method to the existing interface CPDRenderer.... wdyt @oowekyala ?
Documentation-wise we should also update the example on https://pmd.github.io/latest/pmd_userdocs_cpd_report_formats.html#xml
Maybe it's even time to add a formal documentation of the format or even a schema?
Add these additional elements to the XML format shouldn't be a problem. E.g. the maven-pmd-plugin would just ignore this additional elements (I hope...).
Could you also describe the use case a bit? What is this information used for? The number of tokens is an indication of how big the file is. Or do you maybe want to calculate later on that e.g. file x contains 20% duplicated code?
The XML format is now rather flat. Should we maybe group the duplication inside a <file>
element? That of course would render the whole format incompatible, but we could introduce complete new XML renderer with a different format...
pmd-core/src/main/java/net/sourceforge/pmd/cpd/SimpleRenderer.java
Outdated
Show resolved
Hide resolved
pmd-core/src/main/java/net/sourceforge/pmd/cpd/renderer/CPDRenderer.java
Outdated
Show resolved
Hide resolved
pmd-core/src/main/java/net/sourceforge/pmd/cpd/renderer/CPDRenderer.java
Outdated
Show resolved
Hide resolved
pmd-core/src/main/java/net/sourceforge/pmd/cpd/CSVRenderer.java
Outdated
Show resolved
Hide resolved
pmd-core/src/test/java/net/sourceforge/pmd/cpd/CSVRendererTest.java
Outdated
Show resolved
Hide resolved
pmd-core/src/test/java/net/sourceforge/pmd/cpd/XMLRendererTest.java
Outdated
Show resolved
Hide resolved
pmd-core/src/test/java/net/sourceforge/pmd/cpd/CPDCommandLineInterfaceTest.java
Show resolved
Hide resolved
Thank you for the comments! On the use case: you seem to have guessed correctly already :) We're using data from CPD to calculate a percentage of duplicated code in a file/project. Until now, we've done this by calculating the number of lines that contain a duplication and we divide it by the total number of lines. This works reasonably well, but there are obvious examples where the results are unexpected (for example, removing all line breaks from a file means it can only have 0% or 100% duplication). If we can use the number of tokens instead, we'll have a more precise metric for the relative amount of duplication within a file. On the XML format: my thinking was that adding additional elements would preserve backwards compatibility reasonably well (assuming users don't blindly consume all elements). Restructuring the format would require users to change the way the file is parsed. I think that changing the structure does not result in a worthwhile tradeoff in this case, but I'm not against changing it if you disagree. |
Thanks for the contribution @maikelsteneker
This is indeed my main gripe with the code change. I think adding a method parameter is not the right way, if we want to extend the info available to renderers in the future, it would be more flexible to pack all parameters in an object (eg a "CpdReport"). I would rather introduce an interface CpdReportRenderer whose method takes a CpdReport and a Writer, and deprecate the old interface. This way we can add members to CpdReport later without breaking the renderer interface I wouldn't add an AbstractCpdRenderer either. The renderer interfaces should remain simple and an abstract class hints otherwise. We can use an adapter pattern to convert from the old interface to the new one. |
e7c08e6
to
9fc8a56
Compare
@oowekyala Thank you for the feedback! I think you raised some very good points, so I have reworked my solution to resemble your proposal more closely. That is, I have introduced a new interface, the XML renderer is the only one that is changed as a result and the remaining renderers use an adapter to remain functioning. I guess that in the PMD 7.x branch, all renderers should implement just the latest interface, with the older ones being removed. @adangel @oowekyala Could you take another look to see if this new implementation is up to your standards? Thanks in advance for your efforts! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing these changes, just a couple of minor comments
pmd-core/src/main/java/net/sourceforge/pmd/cpd/CPDConfiguration.java
Outdated
Show resolved
Hide resolved
pmd-core/src/main/java/net/sourceforge/pmd/cpd/CPDConfiguration.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Clément Fournier <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes!
From my point of view the only thing we should decide about is: Should we return a List<Matches>
instead of Iterator<Matches>
in the new CPDReport?
My other comments are some minor things that can also be changed later.
pmd-core/src/test/java/net/sourceforge/pmd/cpd/XMLRendererTest.java
Outdated
Show resolved
Hide resolved
pmd-core/src/test/java/net/sourceforge/pmd/cpd/CPDCommandLineInterfaceTest.java
Show resolved
Hide resolved
pmd-core/src/main/java/net/sourceforge/pmd/cpd/XMLRenderer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updated PR.
I'll change the things I've mentioned and merge this afterwards.
Describe the PR
This PR extends the XML output of CPD. This output produces
<duplication>
elements for each duplication that is encountered. In addition to this, it now also produces<file>
elements for each file with the number of tokens in the file added to this format. For example:Other output formats currently don't contain this information, but could easily be extended to add it if needed.
Ready?
./mvnw clean verify
passes (checked automatically by github actions)