-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement new codemod API #213
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #213 +/- ##
==========================================
- Coverage 96.46% 96.43% -0.04%
==========================================
Files 91 95 +4
Lines 4277 4065 -212
==========================================
- Hits 4126 3920 -206
+ Misses 151 145 -6
|
6d5996c
to
712292d
Compare
from codemodder.codemods.utils_mixin import NameResolutionMixin | ||
|
||
|
||
class RefactorNewApi(SimpleCodemod, NameResolutionMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this clearly took some effort so I can feel the reason to commit it, but do you think we will put it to use later on? maybe add a docstring explaining this was used during this transition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a docstring. It is one-time use but it's also a nice demonstration of the capabilities of codemods. I intend to blog about it later. It also genuinely saved me some time with refactoring ~35 of the codemods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor non-blocking questions. I'm super impressed how easy this looks on the reviewer side, awesome job Dan!
|
||
assert output_code == dedent(expected) | ||
|
||
def run_and_assert_filepath( # pylint: disable=too-many-arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could just add this disable to the top of the file instead of for every method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I actually thought about just disabling it globally since it's not my favorite pylint rule. Any thoughts?
@@ -37,4 +36,3 @@ def test_combine(self, tmpdir, func): | |||
) | |||
def test_no_change(self, tmpdir, code): | |||
self.run_and_assert(tmpdir, code, code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing num_changes=0 being passed so just want to make sure that removal of assert len(self.file_context.codemod_changes) == 0
is indeed covered in new tests refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's covered. The case where expected == result
explicitly checks for that condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are good ideas that came with the new API, but I don't think the current implementation reflects them.
tree = source_tree | ||
with file_context.timer.measure("transform"): | ||
for transformer in self.transformers: | ||
tree = transformer.transform(tree, results, file_context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Results will be stale after the first transform. You either generate the results (with a detector) before applying each transformer or ignore the results. I suggest making a pipeline class for each case.
If you'd still want to make a pipeline that reuses the results for each transform, then my suggestion is to create a detector that always returns the same results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fair. That does make sense if the detector belongs to the transformer pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You raise a really good point here, and you're absolutely right about the potential for the results to become stale in general. However, I still think that architecturally the codemod itself is the right place for detector
. This makes easier to reason about codemods that might share the same transformations but use different detectors.
To be clear: the cases I had in mind for multiple transformers fell into one of two categories:
- Cases where the transformers themselves perform detection (which is perfectly valid)
- Cases involving LLM-based transformations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the burden of composing transformations should be with the codemod itself.
My suggestion is to change the Codemod
class to accept a list of transformations, and move this logic into the apply
function of the codemod. This way you could use the detector to guarantee results before each transformation. It also means the Pipeline
classes of transformations wouldn't be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with that suggestion is that there is a phase mismatch between detectors and transformers. Detectors apply to a list of files whereas a transformer applies to a single file. This is necessarily the case due the fact that a tool like Semgrep or Sonar returns results that apply to a collection of files whereas LibCST transformations are applied to a single file at a time. Similarly we would expect LLM-based transformations to apply to one file at a time as well.
from codemodder.codemods.semgrep import SemgrepRuleDetector | ||
|
||
|
||
class SimpleCodemod(LibcstResultTransformer, metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You went through the trouble of separating codemods into three components: metadata, transformer, and detector, only to muddle them all together again in this class. It's a bit telling it doesn't use the new BaseCodemod
class at all (and must, instead, rely on an attribute for that).
Not sure why it the codemod class should be concerned with details like detector_pattern
or on_result_found
. Those should be the responsibility of both the detector and transformer, respectively.
Maybe this exists as a convenient way to define everything in the same class, however not only that defeats the purpose of separation to begin with, but it's also not really any different from the old BaseCodemod
class.
I feel like any codemod class should be simple composition of those three components. So it should look like:
SandboxUrlSemgrepDetector = SemgrepDetector(rule = ...)
SandboxUrlTransformer = LibcstResultTransformer(callback = on_result_found)
SandboxUrlCodemod = BaseCodemod(SandBoxUrlSemgrepDetector, SandboxUrlTransformer, metadata = ...)
LimitReadlineCodemod = SemgrepCodemod(LimitReadlineTransformer, metadata = ..., semgrep_rule = ...)
or
class LimitReadline(SemgrepCodemod):
metadata = ...
transformer = LimitReadlineTransformer
semgrep_rule = ...
My suggestion is to first think how you want the codemods to be expressed (e.g. the two examples above) and then work backwards from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this exists as a convenient way to define everything in the same class, however not only that defeats the purpose of separation to begin with, but it's also not really any different from the old BaseCodemod class.
This is exactly the point of this API and I believe that this is an unfair criticism. Making the simple cases simple to express is necessarily in tension with providing the full generality that we want from the BaseCodemod
API. We did in fact work backwards from the API that we wanted, which was already reflected in most of these codemod implementations. However as we were all aware, the original implementation was both not powerful enough and also quite convoluted in terms of the underlying class hierarchy. So adding an API layer to make the simple cases simple seems both worthwhile and like a good design.
I feel like any codemod class should be simple composition of those three components
I am in agreement with you. The BaseCodemod
API makes this explicit, except that the composition is expressed in terms of a class definition instead of a function call.
If you would like to propose an alternative API for the simple codemod cases, I am open to considering it. In truth I'm not very happy with the underlying implementation of SimpleCodemod
since it involves some metaprogramming magic. But I think the API should be judged by its use cases which are clean and simple. I would encourage you to rewrite those examples in terms of your proposed API and see whether that leads to cleaner code and we can evaluate it from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an addendum: I am not trying to hide the fact that this simplified API was explicitly designed to minimize the changes required from the previous simplified API and the new API. I don't see that as an unworthy goal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly the point of this API and I believe that this is an unfair criticism. Making the simple cases simple to express is necessarily in tension with providing the full generality that we want from the BaseCodemod API.
If that's the intention, then it's perfectly fine. It does have the benefits of being easy and convenient to throw everything into a class with a few methods/attributes.
Having a clear (and enforced) separation between detection/transformation/metadata does have other benefits like modularity and reusability. You can't really reuse any transformation defined within a SimpleCodemod
.
I am in agreement with you. The BaseCodemod API makes this explicit, except that the composition is expressed in terms of a class definition instead of a function call.
The point is that the composition is muddled with the SimpleCodemod
class. It is a Codemod
and Transformation
at the same time. For a more clear separation, it should "have" a transformation and not be one. If the intent is clear separation, I don't think it should have any logic pertaining the transformation itself.
If you would like to propose an alternative API for the simple codemod cases, I am open to considering it. In truth I'm not very happy with the underlying implementation of SimpleCodemod since it involves some metaprogramming magic. But I think the API should be judged by its use cases which are clean and simple. I would encourage you to rewrite those examples in terms of your proposed API and see whether that leads to cleaner code and we can evaluate it from there.
I'm happy with how it turned out. In fact, you may recall that I've suggested something close to this before. The point of the aforementioned comment is to point that the introduction of the SimpleCodemod
introduces a new way to express codemods and that makes you lose some of the benefits of the new design. Again, if you were already aware of that, then that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't really reuse any transformation defined within a SimpleCodemod.
This is okay with me, but that's good point. I think that cases where the transformation is shared between multiple codemods will need to be refactored.
The point is that the composition is muddled with the SimpleCodemod class. It is a Codemod and Transformation at the same time. For a more clear separation, it should "have" a transformation and not be one. If the intent is clear separation, I don't think it should have any logic pertaining the transformation itself.
I understand the criticism here and I don't disagree but it is also fundamentally at odds with providing a "concise" simplified API which is another explicit design goal here.
The point of the aforementioned comment is to point that the introduction of the SimpleCodemod introduces a new way to express codemods and that makes you lose some of the benefits of the new design.
Again, you are not wrong, but I don't think it's so problematic. SimpleCodemod
is just sugar to make the easy cases easy. It reduces a lot of boilerplate (and refactoring) that would otherwise be required. It just so happens that most of our existing codemods fit into the "simple" category but I expect that there will be many other codemods that will require the lower-level API.
a3b996f
to
ae183c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should fix the pipeline issue before merging this. It will most likely need some structural changes in the code.
However, since no codemod use it for now, I'm willing to accept this. Just make sure to add a comment raising this issue.
ae183c0
to
ebb14fe
Compare
Quality Gate passedThe SonarCloud Quality Gate passed, but some issues were introduced. 5 New issues |
Overview
Implement new codemod API
Description
How to review this PR
base_codemod.py
andlibcst_transformer.py
api.py
file. This contains the simplified API implementation which is intended for most of our current use cases