Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format-preserving AST transformations TODO #344

Closed
16 of 26 tasks
nikic opened this issue Jan 24, 2017 · 21 comments
Closed
16 of 26 tasks

Format-preserving AST transformations TODO #344

nikic opened this issue Jan 24, 2017 · 21 comments

Comments

@nikic
Copy link
Owner

nikic commented Jan 24, 2017

With #41 (comment) the master branch now supports preserving original code formatting when performing AST transformations. At this point formatting-preservation works well in practice, and remaining issues are addressed on an as-needed basis. Please open an issue or comment if you encounter a problem.

  • Support adding and removing elements from list subnodes.
    • Implement Myers differ. (69aec6f)
    • Support insertion. (3101558)
    • Support insertion at start. (dc3ace5)
    • Support insertion into empty list. (aa72c5d)
    • Use multi-line formatting when inserting into multi-line array. (1c7fd31)
    • Support removal. (56bc8eb)
    • Support removal at start. (bd72280)
  • Support toggling of boolean subnodes (like isRef). This can probably be handled with the same mechanism that is used for adding/removing nullable nodes, i.e. specifying which tokens need to be dropped when switching to false and which tokens need to be added (and where) when switching to true.
  • Support changing of integer subnodes.
    • Support changing of modifiers. (7762753)
    • Support changing of use type (maybe?)
  • Finalize support for adding a nullable node. The current implementation has a couple of open todos, and the resulting formatting is not perfect in some cases (namely everything that inserts $key =>).
  • Make sure fixup of weird syntax like "${foo[0]}" works correctly. I didn't test, but suspect that changing the foo here might result in incorrect output.
  • Correctly handle changes involving inline HTML.
  • Allow preservation of formatting for anonymous classes. Because anon classes mix the class declaration and the constructor arguments in a weird way, the offset information for this case is currently ... odd. Maybe a change in the node-structure will help here, maybe an extra hack for this case is needed. (5900d78)
  • Add support for tab indented code. Currently we assume an ideal world where all code is indented with spaces.
  • Handle comment changes.
    • Detect changes to comments. (f6cc85a)
    • Only reprint the comment changes, not everything. (6a2e1ae)
  • Extract token stream related code into a TokenStream class. Currently the pretty printer contains lots of code for searching / extracting from the token stream. (b1cd07a)
    • Write tests for this class independently of pretty printer.
  • Improve performance. There are lots of special cases to handle, but for the most we should be able to take over large chunks of the original code directly. The implementation can probably be a lot faster than it is right now.
  • Decide on final handling of Identifier nodes. Do we want to keep this is an optional parser mode, or make it the only format? I think from an AST consumer perspective it does not make a huge difference, but for people who construct AST nodes having to create Identifiers in many places might be annoying.
    • Resolution: This is no longer an optional parser mode, the parser will now always create Identifiers. Node constructors have been changed to implicitly create Identifier wrappers if plain strings are passed, to keep things more ergonomic.
  • Document things.
  • Add getLexer() method to parser. Otherwise it would be necessary to pass around the lexer with the parser.
@ao2
Copy link

ao2 commented Apr 28, 2017

Hi,

preserving original code formatting when performing AST transformations will be very useful for code refactoring.

However, even without it php-parser has still been useful to me as a validating tool for automatic refactoring done with other mechanisms which do preserve the formatting.

Here is an example using sed and regexes (which I don't trust myself with) for the actual refactoring and php-parser for validation: https://git.ao2.it/experiments/php-drupal-console-code-refactoring.git/tree

Feel free to take the php code as an example for php-parse, it implements a visitor which changes the second argument of a particular function to be null under certain conditions.

@nikic
Copy link
Owner Author

nikic commented Apr 29, 2017

As a small update here: While there has been no movement on the above TODOs, all the necessary parser changes are now in 4.0, without the need to enable special options. As such, the boilerplate code to perform format-preserving transformations is now:

$lexer = new Lexer\Emulative([
    'usedAttributes' => [
        'comments',
        'startLine', 'endLine',
        'startTokenPos', 'endTokenPos',
    ],
]);
$parser = new Parser\Php7($lexer);

$traverser = new NodeTraverser();
$traverser->addVisitor(new NodeVisitor\CloningVisitor());

$printer = new PrettyPrinter\Standard();

$oldStmts = $parser->parse($code);
$oldTokens = $lexer->getTokens();

$newStmts = $traverser->traverse($oldStmts);

// MODIFY $newStmts HERE

$newCode = $printer->printFormatPreserving($newStmts, $oldStmts, $oldTokens);

@gplanchat
Copy link

gplanchat commented May 30, 2017

Hello

A few months ago, I wrote an AST merger: https://github.com/kiboko-labs/akeneo-product-values-management/blob/master/src/Builder/ClassMerger.php

It is very specialized to the needs I had: appending properties and methods to existing classes. Maybe this solution could be some way to answer some of these needs?

@TomasVotruba
Copy link
Contributor

TomasVotruba commented Jul 16, 2017

Thanks for creating this issue. I wonder about another approach to this, let me share:

PrettyPrinter would put out code in non-namespace version. It is not nice, but it works well. What it needs to look nice? A couple of sniffs/fixers, with PSR-2, Symfony style or others.

If you already use coding standard, just use it after PHP-Parser run and code is both refactored and according your style needs. No need to integrate various coding standard rules to parser itself. Parser should parse and focus on printing, if there are maintained tools for that. Maybe there are other features of Parser that would be happy for attention and that cannot be handled by coding standards.

I'm working on this approach in EasyCodingStandard combined with Rector - CLI tool to refactor legacy code to modern and clean code. Rector will do refactoring + coding standard fixes in one command.

What do you think?

@TomasVotruba
Copy link
Contributor

As a small update here: While there has been no movement on the above TODOs, all the necessary parser changes are now in 4.0, without the need to enable special options.

Thanks for the update and sharing specific code. Is there any related PR to check? I'm curious what have you changed.
Do you have any plans on 4.0 release? I'll use it for my stable version of package now to have this feature enabled.

@nikic
Copy link
Owner Author

nikic commented Jul 19, 2017

I don't have concrete plans for a 4.0 release yet. I did not have time to work on this further and would like to at least resolve some of the TODO items (especially the first one is a large limitation).

@TomasVotruba
Copy link
Contributor

I see. I'm adding a method to specific position like this.

@vovafeldman
Copy link

@nikic any ETA on releasing a stable version of format preservation? We really need it :)

@jamesckemp
Copy link

Yes please!

@raducretu
Copy link

+1

@Juddling
Copy link

Juddling commented Nov 26, 2017

Hi all, I'm trying to modify some existing code - just pushing an item onto the end of an array, but the formatting is coming out like so:

--- Expected
+++ Actual
@@ @@
 <?php

 return [
     'one',
     'two',
-    'three',
-    'four'
+    'three', 'four'

I cloned an existing node in the hope that it would copy the starting position.

Here is the relevant code:

    public function addToArray($code)
    {
        $visitor = new class extends NodeVisitorAbstract {
            public function leaveNode(Node $node) {
                if ($node instanceof Array_) {
                    $newItem = clone $node->items[0];
                    $node->items[] = $newItem;
                    return $node;
                }
            }
        };

        return $this->modify($code, $visitor);
    }

    public function modify($code, $visitor)
    {
        $oldStmts = $this->parser->parse($code);
        $oldTokens = $this->lexer->getTokens();
        $newStmts = $this->traverser->traverse($oldStmts);

        $this->traverser->addVisitor($visitor);

        $newStmts = $this->traverser->traverse($newStmts);
        return $this->printer->printFormatPreserving($newStmts, $oldStmts, $oldTokens);
    }

Could someone please advise me if this is a bug or not yet supported?

@nikic
Copy link
Owner Author

nikic commented Nov 26, 2017

@Juddling It's half-supported. The formatting of the array is preserved, but the new element is added on the same line. There is currently no detection that the array is formatted in multi-line style and the new element should be added in multi-line style as well.

@nikic
Copy link
Owner Author

nikic commented Dec 26, 2017

@Juddling I've just landed 1c7fd31, so this case should now work correctly (that is, use multi-line formatting).

@nikic
Copy link
Owner Author

nikic commented Jan 27, 2018

I've released a beta version for PHP-Parser 4.0 (https://github.com/nikic/PHP-Parser/releases/tag/v4.0.0beta1) and plan to create a stable release soonishly. I'd say that at this point, this feature is working pretty well, and it just comes down to fixing formatting issues as they come up. If I'm going to wait until this functionality is "finished", I'm afraid we'll never see a release...

@lisachenko
Copy link
Contributor

@nikic it is definitely good point not to delay release (you will be able to release next major if needed)

Anyway this feature is very important for many developers like me :) Current PHP-Parser implementation is cool, but if some code changes needed then developers should recombine AST with tokens in order to keep formatting like this. It's also impossible to generate custom code with required formatting from AST.

So will it be possible to work on this feature later? But this will require a lot of changes to switch from lexical parsing to raw nodes, because of skipping whitespace nodes :(

PS. Therefore in Java (IDEA engine) AST implementation provides an access to whitespace nodes. But you need to use special methods getNextSibling(), getPreviousSibling(), getChildren(), getParent()to traverse such AST manually and wrap it into meaningful nodes. For example, for PHP it will be like this: search for the namespace node then if you want classes in it you should scan all non-whitespace children nodes.

@rainbow-alex
Copy link
Contributor

What about preserving redundant parentheses? When I parse:

<?php echo (3);

I get:

array(
    0: Stmt_Echo(
        exprs: array(
            0: Scalar_LNumber(
                value: 3
            )
        )
    )
)

Which makes it almost impossible to restore the parentheses without some very complicated walking.

@eeliu
Copy link

eeliu commented Apr 8, 2019

Hi, I may meet some "half-supported". Here is my code

namespace app;
//use PDO;

class Foo
{

}

I want add some footnodes into it

namespace app;
//use PDO;

class Foo
{

}
require_once __CACHE__."header.php"

While, if this file includes "use xxx;", everything is fine, the formatting is the expected.
If not, the file was reformatted.

namespace app;

//use PDO;                               <---------------------here -----------------
class Foo
{
}
require_once __CACHE__."header.php"

my code

......
public function leaveNode(Node $node){
......
         if ($node instanceof Node\Stmt\Namespace_)
        {
            $express  =   new Node\Stmt\Expression(
                new Node\Expr\Include_(
                    new Node\Scalar\String_('xxxx'),
                   Node\Expr\Include_::TYPE_REQUIRE
                )
            );
            $node->stmts[]= $express ;
            return $node;   
      }
......
}
......

$curNode =  $traverser->traverse($origStmts);
$newCode = $printer->printFormatPreserving(
      $curNode,
      $origStmts,
      $lexer->getTokens()
  );

@nikic
Copy link
Owner Author

nikic commented May 12, 2019

@eeliu I can't reproduce this issue. The following test passes for me:

<?php
namespace app;

class
Foo
{}
-----
$stmts[0]->stmts[] = new Stmt\Echo_([new Scalar\String_('Test')]);
-----
<?php
namespace app;

class
Foo
{}
echo 'Test';

Adding a use also didn't make a difference to the behavior.

@eeliu
Copy link

eeliu commented May 13, 2019

@nikic Add some annotations in it.

namespace app;
//use PDO; 

class Foo

become

namespace app;

//use PDO;                               <---------------------here -----------------
class Foo

@kirugan
Copy link

kirugan commented Jul 25, 2019

Unfortunately for me this simple code:

<?php
function test() {
  global $x, $y;
  var_dump($x + $y);
}

after deletion of global stmt become this:

<?php
function test()
{
    var_dump($x + $y);
}

New code doesn't preserve tabulation and open bracket in functions.

@nikic
Copy link
Owner Author

nikic commented May 21, 2023

Closing this tracking issue as done. If there are issues with formatting preservation, they should get reported as separate issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests