Spatch

Syntactical patch

Introduction

The goal of spatch is to let programmers express and perform refactorings using a syntax they are already familiar with: the patch syntax. For instance, to remove the second argument of every call to a function foo, one can write this syntactical patch:

 //remove_second_arg_foo.spatch
 foo(X
-    ,Y
     )

and then apply it on a codebase with:

$ spatch -f remove_second_arg_foo.spatch *.php

or:

$ find . -name '*.php' | xargs spatch -f remove_second_arg_foo.spatch

This works even if the function call is split over multiple lines or has extra spaces between the comma and the second expression, because spatch works at the abstract syntax tree (AST) level, not at the token or string level like patch or sed.
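A minimal Python sketch of why layout stops mattering once code is parsed. The tokenizer and parser below are hypothetical and handle only identifiers, integers and calls; they are not pfff's PHP parser. Whitespace, including newlines, is discarded before the tree is built, so two layouts of the same call yield equal trees, while a plain substring search, the kind of matching sed does, misses the multi-line one.

```python
import re

# Hypothetical toy grammar: expr ::= INT | IDENT | IDENT "(" [expr ("," expr)*] ")"
# findall skips whitespace (and any character outside the toy grammar),
# so the layout of the source never reaches the tree.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[(),]")

def parse(src):
    """Parse a toy call expression into nested tuples."""
    expr, rest = _expr(TOKEN.findall(src))
    if rest:
        raise SyntaxError(f"unexpected tokens: {rest}")
    return expr

def _expr(tokens):
    head, rest = tokens[0], tokens[1:]
    if head.isdigit():
        return ("num", int(head)), rest
    if rest and rest[0] == "(":
        args, rest = [], rest[1:]
        while rest[0] != ")":
            arg, rest = _expr(rest)
            args.append(arg)
            if rest[0] == ",":
                rest = rest[1:]
        return ("call", head, args), rest[1:]
    return ("id", head), rest

one_line = "foo(1, 2)"
two_lines = "foo(1,\n    2)"

print("foo(1, 2)" in two_lines)             # False: string matching is layout-sensitive
print(parse(one_line) == parse(two_lines))  # True: the trees are identical
```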

One could also write it as:

// remove_second_arg_foo_alt.spatch
- foo(X,Y)
+ foo(X)

(although this form has some caveats, explained in the section on spacing issues below)

Finally, one can also use the "sed mode" of spatch, as in:

$ spatch -e 's/foo(X,Y)/foo(X)/' *.php

See https://github.com/facebook/pfff/blob/master/main_spatch.ml for the source of the command-line tool.

Motivations

Most programming languages do not have refactoring tools, and when they do, as Java does with Eclipse, the programmer is often limited to a restricted set of refactorings such as "dropping an argument", "adding an argument" or "moving a function". Just like with Sgrep, we want to express complex code patterns easily, but also source-to-source transformations on those patterns, in a flexible way. Spatch is a domain-specific language for expressing such refactorings.

Synopsis

The synopsis is:

$ spatch (-f <spatch_file> | -e <s/before/after/>) [options] <files_or_dirs>

By default spatch prints a diff on stdout. Once you are confident that your syntactical patch is correct, use the --apply-patch option to actually modify the relevant files.

The other options are:

[--apply-patch] [--pretty-printer] [-lang <lang>]

spatch supports a few programming languages. See Matrix to check whether your favourite one is supported.

Features

Any expression, any transformation, any context

One can write any PHP expression inside a syntactical patch and annotate any of its subparts with - and +.

For instance with this spatch:

f(2,
- foo(1)
+ foo(2)
)

we want to replace every call to foo(1) with foo(2), but only when the call is nested inside a specific kind of call to f: those whose first argument is 2.

On this file:

<?php
f(2, foo(2));
f(1, foo(1));
f(2, foo(1));
f(2,
  foo(1));

spatch will generate:

$ ./spatch -f tests/php/spatch/foo.spatch tests/php/spatch/foo.php
--- tests/php/spatch/foo.php 2010-11-04 22:58:16.000000000 -0700
+++ /tmp/trans-31284-13ff71.php      2010-11-04 23:12:35.000000000 -0700
@@ -5,8 +5,8 @@

  f(1, foo(1));

- f(2, foo(1));
+ f(2, foo(2));

  f(2,
-  foo(1));
+  foo(2));
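The role of the context f(2, ...) can be illustrated on a toy tree. In this hypothetical Python sketch (not spatch's implementation), calls are ("call", name, args) tuples and integers are ("num", n); foo(1) is rewritten only when it is the second argument of a two-argument call to f whose first argument is 2, which is what the spatch above expresses. Because the tree carries no layout, the split f(2, foo(1)) call at the end of the example file gives the same tree as the one-line version.

```python
# Toy AST (hypothetical): ("num", n) for integers, ("call", name, [args]) for calls.
FOO_1 = ("call", "foo", [("num", 1)])
FOO_2 = ("call", "foo", [("num", 2)])

def rewrite(node):
    """Replace foo(1) by foo(2), but only in the context f(2, foo(1))."""
    if node[0] != "call":
        return node
    _, name, args = node
    in_context = name == "f" and len(args) == 2 and args[0] == ("num", 2)
    new_args = []
    for i, arg in enumerate(args):
        if in_context and i == 1 and arg == FOO_1:
            new_args.append(FOO_2)         # the - and + lines of the spatch
        else:
            new_args.append(rewrite(arg))  # keep searching deeper
    return ("call", name, new_args)

# The distinct statements of the example file, in tree form
# (both layouts of f(2, foo(1)) produce the same tree).
calls = [
    ("call", "f", [("num", 2), FOO_2]),
    ("call", "f", [("num", 1), FOO_1]),
    ("call", "f", [("num", 2), FOO_1]),
]
for call in calls:
    print(rewrite(call) == call)  # True, True, False: only f(2, foo(1)) changes
```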

Metavariables

Just like Sgrep, spatch supports metavariables, so you can write syntactical patches like:

// remove_second_arg_foo_alt.spatch
- foo(X,Y)
+ foo(X)

You can use metavariables in place of full PHP expressions.

You can also use metavariables for XHP attribute values as in:

  <ui:section-header
-   border=X
  ></ui:section-header>

See Sgrep#Metavariables for more examples.
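To illustrate what a metavariable does, here is a hypothetical Python sketch (not sgrep's or spatch's code) on a toy AST, where ("mvar", "X") in a pattern matches any subtree and records it in a binding environment. The minus pattern foo(X, Y) is matched first, then the plus pattern foo(X) is rebuilt from the bindings. Two simplifying assumptions of this sketch: a metavariable used twice must bind equal subtrees, and a matched call is not searched again for nested matches.

```python
# Toy AST (hypothetical): ("num", n), ("id", name), ("call", name, [args]),
# and, in patterns only, ("mvar", "X") for a metavariable.

def match(pattern, node, env):
    """Return True if node matches pattern, recording metavariable bindings in env."""
    if pattern[0] == "mvar":
        name = pattern[1]
        if name in env:            # in this sketch, a repeated metavariable
            return env[name] == node  # must bind the same subtree
        env[name] = node
        return True
    if pattern[0] != node[0]:
        return False
    if pattern[0] == "call":
        _, pname, pargs = pattern
        _, nname, nargs = node
        return (pname == nname and len(pargs) == len(nargs)
                and all(match(p, n, env) for p, n in zip(pargs, nargs)))
    return pattern == node

def instantiate(template, env):
    """Build the + side by substituting bound metavariables into the template."""
    if template[0] == "mvar":
        return env[template[1]]
    if template[0] == "call":
        _, name, args = template
        return ("call", name, [instantiate(a, env) for a in args])
    return template

def apply_rule(minus, plus, node):
    """Rewrite node, or its subterms, wherever it matches the minus pattern."""
    env = {}
    if match(minus, node, env):
        return instantiate(plus, env)
    if node[0] == "call":
        _, name, args = node
        return ("call", name, [apply_rule(minus, plus, a) for a in args])
    return node

MINUS = ("call", "foo", [("mvar", "X"), ("mvar", "Y")])  # - foo(X,Y)
PLUS = ("call", "foo", [("mvar", "X")])                   # + foo(X)

code = ("call", "bar", [("call", "foo", [("id", "a"), ("call", "g", [("num", 1)])])])
print(apply_rule(MINUS, PLUS, code))
# ('call', 'bar', [('call', 'foo', [('id', 'a')])])
```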

Isomorphisms

The principle of spatch is to take a pattern file, the spatch file, and match it against a source file. Metavariables make the pattern more flexible, so that it can match more source files. In the same way, even if the spatch file contains extra spaces between tokens, or if an expression is split over multiple lines, it will still match source files that use a different indentation style, because spatch, like sgrep, works at the AST level.

See Sgrep#Isomorphisms for a few other tricks, called isomorphisms, that spatch uses to let a pattern match more source files.

Spacing issues

Unfortunately, spatch sometimes generates diffs that break the indentation of the original code. For instance, on this code:

foo(1,
    2);

the application of this spatch file:

- foo(X, Y)
+ bar(X, Y)

will generate this code:

bar(1, 2);

and not:

bar(1,
    2);

as one would expect.

The following spatch file, on the other hand, does the right thing:

- foo
+ bar
  (X, Y)

which may seem surprising because both spatch files look equivalent. To understand the difference, one must understand how spatch works internally: how it handles the minus code, the plus code and the metavariables.

Here is what spatch internally does given this spatch file:

- foo(X, Y)
+ bar(X, Y)

1. It extracts the sgrep "pattern" from the spatch file by looking only at the minus and contextual lines. A contextual line is a line without any sign (in our case there are no such lines). So here the extracted pattern is foo(X, Y).
2. It annotates the tokens in the pattern with a minus and/or plus sign to indicate which transformation to perform on each token. Here: [-foo; -(; -X; -,; -Y; -)+"bar(X,Y)"].
3. It then matches the annotated pattern against the code and transfers the annotations (the - and +) to the tokens of the actual code. So on the foo(1,2) example, the tokens in the PHP code become [-foo; -(; -1; -,; -2; -)+"bar(1,2)"].
4. It pretty prints each token that has no annotation, together with its associated spaces and comments from the original file. A token with a - annotation is not printed, and for a + annotation the attached string is printed. So here most tokens are removed, and the last parenthesis is replaced by the string "bar(1,2)".
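The four steps can be mimicked with a small Python sketch. Everything here is hypothetical and simplified, not pfff's code: a Token carries its text and the layout that follows it, metavariables are single uppercase letters that each match exactly one token, and the + string hangs off the last minus token as described above. Running it on foo(1, followed by 2); on the next line shows why the newline after the comma disappears: every token of the call is annotated, so none of their original layout is reprinted.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str            # the token itself, e.g. "foo" or ","
    after: str = ""      # spaces/newlines/comments following it in the file
    minus: bool = False  # "-": print neither the token nor its trailing layout
    plus: str = ""       # "+": string printed where the token was

def transfer(pattern, code):
    """Step 3: match annotated pattern tokens against code tokens, copy annotations.

    pattern is a list of (text, minus, plus) triples. Single uppercase letters
    are metavariables, each matching exactly one token in this sketch.
    """
    if len(pattern) != len(code):
        raise ValueError("pattern and code have different lengths")
    env = {}
    for (ptext, _, _), tok in zip(pattern, code):
        if len(ptext) == 1 and ptext.isupper():
            env[ptext] = tok.text
        elif ptext != tok.text:
            raise ValueError(f"no match: {ptext!r} vs {tok.text!r}")
    for (_, minus, plus), tok in zip(pattern, code):
        tok.minus = minus
        # naive character-by-character substitution of single-letter metavariables
        tok.plus = "".join(env.get(ch, ch) for ch in plus)

def pretty_print(tokens):
    """Step 4: reprint unannotated tokens verbatim, drop - tokens, emit + strings."""
    out = []
    for tok in tokens:
        if not tok.minus:
            out.append(tok.text + tok.after)
        out.append(tok.plus)
    return "".join(out)

# foo(1,
#     2);
code = [Token("foo"), Token("("), Token("1"), Token(",", "\n    "),
        Token("2"), Token(")"), Token(";", "\n")]

# Steps 1-2 for "- foo(X, Y)" / "+ bar(X, Y)": every token gets a -,
# and the + string is attached to the last one.
pattern = [("foo", True, ""), ("(", True, ""), ("X", True, ""),
           (",", True, ""), ("Y", True, ""), (")", True, "bar(X, Y)")]

transfer(pattern, code[:6])       # the trailing ";" is not part of the match
print(repr(pretty_print(code)))   # 'bar(1, 2);\n': the line break is gone
```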

Here is what spatch internally does with the spatch file below, which should explain why this spatch file is more "space friendly":

- foo
+ bar
  (X, Y)

1. It extracts the sgrep pattern, which is still foo(X, Y).
2. It annotates the tokens in the pattern, which this time gives [-foo+bar; (; X; ,; Y; )]. As you can see, only one token has an annotation.
3. It matches the pattern against the code and transfers the annotations. So on the foo(1,2) example, only the foo token gets an annotation.
4. It pretty prints the tokens that have no annotation together with their associated spaces and comments from the original file. Here this is the case for most of the tokens involved, including the comma, whose subsequent newline and indentation are therefore printed as well.

So to minimize spacing issues, try to maximize the number of contextual lines in the spatch file, that is, lines with neither a leading - nor a leading +.
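The second walk-through can be reproduced with a small printer of the same shape. This is again a hypothetical Python illustration, not pfff's implementation: it assumes the matching step has already put the only annotation, -foo+bar, on the foo token, and it reprints every other token followed by the spaces that followed it in the original file, so the line break after the comma survives.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str            # the token, e.g. "foo" or ","
    after: str = ""      # original spaces/newlines following the token
    minus: bool = False  # "-" annotation: drop the token and its trailing layout
    plus: str = ""       # "+" annotation: text printed in place of the token

def pretty_print(tokens):
    """Reprint unannotated tokens with their original layout; apply - and +."""
    out = []
    for tok in tokens:
        if not tok.minus:
            out.append(tok.text + tok.after)
        out.append(tok.plus)
    return "".join(out)

# foo(1,
#     2);
code = [Token("foo"), Token("("), Token("1"), Token(",", "\n    "),
        Token("2"), Token(")"), Token(";", "\n")]

# - foo
# + bar
#   (X, Y)
# After matching, only the foo token carries an annotation.
code[0].minus, code[0].plus = True, "bar"

print(pretty_print(code), end="")
# bar(1,
#     2);
```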

Pretty printer

The new --pretty-printer option makes spatch call a pretty printer on the modified code to reindent it nicely (the pretty printer does not yet support the whole PHP language).

For instance on this code:

//test.php
<?php

function test1() {
   return foo('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
}

and this spatch:

//test.spatch
- foo(X)
+ foo(X, 1, 2, 3, 4)

running spatch --pretty-printer -f test.spatch test.php generates:

--- test.php  2011-11-08 14:26:23.000000000 -0800
+++ /tmp/trans-8024-37a89b.php        2011-11-08 14:26:36.000000000 -0800
@@ -1,5 +1,11 @@
 <?php

 function test1() {
-  return foo('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
+  return foo(
+    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
+    1,
+    2,
+    3,
+    4
+    );
 }

Speeding things up

spatch is significantly slower than tools like sed because it works on a more complex structure than a stream of characters: the abstract syntax tree. Nevertheless, you can speed things up by using git grep -l to list only the files that mention foo and piping that list to spatch with xargs:

$ git grep -l foo | xargs spatch -f remove_second_arg_foo.spatch

FAQ

How do I rename a function with a variable number of arguments?

Put ..., which matches any sequence of arguments, in the argument list on a contextual line. Here is the rename_foo_in_bar.spatch file:

- foo
+ bar
  (...)
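How can a single pattern match calls with any number of arguments? Here is a minimal, hypothetical Python sketch on a toy AST (not sgrep's matcher): in a pattern argument list, DOTS stands for ..., and the matcher tries every way of letting it absorb a prefix of the actual arguments. Non-DOTS pattern arguments must be equal to the actual ones, since this sketch has no metavariables. Renaming foo to bar then applies to every call, whatever its arity, including nested ones.

```python
# Toy AST (hypothetical): ("num", n), ("id", name), ("call", name, [args]).
DOTS = ("dots",)  # stands for ... in a pattern argument list

def match_args(pattern_args, args):
    """Match pattern arguments against actual ones; DOTS matches any sequence."""
    if not pattern_args:
        return not args
    if pattern_args[0] == DOTS:
        # let ... absorb 0, 1, ..., len(args) arguments, then match the rest
        return any(match_args(pattern_args[1:], args[i:])
                   for i in range(len(args) + 1))
    return (bool(args) and pattern_args[0] == args[0]
            and match_args(pattern_args[1:], args[1:]))

def rename(node, old, new, pattern_args=(DOTS,)):
    """Sketch of "- old / + new / (...)": rename matching calls, nested ones too."""
    if node[0] != "call":
        return node
    _, name, args = node
    args = [rename(a, old, new, pattern_args) for a in args]
    if name == old and match_args(pattern_args, args):
        name = new
    return ("call", name, args)

print(rename(("call", "foo", [("call", "foo", []), ("num", 3)]), "foo", "bar"))
# ('call', 'bar', [('call', 'bar', []), ('num', 3)])
```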

See also

Structural search and replace (SSR) in IntelliJ IDEA: http://tv.jetbrains.net/videocontent/intellij-idea-static-analysis-custom-rules-with-structural-search-replace
codemod: http://github.com/facebook/codemod
gofmt/gofix: http://golang.org/cmd/gofmt/#hdr-Examples
