Bob Nystrom edited this page Jan 15, 2016 · 11 revisions

Why have a formatter?

The formatter has a few goals, in order of descending priority:

  1. Produce consistently formatted code. Consistent style improves readability because you aren't distracted by variance in style between different parts of a program. It makes it easier to contribute to others' code because their style will already be familiar to you.

  2. End debates about style issues in code reviews. These debates consume an astonishingly large quantity of very valuable engineering energy. They are time-consuming, upset people, and rarely change anyone's mind. They make code reviews take longer and be more acrimonious.

  3. Free users from having to think about and apply formatting. When writing code, you don't have to try to figure out the best way to split a line and then painstakingly add in the line breaks. When you do a global refactor that changes the length of some identifier, you don't have to go back and rewrap all of the lines. When you're in the zone, you can just pump out code and let the formatter tidy it up for you as you go.

  4. Produce beautiful, readable output that helps users understand the code. We could satisfy all of the above goals with a formatter that just removed all whitespace, but that wouldn't be very human-friendly. So, finally, the formatter tries very hard to produce output that is not just consistent but readable to a human. It tries to use indentation and line breaks to highlight the structure and organization of the code.

    In several cases, the formatter has pointed out bugs where the existing indentation was misleading and didn't represent what the code actually did. For example, automated formatting would have helped make Apple's "gotofail" security bug easier to notice:

    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;

    The formatter would change this to:

    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
    goto fail; // <-- now clearly not under the "if".
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;

I don't like the output!

First of all, that's not a question. But, yes, sometimes you may dislike the output of the formatter. This may be a bug or it may be a deliberate stylistic choice of the formatter that you disagree with. The simplest way to find out is to file an issue.

Now that the formatter is fairly mature, it's more likely that the output is deliberate. If your bug gets closed as "as designed", try not to be too sad. Even if the formatter doesn't follow your personal preferences, what it does do is spare you the effort of hand-formatting, and ensure your code is consistently formatted. I hope you'll appreciate the real value in both of those.

How stable is it?

You can rely on the formatter to not break your code or change its semantics. If it does do so, this is a critical bug and we'll fix it quickly.

The formatter is in wide use, so changes that affect the way a significant fraction of code is formatted are very unlikely.

The rules the formatter uses to determine the "best" way to split a line may change over time, mostly in complex cases. We don't promise that code produced by the formatter today will be identical to the same code run through a later version of the formatter. We do hope that you'll like the output of the later version more.

How does it work?

I wrote a long article about how the formatter is implemented, "The Hardest Program I've Ever Written".

Why can't I tell the formatter to ignore a region of code?

Even a really sophisticated formatter can't beat a human in all cases. Our semantic knowledge of the code can let us show more than the formatter can. One escape hatch would be to have a comment telling the formatter "leave this alone".

This might help the fourth goal above, but does so at the expense of the first three. We want code that is consistent and we want you to stop thinking about formatting. If you can decide to turn off the formatter, now you have regions of code that are inconsistent by design.

Further, you're right back into debates about how the code in there should be formatted, with the extra bonus of now debating whether or not that annotation should be used and where. None of this is making your life better.

Yes, maybe you can hand-format some things better than the formatter. (Though, in most cases where users have asked for this, I've seen formatting errors in the examples they provided!) But does doing that really add enough value to make up for re-opening that can of worms?

Why isn't this function or collection indented enough?

In most cases, a function expression or a multiline collection literal's body is indented relative to the containing statement, not relative to the expression nesting where it appears. This style was inherited from Google's JavaScript style guide where it seems to work well.

It is so natural in most code that you probably don't even notice it:

argParser.addAll([
  "--help",
  "--mode",
  "debug"
]);
test("adds two numbers correctly", () {
  expect(1 + 2, equals(3));
});

But the same behavior kicks in even when the body is contained in an expression that has other splits:

configure(
    debugStuff: false,
    optimizeStuff: true,
    removeExtraStuff: false,
    thingsToInclude: [
  "widgets",
  "gadgets",
  "doodads",
  "doohickeys"
]);

Here, the indentation of the list body is less than that of the thingsToInclude: parameter that it applies to, which breaks the general guideline that indentation should reflect nesting depth.

We could always indent relative to the surrounding expression, but that also leads to bad output in many cases:

group("this description is too long"
    "to fit in one line",
    () {
      // 1000 lines of annoyingly indented test code...
    });

Both indentation styles are useful in different places, so the formatter supports both and tries to choose the best based on the context where the body appears. The rules here are fairly subtle, and take into account method chains, other arguments in the argument list, etc. There doesn't appear to be a simple rule that makes all code look good and the heuristics it uses aren't perfect. It does its best.

Why does the formatter mess up my collection literals?

Large collection literals are often used to define big chunks of structured data, like:

/// Maps ASCII character values to what kind of character they represent.
const characterTypes = const [
  other, other, other, other, other, other, other, other,
  other, white, white, other, other, white,
  other, other, other, other, other, other, other, other,
  other, other, other, other, other, other, other, other,
  other, other, white,
  punct, other, punct, punct, punct, punct, other,
  brace, brace, punct, punct, comma, punct, punct, punct,
  digit, digit, digit, digit, digit,
  digit, digit, digit, digit, digit,
  punct, punct, punct, punct, punct, punct, punct,
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha,
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha,
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha,
  alpha, alpha, brace, punct, brace, punct, alpha, other,
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha,
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha,
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha,
  alpha, alpha, brace, punct, brace, punct
];

The formatter doesn't know those newlines are meaningful, so it wipes them out:

/// Maps ASCII character values to what kind of character they represent.
const characterTypes = const [
  other,
  other,
  other,

  // lots more ...

  punct,
  brace,
  punct
];

In many cases, ignoring these newlines is a good thing. If you've removed a few items from a list, it's a win for the formatter to repack it into one line if it fits. But here it clearly loses useful information.

Fortunately, in most cases, structured collections like this have comments describing their structure:

const characterTypes = const [
  other, other, other, other, other, other, other, other,
  other, white, white, other, other, white,
  other, other, other, other, other, other, other, other,
  other, other, other, other, other, other, other, other,
  other, other, white,
  punct, other, punct, punct, punct, punct, other, //          !"#$%&'
  brace, brace, punct, punct, comma, punct, punct, punct, //   ()*+,-./
  digit, digit, digit, digit, digit, //                        01234
  digit, digit, digit, digit, digit, //                        56789
  punct, punct, punct, punct, punct, punct, punct, //          :;<=>?@
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha, //   ABCDEFGH
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha,
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha,
  alpha, alpha, brace, punct, brace, punct, alpha, other, //   YZ[\]^_`
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha, //   abcdefgh
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha,
  alpha, alpha, alpha, alpha, alpha, alpha, alpha, alpha,
  alpha, alpha, brace, punct, brace, punct  //                 yz{|}~
];

In that case, the formatter is smart enough to recognize this and preserve your original newlines. So, if you have a collection that you have carefully split into lines, add at least one line comment somewhere inside it to get it to preserve all of the newlines in it.
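
The behavior described above can be pictured with a toy model. This is only a sketch under stated assumptions, not the formatter's real code: layOutList, its parameters, and the 80-column default are hypothetical names and values made up for illustration, and the comments themselves are left out of the output to keep it short.

```dart
/// Toy model of how a list literal might be laid out.
///
/// [rows] holds the elements as the author grouped them, one inner list per
/// source line. [hasLineComment] says whether a `//` comment appears
/// anywhere inside the literal. All names here are hypothetical.
String layOutList(List<List<String>> rows,
    {required bool hasLineComment, int pageWidth = 80}) {
  if (hasLineComment) {
    // A line comment signals hand-tuned structure: keep the author's rows.
    final lines = [for (final row in rows) '  ${row.join(', ')}'];
    return '[\n${lines.join(',\n')}\n]';
  }

  // Otherwise the original newlines are ignored entirely.
  final elements = [for (final row in rows) ...row];
  final oneLine = '[${elements.join(', ')}]';

  // Repack into a single line when it fits in the page width.
  if (oneLine.length <= pageWidth) return oneLine;

  // Fall back to one element per line.
  final lines = [for (final element in elements) '  $element'];
  return '[\n${lines.join(',\n')}\n]';
}

void main() {
  const rows = [
    ['digit', 'digit', 'digit'],
    ['punct', 'punct'],
  ];
  print(layOutList(rows, hasLineComment: true));
  print(layOutList(rows, hasLineComment: false));
  print(layOutList(rows, hasLineComment: false, pageWidth: 20));
}
```

The point of the sketch is only the branching: one comment anywhere in the literal switches the whole collection from "repack freely" to "keep my rows".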

Why doesn't the formatter add curlies or otherwise clean up code?

The formatter has a simple, restricted charter: it rewrites only the non-semantic whitespace of your program. It makes absolutely no other changes to your code. This makes it more reliable to run the formatter automatically in things like presubmit scripts where a human does not vet the output.
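
To make that charter concrete, here is a minimal sketch of the kind of sanity check a presubmit script could run on the formatter's output. It is an illustration, not something the formatter is documented to expose: stripWhitespace and differsOnlyInWhitespace are hypothetical helpers. The check is also only a necessary condition, since it ignores whitespace inside string literals and between tokens that must stay separated, both of which are semantic.

```dart
/// Removes every whitespace character from [source].
String stripWhitespace(String source) =>
    source.replaceAll(RegExp(r'\s+'), '');

/// Returns true when [before] and [after] contain the same non-whitespace
/// characters in the same order.
///
/// Hypothetical helper: a cheap necessary condition for "only whitespace
/// changed", not a proof that semantics were preserved.
bool differsOnlyInWhitespace(String before, String after) =>
    stripWhitespace(before) == stripWhitespace(after);

void main() {
  const before = 'if (done)   return  1;';
  const reformatted = 'if (done) return 1;';
  const withCurlies = 'if (done) { return 1; }';

  // Only spacing changed, so the check passes.
  print(differsOnlyInWhitespace(before, reformatted));

  // Curly braces were added, so the check fails.
  print(differsOnlyInWhitespace(before, withCurlies));
}
```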

Making non-whitespace changes like reordering imports or adding or removing curly braces has a lot of very ugly failure cases.

  • If we add curlies to the body of an if that doesn't fit on one line, do we remove them if it does fit? What if the user prefers using curly braces on all ifs? If we don't remove them, then it means the formatter's behavior isn't reversible. If you make the if condition longer, then reformat, it may add curlies. Then you change it back to the original condition but it doesn't remove them, so you aren't back where you started.

  • If we alphabetize your imports, what happens to comments in the middle of them? What if it appears to be a commented out import? Do we sort it?

  • If we split long string literals so that they fit in the line length, do we unsplit adjacent ones that would fit? What kind of string literal do we use when we split or unsplit? How do we handle escaped quotation marks that are affected by that choice? Are all of the things we might do here reversible?
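
The reversibility problem from the first bullet can be sketched with a toy model. Everything here is hypothetical and describes a rule the formatter deliberately does not implement: IfStatement, format, and the 40-column page width are invented for the example.

```dart
/// A toy `if` statement: a condition, a one-statement body, and whether the
/// body is wrapped in curly braces. Hypothetical model for illustration.
class IfStatement {
  final String condition;
  final String body;
  final bool hasCurlies;

  const IfStatement(this.condition, this.body, {this.hasCurlies = false});

  /// Simulates the user editing only the condition.
  IfStatement withCondition(String newCondition) =>
      IfStatement(newCondition, body, hasCurlies: hasCurlies);

  @override
  String toString() => hasCurlies
      ? 'if ($condition) {\n  $body\n}'
      : 'if ($condition) $body';
}

/// Hypothetical rule: add curlies when the one-line form is wider than
/// [pageWidth], and never remove curlies that are already there.
IfStatement format(IfStatement statement, {int pageWidth = 40}) {
  final oneLine = 'if (${statement.condition}) ${statement.body}';
  if (statement.hasCurlies || oneLine.length <= pageWidth) return statement;
  return IfStatement(statement.condition, statement.body, hasCurlies: true);
}

void main() {
  final start = format(const IfStatement('isReady', 'start();'));

  // Make the condition longer and reformat: the rule adds curlies.
  final longer = format(start
      .withCondition('isReady && hasPermission && !isShuttingDown'));

  // Change the condition back and reformat: the curlies stay.
  final roundTrip = format(longer.withCondition('isReady'));

  print(start);
  print(roundTrip);
  print(start.toString() == roundTrip.toString());
}
```

Under these assumptions, the same source text ends up formatted two different ways depending on its edit history, which is exactly the kind of inconsistency the formatter exists to remove.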

We probably could come up with reasonable behavior for most of these, but the heuristics get increasingly hairy and likely to not work. That's not the kind of program you want modifying all of your code right before you commit it.
