Breaking: Calculate leading/trailing comments in core #7516

kaicataldo · 2016-11-01T22:03:35Z

What is the purpose of this pull request? (put an "X" next to item)

[ ] Documentation update
[ ] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[ ] Add autofixing to a rule
[ ] Add a CLI option
[X] Add something to the core
[ ] Other, please explain:

What changes did you make? (Give an overview)
This PR turns off comment attachment in Espree and moves the logic for computing leading/trailing comments into sourceCode.getComments(). This is a breaking change.

Is there anything you'd like reviewers to focus on?
Would love suggestions for how else we might be able to handle shebang comments.

As discussed on the corresponding issue, we should decide whether we want to keep thinking about comments the same way now that we're not attaching them at the parser level. This PR mimics Espree's current attachment strategy as closely as it can, though it can't be exactly the same as Espree (nor do I think we want it to be), because Espree's attachment has some unpredictable edge cases and bugs.

The version in this PR should essentially work the same for our users and ecosystem (unless they rely on some of the really weird edge cases mentioned before).

eslintbot · 2016-11-01T22:03:37Z

Thanks for the pull request, @kaicataldo! I took a look to make sure it's ready for merging and found some changes are needed:

The commit summary needs to begin with a tag (such as Fix: or Update:). Please check out our guide for how to properly format your commit summary and update it on this pull request.

Can you please update the pull request to address these?

(More information can be found in our pull request guide.)

mention-bot · 2016-11-01T22:03:38Z

@kaicataldo, thanks for your PR! By analyzing the history of the files in this pull request, we identified @btmills, @nzakas and @mysticatea to be potential reviewers.

nzakas

I'm not entirely sure what I should be reviewing here. Can you point out some areas where you'd like some comments? And can you explain what the differences are between what Espree does and what you're doing here?

nzakas · 2016-11-04T18:31:59Z

lib/eslint.js

@@ -908,6 +908,8 @@ module.exports = (function() {
                }
            }

+            ast.hasShebang = !!shebang;


Hmm, this doesn't look like a good idea.

Yeah, this was what I was hoping to get some feedback on. The current behavior is to modify the shebang comment to be a normal JS line comment before parsing and then to remove the parsed comment from the top level comments array as well as from the leadingComments of the first node in the Program body after parsing has completed.

Now that we're calculating this on the fly, I need to figure out how getComments() can know which token represents a shebang comment (if one exists) so that it doesn't include it. The challenge here is that once the shebang comment has been modified to be a normal JS line comment, there isn't a reliable way of knowing whether there is a shebang comment at the top of the file.

Is there any possibility of adding a property to the shebang comment token that we could check? Any other suggestions/ideas would be most welcome!
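A minimal sketch of the pre-parse step described above. The regex is an assumption modeled on the kind of shebang matcher ESLint uses, not a copy of it. Replacing `#!` with `//` keeps every character offset unchanged, so token ranges still line up with the original text:

```javascript
// Assumed pattern: "#!" at the very start of the file, followed by the rest of that line.
const SHEBANG_MATCHER = /^#!([^\r\n]+)/;

// Rewrites a leading "#!..." line into a "//..." line comment so the parser
// accepts it, and remembers the original shebang text (or null if absent).
function stripShebang(text) {
    let shebang = null;
    const parseableText = text.replace(SHEBANG_MATCHER, (match, captured) => {
        shebang = captured;
        return `//${captured}`; // same length as "#!" + captured
    });
    return { parseableText, shebang };
}
```

After this runs, the resulting `//` comment looks like any other line comment to the parser and to the token store, which is why getComments() cannot tell it apart without extra information.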

Hmm, I think I'm missing something. If we're already removing the shebang comment from comments, and getComments() uses the comments array to figure out which comments to return, wouldn't it automatically ignore shebang comments?

Good question - it's because the SourceCode instance is created before we remove the shebang comment from the comments array. So it seems like the fix might actually be in lib/eslint.js.

Here's the line where the SourceCode instance is created: https://github.com/eslint/eslint/blob/master/lib/eslint.js#L817

And here's where the shebang comment is removed:
https://github.com/eslint/eslint/blob/master/lib/eslint.js#L903

Still confused. If ast is the same as was passed into SourceCode, shouldn't the change work correctly? Is this just a timing issue?

That's right - sorry I'm not explaining this better. The SourceCode instance above takes the parsed ast and generates its own internal tokensAndCommentsStore from it. Since this happens before the shebang comment is removed, the SourceCode instance's tokensAndCommentsStore still contains the shebang comment.

I think we should be able to do the shebang comment removal before creating the SourceCode instance - will have to figure out how to do it with the few forks of logic that happen there.

Had some time to look at this and I think I've found a better solution. One of our rules, lines-around-directive, relies on the shebang comment being in sourceCode's tokensAndCommentsStore, so I just need to fix that rule and we should be good to go!

After a bit of digging I'm realizing that including the shebang in the tokens and comments store might be intentional and actually a good thing. Although it isn't attached to any nodes or included in the AST's comments array, transforming it into a standard JS line comment and keeping it in the store lets rules use sourceCode.getTokenOrCommentBefore() to reason about the shebang.

Had some thoughts and will write them in a comment below.

nzakas · 2016-11-04T18:34:22Z

lib/util/source-code.js

@@ -133,6 +143,9 @@ function SourceCode(text, ast) {
    this.getTokenOrCommentBefore = tokensAndCommentsStore.getTokenBefore;
    this.getTokenOrCommentAfter = tokensAndCommentsStore.getTokenAfter;

+    this._getTokens = tokensAndCommentsStore.getTokens;
+    this._getCommentsStore = new WeakMap();


Maybe just _commentStore?
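The `WeakMap` in this diff presumably caches the getComments() result per AST node. A minimal sketch of that memoization pattern, where `computeComments` is a hypothetical stand-in for the actual leading/trailing scan:

```javascript
class CommentCache {
    constructor(computeComments) {
        // Hypothetical callback that does the real leading/trailing scan for a node.
        this._computeComments = computeComments;
        // Keys are AST node objects; a WeakMap does not keep nodes alive,
        // so cached entries can be collected together with the AST.
        this._commentStore = new WeakMap();
    }

    getComments(node) {
        if (this._commentStore.has(node)) {
            return this._commentStore.get(node);
        }
        const comments = this._computeComments(node);
        this._commentStore.set(node, comments);
        return comments;
    }
}
```

Rules often ask for the same node's comments several times during a traversal, so computing once per node and reusing the result avoids repeated token scans.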

kaicataldo · 2016-11-04T19:45:09Z

Sure, sorry that wasn't clear.

Behavior and differences between `sourceCode.getComments()` and Espree's comment attachment

The new behavior for sourceCode.getComments() is to walk the token list outward from the node and stop at the first non-comment token: backward from the node's first token to collect leading comments, and forward from its last token to collect trailing comments. This differs from the current behavior, in which Espree collects comments as it parses and attaches them when it finishes a node. As a result, Espree can attach comments across intervening tokens (parentheses, operators), which leads to some unexpected behavior (see examples below).
It doesn't attach comments that lie outside the range of the node's parent, since those should be attached to the parent node instead (I didn't notice any differences between Espree and this PR for this behavior).
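A simplified sketch of the scanning approach described above. It assumes a single position-sorted array of tokens and comments (each with `type` and `range: [start, end]`) and nodes with `range` and an optional `parent`; the PR itself works through SourceCode's token store rather than a raw array like this:

```javascript
function isComment(token) {
    return token.type === "Line" || token.type === "Block";
}

function getComments(tokens, node) {
    const [start, end] = node.range;
    const parentRange = node.parent ? node.parent.range : null;
    const leading = [];
    const trailing = [];

    // Index of the node's first token (first token starting at or after node start).
    let i = tokens.findIndex(t => t.range[0] >= start);
    if (i === -1) i = tokens.length;

    // Walk backward until a non-comment token; skip comments that begin before the parent.
    for (let j = i - 1; j >= 0 && isComment(tokens[j]); j--) {
        if (parentRange && tokens[j].range[0] < parentRange[0]) break;
        leading.unshift(tokens[j]);
    }

    // Walk forward from the first token after the node; skip comments that end after the parent.
    let k = tokens.findIndex(t => t.range[0] >= end);
    if (k !== -1) {
        for (; k < tokens.length && isComment(tokens[k]); k++) {
            if (parentRange && tokens[k].range[1] > parentRange[1]) break;
            trailing.push(tokens[k]);
        }
    }
    return { leading, trailing };
}
```

For `foo /*comment*/ || /*comment*/ bar`, the `||` token stops both scans, so `foo` gets one trailing comment and `bar` gets one leading comment.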

Examples:

foo /*comment*/ || /*comment*/ bar
var foo /*comment*/ = /*comment*/ bar;

Espree: In both examples above, the Identifier foo has 1 trailing comment, while the Identifier bar has 2 leading comments.

sourceCode.getComments(): In both examples above, the Identifier foo has 1 trailing comment and the Identifier bar has 1 leading comment.

function foo(/*asdf*/) {}
function foo(/*asdf*/bar) {}

Espree: In the first example above, the BlockStatement has a leading comment. In the second, the Identifier bar has a leading comment and the BlockStatement doesn't have any.

sourceCode.getComments(): The first example does not return any comments. I wasn't sure what the desired behavior would be here: should the comment be treated as a trailing comment when a function node's params array is empty? In the second, the Identifier bar has a leading comment. The BlockStatement does not have leading comments in either case.
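If the comment in `function foo(/*asdf*/) {}` ends up attached to no node, a rule could still find it by walking the tokens between the parentheses. A sketch of that idea; `commentsInEmptyParams` is a hypothetical helper, not part of ESLint, and it assumes the same sorted token array shape (`type`, `value`, `range`) used elsewhere in these sketches:

```javascript
// Returns the comments between a parameterless function's "(" and ")".
function commentsInEmptyParams(tokens, fnNode) {
    if (fnNode.params.length > 0) return []; // only handles the empty-params case
    const [start, end] = fnNode.range;
    // First "(" inside the function node's range.
    const open = tokens.findIndex(t =>
        t.range[0] >= start && t.range[1] <= end &&
        t.type === "Punctuator" && t.value === "(");
    if (open === -1) return [];
    const found = [];
    for (let i = open + 1; i < tokens.length; i++) {
        const token = tokens[i];
        if (token.type === "Punctuator" && token.value === ")") break;
        if (token.type === "Line" || token.type === "Block") found.push(token);
    }
    return found;
}
```

This sidesteps the leading/trailing question entirely, which may be the more honest model for comments that sit inside syntax rather than next to a node.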

Questions/Concerns

How should we handle shebang comments, since they shouldn't be included in the results of getComments()?
How should the comment in the second set of examples (inside the parens of a function declaration without any parameters) be treated? This is a case where the model of leading/trailing comments doesn't make a lot of sense. I also don't think it makes sense for it to be attached to the function body as a leading comment (the current behavior).
How does everyone feel about the slightly changed (and more predictable) behavior described above? So far, it doesn't seem to affect any use cases in the ESLint codebase.

nzakas · 2016-11-09T20:49:08Z

Thanks, that's super helpful. I think the new behavior you've described makes a lot of sense, and have no objections to either dropping the comment without a node or the slightly changed attachment behavior of leading comments.

platinumazure

I might be misunderstanding a few things, but hopefully this review will be of some use.

platinumazure · 2016-11-17T18:19:55Z

tests/lib/util/source-code.js

+            const code = [
+                "//#!/usr/bin/env node",
+                "var a;",
+                "// foo",


Does this comment count as trailing for var a; and leading for var b;?

I don't think this is a problem since people who want to iterate over all comments can just iterate over the comment store without worrying about attachment, but I also want to make sure I understand what's going on here.

Maybe some comments around the asserts would help? (E.g., assertCommentCount(1, 1)(node); // commented shebang, foo)

That's right. This is the current behavior as defined by Espree's comment attachment. I was hoping we could discuss as a team to see what everyone thought about keeping the behavior as close to Espree as possible or if there were ideas for improvements, since this is already a breaking change. If you have any ideas, I'd love to discuss!

I'm okay with the behavior-- I just wanted to confirm my understanding.

I'd love to see some comments in the tests themselves, so it's clear to people unfamiliar with comment attachment which comments are leading and trailing to what nodes. 90% of the time it's clear, but for the other 10%, it'd be nice to see documentation via comments.

platinumazure · 2016-11-17T18:21:40Z

tests/lib/util/source-code.js

+            eslint.on("Identifier", assertCommentCount(0, 0));
+
+            eslint.verify(code, config, "", true);
+        });


Could you please add a test for a multiple-declarator VariableDeclaration, so we understand how the comment attachment is supposed to work there?

Example test case:

// Leading comment for VariableDeclaration?
var a, // Trailing comment for VariableDeclarator? And/or leading for the next?
    b, // Trailing comment for second VariableDeclarator?
    c; // Trailing comment for VariableDeclaration?
// Trailing comment for VariableDeclaration?

platinumazure · 2016-11-17T18:23:24Z

tests/lib/util/source-code.js

+                "switch (foo)",
+                "    //comment",
+                "    /*another comment*/",
+                "}"


Is this example even syntactically valid? I see a closing brace but not an opening brace.

Good catch 👍

kaicataldo · 2016-11-17T20:12:08Z

Working on this has raised some questions about how we want to handle shebangs. Essentially, it seems like we probably want to keep the current behavior: shebangs should not be included in the AST's comments array, and sourceCode.getComments() should not return them as leading comments. Please see this comment thread for more context.

The problem as it currently stands is that sourceCode.getComments() doesn't have a way of knowing if a LineComment token was a shebang or not (since it gets transformed into a standard JS LineComment prior to parsing). I think we do want to keep the behavior of transforming the shebang and keeping it in the tokens and comment store, as this allows rules to use sourceCode.getTokenOrCommentBefore() to get the token that represents the shebang.

It seems like we have a few ways forward, and I wanted to see what you all thought:

1. Right after parsing in lib/eslint.js, add a property (shebang: true?) to the LineComment token that represents the shebang and then filter it out when calculating sourceCode.getComments()'s return value.
2. When sourceCode is instantiated, remove the shebang and store a property/add a helper method in sourceCode that allows access to the shebang token.
3. Simply remove the shebang token and make rules figure this out for themselves (maybe by checking the first line of the source code). This is my least favorite option, because it feels like something we should be able to provide.

Thoughts? Suggestions? Things I missed?

platinumazure · 2016-11-17T20:28:34Z

I'd vote for option 2:

When sourceCode is instantiated, remove the shebang and store a property/add a helper method in sourceCode that allows access to the shebang token.

This is similar to how we handle the byte-order mark (BOM).

I would be okay with option 1 (add shebang property to the comment token itself).

I would be opposed to option 3 (remove the comment entirely from SourceCode's store).

mikesherov · 2016-11-17T20:29:06Z

This sounds correct to me too:

When sourceCode is instantiated, remove the shebang and store a property/add a helper method in sourceCode that allows access to the shebang token.

kaicataldo · 2016-11-24T02:34:01Z

Updated - thoughts on this approach? I thought about it some more and am actually uncomfortable with removing it from the token list. Treating the shebang like a Line comment works for most cases - the cases that we don't cover are ones where the rule needs to know whether the line comment token it's checking represents a shebang or not.

This current iteration changes the type of the comment token to Shebang. Doing so will continue to allow rules to iterate over tokens (which seems the most likely way that rules will be checking this), as well as giving rules that use sourceCode.getTokenOrCommentBefore() an easy way to differentiate between regular JS Line comments and Shebang comments (token.type === "Shebang").

This could potentially break some rules that assume that the token.type === "Line" check will include shebang comments while iterating over the token list, but this seems like a pretty narrow use case and I think this gives rule writers greater control.

Thanks for all the input! If this doesn't seem like a good idea, I'm happy to continue exploring other options.
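A minimal sketch of the retagging approach described in this comment. The token shape and the `hadShebang` flag (for example, whether the pre-parse rewrite found a `#!` line) are assumptions for illustration, not ESLint's actual code:

```javascript
// After parsing text whose "#!" line was rewritten to "//", retag that
// first comment so rules can tell it apart from an ordinary line comment.
function markShebang(comments, hadShebang) {
    const first = comments[0];
    if (hadShebang && first && first.type === "Line" && first.range[0] === 0) {
        first.type = "Shebang"; // mutates the shared token object in place
    }
    return comments;
}

// A rule that only cares about real line comments can now filter on type alone.
function lineCommentsOnly(tokens) {
    return tokens.filter(token => token.type === "Line");
}
```

Because the token is retagged in place rather than removed, rules that walk the token list or call sourceCode.getTokenOrCommentBefore() still see it, and only checks that relied on `token.type === "Line"` matching the shebang change behavior.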

eslintbot · 2017-04-04T01:57:33Z

LGTM

kaicataldo · 2017-04-04T01:59:44Z

@not-an-aardvark @btmills Thanks again for the thorough reviews! I have addressed all the comments (either with code changes or comments of my own). Please let me know what you think! Changes were made in the last two commits, so hopefully it's not too hard to re-review.

btmills

@kaicataldo thanks for adding tests for those edge cases. I'm totally on board with #8408 and think that's the right direction to go. There's no perfect way to classify all comments as leading or trailing some particular node, but it looks like this does as good a job as we can hope for. LGTM

kaicataldo · 2017-04-05T18:45:08Z

Also, rebased and ran eslint-canary against this branch - not seeing any new unexpected errors! 🎉

not-an-aardvark

LGTM with a slight nitpick. Thanks!

not-an-aardvark · 2017-04-04T01:44:40Z

tests/lib/rules/capitalized-comments.js

+        // Ignores shebangs
+        "#!/usr/bin/env node",
+        { code: "#!/usr/bin/env node", options: ["always"] },
+        { code: "#!/usr/bin/env node", options: ["never"] },


Nitpick: All of the shebang comments in these tests have values that start with /, so the rule wouldn't report an error for them anyway. If the rule is refactored in the future, it might be useful to have tests for:

{ code: "#!foo", options: ["always"] }
{ code: "#!Foo", options: ["never"] }

Good call - done!
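A simplified sketch (not the actual capitalized-comments source) of why the suggested tests matter: a value such as `/usr/bin/env node` has no leading letter and passes regardless, while `#!foo` under `"always"` would be reported unless Shebang tokens are skipped first. It assumes shebangs carry `type: "Shebang"`:

```javascript
// Returns true if the comment violates the "always"/"never" capitalization option.
function shouldReport(comment, option) {
    if (comment.type === "Shebang") return false; // shebangs are exempt
    const firstChar = comment.value.trim().charAt(0);
    if (!/[a-z]/i.test(firstChar)) return false; // nothing to capitalize
    return option === "always"
        ? firstChar !== firstChar.toUpperCase()
        : firstChar !== firstChar.toLowerCase();
}
```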

not-an-aardvark · 2017-04-06T01:37:32Z

Is the CLA bot down? It's still waiting for the status to be reported (and it hasn't left a comment).

vitorbal · 2017-04-06T09:40:05Z

@not-an-aardvark that happened to me a couple of days ago. I had to force push to trigger the bot again.

eslintbot · 2017-04-07T02:14:10Z

LGTM

JamesHenry · 2017-04-07T08:10:19Z

Yayyyyy! What an epic PR 😄

Thanks so much for your work on this @kaicataldo! And to all the reviewers for their invaluable help.

mikesherov · 2017-04-07T10:07:28Z

Congrats!

kaicataldo force-pushed the getcomments branch from 5779c43 to 3d684b4 (November 1, 2016 22:14)

kaicataldo mentioned this pull request Nov 1, 2016

Calculate leading/trailing comments #6724

Closed

kaicataldo force-pushed the getcomments branch 3 times, most recently from 7459bec to 1c5ffa3 (November 1, 2016 22:55)

nzakas reviewed Nov 4, 2016


kaicataldo removed the "do not merge" label Nov 6, 2016

kaicataldo changed the title ~~WIP - Breaking: Calculate leading/trailing comments in sourceCode.getComments()~~ Breaking: Calculate leading/trailing comments in sourceCode.getComments() Nov 8, 2016

kaicataldo changed the title ~~Breaking: Calculate leading/trailing comments in sourceCode.getComments()~~ Calculate leading/trailing comments in sourceCode.getComments() Nov 8, 2016

kaicataldo changed the title ~~Calculate leading/trailing comments in sourceCode.getComments()~~ Breaking: Calculate leading/trailing comments in sourceCode.getComments() Nov 8, 2016

platinumazure suggested changes Nov 17, 2016


kaicataldo force-pushed the getcomments branch 5 times, most recently from 7a64fdb to 970a009 (November 24, 2016 02:18)

kaicataldo force-pushed the getcomments branch from 970a009 to be5df78 (November 24, 2016 04:07)

btmills approved these changes Apr 5, 2017


kaicataldo force-pushed the getcomments branch 2 times, most recently from 6274d06 to 9cd4e91 (April 5, 2017 18:30)

not-an-aardvark approved these changes Apr 5, 2017


not-an-aardvark approved these changes Apr 6, 2017


kaicataldo added 5 commits April 6, 2017 21:30

Breaking: Calculate leading/trailing comments in core

d916cca

Fix capitalized-comments

452244c

Fix trailing comment behavior for empty nodes

5d1dc02

Refactor lines-around-comment.js

2b5e134

Add regression tests for shebangs in capitalized-comments

58dd348

kaicataldo force-pushed the getcomments branch from f4aa99f to 58dd348 (April 7, 2017 01:30)

kaicataldo added 2 commits April 6, 2017 21:53

Fix up comments in lib/util/source-code.js

1f00ff4

Extract shebang matcher regex into ast-utils.js

c413af1

ilyavolodin approved these changes Apr 7, 2017


ilyavolodin merged commit 867dd2e into master Apr 7, 2017

kaicataldo deleted the getcomments branch April 7, 2017 02:37

dounan mentioned this pull request May 17, 2017

react/display-name gives false positives with component subclasses + argument spreading + es7 class properties jsx-eslint/eslint-plugin-react#1200

Closed

JamesHenry mentioned this pull request Aug 7, 2017

The AST does not contain a comments array at the root eslint/typescript-eslint-parser#346

Closed

jimjenkins5 mentioned this pull request Jan 16, 2018

Upgrade rules to support eslint v4 silvermine/eslint-plugin-silvermine#20

Closed

eslint-deprecated bot locked and limited conversation to collaborators Feb 6, 2018

eslint-deprecated bot added the "archived due to age" label Feb 6, 2018
