Breaking: Calculate leading/trailing comments in core #7516

Merged
merged 7 commits into from
Apr 7, 2017

kaicataldo
Member

@kaicataldo kaicataldo commented Nov 1, 2016

Fixes #6724

What is the purpose of this pull request? (put an "X" next to item)

[ ] Documentation update
[ ] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[ ] Add autofixing to a rule
[ ] Add a CLI option
[X] Add something to the core
[ ] Other, please explain:

What changes did you make? (Give an overview)
This PR turns off comment attachment in Espree and moves comment getting logic into sourceCode.getComments(). This is a breaking change.

Is there anything you'd like reviewers to focus on?
Would love suggestions for how else we might be able to handle shebang comments.

As discussed on the corresponding issue, we should decide whether we want to keep thinking about comments the same way now that we're not attaching them at the parser level. This PR mimics Espree's current attachment strategy as closely as it can, though it can't be (nor do I think we want it to be) exactly the same as Espree's, because that implementation has some unpredictable edge cases and bugs.

The version in this PR should essentially work the same for our users and ecosystem (unless they rely on some of the really weird edge cases mentioned before).
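
For readers less familiar with the API being discussed, here is a rough sketch of how a rule might consume sourceCode.getComments(). The rule object, its name, and its message are hypothetical and only for illustration; the assumption is that getComments(node) returns an object with leading and trailing arrays, as it does today.

```javascript
// Hypothetical rule: reports identifiers that have a leading comment.
// The rule name and message are made up for this sketch.
const noLeadingCommentOnIdentifier = {
    create(context) {
        const sourceCode = context.getSourceCode();

        return {
            Identifier(node) {
                // With this PR the result is calculated from the token list
                // rather than taken from parser-attached comments.
                const comments = sourceCode.getComments(node);

                if (comments.leading.length > 0) {
                    context.report({ node, message: "Unexpected leading comment." });
                }
            }
        };
    }
};
```

Rules written this way should not need to change, since only the way the result is calculated moves from the parser into core.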

@eslintbot

Thanks for the pull request, @kaicataldo! I took a look to make sure it's ready for merging and found some changes are needed:

  • The commit summary needs to begin with a tag (such as Fix: or Update:). Please check out our guide for how to properly format your commit summary and update it on this pull request.

Can you please update the pull request to address these?

(More information can be found in our pull request guide.)

@mention-bot

@kaicataldo, thanks for your PR! By analyzing the history of the files in this pull request, we identified @btmills, @nzakas and @mysticatea to be potential reviewers.

@kaicataldo kaicataldo added the enhancement, core, breaking, accepted, and do not merge labels Nov 1, 2016
@kaicataldo kaicataldo force-pushed the getcomments branch 3 times, most recently from 7459bec to 1c5ffa3 Compare November 1, 2016 22:55
Member

@nzakas nzakas left a comment

I'm not entirely sure what I should be reviewing here. Can you point out some areas where you'd like some comments? And can you explain what the differences are between what Espree does and what you're doing here?

lib/eslint.js Outdated
@@ -908,6 +908,8 @@ module.exports = (function() {
}
}

ast.hasShebang = !!shebang;
Member

Hmm, this doesn't look like a good idea.

Member Author

@kaicataldo kaicataldo Nov 4, 2016

Yeah, this was what I was hoping to get some feedback on. The current behavior is to modify the shebang comment to be a normal JS line comment before parsing and then to remove the parsed comment from the top level comments array as well as from the leadingComments of the first node in the Program body after parsing has completed.

Now that we're calculating this on the fly, I need to figure out how getComments() can know which token represents a shebang comment (if one exists) so that it doesn't include it. The challenge here is that once the shebang comment has been modified to be a normal JS line comment, there isn't a reliable way of knowing whether there is a shebang comment at the top of the file or not.

Is there any possibility of adding a property to the shebang comment token that we could check? Any other suggestions/ideas would be most welcome!
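
To make the problem concrete, here is a minimal sketch of the "turn the shebang into a line comment before parsing" step described above. The function name and return shape are hypothetical and this is not the code in lib/eslint.js; it only illustrates why the information is lost once the text has been rewritten.

```javascript
// Sketch only: replace a leading "#!" with "//" so a JS parser sees a line comment.
// Returns the rewritten text and whether a shebang was found.
function commentOutShebang(text) {
    const hasShebang = text.startsWith("#!");

    return {
        text: hasShebang ? `//${text.slice(2)}` : text,
        hasShebang
    };
}

const result = commentOutShebang("#!/usr/bin/env node\nvar a;");

// The text keeps its length, so token ranges still line up with the original source.
// After the rewrite, though, the first line is indistinguishable from a file that
// really did start with a "//" comment unless hasShebang is tracked separately.
```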

Member

Hmm, I think I'm missing something. If we're already removing the shebang comment from comments, and getComments() uses the comments array to figure out which comments to return, wouldn't it automatically ignore shebang comments?

Member Author

Good question - it's because the SourceCode instance is created before we remove the shebang comment from the comments array. So it seems like the fix might actually be in lib/eslint.js

Member Author

Here's the line where the SourceCode instance is created: https://github.com/eslint/eslint/blob/master/lib/eslint.js#L817

And here's where the shebang comment is removed:
https://github.com/eslint/eslint/blob/master/lib/eslint.js#L903

Member

Still confused. If ast is the same as was passed into SourceCode, shouldn't the change work correctly? Is this just a timing issue?

Member Author

@kaicataldo kaicataldo Nov 14, 2016

That's right - sorry I'm not explaining this better. The SourceCode instance above takes the parsed ast and generates its own internal tokensAndCommentsStore from it. Since this occurs before the shebang comment is removed, the SourceCode instance's tokensAndCommentsStore still contains the shebang comment.

I think we should be able to do the shebang comment removal before creating the SourceCode instance - will have to figure out how to do it with the few forks of logic that happen there.
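
A stripped-down illustration of the ordering issue, in case it helps. FakeSourceCode is a hypothetical stand-in, not the real SourceCode class; the only assumption is that the store is built from a copy of the AST's tokens and comments at construction time.

```javascript
// Simplified stand-in for SourceCode: it snapshots tokens and comments when constructed.
class FakeSourceCode {
    constructor(ast) {
        // concat() creates a new array, so later changes to ast.comments are not seen here.
        this.tokensAndComments = ast.tokens
            .concat(ast.comments)
            .sort((a, b) => a.range[0] - b.range[0]);
    }
}

const ast = {
    tokens: [{ type: "Keyword", value: "var", range: [20, 23] }],
    comments: [{ type: "Line", value: "/usr/bin/env node", range: [0, 19] }]
};

const sourceCode = new FakeSourceCode(ast);

// Removing the shebang comment afterwards only changes the AST's own array...
ast.comments.shift();

// ...while the snapshot taken by the constructor still contains it.
const stillHasShebang = sourceCode.tokensAndComments.some(token => token.type === "Line");
```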

Member Author

Had some time to look at this and I think I've found a better solution. One of our rules, lines-around-directive, relies on the shebang comment being in sourceCode's tokensAndCommentsStore, so I need to just fix that rule and we should be good to go!

Member Author

After a bit of digging I'm realizing that including the shebang in the tokens and comments store might be intentional and actually a good thing. Despite not being attached to any nodes or included in the comments array of the AST, transforming it into a standard JS line comment and including it in the store allows us to use sourceCode.getTokenOrCommentBefore() in rules to write rules around the shebang.

Had some thoughts and will write them in a comment below.

@@ -133,6 +143,9 @@ function SourceCode(text, ast) {
this.getTokenOrCommentBefore = tokensAndCommentsStore.getTokenBefore;
this.getTokenOrCommentAfter = tokensAndCommentsStore.getTokenAfter;

this._getTokens = tokensAndCommentsStore.getTokens;
this._getCommentsStore = new WeakMap();
Member

Maybe just _commentStore?
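
The diff context doesn't show how the WeakMap is used. One plausible reading, and it is only an assumption, is that it caches the calculated result per node so repeated lookups don't walk the token list again. A minimal sketch of that pattern, with hypothetical names throughout:

```javascript
// Hypothetical sketch: cache per-node comment lookups in a WeakMap.
// Entries can be garbage-collected together with the AST nodes used as keys.
function createCachedLookup(computeComments) {
    const commentStore = new WeakMap();

    return function getComments(node) {
        if (!commentStore.has(node)) {
            commentStore.set(node, computeComments(node));
        }
        return commentStore.get(node);
    };
}

let calls = 0;
const getComments = createCachedLookup(() => {
    calls += 1;
    return { leading: [], trailing: [] };
});

const node = { type: "Identifier", name: "foo" };
const firstResult = getComments(node);
const secondResult = getComments(node);
// The second lookup is served from the cache, so computeComments ran only once.
```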

@kaicataldo
Member Author

kaicataldo commented Nov 4, 2016

Sure, sorry that wasn't clear.

Behavior and differences between sourceCode.getComments() and Espree's comment attachment

  • The new behavior for sourceCode.getComments() is to iterate over the token list and stop at the first non-comment token (walking backward from the node's first token for leading comments, and forward from its last token for trailing comments). This differs from the current behavior: Espree collects comments as it parses and attaches them when it finishes a node, which means comments can be attached across tokens (parentheses, operators) and leads to some unexpected behavior (see examples below).
  • It doesn't attach comments that lie outside the range of the node's parent, since those should be attached to that parent node instead (I didn't notice any differences between Espree and this PR for this behavior).

Examples:

foo /*comment*/ || /*comment*/ bar
var foo /*comment*/ = /*comment*/ bar;

Espree: In both examples above, the Identifier foo has 1 trailing comment while bar has 2 leading comments.

sourceCode.getComments(): In both examples above, the Identifier foo has 1 trailing comment and the Identifier bar has 1 leading comment.
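
The token walk described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the token list is written out by hand for the first example, the helper names are hypothetical, and the parent-range check from the second bullet is left out.

```javascript
// Tokens and comments for: foo /*comment*/ || /*comment*/ bar
const tokensAndComments = [
    { type: "Identifier", value: "foo", range: [0, 3] },
    { type: "Block", value: "comment", range: [4, 15] },
    { type: "Punctuator", value: "||", range: [16, 18] },
    { type: "Block", value: "comment", range: [19, 30] },
    { type: "Identifier", value: "bar", range: [31, 34] }
];

const isComment = token => token.type === "Block" || token.type === "Line";

// Walk outward from the node's first and last token and stop at the first non-comment token.
function getComments(node) {
    const first = tokensAndComments.findIndex(token => token.range[0] === node.range[0]);
    const last = tokensAndComments.findIndex(token => token.range[1] === node.range[1]);
    const leading = [];
    const trailing = [];

    for (let i = first - 1; i >= 0 && isComment(tokensAndComments[i]); i--) {
        leading.unshift(tokensAndComments[i]);
    }
    for (let i = last + 1; i < tokensAndComments.length && isComment(tokensAndComments[i]); i++) {
        trailing.push(tokensAndComments[i]);
    }

    return { leading, trailing };
}

const foo = getComments({ type: "Identifier", range: [0, 3] });
const bar = getComments({ type: "Identifier", range: [31, 34] });
// foo: 0 leading, 1 trailing. bar: 1 leading, 0 trailing.
```

Because the walk stops at the || token, the first comment can never end up as a leading comment of bar, which is where this differs from Espree's result.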

function foo(/*asdf*/) {}
function foo(/*asdf*/bar) {}

Espree: In the first example above, the BlockStatement has a leading comment. In the second, the Identifier bar has a leading comment and the BlockStatement doesn't have any.

sourceCode.getComments(): The first example does not return any comments. I wasn't sure what the desired behavior would be here - should it attach the comment as a trailing comment when a function node's params is empty? In the second, the Identifier bar has a leading comment. The BlockStatement does not have leading comments in either case.

Questions/Concerns

  • How should we handle shebang comments (since they shouldn't be included in the results for getComments())?
  • How should the comments in the second example (inside the parens of a function declaration without any parameters) be treated? This is a case where the model of leading/trailing comments doesn't make a lot of sense. I also don't think it makes sense for them to be attached to the function body as a leading comment (the current behavior).
  • How does everyone feel about the slightly changed (and more predictable) behavior described above? So far, it doesn't seem to affect any use cases in the ESLint codebase.

@kaicataldo kaicataldo removed the do not merge label Nov 6, 2016
@kaicataldo kaicataldo changed the title WIP - Breaking: Calculate leading/trailing comments in sourceCode.getComments() Breaking: Calculate leading/trailing comments in sourceCode.getComments() Nov 8, 2016
@kaicataldo kaicataldo changed the title Breaking: Calculate leading/trailing comments in sourceCode.getComments() Calculate leading/trailing comments in sourceCode.getComments() Nov 8, 2016
@kaicataldo kaicataldo changed the title Calculate leading/trailing comments in sourceCode.getComments() Breaking: Calculate leading/trailing comments in sourceCode.getComments() Nov 8, 2016
@nzakas
Member

nzakas commented Nov 9, 2016

Thanks, that's super helpful. I think the new behavior you've described makes a lot of sense, and have no objections to either dropping the comment without a node or the slightly changed attachment behavior of leading comments.

Member

@platinumazure platinumazure left a comment

I might be misunderstanding a few things, but hopefully this review will be of some use.

const code = [
"//#!/usr/bin/env node",
"var a;",
"// foo",
Member

Does this comment count as trailing for var a; and leading for var b;?

I don't think this is a problem since people who want to iterate over all comments can just iterate over the comment store without worrying about attachment, but I also want to make sure I understand what's going on here.

Maybe some comments around the asserts would help? (E.g., assertCommentCount(1, 1)(node); // commented shebang, foo)

Member Author

That's right. This is the current behavior as defined by Espree's comment attachment. I was hoping we could discuss as a team to see what everyone thought about keeping the behavior as close to Espree as possible or if there were ideas for improvements, since this is already a breaking change. If you have any ideas, I'd love to discuss!

Member

I'm okay with the behavior-- I just wanted to confirm my understanding.

I'd love to see some comments in the tests themselves, so it's clear to people unfamiliar with comment attachment which comments are leading and trailing to what nodes. 90% of the time it's clear, but for the other 10%, it'd be nice to see documentation via comments.

eslint.on("Identifier", assertCommentCount(0, 0));

eslint.verify(code, config, "", true);
});
Member

Could you please add a test for a multiple-declarator VariableDeclaration, so we understand how the comment attachment is supposed to work there?

Example test case:

// Leading comment for VariableDeclaration?
var a,  // Trailing comment for VariableDeclarator? And/or leading for the next?
    b,  // Trailing comment for second VariableDeclarator?
    c;  // Trailing comment for VariableDeclaration?
// Trailing comment for VariableDeclaration?

"switch (foo)",
" //comment",
" /*another comment*/",
"}"
Member

Is this example even syntactically valid? I see a closing brace but not an opening brace.

Member Author

Good catch 👍

@kaicataldo
Copy link
Member Author

kaicataldo commented Nov 17, 2016

Working on this has led me to have some questions around how we want to handle shebangs. Essentially, it seems like we probably want to keep the current behavior: shebangs are not included in the AST's comments array, and they should not be returned as leading comments by sourceCode.getComments(). Please see this comment thread for more context.

The problem as it currently stands is that sourceCode.getComments() doesn't have a way of knowing if a LineComment token was a shebang or not (since it gets transformed into a standard JS LineComment prior to parsing). I think we do want to keep the behavior of transforming the shebang and keeping it in the tokens and comment store, as this allows rules to use sourceCode.getTokenOrCommentBefore() to get the token that represents the shebang.

It seems like we have a few ways forward, and I wanted to see what you all thought:

  • Right after parsing in lib/eslint.js, add a property (shebang: true?) to the LineComment token that represents the shebang and then filter it when calculating sourceCode.getComments()'s return value.
  • When sourceCode is instantiated, remove the shebang and store a property/add a helper method in sourceCode that allows access to the shebang token.
  • Simply remove the shebang token and make rules figure this out for themselves (maybe by checking the first line of the source code). This is my least favorite option, because it feels like something we should be able to provide.

Thoughts? Suggestions? Things I missed?

@platinumazure
Member

platinumazure commented Nov 17, 2016

I'd vote for option 2:

  • When sourceCode is instantiated, remove the shebang and store a property/add a helper method in sourceCode that allows access to the shebang token.

This is similar to how we handle the byte-order mark (BOM).

I would be okay with option 1 (add shebang property to the comment token itself).

I would be opposed to option 3 (remove the comment entirely from SourceCode's store).

@mikesherov
Contributor

This sounds correct to me too:

When sourceCode is instantiated, remove the shebang and store a property/add a helper method in sourceCode that allows access to the shebang token.

@kaicataldo kaicataldo force-pushed the getcomments branch 5 times, most recently from 7a64fdb to 970a009 Compare November 24, 2016 02:18
@kaicataldo
Member Author

kaicataldo commented Nov 24, 2016

Updated - thoughts on this approach? I thought about it some more and am actually uncomfortable with removing it from the token list. Treating the shebang like a Line comment works for most cases - the cases that we don't cover are ones where the rule needs to know whether the line comment token it's checking represents a shebang or not.

This current iteration changes the type of the comment token to Shebang. Doing so will continue to allow rules to iterate over tokens (which seems the most likely way that rules will be checking this), as well as giving rules that use sourceCode.getTokenOrCommentBefore() an easy way to differentiate between regular JS Line comments and Shebang comments (token.type === "Shebang").

This could potentially break some rules that assume that the token.type === "Line" check will include shebang comments while iterating over the token list, but this seems like a pretty narrow use case and I think this gives rule writers greater control.
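
As a rough sketch of the idea (not the code in this PR): markShebang is a hypothetical helper, and exactly where the retagging happens is an assumption. The point is only that the token stays in the list while its type changes from Line to Shebang.

```javascript
// Sketch: after parsing, retag the first comment when the original source began with "#!".
function markShebang(originalText, comments) {
    if (originalText.startsWith("#!") && comments.length > 0 && comments[0].type === "Line") {
        comments[0].type = "Shebang";
    }
    return comments;
}

const comments = markShebang("#!/usr/bin/env node\n// foo\nvar a;", [
    { type: "Line", value: "/usr/bin/env node", range: [0, 19] },
    { type: "Line", value: " foo", range: [20, 26] }
]);

// A rule iterating over the list can now tell the two apart.
const regularLineComments = comments.filter(token => token.type === "Line");
const shebangs = comments.filter(token => token.type === "Shebang");
```

The regularLineComments filter also shows the potential breakage mentioned above: a rule that only checks token.type === "Line" no longer sees the shebang.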

Thanks for all the input! If this doesn't seem like a good idea, I'm happy to continue exploring other options.

@eslintbot

LGTM

@kaicataldo
Member Author

@not-an-aardvark @btmills Thanks again for the thorough reviews! I have addressed all the comments (either with code changes or comments of my own). Please let me know what you think! Changes were made in the last two commits, so hopefully it's not too hard to re-review.

Member

@btmills btmills left a comment

@kaicataldo thanks for adding tests for those edge cases. I'm totally on board with #8408 and think that's the right direction to go. There's no perfect way to classify all comments as leading or trailing some particular node, but it looks like this does as good a job as we can hope for. LGTM :shipit:

@kaicataldo kaicataldo force-pushed the getcomments branch 2 times, most recently from 6274d06 to 9cd4e91 Compare April 5, 2017 18:30
@kaicataldo
Member Author

kaicataldo commented Apr 5, 2017

Also, rebased and ran eslint-canary against this branch - not seeing any new unexpected errors! 🎉

Member

@not-an-aardvark not-an-aardvark left a comment

LGTM with a slight nitpick. Thanks!

// Ignores shebangs
"#!/usr/bin/env node",
{ code: "#!/usr/bin/env node", options: ["always"] },
{ code: "#!/usr/bin/env node", options: ["never"] },
Member

Nitpick: All of the comments in these tests start with /, so the rule wouldn't report an error for them anyway. If the rule is refactored in the future, it might be useful to have tests for:

{ code: "#!foo", options: ["always"] }
{ code: "#!Foo", options: ["never"] }

Member Author

Good call - done!

@not-an-aardvark
Member

Is the CLA bot down? It's still waiting for the status to be reported (and it hasn't left a comment)

@vitorbal
Member

vitorbal commented Apr 6, 2017

@not-an-aardvark that happened to me a couple of days ago. I had to force push to trigger the bot again.

@eslintbot

LGTM

@ilyavolodin ilyavolodin merged commit 867dd2e into master Apr 7, 2017
@kaicataldo kaicataldo deleted the getcomments branch April 7, 2017 02:37
@JamesHenry
Member

Yayyyyy! What an epic PR 😄

Thanks so much for your work on this @kaicataldo! And to all the reviewers for their invaluable help.

@mikesherov
Contributor

Congrats!

Labels
accepted, archived due to age, breaking, core, enhancement
Development

Successfully merging this pull request may close these issues.

Calculate leading/trailing comments