-
-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Breaking: Calculate leading/trailing comments in core #7516
Conversation
Thanks for the pull request, @kaicataldo! I took a look to make sure it's ready for merging and found some changes are needed:
Can you please update the pull request to address these? (More information can be found in our pull request guide.) |
@kaicataldo, thanks for your PR! By analyzing the history of the files in this pull request, we identified @btmills, @nzakas and @mysticatea to be potential reviewers. |
5779c43
to
3d684b4
Compare
7459bec
to
1c5ffa3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure what I should be reviewing here. Can you point out some areas where you'd like some comments? And can you explain what the differences are between what Espree does and what you're doing here?
lib/eslint.js
Outdated
@@ -908,6 +908,8 @@ module.exports = (function() { | |||
} | |||
} | |||
|
|||
ast.hasShebang = !!shebang; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this doesn't look like a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was what I was hoping to get some feedback on. The current behavior is to modify the shebang comment to be a normal JS line comment before parsing and then to remove the parsed comment from the top level comments
array as well as from the leadingComments
of the first node in the Program body after parsing has completed.
Now that we're calculating this on the fly, I need to figure out how getComments()
can know which token represents a shebang comment (if one exists) so that it doesn't include it. The challenge here is that once the shebang comment has been modified to be a normal JS line comment, there isn't reliable way of knowing if there is a shebang comment at the top of the file or not.
Is there any possibility of adding a property to the shebang comment token that we could check? Any other suggestions/ideas would be most welcome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think I'm missing something. If we're already removing the shebang comment from comments
, and getComments()
uses the comments
array to figure out which comments to return, wouldn't it automatically ignore shebang comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question - it's because the SourceCode
instance is created before we remove the shebang comment from the comments
array. So it seems like the fix might actually be in lib/eslint.js
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the line where the SourceCode
instance is created: https://github.com/eslint/eslint/blob/master/lib/eslint.js#L817
And here's where the shebang comment is removed:
https://github.com/eslint/eslint/blob/master/lib/eslint.js#L903
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still confused. If ast
is the same as was passed into SourceCode
, shouldn't the change work correctly? Is this just a timing issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right - sorry I'm not explaining this better. The SourceCode
instance above takes the parsed ast
and generates its own internal tokenAndCommentStore
from it. Since this occurs before the shebang comment is removed, the SourceCode
instance's tokenAndCommentStore
still contains the shebang comment.
I think we should be able to do the shebang comment removal before creating the SourceCode
instance - will have to figure out how to do it with the few forks of logic that happen there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had some time to look at this and I think I've found a better solution. One of our rules, lines-around-directive
, relies on the shebang comment being in sourceCode
's tokensAndCommentsStore
, so I need to just fix that rule and we should be good to go!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a bit of digging I'm realizing that including the shebang in the tokens and comments store might be intentional and actually a good thing. Despite not being attached to any nodes or included in the comments
array of the AST, transforming it into a standard JS line comment and including it in the store allows us to use sourceCode.getTokenOrCommentBefore()
in rules to write rules around the shebang.
Had some thoughts and will write them in a comment below.
lib/util/source-code.js
Outdated
@@ -133,6 +143,9 @@ function SourceCode(text, ast) { | |||
this.getTokenOrCommentBefore = tokensAndCommentsStore.getTokenBefore; | |||
this.getTokenOrCommentAfter = tokensAndCommentsStore.getTokenAfter; | |||
|
|||
this._getTokens = tokensAndCommentsStore.getTokens; | |||
this._getCommentsStore = new WeakMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just _commentStore
?
Sure, sorry that wasn't clear. Behavior and differences between
|
Thanks, that's super helpful. I think the new behavior you've described makes a lot of sense, and have no objections to either dropping the comment without a node or the slightly changed attachment behavior of leading comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be misunderstanding a few things, but hopefully this review will be of some use.
tests/lib/util/source-code.js
Outdated
const code = [ | ||
"//#!/usr/bin/env node", | ||
"var a;", | ||
"// foo", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this comment count as trailing for var a;
and leading for var b;
?
I don't think this is a problem since people who want to iterate over all comments can just iterate over the comment store without worrying about attachment, but I also want to make sure I understand what's going on here.
Maybe some comments around the asserts would help? (E.g., assertCommentCount(1, 1)(node); // commented shebang, foo
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. This is the current behavior as defined by Espree's comment attachment. I was hoping we could discuss as a team to see what everyone thought about keeping the behavior as close to Espree as possible or if there were ideas for improvements, since this is already a breaking change. If you have any ideas, I'd love to discuss!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with the behavior-- I just wanted to confirm my understanding.
I'd love to see some comments in the tests themselves, so it's clear to people unfamiliar with comment attachment which comments are leading and trailing to what nodes. 90% of the time it's clear, but for the other 10%, it'd be nice to see documentation via comments.
eslint.on("Identifier", assertCommentCount(0, 0)); | ||
|
||
eslint.verify(code, config, "", true); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a test for a multiple-declarator VariableDeclaration, so we understand how the comment attachment is supposed to work there?
Example test case:
// Leading comment for VariableDeclaration?
var a, // Trailing comment for VariableDeclarator? And/or leading for the next?
b, // Trailing comment for second VariableDeclarator?
c; // Trailing comment for VariableDeclaration?
// Trailing comment for VariableDeclaration?
tests/lib/util/source-code.js
Outdated
"switch (foo)", | ||
" //comment", | ||
" /*another comment*/", | ||
"}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this example even syntactically valid? I see a closing brace but not an opening brace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch 👍
Working on this has led me to have some questions around how we want to handle shebangs. Essentially, it seems like we actually probably want to keep the current behavior of not including shebangs in the AST's The problem as it currently stands is that It seems like we have a few ways forward, and I wanted to see what you all thought:
Thoughts? Suggestions? Things I missed? |
I'd vote for option 2:
This is similar to how we handle the byte-order mark (BOM). I would be okay with option 1 (add I would be opposed to option 3 (remove the comment entirely from SourceCode's store). |
This sounds correct to me too:
|
7a64fdb
to
970a009
Compare
Updated - thoughts on this approach? I thought about it some more and am actually uncomfortable with removing it from the token list. Treating the shebang like a This current iteration changes the This could potentially break some rules that assume that the Thanks for all the input! If this doesn't seem like a good idea, I'm happy to continue exploring other options. |
970a009
to
be5df78
Compare
LGTM |
@not-an-aardvark @btmills Thanks again for the thorough reviews! I have addressed all the comments (either with code changes or comments of my own). Please let me know what you think! Changes were made in the last two commits, so hopefully it's not too hard to re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaicataldo thanks for adding tests for those edge cases. I'm totally on board with #8408 and think that's the right direction to go. There's no perfect way to classify all comments as leading or trailing some particular node, but it looks like this does as good a job as we can hope for. LGTM
6274d06
to
9cd4e91
Compare
Also, rebased and ran eslint-canary against this branch - not seeing any new unexpected errors! 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a slight nitpick. Thanks!
// Ignores shebangs | ||
"#!/usr/bin/env node", | ||
{ code: "#!/usr/bin/env node", options: ["always"] }, | ||
{ code: "#!/usr/bin/env node", options: ["never"] }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: All of the comments in these tests start with /
, so the rule wouldn't report an error for them anyway. If the rule is refactored in the future, it might be useful to have tests for:
{ code: "#!foo", options: ["always"] }
{ code: "#!Foo", options: ["never"] }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call - done!
Is the CLA bot down? It's still waiting for the status to be reported (and it hasn't left a comment) |
@not-an-aardvark that happened to me a couple of days ago. I had to force push to trigger the bot again. |
LGTM |
Yayyyyy! What an epic PR 😄 Thanks so much for your work on this @kaicataldo! And to all the reviewers for their invaluable help. |
Congrats! |
Fixes #6724
What is the purpose of this pull request? (put an "X" next to item)
[ ] Documentation update
[ ] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[ ] Add autofixing to a rule
[ ] Add a CLI option
[X] Add something to the core
[ ] Other, please explain:
What changes did you make? (Give an overview)
This PR turns off comment attachment in Espree and moves comment getting logic into
sourceCode.getComments()
. This is a breaking change.Is there anything you'd like reviewers to focus on?
Would love suggestions for how else we might be able to handle shebang comments.
As discussed on the corresponding issue, we should discuss if we want to continue thinking about comments the same way now that we're not attaching at the parser level. This PR mimics the current attachment strategy in Espree as close as it can, though it's not possible (nor do I think we want it to be) exactly the same as it is in Espree, because there are some unpredictable edge cases and bugs in that.
The version in this PR should essentially work the same for our users and ecosystem (unless they rely on some of the really weird edge cases mentioned before).