
report line numbers in custom lint rules #791

Closed
pixelzoom opened this issue Sep 19, 2019 · 19 comments

pixelzoom commented Sep 19, 2019

@zepumph said in #732 (comment):

I don't like that this is copying a bad precedent from bad-text.js in which we don't report the actual line number of the failure. We should figure out how to do that in our lint rules and do it.

First step is probably to make a list of which lint rules have this problem.

jessegreenberg commented:

@jessegreenberg is going to take a look at this as part of #732.

jessegreenberg commented Sep 20, 2019

This is an issue for rules that search for problems through the whole text of the context's SourceCode. Looking through the rules, it looks like getBadTextTester is the only rule implementation that does this.

One way to fix this is to check for bad text on all Tokens of the SourceCode. This would include everything, but would be really inefficient. Another option would be to check for bad text on all Esprima Nodes. That would be a little more efficient, but we would need to make a list of all Node types (listed here) that could have values with text and check those. There are a lot of them, and such a list may not include everything.

I'll go with Tokens and see how slow it is.
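
A minimal sketch of the token-based approach (the rule structure is standard ESLint, but the badTexts list and message format here are illustrative, not the actual chipper implementation):

    module.exports = {
      create( context ) {
        const badTexts = [ 'Random' ]; // hypothetical single-token forbidden string; multi-token
                                       // text such as '_.extend' would need extra handling (see below)
        return {
          Program( node ) {
            const sourceCode = context.getSourceCode();

            // Check every token of the file, so the report carries the token's own line/column.
            sourceCode.getTokens( node ).forEach( token => {
              badTexts.forEach( badText => {
                if ( token.value.indexOf( badText ) >= 0 ) {
                  context.report( {
                    node: node,
                    loc: token.loc,
                    message: 'File contains bad text: \'' + badText + '\''
                  } );
                }
              } );
            } );
          }
        };
      }
    };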

zepumph commented Sep 20, 2019

If performance is a concern, we could do the same checking as we do now to find a problem, and then dive in only in bad files to determine the right LOC.

jessegreenberg commented Sep 20, 2019

I ran grunt lint-everything --disable-eslint-cache before and after ff56fb0 and timed it.

On my machine the test took 1 minute 11 seconds with the token method and 1 minute with the source-code search method. 10 seconds is not a problem for me, though others may disagree. It probably won't be noticeable when running grunt lint for a single runnable.

I also noticed that after ff56fb0 it is catching a few more usages of Math.round, and I am not sure why.

jessegreenberg added a commit that referenced this issue Sep 20, 2019
samreid added a commit to phetsims/wave-interference that referenced this issue Sep 23, 2019

samreid commented Sep 23, 2019

I addressed the issues with Math.round in the preceding commit.

zepumph commented Oct 11, 2019

@jessegreenberg I think it is important for this issue to also delete all the places at the top of files where /* eslint-disable bad-sim-text */ appears, because those disables should be moved to the specific lines they affect now that line numbers are reported. I see 12 usages.
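
For illustration (the variable and value below are made up; bad-sim-text is the rule discussed in this issue, and both disable comments are standard ESLint syntax), the change would look like this:

    // Before: a file-level disable at the top of the file suppresses the rule everywhere.
    /* eslint-disable bad-sim-text */

    // After: with real line numbers reported, the disable can target only the offending line.
    const value = 1.23; // made-up value for illustration
    const rounded = Math.round( value ); // eslint-disable-line bad-sim-text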

zepumph commented Oct 11, 2019

I think that after ff56fb0, this can only support word characters. I'm not totally sure, but adding '_.extend' to bad-sim-text only yielded 6 errors, likely because that text is a collection of tokens instead of being encapsulated in a single token.
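
For what it's worth, a quick sketch of why that would happen (assuming the espree parser that ESLint uses; not part of chipper): '_.extend' is lexed into three separate tokens, so a search over individual token values never sees the full string.

    const espree = require( 'espree' );

    // Prints: _  .  extend  (  a  ,  b  )
    const tokens = espree.tokenize( '_.extend( a, b )', { ecmaVersion: 2018 } );
    console.log( tokens.map( token => token.value ).join( '  ' ) );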

I came to this issue while working on phetsims/phet-core#71

zepumph commented Oct 11, 2019

I'm working on a potential solution; will comment shortly.

zepumph commented Oct 11, 2019

Over in 1fcccd2, I created a new, largely experimental lint rule called no-extend. It uses a similar algorithm to the other bad-text rules, but takes an array of code "tokens". This is really helpful for a lot of our lint rules that are not just single words. I think it may be nicest if getBadTextTester can take an array of three potential types of args: strings, objects (to support regex, see #737), and string[], where the string array is a list of tokens from the node.

@jessegreenberg I really appreciate the work you put into this to convert it to using tokens. It was really helpful as a jumping-off point for no-extend. If you would like, I can take this issue over as I continue to work on #737 and phetsims/phet-core#71. It may be helpful to have one developer bring all this together. What do you think?
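
A sketch of what that argument list could look like (illustrative shapes only, not the actual getBadTextTester API):

    const forbiddenTexts = [

      // string: searched for directly in the source text
      'Math.random',

      // object: leaves room for regex support (see #737)
      { id: 'Math.round', regex: /Math\s*\.\s*round/ },

      // string[]: bad text expressed as a sequence of code tokens from the node
      [ '_', '.', 'extend' ]
    ];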

zepumph commented Oct 25, 2019

In the above commits I added a bit more support for bad text that spreads across multiple tokens. I did this by formalizing a schema that takes a string as well as a tokenized list of code parts that can be used while iterating through all the code tokens in a file. I also improved performance by testing whether the source-code text includes the string version of the bad text before diving into iterating through tokens. This exposed 3 lint errors that crept in over the last few weeks. I added workarounds for all of them, but I think that phetsims/number-line-integers#37 will likely need to be fixed instead of ignored.

@jessegreenberg would you please review my commits, especially for documentation and general understanding? I want this to be both powerful and easy to understand, and I don't yet know where the code falls on that scale.

My next steps for lint rules will be

zepumph commented Oct 31, 2019

jessegreenberg commented:

> I also improved performance by testing to see if the source-code text included the string version of the bad-text before diving into iterating through tokens.

OK cool, very good!

I don't think I understand how the updated getBadTextTester works yet. One thing I noticed is that codeTokensArray is not an array of esprima Tokens, is that correct? Is it an array of substrings? How does combinedIndex work, and what is the relation of the Token index to the index of the substring we are searching for?

I think that the direction listed in #791 (comment) and in #808 is correct.

jessegreenberg commented:

It doesn't look like this is working quite yet. In phetsims/energy-skate-park#140 (comment) a few cases of () => { return ... were found, but the lint rule is reporting the first line of code rather than the line of the error.

zepumph commented Nov 2, 2019

Yes, that rule is currently not supported with line numbers. I think we will need to create a custom lint rule just for it in order to get it right. The reason is that codeTokens just look at the list of esprima node values, and so there is no difference between:

() => {
   return . . .
}

and () => { return . . . }

I added a "workaround" just for that rule, in which is will still error out globally, with a bad line number. That felt better than nothing.

zepumph assigned jessegreenberg and unassigned zepumph Nov 2, 2019

samreid commented Nov 4, 2019

For the workaround cases, would it be better to just text search the file line by line (ignoring the parsed esprima node structure)?

jessegreenberg commented:

I tried this in testBadText:

        // If marked as global, then report the global failure based on the source code indexOf check
        if ( forbiddenText.global ) {

          for ( let i = 0; i < codeLines.length; i++ ) {
            const columnIndex = codeLines[ i ].indexOf( forbiddenText.id );
            if ( columnIndex >= 0 ) {

              // lines are 1 based, codeLines array is 0 based
              const badLine = i + 1;

              // esprima Token loc object, see https://esprima.readthedocs.io/en/latest/lexical-analysis.html
              const loc = {
                start: { line: badLine, column: columnIndex },
                end: { line: badLine, column: columnIndex + forbiddenText.id.length }
              };

              context.report( {
                node: node,
                loc: loc,
                message: 'File contains bad text: \'' + forbiddenText.id + '\''
              } );

              break;
            }
          }
        }

Where codeLines is sourceCode.lines.

@zepumph do you think that this could be used for all bad text and replace the method of looking through tokens?

zepumph commented Nov 4, 2019

That looks promising. I'll try to take a look at it this Thursday.

zepumph commented Nov 7, 2019

@jessegreenberg your code snippet in #791 (comment) was very helpful, and it simplified a bunch of code. Here is the basic algorithm now:

  • In most cases you supply a string, and it will search line by line for the presence of that string.
  • If you supply code tokens, then it will only look for that item in code, and not line by line. I thought about removing this feature, but there are a fair number of Math.random() calls in documentation, and it felt like, for the most part, there is a class of bad text which really only applies to code, so I think it is best to keep the codeToken logic. Let me know if you disagree.
  • Before any work is done, it checks for the presence of the id within the whole file, and if it doesn't find it, it doesn't go any further (performance optimization). A condensed sketch of this algorithm follows the list.
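
A condensed sketch of that algorithm (the names testBadText, forbiddenText.id, and forbiddenText.codeTokens come from this thread, but the body below is illustrative, not the exact chipper implementation):

    const testBadText = ( context, node, forbiddenText ) => {
      const sourceCode = context.getSourceCode();

      // Performance optimization: bail out unless the bad text appears anywhere in the file.
      if ( sourceCode.getText().indexOf( forbiddenText.id ) === -1 ) {
        return;
      }

      if ( forbiddenText.codeTokens ) {

        // Only match against actual code tokens, so mentions in comments/documentation are allowed.
        // (token-matching logic omitted in this sketch)
      }
      else {

        // Default: search line by line so the report points at the exact line and column.
        sourceCode.lines.forEach( ( line, i ) => {
          const column = line.indexOf( forbiddenText.id );
          if ( column >= 0 ) {
            context.report( {
              node: node,
              loc: {
                start: { line: i + 1, column: column },
                end: { line: i + 1, column: column + forbiddenText.id.length }
              },
              message: 'File contains bad text: \'' + forbiddenText.id + '\''
            } );
          }
        } );
      }
    };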

Because of your code snippet, much of the API around ForbiddenTextObject could be simplified. If this were the final bad-text issue, I might have tried to get rid of it altogether, but I wanted to keep this flexibility for future work, especially for supporting regex over in #737.

I still have one lint rule commented out until phetsims/energy-skate-park#140 is fixed. Please review. Anything else here?

jessegreenberg commented:

> but there are a fair number of Math.random() calls in documentation, and felt like for the most part, there is a class of bad-text which really only applies to code,

I hadn't thought of that and I agree, good idea, thanks.

This looks nice and it is highlighting the location of all bad text in my editor and in the console. I can quickly address phetsims/energy-skate-park#140 now.
