Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint rule: Unicode characters in JS should be escaped with ASCII #732

Open
samreid opened this issue Jan 4, 2019 · 9 comments
Open

Lint rule: Unicode characters in JS should be escaped with ASCII #732

samreid opened this issue Jan 4, 2019 · 9 comments
Assignees

Comments

@samreid
Copy link
Member

samreid commented Jan 4, 2019

From phetsims/wave-interference#285 (comment). Sims should use ASCII. When unicode is needed, it should be escaped.

// Copyright 2018, University of Colorado Boulder

/**
 * Lint detector for invalid text based on regex search.  Checks the entire file and does not correctly report line number.
 *
 * @author Sam Reid (PhET Interactive Simulations)
 */
module.exports = function( context ) {
  'use strict';

  var badRegexes = [
    /[^\u0000-\u007F]+/
  ];

  return {
    Program: function( node ) {
      var sourceCode = context.getSourceCode();
      var text = sourceCode.text;
      badRegexes.forEach( function( badRegex ) {
        if ( badRegex.test( text ) ) {
          context.report( {
            node: node,
            message: 'File contains bad regex: \'' + badRegex + '\''
          } );
        }
      } );
    }
  };
};

module.exports.schema = [
  // JSON Schema for rule options goes here
];
@samreid
Copy link
Member Author

samreid commented Sep 10, 2019

Here's a stack overflow page that may help:
https://stackoverflow.com/questions/150033/regular-expression-to-match-non-ascii-characters

@ariel-phet can this be completed by another developer?

@ariel-phet
Copy link
Contributor

Going to look for a volunteer/voluntold at dev meeting

@jonathanolson
Copy link
Contributor

Is this something developers feel strongly about? Our minifier just unescapes those and puts them in raw unicode in our built files, but if people prefer the limit than I wouldn't object to it.

@jbphet
Copy link
Contributor

jbphet commented Sep 12, 2019

Keep in mind that there are probably all sorts of unescaped unicode characters in the string translation files (in the babel repo), so these should be excluded from any sort of check like this. We should decide if the English strings need to adhere to this rule and, if so, we may need to check them a bit differently since they are in a JSON file.

@samreid
Copy link
Member Author

samreid commented Sep 12, 2019

I believe our lint tools are set up around JS at the moment.

@zepumph
Copy link
Member

zepumph commented Sep 12, 2019

Keep in mind that there are probably all sorts of unescaped unicode characters in the string translation files (in the babel repo), so these should be excluded from any sort of check like this. We should decide if the English strings need to adhere to this rule and, if so, we may need to check them a bit differently since they are in a JSON file.

Presumably this would only need to apply to js code, I'm not sure JSON is linted.

It seems like this was decided in phetsims/wave-interference#285 (comment). I am totally fine with this restriction, and it makes sense, but @pixelzoom do you mind explaining your reasoning for that post.

Regardless of the need, I have the following thoughts on the rule itself:

  • I don't like that this is copying a bad precedent from bad-text.js in which we don't report the actual line number of the failure. We should figure out how to do that in our lint rules and do it. (@pixelzoom edit: see report line numbers in custom lints rules #791)
  • When reporting, printing the following regex which is a range of unicode chars is largely useless to me. Could we print just one char, or the regex and the errored string? /[^\u0000-\u007F]+/

@pixelzoom
Copy link
Contributor

@zepumph asked:

It seems like this was decided in phetsims/wave-interference#285 (comment). I am totally fine with this restriction, and it makes sense, but @pixelzoom do you mind explaining your reasoning for that post.

See phetsims/wave-interference#285 (comment) in that same issue.

@Denz1994
Copy link
Contributor

Discussed at dev meeting: 09/19/19

Devs don't mind a new rule to handle this. Let’s make the lint rule, but if it takes too much effort we can bail. @jessegreenberg would take care of making this rule.

@zepumph
Copy link
Member

zepumph commented Sep 19, 2019

@jessegreenberg please note that a new lint rule is not needed here. getBadTextTester supports regexp, see the commented out usage in bad-sim-text.js. This was added in #728, and will have a usage after I sort out #737.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants