optimize favors shortness over readability #208

Open
fregante opened this issue May 4, 2020 · 3 comments


@fregante

fregante commented May 4, 2020

I use regexp-tree via eslint-plugin-unicorn/better-regex and found that some optimizations lead to less-readable output.

Similarly to #135, I'd argue that readability and shortness don't always overlap. Here's a specific example, as output by regexp-tree:

/^\/\/\*\??/

Can you guess what the expected match looks like without spending a couple of seconds on the zigzag sequence?

The original was written as:

/^[/][/][*][?]?/

Which IMHO is slightly more scannable/readable.

My guess is that this is caused by charClassToSingleChar, which makes sense for [\d] -> \d but worsens what I just showed.
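As a quick sanity check that the two spellings differ only in readability, here is a minimal sketch that compares them on a handful of inputs. The sample strings and the `firstMatch` helper are made up for illustration and are not part of regexp-tree.

```javascript
// The two spellings quoted in the comment above.
const escaped = /^\/\/\*\??/;
const classes = /^[/][/][*][?]?/;

// Illustrative inputs (an assumption, not taken from the issue).
const samples = ['//*', '//*?', '//*?foo', '/*', '// *', 'x//*', ''];

// Returns the matched text, or null when there is no match.
function firstMatch(re, input) {
  const m = re.exec(input);
  return m ? m[0] : null;
}

// True when both regexes produce the same result for every sample.
const agree = samples.every(s => firstMatch(escaped, s) === firstMatch(classes, s));
console.log(agree); // prints true
```

Agreement on a few samples is not a proof of equivalence, but it supports the point that the choice here is purely stylistic.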

@DmitrySoshnikov
Owner

DmitrySoshnikov commented May 9, 2020

@fregante, thanks for the report. Yes, I see how the optimizer may lead to less readable output. However, the resulting regexp in this case would still be more optimized from the regexp engine's perspective (handling a simple sequence should, in general, be faster than handling a sequence of character classes).

To support your use case specifically, the optimizer has both a whitelist and a blacklist. You can blacklist charClassToSingleChar in this case.

@fregante
Author

fregante commented May 9, 2020

Ideally, though, at linting time I'd get both [\d] -> \d and \/ -> [/] (I'd like to see the latter specifically), because they optimize for readability.

Then at build/minification time you could apply any changes that conflict with readability but improve speed and size.

The first would be called by ESLint and the second by webpack, Terser, etc.

Since optimize is already customizable this would probably mean:

  • splitting charClassToSingleChar into 2 rules to allow \/ -> [/]
  • introducing config groups à la “eslint:recommended”, for example whitelist: ['optimize:minimize'] and whitelist: ['optimize:readability']
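To make the proposed \/ -> [/] direction concrete, here is a rough sketch of what such a readability rewrite could do. `escapedCharToCharClass` is a hypothetical helper written for this illustration. It is not a regexp-tree transform, and it works on the raw pattern source rather than on the AST that regexp-tree's real transforms operate on.

```javascript
// Hypothetical helper, NOT part of regexp-tree: rewrites escaped punctuation
// such as \/ or \* into single-character classes ([/], [*]).
// Escapes inside an existing character class are left untouched.
function escapedCharToCharClass(source) {
  // Punctuation that is literal inside [...] without needing an escape
  // (assuming a pattern without the `v` flag).
  const safeInClass = new Set(['/', '*', '?', '+', '.', '(', ')', '{', '}', '|', '$']);
  let out = '';
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\' && i + 1 < source.length) {
      const next = source[i + 1];
      // Only rewrite outside classes; \d, \w, \\ and similar stay as they are.
      out += !inClass && safeInClass.has(next) ? '[' + next + ']' : ch + next;
      i++; // skip the escaped character
      continue;
    }
    if (ch === '[') inClass = true;
    else if (ch === ']') inClass = false;
    out += ch;
  }
  return out;
}

const readable = escapedCharToCharClass(/^\/\/\*\??/.source);
console.log(readable); // prints ^[/][/][*][?]?
```

A production version would belong in a separate transform so that it can be whitelisted or blacklisted independently of [\d] -> \d, which is what the first bullet asks for.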

@fregante
Author

fregante commented Jun 12, 2020

Another everyday example:

/http:\/\/[^/]+\/pull\/commits/gi

👆 Readable

/ht{2}p:\/{2}[^/]+\/pul{2}\/com{2}its/gi

👆 Nonsense, longer

Strangely, this only happens if I include that [^/]+\/. If it's not there, it's not "optimized".


Tested on RunKit

var regexpTree = require("regexp-tree")
regexpTree.optimize(/http:\/\/[^/]+\/pull\/commits/gi).toRegExp()
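The "longer" claim can be checked without running the optimizer. This sketch copies both regexes verbatim from the comment and compares their matches and source lengths. The test URL is an assumption chosen for illustration.

```javascript
// Both regexes are copied from the comment above.
const original = /http:\/\/[^/]+\/pull\/commits/gi;
const rewritten = /ht{2}p:\/{2}[^/]+\/pul{2}\/com{2}its/gi;

// Illustrative input (an assumption, not from the issue).
const url = 'HTTP://example.com/pull/commits';

// With the g flag, String.prototype.match returns every match as an array.
const sameMatches =
  JSON.stringify(url.match(original)) === JSON.stringify(url.match(rewritten));

// Positive means the rewritten pattern has more characters than the original.
const lengthDelta = rewritten.source.length - original.source.length;

console.log(sameMatches, lengthDelta); // prints true 7
```

Each doubled letter costs two extra characters when written as x{2}, and \/{2} costs one, so this rewrite makes the pattern both longer and harder to read.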


2 participants