You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Assuming no side effects (e.g. capturing groups), duplicate alternatives can be removed without affecting the behavior of the pattern (negatively).
More specifically, given an alternation, let index(x) -> int be a function that returns the index of a given alternative of the alternation in its list of alternatives. If there exist two alternatives a and b such that index(a) < index(b), then b can be removed without effecting the pattern.
This optimization might even prevent very simple cases of exponential backtracking. E.g. ( "a" | "a" )+ "b".
Describe alternatives you've considered
A more complete solution would be to implement a more complete DFA-based solution to determine whether a given alternative is a subset of the union of all previous alternatives in the alternation. However, this is significantly more computationally expensive and does not support irregular features such as assertions.
Additional context
The regexp/no-dupe-disjunctions rule is an implementation of this optimization. This rule uses the DFA approach along if a simpler regex-comparison approach to determine duplicate alternatives.
The text was updated successfully, but these errors were encountered:
We need to make a list of all these optimizations.
Even if they don't get into the compiler immediately, they can be supported in the IDE as inspections.
Is your feature request related to a problem? Please describe.
Duplicate alternatives can happen when combining different variables, or simply by accident.
Example: A subset of C keyword with
while
appearing twice:Describe the solution you'd like
Assuming no side effects (e.g. capturing groups), duplicate alternatives can be removed without affecting the behavior of the pattern (negatively).
More specifically, given an alternation, let
index(x) -> int
be a function that returns the index of a given alternative of the alternation in its list of alternatives. If there exist two alternatives a and b such that index(a) < index(b), then b can be removed without effecting the pattern.This optimization might even prevent very simple cases of exponential backtracking. E.g.
( "a" | "a" )+ "b"
.Describe alternatives you've considered
A more complete solution would be to implement a more complete DFA-based solution to determine whether a given alternative is a subset of the union of all previous alternatives in the alternation. However, this is significantly more computationally expensive and does not support irregular features such as assertions.
Additional context
The
regexp/no-dupe-disjunctions
rule is an implementation of this optimization. This rule uses the DFA approach along if a simpler regex-comparison approach to determine duplicate alternatives.The text was updated successfully, but these errors were encountered: