-
Notifications
You must be signed in to change notification settings - Fork 60
/
TODO.txt
81 lines (71 loc) · 2.42 KB
/
TODO.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
* To compile r{i,j} we need a sequence that does not match epsilon
(or a constructor around an expression telling that this expression
does not match epsilon)
* A subexpression repeated by an asterisk ( '*' ) or an interval
expression shall not match a null expression unless this is the only
match for the repetition or it is necessary to satisfy the exact or
minimum number of occurrences for the interval expression.
* There might be a typo in deriv_1/delta_1: should we generate 'TMatch
mark' or 'TMatch mark'? (neither is correct!)
POSIX:
"(a?)*" "b" ""
"(a?)*" "ab" "a"
"((a)|(b))*" "ab" -> "b" none "b"
Str
"(a?)*" "b" no submatch
"(a?)*" "ab" "a"
"((a)|(b))*" "ab" -> "b" "a" "b"
Javascript
"(a?)*" "b" no submatch
"(a?)*" "ab" "a"
"((a)|(b))*" "ab" -> "b" none "b"
PCRE
"(a?)*" "b" ""
"(a?)*" "ab" ""
"(a?)*?" "b" ""
"(a?)*?" "ab" "a"
"((a)|(b))*" "ab" -> "b" "a" "b"
Emacs
"(a?)*" "b" ""
"(a?)*" "ab" ""
"(a?)*?" "b" ""
"(a?)*?" "ab" "a"
"((a)|(b))*" "ab" -> "b" "a" "b"
r{0,0} = eps
r{i+1,j+1} = r,r{i,j}
r{0,j+1} = r,r{0,j} | eps PCRE/Emacs
r{0,j+1} = (r-eps},r{0,j} | eps JavaScript
* Rewrite sequences of sequences when possible...
High priority
=============
* Improve the Perl regular expressions parser
* Character classes (in the three regular expression parsers)
* Reduce memory usage
- More compact representation of character sequences
- Special notation for "anything but this set of characters"
(more generally, optimize the compilation of regular expressions)
* Simple optimisations
- alt containing alt
- epsilon elimination
- Seq (Seq (x,y), z) => Seq (x, Seq (y, z)) under some circumstances
(x or y has a fixed length)
...
* Test suite
Medium priority
===============
* Implement back-references
* Implement look-ahead and look-behind assertions
Low priority
============
* Optimize the main loop for processor that are not register starved
* Rewrite the main loops in C
(but keep the option to compile a pure OCaml version)
* Limit the size of the cached DFAs by removing states that have not
been used recently
* Documentation
Other ideas
===========
* It would be great to have a more generic interface (parameterized
over some abstract tokens).
* Compile checked printers parameterized over match groups (DRY for
literal subexpressions)