-
Notifications
You must be signed in to change notification settings - Fork 7
/
Background.txt
93 lines (84 loc) · 4.57 KB
/
Background.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
From [email protected] Sat Jul 28 16:32:30 2018
X-Envelope-From: [email protected]
Return-Path: <[email protected]>
Received: from minnie.tuhs.org (minnie.tuhs.org [45.79.103.53])
by freefriends.org (8.14.9/8.14.9) with ESMTP id w6SMWSOV015446;
Sat, 28 Jul 2018 16:32:28 -0600
Received: by minnie.tuhs.org (Postfix, from userid 112)
id ADC0EA1B34; Sun, 29 Jul 2018 08:32:26 +1000 (AEST)
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on minnie.tuhs.org
X-Spam-Level:
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED,
SPF_HELO_PASS autolearn=unavailable autolearn_force=no version=3.4.1
Received: from minnie.tuhs.org (localhost [127.0.0.1])
by minnie.tuhs.org (Postfix) with ESMTP id 530659ED45;
Sun, 29 Jul 2018 08:32:05 +1000 (AEST)
X-Original-To: [email protected]
Delivered-To: [email protected]
Received: by minnie.tuhs.org (Postfix, from userid 112)
id 395969ED57; Sun, 29 Jul 2018 08:31:41 +1000 (AEST)
Received: from mail.cs.Dartmouth.EDU (mail.cs.dartmouth.edu [129.170.212.100])
by minnie.tuhs.org (Postfix) with ESMTPS id D91C39ED45
for <[email protected]>; Sun, 29 Jul 2018 08:31:40 +1000 (AEST)
Received: from tahoe.cs.Dartmouth.EDU (tahoe.cs.dartmouth.edu [129.170.212.20])
by mail.cs.Dartmouth.EDU (8.15.2/8.15.2) with ESMTPS id w6SMVdah018443
(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO)
for <[email protected]>; Sat, 28 Jul 2018 18:31:39 -0400
Received: from tahoe.cs.Dartmouth.EDU (localhost.localdomain [127.0.0.1])
by tahoe.cs.Dartmouth.EDU (8.15.2/8.14.3) with ESMTP id w6SMVdld097679
for <[email protected]>; Sat, 28 Jul 2018 18:31:39 -0400
Received: (from doug@localhost)
by tahoe.cs.Dartmouth.EDU (8.15.2/8.15.2/Submit) id w6SMVcWT097678
for [email protected]; Sat, 28 Jul 2018 18:31:38 -0400
From: Doug McIlroy <[email protected]>
Message-Id: <[email protected]>
Date: Sat, 28 Jul 2018 18:31:38 -0400
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Subject: Re: [TUHS] Doug McIlroy's C++ regular expression library (mostly)
revived
X-BeenThere: [email protected]
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: The Unix Heritage Society mailing list <tuhs.minnie.tuhs.org>
List-Unsubscribe: <https://minnie.tuhs.org/cgi-bin/mailman/options/tuhs>,
<mailto:[email protected]?subject=unsubscribe>
List-Archive: <http://minnie.tuhs.org/pipermail/tuhs/>
List-Post: <mailto:[email protected]>
List-Help: <mailto:[email protected]?subject=help>
List-Subscribe: <https://minnie.tuhs.org/cgi-bin/mailman/listinfo/tuhs>,
<mailto:[email protected]?subject=subscribe>
Errors-To: [email protected]
Sender: "TUHS" <[email protected]>
Why would anyone be interested in an old regex package that never was
a part of any Unix distro?
The driving force was Posix, whose regex spec was quite inscrutable. Could
there be a reference implementation? It was easy to fool every
implementation I could get my hands on, including Gnu's over-the-top
9000-line implementation.
But as I got into it, I got fascinated by regexes per se. In making a
recognizer, there's a tradeoff between contruction time and execution
time. Linear execution can be achieved, but at a potentially exponential
cost in construction time (and space). Backreferencing takes the regex
languages out of the class of regular languages.
Recalling that regular languages are closed under intersection and
negation, I wondered about how to implement new regex operators, &
and -. I came up with a scheme for this optional non-Posix feature that
involved layering continuation-passing over more traditional methods. And
while I was at it, I broke out smaller sublanguages for special treatment
(as does Gnu), all the way down to Knuth-Morris-Pratt for expressions
in which the only operation is catenation.
And finally, having followed the development of C++ from its infancy,
I wanted to try out its new template facility, so there's a bit of
that in the package, too. Arnold has discovered that not only has C++
evolved, but also that without the discipline of -Wall to force clean
code, I was rather cavalier about casting, both explicitly and implicitly.
The only real customer the code ever had was the AST project, which
translated it to C. After the C++ had sat idle for a half-dozen years, I
thought to revive it in Linux, but found it riddled with incompatibilities
with that new environment and gave up. Arnold deserves a citation for
bravery in pushing that through 15 years further on.
Doug