% \input graphicx
\input supp-pdf
\def\epsfbox#1{\hbox{\convertMPtoPDF{#1}{1}{1}}}
\def\ChezWEB{Chez\.{WEB}}
\def\CWEB{\.{CWEB}}
\def\WEB{\.{WEB}}
\def\title{ChezWEB (Version 2.0)}
\def\topofcontents{\null\vfill
\centerline{\titlefont ChezWEB: Hygienic Literate Programming}
\vskip 15pt
\centerline{(Version 2.0)}
\vfill}
\def\botofcontents{\vfill
\noindent
Copyright $\copyright$ 2012 Aaron W. Hsu \.{[email protected]}
\smallskip\noindent
Permission to use, copy, modify, and distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
\smallskip\noindent
THE SOFTWARE IS PROVIDED ``AS IS'' AND THE AUTHOR DISCLAIMS ALL
WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE
AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
}
@* Introduction. This document describes the implementation of the
\ChezWEB\ system of documentation and programming. It is modeled
closely after the \WEB\
%\footnote{Knuth, ``WEB.''}
and \CWEB\
%\footnote{Author, ``\CWEB\.''}
systems. It allows a Chez Scheme programmer to write programs of
higher quality with better documentation. It produces print-quality
output for documents, and delivers programming convenience for
annotation and illustration. It also implements a novel
concept in Literate Programming: Hygienic LP. Hygienic literate
programming enables the programmer to explicitly control the
visibility of variables inside and outside of a chunk. This provides
the user with a cleaner and more reliable \WEB\ than would otherwise
be had. Notably, it means that you never have to worry about variables
accidentally capturing or overwriting ones that you have used
internally. It also encourages a cleaner approach to code reuse, and
discourages hacks to get around the disadvantages of traditional
literate programming.
@^traditional literate programming@>
@ Readers of this program are expected to be familiar with the
\ChezWEB\ guide, as this program does not discuss the language at a
user level in any detail. Instead, we concern ourselves here chiefly
with the implementation of the \WEB\ system itself.
@ Some time ago, the now Professor Emeritus of the Art of Computer
Programming Donald E. Knuth began writing programs. Sometime after
that, he began to construct programs in a new manner. This manner, he
documented and labeled ``Literate Programming.'' In Professor Knuth's
vision, a program is not constructed to be read by the machine, but
rather, to be read as a pleasant book is read, to be read by the
human. In this way, one constructs and builds the pieces of a program
together, as you might build up the necessary elements of math,
surrounding them with exposition, and ordering them in the manner that
best reveals the program's working and meaning to the reader.
This somewhat radical approach to programming leads to a drastically
different perspective on how to write programs. Indeed, I feel that
writing my programs in a literate style has greatly improved my
ability to maintain and improve these same programs, and moreover, to
understand these programs as I am writing them. I enjoy writing and
seeing the results of my writing, both in a printed or screen-readable
form, as well as in a machine executable form.
While I profess no particular skill in either writing or programming,
I do profess to enjoy both. This dual enjoyment is a
necessary condition for good programs, and is especially important in
literate programming, because it exposes your thoughts in two ways.
This enforced discipline can be embarrassing at times, but inevitably
leads to a better programmer.
\ChezWEB\ is my attempt at bringing the \WEB\ system of documentation
to the Schemer's world, to improve its usability and the reliability
of code that is written in a literate style. It is far from perfect,
but I hope that those who use it find it both appealing and efficient
for delivering programs of higher quality, that others can read and
understand more easily, and that can stand the rigors of many eyes and
fingers working over the document.
@ The current version of \ChezWEB\ is 2.0.
@p
(define (display-chezweb-version tangle/weave)
(printf "This is ~a, ChezWEB Version 2.0.~n" tangle/weave))
@* 2 The ChezWEB System. We divide the \ChezWEB\ system into two primary
parts: the runtime elements, which are in charge of handling hygienic
guarantees, and the main program logic, which contains all the code for
dealing directly with webs. Both of these modules are described in this
document. Neither the runtime nor the web handling code is particularly
useful to the end user, so we encapsulate the runtime into a library,
and we provide access to the weave and tangle functionality of the web
handling code through two programs {\tt chezweave.ss} and {\tt
cheztangle.ss}. Thus, we have the following diagram, which illustrates
the relationship and dependencies of the various files that will be
produced by tangling this web.
@.chezweave@>
@.cheztangle@>
%$$\includegraphics[width=4in]{chezweb-5.eps}$$
$$\epsfbox{chezweb.5}$$
The runtime system is actually used by the tangle program when tangling
programs, as {\tt cheztangle} will embed the runtime into the tangled
code. This means that code produced by \ChezWEB\ is self-contained, and
does not require any additional libraries. We generate the runtime code
and library as separate entities specifically because programmers may
want to use them directly in their programs, outside of \ChezWEB{}.
We generate a single {\tt chezweb.ss} file for both the weaving and the
tangling in order to share common code between the two. We could have
generated a third {\tt common} file, but this actually makes things more
complicated, and it is easier to just use the same code base for the
tangle and weave programs. This makes the {\tt cheztangle.ss} and {\tt
chezweave.ss} programs quite small in themselves, with the bulk of their
logic and code inside of the {\tt chezweb.ss} file.
Naturally, if you are working with \ChezWEB{}, you should not be
developing on the tangled code, but should be working directly from the
web file.
@* Control Codes Cheat Sheet. This section describes in brief the
function and syntax of every control code. It does not go into detail,
but it is meant to be used as a general reference point for developers who
are familiar with the overall system, and want an at-a-glance picture of
the \ChezWEB\ language.
@^Cheat Sheet@>
%$$\includegraphics[width=6in]{chezweb-3.eps}$$
$$\epsfbox{chezweb.3}$$
{\parindent = 0.5in
\item{\.{@@\ }} Start a new normal section.
\item{\.{@@*}} New starred section, listed in table of contents.
\item{\.{@@p}} Begin top-level program code.
\item{\.{@@(}} Output code to a separate file.
\item{\.{@@<}} Define a named section code chunk.
\item{\.{@@>}} Delimit/end index codes or a section reference inside of a
code body.
\item{\.{@@>=}} Delimit the name of a section you are defining.
\item{\.{@@c}} Before a named section, list the captures and exports of
that section.
\item{\.{@@q}} A comment, only shows in source.
\item{\.{@@i}} Include the contents of another file.
\item{\.{@@\^}} Index an entry in roman type.
\item{\.{@@:}} Index an entry using $\.{\\9}$, default is |@@:blah}{code@@>|
which typesets |blah| as |code| in the index.
\item{\.{@@.}} Index an entry in $\.{typewriter}$ type.
\par}
@* The ChezWEB Runtime. Normal \CWEB\ programs do not have any
runtime, and they operate completely at the equivalent of a macro
@^runtime@>%
expansion phase before the C preprocessor runs. This is also how
systems like {\tt noweb} and others work. All of these systems lack
the hygiene properties that we want to preserve in a Scheme program,
especially as they relate to anything that might resemble macros.
In order to preserve hygiene in our system, we rely on the Scheme
macro system to do the hard lifting for us. This means that we have to
leave some code around that the macro system can use to do the work we
want it to. This is the \ChezWEB\ runtime. In point of fact, the
runtime will not remain at the actual runtime of the code, but exists
during the macro expansion phase of program evaluation.
The runtime itself for tangling programs is a macro that allows one to
arbitrarily reorder chunks of code in a hygienic manner. The chunking
macro itself is designed to support two important properties of a
given chunk. These correspond to the normal hygienic conditions of
hygiene proper and referential transparency. These properties may be
stated casually as follows:
\medskip{\narrower\noindent
{\bf Hygiene.}@^hygiene@>
Any definition introduced in the body of the chunk that is not
explicitly exported by the export and captures clauses is visible only
within the scope of the chunk, and is not visible to any surrounding
context that references the chunk.
\smallskip\noindent{\bf Referential Transparency.}
@^referential transparency@>%
Any free reference that appears in the body of the chunk will refer to
the nearest lexical binding of the tangled output unless they are
explicitly enumerated in the captures clause, in which case, they will
refer to the nearest binding in the context surrounding the chunk
whenever the chunk is referenced.\par}\medskip
\noindent A subtlety occurs in the actual use of the referential
transparency property. Because the user does not have direct control
over the placement of chunk definitions when tangling a \WEB\ file,
the scope of a chunk is more restricted than it would be if the
tangling runtime were used directly as a programming construct. On
the other hand, this restriction only
applies to the specific way in which \ChezWEB\ combines/tangles a \WEB\
file, and it does not apply to how a user may use the chunking
mechanism. Users may, in fact, place chunk definitions arbitrarily in
their own code, if they would like to do so. In this case, the
referential transparency property must still hold in its full
generality.
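\medskip To illustrate both properties concretely, consider this
hypothetical sketch, which uses the runtime directly rather than
through a tangled \WEB{}:
\medskip\verbatim
(define x 'outer)
(@@< (use-x x)
  (define helper 'hidden)
  (list x helper))
(let ([x 'inner])
  use-x)  ; => (inner hidden): x is captured at the reference site
; helper is not visible here: hygiene keeps it inside the chunk
!endverbatim \medskip
\noindent The captured |x| refers to the binding nearest the
reference, while |helper|, being neither captured nor exported,
remains invisible to the surrounding context.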
In the case when we are only dealing with chunks defined through the
\WEB\ mechanism and tangled explicitly, the result is that all chunks
will be defined at the top level of the file after the definition of
the runtime, but before the inclusion of any other top level
elements. This means that in practice, the lexical scope of any free
variable in a chunk is the top-level of the file in which the chunk
appears. So, top-level definitions will match for any free reference
in the chunk body, but these are really the only references that are
likely to be resolvable unless one explicitly captures them through
means of the capture facility.
@^top-level@>
@^free variable@>
@^free reference@>
@ The macro itself takes the following syntax:
\medskip\verbatim
(@@< (name capture ...) body+ ...)
(@@< (name capture ...) => (export ...) body+ ...)
!endverbatim \medskip
\noindent The first instance is the value form of a chunk. It binds
|name| to an identifier syntax that will, when referenced, expand into
a form that evaluates |body+ ...| in its own scope, where
|capture ...| are bound to the values visible in the surrounding
context (rather than lexically scoped), whose return value is the
value of the last expression appearing in |body+ ...|.
The second form is the definition form of a chunk. In this form, it
works the same as above, except that a reference to the chunk may only
appear in definition contexts; it returns no value, but instead binds
in the surrounding context of the reference those identifiers
enumerated by |export ...| to the values to which they were bound in
the chunk body.
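\medskip For example, in this hypothetical sketch of the definition
form, referencing the chunk binds only its exports in the surrounding
scope:
\medskip\verbatim
(@@< (make-counter) => (counter)
  (define count 0)
  (define (counter) (set! count (+ count 1)) count))
make-counter  ; a definition: binds counter here
(counter)     ; => 1; count itself stays hidden
!endverbatim \medskip
\noindent The reference to |make-counter| binds |counter| in the
surrounding context, while |count| remains private to the chunk body.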
@(runtime.ss@>=
(module (@@< =>)
(import-only (chezscheme))
(define-syntax @@< @.@@<@>
(syntax-rules (=>)
[(_ (name c ...) => (e ...) b1 b2 ...)
(for-all identifier? #'(name c ... e ...))
(module-form name (c ...) (e ...) b1 b2 ...)]
[(_ (name c ...) b1 b2 ...)
(value-form name (c ...) b1 b2 ...)]))
@ Let's consider the value form first, since it is slightly easier. In
this case, we want to define the macro |name| to be an identifier
macro that will expand into the following form:
\medskip\verbatim
(let ()
(alias ic oc) ...
body+ ...)
!endverbatim \medskip
\noindent Notice the use of |ic ...| and |oc ...|. These are the
inner/outer bindings that correspond exactly to one another except
that they capture different lexical bindings. That is, we create the
|oc| bindings by rewrapping the |ic| bindings with the wraps (marks
and substitutions) of the location where the |name| is referenced. We
use |alias| to link the two identifiers to the same underlying
location.
@(runtime.ss@>=
(define-syntax (build-value-form x) @.build-value-form@>
(syntax-case x ()
[(_ id (ic ...) body ...)
(with-syntax ([(oc ...) (datum->syntax #'id (syntax->datum #'(ic ...)))])
#'(let () (alias ic oc) ... body ...))]))
@ The |build-value-form| syntax
is used as part of the |value-form| macro, which is what @.value-form@>
actually defines the macro for |name|. The |name| macro
is just an identifier syntax that has clauses for the single
identifier use and the macro call, but nothing for the |set!| clause,
since that doesn't make sense. Because we don't care about this case,
we can avoid the use of |make-variable-transformer| and instead use a
@.make-variable-transformer@>%
regular |syntax-case| form.
There is an interesting problem that arises if we try to just expand
the body directly. Because we are using |syntax-case| to do the
matching, the body that is expanded as part of the first
level (|value-form|) of expansion will lead to a possible ellipsis
problem. Take the following body as an example:
\medskip\verbatim
(define-syntax a
(syntax-rules ()
[(_ e ...) (list 'e ...)]))
(a a b c)
!endverbatim \medskip
\noindent This seems like it should be fine, but consider what happens
if we do the following:
\medskip\verbatim
(@@< (|List of a, b, and c|)
(define-syntax a
(syntax-rules ()
[(_ e ...) (list 'e ...)]))
(a a b c))
!endverbatim \medskip
\noindent We might end up in some trouble. When |value-form| runs on it,
we will get something like this:
\medskip\verbatim
(define-syntax (|List of a, b, and c| x)
(syntax-case x ()
[id (identifier? #'id)
#'(build-value-form id ()
((define-syntax a
(syntax-rules ()
[(_ e ...) (list 'e ...)]))
(a a b c)))]))
!endverbatim \medskip
\noindent Obviously, the above syntax doesn't work, because there is
no pattern variable |e| in the pattern clause. This means that we will
get an error about an extra ellipsis. What we need to do, when we run
|value-form|, is to make sure that the expanded code escapes the
ellipses, so we would expand the two body forms |(define...)| and
|(a a b c)| with ellipses around them instead.
@(runtime.ss@>=
(define-syntax value-form @.value-form@>
(syntax-rules ()
[(_ name (c ...) body ...)
(define-syntax (name x)
(syntax-case x ()
[id (identifier? #'id)
#'(build-value-form id (c ...) ((... ...) body) ...)]
[(id . rest)
#'((build-value-form id (c ...) ((... ...) body) ...)
. rest)]))]))
@ When we work with the definition form, we want to use a similar
aliasing technique as above. However, in this case, we need to link
both exports and captures. Furthermore, we need to expand into a
|module| form instead of using a |let| form as we do above.
\medskip\verbatim
(module (oe ...)
(alias ic oc) ...
(module (ie ...) body+ ...)
(alias oe ie) ...)
!endverbatim \medskip
\noindent In this case, as in the value form, the |ic ...| and
|ie ...| bindings are, respectively, the captures and exports of the
lexical (inner) scope, while the |oc ...| and |oe ...| are the same for
the surrounding context (outer).
@(runtime.ss@>=
(define-syntax (build-module-form x) @.build-module-form@>
(syntax-case x ()
[(_ id (ic ...) (ie ...) body ...)
(with-syntax ([(oc ...) (datum->syntax #'id (syntax->datum #'(ic ...)))]
[(oe ...) (datum->syntax #'id (syntax->datum #'(ie ...)))])
#'(module (oe ...)
(alias ic oc) ...
(module (ie ...) body ...)
(alias oe ie) ...))]))
@ And just as we did for the |value-form| macro,
we implement the |module-form| macro in the same way,
taking care to escape the body elements.
Unlike the value form of our call, though, we never expect to have the
|name| identifier syntax referenced at the call position of a form, as
in |(name x y z)| because that is not a valid definition context.
Thus, we only need to define the first form where it appears as a lone
identifier reference in a definition context.
@(runtime.ss@>=
(define-syntax module-form @.module-form@>
(syntax-rules ()
[(_ name (c ...) (e ...) body ...)
(define-syntax (name x)
(syntax-case x ()
[id (identifier? #'id)
#'(build-module-form id (c ...) (e ...)
((... ...) body) ...)]))]))
@ And that concludes the definition of the runtime. We do want to mark
the indirect exports for the |@@<| macro.
@(runtime.ss@>=
(indirect-export @@< @.indirect-export@>
module-form value-form build-module-form build-value-form)
)
@* 2 The Runtime Library. For users who wish to use this runtime in
their own code, we provide a simple library for them to load the
runtime code themselves. This enables them to use the macro as
their own abstraction and have the chunk-based reordering without
actually requiring them to write their entire program in \ChezWEB{}.
@^runtime library@>
@(runtime.sls@>=
#!chezscheme
(library (arcfide chezweb runtime)
(export @@< =>)
(import (chezscheme))
(include "runtime.ss"))
@* Tokenizing WEB files.
\ChezWEB\ programs written in the \WEB\ syntax are all treated as
a single stream of tokens. This token stream can be obtained by using
the |chezweb-tokenize| procedure, whose signature is as follows.
$$\.{chezweb-tokenize} : \\{port}\to\\{token-list}$$
\noindent Each token is either a string or a symbol representing one of
the \ChezWEB\ control codes. Control codes in the text can be
identified by reading at most three characters.
Each control code begins with an at sign;
most are only two characters, though the |@@>=| form has three.
This makes it fairly straightforward to build a
tokenizer directly. We do this without much abstraction below.
We have the following parameters in our main loop:
\medskip{\parindent = 0.75in
\item{|tokens|} The reversed list of accumulated tokens so far
\item{|cur|} A character list buffer for accumulated string tokens
\item{|ports|} The set of file ports to read from, in order\par}
\medskip
\noindent I use the |ports| list because of the |@@i| control code
discussed further down.
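\medskip For example, tokenizing an input like
\medskip\verbatim
@@ A section.
@@p (display "hi")
!endverbatim \medskip
\noindent should yield a token list along these lines:
\medskip\verbatim
(|@@ | "A section.\n" @@p " (display \"hi\")\n")
!endverbatim \medskip
\noindent with a symbol for each control code and a string for the
text between codes.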
@p
(define (chezweb-tokenize port) @.chezweb-tokenize@>
(let loop ([tokens '()] [cur '()] [ports (list port)])
(if (null? ports)
@<Finish tokenizing and return token list@>
(let ([c (read-char (car ports))])
(cond
[(eof-object? c) @<Finish tokenizing port and |loop|@>]
[(char=? #\@@ c) @<Parse possible control code and |loop|@>]
[else (loop tokens (cons c cur) ports)])))))
@ When we run out of ports to process, we want to handle any leftover
elements in the |cur| list and return the reversed list of tokens.
@c (tokens cur)
@<Finish tokenizing and return token list@>=
(reverse
(if (null? cur)
tokens
(cons (list->string (reverse cur)) tokens)))
@ When we run out of characters at one port, we have to consider this as
a token boundary, so that we don't collapse the code encapsulated inside
of one included file with the rest. That is, you should not be able to
do something like this:
\medskip\verbatim
@@i blah.w
kdkdkdksljklsdl
!endverbatim \medskip
\noindent You want the string starting with the line after the include
to be the start of a distinct token element,
and not be merged with any previous tokens.
Thus, the above should be an error because it begins a string of
text or code without first delimiting it with a control code of some
sort.
@c (loop tokens cur ports)
@<Finish tokenizing port and |loop|@>=
(if (null? cur)
(loop tokens cur (cdr ports))
(loop (cons (list->string (reverse cur)) tokens)
'()
(cdr ports)))
@ Most of the control codes can be determined by reading ahead only
one more character, but dealing with |@@>=| requires a two-character
lookahead. Additionally, there is an escape control code
(|@@@@|) that lets us escape the at sign if we really want to
put a literal |@@| into our text rather than to begin a
control code. If we do find a token, we want that token to be encoded
as the appropriate symbol. We will first add any buffer left in |cur|
to our tokens list, and then add our symbolic token to the list of
tokens as well. The list of tokens is accumulated in reverse order.
@c (c cur tokens loop ports)
@<Parse possible control code and |loop|@>=
(let ([nc (read-char (car ports))])
(case nc
[(#\@@) (loop tokens (cons c cur) ports)]
[(#\q) (get-line (car ports))
(loop tokens (cons #\newline cur) ports)]
[(#\space #\< #\p #\* #\e #\r #\( #\^ #\. #\: #\c) @q )
@<Add buffer and control code to |tokens| and |loop|@>]
[(#\>) @<Parse possible |@@>=| delimiter and |loop|@>]
[(#\i) @<Include new file in |ports| and |loop|@>]
[else
(if (eof-object? nc)
(loop tokens cur ports)
(loop tokens (cons* nc c cur) ports))]))
@ For the control codes that don't require any additional parsing, we
can simply add the |cur| buffer if it is non-empty and then add the
control code to the list of tokens.
@c (c nc cur tokens loop ports)
@<Add buffer and control code to |tokens| and |loop|@>=
(let ([token (string->symbol (string c nc))])
(if (null? cur)
(loop (cons token tokens) '() ports)
(loop (cons* token (list->string (reverse cur)) tokens)
'() ports)))
@ When we encounter an include directive, we want to splice in the
content from the file as its own set of sections. We do this by adding
the file's port at the head of |ports|. However, to maintain the
boundaries of sections, we also need to clear out things like the |cur|
buffer when we do this. We expect that the line of the include code
should contain a Scheme string that contains the path to the file. We
use this Scheme string notation instead of just a raw string because we
want to have some way of specifying strange files that may have
whitespace and other nonsense at the beginning and end of them, and so
forth.
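\medskip For example, an include line is written with the path as a
Scheme string (the path shown here is hypothetical):
\medskip\verbatim
@@i "lib/sections.w"
!endverbatim \medskip
\noindent Here |read| pulls the quoted string off the line, and the
included file's name is pushed onto the front of |ports| to be read
next.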
@c (loop ports cur tokens)
@<Include new file in |ports| and |loop|@>=
(let ([fname (with-input-from-string (get-line (car ports)) read)])
(unless (string? fname)
(error #f "expected string file name" fname))
(loop (if (pair? cur)
(cons (list->string (reverse cur)) tokens)
tokens)
'()
(cons fname ports)))
@ When we encounter the sequence |@@>| in our system, we may have a
closing delimiter, but we won't know that until we read ahead a bit
more. When we do have a closing delimiter, we will ignore all of the
characters after that on the line. In essence, this is like having an
implicit |@@q| code sitting around. We do this in order to
provide a clean slate to the user when writing files, so that
extraneous whitespace is not inserted into a file if the programmer
does not intend it. Extraneous whitespace at the beginning of a file
can cause problems with things like scripts if the user is using the
|@@(| control code to generate the script file.
@^scripts@>
If we do not find the correct character for closing, then we
will treat it like a normal |@@>| code, which is a code which
does not strip the rest of the line's contents.
@c (cur loop tokens c nc ports)
@<Parse possible |@@>=| delimiter and |loop|@>=
(define (extend tok ncur)
(if (null? cur)
(loop (cons tok tokens) ncur ports)
(loop (cons* tok (list->string (reverse cur)) tokens)
ncur ports)))
(let ([nnc (read-char (car ports))])
(if (char=? #\= nnc)
(begin (get-line (car ports)) (extend '@@>= '()))
(extend '@@> (list nnc))))
@* Processing code bodies. When we are dealing with a token list,
the code bodies that may have chunk references in them will be
broken up into the code string elements and the delimiters surrounding
a code body. We want to make it easy to get a code body and treat
it like a single string of text. Tangling and weaving require
different textual representations of a chunk reference, but the
overall logic for handling slurping, as I call it, is the same
for both tangling and weaving. As such, we'll document the basic
logic here, and you can read about the special representations
in the appropriate section below.
$$\.{slurp-code} : \\{tokens}\times\\{encode}\times\\{clean}\times\\{render}
\to\\{tokens-rest}\times\\{code-body-string}$$
\noindent The |slurp-code| procedure takes four arguments:
\medskip{\parindent=1.0in
\item{|tokens|} A list whose head should be the start of a code body;
\item{|encode|} A procedure that, when given a string representing a
chunk name, will encode that string into another string, suitable for
use as part of the code body of either tangled or woven code;
\item{|clean|} A cleaning procedure that will sanitize a
string for output;
\item{|render|} Finally, a procedure that will be called on the final
result to do any post-processing, such as stripping of trailing
or leading whitespace.\par}\medskip
\noindent
The use of a cleaner allows the slurper to be used for both tangling
and weaving. Specifically, if we tangle code, we don't want to
prepare it for \TeX{}ing. On the other hand, if we are weaving the
code, we need to remember to do things to it to make it nicer for the
\TeX\ environments.
The |slurp-code| procedure will return two values, one being the
pointer to the rest of the tokens after the body has been processed,
and the other, the code body itself as a single string.
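\medskip As a sketch of a call (the encoder, cleaner, and renderer
shown are hypothetical stand-ins, not the ones the tangler or weaver
actually uses):
\medskip\verbatim
(let-values ([(rest body)
              (slurp-code tokens
                (lambda (name) (string-append "<" name ">"))
                (lambda (str) str)
                (lambda (str) str))])
  body)
!endverbatim \medskip
\noindent Here each chunk reference is re-encoded by the first
procedure, each string token passes through the cleaner unchanged,
and the assembled body is returned untouched by the renderer.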
@p
(define (slurp-code tokens encode clean render) @.slurp-code@>
(let loop ([tokens tokens] [res '()])
(cond
[(null? tokens) @<Return rest of tokens and slurped body@>]
[(string? (car tokens))
(loop (cdr tokens)
(cons (clean (car tokens)) res))]
[(eq? '@@< (car tokens))
@<Verify chunk reference syntax@>
(loop (cdddr tokens)
(cons (encode (cadr tokens)) res))]
[else @<Return rest of tokens and slurped body@>])))
@ On completion of our slurping, we want to verify that we got
something useful in the code to start with before we do the final
rendering and composition of the string elements.
@c (tokens res render)
@<Return rest of tokens and slurped body@>=
(let ([res (apply string-append (reverse res))])
(when (zero? (string-length res))
(error #f
"expected code body"
(list-head tokens (min (length tokens) 3))))
(when (for-all char-whitespace? (string->list res))
(error #f "empty chunk body" res))
(values tokens (render res)))
@ The syntax for a chunk reference is a chunk name string surrounded
by an opening |@@<| and a closing |@@>|.
\medskip\verbatim
@@<chunk name@@>
!endverbatim \medskip
\noindent The name of the chunk is the contents between the
delimiters with the leading and trailing whitespace removed. @^whitespace@>
We can verify this with a simple set of tests.
This isn't a fully thorough test but it should do the job.
@c (tokens)
@<Verify chunk reference syntax@>=
(unless (<= 3 (length tokens))
(error #f "unexpected end of token stream" tokens))
(unless (string? (cadr tokens))
(error #f "expected chunk name" (list-head tokens 2)))
(unless (eq? '@@> (caddr tokens))
(error #f "expected chunk closer" (list-head tokens 3)))
@* Tangling a WEB. Tangling is the process of taking a \WEB\ and
converting it to a Scheme file. In the current implementation,
the tangled code should run self contained on its own,
without the need of any other files.
%$$\includegraphics[width=3in]{chezweb-4.eps}$$
$$\epsfbox{chezweb.4}$$
\noindent Once we have this list of tokens, we can in turn
write a simple program to tangle the output. Tangling actually
consists of several steps.
\medskip{\parindent = 2em
\item{1.}
Accumulate named chunks;
\item{2.}
Gather file code and |@@p| code for output;
\item{3.}
Output each named file and the default file,
making sure to prepend the runtime, followed by the chunk definitions
to the default file.
\par}\medskip
\noindent For example, if we have not used the |@@(| control code,
which allows us to send data to extra files, then we will send
all of our data to the default file.
For the default file, the user can put references to named chunks
inside of the top level code. To accommodate this, we need to embed
the runtime code at the top of the default file, followed by all of
the named code definitions. We can follow both of these by the main
top level definitions of the program. Note that we do not allow
named chunk/section references inside of sections output to other
files; that is, we do not allow you to put a reference to a named
section inside of a section that was started with the |@@(| control
code.
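\noindent The layout of the tangled default file can thus be sketched
as follows, from top to bottom:
\medskip\verbatim
<contents of runtime.ss>
<definitions of all named chunks>
<top-level code from @@p sections, in order>
!endverbatim \medskip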
The first step is to grab our runtime, which we will do when
we compile the program; or, if we are using petite, we can hope that
the user has set up their {\tt CHEZWEBHOME} environment variable
correctly so that we can find it.
@p
(meta define (runtime-path)
(define (path d) (format "~a~aruntime.ss" d (directory-separator)))
(let loop ([dlst (source-directories)])
(cond
[(null? dlst) (error #f "runtime.ss not found")]
[(file-exists? (path (car dlst))) (path (car dlst))]
[else (loop (cdr dlst))])))
(define-syntax (get-code x)
(call-with-input-file (runtime-path) get-string-all))
(define runtime-code (get-code))
@ We can now define a program for tangling.
We want a program that takes a single file,
and generates the tangled output.
\medskip\verbatim
cheztangle <web_file>
!endverbatim \medskip
\noindent We will use an R6RS style program for this, assuming that
all of our important library functions will be installed in {\tt
chezweb.ss}. This creates a bit of a problem if the user is using
petite for everything, and thus, does not compile the chezweb.ss file
into the system. To get around this, we allow the user to provide a
{\tt CHEZWEBHOME} environment variable that will be used to find the
chezweb.ss file in case it cannot be located.
@(cheztangle.ss@>=
#! /usr/bin/env scheme-script
(import (chezscheme))
(define-syntax (include-chezweb x)
(define chezweb-home (getenv "CHEZWEBHOME"))
(source-directories
(cons (or chezweb-home (path-parent (car (command-line))))
(source-directories)))
(syntax-case x ()
[(k) #`(#,(datum->syntax #'k 'include) "chezweb.ss")]))
(module (tangle-file display-chezweb-version) (include-chezweb))
(display-chezweb-version "CHEZTANGLE")
(unless (= 1 (length (command-line-arguments)))
(printf "Usage: cheztangle <web_file>\n")
(exit 1))
(unless (file-exists? (car (command-line-arguments)))
(printf "Specified file '~a' does not exist.\n"
(car (command-line-arguments)))
(exit 1))
(tangle-file (car (command-line-arguments)))
(exit 0)
@ We already have a tokenizer, but in order to get the |tangle-file|
program, we need a way to extract out the appropriate code parts. We
care about two types of code: top-level and named chunks. Top-level
chunks are any chunks delineated by |@@(| or |@@p|, and named chunks
are those which start with |@@<|. We store the named chunks and
top-level chunks into two tables. These are tables that map either
chunk names or file names to the chunk contents, which are
strings. Additionally, named chunks have captures and export
information that must be preserved, so we also have a table for that.
The captures table is keyed on the same values as a named chunk table,
and indeed, there should be a one-to-one mapping from named chunk keys
to capture keys, but the value of a captures table is a pair of
captures and exports lists, where the exports list may be false for
value chunks.
$$\vbox{
\offinterlineskip
\halign{
\strut #\hfill & #\hfill & #\hfill \cr
{\bf Table} & {\bf Key Type} & {\bf Value Type} \cr
\noalign{\hrule}
Top-level & Filename or |*default*| & Code String \cr
Named Chunk & Chunk Name Symbol & Code String \cr
Captures & Chunk Name Symbol &
Pair of captures and exports lists \cr
}
}$$
\noindent We use hashtables for each table, but these hashtables are
only meant for internal use, and should never see the light of the
outside userspace. The only other gotcha to remember is that the
tokens list will contain a string as the first element only if there
is something in the limbo area of the file.
If there is nothing in limbo, there will be a token first.
We want to loop around assuming that we receive a
token before any string input, and we don't care about limbo when we
tangle a file, so when we seed the loop, we will take care to remove
the initial limbo string if there is any.
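\noindent As an illustration, a file that begins with limbo text
produces a token list like the following, whose leading string we
drop before looping (all of the contents here are made up):
\medskip\verbatim
("Limbo text. " @@* "Intro. Prose..." @@p "(define x 1)" ...)
!endverbatim \medskip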
@c (tokens)
@<Construct chunk tables |named|, |top-level|, and |captures|@>=
(let ([named (make-eq-hashtable)]
[top-level (make-hashtable equal-hash equal?)]
[captures (make-eq-hashtable)])
(let loop ([tokens (if (string? (car tokens)) (cdr tokens) tokens)]
[current-captures '()]
[current-exports #f])
(if (null? tokens)
(values top-level named captures)
@<Dispatch on control code and |loop|@>)))
@ On each step of the loop, we will expect to have a single control
code at the head of the |tokens| list. Each time we iterate through
the loop, we should remove all of the strings and other elements
related to that control code, so that our next iteration will again
encounter a control code at the head of the list. We do not need to
check for index control codes here or otherwise because we assume
that we have already run |cleanse-tokens-for-tangle|
@.cleanse-tokens-for-tangle@> that removes all the unimportant
tokens that we don't want from the tokens list. However, we do
have to be aware of empty unnamed section bodies.
@c (loop tokens top-level current-captures
current-exports named captures)
@<Dispatch on control code and |loop|@>=
(case (car tokens)
[(@@*) (loop (cddr tokens) '() #f)]
[(|@@ |)
(if (string? (cadr tokens))
(loop (cddr tokens) '() #f)
(loop (cdr tokens) '() #f))]
[(@@p) @<Extend default top-level and |loop|@>]
[(@@<) @<Extend named chunk and |loop|@>]
[(|@@(|) @<Extend file top-level and |loop|@>]
[(@@c) @<Update the current captures and |loop|@>]
[else (error #f "Unexpected token" (car tokens) (cadr tokens))])
@ Extending the default top level is the easiest. We just append the
string that we find to the |*default*| key in the |top-level| table.
@c (loop tokens top-level)
@<Extend default top-level and |loop|@>=
(define-values (ntkns body)
(slurp-code (cdr tokens)
tangle-encode
(lambda (x) x)
strip-whitespace))
(hashtable-update! top-level '*default*
(lambda (cur) (format "~a~a~n" cur body))
"")
(loop ntkns '() #f)
@ I'd like to take a moment here to discuss what |tangle-encode| is. Our
|slurp-code| procedure, which is defined elsewhere, takes an encoder,
@.slurp-code@>
which it calls with each chunk reference name as a string.
The job of the encoder is to make sure that the string that it
returns is something that belongs as valid code. For weaving, this is a
form of \TeX{}ification, while with tangling, it's turning that name
into a proper Scheme identifier.
We can assume that the name will be stripped of extraneous
whitespace. The encoder will be a pretty straightforward use of
|format|.
@p
(define (tangle-encode x) (format "~s" (string->symbol x))) @.tangle-encode@>
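@ For example, a simple name passes through essentially unchanged,
while a name containing spaces is written in the host Scheme's escaped
symbol syntax, so the reference still reads back as a single
identifier (the exact escaping of the second result depends on the
symbol writer):
\medskip\verbatim
(tangle-encode "frob")      => "frob"
(tangle-encode "my chunk")  => one symbol, with the space escaped
!endverbatim \medskip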
@ Handling file name top-level updates works much like a named chunk,
except that we do not have to deal with the issues of capture
variables, which we will discuss shortly. We must verify that we have
a valid syntax in the stream and then we can add the name in. We
should remember to strip off the leading and trailing whitespace from
the name in question. @^whitespace@>
@c (loop tokens top-level)
@<Extend file top-level and |loop|@>=
@<Verify and extract delimited chunk@>
(let ([name (strip-whitespace name)])
(hashtable-update! top-level name
(lambda (cur) (format "~a~a~n" cur body))
""))
(loop tknsrest '() #f)
@ Named chunk updates are complicated by the need to track
captures. In the \WEB\ syntax, if you have a capture that you want to
associate with a given named chunk, you list the |@@c| form right
before you define your chunk. When we parse this, we save the captures
as soon as we encounter them so that they can be used in the next
chunk. We reset the captures if we do not find a named chunk as our
next section.
$$\.{parse-captures-line} : \\{captures-string}
\to\\{captures}\times\\{maybe-exports}$$
\noindent The format of a captures form looks something like this:
\medskip\verbatim
@@c (c ...) [=> (e ...)]
!endverbatim \medskip
\noindent In the above, the exports are optional, and the captures
could be empty. This will come to us as a string, so we will
need a way to convert it into a data representation that we can use.
From the following function, we will get back two values:
the captures and the exports; if no exports were provided to us,
then the second value will be false.
@p
(define (parse-captures-line str) @.parse-captures-line@>
(with-input-from-string str
(lambda ()
(let* ([captures (read)] [arrow (read)] [exports (read)])
(unless (and (list? captures) (for-all symbol? captures))
(error #f
"Expected list of identifiers for captures" captures))
(unless (and (eof-object? arrow) (eof-object? exports))
(unless (eq? '=> arrow)
(error #f "Expected =>" arrow))
(unless (and (list? exports) (for-all symbol? exports))
(error #f
"Expected list of identifiers for exports" exports)))
(values captures (and (not (eof-object? exports)) exports))))))
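@ A pair of examples shows the two shapes that this function accepts;
note that the second return value is false exactly when no arrow and
exports were given:
\medskip\verbatim
(parse-captures-line "(a b) => (x y)")  => (values '(a b) '(x y))
(parse-captures-line "(a b)")           => (values '(a b) #f)
!endverbatim \medskip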
@ With the above function, we can now trivially handle the captures
updating in our loop.
@c (loop tokens)
@<Update the current captures and |loop|@>=
(unless (string? (cadr tokens))
(error #f "Expected captures line" (cadr tokens)))
(let-values ([(captures exports) (parse-captures-line (cadr tokens))])
(loop (cddr tokens) captures exports))
@ When it comes to actually extending a named chunk, we will either
have nothing in the captures and exports forms, or we will have two
lists in |current-captures| and |current-exports| of symbols that
represent the identifiers that we want to capture and export,
respectively. We need to update two hashtables, one that maps the
actual names of the chunks to their contents, and the other that
tracks the captures and exports for each named chunk. Why do both? If
someone uses the same chunk name to define two chunks, then those
chunks are linked together. Likewise, we do not want to force the user
to put all of the captures for a chunk into the first instance that
the chunk name was used as a definition. Rather, we should allow the
programmer to extend the captures and exports in the same way that the
programmer can extend the chunks. So, for example:
\medskip\verbatim
@@c (a b) => (x y z)
@@<blah@@>=
(define-values (x y z) (list a b 'c))
@@c (t) => (u v)
@@<blah@@>=
(define-values (u v) (list t t))
!endverbatim \medskip
\noindent In the above code example, we want the end result to have a
captures list of |a b t| and an exports list of |x y z u v|.
@c (loop tokens named current-captures current-exports captures)
@<Extend named chunk and |loop|@>=
@<Verify and extract delimited chunk@>
(let ([name (string->symbol (strip-whitespace name))])
(hashtable-update! named name
(lambda (cur) (format "~a~a~n" cur body))
"")
(hashtable-update! captures name
(lambda (cur) @<Extend captures and exports@>)
#f))
(loop tknsrest '() #f)
@ We have to be careful about how we deal with the exports list.
Suppose that the user first defines a captures line without the
exports, and then later extends a chunk with a captures line that has
an export in it. The first chunk will have been written assuming that
it will return a value, and the second will have been written assuming
that it will not. This causes a conflict, and we should not allow
this sort of thing to happen. In the above, we partially deal with
this by assuming that if the chunk has not been extended it is fine to
extend it; this is equivalent to passing the false value as our default
in the call to |hashtable-update!|. On the other hand, we have to make
sure that we give the right error if we do encounter a false value if
we don't expect one. That is, if we receive a pair in |cur| whose
|cdr| field is false, this means that the chunk was previously defined
and that this definition had no exports in it. We should then error
out if we have been given anything other than a false exports.
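\noindent For instance, the following pair of definitions is rejected
under this rule, because the first extension of |blah| established
that it returns a value, that is, that it has no exports (the chunk
bodies are illustrative):
\medskip\verbatim
@@c (a)
@@<blah@@>=
(list a a)
@@c (b) => (y)
@@<blah@@>=
(define y b)
!endverbatim \medskip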
@c (current-exports current-captures cur name)
@<Extend captures and exports@>=
(define (union s1 s2)
(fold-left (lambda (s e) (if (memq e s) s (cons e s))) s1 s2))
(when (and cur (not (cdr cur)) current-exports)
(error #f