-
Notifications
You must be signed in to change notification settings - Fork 47
/
Copy pathwc.lit
297 lines (235 loc) · 8.78 KB
/
wc.lit
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
@code_type c .c
@comment_type /* %s */
@compiler lit -t wc.lit && gcc wc.c && rm a.out
@title Word Count Program
@s An example of Literate programming
This example is a close adaptation of a tutorial example for Knuth and Levy's `CWEB` programming system,
translated from `CWEB` to the `Literate` programming system.
That example in turn is based on a program by Klaus Guntermann and Joachim Schrod
[*TUGboat* **7** (1986), 134–137] for a version of the "word count" program from `UNIX`.
This example demonstrates literate programming in C, although the
`Literate` programming system can be used in conjunction with any programming language.
The level of detail in this document is intentionally high,
for didactic purposes; many of the things spelled out here don’t need to be explained in other programs.
The purpose of `wc` is to count lines, words, and/or characters in a list of files.
The number of lines in a file is the number of newline characters it contains.
The number of characters is the file length in bytes. A "word" is a maximal
sequence of consecutive characters other than newline, space, or tab, containing at
least one visible ASCII code. (We assume that the standard ASCII code is in use.)
This version of `wc` has a nonstandard "silent" option (`-s`), which suppresses printing
except for the grand totals over all files.
@s
Most `Literate` programs share a common structure. It’s probably a good idea to state
the overall structure explicitly at the outset, even though the various parts could all be
introduced in a piecemeal fashion.
Here, then, is an overview of the file `wc.c` that is defined by this `Literate` program `wc.lit`:
--- wc.c
@{Header files to include}
@{Preprocessor definitions}
@{Global variables}
@{Functions}
@{The main program}
---
@s
We must include the standard I/O definitions, since we want to send formatted output to *stdout* and *stderr*.
--- Header files to include
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
---
@s
The `status` variable will tell the operating system if the run was successful or not,
and `prog_name` is used in case there’s an error message to be printed.
--- Preprocessor definitions
#define OK 1 /* status code for successful run */
#define usage_error 1 /* status code for improper syntax */
#define cannot_open_file 2 /* status code for file access error */
---
--- Global variables
int status = OK; /* exit status of command, initially OK */
char *prog_name; /* who we are */
---
@s
Now we come to the general layout of the `main()` function.
--- The main program
int main(int argc, char **argv)
{
@{Variables local to main}
prog_name = argv[0];
@{Set up option selection}
@{Process all the files}
@{Print the grand totals if there were multiple files}
return status;
}
---
@s
If the first argument begins with a '`-`', the user is choosing the desired counts and
specifying the order in which they should be displayed. Each selection is given by the
initial character (lines, words, or characters). For example, '`-cl`' would cause just
the number of characters and the number of lines to be printed, in that order. The
default, if no special argument is given, is '`-lwc`'.
We do not process this string now; we simply remember where it is. It will be used
to control the formatting at output time.
If the '`-`' is immediately followed by '`s`', only summary totals are printed.
--- Variables local to main
int file_count; /* how many files there are */
char *which; /* which counts to print */
int silent = 0; /* nonzero if the silent option was selected */
---
@s
--- Set up option selection
which = "lwc"; /* if no option is given, print all three values */
if (argc >1 && *argv[1] == '-') {
argv[1]++;
if (*argv [1] == 's') silent = 1, argv [1]++;
if (*argv [1]) which = argv [1];
argc--;
argv++;
}
file_count = argc - 1;
---
@s
Now we scan the remaining arguments and try to open a file, if possible. The file
is processed and its statistics are given. We use a `do ... while` loop because
we should read from the standard input if no file name is given.
--- Process all the files
argc--;
do {
@{If a file is given, try to open *(++argv ); continue if unsuccessful}
@{Initialize pointers and counters}
@{Scan file}
@{Write statistics for file}
@{Close file}
@{Update grand totals}
} while (--argc > 0);
---
@s
Here’s the code to open the file. A special trick allows us to handle input from *stdin*
when no name is given. Recall that the file descriptor to *stdin* is `0`; that’s what we
use as the default initial value.
--- Variables local to main +=
int fd = 0;
---
@s
--- Preprocessor definitions +=
#define READ_ONLY 0
---
--- If a file is given, try to open *(++argv ); continue if unsuccessful
if (file_count > 0 && (fd = open(*(++argv), READ_ONLY)) < 0) {
fprintf(stderr, "%s: cannot open file %s\n", prog_name, *argv);
status |= 2;
file_count--;
continue;
}
---
@s
--- Close file
close(fd);
---
@s
We will do some homemade buffering in order to speed things up: Characters will be
read into the buffer array before we process them. To do this we set up appropriate
pointers and counters.
--- Variables local to main +=
char buffer[BUFSIZ]; /* we read the input into this array */
register char *ptr; /* the first unprocessed character in buffer */
register char *buf_end; /* the first unused position in buffer */
register int c; /* current character or number of characters just read */
int in_word; /* are we within a word? */
long word_count, line_count, char_count;
/* number of words, lines, and characters found in the file so far */
---
@s
--- Initialize pointers and counters
ptr = buf_end = buffer;
line_count = word_count = char_count = 0;
in_word = 0;
---
@s
The grand totals must be initialized to zero at the beginning of the program.
If we made these variables local to main, we would have to do this initialization explicitly;
however, C’s globals are automatically zeroed. (Or rather, "statically zeroed.") (Get it?)
--- Global variables +=
long tot_word_count, tot_line_count, tot_char_count;
/* total number of words, lines and chars */
---
@s
The present section, which does the counting that is `wc`'s *raison d'être*, was actually
one of the simplest to write. We look at each character and change state if it begins or ends a word.
--- Scan file
while (1) {
@{Fill buffer if it is empty; break at end of file}
c = *ptr++;
if (c > ' ' && c < 177) { /* visible ASCII codes */
if (!in_word) {
word_count++;
in_word = 1;
}
continue;
}
if (c == '\n') line_count++;
else if (c != ' ' && c != '\t') continue;
in_word = 0; /* c is newline, space, or tab */
}
---
@s
Buffered I/O allows us to count the number of characters almost for free.
--- Fill buffer if it is empty; break at end of file
if (ptr >= buf_end) {
ptr = buffer;
c = read(fd, ptr, BUFSIZ);
if (c <= 0) break;
char_count += c;
buf_end = buffer + c;
}
---
@s
It’s convenient to output the statistics by defining a new function `wc_print()`;
then the same function can be used for the totals. Additionally we must decide here
if we know the name of the file we have processed or if it was just *stdin*.
--- Write statistics for file
if (!silent) {
wc_print(which, char_count, word_count, line_count);
if (file_count) printf(" %s\n", *argv); /* not stdin */
else printf("\n"); /* stdin */
}
---
@s
--- Update grand totals
tot_line_count += line_count;
tot_word_count += word_count;
tot_char_count += char_count;
---
@s
We might as well improve a bit on `UNIX`’s `wc` by displaying the number of files too.
--- Print the grand totals if there were multiple files
if (file_count > 1 || silent) {
wc_print(which, tot_char_count, tot_word_count, tot_line_count);
if (!file_count) printf("\n");
else printf(" total in %d file%s\n", file_count, file_count > 1 ? "s" : "");
}
---
@s
Here now is the function that prints the values according to the specified options.
The calling routine is supposed to supply a newline. If an invalid option character
is found we inform the user about proper usage of the command. Counts are printed
in 8-digit fields so that they will line up in columns.
--- Functions
void wc_print(char *which, long char_count, long word_count, long line_count)
{
while (*which)
switch (*which++) {
case 'l': printf("%8ld", line_count);
break;
case 'w': printf("%8ld", word_count);
break;
case 'c': printf("%8ld", char_count);
break;
default:
if ((status & 1) == 0) {
fprintf(stderr, "\nUsage: %s [-lwc] [filename ...]\n", prog_name);
status |= 1;
}
}
}
---