<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD><TITLE>wdcnt</TITLE>
</HEAD>
<BODY>
<H1><A NAME="label:0">wdcnt -- word counter for English/Japanese text files</A></H1>
<H2><A NAME="label:1">SYNOPSIS</A></H2>
<P>
<KBD>wdcnt [-p|-z] [-e] <VAR>files</VAR> ...</KBD>
</P>
<P>
<KBD>wdcnt [-p|-z] [-e] &lt; <VAR>file</VAR></KBD>
</P>
<P>
<KBD>wdcnt -v</KBD>
</P>
<H2><A NAME="label:2">DESCRIPTION</A></H2>
<P>
<VERB>wdcnt</VERB> counts and reports English or Japanese words in files or
on standard input. <VERB>wdcnt</VERB> ignores punctuation, digits, quotation
marks, and HTML tags. The output is sorted by frequency of occurrence
and can be plotted directly with <VERB>gnuplot(1)</VERB> as follows:
</P>
<BLOCKQUOTE><PRE>
gnuplot&gt; set log xy
gnuplot&gt; plot "&lt; wdcnt file"
</PRE></BLOCKQUOTE>
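<P>
The counting described in this section can be approximated in a few lines of
Python. This is only a sketch for English text, not the implementation of
<VERB>wdcnt</VERB>: the function name <VERB>count_words</VERB>, the
tokenizing rules, and the lowercasing are all assumptions made for
illustration, and Japanese text would additionally need a word segmenter
such as KAKASI.
</P>

```python
import re
from collections import Counter

# \x3c and \x3e are the angle brackets, written as escapes so that this
# listing contains no literal tag delimiters.
TAG = re.compile(r"\x3c[^\x3e]*\x3e")
WORD = re.compile(r"[A-Za-z]+")  # letters only: drops digits, punctuation, quotes


def count_words(text):
    """Return (word, count) pairs, most frequent first (hypothetical helper)."""
    text = TAG.sub(" ", text)            # ignore HTML tags
    words = WORD.findall(text.lower())   # assumption: case is folded
    return Counter(words).most_common()
```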
<H2><A NAME="label:3">OPTIONS</A></H2>
<DL>
<DT><A NAME="label:8">-p</A>
<DD>
<P>
Reports probabilities instead of occurrence counts.
The frequencies are normalized so that they sum to 1.0.
</P>
</DD>
<DT><A NAME="label:9">-z</A>
<DD>
<P>
Reports relative frequencies instead of occurrence counts,
scaled so that the most frequent word has the value 1.0.
</P>
</DD>
<DT><A NAME="label:10">-e</A>
<DD>
<P>
Does not use KAKASI. This option is NOT useful for Japanese documents.
</P>
</DD>
<DT><A NAME="label:11">-v, -h</A>
<DD>
<P>
Prints usage and version, then exits.
</P>
</DD>
</DL>
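<P>
The difference between <VERB>-p</VERB> and <VERB>-z</VERB> is only the
divisor. The sketch below assumes that <VERB>-p</VERB> divides each count by
the total number of words and that <VERB>-z</VERB> divides by the largest
count. The function names are hypothetical and the code is not taken from
<VERB>wdcnt</VERB>.
</P>

```python
def probabilities(counts):
    """Scale counts so they sum to 1.0 (assumed meaning of -p)."""
    total = sum(counts)
    return [c / total for c in counts]


def relative_frequencies(counts):
    """Scale counts so the largest becomes 1.0 (the -z description)."""
    top = max(counts)
    return [c / top for c in counts]
```

<P>
For counts of 4, 2 and 2, the first function gives 0.5, 0.25 and 0.25, and
the second gives 1.0, 0.5 and 0.5.
</P>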
<H2><A NAME="label:4">HISTORY</A></H2>
<P>
For English documents, a traditional one-liner is known:
</P>
<BLOCKQUOTE><PRE>
% cat files ... | tr -s '\040' '\012' | sort | uniq -c | sort -n -r
</PRE></BLOCKQUOTE>
<H2><A NAME="label:5">SEE ALSO</A></H2>
<P>
<VERB>Ruby/KAKASI</VERB> <A HREF="http://www.ruby-lang.org/en/raa.html#Ruby%2FKAKASI">&lt;URL:http://www.ruby-lang.org/en/raa.html#Ruby%2FKAKASI&gt;</A>,
<VERB>ruby(1)</VERB> <A HREF="http://www.ruby-lang.org/">&lt;URL:http://www.ruby-lang.org/&gt;</A>,
<VERB>kakasi(1)</VERB> <A HREF="http://kakasi.namazu.org/">&lt;URL:http://kakasi.namazu.org/&gt;</A>,
<VERB>gnuplot(1)</VERB>, <VERB>tr(1)</VERB>, <VERB>sort(1)</VERB>, <VERB>uniq(1)</VERB>
</P>
<H2><A NAME="label:6">BUGS</A></H2>
<P>
Word separation is not accurate.
</P>
<H2><A NAME="label:7">AUTHOR</A></H2>
<P>
Gotoken <A HREF="mailto:[email protected]">&lt;URL:mailto:[email protected]&gt;</A>
</P>
</BODY>
</HTML>