
Commit

updated man page (autogenerated from README, should probably be part of Makefile)

Signed-off-by: Tim Bray <[email protected]>
timbray committed Apr 11, 2024
1 parent 43bfe8f commit ba69bce
Showing 1 changed file with 36 additions and 18 deletions.
54 changes: 36 additions & 18 deletions doc/tf.1
@@ -1,6 +1,29 @@
 .TH topfew
 .PP
 A program that finds and prints out the top few records in which a certain field or combination of fields occurs most frequently.
+.SH Examples
+.PP
+To find the IP address that most commonly hits your web site, given an Apache logfile named \fB\fCaccess_log\fR\&.
+.PP
+\fB\fCtf \-\-fields 1 access_log\fR
+.PP
+The same effect could be achieved with
+.PP
+\fB\fCawk '{print $1}' access_log | sort | uniq \-c | sort \-rn | head\fR
+.PP
+But \fBtf\fP is usually much faster.
+.PP
+Do the same, but exclude high\-traffic bots (omitting the filename).
+.PP
+\fB\fCtf \-\-fields 1 \-\-vgrep googlebot \-\-vgrep bingbot\fR
+.PP
+Most popular IP addresses from May 2020.
+.PP
+\fB\fCtf \-\-fields 1 \-grep '\\[../May/2020'\fR
+.PP
+Most popular hour/minute of the day for retrievals.
+.PP
+\fB\fCtf \-\-fields 4 \-\-sed "\\\\[" "" \-\-sed '^[^:]*:' '' \-\-sed ':..$' ''\fR
 .SH Usage
 .PP
 .RS
@@ -69,29 +92,24 @@ The default is the result of the Go \fB\fCruntime.NumCPU()\fR calls and often pr
 \fB\fC\-h\fR, \fB\fC\-help\fR, \fB\fC\-\-help\fR
 .PP
 Describes the function and options of \fBtf\fP\&.
-.SH Examples
-.PP
-To find the IP address that most commonly hits your web site, given an Apache logfile named \fB\fCaccess_log\fR\&.
-.PP
-\fB\fCtf \-\-fields 1 access_log\fR
+.SH Performance issues
 .PP
-The same effect could be achieved with
+Since the effect of topfew can be exactly duplicated with a combination of \fB\fCawk\fR, \fB\fCgrep\fR, \fB\fCsed\fR and \fB\fCsort\fR, you wouldn’t be using it if you didn’t care about performance.
+Topfew is quite highly tuned and pushes your computer’s I/O subsystem and Go runtime hard.
+Therefore, the observed effects of combinations of options can vary dramatically from system to system.
 .PP
-\fB\fCawk '{print $1}' access_log | sort | uniq \-c | sort \-rn | head\fR
+For example, if I want to list the top records containing the string \fB\fCexample\fR from a file named \fB\fCbig\-file\fR I could do either of the following:
 .PP
-But \fBtf\fP is usually much faster.
-.PP
-Do the same, but exclude high\-traffic bots (omitting the filename).
-.PP
-\fB\fCtf \-fields 1 \-vgrep googlebot \-vgrep bingbot\fR
-.PP
-Most popular IP addresses from May 2020.
-.PP
-\fB\fCtf \-fields 1 \-grep '\\[../May/2020'\fR
+.RS
+.nf
+tf \-g example big\-file
+grep example big\-file | tf
+.fi
+.RE
 .PP
-Most popular hour/minute of the day for retrievals.
+When I benchmark topfew on a modern Apple\-Silicon Mac and an elderly spinning\-rust Linux VPS, I observe that the first option is faster on Mac, the second on Linux.
 .PP
-\fB\fCtf \-fields 4 \-sed "\\\\[" "" \-sed '^[^:]*:' '' \-sed ':..$' ''\fR
+Only one performance issue is uncomplicated: Topfew will \fBalways\fP run faster on a named file than a standard\-input stream.
 .SH Credits
 .PP
 Tim Bray created version 0.1 of Topfew, and the path toward 1.0 was based chiefly on ideas stolen from Dirkjan Ochtman and contributed by Simon Fell.
