From ba69bce7baa8bcd2b03a1ae2afc8c0076d381e74 Mon Sep 17 00:00:00 2001
From: Tim Bray
Date: Thu, 11 Apr 2024 14:02:17 -0700
Subject: [PATCH] updated man page (autogenerated from README, should probably be part of Makefile)

Signed-off-by: Tim Bray
---
 doc/tf.1 | 54 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 36 insertions(+), 18 deletions(-)

diff --git a/doc/tf.1 b/doc/tf.1
index 1dfb917..6ff1b1d 100644
--- a/doc/tf.1
+++ b/doc/tf.1
@@ -1,6 +1,29 @@
 .TH topfew
 .PP
 A program that finds and prints out the top few records in which a certain field or combination of fields occurs most frequently.
+.SH Examples
+.PP
+To find the IP address that most commonly hits your web site, given an Apache logfile named \fB\fCaccess_log\fR\&.
+.PP
+\fB\fCtf \-\-fields 1 access_log\fR
+.PP
+The same effect could be achieved with
+.PP
+\fB\fCawk '{print $1}' access_log | sort | uniq \-c | sort \-rn | head\fR
+.PP
+But \fBtf\fP is usually much faster.
+.PP
+Do the same, but exclude high\-traffic bots (omitting the filename).
+.PP
+\fB\fCtf \-\-fields 1 \-\-vgrep googlebot \-\-vgrep bingbot\fR
+.PP
+Most popular IP addresses from May 2020.
+.PP
+\fB\fCtf \-\-fields 1 \-grep '\\[../May/2020'\fR
+.PP
+Most popular hour/minute of the day for retrievals.
+.PP
+\fB\fCtf \-\-fields 4 \-\-sed "\\\\[" "" \-\-sed '^[^:]*:' '' \-\-sed ':..$' ''\fR
 .SH Usage
 .PP
 .RS
@@ -69,29 +92,24 @@ The default is the result of the Go \fB\fCruntime.NumCPU()\fR calls and often pr
 \fB\fC\-h\fR, \fB\fC\-help\fR, \fB\fC\-\-help\fR
 .PP
 Describes the function and options of \fBtf\fP\&.
-.SH Examples
-.PP
-To find the IP address that most commonly hits your web site, given an Apache logfile named \fB\fCaccess_log\fR\&.
-.PP
-\fB\fCtf \-\-fields 1 access_log\fR
+.SH Performance issues
 .PP
-The same effect could be achieved with
+Since the effect of topfew can be exactly duplicated with a combination of \fB\fCawk\fR, \fB\fCgrep\fR, \fB\fCsed\fR and \fB\fCsort\fR, you wouldn’t be using it if you didn’t care about performance.
+Topfew is quite highly tuned and pushes your computer’s I/O subsystem and Go runtime hard.
+Therefore, the observed effects of combinations of options can vary dramatically from system to system.
 .PP
-\fB\fCawk '{print $1}' access_log | sort | uniq \-c | sort \-rn | head\fR
+For example, if I want to list the top records containing the string \fB\fCexample\fR from a file named \fB\fCbig\-file\fR I could do either of the following:
 .PP
-But \fBtf\fP is usually much faster.
-.PP
-Do the same, but exclude high\-traffic bots (omitting the filename).
-.PP
-\fB\fCtf \-fields 1 \-vgrep googlebot \-vgrep bingbot\fR
-.PP
-Most popular IP addresses from May 2020.
-.PP
-\fB\fCtf \-fields 1 \-grep '\\[../May/2020'\fR
+.RS
+.nf
+tf \-g example big\-file
+grep example big\-file | tf
+.fi
+.RE
 .PP
-Most popular hour/minute of the day for retrievals.
+When I benchmark topfew on a modern Apple\-Silicon Mac and an elderly spinning\-rust Linux VPS, I observe that the first option is faster on Mac, the second on Linux.
 .PP
-\fB\fCtf \-fields 4 \-sed "\\\\[" "" \-sed '^[^:]*:' '' \-sed ':..$' ''\fR
+Only one performance issue is uncomplicated: Topfew will \fBalways\fP run faster on a named file than a standard\-input stream.
 .SH Credits
 .PP
 Tim Bray created version 0.1 of Topfew, and the path toward 1.0 was based chiefly on ideas stolen from Dirkjan Ochtman and contributed by Simon Fell.