From 236539aa690bc562f319187f6d432f60847f5973 Mon Sep 17 00:00:00 2001 From: Roland Walker Date: Tue, 26 Dec 2023 08:35:43 -0500 Subject: [PATCH] copyedit README up to --chomp section --- README.md | 181 +++++++++++++++++++++++++++--------------------------- 1 file changed, 91 insertions(+), 90 deletions(-) diff --git a/README.md b/README.md index 7e887e3..659e5bd 100644 --- a/README.md +++ b/README.md @@ -28,7 +28,7 @@ $ cat /var/log/secure | teip -c 1-15 -- date -f- +%s $ cat file | teip -g HELLO -- sed 's/WORLD/EARTH/' ``` -* Make characters upper case on the 2nd field of a CSV (RFC4180) +* Make characters upper case in the 2nd field of a CSV (RFC4180) ```bash $ cat file.csv | teip --csv -f 2 -- tr a-z A-Z @@ -47,6 +47,7 @@ $ cat access.log | teip -e 'grep -n -C 3 hello' -- sed 's/./@/g' ``` ## Performance enhancement + `teip` allows a command to focus on its own task. Here is a comparison of the processing time to replace approx 761,000 IP addresses with dummy ones in a 100 MiB text file. @@ -55,22 +56,21 @@ Here is a comparison of the processing time to replace approx 761,000 IP address benchmark bar chart

-See detail on wiki > Benchmark. +See detail at wiki > Benchmark. ## Features * Taping: Help the command "do one thing well" - - Bypassing a partial range of standard input to any command whatever you want - - The targeted command just handles bypassed parts of the standard input - - Flexible methods for selecting a range (Select like AWK, `cut` or `grep`) + - Passing a partial range of the standard input to any command — whatever you want + - The targeted command just actions the passed parts of the standard input + - Flexible methods for selecting a range (Select like `awk`, `cut` or `grep`) -* High performer - - The targeted command's standard input/output are intercepted by multiple `teip`'s threads asynchronously. - - If general UNIX commands on your environment can process a few hundred MB files in a few seconds, then `teip` can do the same or better performance. +* High performance + - The targeted command's standard input/output are written to and read from by multiple `teip` threads asynchronously. + - If general UNIX commands in your environment can process a few-hundred MB file in a few seconds, then `teip` can do the same or better performance. ## Installation - ### macOS (x86_64, ARM64) / Linux (x86_64) Install [Homebrew](https://brew.sh/), and @@ -121,14 +121,13 @@ Files whose filenames end with `sha256` have hash values listed. - ### Windows (x86_64) Download installer from [here](https://github.com/greymd/teip/releases/download/v2.3.0/teip_installer-2.3.0-x86_64-pc-windows-msvc.exe). -See [Wiki > Use on Windows](https://github.com/greymd/teip/wiki/Use-on-Windows) in detail. +See [Wiki > Use on Windows](https://github.com/greymd/teip/wiki/Use-on-Windows) for detail. ### Other architectures @@ -136,7 +135,7 @@ See [Wiki > Use on Windows](https://github.com/greymd/teip/wiki/Use-on-Windows) Check the [latest release page](https://github.com/greymd/teip/releases/tag/v2.3.0) for executables for the platform you are using. 
-If not, please build it from source. +If not present, please build teip from source. ### Build from source @@ -147,7 +146,7 @@ cargo install teip ``` To enable Oniguruma regular expression (`-G` option), build with `--features oniguruma` option. -Please make sure `libclang` shared library is on your environment in advance. +Please make sure the `libclang` shared library is available in your environment. ```bash ### Ubuntu @@ -178,30 +177,30 @@ USAGE: teip -e [-svz] [--] [...] OPTIONS: - -g Bypassing lines that match the regular expression - -o -g bypasses only matched parts + -g Act on lines that match the regular expression . + -o -g acts on only matched ranges. -G -g interprets Oniguruma regular expressions. - -c Bypassing these characters - -l Bypassing these lines - -f Bypassing these white-space separated fields - -d Use for field delimiter of -f - -D Use regular expression for field delimiter of -f + -c Act on these characters. + -l Act on these lines. + -f Act on these white-space separated fields. + -d Use for the field delimiter of -f. + -D Use regular expression for the field delimiter of -f --csv -f interprets as field number of a CSV according to - RFC 4180, instead of white-space separated fields - -e Execute on another process that will receive identical - standard input as the teip, and numbers given by the result - are used as line numbers for bypassing + RFC 4180, instead of whitespace separated fields. + -e Execute in another process that will receive identical + standard input as the main teip command, emitting numbers to be + used as line numbers for actioning. FLAGS: - -h, --help Prints help information - -V, --version Prints version information - -s Execute new command for each bypassed chunk - --chomp Command spawned by -s receives standard input without trailing - newlines - -I Replace the with bypassed chunk in the - then -s is forcefully enabled. 
- -v Invert the range of bypassing - -z Line delimiter is NUL instead of a newline + -h, --help Prints help information. + -V, --version Prints version information. + -s Execute a new command for each actioned chunk. + --chomp The command spawned by -s receives the standard input without + trailing newlines. + -I Replace the with the actioned chunk in , + implying -s. + -v Invert the range of actioning. + -z Line delimiter is NUL instead of a newline. ALIASES: -g @@ -214,20 +213,20 @@ ALIASES: ## Getting Started -Try this at first. +Try this at first: ```bash $ echo "100 200 300 400" | teip -f 3 ``` -The result is almost the same as the input but "300" is highlighted and surrounded by `[...]`. -Because `-f 3` specifies the 3rd field of space-separated input. +The result is almost the same as the input, but "300" is highlighted and surrounded by `[...]`, +because `-f 3` specifies the 3rd field of space-separated input. ```bash 100 200 [300] 400 ``` -Understand that the area enclosed in `[...]` is a **hole** on the masking tape. +Understand that the area enclosed in `[...]` is a **hole** in the masking tape. @@ -238,44 +237,45 @@ $ echo "100 200 300 400" | teip -f 3 sed 's/./@/g' ``` The result is as below. -Highlight and `[...]` is gone then. +The highlight and `[...]` will not be present when a command is added. ``` 100 200 @@@ 400 ``` -As you can see, the `sed` only processed the input in the "hole" and ignores masked parts. -Technically, `teip` passes only highlighted part to the `sed` and replaces it with the result of the `sed`. +As you can see, the `sed` command only acted on the input defined by the "hole" and ignored the masked +parts. Technically, `teip` passes only the highlighted part to the `sed` process, and replaces the +highlighted part with the result of the `sed` command. -Off-course, any command whatever you like can be specified. +Of course, any command you like can be specified. It is called the **targeted command** in this article. 
-Let's try the `cut` as the targeted command to extract the first character only. +Let's try `cut` as the targeted command, to extract the first character only. ```bash $ echo "100 200 300 400" | teip -f 3 cut -c 1 teip: Invalid arguments. ``` -Oops? Why is it failed? +Oops! Why did this fail? -This is because the `cut` uses the `-c` option. -The option of the same name is also provided by `teip`, which is confusing. +This is because the `cut` command uses the `-c` option. +An option of the same name is also provided by `teip`, which is confusing. -When entering a targeted command with `teip`, it is better to enter it after `--`. -Then, `teip` interprets the arguments after `--` as the targeted command and its argument. +When specifying a targeted command to `teip`, it is better to give it after `--`. +Then, `teip` interprets any arguments after `--` as the targeted command and its arguments. ```bash $ echo "100 200 300 400" | teip -f 3 -- cut -c 1 100 200 3 400 ``` -Great, the first character `3` is extracted from `300`! +Great: the first character `3` is extracted from `300`! -Although `--` is not always necessary, it is always better to be used. -So, `--` is used in all the examples from here. +Although `--` is not always necessary, it is always better to use it. +So, `--` is used in all the examples from here on. -Now let's double this number with the `awk`. +Now let's double this number with `awk`. The command looks like the following (Note that the variable to be doubled is not `$3`). ```bash @@ -283,20 +283,21 @@ $ echo "100 200 300 400" | teip -f 3 -- awk '{print $1*2}' 100 200 600 400 ``` -OK, the result went from 300 to 600. +OK, the selection in the "hole" went from 300 to 600. -Now, let's change `-f 3` to `-f 3,4` and run it. +Now, let's change `-f 3` to `-f 3,4` and run teip. ```bash $ echo "100 200 300 400" | teip -f 3,4 -- awk '{print $1*2}' 100 200 600 800 ``` -The numbers in the 3rd and 4th were doubled!
+The numbers in the 3rd and 4th fields were doubled! -As some of you may have noticed, the argument of `-f` is compatible with the __LIST__ of `cut`. +As you may have noticed, the argument to `-f` is compatible with the __LIST__ of `cut`. +You can refer to `cut --help` to see how it works. -Let's see how it works with `cut --help`. +Examples: ```bash $ echo "100 200 300 400" | teip -f -3 -- sed 's/./@/g' @@ -311,8 +312,8 @@ $ echo "100 200 300 400" | teip -f 1- -- sed 's/./@/g' ## Select range by character -The `-c` option allows you to specify a range by character-base. -The below example is specifing 1st, 3rd, 5th, 7th characters and apply the `sed` command to them. +The `-c` option allows you to specify a range by character. +The below example is specifying the 1st, 3rd, 5th, 7th characters and applying the `sed` command to them. ```bash $ echo ABCDEFG | teip -c 1,3,5,7 @@ -322,13 +323,13 @@ $ echo ABCDEFG | teip -c 1,3,5,7 -- sed 's/./@/' @B@D@F@ ``` -As same as `-f`, `-c`'s argument is compatible with `cut`'s __LIST__. +Like `-f`, the argument to `-c` is compatible with `cut`'s __LIST__. -## Processing delimited text like CSV, TSV +## Processing delimited text like CSV and TSV The `-f` option recognizes delimited fields [like `awk`](https://www.gnu.org/software/gawk/manual/html_node/Regexp-Field-Splitting.html) by default. -The continuous white spaces (all forms of whitespace categorized by [Unicode](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt)) is interpreted as a single delimiter. +Any continuous whitespace (all forms of whitespace categorized by [Unicode](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt)) is interpreted as a single delimiter. ```bash $ printf "A       B \t\t\t\   C \t D" | teip -f 3 -- sed s/./@@@@/ @@ -337,16 +338,16 @@ A       B @@@@   C D This behavior might be inconvenient for the processing of CSV and TSV. -However, the `-d` option in conjunction with the `-f` can be used to specify a delimiter. 
-Now you can process the CSV file like this. +However, the `-d` option in conjunction with `-f` can be used to specify a delimiter. +You can process a simple CSV file like this: ```bash $ echo "100,200,300,400" | teip -f 3 -d , -- sed 's/./@/g' 100,200,@@@,400 ``` -In order to process TSV, the TAB character need to be typed. -If you are using Bash, type `$'\t'` which is one of [ANSI-C Quoting](https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html). +In order to process TSV, the TAB character must be given at the command line. +If you are using Bash, type `$'\t'` which is in the form of [ANSI-C Quoting](https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html). ```bash $ printf "100\t200\t300\t400\n" | teip -f 3 -d $'\t' -- sed 's/./@/g' @@ -354,7 +355,7 @@ $ printf "100\t200\t300\t400\n" | teip -f 3 -d $'\t' -- sed 's/./@/g' ``` `teip` also provides `-D` option to specify an extended regular expression as the delimiter. -This is useful when you want to ignore consecutive delimiters, or when there are multiple types of delimiters. +This is useful when you want to ignore consecutive delimiters, or when there are multiple types of delimiter. ```bash $ echo 'A,,,,,B,,,,C' | teip -f 2 -D ',+' @@ -366,14 +367,14 @@ $ echo "1970-01-02 03:04:05" | teip -f 2-5 -D '[-: ]' 1970-[01]-[02] [03]:[04]:05 ``` -The regular expression of TAB character (`\t`) can also be specified with the `-D` option. +The TAB character regular expression (`\t`) can also be specified with the `-D` option. ``` $ printf "100\t200\t300\t400\n" | teip -f 3 -D '\t' -- sed 's/./@/g' 100 200 @@@ 400 ``` -Regarding available notations of the regular expression, refer to [regular expression of Rust](https://docs.rs/regex/1.3.7/regex/). +For the available regular expression notations, refer to [regular expression of Rust](https://docs.rs/regex/1.3.7/regex/). 
## Complex CSV processing @@ -389,7 +390,7 @@ Yui Nagomi,"Nagomi Street 456, Nagomitei, Oishina town",26930-0312 With `--csv`, teip will parse the input as a CSV file according to [RFC4180](https://www.rfc-editor.org/rfc/rfc4180). Thus, you can use `-f` to specify column numbers for CSV files with complex structures. -For example, the CSV just mentioned above will have a "hole" as shown below. +For example, the CSV above will have a "hole" as shown below. ``` $ cat tests/sample.csv | teip --csv -f2 @@ -412,14 +413,14 @@ Yui Nagomi,"@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@",26930-0312 "Conectol Motimotit Hooklala Glycogen Comex II a.k.a ""Kome kome""","@@@@@@@@@@@",513123 ``` -Note for `--csv` option: +Notes for the `--csv` option: -* Double quotation `"` surrounding fields are also included in the holes. -* Escaped double quotes `""` are treated as is; two double quotes `""` are given as input to the targeted command. +* Double quotes `"` surrounding fields are also included in the holes. +* Escaped double quotes `""` are treated as-is; two double quotes `""` are given as input to the targeted command. * Fields containing newlines will have multiple holes, separated by newlines, instead of a single hole. - * However, if the `-s` or `-z` option is used, it is treated as a single hole, including line breaks. + * However, if the `-s` or `-z` option is used, such a field is treated as a single hole, and line breaks are included. -## Matching with Regular Expression +## Matching with Regular Expressions You can also use `-g` to select a specific line matching a regular expression as the hole location. @@ -431,7 +432,7 @@ ABC1 ``` By default, the entire line containing the pattern is the range of holes. -With the -o option, the range of holes will be only at matched range. +With the -o option, the range of the holes will only cover the matched range.
```bash $ echo -e "ABC1\nEFG2\nHIJ3" | teip -og '[GJ]\d' @@ -440,9 +441,9 @@ EF[G2] HI[J3] ``` -Note that `-og` is one of the useful idiom and frequently used in this manual. +Note that `-og` is one of the most useful idioms and is frequently used in this manual. -Here is an example of using `\d` which matches numbers. +Here is an example using `\d`, which matches numbers. ```bash $ echo ABC100EFG200 | teip -og '\d+' @@ -452,17 +453,17 @@ $ echo ABC100EFG200 | teip -og '\d+' -- sed 's/.*/@@@/g' ABC@@@EFG@@@ ``` -This feature is quite versatile and can be useful for handling the file that has no fixed form like logs, markdown, etc. +This feature is quite versatile and can be useful for handling files that have no fixed form such as logs, markdown, etc. ## What commands are appropriate? -`teip` bypasses the string in the hole line by line so that each hole is one line of input. +`teip` passes the strings from the hole line-by-line, so that each hole is one line of input. Therefore, a targeted command must follow the below rule. * **A targeted command must print a single line of result for each line of input.** -In the simplest example, the `cat` command always succeeds. -Because the `cat` prints the same number of lines against the input. +In the simplest example, the `cat` command always succeeds, +because the `cat` prints the same number of lines as it is given in input. ```bash $ echo ABCDEF | teip -og . -- cat @@ -485,7 +486,7 @@ $ echo $? 1 ``` -`teip` could not get the result corresponding to the hole of D, E, and F. +`teip` did not receive results corresponding to the holes of D, E, and F. That is why the above example fails. If an inconsistency occurs, `teip` will exit with the error message. 
@@ -497,15 +498,15 @@ To learn more about `teip`'s behavior, see [Wiki > Chunking](https://github.com/ ### Solid mode (`-s`) -If you want to use a command that does not satisfy the condition, **"A targeted command must print a single line of result for each line of input"**, enable "Solid mode" which is available with the `-s` option. +If you want to use a command that does not satisfy the condition, **"A targeted command must print a single line of result for each line of input"**, enable "Solid mode" with the `-s` option. -Solid mode spawns the targeted command for each hole and executes it each time. +Solid mode spawns the targeted command multiple times: once for each hole in the input. ```bash $ echo ABCDEF | teip -s -og . -- grep '[ABC]' ``` -In the above example, understand the following commands are executed in `teip`'s internal procedure. +In the above example, understand that the following commands are executed by `teip` internally: ```bash $ echo A | grep '[ABC]' # => A @@ -517,7 +518,7 @@ $ echo F | grep '[ABC]' # => Empty ``` The empty result is replaced with an empty string. -Therefore, D, E, and F are replaced with empty as expected. +Therefore, D, E, and F are replaced with the empty string. ```bash $ echo ABCDEF | teip -s -og . -- grep '[ABC]' @@ -527,18 +528,18 @@ $ echo $? 0 ``` -However, this option is not suitable for processing large files because of its high processing overhead, which can significantly degrade performance. +However, this option is not suitable for processing large files because of its high overhead, which can significantly degrade performance. #### Solid mode with placeholder (`-I `) -If you want to use the contents of the hole as an argument of the targeted command, use the `-I` option. +If you want to use the contents of the hole as an argument to the targeted command, use the `-I` option. ```bash $ echo AAA BBB CCC | teip -f 2 -I @ -- echo '[@]' AAA [BBB] CCC ``` -`` can be any strings and multiple characters are allowed. 
+`` can be any string, and multiple characters are allowed. ```bash $ seq 5 | teip -f 1 -I NUMBER -- awk 'BEGIN{print NUMBER * 3}' @@ -554,7 +555,7 @@ Therefore, it is not suitable for processing huge files. In addition, the targeted command does not get any input from stdin. The targeted command is expected to work without stdin. -#### Solid mode with `--chomp` +#### Solid mode with `--chomp` If `-s` option does not work as expected, `--chomp` may be helpful.