UNIX Shell Command Line Programming

Introduction to Command Line Programming.

(Image: Nautilus Pompilius. Wikimedia Commons, CC)

UNIX, Linux, MacOS, and other Unix-like operating systems provide a Command Line Interface, which is accessed through a Terminal emulator program.

The user interacts with the Operating System through a Shell, also known as a Command Line Interpreter, which reads and executes the lines of text entered by the user.

UNIX Shells

There is a large family of UNIX shells. We mention the two most widely used today, which come by default in Linux and MacOS systems:

Bourne Again Shell: /bin/bash (Linux systems default)
Z shell: /bin/zsh (MacOS systems default)

The Z Shell is largely compatible with the Bash Shell and offers similar functionality. To find out which Shell your Terminal starts by default, please type:

echo $SHELL (ENTER)

To check whether bash can be reached through the directories listed in your PATH environment variable, type which bash; it prints the full path of the program if it is found.

You can also use the find command to locate a program:

find / -name bash -print, which reads as: where to start searching (/), which filename to look for (bash), and then print each match.
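The lookups above can be combined into one short session. This is a minimal sketch: checking zsh as well as bash, limiting find to /bin and /usr/bin, and discarding error messages are our own assumptions to keep the example fast and quiet.

```shell
# Login shell recorded for the current user; it may differ from the shell
# that is interpreting these lines right now.
echo "Login shell: ${SHELL:-not set}"

# Look each shell up in the directories listed in PATH.
for name in bash zsh; do
    if shell_path=$(command -v "$name"); then
        echo "$name found at: $shell_path"
    else
        echo "$name is not in PATH"
    fi
done

# Search the file system instead of PATH. Starting from / can take minutes,
# so this sketch only looks in two common directories and discards
# "Permission denied" messages.
find /bin /usr/bin -name bash -print 2>/dev/null || true
```

command -v is used here instead of which because it is built into the shell; both answer the same question.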

One of the advantages of working with these Shells is that they inherit some editing commands from the GNU Emacs Editor. This allows us to move the cursor along the command line and edit its text.

For completeness we summarize these commands, because we will be using them extensively:

Command	Action
Positioning
`Ctrl+f`	Moves the cursor one character forward
`ESC`, `f`	Moves the cursor one word Forward
`Ctrl+b`	Moves the cursor one character backward
`ESC`, `b`	Moves the cursor one word Backward
`Ctrl+a`	Moves the cursor to the beginning of line
`Ctrl+e`	Moves the cursor to the end of line
`Ctrl+p`	Recalls the previous line in the command history
`Ctrl+n`	Recalls the next line in the command history
Memory buffer
`Ctrl+k`	Cuts the text from the cursor to the end of the line and stores it in memory (a.k.a. Kill; memory keeps only the most recent contents)
`Ctrl+y`	Inserts the contents of memory at the cursor position (a.k.a. Yank)

Keeping these commands in mind will make command line editing easier when needed.

Also, take advantage of the completion capability offered by the Shell (press the Tab key), for example when typing long file or directory names.

Pipes and redirecting output

Command	Action
`cat file | less`	Pipes the standard output of cat into the standard input of the next command, less
`ls -al | tee out.txt`	Sends output to the screen and at the same time writes it to out.txt
`echo 'Hello World!' > hello.txt`	Redirects standard output to the file hello.txt, overwriting it if it already exists
`echo 'Hello back!' >> hello.txt`	Appends the phrase to the end of the file hello.txt
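The four operators in the table can be tried together in a scratch directory. A minimal sketch, assuming mktemp is available; the directory and the use of wc are our own choices for illustration.

```shell
# Work in a throwaway directory so no existing file is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# '>' creates hello.txt, or empties it first if it already exists.
echo 'Hello World!' > hello.txt

# '>>' appends to the end of the file instead of emptying it.
echo 'Hello back!' >> hello.txt

# '|' feeds the standard output of cat into the standard input of wc.
line_count=$(cat hello.txt | wc -l)
echo "hello.txt has $line_count lines"

# tee copies its input both to the screen and to out.txt.
ls | tee out.txt
```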

Text processing in the Shell

A UNIX system typically provides at least the following tools for text processing, all of which can be run from the Shell:

cut: A command-line utility that cuts parts of each line from the specified files or from piped data and prints the result to standard output.
grep: A command-line utility that searches plain text for lines matching a regular expression.
sed: A stream editor command-line tool that parses and transforms text.
awk: A programming language for text processing and data extraction.
perl: A general-purpose UNIX scripting language, originally designed to make report processing easier.

In this tutorial we will focus on: cut, grep, sed and awk only. Perl is out of the scope of this workshop.

Downloading a common text file.

For this purpose, we will download a free text file from Project Gutenberg: The Raven, a poem by Edgar Allan Poe. (Please download this short text file from Github)

Cut - A command-line utility for extracting sections of a line.

Reading the Man Pages: man cut

Syntax:
cut OPTION... [FILE]...

Options description:
-f (--fields=LIST) - Select by specifying a field, a set of fields, or a range of fields (separated by "TAB" by default).
-b (--bytes=LIST) - Select by specifying a byte, a set of bytes, or a range of bytes.
-c (--characters=LIST) - Select by specifying a character, a set of characters, or a range of characters.

You can use one, and only one of the options listed above.

Other options are:
-d (--delimiter) - Specify a delimiter to use instead of the default “TAB” delimiter.
--complement - Display the complement of the selection, that is, everything except the selected bytes, characters, or fields.
-s (--only-delimited) - By default cut also prints the lines that contain no delimiter character; with -s those lines are suppressed.
--output-delimiter - By default cut uses the input delimiter as the output delimiter; this option sets a different output delimiter.

The cut command accepts zero or more input FILE names. When no FILE is given, or when FILE is -, cut reads the standard input.
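The less common options are easier to see on a tiny input. The sketch below assumes GNU cut (the version found on most Linux systems), since --complement and --output-delimiter are not available in every cut implementation; the three sample lines are made up for illustration.

```shell
# Three sample lines; the last one has no ':' delimiter at all.
sample=$(printf '%s\n' 'Lorem:ipsum:dolor' 'sit:amet:consectetur' 'no delimiter here')

# Without -s the undelimited line is passed through unchanged.
printf '%s\n' "$sample" | cut -d ':' -f 2

# With -s the undelimited line is dropped.
only_delimited=$(printf '%s\n' "$sample" | cut -s -d ':' -f 2)
printf '%s\n' "$only_delimited"

# --complement prints every field except the selected one (GNU cut).
complement=$(printf '%s\n' "$sample" | cut -s -d ':' -f 2 --complement)
printf '%s\n' "$complement"

# --output-delimiter changes the delimiter in the output only (GNU cut).
retagged=$(printf '%s\n' "$sample" | cut -s -d ':' -f 1,3 --output-delimiter=' / ')
printf '%s\n' "$retagged"

# '-' as FILE makes cut read the standard input explicitly.
printf '%s\n' "$sample" | cut -c 1-5 -
```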

Examples
`echo "Lorem ipsum dolor sit amet consectetur" | cut -d ' ' -f 1,3`
`echo "Lorem ipsum dolor sit amet consectetur" | cut -d ' ' -f 3-9`
`echo "Lorem ipsum dolor sit amet consectetur" | cut -c 3-9`
`echo "Lorem ipsum dolor sit amet consectetur" | cut -d ' ' -f -3`
`echo "Lorem, ipsum, dolor, sit, amet, consectetur" | cut -d ',' -f 3-`
(Execute the next 3 lines)
`echo "Lorem:ipsum:dolor:sit:amet:consectetur" > lorem.txt`
`echo "urna:consequat:felis:vehicula:class:ultricies:mollis:dictumst" >> lorem.txt`
`cut -d ':' -f 3-5 lorem.txt`
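For reference, this is what the examples should print, with the expected output written as comments. It is a sketch of the same commands; writing lorem.txt into a temporary directory is our own choice so that nothing in the current directory is touched.

```shell
line="Lorem ipsum dolor sit amet consectetur"

echo "$line" | cut -d ' ' -f 1,3    # Lorem dolor
echo "$line" | cut -d ' ' -f 3-9    # dolor sit amet consectetur (only 6 fields exist)
echo "$line" | cut -c 3-9           # rem ips
echo "$line" | cut -d ' ' -f -3     # Lorem ipsum dolor

# The delimiter is a single character, so the space after each comma
# stays attached to the field that follows it.
with_commas=$(echo "Lorem, ipsum, dolor, sit, amet, consectetur" | cut -d ',' -f 3-)
printf '[%s]\n' "$with_commas"      # [ dolor, sit, amet, consectetur]

# The same two-line lorem.txt, created in a temporary directory.
workdir=$(mktemp -d)
echo "Lorem:ipsum:dolor:sit:amet:consectetur" > "$workdir/lorem.txt"
echo "urna:consequat:felis:vehicula:class:ultricies:mollis:dictumst" >> "$workdir/lorem.txt"
fields_3_to_5=$(cut -d ':' -f 3-5 "$workdir/lorem.txt")
printf '%s\n' "$fields_3_to_5"
# dolor:sit:amet
# felis:vehicula:class
```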

Grep - print lines that match patterns

The grep filter searches file contents for a particular pattern of characters and displays all lines that contain that pattern. The pattern being searched for is called a regular expression. (The name grep comes from "globally search for a regular expression and print".)

We can check the Man Pages: man grep

 Syntax: 
 `grep` [options] pattern [files]
 
 Options Description: 
 -c : Prints only a count of the lines that match the pattern.
 -h : Displays the matched lines, but not the filenames.
 -i : Ignores case when matching.
 -l : Displays only the names of the files that contain a match.
 -n : Displays the matched lines and their line numbers.
 -v : Prints all the lines that do not match the pattern.
 -e exp : Specifies the expression to search for. Can be used multiple times.
 -f file : Takes patterns from file, one per line.
 -E : Treats the pattern as an extended regular expression (ERE).
 -w : Matches whole words only.
 -o : Prints only the matched parts of a matching line,
      with each such part on a separate output line.

 -A n : Prints each matching line and the n lines after it.
 -B n : Prints each matching line and the n lines before it.
 -C n : Prints each matching line and the n lines before and after it.
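Several of these options are easiest to compare on a very small file. The sketch below writes four opening lines of the poem to a file we call stanza.txt (a hypothetical name, used only here) and notes the expected results as comments.

```shell
workdir=$(mktemp -d)
f="$workdir/stanza.txt"
cat > "$f" <<'EOF'
Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore,
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
EOF

# -c counts matching LINES, while -o prints every match on its own line.
count_lines=$(grep -c 'rapping' "$f")          # 1 line contains "rapping"
count_hits=$(grep -o 'rapping' "$f" | wc -l)   # 2 occurrences in total

# -n adds line numbers and -i ignores case: lines 1 and 3 match.
numbered=$(grep -n -i 'while' "$f")
printf '%s\n' "$numbered"

# -v inverts the match: count the lines that do NOT contain "ing".
no_ing=$(grep -v -c 'ing' "$f")                # 2

# Context options around the single line that contains "quaint" (line 2).
after=$(grep -A 1 'quaint' "$f" | wc -l)       # 2 lines: the match and 1 after
around=$(grep -C 1 'quaint' "$f" | wc -l)      # 3 lines: 1 before, the match, 1 after
echo "$count_lines $count_hits $no_ing $after $around"
```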

Learning common regex metacharacters

Square brackets can be used to define a list or range of characters to be found. So:

[ABC] matches A or B or C.
[A-Z] matches any upper case letter.
[A-Za-z] matches any upper or lower case letter.
[A-Za-z0-9] matches any upper or lower case letter or any digit.

Then there are:

. matches any character.
\d matches any single digit. (This is Perl-style syntax; GNU grep accepts it only with the -P option, otherwise use [0-9].)
\w matches any word character (equivalent to [A-Za-z0-9_]).
\s matches any space, tab, or newline.
\ used to escape the following character when that character is a special character. So, for example, a regular expression that found .com would be \.com because . is a special character that matches any character.
^ is an “anchor” which asserts the position at the start of the line. So what you put after the caret will only match if they are the first characters of a line. The caret is also known as a circumflex.
$ is an “anchor” which asserts the position at the end of the line. So what you put before it will only match if they are the last characters of a line.
\b asserts that the pattern must match at a word boundary. Putting this either side of a word stops the regular expression matching longer variants of words. So:

the regular expression mark will match not only mark but also marking, market, unremarkable, and so on.
the regular expression \bword will match word, wordless, and wordlessly.
the regular expression comb\b will match comb and honeycomb but not combine.
the regular expression \brespect\b will match respect but not respectable or disrespectful.
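The word-boundary cases above can be checked directly with grep. A minimal sketch, assuming GNU grep, where \b is an extension to the basic syntax; the word list is simply the words used in the examples.

```shell
words=$(printf '%s\n' mark marking market unremarkable comb honeycomb combine respect respectable disrespectful)

# No boundary: every word that contains "mark" matches.
anywhere=$(printf '%s\n' "$words" | grep -c 'mark')        # 4

# Boundary on the left: "mark" must start a word.
at_start=$(printf '%s\n' "$words" | grep -c '\bmark')      # 3 (unremarkable is excluded)

# Boundary on the right: "comb" must end a word.
at_end=$(printf '%s\n' "$words" | grep 'comb\b')           # comb, honeycomb

# Boundaries on both sides: only the exact word matches.
exact=$(printf '%s\n' "$words" | grep '\brespect\b')       # respect

# The -w option gives the same result as wrapping the pattern in \b ... \b.
exact_w=$(printf '%s\n' "$words" | grep -w 'respect')      # respect
echo "$anywhere $at_start $exact $exact_w"
```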

Other useful special characters are:

* matches the preceding element zero or more times. For example, ab*c matches ac, abc, abbbc, etc.
+ matches the preceding element one or more times. For example, ab+c matches abc, abbbc but not ac.
? matches the preceding element zero or one time. For example, colou?r matches color and colour.
{VALUE} matches the preceding element exactly VALUE times; a range, say 1 to 6, is written {1,6} using the syntax {VALUE,VALUE}, e.g. \d{1,9} will match any run of one to nine digits.
| means or. For example, cat|dog matches cat or dog.
/i renders an expression case-insensitive in tools that write patterns as /pattern/flags; with grep, the same effect is obtained with the -i option.
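These quantifiers can be tried with grep -E, which enables the extended syntax where +, ?, {} and | need no backslash. The sketch below is only an illustration: the sample items are made up, -x (match the whole line) is added so that each result is easy to predict, and [0-9] is used in place of \d.

```shell
items=$(printf '%s\n' ac abc abbbc color colour 7 2024 1234567890)

star=$(printf '%s\n' "$items" | grep -E -x 'ab*c')          # ac, abc, abbbc
plus=$(printf '%s\n' "$items" | grep -E -x 'ab+c')          # abc, abbbc
optional=$(printf '%s\n' "$items" | grep -E -x 'colou?r')   # color, colour
digits=$(printf '%s\n' "$items" | grep -E -x '[0-9]{1,9}')  # 7, 2024 (the 10-digit item is too long)
either=$(printf '%s\n' "$items" | grep -E -x 'ac|7')        # ac, 7
nocase=$(printf '%s\n' "$items" | grep -i -x 'COLOR')       # color

printf '%s\n' "$star" "$plus" "$optional" "$digits" "$either" "$nocase"
```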

Examples	Description
`grep -i Raven TheRaven.txt`	Print lines having the string `Raven` ignoring case
`grep -ivc Raven TheRaven.txt`	Count the lines that do not contain the string `Raven`, ignoring case
`grep -n '^The' TheRaven.txt`	Print line number and lines beginning with `The`
`grep 'en!$' TheRaven.txt`	Print lines ending with `en!`
`grep -in '\bsmil' TheRaven.txt`	Print the line number and the lines containing a word that starts with `smil`, ignoring case
`grep -c '\bthe\b' TheRaven.txt`	Count the number of lines that include the word `the`
`grep -c 'ing\b' TheRaven.txt`	Count the number of lines that include a word ending with `ing`
`grep -E -w -i 'Raven|night' TheRaven.txt`	Print lines containing the whole word `Raven` or `night`, ignoring case

You can try this regular expressions online tool or this other

Sed (stream editor).

The sed command in UNIX stands for stream editor. It can perform many operations on a file, such as searching, finding and replacing, inserting, and deleting text.

We can check the Man Pages: man sed

Syntax:
sed OPTIONS... [SCRIPT] [INPUTFILE...] 

-n, --quiet, --silent. Suppress automatic printing of pattern space
-e script, --expression=script. Add the script to the commands to be executed
-f script-file, --file=script-file. Add the contents of script-file to the commands to be executed
--follow-symlinks. Follow symlinks when processing in place
-i[SUFFIX], --in-place[=SUFFIX]. Edit files in place (makes backup if extension supplied)
-l N, --line-length=N. Specify the desired line-wrap length for the `l' command
--posix. Disable all GNU extensions.
-r, --regexp-extended. Use extended regular expressions in the script.
-s, --separate. Consider files as separate rather than as a single continuous long stream.
-u, --unbuffered. Load minimal amounts of data from the input files and flush the output buffers more often

If no -e, --expression, -f, or --file option is given, then the first non-option argument is taken as the sed script to interpret. All remaining arguments are names of input files; if no input files are specified, then the standard input is read.
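The options used most often in this tutorial are -n, -e and -i. A minimal sketch on a three-line file we call birds.txt (a made-up name and content, for illustration only):

```shell
workdir=$(mktemp -d)
printf '%s\n' 'one raven' 'two doves' 'three ravens' > "$workdir/birds.txt"

# Default behaviour: every line is printed, changed or not.
sed 's/raven/omen/' "$workdir/birds.txt"

# -n plus the p flag: only the lines where a substitution happened are printed.
changed=$(sed -n 's/raven/omen/p' "$workdir/birds.txt")
printf '%s\n' "$changed"

# Several -e scripts are applied, in order, to each line.
both=$(sed -e 's/one/1/' -e 's/two/2/' "$workdir/birds.txt")
printf '%s\n' "$both"

# -i.bak edits the file in place and keeps the original as birds.txt.bak.
sed -i.bak 's/raven/omen/' "$workdir/birds.txt"
cat "$workdir/birds.txt" "$workdir/birds.txt.bak"
```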

Examples	Description
`sed -n 's/raven/omen/p' TheRaven.txt`	Substitute the first occurrence of the word raven on each line with omen.
	`-n` suppresses automatic printing, so together with the `p` flag only the changed lines are shown
`sed -n 's/floor/ground/gp' TheRaven.txt`	Do a global substitute of word floor with ground
`sed -n 's/[Ff]loor/ground/gp' TheRaven.txt`	Do a global substitute of word Floor or floor with ground
`sed -n 's/floor/ground/gip' TheRaven.txt`	Do a global case insensitive substitute of word floor with ground
`sed -n 's/floor/ground/gip;s/raven/omen/gip' TheRaven.txt`	Do a double global case insensitive substitutions (using `;`)
`sed -n '5,9p' TheRaven.txt`	Print lines 5 thru 9.
`sed -n -e '5,10p' -e '19,24p' TheRaven.txt`	Prints lines 5-10 and 19-24
	`-e` allows adding multiple selections
`sed -n '1~3p' TheRaven.txt`	Starting from line 1, print every third line
`sed -n '/^And /p' TheRaven.txt`	Prints all lines that begin with the word `And` followed by a space
`sed 's/^\(.*\),\(.*\)$/\2,\1 /g' poets.txt`	Will invert the word order in a file of last name, first name entries
`gsed '/Di/a --> Inserted!' poets.txt`	Will append a line with the text `--> Inserted!` after each line that contains the expression Di
`sed 's/.*/--> Inserted &/' poets.txt`	Will insert `--> Inserted ` before the matched text (`&` stands for the match, here the whole line)
`sed 'G' poets.txt`	Will insert a blank line after each line of text
`sed '3d' poets.txt`	Will delete the 3rd line
`sed '4,5d' poets.txt`	Will delete lines 4 through 5
`sed -i'.bak' '/^.*Di.*$/d' poets.txt`	Will delete all lines containing the expression Di and keep a backup of the original file as poets.txt.bak
`sed '/^.*Di.*$/d' poets.txt > new_poets.txt`	Similar to the above, but poets.txt is left unchanged and the result is written to new_poets.txt
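The poets.txt examples can be reproduced without the workshop file. The sketch below assumes a three-line file in last name,first name form; its contents are hypothetical, since the real poets.txt is not shown here.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Dickinson,Emily' 'Poe,Edgar' 'Whitman,Walt' > "$workdir/poets.txt"

# \( \) capture the text before and after the comma; \2 and \1 swap them.
swapped=$(sed 's/^\(.*\),\(.*\)$/\2,\1/' "$workdir/poets.txt")
printf '%s\n' "$swapped"          # Emily,Dickinson  Edgar,Poe  Walt,Whitman

# & stands for the whole match, so text can be put in front of every line.
prefixed=$(sed 's/.*/--> &/' "$workdir/poets.txt")
printf '%s\n' "$prefixed"

# G appends an empty line after each line: 3 lines become 6.
spaced=$(sed 'G' "$workdir/poets.txt" | wc -l)

# Delete the lines containing Di, writing the result to a new file.
# Without -i the original poets.txt is not modified.
sed '/Di/d' "$workdir/poets.txt" > "$workdir/new_poets.txt"
cat "$workdir/new_poets.txt"      # Poe,Edgar  Whitman,Walt
```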

Try this sed (stream editor) online tool.

Note on MacOS sed: MacOS ships with the BSD version of sed, which is not 100% the same as the GNU sed found on Linux. Use Brew to install gnu-sed, then use gsed. Brew (brew.sh) is a software package installation system for MacOS and Linux.
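A script that must run on both systems can pick the right command at start-up. This is only a sketch of one possible approach: the variable name SED is our own, and the check relies on GNU sed accepting --version while BSD sed rejects it.

```shell
# Prefer a sed that understands GNU extensions.
if sed --version >/dev/null 2>&1; then
    SED=sed        # plain sed is already GNU sed (typical on Linux)
elif command -v gsed >/dev/null 2>&1; then
    SED=gsed       # GNU sed installed next to the system sed (e.g. via Brew)
else
    SED=""         # no GNU sed found; GNU-only features are skipped below
fi

if [ -n "$SED" ]; then
    # first~step addresses are a GNU extension: print lines 1, 4, 7, ...
    every_third=$(printf '%s\n' a b c d e f g | "$SED" -n '1~3p')
    printf '%s\n' "$every_third"
else
    echo "GNU sed not available"
fi
```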

AWK (pattern scanning and processing language)

AWK is a full scripting language, as well as a complete text manipulation toolkit for the command line. The awk command was named using the initials of the three people who wrote the original version in 1977: Alfred Aho, Peter Weinberger, and Brian Kernighan.

Check the Man Pages: man awk.

What can be done with AWK.

AWK Operations:
(a) Scans a file line by line
(b) Splits each input line into fields
(c) Compares input line/fields to pattern
(d) Performs action(s) on matched lines
Useful For:
(a) Transform data files
(b) Produce formatted reports
Programming Constructs:
(a) Format output lines
(b) Arithmetic and string operations
(c) Conditionals and loops
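The operations listed above fit in a few lines of AWK. The following sketch uses made-up input of the form name, birth year, death year; the pattern selects some lines, the action does arithmetic on the fields and formats a small report, and the END block uses a conditional.

```shell
# Hypothetical sample data: name, year of birth, year of death.
data=$(printf '%s\n' 'Poe 1809 1849' 'Dickinson 1830 1886' 'Whitman 1819 1892')

report=$(printf '%s\n' "$data" | awk '
    # Pattern: only lines whose second field is below 1820 are processed.
    $2 < 1820 {
        age = $3 - $2                              # arithmetic on fields
        printf "%-10s lived %d years\n", $1, age   # formatted output line
        total += age
        n++
    }
    # END runs once, after the last input line.
    END {
        if (n > 0)
            printf "average: %.1f\n", total / n
    }
')
printf '%s\n' "$report"
```

With this input the report has one line for Poe (40 years), one for Whitman (73 years), and a final average of 56.5.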

Syntax:
awk options 'selection_criteria {action}' input-file > output-file

Options:  
-f program-file : Reads the AWK program source from the file 
                  program-file, instead of from the 
                  first command line argument.
-F fs : Use fs for the input field separator
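Both options can be combined. In this sketch the AWK program lives in its own file, and the input is split on commas; the file names swap.awk and poets.csv and their contents are hypothetical.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Dickinson,Emily' 'Poe,Edgar' > "$workdir/poets.csv"

# Store the AWK program in its own file.
cat > "$workdir/swap.awk" <<'EOF'
# Print the second field (first name), then the first field (last name).
{ print $2, $1 }
EOF

# -F, splits each line on commas; -f reads the program from swap.awk.
swapped=$(awk -F, -f "$workdir/swap.awk" "$workdir/poets.csv")
printf '%s\n' "$swapped"
# Emily Dickinson
# Edgar Poe
```

Keeping the program in a file pays off once it grows beyond a line or two, because no shell quoting is needed inside it.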

Examples	Description
`who | awk '{print $3, $4, $5}'`	Prints fields 3, 4 and 5 of the `who` command output
	Field `$0` represents the whole line; `NF` is the number of fields, so `$NF` represents the last one
`date | awk '{print $2,$3,$NF}'`	Extracts the month, day and year from the `date` command output (with the default English date format)
`date | awk 'BEGIN {OFS="/"} {print $2,$3,$NF}'`	Will insert an output field separator in the `date` command output
`awk 'BEGIN {print "The Dickinson Family of Poets"} {print $0}' poets.txt`	Adds a text at the beginning
`date | awk 'BEGIN {print "Today is:"} {print $2,$3, $NF}'`	Prints Today is: followed by the date
`awk -F, '{print $1,$2}' poets.txt`	Prints the two fields of the file, with field separator `,`
`awk 'BEGIN { print sqrt(625)}'`	AWK can compute mathematical expressions
`awk '/^And/ {print $0}' TheRaven.txt`	Print all lines starting with the word `And`
`awk '/eyes/{print}' TheRaven.txt`	Print all lines containing the word `eyes`
`awk 'NR==5, NR==11 {print NR ".- ",$0}' TheRaven.txt`	Prints from line 5 thru 11, with line number followed with ".- "
`awk 'BEGIN { for(i=1;i<=6;i++) print "square of", i, "is",i*i; }'`	Prints the squares of 1 thru 6
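Because the output of date and who changes from one machine to another, the same ideas can be checked on fixed input. The sketch below uses a hard-coded sample line in the usual English date layout (an assumption; the real date output depends on the system and locale), with the expected results as comments.

```shell
sample='Mon Jan  5 12:00:00 UTC 2026'

# NF is the number of fields; $NF is the content of the last field.
fields=$(echo "$sample" | awk '{print NF, $NF}')                        # 6 2026

# OFS set in BEGIN is used wherever print separates items with a comma.
slashed=$(echo "$sample" | awk 'BEGIN {OFS="/"} {print $2, $3, $NF}')   # Jan/5/2026

# A range pattern: from the line where NR==2 through the line where NR==4.
ranged=$(printf '%s\n' a b c d e | awk 'NR==2, NR==4 {print NR ".- " $0}')

# A BEGIN-only program reads no input at all.
squares=$(awk 'BEGIN { for (i = 1; i <= 3; i++) print "square of", i, "is", i*i }')

printf '%s\n' "$fields" "$slashed" "$ranged" "$squares"
```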