UNIX Shell Command Line Programming
(Image: Nautilus Pompilius. Wikimedia Commons, CC)
UNIX, Linux, MacOS and other Unix-like operating systems include a command line interface, which is reached through a terminal emulator program.
The user interacts with the operating system through a shell, also known as a command line interpreter, which interprets the lines of text entered by the user.
There is a wide family of UNIX shells. We mention the two most commonly used ones, which come by default in Linux and MacOS systems:
- Bourne Again Shell: `/bin/bash` (Linux systems default)
- Z shell: `/bin/zsh` (MacOS systems default)
The Z shell is largely backward compatible with the Bash shell and offers similar functionality. To find out which shell is set as your default login shell, type `echo $SHELL` and press ENTER.
To check whether `bash` is available in one of the directories listed in your `PATH` environment variable, type `which bash`.
You can also use the `find` command to locate a program: `find / -name bash -print`, which reads as: where to start searching, what filename to look for, and then print each match.
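The commands above can be combined into a short sketch. It assumes a POSIX shell and looks up `sh` rather than `bash`, because `sh` is present on practically every system; replace the name to look for another program. The search is limited to `/bin` and `/usr/bin` (an assumption made here to keep it fast) instead of the whole filesystem.

```shell
# Show the login shell recorded in the SHELL environment variable.
echo "Login shell: ${SHELL:-not set}"

# Ask the shell where it would find a program in PATH.
# 'command -v' is a POSIX alternative to 'which'.
sh_path="$(command -v sh)"
echo "sh resolves to: $sh_path"

# Search two common directories instead of the whole filesystem (/),
# and discard "Permission denied" and similar messages.
found="$(find /bin /usr/bin -name sh -print 2>/dev/null || true)"
echo "find located: $found"
```

Searching from `/` also works, but it can take a long time and prints many permission errors unless they are redirected with `2>/dev/null`.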
One of the advantages of working with these shells is that they inherit some editing commands from the GNU Emacs editor. This allows us to move the cursor and edit the command line text.
For completeness we summarize these commands, because we will be using them extensively:
Command | Action |
---|---|
Positioning | |
Ctrl+f | Moves the cursor one character forward |
ESC, f | Moves the cursor one word forward |
Ctrl+b | Moves the cursor one character backward |
ESC, b | Moves the cursor one word backward |
Ctrl+a | Moves the cursor to the beginning of the line |
Ctrl+e | Moves the cursor to the end of the line |
Ctrl+p | Moves to the previous line in the command history |
Ctrl+n | Moves to the next line in the command history |
Memory buffer | |
Ctrl+k | Cuts the text from the cursor to the end of the line and stores it in memory (a.k.a. kill; memory keeps only the most recent contents) |
Ctrl+y | Inserts the contents of memory at the cursor position (a.k.a. yank) |
Keeping these commands in mind will ease our job of command line editing when needed.
Also, take advantage of the word completion capability offered by the shell (press the TAB key), for example in the case of long file or directory names.
Command | Action |
---|---|
`cat file \| less` | Pipes the standard output of `cat` into the standard input of the next command, `less` |
`ls -al \| tee out.txt` | Sends output to the screen and at the same time writes it to out.txt |
`echo 'Hello World!' > hello.txt` | Redirects standard output and writes it to the file hello.txt, overwriting the file if it already exists |
`echo 'Hello back!' >> hello.txt` | Appends the phrase to the end of the file hello.txt |
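The following sketch runs the four redirection forms together. It works in a temporary directory created with `mktemp -d` (an assumption made here so that no existing file is overwritten), and it uses `wc -l` in place of `less` so that the script does not stop to wait for input.

```shell
# Work in a throwaway directory so no existing files are overwritten.
workdir="$(mktemp -d)"

# '>' creates (or overwrites) hello.txt with the command's standard output.
echo 'Hello World!' > "$workdir/hello.txt"

# '>>' appends to the end of the same file.
echo 'Hello back!' >> "$workdir/hello.txt"

# '|' connects the output of cat to the input of wc, which counts lines.
# tr removes the padding spaces that some wc versions print.
line_count="$(cat "$workdir/hello.txt" | wc -l | tr -d ' ')"
echo "hello.txt has $line_count lines"

# tee copies its input both to the screen and to out.txt.
cat "$workdir/hello.txt" | tee "$workdir/out.txt"
```

After running it, `out.txt` holds the same two lines that were shown on the screen.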
A UNIX system includes at least the following command-line tools for text processing:
- `cut`: A command-line utility that cuts parts of lines from specified files or piped data and prints the result to standard output.
- `grep`: A command-line utility for searching plain text for lines that match a regular expression.
- `sed`: A stream editor command-line tool that parses and transforms text.
- `awk`: A programming language for text processing and data extraction.
- `perl`: A general-purpose scripting language that makes text processing and reporting easier.

In this tutorial we will focus on `cut`, `grep`, `sed` and `awk` only. Perl is out of the scope of this workshop.
For this purpose, we will use a free text file from Project Gutenberg: The Raven, a poem by Edgar Allan Poe. (Please download this short text file from Github.)
Reading the man pages: `man cut`
Syntax:
cut OPTION... [FILE]...
Options description:
-f (--fields=LIST) - Select by specifying a field, a set of fields, or a range of fields (separated by TAB by default).
-b (--bytes=LIST) - Select by specifying a byte, a set of bytes, or a range of bytes.
-c (--characters=LIST) - Select by specifying a character, a set of characters, or a range of characters.
You must use one, and only one, of the options listed above.
Other options are:
-d (--delimiter) - Specify a delimiter that will be used instead of the default TAB delimiter.
--complement - Display the complement of the selection.
-s (--only-delimited) - Do not print lines that contain no delimiter character. By default cut prints such lines unchanged.
--output-delimiter - Specify the output delimiter. By default cut uses the input delimiter as the output delimiter.
The cut command can accept zero or more input FILE names. When no FILE is given, or when FILE is -, cut reads the standard input.
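A small sketch of how `-s` and the `-` file argument behave. The two sample lines are made up for illustration: the first contains the `:` delimiter and the second does not.

```shell
# Two lines: the first contains the ':' delimiter, the second does not.
sample="$(printf 'alpha:beta:gamma\nno delimiter here\n')"

# Without -s, the line with no delimiter is printed unchanged.
printf '%s\n' "$sample" | cut -d ':' -f 2
# beta
# no delimiter here

# With -s, lines with no delimiter are suppressed.
printf '%s\n' "$sample" | cut -d ':' -s -f 2
# beta

# '-' as the FILE argument tells cut to read standard input explicitly.
printf '%s\n' "$sample" | cut -d ':' -s -f 1,3 -
# alpha:gamma
```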
Examples |
---|
echo "Lorem ipsum dolor sit amet consectetur" | cut -d ' ' -f 1,3 |
echo "Lorem ipsum dolor sit amet consectetur" | cut -d ' ' -f 3-9 |
echo "Lorem ipsum dolor sit amet consectetur" | cut -c 3-9 |
echo "Lorem ipsum dolor sit amet consectetur" | cut -d ' ' -f -3 |
echo "Lorem, ipsum, dolor, sit, amet, consectetur" | cut -d ',' -f 3- |
(Execute next 3 lines) |
echo "Lorem:ipsum:dolor:sit:amet:consectetur" > lorem.txt |
echo "urna:consequat:felis:vehicula:class:ultricies:mollis:dictumst" >> lorem.txt |
cut -d ':' -f 3-5 lorem.txt |
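For reference, here is a sketch of some of the examples above with the output each one is expected to print shown as a comment. The variable names `text` and `csv` are introduced here only to avoid repeating the sample strings.

```shell
text="Lorem ipsum dolor sit amet consectetur"

# Fields 1 and 3, using a space as the delimiter.
echo "$text" | cut -d ' ' -f 1,3      # Lorem dolor

# Fields 3 to 9; the line only has 6 fields, so cut stops at the last one.
echo "$text" | cut -d ' ' -f 3-9      # dolor sit amet consectetur

# Characters 3 to 9 of the line (the space counts as a character).
echo "$text" | cut -c 3-9             # rem ips

# From the first field up to field 3.
echo "$text" | cut -d ' ' -f -3       # Lorem ipsum dolor

# With ',' as the delimiter, the space after each comma stays in the field,
# so the output starts with a space: " dolor, sit, amet, consectetur"
csv="Lorem, ipsum, dolor, sit, amet, consectetur"
echo "$csv" | cut -d ',' -f 3-
```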
The `grep` filter searches file contents for a particular pattern of characters and displays all lines that contain that pattern. The pattern that is searched for is called a regular expression. (The name `grep` comes from the ed editor command g/re/p: "globally search for a regular expression and print".)
We can check the man pages: `man grep`
Syntax:
`grep [options] pattern [files]`
Options description:
-c : Prints only a count of the lines that match the pattern.
-h : Displays the matched lines, but not the filenames.
-i : Ignores case when matching.
-l : Displays only the names of the files that contain a match.
-n : Displays the matched lines and their line numbers.
-v : Prints all the lines that do not match the pattern.
-e exp : Specifies a pattern expression. Can be used multiple times.
-f file : Takes patterns from file, one per line.
-E : Treats the pattern as an extended regular expression (ERE).
-w : Matches whole words only.
-o : Prints only the matched parts of a matching line,
 with each such part on a separate output line.
-A n : Prints the matched line and n lines after it.
-B n : Prints the matched line and n lines before it.
-C n : Prints the matched line and n lines before and after it.
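A minimal sketch of several of these options in action. The file `fruits.txt` and its contents are made up for this illustration and are created in a temporary directory.

```shell
workdir="$(mktemp -d)"
cat > "$workdir/fruits.txt" <<'EOF'
apple pie
Banana bread
cherry tart
apple cider
plain toast
EOF

# -c: count the lines that match.
grep -c 'apple' "$workdir/fruits.txt"            # 2

# -i and -n: ignore case and show line numbers.
grep -in 'banana' "$workdir/fruits.txt"          # 2:Banana bread

# -v: print the lines that do NOT match (3 lines here).
grep -v 'apple' "$workdir/fruits.txt"

# -o: print only the matched part, one match per output line.
grep -o 'ap[a-z]*' "$workdir/fruits.txt"         # apple (printed twice)

# -e: several patterns; a line matching any of them is printed.
grep -e 'pie' -e 'tart' "$workdir/fruits.txt"    # apple pie, cherry tart

# -A 1: the matched line plus one line of context after it.
grep -A 1 'Banana' "$workdir/fruits.txt"         # Banana bread, cherry tart
```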
Square brackets can be used to define a list or range of characters to be found. So:
`[ABC]` matches A or B or C.
`[A-Z]` matches any upper case letter.
`[A-Za-z]` matches any upper or lower case letter.
`[A-Za-z0-9]` matches any upper or lower case letter or any digit.
Then there are:
`.` matches any character.
`\d` matches any single digit. With `grep`, this shorthand only works with the `-P` (Perl-compatible) option of GNU grep; otherwise use `[0-9]`.
`\w` matches any word character (equivalent to `[A-Za-z0-9_]`).
`\s` matches any space, tab, or newline.
`\` is used to escape the following character when that character is a special character. So, for example, a regular expression that finds .com would be `\.com` because `.` is a special character that matches any character.
`^` is an “anchor” which asserts the position at the start of the line. So what you put after the caret will only match if they are the first characters of a line. The caret is also known as a circumflex.
`$` is an “anchor” which asserts the position at the end of the line. So what you put before it will only match if they are the last characters of a line.
`\b` asserts that the pattern must match at a word boundary. Putting this on either side of a word stops the regular expression from matching longer variants of the word. So:
- the regular expression `mark` will match not only mark but also marking, market, unremarkable, and so on.
- the regular expression `\bword` will match word, wordless, and wordlessly.
- the regular expression `comb\b` will match comb and honeycomb but not combine.
- the regular expression `\brespect\b` will match respect but not respectable or disrespectful.
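The word boundary cases listed above can be checked with a short sketch. It assumes GNU grep, where `\b` is available as an extension; the word list is typed inline instead of being read from a file.

```shell
words="$(printf 'comb\nhoneycomb\ncombine\nrespect\nrespectable\ndisrespectful\n')"

# comb\b: "comb" must end at a word boundary.
printf '%s\n' "$words" | grep 'comb\b'
# comb
# honeycomb

# \brespect\b: boundaries on both sides, so only the whole word matches.
printf '%s\n' "$words" | grep '\brespect\b'
# respect

# -w asks grep to match whole words only, with the same result here.
printf '%s\n' "$words" | grep -w 'respect'
# respect
```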
Other useful special characters are listed below. With `grep`, the characters `+`, `?`, `{}` and `|` have these meanings in extended regular expressions (option `-E`).
`*` matches the preceding element zero or more times. For example, `ab*c` matches ac, abc, abbbc, etc.
`+` matches the preceding element one or more times. For example, `ab+c` matches abc, abbbc but not ac.
`?` matches when the preceding element appears zero or one time.
`{VALUE}` matches the preceding element the number of times defined by VALUE; ranges, say, 1-6, can be specified with the syntax `{VALUE,VALUE}`, e.g. `\d{1,9}` (or `[0-9]{1,9}` with `grep -E`) will match any number between one and nine digits in length.
`|` means or.
`/i` is a flag, used in tools such as Perl, that is written after the closing delimiter and renders an expression case-insensitive. With `grep`, use the `-i` option instead.
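The repetition and alternation operators can be tried with `grep -E`, as in the following sketch. The sample strings are made up, and `[0-9]` is used in place of `\d` because `grep -E` does not understand `\d`.

```shell
samples="$(printf 'ac\nabc\nabbbc\nadc\n')"

# * : zero or more b's between a and c.
printf '%s\n' "$samples" | grep -E '^ab*c$'        # ac, abc, abbbc

# + : one or more b's.
printf '%s\n' "$samples" | grep -E '^ab+c$'        # abc, abbbc

# ? : zero or one b.
printf '%s\n' "$samples" | grep -E '^ab?c$'        # ac, abc

# | : alternation, grouped with parentheses.
printf '%s\n' "$samples" | grep -E '^(ac|adc)$'    # ac, adc

# {1,3} : between one and three digits on the whole line.
printf '7\n123\n12345\n' | grep -E '^[0-9]{1,3}$'  # 7, 123

# -i : case-insensitive matching.
echo 'RAVEN' | grep -i 'raven'                     # RAVEN
```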
Examples | Description |
---|---|
`grep -i Raven TheRaven.txt` | Print lines containing the string Raven, ignoring case |
`grep -ivc Raven TheRaven.txt` | Count the lines that do not contain the string Raven, ignoring case |
`grep -n '^The' TheRaven.txt` | Print lines beginning with The, with their line numbers |
`grep 'en!$' TheRaven.txt` | Print lines ending with en! |
`grep -in '\bsmil' TheRaven.txt` | Print, with line numbers, the lines containing a word that starts with smil, ignoring case |
`grep -c '\bthe\b' TheRaven.txt` | Count the number of lines that include the word the |
`grep -c 'ing\b' TheRaven.txt` | Count the number of lines that include a word ending with ing |
`grep -E -w -i 'Raven\|night' TheRaven.txt` | Print lines containing the whole word Raven or night, ignoring case |
You can try this regular expressions online tool or this other
The `sed` command in UNIX stands for stream editor. It can perform many operations on a file, such as searching, find and replace, insertion, and deletion.
We can check the man pages: `man sed`
Syntax:
sed OPTIONS... [SCRIPT] [INPUTFILE...]
-n, --quiet, --silent. Suppress automatic printing of pattern space
-e script, --expression=script. Add the script to the commands to be executed
-f script-file, --file=script-file. Add the contents of script-file to the commands to be executed
--follow-symlinks. Follow symlinks when processing in place
-i[SUFFIX], --in-place[=SUFFIX]. Edit files in place (makes backup if extension supplied)
-l N, --line-length=N. Specify the desired line-wrap length for the `l' command
--posix. Disable all GNU extensions.
-r, --regexp-extended. Use extended regular expressions in the script.
-s, --separate. Consider files as separate rather than as a single continuous long stream.
-u, --unbuffered. Load minimal amounts of data from the input files and flush the output buffers more often
If no `-e`, `--expression`, `-f`, or `--file` option is given, then the first non-option argument is taken as the sed script to interpret. All remaining arguments are names of input files; if no input files are specified, then the standard input is read.
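The three ways of passing a script to `sed` can be sketched as follows. The script file name `edit.sed` is hypothetical and is created in a temporary directory; input comes from standard input because no input file is named.

```shell
# With no -e or -f, the first non-option argument is the script.
echo 'nevermore' | sed 's/never/ever/'                        # evermore

# With -e, each script is given explicitly and they run in order.
echo 'nevermore' | sed -e 's/never/ever/' -e 's/more/less/'   # everless

# With -f, the commands are read from a file, one per line.
workdir="$(mktemp -d)"
printf 's/never/ever/\ns/more/less/\n' > "$workdir/edit.sed"
echo 'nevermore' | sed -f "$workdir/edit.sed"                 # everless
```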
Examples | Description |
---|---|
`sed -n 's/raven/omen/p' TheRaven.txt` | Substitute the first occurrence of the word raven on each line with omen. With -n and the p flag, only the lines where a substitution was made are printed |
`sed -n 's/floor/ground/gp' TheRaven.txt` | Do a global substitution of the word floor with ground |
`sed -n 's/[Ff]loor/ground/gp' TheRaven.txt` | Do a global substitution of the word Floor or floor with ground |
`sed -n 's/floor/ground/gip' TheRaven.txt` | Do a global case insensitive substitution of the word floor with ground |
`sed -n 's/floor/ground/gip;s/raven/omen/gip' TheRaven.txt` | Do two global case insensitive substitutions (separated by ; ) |
`sed -n '5,9p' TheRaven.txt` | Print lines 5 through 9 |
`sed -n -e '5,10p' -e '19,24p' TheRaven.txt` | Print lines 5-10 and 19-24. -e allows adding multiple selections |
`sed -n '1~3p' TheRaven.txt` | Starting from line 1, print every third line |
`sed -n '/^And /p' TheRaven.txt` | Print all lines that begin with the word And followed by a space |
`sed 's/^\(.*\),\(.*\)$/\2,\1 /g' poets.txt` | Will swap the two fields of a "last name, first name" file |
`gsed '/Di/a --> Inserted!' poets.txt` | Will append a line with the text --> Inserted! after each line that contains the expression Di |
`sed 's/.*/--> Inserted &/' poets.txt` | Will insert --> Inserted before the text of every line (& stands for the matched text) |
`sed 'G' poets.txt` | Will insert a blank line after each line of text |
`sed '3d' poets.txt` | Will delete the 3rd line |
`sed '4,5d' poets.txt` | Will delete a range of lines |
`sed -i'.bak' '/^.*Di.*$/d' poets.txt` | Will delete all lines containing the expression Di, editing poets.txt in place and keeping a backup of the original file in poets.txt.bak |
`sed '/^.*Di.*$/d' poets.txt > new_poets.txt` | Similar to the above, but poets.txt is left unchanged and the modified text is written to new_poets.txt (without -i, because in-place editing sends nothing to standard output) |
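A self-contained sketch of a few of these edits. The real contents of `poets.txt` are not shown in this tutorial, so the three "last name,first name" lines below are an assumption made only for illustration.

```shell
workdir="$(mktemp -d)"
cat > "$workdir/poets.txt" <<'EOF'
Dickinson,Emily
Poe,Edgar
Whitman,Walt
EOF

# Swap the two comma-separated fields using two captured groups.
sed 's/^\(.*\),\(.*\)$/\2,\1/' "$workdir/poets.txt"
# Emily,Dickinson
# Edgar,Poe
# Walt,Whitman

# & in the replacement stands for the text that was matched.
sed 's/.*/--> &/' "$workdir/poets.txt"
# --> Dickinson,Emily   (and so on for each line)

# Print only line 2.
sed -n '2p' "$workdir/poets.txt"
# Poe,Edgar

# Delete lines containing Di and write the result to a new file;
# poets.txt itself is left unchanged.
sed '/Di/d' "$workdir/poets.txt" > "$workdir/new_poets.txt"
cat "$workdir/new_poets.txt"
# Poe,Edgar
# Whitman,Walt
```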
Try this sed (stream editor) online tool.
Note on MacOS sed: MacOS ships the BSD version of `sed`, which is not 100% the same as the GNU `sed` found on Linux. Use Homebrew to install gnu-sed, then use `gsed`. Homebrew (brew.sh) is a software package manager for MacOS and Linux.
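One way to write scripts that run on both systems is to choose the `sed` program at run time. The helper below, `pick_sed`, is a hypothetical name introduced for this sketch. It relies on the observation that GNU sed accepts `--version` while the BSD sed on MacOS rejects it; other sed implementations may behave differently.

```shell
# Choose a sed command: prefer 'gsed' (the name used by Homebrew's gnu-sed),
# otherwise fall back to the system 'sed'.
pick_sed() {
    if command -v gsed >/dev/null 2>&1; then
        echo gsed
    elif sed --version >/dev/null 2>&1; then
        # The system sed accepts --version, as GNU sed does.
        echo sed
    else
        # Probably BSD sed: GNU extensions such as '1~3p' are unavailable.
        echo "warning: GNU sed not found, using the system sed" >&2
        echo sed
    fi
}

SED="$(pick_sed)"

# Use only portable commands unless GNU sed was found.
printf 'a\nb\n' | "$SED" -n '1p'      # a
```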
AWK is a full scripting language, as well as a complete text manipulation toolkit for the command line. The awk command was named using the initials of the three people who wrote the original version in 1977: Alfred Aho, Peter Weinberger, and Brian Kernighan.
What is possible to do with AWK:
- AWK Operations:
  (a) Scans a file line by line
  (b) Splits each input line into fields
  (c) Compares input line/fields to pattern
  (d) Performs action(s) on matched lines
- Useful For:
  (a) Transform data files
  (b) Produce formatted reports
- Programming Constructs:
  (a) Format output lines
  (b) Arithmetic and string operations
  (c) Conditionals and loops
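The operations listed above can be seen together in one small report. The input data (item name, units sold, unit price) is invented for this sketch; each line is split into fields, compared against the pattern `$2 > 0`, and the action runs only on the lines that match.

```shell
# Hypothetical sales data: item name, units sold, unit price.
data="$(printf 'pens 10 1.50\nbooks 3 12.00\nlamps 0 25.00\n')"

report="$(printf '%s\n' "$data" | awk '
    # BEGIN runs once, before any input line is read.
    BEGIN { printf "%-8s %8s\n", "item", "total" }

    # Pattern: act only on lines whose second field is greater than zero.
    $2 > 0 {
        total = $2 * $3          # arithmetic on fields
        sum += total             # running sum across lines
        printf "%-8s %8.2f\n", $1, total
    }

    # END runs once, after the last input line.
    END { printf "%-8s %8.2f\n", "ALL", sum }
')"

printf '%s\n' "$report"
```

The report has a header, one line each for pens (15.00) and books (36.00), and a final line with the sum (51.00); the lamps line is skipped because its units field is 0.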
Syntax:
awk options 'selection_criteria {action}' input-file > output-file
Options:
-f program-file : Reads the AWK program source from the file program-file, instead of from the first command line argument.
-F fs : Uses fs as the input field separator.
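A short sketch of both options. The file names `poets.csv` and `swap.awk` and their contents are made up for this illustration and are created in a temporary directory.

```shell
workdir="$(mktemp -d)"
printf 'Dickinson,Emily\nPoe,Edgar\n' > "$workdir/poets.csv"

# -F, sets the input field separator to a comma.
awk -F, '{ print $2, $1 }' "$workdir/poets.csv"
# Emily Dickinson
# Edgar Poe

# -f reads the program from a file instead of the command line.
cat > "$workdir/swap.awk" <<'EOF'
BEGIN { FS = ","; OFS = " | " }
{ print $2, $1 }
EOF
awk -f "$workdir/swap.awk" "$workdir/poets.csv"
# Emily | Dickinson
# Edgar | Poe
```

Setting `FS` inside a `BEGIN` block has the same effect as `-F` on the command line, which keeps a program file self-contained.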
Examples | Description |
---|---|
`who \| awk '{print $3, $4, $5}'` | Prints fields 3, 4 and 5 of the `who` command output. Field $0 represents the whole line; NF is the number of fields, so $NF represents the last field |
`date \| awk '{print $2,$3,$NF}'` | Extracts the month, day and year from the date command output (in its default format) |
`date \| awk 'OFS="/" {print $2,$3,$6}'` | Will insert an output field separator in the date command output |
`awk 'BEGIN {print "The Dickinson Family of Poets"} {print $0}' poets.txt` | Prints a title line before the contents of the file |
`date \| awk 'BEGIN {print "Today is:"} {print $2,$3, $NF}'` | Prints Today is: followed by the date |
`awk -F, '{print $1,$2}' poets.txt` | Prints the two fields of the file, using , as the field separator |
`awk 'BEGIN { print sqrt(625)}'` | AWK can compute mathematical expressions |
`awk '/^And/ {print $0}' TheRaven.txt` | Prints all lines starting with the word And |
`awk '/eyes/{print}' TheRaven.txt` | Prints all lines containing the word eyes |
`awk 'NR==5, NR==11 {print NR ".- ",$0}' TheRaven.txt` | Prints lines 5 through 11, each preceded by its line number and ".- " |
`awk 'BEGIN { for(i=1;i<=6;i++) print "square of", i, "is",i*i; }'` | Prints the squares of 1 through 6 |
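Because the output of `date` changes on every run and depends on the locale, the sketch below uses a fixed sample line (an assumption, written in the default format that `date` prints in the C locale) to show `NF`, `$NF`, `OFS` and a range pattern with predictable output.

```shell
# A fixed sample line in the style of the default 'date' output.
stamp='Sun Apr 17 10:30:00 MST 2022'

# NF is the number of fields; $NF is the content of the last field.
echo "$stamp" | awk '{ print NF, $NF }'                          # 6 2022

# Setting OFS in BEGIN changes the separator that print puts
# between its comma-separated arguments.
echo "$stamp" | awk 'BEGIN { OFS = "/" } { print $2, $3, $NF }'  # Apr/17/2022

# A range pattern selects every line from the first condition
# through the second one.
printf 'one\ntwo\nthree\nfour\n' | awk 'NR==2, NR==3 { print NR ".- " $0 }'
# 2.- two
# 3.- three
```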
Try this awk online interpreter.
man cut
man grep
- 16 grep Command Examples to Help You in Real-World. Abhishek Nair. GeekFlare.com.
- How to Use the grep Command on Linux. Dave McKay. HowToGeek.com.
man sed
- How to use the sed command on Linux. Dave McKay. HowToGeek.com.
man awk
- How to use the awk command on Linux. Dave McKay. HowToGeek.com.
- Linux Man Pages
Created: 04/13/2022 (C. Lizárraga); Last update: 04/17/2022 (C. Lizárraga)
University of Arizona, D7 Data Science Institute, 2022.
Carlos Lizárraga, Data Lab, Data Science Institute, University of Arizona.