
Bash scripting language

A tutorial for the “Operating systems” course

Bash (Bourne Again SHell) is a command interpreter (shell) from the GNU project, compatible with the older Bourne shell (sh) and adding useful features from the Korn shell (ksh) and the C shell (csh). The shell is the primary interface for user interaction with the operating system. Its essential purpose is the execution of commands and external programs; it also provides a set of built-in commands for effective work with the system. Bash can be used:

  • interactively (it provides the command line interface), or
  • as a script interpreter.

In interactive mode, bash reads user commands from the standard input (at a prompt) instead of from a file; otherwise its behavior is identical in both modes.

A script in the bash language is a text file containing bash commands, separated by newlines or semicolons. It is advisable to write a header on the very first line of the script, marking the file as a script and naming the language in which it is to be interpreted. On UNIX systems this header has the form

#!/bin/bash

The characters #! on the very first line (there must not be even an empty line before it) mark the file as a script. They are followed by the full path to the interpreter (/bin/bash in our case on Linux). Keep in mind that the path to the interpreter may vary between systems (e.g. /usr/local/bin/bash on FreeBSD), which affects script portability. This header is used when the script is executed by typing its name on the command line (the script must have the executable permission in this case). When the script is passed as an argument to the bash interpreter, e.g.

/bin/bash ./my_script.sh
the header is not necessary. Nevertheless, it is good practice to always specify the header, so that the script language can be easily identified later.

The basic function of the shell is the execution of external programs; the syntax is simply a relative or absolute path to the executable file (binary or script). Commands are separated by a newline or a semicolon. The # character introduces a comment - anything on the line after this character is ignored.

To exploit further features of bash, we will need variables.

Variables

There are no variable data types in bash; all variables are handled as strings. A variable is identified by its name, consisting of lower-case and upper-case letters of the English alphabet, the underscore character and digits, where the name must not start with a digit. Variable names are case-sensitive, so “promenna” and “PROMENNA” are two different variables.

A value is assigned to a variable with the = operator:

A=24
This command assigns the string “24” to the variable A. When assigning a value to a variable, remember that there must be no spaces around the = operator, otherwise the command would be misinterpreted:
A =24	# attempt to execute command "A" with parameter "=24"
A= 24	# attempt to execute command "24", with the variable "A" pre-set to empty string in the scope of this command
A = 24	# attempt to execute command "A" with two parameters "=" and "24"
When the value of a variable is needed (to be passed as a parameter, assigned to another variable, or used as part of another string), the form $ + <variable name> is used, e.g.
echo $A	# print value of variable "A" to standard output
Alternatively, a form with curly braces may be used:
echo ${A}
It may be necessary in some cases, e.g. when a letter should be printed right behind the variable value and this letter would otherwise be interpreted as a part of the variable name:
echo x$Ax	# print character "x" + variable "Ax" value
echo x${A}x	# print character "x" + variable "A" value + character "x"

Arrays

Besides ordinary string variables, bash allows data to be stored in arrays. An empty array can be created either by explicit declaration

declare -a ARRAY_VAR
or directly by assigning a list of elements to the variable. Array elements are enclosed in parentheses:
EMPTY_ARRAY=()	  # create an empty array in variable EMPTY_ARRAY
COLORS=(red green blue)   # create an array of three elements "red", "green" and "blue" in variable "COLORS"
To access an array element, the following expression is used:
${COLORS[0]}   # this returns the value of the first array element
where the number in the brackets is the integer element index, starting from 0 for the first element. If the index and brackets are omitted, the value of the first element is returned (so $COLORS returns the same value as ${COLORS[0]}). The following expressions return the array length (number of elements), or all elements together:
${#COLORS[*]}	# return the number of elements of the COLORS array
${COLORS[*]}	# return all elements as one string, where element values are separated by a space (or by the first character of the $IFS variable)
${COLORS[@]}	# return all elements as separate strings (quotes have to be used around this expression for it to be interpreted correctly)
It is possible to add elements to an existing array using the += operator:
COLORS+=(black white)
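For illustration, a short sketch combining the operations above (continuing with the COLORS array):
COLORS=(red green blue)
COLORS+=(black white)
echo "${#COLORS[*]}"   # prints "5" - the array now has five elements
echo "${COLORS[3]}"    # prints "black" - the fourth element (index 3)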

export

The scope of variable visibility is bounded by the instance of the interpreter executing the current script. It is possible to propagate variable values to child processes (e.g. when another script is executed from the current one) using the command

export VARIABLE
All child processes executed afterwards will then see this variable. In ordinary applications, these exported variables are accessible as “environment variables”.
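A minimal sketch (the child script name child.sh is hypothetical; it only needs to read the variable):
GREETING="hello"
export GREETING
./child.sh        # the child script (and any other child process) sees $GREETING
echo "$GREETING"  # the variable is of course still visible in the current script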

Quotes

The syntax of the bash language does not require strings to be enclosed in quotes; nevertheless it is always recommended. Quotes are essential when strings contain spaces or characters with a special interpretation. Let's see an example:

A="text with spaces"
program $A
program "$A"
If the quotes were omitted in the assignment on the first line, the execution would probably fail with an error about an unknown command “with” - there would be no way for the interpreter to recognize the words “with” and “spaces” as part of the preceding string, and they would be handled as a command following a variable assignment. The second line is interpreted as
program text with spaces
which means that the specified program will be executed with three arguments “text”, “with” and “spaces”, which is apparently not our intention. The quotes on the third line keep the string together regardless of its content, and it is passed as a single parameter of the executed program, as intended.

When all elements of an array are to be passed as command parameters (each element as a single parameter), it is necessary to enclose the expression ${ARRAY[@]} in quotes:

program "${ARRAY[@]}"
otherwise spaces in the strings will cause them to be split into several parameters.
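A short sketch of the difference, using printf (which repeats its format for every remaining argument) to show how many parameters are passed; the file names are hypothetical:
FILES=("my file.txt" "other.txt")
printf '<%s>\n' "${FILES[@]}"   # prints two lines: <my file.txt> and <other.txt>
printf '<%s>\n' ${FILES[@]}     # prints three lines: <my>, <file.txt> and <other.txt>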

Besides double quotes, it is possible to enclose strings in 'apostrophes' (single quotes). The basic difference is in the interpretation of the string content. Inside double quotes, all variables (introduced by the $ character) are replaced by their values. A string in apostrophes always remains exactly as it is written in the code.

echo "variable content: $A"   prints string containing value of variable "A"
echo 'variable name: $A'   prints the string "variable name: $A"
It may be advantageous to combine double quotes and apostrophes when composing a string that itself contains quotes or apostrophes. This is often the case when composing commands for external tools.

Recommendation:

Get used to always enclosing variable expansions in quotes. You will avoid many cases of script misbehavior and unexpected errors that appear only for specific inputs.

Besides quotes and apostrophes, bash syntax also uses backquotes (inverted apostrophes), which do not define a string; instead, the string inside is interpreted as a command and executed. The standard output of the command is then substituted in place of the backquoted expression.

DIRECTORY=`pwd`	# execute command "pwd" (print working directory) and store its output (path to the current working directory) in the variable DIRECTORY
The same functionality may be achieved by an alternative syntax:
DIRECTORY=$(pwd)
When several lines of output are to be stored in an array (each line as a single array element), the following command may be used:
FILES=( `ls` )	# execute command "ls" and store every output line as a single element of the array FILES (note that file names containing spaces will not be handled as intended by this command)

Script argument handling

It is often necessary to parameterize script behavior using arguments provided at script execution. Sometimes input data are passed as script parameters as well. Bash allows argument values to be read using special variables

$1	# value of 1st parameter
$2      # value of 2nd parameter
$3      # value of 3rd parameter
etc...
The variable $0 stores the command used to execute the current script (the path to the script, without parameters). The variables $* and $@ allow working with all parameters together, either as a single string or as a list of strings. The difference between $* and $@ is analogous to the one for array elements. The variable $# contains the number of parameters.

The variables $1 to $9 may be used to obtain the first nine parameters. To access the following parameters, the expression with curly braces (${10}, ${11}, etc.) has to be used. It is often necessary for a script to handle a variable number of parameters, whose order may also vary (e.g. a script working with several files whose paths are provided as parameters). In this case it is advantageous to use the shift command, which removes the 1st argument and shifts all the following arguments one place forward (the variable $1 then contains the value of the second parameter). An arbitrary number of arguments may be processed by repeated calls of the shift command. The shift command also affects the values of the variables $#, $* and $@.
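A minimal sketch of the effect of shift (run the script with at least two hypothetical parameters):
echo "script: $0, number of parameters: $#"
echo "first parameter: $1"
shift	# discard the first parameter
echo "first parameter is now: $1, number of parameters: $#"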

Bash internal variables

Bash provides several internal variables, allowing us to obtain useful information or to modify the interpreter's behavior.

One of them, besides the variables containing script arguments, is the $? variable, which contains the exit code of the last executed command. When a pipeline of several parallel processes is executed, the $? variable contains the exit code of the last command in the pipeline; the exit codes of the remaining processes may be obtained from the $PIPESTATUS variable, which stores all of them in an array.
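A short illustration using the standard commands false and true, which return exit codes 1 and 0 respectively:
false | true
echo "$? ${PIPESTATUS[0]}"	# prints "0 1" - the exit code of the whole pipe (i.e. of "true") and the exit code of "false"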

Another useful variable is $PWD, containing the path to the current working directory.

A very useful variable is IFS (internal field separator), which affects splitting and concatenating of strings in various operations. The IFS variable contains a list of characters treated as separators. When a command is executed that parses an input string into an array or a list of arguments, the splitting occurs at these separators. When several strings are concatenated into one, the individual strings are separated by the first character of IFS (e.g. in the expressions “$*” or “${ARRAY[*]}”). The default value of IFS contains three characters: space, tab and the new-line character. When the output of the “ls” command is to be stored in an array, it is reasonable to set IFS to the new-line character only, which prevents breaking of file names containing spaces:

OLD_IFS=$IFS	# store old value of IFS
IFS=$'\n'	# store the new-line character in IFS
FILES=( `ls` )	# read all file names in the current directory and store them in the array "FILES"
IFS=$OLD_IFS	# it is reasonable to restore IFS to its original value, so the behavior of the following commands is not affected

Globs (wildcards)

When working with files, particularly in interactive mode, we will appreciate so-called globs - patterns that produce a list of files with similar names or paths. These patterns may contain the following characters:

* (asterisk) - represents an arbitrary string
? (question mark) - represents an arbitrary character
[...] (set of characters in brackets) - represents one character of the set
When a string containing these characters is parsed by the bash interpreter, all file names matching the specified pattern are found, and the list of matching file names is inserted in place of the original pattern. This substitution is applied only if the pattern is NOT enclosed in quotes - otherwise the characters are treated as ordinary characters. Note that using globs inside scripts requires care, because file names containing spaces may cause misbehavior in later processing.
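A few illustrative patterns (the file names and the backup/ directory are hypothetical):
ls *.txt	# list all files whose names end with ".txt"
cp img?.png backup/	# copy img1.png, img2.png, ... - "?" stands for one arbitrary character
echo [abc]*	# print names starting with "a", "b" or "c"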

Input and output

Every executed process, scripts being no exception, has a standard input, a standard output and an error output, used for data exchange with other processes or with the user. The simplest way of printing a string to the standard output is the echo command. More often we use the output of other programs executed from our script. If the output of these programs is not redirected explicitly, it goes to the output of the currently executed script. Printing to the error output may be achieved by simply redirecting the standard output of the echo (or any other) command:

echo "Error message" >&2
One line from the standard input may be read with the read command. The syntax of this command is
read VARIABLE
This example reads one line (terminated by a new-line character) and stores it in the variable named VARIABLE. When there are no data in the input buffer, the read command blocks and waits until a new-line character arrives. In case of an error, or at the end of a file redirected to the standard input, the command returns a non-zero exit code. When multi-line input is to be processed, repeated calls of the command are needed.
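A minimal sketch of processing the whole standard input line by line (the while loop is described below):
while read LINE
do
  echo "read line: $LINE"
done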

Conditions and loops

It is possible to terminate script execution unconditionally at any place by calling the exit command. Its optional argument is the exit code returned to the parent process (the default exit code is 0).

Conditions

Conditions in bash are realized by the if command with the syntax

if <command>
then
  <command1>
  <command2>
  ...
else
  <command3>
  ...
fi

The if keyword is followed by a command to be executed. Depending on the exit code of the executed command, the code after the then keyword (or after the else, if present) is interpreted. Exit code 0 is interpreted as true, all other exit codes as false. To evaluate conditions containing expressions with operators and variable values, the test command may be used. It provides string operations (comparison, empty-string test), integer relational operators and file/path operators (file existence, access rights, etc.). The command

test "$A" = "abcd"
returns exit code 0 when the value of variable “A” is the string “abcd”. A more convenient syntax of the test command call uses square brackets:
if [ "$A" = "abcd" ]; then
  # condition is satisfied
fi
The test command expects all operators and operands to be specified as separate arguments. It is necessary to consistently separate operators and operands by spaces. If the spaces are omitted, e.g.
if [ "$A"="abcd" ]; then ... ; fi
the test command obtains the single parameter “…=abcd” and returns 0 regardless of the value of variable A, because no equality relation is recognized within the parameter and the whole parameter is handled as a plain (non-empty) string. All variables should also be rigorously quoted, otherwise an empty string in the variable may cause an error:
if [ $A = "abcd" ]; then ... ; fi
This command works as expected until the value of A is empty. When A contains an empty string, the test command gets only two arguments “=” and “abcd” and returns an error, because the equality operator expects both a left and a right argument. Useful operators of the test command are:
-n STRING ... the string is non-empty
-z STRING ... the string is empty
STRING1 = STRING2 ... the two strings are equal
STRING1 != STRING2 ... the two strings are not equal
INTEGER1 -eq INTEGER2 ... the two numbers are equal
INTEGER1 -ne INTEGER2 ... the two numbers are not equal
INTEGER1 -gt INTEGER2 ... INTEGER1 is greater than INTEGER2
INTEGER1 -ge INTEGER2 ... INTEGER1 is greater than or equal to INTEGER2
INTEGER1 -lt INTEGER2 ... INTEGER1 is less than INTEGER2
INTEGER1 -le INTEGER2 ... INTEGER1 is less than or equal to INTEGER2
-e FILE ... the file named FILE exists
-f FILE ... the regular file FILE exists
-d FILE ... the directory named FILE exists
-r FILE ... the file FILE exists and read permission is granted
-w FILE ... the file FILE exists and write permission is granted
-L FILE ... the file FILE is a symbolic link
The test command also allows conjunction and disjunction of expressions using the operators -a (AND) and -o (OR), and grouping parts of an expression in parentheses.
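A short sketch combining two file tests with -a (the variable FILE is assumed to contain a path supplied earlier):
if [ -f "$FILE" -a -r "$FILE" ]; then
  echo "regular file $FILE exists and is readable"
fi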

When a variable needs to be compared with several values, the case command may be used. Its syntax is

case $VARIABLE in
value1)
  <command>
  <command>
  ...
 ;;
value2)
  <command>
 ;;
value3|value4)
  <command>
 ;;
*)
  <command>
 ;;
esac
The value of the specified variable is compared with the listed values. In the case of a match, the corresponding block of code is executed up to the terminating double semicolon. The pipe character may be used to specify several values for a single block. When no value matches, the *) block is executed, if present.

List of commands

Bash provides the operators && and ||, allowing commands to be chained into lists with conditional execution. Interpretation of the list

<command1> && <command2>

executes command1 and, if a zero exit code is returned, executes command2. When the || operator is used instead, the second command in the list is executed only if the first command returns a non-zero exit code. These operators allow conditions to be written very concisely when only one command should (or should not) be executed depending on the result of another command. It is also possible to chain and combine these operators arbitrarily.
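Two illustrative lists (the directory and file names are hypothetical):
mkdir /tmp/mydir && cd /tmp/mydir	# change to the directory only if it was created successfully
[ -f config.txt ] || echo "config.txt is missing" >&2	# print an error message only when the file does not exist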

Loops

There are three types of loops used in bash: for, while and until.

The for loop allows iteration through a list of values. Its syntax is

for VARIABLE in list of values
do
  <command>
  ...
done
The for command assigns the individual elements of the list (an array or an IFS-separated string) to the variable VARIABLE one by one, and for every assignment the list of commands inside the do...done block is executed.
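For example, iterating over the COLORS array from the Arrays section:
for COLOR in "${COLORS[@]}"
do
  echo "color: $COLOR"
done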

The while and until loops have identical syntax; the difference between them lies only in the way the ending condition is evaluated:

while <command> # or until <command>
do
  <command>
  ...
done
At the beginning of each iteration, the command following the while or until keyword is executed. The while loop body is executed as long as the command returns 0, whereas the until loop body is executed as long as the exit code is non-zero. It is possible to combine while and until loops with the test command in the same way as in conditions.

Inside the do...done block it is possible to use the break and continue commands, which affect loop execution. The break command interrupts the loop and continues with the commands after the loop. The continue command skips the rest of the loop body and continues with the evaluation of the condition command at the beginning of the loop.
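A small sketch of a counting loop using continue and break (the arithmetic expansion $(( )) used for incrementing is described in the next section):
I=0
while [ "$I" -lt 10 ]
do
  I=$(( I + 1 ))
  if [ "$I" -eq 3 ]; then
    continue	# skip printing the value 3
  fi
  echo "$I"
  if [ "$I" -ge 5 ]; then
    break	# stop the loop after printing 5
  fi
done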

Arithmetic operations

Variables in bash contain strings only. When an integer value is stored in a string as a sequence of decimal digits, several commands can be used to perform basic arithmetic operations with these numbers. The expr command interprets its parameters as numbers, operators and parentheses, forming an expression to be evaluated. The result of the evaluation is printed to the standard output. The most useful operators are addition (+), subtraction (-), multiplication (*), integer division (/) and integer division remainder (%). It also provides some relational operators (but we can achieve the same with the test command) and some string operators (which can be done more easily in bash itself).
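For example:
expr 3 + 4	# prints "7"
expr 10 % 3	# prints "1"
A=`expr $A + 1`	# increment the number stored in variable A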

When specifying an expression as parameters of the expr command, it is necessary to separate all operators and operands by spaces (they need to be passed as individual parameters), otherwise the expression will be misinterpreted. Calling the expr command directly is neither convenient nor clear, so the shortened syntax $(( expression )) may be used in bash. This form provides better expression parsing, so missing spaces between operators and operands are no longer a problem. Additionally, it automatically substitutes variable values for their names. The incrementation of an integer may be written as

A=$(( A+1 ))
Another possibility for arithmetic evaluation is the bash internal command let. It also supports logical and bit operators and increment/decrement operators. Incrementing a bash variable may be written very concisely using let:
let A++

String operations

Bash provides some basic string operations for strings stored in variables. The length of the string in the variable STRING may be obtained by the expression

${#STRING}
A substring, defined by a start index (0 for the first character) and a length, may be extracted by the expression
${STRING:<index>:<length>}
It is also possible to remove the beginning or the end of a string, if it matches a specified string or pattern:
echo ${STRING#begin} # cut string "begin" from the beginning of string in $STRING
                        # (if $STRING starts with this string) and returns a remainder
echo ${STRING%end} # cut string "end" from the end of string in $STRING
                        # (if $STRING ends with this string) and returns the remainder
These operators are much more powerful in combination with the * character, allowing patterns to be composed and matched. The greediness of the * character can be controlled by doubling the character # to ## or % to %%, respectively. When the operator is doubled, * tries to match the longest possible string while matching the pattern.
A="abcdefghabcdefgh"
echo ${A#*cde}     # prints "fghabcdefgh"
echo ${A##*cde}     # prints "fgh"
echo ${A%cde*}     # prints "abcdefghab"
echo ${A%%cde*}     # prints "ab"

It is possible to replace a substring in a string by writing

${STRING/pattern/replacement}   # prints the string in variable $STRING with the substring "pattern" replaced by the string "replacement"
This expression replaces only the first occurrence of the pattern. A * character may be used in the pattern, matching any string. External tools may be used to perform further string operations. The cut tool extracts characters, words, or strings with an arbitrary delimiter from an input line, by specifying the index of the field on the line.

Character-by-character replacement in a text is possible using the tr tool. The tr command gets two character sets as parameters, specified as strings. It processes the text from the standard input, and when a character from the first set is found, it is replaced by the corresponding character from the second set.

The wc command may be used to count characters, words or lines of the input text.
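A few illustrative calls of these tools:
echo "one:two:three" | cut -d: -f2	# prints "two" - the second colon-separated field
echo "hello" | tr 'a-z' 'A-Z'	# prints "HELLO" - lower-case letters replaced by upper-case ones
wc -l < /etc/passwd	# prints the number of lines in the file /etc/passwd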

Grouping commands

A group of commands may be enclosed in curly braces, which allows e.g. bulk input/output redirection for the whole group:

{
  <command>
  <command>
  ...
} > file

Besides curly braces, parentheses may be used, with the difference that the commands inside parentheses are executed in a separate instance of the bash interpreter (a subshell).
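A short sketch (the file output.txt is hypothetical):
{
  echo "report header"
  date
} > output.txt	# the output of both commands goes to output.txt

( cd /tmp; ls )	# the directory change happens only in the subshell
pwd	# the current script still remains in its original directory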

Functions

A function may be defined in the following two ways:

function function_name
{
  <command>
  <command>
  ...
}
or
function_name()
{
  <command>
  ...
}
The number of parameters of a function is not fixed in its definition. The parameter values may be accessed using the variables $1, $2, etc. in the function body, similarly to accessing script parameters. The meaning of $#, $*, $@ and the shift command changes analogously. This also means it is not possible to access the script parameters directly inside the function body. The function's exit code matches the exit code of the last command executed in the function. It is possible to leave the function early using the return command, whose optional parameter is the returned exit code.

A function call is done simply by writing its name, followed by arguments, in the same way an external script or program would be called.
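A short sketch of a function using a parameter and the return command:
function is_positive
{
  if [ "$1" -gt 0 ]; then
    return 0
  fi
  return 1
}

is_positive 5 && echo "5 is positive"	# the function exit code is used directly in a condition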

Variables inside functions have global scope by default. When a variable should have local scope inside the function only, so that equally named variables outside the function are not affected, it is possible to declare the variable as local:

function fcn
{
 local A
 A=1
}

A=2
fcn
echo $A	  # prints "2"

Script argument options parsing

The call syntax of most UNIX tools complies with the POSIX standard recommendations. When one-character options are to be processed, the getopts command may be helpful. Its syntax is

getopts options variable
The first parameter (options) is a string containing the letters of the supported options. If an option has an argument, a colon follows the letter. A colon at the beginning of the options string enables quiet mode, in which parsing errors are not reported when an invalid option is supplied. The second parameter is the name of the variable where the recognized option letter is stored. Every call of the getopts command processes one option, hence it is necessary to call it in a loop. The command exit code is zero as long as options in the form -<letter> are being parsed from the script parameters. Usage of the command is clear from the following example:
while getopts ":abchn:s:" opt; do
  case $opt in
  a|b|c) echo "Option -${opt} activated"
    ;;
  n) echo "Option -n value is ${OPTARG}"
    ;;
  s) echo "Option -s value is ${OPTARG}"
    ;;
  h) echo "Usage:"
    echo "`basename $0` [-a] [-b] [-c] [-h] [-n <argument>] [-s <argument>]"
    ;;
  ?) echo "Invalid option -${OPTARG} supplied"
    ;;
  esac
done
shift $(($OPTIND - 1))
if [ $# -gt 0 ]; then
  echo "Non-option parameters: $*"
fi
This script recognizes the argument-free options -a, -b, -c and -h, the options -n and -s with arguments, and an arbitrary number of other parameters not starting with the '-' character (in the presented script, options have to precede the other parameters). The order of the options is not fixed. The getopts command sets the variables $OPTARG (the argument value of the current option) and $OPTIND (the index of the currently processed script parameter).

There is also a tool called getopt available, which is able to process long-format options (e.g. --help).

More useful commands and tools

eval

The eval command interprets the string in its argument as a command to be executed. It is useful when a more complicated command for another tool is composed, e.g. according to supplied script parameters.

find

The find command searches a directory tree. Found files are either printed to the standard output, or a specified command may be executed for every file. There are many possibilities for filtering the results according to various criteria (e.g. file name or type). By using the find command, a recursive implementation of directory traversal is avoided.
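A few typical calls (the paths are illustrative):
find . -name "*.txt"	# print all .txt files under the current directory
find /var/log -type d	# print only directories under /var/log
find . -name "*.tmp" -exec rm {} \;	# execute "rm" for every found .tmp file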

dirname

The dirname command prints the path to the parent directory of a supplied file or directory. When the input path is relative, the printed output is also relative.

basename

The basename command removes the parent directory path from the supplied file or directory path, leaving only the name of the file or directory.

readlink

The readlink command prints the path to the file a symbolic link points to. The useful parameter -f determines the absolute path to the target file and may also be used to determine the absolute path of ordinary files.

stat

The stat command prints available information about a file: size, modification time, permissions and other information stored in the i-node. The output format of the command can be adjusted arbitrarily by its parameters.

diff

The diff command compares two supplied files and prints the differences to the output. The output format may be specified by command parameters.

seq

The seq command generates a sequence of numbers in a specified range with a specified step.

head and tail

The head and tail commands print the first or last n lines of the supplied file, or of the text on the standard input, where n is the number supplied as the -n option argument.

sort

The sort command sorts lines of text (from a file or the standard input) alphabetically. The sorting criteria may be changed by various command parameters.

uniq

The uniq command copies input lines to the output, discarding consecutive identical lines. The -c option counts the occurrences of duplicate lines.

grep

The grep command filters lines of the input text (from a file or the standard input) according to their content. The required parameter of the command is a pattern to be matched. Lines containing a substring matching the pattern are printed to the output; the remaining lines are discarded. A regular expression may be used as the pattern.

awk

awk is a tool that processes input text line by line, applying a script in its own language to every line. This script consists of rules in the form

<pattern> { <action> }
The pattern is a regular expression to be matched, or a condition to be evaluated for every input line. When the pattern matches, the action is applied to the line. Actions are composed of awk language commands. Input lines are automatically divided by the specified separator into fields, which simplifies text processing. awk supports variables, arrays, arithmetic operations and program constructs similar to the C language (if, while, for, etc.). Additionally, it provides useful text-processing and math functions.
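Two illustrative calls (the file log.txt is hypothetical):
awk -F: '{ print $1 }' /etc/passwd	# print the first colon-separated field of every line (user names)
awk '/error/ { count++ } END { print count }' log.txt	# count the lines containing "error"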

sed

sed is a text-processing tool that applies a script in its own language to the input text. It is a very effective tool with a wide range of applications. Particularly useful is its ability to use regular expressions for text matching and replacement.
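Two illustrative calls (the file file.txt is hypothetical):
echo "hello world" | sed 's/world/bash/'	# prints "hello bash"
sed -n '1,5p' file.txt	# print only the first five lines of file.txt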
