Tutorial for “Operating systems” subject
Bash (Bourne Again SHell) is a command interpreter (shell) of the GNU project, compatible with the older Bourne shell (sh) and adding useful features from the Korn shell (ksh) and the C shell (csh). The shell is the primary interface for user interaction with a computer operating system. Its essential purpose is the execution of commands and external programs. It also provides a set of commands for effective work with the system. Bash can be used either interactively or to run scripts.
In the interactive mode, bash reads user commands from the standard input (the prompt) instead of from a file; otherwise its behavior is identical in both modes.
A script in the bash language is a text file containing bash commands, separated by new lines or semicolons. It is advisable to write a header on the very first line of the script, marking the file as a script and naming the language that interprets it. On UNIX systems this header has the form
#!/bin/bash
The characters #! on the very first line (there must not be even an empty line before it) indicate a script. They are followed by the full path to the interpreter (/bin/bash in our case on Linux). Take into account that the path to the interpreter may vary on different systems (e.g. /usr/local/bin/bash on FreeBSD), which affects script portability. The header is used when the script is executed by typing its name on the command line (the script must have the executable permission in this case). When the script is passed as an argument to the bash interpreter, e.g.

/bin/bash ./my_script.sh

the header and the executable permission are not required.
The basic function of the shell is external program execution; the syntax is a relative or absolute path to the executable file (binary or script). Commands are separated by a new line or a semicolon. The # character introduces comments - anything on the line after this character is ignored.
To use further features of bash, we will need variables.
There are no variable data types in bash; all variables are handled as strings. A variable is identified by its name, consisting of lower-case and upper-case letters of the English alphabet, the underscore character and digits, while the name must not start with a digit. Variable names are case-sensitive, so “promenna” and “PROMENNA” are two different variables.
A value is assigned to a variable using the = operator:
A=24
A =24   # attempt to execute command "A" with parameter "=24"
A= 24   # attempt to execute command "24", with the variable "A" pre-set to an empty string in the scope of this command
A = 24  # attempt to execute command "A" with two parameters "=" and "24"
echo $A # print value of variable "A" to standard output
echo ${A} # equivalent notation; the curly braces delimit the variable name
echo x$Ax   # print character "x" + value of variable "Ax"
echo x${A}x # print character "x" + value of variable "A" + character "x"
Besides ordinary string variables, bash allows storing data in arrays. An empty array may be created either by explicit declaration
declare -a ARRAY_VAR
or by assigning a list of values:

EMPTY_ARRAY=()          # create an empty array in variable "EMPTY_ARRAY"
COLORS=(red green blue) # create an array of three elements "red", "green" and "blue" in variable "COLORS"
${COLORS[0]} # returns the value of the first array element
${#COLORS[*]} # return the number of elements of the COLORS array
${COLORS[*]}  # return all elements as one string, element values separated by a space (or by the first character of the $IFS variable)
${COLORS[@]}  # return all elements as separate strings (quotes have to be used around this expression to be interpreted correctly)
COLORS+=(black white) # append two elements to the end of the array
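The array operations above can be combined into a short, runnable sketch:

```shell
#!/bin/bash
COLORS=(red green blue)   # an array of three elements
COLORS+=(black white)     # append two more elements

echo "${#COLORS[*]}"      # number of elements: prints 5
echo "${COLORS[0]}"       # first element: prints red

# iterate over all elements, every element as a separate string
for C in "${COLORS[@]}"; do
    echo "color: $C"
done
```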
The scope of variable visibility is bounded by the instance of the interpreter executing the current script. Variable values can be propagated to child processes (e.g. when another script is executed from the current one) using the command
export VARIABLE
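A minimal sketch of the difference between an exported and a non-exported variable (a child bash instance is started here with bash -c):

```shell
#!/bin/bash
A="hello"
bash -c 'echo "child sees: $A"'   # the child does not see the unexported variable
export A
bash -c 'echo "child sees: $A"'   # after export the child inherits the value "hello"
```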
The syntax of the bash language does not require enclosing strings in quotes; nevertheless it is always recommended. Quotes are essential when strings contain spaces or characters with special interpretation. Let's see an example:
A="text with spaces"
program $A     # executed as: program text with spaces (three separate parameters)
program "$A"   # executed with one parameter "text with spaces"
When all elements of an array are to be passed as command parameters (every element as a separate parameter), it is necessary to enclose the expression ${ARRAY[@]} in quotes:
program "${ARRAY[@]}"
Besides double quotes, strings may be enclosed in 'apostrophes' (single quotes). The basic difference is in the interpretation of the string content. Inside double quotes, all variables (introduced by the $ character) are replaced by their values. A string in apostrophes always remains exactly as written in the code.
echo "variable content: $A"   # prints a string containing the value of variable "A"
echo 'variable name: $A'      # prints the string "variable name: $A"
Get used to always enclosing variables in quotes. You will avoid many cases of script misbehavior and unexpected errors that appear only for specific inputs.
Besides quotes and apostrophes, bash syntax also uses inverted apostrophes (backticks), which do not define a string: the text inside is interpreted as a command and executed. The standard output of the command is then returned in place of the backtick expression.
DIRECTORY=`pwd` # execute command "pwd" (print work directory) and store output (path to current working directory) in variable DIRECTORY
DIRECTORY=$(pwd) # equivalent newer syntax, which may also be nested
FILES=( `ls` ) # execute command "ls" and store every whitespace-separated word of the output as a single element of the array FILES (note that this will not work as intended for file names containing spaces)
It is often necessary to parametrize script behavior using arguments provided on script execution; sometimes input data are passed as script parameters. Bash allows reading argument values using special variables
$1 # value of the 1st parameter
$2 # value of the 2nd parameter
$3 # value of the 3rd parameter
etc...
Variables $1 to $9 may be used to obtain the first nine parameters. To access the following parameters, an expression with curly braces (${10}, ${11}, etc…) has to be used. It is often necessary for a script to handle a variable number of parameters, whose order may also vary (e.g. a script working with several files whose paths are provided as parameters). In this case it is advantageous to use the command shift, which removes the 1st argument and shifts all following arguments one place forward (the variable $1 then contains the value of the second parameter). An arbitrary number of arguments may be processed by repeated calls of the shift command. The shift command also affects the values of the variables $#, $* and $@.
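Processing a variable number of arguments with shift can be sketched as follows:

```shell
#!/bin/bash
# print all supplied arguments one by one
while [ $# -gt 0 ]; do
    echo "argument: $1"
    shift              # remove $1 and shift the remaining arguments forward
done
```

Executed with arguments a b c, the sketch prints one line per argument.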
Bash provides several internal variables, allowing us to obtain useful information, or modify interpreter behavior.
One of them, besides the variables containing script arguments, is the $? variable, containing the exit value of the last executed command. When a pipeline of several parallel processes is executed, the $? variable contains the exit value of the last command in the pipeline; the exit values of the remaining processes may be obtained from the $PIPESTATUS variable, an array containing all the exit values.
Another useful variable is $PWD, containing the path to the current working directory.
A very useful variable is IFS (internal field separator), affecting the splitting and concatenating of strings in various operations. The IFS variable contains a list of characters treated as separators. When a command parses an input string into an array or a list of arguments, the splitting occurs at these separators. When several strings are concatenated into one, the individual strings are joined by the first character of IFS (e.g. in the expressions “$*” or “${ARRAY[*]}”). The default value of IFS contains three characters: space, tab and the new-line character. When the output of the ls command is to be stored in an array, it is reasonable to set IFS to the new-line character only, which prevents breaking of file names containing spaces:
OLD_IFS=$IFS   # store the old value of IFS
IFS=$'\n'      # store the new-line character to IFS
FILES=( `ls` ) # read all file names in the current directory and store them in the array "FILES"
IFS=$OLD_IFS   # it is reasonable to restore the original value of IFS, so the behavior of following commands is not affected
When working with files, particularly in the interactive mode, we will appreciate so-called globs, patterns able to create a list of files with similar names or paths. These patterns may contain the following characters:
*     (asterisk) - represents an arbitrary string
?     (question mark) - represents an arbitrary character
[...] (set of characters in brackets) - represents one character of the set
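A small demonstration in a temporary directory (created with mktemp; the file names are only illustrative):

```shell
#!/bin/bash
DIR=$(mktemp -d)       # create a temporary directory
cd "$DIR"
touch file1.txt file2.txt file10.log

echo file?.txt         # prints: file1.txt file2.txt (one arbitrary character)
echo *.log             # prints: file10.log (arbitrary string)
echo file[12].txt      # prints: file1.txt file2.txt (one character of the set)

cd /
rm -r "$DIR"           # clean up
```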
Every executed process, scripts being no exception, has a standard input, a standard output and an error output, used for data exchange with other processes or with the user. The simplest way of printing a string to the standard output is the echo command. More often we use the output of other programs executed from our script; if the output of these programs is not redirected explicitly, it goes to the output of the currently executed script. Printing to the error output may be achieved by simply redirecting the standard output of the echo (or any other) command:
echo "Error message" >&2
Reading a line from the standard input into a variable is done by the read command:

read VARIABLE
Script execution may be terminated unconditionally at any place by calling the exit command. An optional argument of the command is an exit code returned to the parent process (the default exit code is 0).
Conditions in bash are realized by the command if with the syntax
if <command>
then
    <command1>
    <command2>
    ...
else
    <command3>
    ...
fi
The if keyword is followed by a command to be executed. Depending on the exit code of the executed command, the code after the then keyword (or after the else, if present) is interpreted. Exit code 0 is interpreted as a true value, all other exit codes as false. To evaluate conditions containing expressions with operators and variable values, the test command may be used. It provides string operations (comparison, empty-string test), integer algebraic relations and file/path operators (file existence, access rights, etc.). For example, the command
test "$A" = "abcd"
tests equality of the variable A with the string "abcd". The [ command is a synonym for test (the closing bracket ] must be its last argument), which allows the common notation:

if [ "$A" = "abcd" ]; then
    # condition is satisfied
fi
if [ "$A"="abcd" ]; then ... ; fi  # WRONG: the single argument "$A=abcd" is a non-empty string, so the condition is always true
if [ $A = "abcd" ]; then ... ; fi  # WRONG: fails when $A is empty or contains spaces
-n STRING ... string is non-empty
-z STRING ... string is empty
STRING1 = STRING2 ... two strings are equal
STRING1 != STRING2 ... two strings are not equal
INTEGER1 -eq INTEGER2 ... two numbers are equal
INTEGER1 -ne INTEGER2 ... two numbers are not equal
INTEGER1 -gt INTEGER2 ... INTEGER1 is greater than INTEGER2
INTEGER1 -ge INTEGER2 ... INTEGER1 is greater than or equal to INTEGER2
INTEGER1 -lt INTEGER2 ... INTEGER1 is less than INTEGER2
INTEGER1 -le INTEGER2 ... INTEGER1 is less than or equal to INTEGER2
-e FILE ... file named FILE exists
-f FILE ... ordinary file FILE exists
-d FILE ... directory named FILE exists
-r FILE ... file FILE exists and read permission is granted
-w FILE ... file FILE exists and write permission is granted
-L FILE ... file FILE is a symbolic link
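A few of the operators above in use (the path /tmp is only an assumption of a typical system):

```shell
#!/bin/bash
A="abcd"
N=5

if [ -n "$A" ]; then
    echo "A is non-empty"
fi
if [ "$N" -gt 3 ]; then
    echo "N is greater than 3"
fi
if [ -d /tmp ]; then
    echo "/tmp exists and is a directory"
fi
```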
When a variable needs to be compared with several values, the case command may be used. Its syntax is
case $VARIABLE in
    value1)
        <command>
        <command>
        ...
        ;;
    value2)
        <command>
        ;;
    value3|value4)
        <command>
        ;;
    *)
        <command>
        ;;
esac
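A sketch classifying a hypothetical variable ANSWER:

```shell
#!/bin/bash
ANSWER="yes"
case $ANSWER in
    yes|y)
        echo "confirmed"
        ;;
    no|n)
        echo "rejected"
        ;;
    *)
        echo "unknown answer"
        ;;
esac
# prints "confirmed"
```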
Bash provides the operators && and ||, allowing chaining of commands into lists with conditional execution. Interpretation of the list
<command1> && <command2>
executes command1 and, if a zero exit code is returned, command2 is executed. When the || operator is used instead, the second command in the list is executed only if the first command returns a non-zero exit code. These operators allow writing conditions very effectively in cases when only one command should (or should not) be executed depending on the result of another command. It is also possible to chain and combine these operators arbitrarily.
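The behavior can be illustrated with the built-in commands true (exit code 0) and false (non-zero exit code):

```shell
#!/bin/bash
true && echo "printed: the first command succeeded"
false || echo "printed: the first command failed"
false && echo "never printed"
true || echo "never printed either"
```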
There are three types of loops used in bash: for, while and until.
The for loop allows iterating through a list of values. Its syntax is
for VARIABLE in list of values
do
    <command>
    ...
done
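A minimal example, iterating first over a fixed list and then over the script arguments:

```shell
#!/bin/bash
for COLOR in red green blue
do
    echo "color: $COLOR"
done

# iterate over all script arguments, every argument as one value
for ARG in "$@"
do
    echo "argument: $ARG"
done
```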
The while and until loops have identical syntax; the difference between them lies only in the way the ending condition is evaluated: while repeats as long as the condition command succeeds, until repeats as long as it fails.
while <command>   # or: until <command>
do
    <command>
    ...
done
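A simple counting loop using the while form together with the test command:

```shell
#!/bin/bash
I=1
while [ "$I" -le 5 ]
do
    echo "$I"
    I=$(( I + 1 ))    # increment I
done
```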
Inside the do…done loop inner block it is possible to use commands break and continue affecting loop execution. The break command interrupts execution of the loop and continues by execution of commands after the loop. The continue command skips the rest of the loop inner block and continues by execution of the condition command at the beginning of the loop.
Variables in bash contain strings only. When an integer value is stored in a string as a sequence of decimal digits, several commands may be used to perform basic algebraic operations with these numbers. The expr command interprets its parameters as numbers, operators and parentheses, forming an expression to be evaluated. The result of the evaluation is printed to the standard output. The most useful operators are addition (+), subtraction (-), multiplication (*), integer division (/) and integer division remainder (%). It also provides some relational operators (but we may achieve that using the test command) and some string operators (bash offers an easier way to do that).
When specifying an expression as parameters of the expr command, it is necessary to separate all operators and operands by spaces (they need to be passed as individual parameters), otherwise the expression will be interpreted incorrectly. Calling the expr command directly is neither convenient nor clear, so the shortened syntax $(( expression )) may be used in bash. This form provides better expression parsing, so missing spaces between operators and operands are no longer a problem. Additionally, it automatically substitutes variable values for their names. The incrementation of an integer may be written as
A=$(( A+1 ))
let A++ # an alternative using the let command
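Both forms of integer arithmetic side by side:

```shell
#!/bin/bash
A=7
B=3
echo $(( A + B ))    # prints 10
echo $(( A * B ))    # prints 21
echo $(( A / B ))    # integer division, prints 2
echo $(( A % B ))    # remainder, prints 1
expr "$A" + "$B"     # the older expr form: operands and operators as separate parameters, prints 10
```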
Bash provides some basic operations for strings stored in variables. The length of the string stored in the variable STRING may be obtained by the expression
${#STRING}
A substring may be extracted by the expression

${STRING:<index>:<length>} # substring of the given length starting at the (zero-based) index
echo ${STRING#begin} # cut the string "begin" from the beginning of the string in $STRING
                     # (if $STRING starts with it) and return the remainder
echo ${STRING%end}   # cut the string "end" from the end of the string in $STRING
                     # (if $STRING ends with it) and return the remainder
The strings to be cut may also contain glob patterns. Doubling the operator character (## or %%) makes the pattern match greedily, removing the longest match instead of the shortest one:
A="abcdefghabcdefgh"
echo ${A#*cde}  # prints "fghabcdefgh"
echo ${A##*cde} # prints "fgh"
echo ${A%cde*}  # prints "abcdefghab"
echo ${A%%cde*} # prints "ab"
It is possible to replace a substring in a string by writing
${STRING/pattern/replacement} # print the string in variable $STRING with the first occurrence
                              # of the substring "pattern" replaced by the string "replacement"
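The string operations above on a concrete value:

```shell
#!/bin/bash
STRING="hello world"
echo ${#STRING}             # length of the string: prints 11
echo ${STRING:6:5}          # substring from index 6, length 5: prints "world"
echo ${STRING/world/bash}   # replace "world" by "bash": prints "hello bash"
```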
Character-by-character replacement in a text is possible using the tr tool. The tr command gets two character sets as parameters, specified as strings. It processes the text from the standard input; whenever a character from the first set is found, it is replaced by the corresponding character from the second set.
The wc command may be used to count characters, words or lines of the input text.
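Short examples of both tools:

```shell
#!/bin/bash
echo "hello" | tr 'a-z' 'A-Z'    # replace lower-case letters by upper-case ones: prints HELLO
echo "one two three" | wc -w     # count the words on the input: prints 3
```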
A group of commands may be enclosed in curly braces, which allows e.g. bulk input/output redirection for the whole group:
{
    <command>
    <command>
    ...
} > file
Besides the curly braces, parentheses may be used, with the difference that the commands inside parentheses are executed in a separate instance of the bash interpreter (a subshell).
A function may be defined in one of the following two ways:
function function_name {
    <command>
    <command>
    ...
}
function_name() {
    <command>
    ...
}
A function call is simply done by writing its name followed by arguments, in the same way an external script or program would be called.
Variables inside functions have global scope by default. When a variable should be visible inside the function only, so that equally named variables outside the function are not affected, it may be declared as local:
function fcn {
    local A
    A=1
}
A=2
fcn
echo $A # prints "2"
The call syntax of most UNIX tools complies with POSIX standard recommendations. For processing one-character options, the getopts command may be helpful. Its syntax is
getopts options variable
while getopts ":abchn:s:" opt; do
    case $opt in
        a|b|c)
            echo "Option -${opt} activated"
            ;;
        n)
            echo "Option -n value is ${OPTARG}"
            ;;
        s)
            echo "Option -s value is ${OPTARG}"
            ;;
        h)
            echo "Usage:"
            echo "`basename $0` [-a] [-b] [-c] [-h] [-n <argument>] [-s <argument>]"
            ;;
        ?)
            echo "Invalid option -${OPTARG} supplied"
            ;;
    esac
done
shift $(($OPTIND - 1))
if [ $# -gt 0 ]; then
    echo "Non-option parameters: $*"
fi
There is also a tool called getopt, able to process long-format options (e.g. --help).
The eval command interprets the string in its argument as a command to be executed. It is useful when a more complicated command for another tool is composed, e.g. according to supplied script parameters.
The find command searches a directory tree. Found files are either printed to the standard output, or a specified command may be executed for every file. There are wide possibilities of filtering the results according to various criteria (e.g. file name or type). Using the find command avoids a recursive implementation of directory browsing.
The dirname command prints the path to the parent directory of a supplied file or directory. When the input path is relative, the printed output is also a relative path.
The basename command removes the parent directory path from the supplied file or directory path, leaving only the name of the file or directory.
The readlink command prints the path to the file a symbolic link points to. The useful parameter -f determines the absolute path to the target file and may also be used to determine the absolute path of ordinary files.
The stat command prints available information about a file: size, modification time, permissions and other i-node stored information. The output of the command may be formatted arbitrarily using its parameters.
The diff command compares two supplied files and prints their differences to the output. The output format may be specified by command parameters.
The seq command generates a sequence of numbers in a specified range with a specified step.
The head and tail commands print the first or last n lines of the supplied file (or of the text on the standard input), where n is the number supplied as the -n option argument.
The sort command sorts lines of text (from a file or the standard input) alphabetically. The sorting criteria may be changed by various command parameters.
The uniq command copies input lines to the output, discarding consecutive identical lines. The -c option counts the occurrences of duplicate lines.
The grep command filters lines of the input text (from a file or the standard input) according to their content. A required parameter of the command is the pattern to be matched. Lines containing a substring matching the pattern are printed to the output; the remaining lines are discarded. A regular expression may be used as the pattern.
awk is a tool that processes the input text line by line, applying a script in its own language to every line. The script consists of lines in the form
<pattern> { <action> }
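A minimal illustration: the pattern selects lines, the action prints fields ($1, $2, ... denote the whitespace-separated fields of the current line):

```shell
#!/bin/bash
# print the second field of every line
echo "alpha beta" | awk '{ print $2 }'                                   # prints "beta"
# apply the action only to lines matching a pattern
printf 'error disk full\ninfo all ok\n' | awk '/error/ { print $2, $3 }' # prints "disk full"
```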
sed is a text processing tool applying a script in its own language to the input text. It is a very effective tool with a wide range of applications; particularly useful is its ability to use regular expressions for text matching and replacement.
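The most common use is the substitute command s:

```shell
#!/bin/bash
echo "old house, old car" | sed 's/old/new/'   # replace the first occurrence on each line: prints "new house, old car"
echo "old house, old car" | sed 's/old/new/g'  # the g flag replaces all occurrences: prints "new house, new car"
```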