Manipulating Strings
Bash supports a surprising number of string manipulation
operations. Unfortunately, these tools lack
a unified focus. Some are a subset of parameter substitution, and
others fall under the functionality of the UNIX expr command. This results in
inconsistent command syntax and overlap of functionality,
not to mention confusion.
String Length
- ${#string}
- expr length $string
- expr "$string" : '.*'
stringZ=abcABC123ABCabc
echo ${#stringZ} # 15
echo `expr length $stringZ` # 15
echo `expr "$stringZ" : '.*'` # 15
Example 9-10. Inserting a blank line between paragraphs in a text file
#!/bin/bash
# paragraph-space.sh
# Inserts a blank line between paragraphs of a single-spaced text file.
# Usage: $0 <FILENAME
MINLEN=45 # May need to change this value.
# Assume lines shorter than $MINLEN characters
#+ terminate a paragraph.
while read line # For as many lines as the input file has...
do
echo "$line" # Output the line itself.
len=${#line}
if [ "$len" -lt "$MINLEN" ]
then echo # Add a blank line after short line.
fi
done
exit 0
Length of Matching Substring at Beginning of String
- expr match "$string" '$substring'
$substring is a regular expression.
- expr "$string" : '$substring'
$substring is a regular expression.
stringZ=abcABC123ABCabc
# |------|
echo `expr match "$stringZ" 'abc[A-Z]*.2'` # 8
echo `expr "$stringZ" : 'abc[A-Z]*.2'` # 8
Index
- expr index $string $substring
Numerical position in $string of the first character in $substring that matches.
stringZ=abcABC123ABCabc
echo `expr index "$stringZ" C12` # 6
# C position.
echo `expr index "$stringZ" 1c` # 3
# 'c' (in #3 position) matches before '1'.
This is the near equivalent of
strchr() in C.
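The same lookup can be sketched in pure Bash, without calling expr. This is only an illustration: str_index is a hypothetical helper name, and the sketch assumes the character set contains no characters that are special inside a bracket expression (such as ], ^, or -).

```bash
#!/bin/bash
# str_index is a hypothetical helper, not a Bash builtin.
# Prints the 1-based position of the first character of $1 that
#+ appears in the character set $2, or 0 if none does
#+ (the same convention as "expr index").
str_index () {
  local string=$1 charset=$2
  # Strip the longest suffix that starts with any character of the set;
  #+ what remains is everything before the first match.
  local prefix=${string%%[$charset]*}
  if [ "$prefix" = "$string" ]; then
    echo 0                        # Nothing was stripped: no match.
  else
    echo $(( ${#prefix} + 1 ))
  fi
}

stringZ=abcABC123ABCabc
str_index "$stringZ" C12          # 6
str_index "$stringZ" 1c           # 3
str_index "$stringZ" xyz          # 0
```

Because it relies only on parameter expansion, this variant avoids spawning an external process for each lookup.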
Substring Extraction
- ${string:position}
Extracts substring from $string at $position.
If the $string parameter is "*" or "@", then this extracts the
positional parameters, starting at $position.
- ${string:position:length}
Extracts $length characters of substring from $string at $position.
stringZ=abcABC123ABCabc
# 0123456789.....
# 0-based indexing.
echo ${stringZ:0} # abcABC123ABCabc
echo ${stringZ:1} # bcABC123ABCabc
echo ${stringZ:7} # 23ABCabc
echo ${stringZ:7:3} # 23A
# Three characters of substring.
# Is it possible to index from the right end of the string?
echo ${stringZ:-4} # abcABC123ABCabc
# Defaults to full string, as in ${parameter:-default}.
# However . . .
echo ${stringZ:(-4)} # Cabc
echo ${stringZ: -4} # Cabc
# Now, it works.
# Parentheses or added space "escape" the position parameter.
# Thank you, Dan Jacobson, for pointing this out.
If the $string parameter is "*" or "@", then this extracts a maximum
of $length positional parameters, starting at $position.
echo ${*:2} # Echoes second and following positional parameters.
echo ${@:2} # Same as above.
echo ${*:2:3} # Echoes three positional parameters, starting at second.
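As a small sketch of how this slicing behaves inside a function, the two helpers below (show_rest and show_two are hypothetical names used only for illustration) print each extracted argument in angle brackets. Quoting "${@:2}" is an assumption about good practice here: it keeps an argument that contains spaces as a single word.

```bash
#!/bin/bash
# show_rest and show_two are hypothetical helpers for illustration only.
show_rest () {
  # "${@:2}" expands to the second and following arguments,
  #+ each one as a separate word.
  printf '<%s>' "${@:2}"
  echo
}

show_two () {
  # At most two arguments, starting at the second.
  printf '<%s>' "${@:2:2}"
  echo
}

show_rest one "two words" three four    # <two words><three><four>
show_two  one "two words" three four    # <two words><three>
```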
- expr substr $string $position $length
Extracts $length characters from $string starting at $position.
stringZ=abcABC123ABCabc
# 123456789......
# 1-based indexing.
echo `expr substr $stringZ 1 2` # ab
echo `expr substr $stringZ 4 3` # ABC
- expr match "$string" '\($substring\)'
Extracts $substring at beginning of $string, where $substring is a regular expression.
- expr "$string" : '\($substring\)'
Extracts $substring at beginning of $string, where $substring is a regular expression.
stringZ=abcABC123ABCabc
# =======
echo `expr match "$stringZ" '\(.[b-c]*[A-Z]..[0-9]\)'` # abcABC1
echo `expr "$stringZ" : '\(.[b-c]*[A-Z]..[0-9]\)'` # abcABC1
echo `expr "$stringZ" : '\(.......\)'` # abcABC1
# All of the above forms give an identical result.
- expr match "$string" '.*\($substring\)'
Extracts $substring at end of $string, where $substring is a regular expression.
- expr "$string" : '.*\($substring\)'
Extracts $substring at end of $string, where $substring is a regular expression.
stringZ=abcABC123ABCabc
# ======
echo `expr match "$stringZ" '.*\([A-C][A-C][A-C][a-c]*\)'` # ABCabc
echo `expr "$stringZ" : '.*\(......\)'` # ABCabc
Substring Removal
- ${string#substring}
Strips shortest match of $substring from front of $string.
- ${string##substring}
Strips longest match of $substring from front of $string.
stringZ=abcABC123ABCabc
# |----|
# |----------|
echo ${stringZ#a*C} # 123ABCabc
# Strip out shortest match between 'a' and 'C'.
echo ${stringZ##a*C} # abc
# Strip out longest match between 'a' and 'C'.
- ${string%substring}
Strips shortest match of $substring from back of $string.
- ${string%%substring}
Strips longest match of $substring from back of $string.
stringZ=abcABC123ABCabc
# ||
# |------------|
echo ${stringZ%b*c} # abcABC123ABCa
# Strip out shortest match between 'b' and 'c', from back of $stringZ.
echo ${stringZ%%b*c} # a
# Strip out longest match between 'b' and 'c', from back of $stringZ.
Example 9-11. Converting graphic file formats, with filename change
#!/bin/bash
# cvt.sh:
# Converts all the MacPaint image files in a directory to "pbm" format.
# Uses the "macptopbm" binary from the "netpbm" package,
#+ which is maintained by Bryan Henderson (bryanh@giraffe-data.com).
# Netpbm is a standard part of most Linux distros.
OPERATION=macptopbm
SUFFIX=pbm # New filename suffix.
if [ -n "$1" ]
then
directory=$1 # If directory name given as a script argument...
else
directory=$PWD # Otherwise use current working directory.
fi
# Assumes all files in the target directory are MacPaint image files,
#+ with a ".mac" filename suffix.
for file in "$directory"/* # Filename globbing.
do
filename=${file%.*c} # Strip ".mac" suffix off filename
#+ ('.*c' matches everything
#+ between '.' and 'c', inclusive).
$OPERATION "$file" > "$filename.$SUFFIX"
# Redirect conversion to new filename.
rm -f "$file" # Delete original files after converting.
echo "$filename.$SUFFIX" # Log what is happening to stdout.
done
exit 0
# Exercise:
# --------
# As it stands, this script converts *all* the files in the current
#+ working directory.
# Modify it to work *only* on files with a ".mac" suffix.
A simple emulation of getopt using substring extraction constructs.
Example 9-12. Emulating getopt
#!/bin/bash
# getopt-simple.sh
# Author: Chris Morgan
# Used in the ABS Guide with permission.
getopt_simple()
{
echo "getopt_simple()"
echo "Parameters are '$*'"
until [ -z "$1" ]
do
echo "Processing parameter of: '$1'"
if [ ${1:0:1} = '/' ]
then
tmp=${1:1} # Strip off leading '/' . . .
parameter=${tmp%%=*} # Extract name.
value=${tmp##*=} # Extract value.
echo "Parameter: '$parameter', value: '$value'"
eval $parameter=$value
fi
shift
done
}
# Pass all options to getopt_simple().
getopt_simple "$@" # Quoting "$@" passes each option through as a separate word.
echo "test is '$test'"
echo "test2 is '$test2'"
exit 0
---
bash getopt-simple.sh /test=value1 /test2=value2
getopt_simple()
Parameters are '/test=value1 /test2=value2'
Processing parameter of: '/test=value1'
Parameter: 'test', value: 'value1'
Processing parameter of: '/test2=value2'
Parameter: 'test2', value: 'value2'
test is 'value1'
test2 is 'value2'
Substring Replacement
- ${string/substring/replacement}
Replace first match of $substring with $replacement.
- ${string//substring/replacement}
Replace all matches of $substring with $replacement.
stringZ=abcABC123ABCabc
echo ${stringZ/abc/xyz} # xyzABC123ABCabc
# Replaces first match of 'abc' with 'xyz'.
echo ${stringZ//abc/xyz} # xyzABC123ABCxyz
# Replaces all matches of 'abc' with 'xyz'.
- ${string/#substring/replacement}
If $substring matches front end of $string, substitute $replacement for $substring.
- ${string/%substring/replacement}
If $substring matches back end of $string, substitute $replacement for $substring.
stringZ=abcABC123ABCabc
echo ${stringZ/#abc/XYZ} # XYZABC123ABCabc
# Replaces front-end match of 'abc' with 'XYZ'.
echo ${stringZ/%abc/XYZ} # abcABC123ABCXYZ
# Replaces back-end match of 'abc' with 'XYZ'.
9.2.1. Manipulating strings using awk
A Bash script may invoke the string manipulation facilities of awk
as an alternative to using its built-in operations.
Example 9-13. Alternate ways of extracting substrings
#!/bin/bash
# substring-extraction.sh
String=23skidoo1
# 012345678 Bash
# 123456789 awk
# Note different string indexing system:
# Bash numbers first character of string as '0'.
# Awk numbers first character of string as '1'.
echo ${String:2:4} # position 3 (0-1-2), 4 characters long
# skid
# The awk equivalent of ${string:pos:length} is substr(string,pos,length).
echo | awk '
{ print substr("'"${String}"'",3,4) # skid
}
'
# Piping an empty "echo" to awk gives it dummy input,
#+ and thus makes it unnecessary to supply a filename.
exit 0
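The quoting in the awk call above can be hard to read. One possible alternative, offered here only as a sketch, hands the shell value to awk with the standard -v option (the awk variable name s is arbitrary) and uses a BEGIN block so that no dummy input is needed. Note that awk interprets backslash escapes in values passed with -v, so this assumes the string contains no backslashes.

```bash
#!/bin/bash
String=23skidoo1

# -v copies the shell value into an awk variable (here named "s"),
#+ so the awk program can stay entirely inside single quotes.
# A BEGIN block runs before any input is read,
#+ which makes the empty "echo" unnecessary.
result=$(awk -v s="$String" 'BEGIN { print substr(s, 3, 4) }')

echo "$result"    # skid
```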
Parameter Substitution
Manipulating and/or expanding variables
- ${parameter}
Same as $parameter, i.e., value of the variable parameter.
In certain contexts, only the less ambiguous ${parameter} form works.
May be used for concatenating variables with strings.
your_id=${USER}-on-${HOSTNAME}
echo "$your_id"
#
echo "Old \$PATH = $PATH"
PATH=${PATH}:/opt/bin #Add /opt/bin to $PATH for duration of script.
echo "New \$PATH = $PATH"
- ${parameter-default}, ${parameter:-default}
If parameter not set, use default.
echo ${username-`whoami`}
# Echoes the result of `whoami`, if variable $username is still unset.
${parameter-default} and ${parameter:-default} are almost equivalent.
The extra : makes a difference only when parameter has been declared, but is null.
#!/bin/bash
# param-sub.sh
# Whether a variable has been declared
#+ affects triggering of the default option
#+ even if the variable is null.
username0=
echo "username0 has been declared, but is set to null."
echo "username0 = ${username0-`whoami`}"
# Will not echo.
echo
echo username1 has not been declared.
echo "username1 = ${username1-`whoami`}"
# Will echo.
username2=
echo "username2 has been declared, but is set to null."
echo "username2 = ${username2:-`whoami`}"
# ^
# Will echo because of :- rather than just - in condition test.
# Compare to first instance, above.
#
# Once again:
variable=
# variable has been declared, but is set to null.
echo "${variable-0}" # (no output)
echo "${variable:-1}" # 1
# ^
unset variable
echo "${variable-2}" # 2
echo "${variable:-3}" # 3
exit 0
The default parameter construct finds use in providing "missing"
command-line arguments in scripts.
DEFAULT_FILENAME=generic.data
filename=${1:-$DEFAULT_FILENAME}
# If not otherwise specified, the following command block operates
#+ on the file "generic.data".
#
# Commands follow.
See also Example 3-4, Example 28-2, and Example A-6. Compare this method
with using an and list to supply a default command-line argument.
- ${parameter=default}, ${parameter:=default}
If parameter not set, set it to default.
Both forms are nearly equivalent. The : makes a difference only when
$parameter has been declared and is null, as above.
echo ${username=`whoami`}
# Variable "username" is now set to `whoami`.
- ${parameter+alt_value}, ${parameter:+alt_value}
If parameter set, use alt_value, else use null string.
Both forms are nearly equivalent. The : makes a difference only when
parameter has been declared and is null; see below.
echo "###### \${parameter+alt_value} ########"
echo
a=${param1+xyz}
echo "a = $a" # a =
param2=
a=${param2+xyz}
echo "a = $a" # a = xyz
param3=123
a=${param3+xyz}
echo "a = $a" # a = xyz
echo
echo "###### \${parameter:+alt_value} ########"
echo
a=${param4:+xyz}
echo "a = $a" # a =
param5=
a=${param5:+xyz}
echo "a = $a" # a =
# Different result from a=${param5+xyz}
param6=123
a=${param6:+xyz}
echo "a = $a" # a = xyz
- ${parameter?err_msg}, ${parameter:?err_msg}
If parameter set, use it, else print err_msg and abort the script with an exit status of 1.
Both forms are nearly equivalent. The : makes a difference only when
parameter has been declared and is null, as above.
Example 9-14. Using parameter substitution and error messages
#!/bin/bash
# Check some of the system's environmental variables.
# This is good preventative maintenance.
# If, for example, $USER, the name of the person at the console, is not set,
#+ the machine will not recognize you.
: ${HOSTNAME?} ${USER?} ${HOME?} ${MAIL?}
echo
echo "Name of the machine is $HOSTNAME."
echo "You are $USER."
echo "Your home directory is $HOME."
echo "Your mail INBOX is located in $MAIL."
echo
echo "If you are reading this message,"
echo "critical environmental variables have been set."
echo
echo
# ------------------------------------------------------
# The ${variablename?} construction can also check
#+ for variables set within the script.
ThisVariable=Value-of-ThisVariable
# Note, by the way, that string variables may be set
#+ to characters disallowed in their names.
: ${ThisVariable?}
echo "Value of ThisVariable is $ThisVariable".
echo
echo
: ${ZZXy23AB?"ZZXy23AB has not been set."}
# If ZZXy23AB has not been set,
#+ then the script terminates with an error message.
# You can specify the error message.
# : ${variablename?"ERROR MESSAGE"}
# Same result with: dummy_variable=${ZZXy23AB?}
# dummy_variable=${ZZXy23AB?"ZZXy23AB has not been set."}
#
# echo ${ZZXy23AB?} >/dev/null
# Compare these methods of checking whether a variable has been set
#+ with "set -u" . . .
echo "You will not see this message, because script already terminated."
HERE=0
exit $HERE # Will NOT exit here.
# In fact, this script will return an exit status (echo $?) of 1.
Example 9-15. Parameter substitution and "usage" messages
#!/bin/bash
# usage-message.sh
: ${1?"Usage: $0 ARGUMENT"}
# Script exits here if command-line parameter absent,
#+ with following error message.
# usage-message.sh: 1: Usage: usage-message.sh ARGUMENT
echo "These two lines echo only if command-line parameter given."
echo "command line parameter = \"$1\""
exit 0 # Will exit here only if command-line parameter present.
# Check the exit status, both with and without command-line parameter.
# If command-line parameter present, then "$?" is 0.
# If not, then "$?" is 1.
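The ${parameter:=default} form described earlier in this section is often combined with the : (null) command, just as ${parameter?} is in the examples above, so that a script can assign defaults without echoing anything. The sketch below is illustrative only; the variable names and paths are made up for the example.

```bash
#!/bin/bash
# logdir and empty are hypothetical variable names.
unset logdir
: "${logdir:=/tmp/logs}"       # Unset, so the default is assigned.
echo "$logdir"                 # /tmp/logs

logdir=/var/log/myapp
: "${logdir:=/tmp/logs}"       # Already set, so the value is left alone.
echo "$logdir"                 # /var/log/myapp

empty=
: "${empty=fallback}"          # No colon: a declared-but-null variable stays null.
echo "[$empty]"                # []
: "${empty:=fallback}"         # With colon: null counts as "not set".
echo "[$empty]"                # [fallback]
```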
Variable length / Substring removal
- ${#var}
String length (number of characters in $var). For an array,
${#array} is the length of the first element in the array.
Exceptions:
${#*} and ${#@} give the number of positional parameters.
For an array, ${#array[*]} and ${#array[@]} give the number
of elements in the array.
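A short sketch of the array and positional-parameter cases just described. The array contents are arbitrary, and "set --" is used here only to give the script some positional parameters to count.

```bash
#!/bin/bash
array=(alpha be gamma)

echo "${#array}"       # 5   Length of the first element, "alpha".
echo "${#array[1]}"    # 2   Length of the second element, "be".
echo "${#array[@]}"    # 3   Number of elements in the array.
echo "${#array[*]}"    # 3   Same as above.

set -- one two three four    # Assign four positional parameters.
echo "${#@}"           # 4   Number of positional parameters.
echo "${#*}"           # 4   Same as above.
echo "$#"              # 4   The usual way to get this count.
```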
Example 9-16. Length of a variable
#!/bin/bash
# length.sh
E_NO_ARGS=65
if [ $# -eq 0 ] # Must have command-line args to demo script.
then
echo "Please invoke this script with one or more command-line arguments."
exit $E_NO_ARGS
fi
var01=abcdEFGH28ij
echo "var01 = ${var01}"
echo "Length of var01 = ${#var01}"
# Now, let's try embedding a space.
var02="abcd EFGH28ij"
echo "var02 = ${var02}"
echo "Length of var02 = ${#var02}"
echo "Number of command-line arguments passed to script = ${#@}"
echo "Number of command-line arguments passed to script = ${#*}"
exit 0
- ${var#Pattern}, ${var##Pattern}
Remove from $var the shortest/longest part of $Pattern
that matches the front end of $var.
A usage illustration from Example A-7:
# Function from "days-between.sh" example.
# Strips leading zero(s) from argument passed.
strip_leading_zero () # Strip possible leading zero(s)
{ #+ from argument passed.
return=${1#0} # The "1" refers to "$1" -- passed arg.
} # The "0" is what to remove from "$1" -- strips one leading zero.
Manfred Schwarb's more elaborate variation of the above:
strip_leading_zero2 () # Strip possible leading zero(s), since otherwise
{ # Bash will interpret such numbers as octal values.
shopt -s extglob # Turn on extended globbing.
local val=${1##+(0)} # Use local variable, longest matching series of 0's.
shopt -u extglob # Turn off extended globbing.
_strip_leading_zero2=${val:-0}
# If input was 0, return 0 instead of "".
}
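When the only goal is to keep Bash from reading a zero-padded number as octal, another option is the base#number arithmetic notation. The sketch below is an alternative to the pattern-matching functions above, not a replacement for them; to_decimal is a hypothetical name, and the sketch assumes its argument is an unsigned string of decimal digits.

```bash
#!/bin/bash
# to_decimal is a hypothetical helper name.
to_decimal () {
  # The 10# prefix tells Bash arithmetic to read the digits as base 10,
  #+ so a leading zero no longer selects octal.
  echo $(( 10#$1 ))
}

to_decimal 08      # 8    (A plain $(( 08 )) is an error: 8 is not an octal digit.)
to_decimal 0012    # 12
to_decimal 0       # 0
```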
Another usage illustration:
echo `basename $PWD` # Basename of current working directory.
echo "${PWD##*/}" # Basename of current working directory.
echo
echo `basename $0` # Name of script.
echo $0 # Name of script.
echo "${0##*/}" # Name of script.
echo
filename=test.data
echo "${filename##*.}" # data
# Extension of filename.
- ${var%Pattern}, ${var%%Pattern}
Remove from $var the shortest/longest part of $Pattern
that matches the back end of $var.
Version 2 of Bash added additional options.
Example 9-17. Pattern matching in parameter substitution
#!/bin/bash
# patt-matching.sh
# Pattern matching using the # ## % %% parameter substitution operators.
var1=abcd12345abc6789
pattern1=a*c # * (wild card) matches everything between a - c.
echo
echo "var1 = $var1" # abcd12345abc6789
echo "var1 = ${var1}" # abcd12345abc6789
# (alternate form)
echo "Number of characters in ${var1} = ${#var1}"
echo
echo "pattern1 = $pattern1" # a*c (everything between 'a' and 'c')
echo "--------------"
echo '${var1#$pattern1} =' "${var1#$pattern1}" # d12345abc6789
# Shortest possible match, strips out first 3 characters abcd12345abc6789
# ^^^^^ |-|
echo '${var1##$pattern1} =' "${var1##$pattern1}" # 6789
# Longest possible match, strips out first 12 characters abcd12345abc6789
# ^^^^^ |----------|
echo; echo; echo
pattern2=b*9 # everything between 'b' and '9'
echo "var1 = $var1" # Still abcd12345abc6789
echo
echo "pattern2 = $pattern2"
echo "--------------"
echo '${var1%$pattern2} =' "${var1%$pattern2}" # abcd12345a
# Shortest possible match, strips out last 6 characters abcd12345abc6789
# ^^^^ |----|
echo '${var1%%$pattern2} =' "${var1%%$pattern2}" # a
# Longest possible match, strips out last 15 characters abcd12345abc6789
# ^^^^ |-------------|
# Remember, # and ## work from the left end (beginning) of string,
# % and %% work from the right end.
echo
exit 0
Example 9-18. Renaming file extensions:
#!/bin/bash
# rfe.sh: Renaming file extensions.
#
# rfe old_extension new_extension
#
# Example:
# To rename all *.gif files in working directory to *.jpg,
# rfe gif jpg
E_BADARGS=65
case $# in
0|1) # The vertical bar means "or" in this context.
echo "Usage: `basename $0` old_file_suffix new_file_suffix"
exit $E_BADARGS # If 0 or 1 arg, then bail out.
;;
esac
for filename in *.$1
# Traverse list of files ending with 1st argument.
do
mv "$filename" "${filename%$1}$2"
# Strip off part of filename matching 1st argument,
#+ then append 2nd argument.
done
exit 0
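The renaming loop above can also be wrapped in a function that tolerates filenames with spaces and directories where nothing matches. This is a sketch under stated assumptions rather than a drop-in replacement: rename_ext is a hypothetical name, it strips the dot together with the old extension, and it works on the current directory only.

```bash
#!/bin/bash
# rename_ext is a hypothetical function name, not part of rfe.sh.
# Usage: rename_ext old_extension new_extension
rename_ext () {
  local old=$1 new=$2 f
  for f in *."$old"; do
    # If no file matched, the loop sees the literal pattern; skip it.
    [ -e "$f" ] || continue
    # ${f%."$old"} strips the ".old" suffix; the quotes make $old literal.
    mv -- "$f" "${f%."$old"}.$new"
  done
}

# Example call (commented out so that nothing is renamed by accident):
# rename_ext gif jpg
```

The "--" guards against filenames that begin with a dash being taken as options to mv.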
Variable expansion / Substring replacement
These constructs have been adopted from ksh.
- ${var:pos}
Variable var expanded, starting from offset pos.
- ${var:pos:len}
Expansion to a max of len characters of variable var, from offset pos.
See Example A-14 for an example of the creative use of this operator.
- ${var/Pattern/Replacement}
First match of Pattern within var replaced with Replacement.
If Replacement is omitted, then the first match of Pattern is
replaced by nothing, that is, deleted.
- ${var//Pattern/Replacement}
All matches of Pattern within var replaced with Replacement.
As above, if Replacement is omitted, then all occurrences of
Pattern are replaced by nothing, that is, deleted.
Example 9-19. Using pattern matching to parse arbitrary strings
#!/bin/bash
var1=abcd-1234-defg
echo "var1 = $var1"
t=${var1#*-*}
echo "var1 (with everything, up to and including first - stripped out) = $t"
# t=${var1#*-} works just the same,
#+ since # matches the shortest string,
#+ and * matches everything preceding, including an empty string.
# (Thanks, Stephane Chazelas, for pointing this out.)
t=${var1##*-*}
echo "If var1 contains a \"-\", returns empty string... var1 = $t"
t=${var1%*-*}
echo "var1 (with everything from the last - on stripped out) = $t"
echo
# -------------------------------------------
path_name=/home/bozo/ideas/thoughts.for.today
# -------------------------------------------
echo "path_name = $path_name"
t=${path_name##/*/}
echo "path_name, stripped of prefixes = $t"
# Same effect as t=`basename $path_name` in this particular case.
# t=${path_name%/}; t=${t##*/} is a more general solution,
#+ but still fails sometimes.
# If $path_name ends with a newline, then `basename $path_name` will not work,
#+ but the above expression will.
# (Thanks, S.C.)
t=${path_name%/*.*}
# Same effect as t=`dirname $path_name`
echo "path_name, stripped of suffixes = $t"
# These will fail in some cases, such as "../", "/foo////", "foo/", "/".
# Removing suffixes, especially when the basename has no suffix,
#+ but the dirname does, also complicates matters.
# (Thanks, S.C.)
echo
t=${path_name:11}
echo "$path_name, with first 11 chars stripped off = $t"
t=${path_name:11:5}
echo "$path_name, with first 11 chars stripped off, length 5 = $t"
echo
t=${path_name/bozo/clown}
echo "$path_name with \"bozo\" replaced by \"clown\" = $t"
t=${path_name/today/}
echo "$path_name with \"today\" deleted = $t"
t=${path_name//o/O}
echo "$path_name with all o's capitalized = $t"
t=${path_name//o/}
echo "$path_name with all o's deleted = $t"
exit 0
- ${var/#Pattern/Replacement}
If prefix of var matches Pattern, then substitute Replacement for Pattern.
- ${var/%Pattern/Replacement}
If suffix of var matches Pattern, then substitute Replacement for Pattern.
Example 9-20. Matching patterns at prefix or suffix of string
#!/bin/bash
# var-match.sh:
# Demo of pattern replacement at prefix / suffix of string.
v0=abc1234zip1234abc # Original variable.
echo "v0 = $v0" # abc1234zip1234abc
echo
# Match at prefix (beginning) of string.
v1=${v0/#abc/ABCDEF} # abc1234zip1234abc
# |-|
echo "v1 = $v1" # ABCDEF1234zip1234abc
# |----|
# Match at suffix (end) of string.
v2=${v0/%abc/ABCDEF} # abc1234zip1234abc
# |-|
echo "v2 = $v2" # abc1234zip1234ABCDEF
# |----|
echo
# ----------------------------------------------------
# Must match at beginning / end of string,
#+ otherwise no replacement results.
# ----------------------------------------------------
v3=${v0/#123/000} # Matches, but not at beginning.
echo "v3 = $v3" # abc1234zip1234abc
# NO REPLACEMENT.
v4=${v0/%123/000} # Matches, but not at end.
echo "v4 = $v4" # abc1234zip1234abc
# NO REPLACEMENT.
exit 0
Typing variables: declare or typeset
The declare or typeset builtins (they are exact synonyms) permit
restricting the properties of variables. This is a very weak form of
the typing available in certain programming languages. The declare
command is specific to version 2 or later of Bash. The typeset command
also works in ksh scripts.
declare/typeset options
- -r readonly
(declare -r var1 works the same as readonly var1)
This is the rough equivalent of the C const type qualifier. An
attempt to change the value of a readonly variable fails with an
error message.
- -i integer
declare -i number
# The script will treat subsequent occurrences of "number" as an integer.
number=3
echo "Number = $number" # Number = 3
number=three
echo "Number = $number" # Number = 0
# Tries to evaluate the string "three" as an integer.
Certain arithmetic operations are permitted for declared integer
variables without the need for expr or let.
n=6/3
echo "n = $n" # n = 6/3
declare -i n
n=6/3
echo "n = $n" # n = 2
- -a array
The variable indices will be treated as an array.
- -f functions
A declare -f line with no arguments in a script causes a listing of
all the functions previously defined in that script.
A declare -f function_name in a script lists just the function named.
- -x export
This declares a variable as available for exporting outside the
environment of the script itself.
- -x var=$value
The declare command permits assigning a value to a variable in the
same statement as setting its properties.
Example 9-21. Using declare to type variables
#!/bin/bash
func1 ()
{
echo This is a function.
}
declare -f # Lists the function above.
echo
declare -i var1 # var1 is an integer.
var1=2367
echo "var1 declared as $var1"
var1=var1+1 # Integer declaration eliminates the need for 'let'.
echo "var1 incremented by 1 is $var1."
# Attempt to change variable declared as integer.
echo "Attempting to change var1 to floating point value, 2367.1."
var1=2367.1 # Results in error message, with no change to variable.
echo "var1 is still $var1"
echo
declare -r var2=13.36 # 'declare' permits setting a variable property
#+ and simultaneously assigning it a value.
echo "var2 declared as $var2" # Attempt to change readonly variable.
var2=13.37 # Generates error message, and exit from script.
echo "var2 is still $var2" # This line will not execute.
exit 0 # Script will not exit here.
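The example above exercises -f, -i, and -r. The -a and -x options from the list can be sketched in the same way. The variable names below are made up for illustration, and the check assumes a bash binary is available on the PATH to act as the child process.

```bash
#!/bin/bash
declare -a colors              # colors is an indexed array.
colors[0]=red
colors[1]=green
echo "${#colors[@]}"           # 2

declare -x GREETING=hello      # Exported: child processes inherit it.
unexported=hidden              # Ordinary variable: children do not see it.

# Ask a child shell what it can see.
child_view=$(bash -c 'echo "${GREETING-unset} ${unexported-unset}"')
echo "$child_view"             # hello unset
```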
Indirect References to Variables
Assume that the value of a variable is the name of a second
variable. Is it somehow possible to retrieve the value
of this second variable from the first one? For example,
if a=letter_of_alphabet and letter_of_alphabet=z, can a reference to
a return z? This can indeed be done, and it is called an indirect
reference. It uses the unusual eval var1=\$$var2 notation.
Example 9-22. Indirect References
#!/bin/bash
# ind-ref.sh: Indirect variable referencing.
# Accessing the contents of the contents of a variable.
a=letter_of_alphabet # Variable "a" holds the name of another variable.
letter_of_alphabet=z
echo
# Direct reference.
echo "a = $a" # a = letter_of_alphabet
# Indirect reference.
eval a=\$$a
echo "Now a = $a" # Now a = z
echo
# Now, let's try changing the second-order reference.
t=table_cell_3
table_cell_3=24
echo "\"table_cell_3\" = $table_cell_3" # "table_cell_3" = 24
echo -n "dereferenced \"t\" = "; eval echo \$$t # dereferenced "t" = 24
# In this simple case, the following also works (why?).
# eval t=\$$t; echo "\"t\" = $t"
echo
t=table_cell_3
NEW_VAL=387
table_cell_3=$NEW_VAL
echo "Changing value of \"table_cell_3\" to $NEW_VAL."
echo "\"table_cell_3\" now $table_cell_3"
echo -n "dereferenced \"t\" now "; eval echo \$$t
# "eval" takes the two arguments "echo" and "\$$t" (set equal to $table_cell_3)
echo
# (Thanks, Stephane Chazelas, for clearing up the above behavior.)
# Another method is the ${!t} notation, discussed in "Bash, version 2" section.
# See also ex78.sh.
exit 0
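The ${!t} notation mentioned in the comments above can be sketched as follows. It reads through the reference without eval. Writing through the reference is shown here with printf -v, which is one possible approach and assumes a Bash version that supports that option.

```bash
#!/bin/bash
t=table_cell_3
table_cell_3=24

echo "${!t}"               # 24   Value of the variable whose name is stored in t.

table_cell_3=387
echo "${!t}"               # 387  The reference follows the change.

# Assigning through the reference:
#+ printf -v stores its output in the variable named by "$t".
printf -v "$t" '%s' 500
echo "$table_cell_3"       # 500
```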
Of what practical use is indirect referencing of variables? It
gives Bash a little of the functionality of pointers in C, for
instance, in table lookup. And, it also has some other very
interesting applications. . . .
Nils Radtke shows how to build "dynamic" variable names and evaluate
their contents. This can be useful when sourcing configuration files.
#!/bin/bash
# ---------------------------------------------
# This could be "sourced" from a separate file.
isdnMyProviderRemoteNet=172.16.0.100
isdnYourProviderRemoteNet=10.0.0.10
isdnOnlineService="MyProvider"
# ---------------------------------------------
remoteNet=$(eval "echo \$$(echo isdn${isdnOnlineService}RemoteNet)")
remoteNet=$(eval "echo \$$(echo isdnMyProviderRemoteNet)")
remoteNet=$(eval "echo \$isdnMyProviderRemoteNet")
remoteNet=$(eval "echo $isdnMyProviderRemoteNet")
echo "$remoteNet" # 172.16.0.100
# ================================================================
# And, it gets even better.
# Consider the following snippet given a variable named getSparc,
#+ but no such variable getIa64:
chkMirrorArchs () {
arch="$1";
if [ "$(eval "echo \${$(echo get$(echo -ne $arch |
sed 's/^\(.\).*/\1/g' | tr 'a-z' 'A-Z'; echo $arch |
sed 's/^.\(.*\)/\1/g')):-false}")" = true ]
then
return 0;
else
return 1;
fi;
}
getSparc="true"
unset getIa64
chkMirrorArchs sparc
echo $? # 0
# True
chkMirrorArchs ia64
echo $? # 1
# False
# Notes:
# -----
# Even the to-be-substituted variable name part is built explicitly.
# The parameters to the chkMirrorArchs calls are all lower case.
# The variable name is composed of two parts: "get" and "Sparc" . . .
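The same dynamic-name lookups can be sketched without eval by building the name in an ordinary variable and expanding it with ${!name}. This is an alternative formulation, not Nils Radtke's code; varname and chk_arch are hypothetical names introduced for the illustration.

```bash
#!/bin/bash
isdnMyProviderRemoteNet=172.16.0.100
isdnYourProviderRemoteNet=10.0.0.10
isdnOnlineService="MyProvider"

varname="isdn${isdnOnlineService}RemoteNet"   # Build the variable name . . .
remoteNet=${!varname}                         #+ then expand it indirectly.
echo "$remoteNet"                             # 172.16.0.100

# chk_arch is a hypothetical rewrite of chkMirrorArchs.
chk_arch () {
  local first rest varname
  first=$(printf '%s' "${1:0:1}" | tr 'a-z' 'A-Z')   # Capitalize first letter.
  rest=${1:1}                                        # Remainder of the name.
  varname="get${first}${rest}"                       # For example, getSparc.
  [ "${!varname:-false}" = true ]                    # Unset or null counts as false.
}

getSparc="true"
unset getIa64

if chk_arch sparc; then echo "sparc: yes"; else echo "sparc: no"; fi   # sparc: yes
if chk_arch ia64;  then echo "ia64: yes";  else echo "ia64: no";  fi   # ia64: no
```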
Example 9-23. Passing an indirect reference to awk
#!/bin/bash
# Another version of the "column totaler" script
#+ that adds up a specified column (of numbers) in the target file.
# This one uses indirect references.
ARGS=2
E_WRONGARGS=65
if [ $# -ne "$ARGS" ] # Check for proper no. of command line args.
then
echo "Usage: `basename $0` filename column-number"
exit $E_WRONGARGS
fi
filename=$1
column_number=$2
#===== Same as original script, up to this point =====#
# A multi-line awk script is invoked by awk ' ..... '
# Begin awk script.
# ------------------------------------------------
awk "
{ total += \$${column_number} # indirect reference
}
END {
print total
}
" "$filename"
# ------------------------------------------------
# End awk script.
# Indirect variable reference avoids the hassles
#+ of referencing a shell variable within the embedded awk script.
# Thanks, Stephane Chazelas.
exit 0
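Another way to hand the column number to awk, shown here only as a sketch, is the standard -v option, which lets the awk program stay inside single quotes. The names sum_column and col are hypothetical, and unlike the script above this version prints 0 for an empty file because of the "+ 0".

```bash
#!/bin/bash
# sum_column is a hypothetical function name.
# Usage: sum_column filename column-number
sum_column () {
  local filename=$1 column_number=$2
  # Inside awk, "col" holds the column number, so $col is that field.
  awk -v col="$column_number" '
    { total += $col }
    END { print total + 0 }
  ' "$filename"
}

# Example call (the filename is an assumption):
# sum_column data.txt 2
```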
$RANDOM: generate random integer
$RANDOM is an internal Bash function (not a constant) that returns a
pseudorandom integer in the range 0 - 32767. It should not be used to
generate an encryption key.
Example 9-24. Generating random numbers
#!/bin/bash
# $RANDOM returns a different random integer at each invocation.
# Nominal range: 0 - 32767 (signed 16-bit integer).
MAXCOUNT=10
count=1
echo
echo "$MAXCOUNT random numbers:"
echo "-----------------"
while [ "$count" -le $MAXCOUNT ] # Generate 10 ($MAXCOUNT) random integers.
do
number=$RANDOM
echo $number
let "count += 1" # Increment count.
done
echo "-----------------"
# If you need a random int within a certain range, use the 'modulo' operator.
# This returns the remainder of a division operation.
RANGE=500
echo
number=$RANDOM
let "number %= $RANGE"
# ^^
echo "Random number less than $RANGE --- $number"
echo
# If you need a random integer greater than a lower bound,
#+ then set up a test to discard all numbers below that.
FLOOR=200
number=0 #initialize
while [ "$number" -le $FLOOR ]
do
number=$RANDOM
done
echo "Random number greater than $FLOOR --- $number"
echo
# Combine above two techniques to retrieve random number between two limits.
number=0 #initialize
while [ "$number" -le $FLOOR ]
do
number=$RANDOM
let "number %= $RANGE" # Scales $number down within $RANGE.
done
echo "Random number between $FLOOR and $RANGE --- $number"
echo
# Generate binary choice, that is, "true" or "false" value.
BINARY=2
T=1
number=$RANDOM
let "number %= $BINARY"
# Note that let "number >>= 14" gives a better random distribution
#+ (right shifts out everything except the most significant of the 15 bits).
if [ "$number" -eq $T ]
then
echo "TRUE"
else
echo "FALSE"
fi
echo
# Generate a toss of the dice.
SPOTS=6 # Modulo 6 gives range 0 - 5.
# Incrementing by 1 gives desired range of 1 - 6.
# Thanks, Paulo Marcel Coelho Aragao, for the simplification.
die1=0
die2=0
# Would it be better to just set SPOTS=7 and not add 1? Why or why not?
# Tosses each die separately, and so gives correct odds.
let "die1 = $RANDOM % $SPOTS +1" # Roll first one.
let "die2 = $RANDOM % $SPOTS +1" # Roll second one.
# Which arithmetic operation, above, has greater precedence --
#+ modulo (%) or addition (+)?
let "throw = $die1 + $die2"
echo "Throw of the dice = $throw"
echo
exit 0
Example 9-25. Picking a random card from a deck
#!/bin/bash
# pick-card.sh
# This is an example of choosing random elements of an array.
# Pick a card, any card.
Suites="Clubs
Diamonds
Hearts
Spades"
Denominations="2
3
4
5
6
7
8
9
10
Jack
Queen
King
Ace"
# Note variables spread over multiple lines.
suite=($Suites) # Read into array variable.
denomination=($Denominations)
num_suites=${#suite[*]} # Count how many elements.
num_denominations=${#denomination[*]}
echo -n "${denomination[$((RANDOM%num_denominations))]} of "
echo ${suite[$((RANDOM%num_suites))]}
# $bozo bash pick-card.sh
# Jack of Clubs
# Thank you, "jipe," for pointing out this use of $RANDOM.
exit 0
Jipe points out a set of techniques for generating random numbers
within a range.
# Generate random number between 6 and 30.
rnumber=$((RANDOM%25+6))
# Generate random number in the same 6 - 30 range,
#+ but the number must be evenly divisible by 3.
rnumber=$(((RANDOM%30/3+1)*3))
# Note that this will not work all the time.
# It fails (yields 3, below the range) whenever $RANDOM % 30 is 0, 1, or 2.
# Exercise: Try to figure out the pattern here.
Bill Gradwohl came up with an improved formula that works for positive numbers.
rnumber=$(((RANDOM%(max-min+divisibleBy))/divisibleBy*divisibleBy+min))
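To see the formula at work, the sketch below samples it repeatedly for the 6 - 30 range with multiples of 3 and checks every result. The sample count of 1000 is arbitrary, and the sketch assumes that min is itself a multiple of divisibleBy, which is what keeps every result divisible.

```bash
#!/bin/bash
min=6
max=30
divisibleBy=3
ok=yes

for (( i = 0; i < 1000; i++ )); do
  # RANDOM % 27 is 0..26; dividing by 3 and multiplying back gives 0,3,...,24;
  #+ adding min shifts that to 6,9,...,30.
  rnumber=$(( (RANDOM % (max - min + divisibleBy)) / divisibleBy * divisibleBy + min ))
  if (( rnumber < min || rnumber > max || rnumber % divisibleBy != 0 )); then
    ok=no                      # Out of range or not a multiple of 3.
  fi
done

echo "all samples valid: $ok"  # all samples valid: yes
```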
Here Bill presents a versatile function that returns
a random number between two specified values.
Example 9-26. Random between values
#!/bin/bash
# random-between.sh
# Random number between two specified values.
# Script by Bill Gradwohl, with minor modifications by the document author.
# Used with permission.
randomBetween() {
# Generates a positive or negative random number
#+ between $min and $max
#+ and divisible by $divisibleBy.
# Gives a "reasonably random" distribution of return values.
#
# Bill Gradwohl - Oct 1, 2003
syntax() {
# Function embedded within function.
echo
echo "Syntax: randomBetween [min] [max] [multiple]"
echo
echo "Expects up to 3 passed parameters, but all are completely optional."
echo "min is the minimum value"
echo "max is the maximum value"
echo "multiple specifies that the answer must be a multiple of this value."
echo " i.e. answer must be evenly divisible by this number."
echo
echo "If any value is missing, defaults are supplied as: 0 32767 1"
echo "Successful completion returns 0, unsuccessful completion returns"
echo "function syntax and 1."
echo "The answer is returned in the global variable randomBetweenAnswer"
echo "Negative values for any passed parameter are handled correctly."
}
local min=${1:-0}
local max=${2:-32767}
local divisibleBy=${3:-1}
# Default values assigned, in case parameters not passed to function.
local x
local spread
# Let's make sure the divisibleBy value is positive.
[ ${divisibleBy} -lt 0 ] && divisibleBy=$((0-divisibleBy))
# Sanity check.
if [ $# -gt 3 -o ${divisibleBy} -eq 0 -o ${min} -eq ${max} ]; then
syntax
return 1
fi
# See if the min and max are reversed.
if [ ${min} -gt ${max} ]; then
# Swap them.
x=${min}
min=${max}
max=${x}
fi
# If min is itself not evenly divisible by $divisibleBy,
#+ then fix the min to be within range.
if [ $((min/divisibleBy*divisibleBy)) -ne ${min} ]; then
if [ ${min} -lt 0 ]; then
min=$((min/divisibleBy*divisibleBy))
else
min=$((((min/divisibleBy)+1)*divisibleBy))
fi
fi
# If max is itself not evenly divisible by $divisibleBy,
#+ then fix the max to be within range.
if [ $((max/divisibleBy*divisibleBy)) -ne ${max} ]; then
if [ ${max} -lt 0 ]; then
max=$((((max/divisibleBy)-1)*divisibleBy))
else
max=$((max/divisibleBy*divisibleBy))
fi
fi
# ---------------------------------------------------------------------
# Now, to do the real work.
# Note that to get a proper distribution for the end points,
#+ the range of random values has to be allowed to go between
#+ 0 and abs(max-min)+divisibleBy, not just abs(max-min)+1.
# The slight increase will produce the proper distribution for the
#+ end points.
# Changing the formula to use abs(max-min)+1 will still produce
#+ correct answers, but the randomness of those answers is faulty in
#+ that the number of times the end points ($min and $max) are returned
#+ is considerably lower than when the correct formula is used.
# ---------------------------------------------------------------------
spread=$((max-min))
[ ${spread} -lt 0 ] && spread=$((0-spread))
let spread+=divisibleBy
randomBetweenAnswer=$(((RANDOM%spread)/divisibleBy*divisibleBy+min))
return 0
# However, Paulo Marcel Coelho Aragao points out that
#+ when $max and $min are not divisible by $divisibleBy,
#+ the formula fails.
#
# He suggests instead the following formula:
# rnumber = $(((RANDOM%(max-min+1)+min)/divisibleBy*divisibleBy))
}
# Let's test the function.
min=-14
max=20
divisibleBy=3
# Generate an array of expected answers and check to make sure we get
#+ at least one of each answer if we loop long enough.
declare -a answer
minimum=${min}
maximum=${max}
if [ $((minimum/divisibleBy*divisibleBy)) -ne ${minimum} ]; then
if [ ${minimum} -lt 0 ]; then
minimum=$((minimum/divisibleBy*divisibleBy))
else
minimum=$((((minimum/divisibleBy)+1)*divisibleBy))
fi
fi
# If max is itself not evenly divisible by $divisibleBy,
#+ then fix the max to be within range.
if [ $((maximum/divisibleBy*divisibleBy)) -ne ${maximum} ]; then
if [ ${maximum} -lt 0 ]; then
maximum=$((((maximum/divisibleBy)-1)*divisibleBy))
else
maximum=$((maximum/divisibleBy*divisibleBy))
fi
fi
# We need to generate only positive array subscripts,
#+ so we need a displacement that will guarantee
#+ positive results.
displacement=$((0-minimum))
for ((i=${minimum}; i<=${maximum}; i+=divisibleBy)); do
answer[i+displacement]=0
done
# Now loop a large number of times to see what we get.
loopIt=1000 # The script author suggests 100000,
#+ but that takes a good long while.
for ((i=0; i<${loopIt}; ++i)); do
# Note that we are specifying min and max in reversed order here to
#+ make the function correct for this case.
randomBetween ${max} ${min} ${divisibleBy}
# Report an error if an answer is unexpected.
[ ${randomBetweenAnswer} -lt ${min} -o ${randomBetweenAnswer} -gt ${max} ] && echo MIN or MAX error - ${randomBetweenAnswer}!
[ $((randomBetweenAnswer%${divisibleBy})) -ne 0 ] && echo DIVISIBLE BY error - ${randomBetweenAnswer}!
# Store the answer away statistically.
answer[randomBetweenAnswer+displacement]=$((answer[randomBetweenAnswer+displacement]+1))
done
# Let's check the results
for ((i=${minimum}; i<=${maximum}; i+=divisibleBy)); do
[ ${answer[i+displacement]} -eq 0 ] && echo "We never got an answer of $i." || echo "${i} occurred ${answer[i+displacement]} times."
done
exit 0 |
Just how random is $RANDOM? The best
way to test this is to write a script that tracks
the distribution of "random" numbers
generated by $RANDOM. Let's roll a
$RANDOM die a few times . . .
Example 9-27. Rolling a single die with RANDOM
#!/bin/bash
# How random is RANDOM?
RANDOM=$$ # Reseed the random number generator using script process ID.
PIPS=6 # A die has 6 pips.
MAXTHROWS=600 # Increase this if you have nothing better to do with your time.
throw=0 # Throw count.
ones=0 # Must initialize counts to zero,
twos=0 #+ since an uninitialized variable is null, not zero.
threes=0
fours=0
fives=0
sixes=0
print_result ()
{
echo
echo "ones = $ones"
echo "twos = $twos"
echo "threes = $threes"
echo "fours = $fours"
echo "fives = $fives"
echo "sixes = $sixes"
echo
}
update_count()
{
case "$1" in
0) let "ones += 1";; # Since die has no "zero", this corresponds to 1.
1) let "twos += 1";; # And this to 2, etc.
2) let "threes += 1";;
3) let "fours += 1";;
4) let "fives += 1";;
5) let "sixes += 1";;
esac
}
echo
while [ "$throw" -lt "$MAXTHROWS" ]
do
let "die1 = RANDOM % $PIPS"
update_count $die1
let "throw += 1"
done
print_result
exit 0
# The scores should distribute fairly evenly, assuming RANDOM is fairly random.
# With $MAXTHROWS at 600, all should cluster around 100, plus-or-minus 20 or so.
#
# Keep in mind that RANDOM is a pseudorandom generator,
#+ and not a spectacularly good one at that.
# Randomness is a deep and complex subject.
# Sufficiently long "random" sequences may exhibit
#+ chaotic and other "non-random" behavior.
# Exercise (easy):
# ---------------
# Rewrite this script to flip a coin 1000 times.
# Choices are "HEADS" and "TAILS". |
As we have seen in the last example, it is best to
"reseed" the RANDOM
generator each time it is invoked. Using the same seed
for RANDOM repeats the same series
of numbers.
(This mirrors the behavior of the
random() function in
C.)
Example 9-28. Reseeding RANDOM
#!/bin/bash
# seeding-random.sh: Seeding the RANDOM variable.
MAXCOUNT=25 # How many numbers to generate.
random_numbers ()
{
count=0
while [ "$count" -lt "$MAXCOUNT" ]
do
number=$RANDOM
echo -n "$number "
let "count += 1"
done
}
echo; echo
RANDOM=1 # Setting RANDOM seeds the random number generator.
random_numbers
echo; echo
RANDOM=1 # Same seed for RANDOM...
random_numbers # ...reproduces the exact same number series.
#
# When is it useful to duplicate a "random" number series?
echo; echo
RANDOM=2 # Trying again, but with a different seed...
random_numbers # gives a different number series.
echo; echo
# RANDOM=$$ seeds RANDOM from process id of script.
# It is also possible to seed RANDOM from 'time' or 'date' commands.
# Getting fancy...
SEED=$(head -1 /dev/urandom | od -N 1 | awk '{ print $2 }')
# Pseudo-random output fetched
#+ from /dev/urandom (system pseudo-random device-file),
#+ then converted to line of printable (octal) numbers by "od",
#+ finally "awk" retrieves just one number for SEED.
RANDOM=$SEED
random_numbers
echo; echo
exit 0 |
The /dev/urandom device-file provides
a method of generating much more "random"
pseudorandom numbers than the $RANDOM
variable. dd if=/dev/urandom of=targetfile
bs=1 count=XX creates a file of well-scattered
pseudorandom numbers. However, assigning these numbers
to a variable in a script requires a workaround, such as
filtering through od (as in the
example above and in Example 12-13), using dd (see Example 12-54),
or even piping to md5sum
(see Example 33-14).
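A minimal sketch of the od workaround follows. The choice of two bytes and the variable name rnum are assumptions made for the illustration.

```bash
#!/bin/bash
# Read 2 bytes from /dev/urandom and print them as one unsigned decimal.
# od options: -An  omits the address column,
#+            -N2  reads only 2 bytes,
#+            -tu2 formats them as a single 2-byte unsigned integer.
# tr strips the padding whitespace that od puts around the number.
rnum=$(od -An -N2 -tu2 /dev/urandom | tr -d '[:space:]')
echo "rnum = $rnum"   # A value in the range 0 - 65535.
```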
There are also other ways to generate pseudorandom
numbers in a script. Awk provides a
convenient means of doing this.
Example 9-29. Pseudorandom numbers, using awk
#!/bin/bash
# random2.sh: Returns a pseudorandom number in the range 0 - 1.
# Uses the awk rand() function.
AWKSCRIPT=' { srand(); print rand() } '
# Command(s) / parameters passed to awk
# Note that srand() reseeds awk's random number generator.
echo -n "Random number between 0 and 1 = "
echo | awk "$AWKSCRIPT"
# What happens if you leave out the 'echo'?
exit 0
# Exercises:
# ---------
# 1) Using a loop construct, print out 10 different random numbers.
# (Hint: you must reseed the "srand()" function with a different seed
#+ in each pass through the loop. What happens if you fail to do this?)
# 2) Using an integer multiplier as a scaling factor, generate random numbers
#+ in the range between 10 and 100.
# 3) Same as exercise #2, above, but generate random integers this time. |
The date command also lends
itself to generating pseudorandom
integer sequences. |
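One way to do this is sketched below, on the assumption that the local date command accepts the %s format (seconds since the epoch, supported by both GNU and BSD date). The variable names are illustrative.

```bash
#!/bin/bash
seconds=$(date +%s)            # Seconds elapsed since 1970-01-01 00:00:00 UTC.
rnumber=$(( seconds % 100 ))   # Reduce the clock value to the range 0 - 99.
echo "rnumber = $rnumber"

# Caution: successive calls within the same second return the same value,
#+ and the sequence is predictable, so this suits only casual use.
```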
Loops
A loop is a block of code that
iterates (repeats) a list of commands
as long as the loop control condition
is true.
for loops
- for arg in [list]
This is the basic looping construct. It differs significantly
from its C counterpart.
for arg in [list]
do
command(s)...
done |
During each pass through the loop,
arg takes on the
value of each successive variable in the
list.
for arg in "$var1" "$var2" "$var3" ... "$varN"
# In pass 1 of the loop, arg = $var1
# In pass 2 of the loop, arg = $var2
# In pass 3 of the loop, arg = $var3
# ...
# In pass N of the loop, arg = $varN
# Arguments in [list] quoted to prevent possible word splitting. |
The argument list may contain wild cards.
If do is on the same line as
for, a semicolon is required
after the list.
for arg in [list] ; do
Example 10-1. Simple for loops
#!/bin/bash
# Listing the planets.
for planet in Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune Pluto
do
echo $planet # Each planet on a separate line.
done
echo
for planet in "Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune Pluto"
# All planets on same line.
# Entire 'list' enclosed in quotes creates a single variable.
do
echo $planet
done
exit 0 |
Each [list] element
may contain multiple parameters. This is useful when
processing parameters in groups. In such cases,
use the set command
(see Example 11-15) to force parsing of each
[list] element and assignment of
each component to the positional parameters.
Example 10-2. for loop with two parameters in each [list] element
#!/bin/bash
# Planets revisited.
# Associate the name of each planet with its distance from the sun.
for planet in "Mercury 36" "Venus 67" "Earth 93" "Mars 142" "Jupiter 483"
do
set -- $planet # Parses variable "planet" and sets positional parameters.
# the "--" prevents nasty surprises if $planet is null or begins with a dash.
# May need to save original positional parameters, since they get overwritten.
# One way of doing this is to use an array,
# original_params=("$@")
echo "$1 $2,000,000 miles from the sun"
#-------two tabs---concatenate zeroes onto parameter $2
done
# (Thanks, S.C., for additional clarification.)
exit 0 |
A variable may supply the [list] in a
for loop.
Example 10-3. Fileinfo: operating on a file list contained in a variable
#!/bin/bash
# fileinfo.sh
FILES="/usr/sbin/accept
/usr/sbin/pwck
/usr/sbin/chroot
/usr/bin/fakefile
/sbin/badblocks
/sbin/ypbind" # List of files you are curious about.
# Threw in a dummy file, /usr/bin/fakefile.
echo
for file in $FILES
do
if [ ! -e "$file" ] # Check if file exists.
then
echo "$file does not exist."; echo
continue # On to next.
fi
ls -l $file | awk '{ print $9 " file size: " $5 }' # Print 2 fields.
whatis `basename $file` # File info.
# Note that the whatis database needs to have been set up for this to work.
# To do this, as root run /usr/bin/makewhatis.
echo
done
exit 0 |
The [list] in a
for loop may contain filename globbing, that is,
wildcards used in filename expansion.
Example 10-4. Operating on files with a for loop
#!/bin/bash
# list-glob.sh: Generating [list] in a for-loop using "globbing".
echo
for file in *
do
ls -l "$file" # Lists all files in $PWD (current directory).
# Recall that the wild card character "*" matches every filename,
# however, in "globbing", it doesn't match dot-files.
# If the pattern matches no file, it is expanded to itself.
# To prevent this, set the nullglob option
# (shopt -s nullglob).
# Thanks, S.C.
done
echo; echo
for file in [jx]*
do
rm -f $file # Removes only files beginning with "j" or "x" in $PWD.
echo "Removed file \"$file\"".
done
echo
exit 0 |
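The nullglob behavior mentioned in the comments above can be observed in an empty directory. The following sketch counts loop passes with the option off and then on; the scratch directory, the file suffix, and the count_matches function name are all made up for the demonstration.

```bash
#!/bin/bash
count_matches ()          # Hypothetical helper: counts passes through the loop.
{
  local n=0
  local file
  for file in *.nosuchsuffix
  do
    n=$((n + 1))
  done
  echo "$n"
}

scratch=$(mktemp -d)      # Empty scratch directory, so the pattern matches nothing.
cd "$scratch" || exit 1

shopt -u nullglob
without=$(count_matches)  # 1: the unmatched pattern is passed through literally.

shopt -s nullglob
with=$(count_matches)     # 0: the unmatched pattern expands to nothing.
shopt -u nullglob

echo "Passes without nullglob: $without"
echo "Passes with nullglob:    $with"

cd - > /dev/null          # Return to the original directory and clean up.
rmdir "$scratch"
```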
Omitting the in [list] part of a
for loop causes the loop to operate
on $@ -- the list of arguments given
on the command line to the script. A particularly clever
illustration of this is Example A-16.
Example 10-5. Missing in [list] in a for loop
#!/bin/bash
# Invoke this script both with and without arguments,
#+ and see what happens.
for a
do
echo -n "$a "
done
# The 'in list' missing, therefore the loop operates on '$@'
#+ (command-line argument list, including whitespace).
echo
exit 0 |
It is possible to use command substitution
to generate the [list] in a
for loop. See also Example 12-48,
Example 10-10 and Example 12-42.
Example 10-6. Generating the [list] in a for loop with command substitution
#!/bin/bash
# for-loopcmd.sh: for-loop with [list]
#+ generated by command substitution.
NUMBERS="9 7 3 8 37.53"
for number in `echo $NUMBERS` # for number in 9 7 3 8 37.53
do
echo -n "$number "
done
echo
exit 0 |
This is a somewhat more complex example of using command
substitution to create the [list].
Example 10-7. A grep replacement for binary files
#!/bin/bash
# bin-grep.sh: Locates matching strings in a binary file.
# A "grep" replacement for binary files.
# Similar effect to "grep -a"
E_BADARGS=65
E_NOFILE=66
if [ $# -ne 2 ]
then
echo "Usage: `basename $0` search_string filename"
exit $E_BADARGS
fi
if [ ! -f "$2" ]
then
echo "File \"$2\" does not exist."
exit $E_NOFILE
fi
IFS=$'\n' # Split on newlines only (per suggestion of Paulo Marcel Coelho Aragao).
for word in $( strings "$2" | grep "$1" )
# The "strings" command lists strings in binary files.
# Output then piped to "grep", which tests for desired string.
do
echo $word
done
# As S.C. points out, the for-loop above could be replaced with the simpler
# strings "$2" | grep "$1" | tr -s "$IFS" '[\n*]'
# Try something like "./bin-grep.sh mem /bin/ls" to exercise this script.
exit 0 |
More of the same.
Example 10-8. Listing all users on the system
#!/bin/bash
# userlist.sh
PASSWORD_FILE=/etc/passwd
n=1 # User number
for name in $(awk 'BEGIN{FS=":"}{print $1}' < "$PASSWORD_FILE" )
# Field separator = : ^^^^^^
# Print first field ^^^^^^^^
# Get input from password file ^^^^^^^^^^^^^^^^^
do
echo "USER #$n = $name"
let "n += 1"
done
# USER #1 = root
# USER #2 = bin
# USER #3 = daemon
# ...
# USER #30 = bozo
exit 0
# Exercise:
# --------
# How is it that an ordinary user (or a script run by same)
#+ can read /etc/passwd?
# Isn't this a security hole? Why or why not? |
A final example of the [list] resulting from command
substitution.
Example 10-9. Checking all the binaries in a directory for authorship
#!/bin/bash
# findstring.sh:
# Find a particular string in binaries in a specified directory.
directory=/usr/bin/
fstring="Free Software Foundation" # See which files come from the FSF.
for file in $( find $directory -type f -name '*' | sort )
do
strings -f $file | grep "$fstring" | sed -e "s%$directory%%"
# In the "sed" expression,
#+ it is necessary to substitute for the normal "/" delimiter
#+ because "/" happens to be one of the characters filtered out.
# Failure to do so gives an error message (try it).
done
exit 0
# Exercise (easy):
# ---------------
# Convert this script to taking command-line parameters
#+ for $directory and $fstring. |
The output of a for loop may be piped to
a command or commands.
Example 10-10. Listing the symbolic links in a directory
#!/bin/bash
# symlinks.sh: Lists symbolic links in a directory.
directory=${1-`pwd`}
# Defaults to current working directory,
#+ if not otherwise specified.
# Equivalent to code block below.
# ----------------------------------------------------------
# ARGS=1 # Expect one command-line argument.
#
# if [ $# -ne "$ARGS" ] # If not 1 arg...
# then
# directory=`pwd` # current working directory
# else
# directory=$1
# fi
# ----------------------------------------------------------
echo "symbolic links in directory \"$directory\""
for file in "$( find $directory -type l )" # -type l = symbolic links
do
echo "$file"
done | sort # Otherwise file list is unsorted.
# Strictly speaking, a loop isn't really necessary here,
#+ since the output of the "find" command is expanded into a single word.
# However, it's easy to understand and illustrative this way.
# As Dominik 'Aeneas' Schnitzer points out,
#+ failing to quote $( find $directory -type l )
#+ will choke on filenames with embedded whitespace.
# Even this will only pick up the first field of each argument.
exit 0
# Jean Helou proposes the following alternative:
echo "symbolic links in directory \"$directory\""
# Backup of the current IFS. One can never be too cautious.
OLDIFS=$IFS
IFS=:
for file in $(find $directory -type l -printf "%p$IFS")
do # ^^^^^^^^^^^^^^^^
echo "$file"
done|sort |
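Another way around the whitespace problem, offered here as a sketch rather than as part of the original script, is to have find terminate each name with a NUL byte and to read the names in a while loop. It assumes a find that supports -print0 (GNU and BSD versions do); the function name list_symlinks is hypothetical.

```bash
#!/bin/bash
# list_symlinks: hypothetical helper.
# Lists the symbolic links under a directory, one per line, sorted.
list_symlinks ()
{
  local dir=${1:-.}
  local file
  # -print0 ends each filename with a NUL byte instead of a newline.
  # read -d '' splits the input on that byte;
  #+ IFS= and -r keep leading blanks and backslashes in the name intact.
  find "$dir" -type l -print0 |
  while IFS= read -r -d '' file
  do
    echo "$file"
  done | sort
}

list_symlinks "${1-$PWD}"   # Defaults to current working directory.
```

Since IFS is changed only for the read command itself, there is no need to save and restore it.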
The stdout of a loop may be redirected to a file, as this slight
modification to the previous example shows.
Example 10-11. Symbolic links in a directory, saved to a file
#!/bin/bash
# symlinks.sh: Lists symbolic links in a directory.
OUTFILE=symlinks.list # save file
directory=${1-`pwd`}
# Defaults to current working directory,
#+ if not otherwise specified.
echo "symbolic links in directory \"$directory\"" > "$OUTFILE"
echo "---------------------------" >> "$OUTFILE"
for file in "$( find $directory -type l )" # -type l = symbolic links
do
echo "$file"
done | sort >> "$OUTFILE" # stdout of loop
# ^^^^^^^^^^^^^ redirected to save file.
exit 0 |
There is an alternative syntax to a for
loop that will look very familiar to C
programmers. This requires double parentheses.
Example 10-12. A C-like for loop
#!/bin/bash
# Two ways to count up to 10.
echo
# Standard syntax.
for a in 1 2 3 4 5 6 7 8 9 10
do
echo -n "$a "
done
echo; echo
# +==========================================+
# Now, let's do the same, using C-like syntax.
LIMIT=10
for ((a=1; a <= LIMIT ; a++)) # Double parentheses, and "LIMIT" with no "$".
do
echo -n "$a "
done # A construct borrowed from 'ksh93'.
echo; echo
# +=========================================================================+
# Let's use the C "comma operator" to increment two variables simultaneously.
for ((a=1, b=1; a <= LIMIT ; a++, b++)) # The comma chains together operations.
do
echo -n "$a-$b "
done
echo; echo
exit 0 |
See also Example 26-15, Example 26-16, and Example A-6.
Now, a for loop used in a
"real-life" context.
Example 10-13. Using efax in batch mode
#!/bin/bash
# Faxing (must have 'fax' installed).
EXPECTED_ARGS=2
E_BADARGS=65
if [ $# -ne $EXPECTED_ARGS ]
# Check for proper no. of command line args.
then
echo "Usage: `basename $0` phone# text-file"
exit $E_BADARGS
fi
if [ ! -f "$2" ]
then
echo "File $2 is not a text file"
exit $E_BADARGS
fi
fax make $2 # Create fax formatted files from text files.
for file in $(ls $2.0*) # Concatenate the converted files.
# Uses wild card in variable list.
do
fil="$fil $file"
done
efax -d /dev/ttyS3 -o1 -t "T$1" $fil # Do the work.
# As S.C. points out, the for-loop can be eliminated with
# efax -d /dev/ttyS3 -o1 -t "T$1" $2.0*
# but it's not quite as instructive [grin].
exit 0 |
- while
This construct tests for a condition at the top of a
loop, and keeps looping as long as that condition
is true (returns a 0 exit status). In contrast
to a for loop, a
while loop finds use in situations
where the number of loop repetitions is not known
beforehand.
while [condition]
do
command...
done
As is the case with for loops,
placing the do on the same line as
the condition test requires a semicolon.
while [condition] ; do
Note that certain specialized while
loops, such as a getopts construct, deviate
somewhat from the standard template given here.
Example 10-14. Simple while loop
#!/bin/bash
var0=0
LIMIT=10
while [ "$var0" -lt "$LIMIT" ]
do
echo -n "$var0 " # -n suppresses newline.
# ^ Space, to separate printed out numbers.
var0=`expr $var0 + 1` # var0=$(($var0+1)) also works.
# var0=$((var0 + 1)) also works.
# let "var0 += 1" also works.
done # Various other methods also work.
echo
exit 0 |
Example 10-15. Another while loop
#!/bin/bash
echo
while [ "$var1" != "end" ] # Equivalent to: while test "$var1" != "end"
do
echo "Input variable #1 (end to exit) "
read var1 # Not 'read $var1' (why?).
echo "variable #1 = $var1" # Need quotes because of "#" . . .
# If input is 'end', echoes it here.
# Does not test for termination condition until top of loop.
echo
done
exit 0 |
A while loop may have multiple
conditions. Only the final condition determines when the loop
terminates. This necessitates a slightly different loop syntax,
however.
Example 10-16. while loop with multiple conditions
#!/bin/bash
var1=unset
previous=$var1
while echo "previous-variable = $previous"
echo
previous=$var1
[ "$var1" != end ] # Keeps track of what $var1 was previously.
# Four conditions on "while", but only last one controls loop.
# The *last* exit status is the one that counts.
do
echo "Input variable #1 (end to exit) "
read var1
echo "variable #1 = $var1"
done
# Try to figure out how this all works.
# It's a wee bit tricky.
exit 0 |
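For readers who would like a version that runs without keyboard input, the same principle can be reduced to a counter. This is a simplified sketch, not a rewrite of the example above; the variable name count is arbitrary.

```bash
#!/bin/bash
count=0
while echo "Testing with count = $count"   # Succeeds every time, yet decides nothing.
      [ "$count" -lt 3 ]                    # Last command in the list: controls the loop.
do
  count=$((count + 1))
done
echo "Final count = $count"                 # Final count = 3
```

The echo in the condition list runs four times (for count = 0, 1, 2, and 3), because the whole list is executed before each decision to enter the loop body.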
As with a for loop, a
while loop may employ C-like syntax
by using the double parentheses construct (see also Example 9-30).
Example 10-17. C-like syntax in a while loop
#!/bin/bash
# wh-loopc.sh: Count to 10 in a "while" loop.
LIMIT=10
a=1
while [ "$a" -le $LIMIT ]
do
echo -n "$a "
let "a+=1"
done # No surprises, so far.
echo; echo
# +=================================================================+
# Now, repeat with C-like syntax.
((a = 1)) # a=1
# Double parentheses permit space when setting a variable, as in C.
while (( a <= LIMIT )) # Double parentheses, and no "$" preceding variables.
do
echo -n "$a "
((a += 1)) # let "a+=1"
# Yes, indeed.
# Double parentheses permit incrementing a variable with C-like syntax.
done
echo
# Now, C programmers can feel right at home in Bash.
exit 0 |
- until
This construct tests for a condition at the top of a loop, and keeps
looping as long as that condition is false (opposite of
while loop).
until [condition-is-true]
do
command...
done
Note that an until loop tests for the
terminating condition at the top of the loop, differing from a
similar construct in some programming languages.
As is the case with for loops,
placing the do on the same line as
the condition test requires a semicolon.
until [condition-is-true] ; do
Example 10-18. until loop
#!/bin/bash
END_CONDITION=end
until [ "$var1" = "$END_CONDITION" ]
# Tests condition here, at top of loop.
do
echo "Input variable #1 "
echo "($END_CONDITION to exit)"
read var1
echo "variable #1 = $var1"
echo
done
exit 0 |
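Because the test comes at the top of the loop, the body is skipped entirely when the terminating condition already holds. A minimal sketch, with the variable passes introduced only to count iterations:

```bash
#!/bin/bash
passes=0
var1=end                      # The terminating condition is already true.
until [ "$var1" = end ]       # Tested before the first pass...
do
  passes=$((passes + 1))      #+ so this line never runs.
done
echo "Passes through loop: $passes"    # Passes through loop: 0
```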
Text Processing Commands
Commands affecting text and
text files
- sort
File sorter, often used as a filter in a pipe. This
command sorts a text stream or file forwards or backwards,
or according to various keys or character positions. Using
the -m option, it merges presorted input
files. The info page lists its many
capabilities and options. See Example 10-9,
Example 10-10, and Example A-8.
- tsort
Topological sort, reading in pairs of
whitespace-separated strings and sorting according to
input patterns.
- uniq
This filter removes duplicate lines from a sorted
file. It is often seen in a pipe coupled with
sort.
cat list-1 list-2 list-3 | sort | uniq > final.list
# Concatenates the list files,
# sorts them,
# removes duplicate lines,
# and finally writes the result to an output file. |
The useful -c option prefixes each line of
the input file with its number of occurrences.
bash$ cat testfile
This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.
bash$ uniq -c testfile
1 This line occurs only once.
2 This line occurs twice.
3 This line occurs three times.
bash$ sort testfile | uniq -c | sort -nr
3 This line occurs three times.
2 This line occurs twice.
1 This line occurs only once.
The sort INPUTFILE | uniq -c | sort -nr
command string produces a frequency
of occurrence listing on the
INPUTFILE file (the
-nr options to sort
cause a reverse numerical sort). This template finds
use in analysis of log files and dictionary lists, and
wherever the lexical structure of a document needs to
be examined.
Example 12-11. Word Frequency Analysis
#!/bin/bash
# wf.sh: Crude word frequency analysis on a text file.
# This is a more efficient version of the "wf2.sh" script.
# Check for input file on command line.
ARGS=1
E_BADARGS=65
E_NOFILE=66
if [ $# -ne "$ARGS" ] # Correct number of arguments passed to script?
then
echo "Usage: `basename $0` filename"
exit $E_BADARGS
fi
if [ ! -f "$1" ] # Check if file exists.
then
echo "File \"$1\" does not exist."
exit $E_NOFILE
fi
########################################################
# main ()
sed -e 's/\.//g' -e 's/\,//g' -e 's/ /\
/g' "$1" | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr
# =========================
# Frequency of occurrence
# Filter out periods and commas, and
#+ change space between words to linefeed,
#+ then shift characters to lowercase, and
#+ finally prefix occurrence count and sort numerically.
# Arun Giridhar suggests modifying the above to:
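# . . . | sort | uniq -c | sort +1 [-f] | sort +0 -nr
# (The "+N" key notation is obsolete; current versions of sort
#+ write "+1" as "-k 2" and "+0" as "-k 1".)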
# This adds a secondary sort key, so instances of
#+ equal occurrence are sorted alphabetically.
# As he explains it:
# "This is effectively a radix sort, first on the
#+ least significant column
#+ (word or string, optionally case-insensitive)
#+ and last on the most significant column (frequency)."
########################################################
exit 0
# Exercises:
# ---------
# 1) Add 'sed' commands to filter out other punctuation,
#+ such as semicolons.
# 2) Modify the script to also filter out multiple spaces and
# other whitespace. |
bash$ cat testfile
This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.
bash$ ./wf.sh testfile
6 this
6 occurs
6 line
3 times
3 three
2 twice
1 only
1 once
- expand, unexpand
The expand filter converts tabs to
spaces. It is often used in a pipe. The unexpand filter
converts spaces to tabs. This reverses the effect of
expand.
- cut
A tool for extracting fields from files. It is similar to the
print $N command set in awk, but more limited. It may be
simpler to use cut in a script than
awk. Particularly important are the
-d (delimiter) and -f
(field specifier) options. Using cut to obtain a listing of the
mounted filesystems:
cat /etc/mtab | cut -d ' ' -f1,2 |
Using cut to list the OS and kernel version:
uname -a | cut -d" " -f1,3,11,12 |
Using cut to extract message headers from
an e-mail folder:
bash$ grep '^Subject:' read-messages | cut -c10-80
Re: Linux suitable for mission-critical apps?
MAKE MILLIONS WORKING AT HOME!!!
Spam complaint
Re: Spam complaint |
Using cut to parse a file:
# List all the users in /etc/passwd.
FILENAME=/etc/passwd
for user in $(cut -d: -f1 $FILENAME)
do
echo $user
done
# Thanks, Oleg Philon for suggesting this. |
cut -d ' ' -f2,3 filename is equivalent to
awk -F'[ ]' '{ print $2, $3 }' filename
See also Example 12-42.
- paste
Tool for merging together different files into a single,
multi-column file. In combination with
cut, useful for creating system log
files.
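A small sketch of the merging behavior follows. The file names and their contents are invented for this illustration.

```bash
#!/bin/bash
workdir=$(mktemp -d)                 # Scratch directory for the two input files.
printf '%s\n' Shoes Laces Socks > "$workdir/items"
printf '%s\n' 40.00 1.00 2.00   > "$workdir/prices"

# paste joins corresponding lines of its input files,
#+ separating the resulting columns with a tab.
merged=$(paste "$workdir/items" "$workdir/prices")
echo "$merged"
# Each output line holds an item, a tab, and its price.

rm -r "$workdir"
```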
- join
Consider this a special-purpose cousin of
paste. This powerful utility allows
merging two files in a meaningful fashion, which essentially
creates a simple version of a relational database. The join command operates on
exactly two files, but pastes together only those lines
with a common tagged field (usually a numerical label),
and writes the result to stdout.
The files to be joined should be sorted according to the
tagged field for the matchups to work properly.
File: 1.data
100 Shoes
200 Laces
300 Socks |
File: 2.data
100 $40.00
200 $1.00
300 $2.00 |
bash$ join 1.data 2.data
File: 1.data 2.data
100 Shoes $40.00
200 Laces $1.00
300 Socks $2.00
The tagged field appears only once in the
output.
- head
Lists the beginning of a file to
stdout (the default is
10 lines, but this can be changed). It
has a number of interesting options.
Example 12-12. Which files are scripts?
#!/bin/bash
# script-detector.sh: Detects scripts within a directory.
TESTCHARS=2 # Test first 2 characters.
SHABANG='#!' # Scripts begin with a "sha-bang."
for file in * # Traverse all the files in current directory.
do
if [[ `head -c$TESTCHARS "$file"` = "$SHABANG" ]]
# head -c2 #!
# The '-c' option to "head" outputs a specified
#+ number of characters, rather than lines (the default).
then
echo "File \"$file\" is a script."
else
echo "File \"$file\" is *not* a script."
fi
done
exit 0
# Exercises:
# ---------
# 1) Modify this script to take as an optional argument
#+ the directory to scan for scripts
#+ (rather than just the current working directory).
#
# 2) As it stands, this script gives "false positives" for
#+ Perl, awk, and other scripting language scripts.
# Correct this. |
Example 12-13. Generating 10-digit random numbers #!/bin/bash
# rnd.sh: Outputs a random number of up to 10 digits
# Script by Stephane Chazelas.
head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'
# =================================================================== #
# Analysis
# --------
# head:
# -c4 option takes first 4 bytes.
# od:
# -N4 option limits output to 4 bytes.
# -tu4 option selects unsigned decimal format for output.
# sed:
# -n option, in combination with "p" flag to the "s" command,
# outputs only matched lines.
# The author of this script explains the action of 'sed', as follows.
# head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'
# ----------------------------------> |
# Assume output up to "sed" --------> |
# is 0000000 1198195154\n
# sed begins reading characters: 0000000 1198195154\n.
# Here it finds a newline character,
#+ so it is ready to process the first line (0000000 1198195154).
# It looks at its <range><action>s. The first and only one is
# range action
# 1 s/.* //p
# The line number is in the range, so it executes the action:
#+ tries to substitute the longest string ending with a space in the line
# ("0000000 ") with nothing (//), and if it succeeds, prints the result
# ("p" is a flag to the "s" command here, this is different from the "p" command).
# sed is now ready to continue reading its input. (Note that before
#+ continuing, if -n option had not been passed, sed would have printed
#+ the line once again).
# Now, sed reads the remainder of the characters, and finds the end of the file.
# It is now ready to process its 2nd line (which is also numbered '$' as
# it's the last one).
# It sees it is not matched by any <range>, so its job is done.
# In a few words, this sed command means:
# "On the first line only, remove any character up to the right-most space,
#+ then print it."
# A better way to do this would have been:
# sed -e 's/.* //;q'
# Here, two <range><action>s (could have been written
# sed -e 's/.* //' -e q):
# range action
# nothing (matches line) s/.* //
# nothing (matches line) q (quit)
# Here, sed only reads its first line of input.
# It performs both actions, and prints the line (substituted) before quitting
#+ (because of the "q" action) since the "-n" option is not passed.
# =================================================================== #
# An even simpler alternative to the above one-line script would be:
# head -c4 /dev/urandom| od -An -tu4
exit 0 |
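The analysis above can be checked against a fixed input instead of random bytes. The following is a minimal sketch, assuming the canned od output shown in the analysis; the variable names sample, first, second, and rand are illustrative and not part of the original script.

```bash
#!/bin/bash
# Canned "od" output, as assumed in the analysis: an offset, a number,
# and a final line holding only the end offset.
sample=$(printf '0000000 1198195154\n0000004\n')

# Original form: substitute on line 1 only, print only if it matched.
first=$(printf '%s\n' "$sample" | sed -ne '1s/.* //p')

# Suggested form: substitute, let sed autoprint, then quit after line 1.
second=$(printf '%s\n' "$sample" | sed -e 's/.* //;q')

echo "$first"    # 1198195154
echo "$second"   # 1198195154

# The simpler alternative: -An suppresses the offset column entirely.
# tr strips the padding that od puts in front of the number.
rand=$(head -c4 /dev/urandom | od -An -tu4 | tr -d ' ')
echo "$rand"     # Some integer between 0 and 4294967295.
```

Both sed forms yield the same result here, but the second stops reading after the first line, which matters when the input is long.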
See also Example 12-35. - tail
lists the end of a file to stdout
(the default is 10 lines). Commonly used
to keep track of changes to a system logfile, using the
-f option, which outputs lines appended
to the file. Example 12-14. Using tail to monitor the system log #!/bin/bash
filename=sys.log
cat /dev/null > $filename; echo "Creating / cleaning out file."
# Creates file if it does not already exist,
#+ and truncates it to zero length if it does.
# : > filename and > filename also work.
tail /var/log/messages > $filename
# /var/log/messages must have world read permission for this to work.
echo "$filename contains tail end of system log."
exit 0 |
| To list a specific line of a text file,
pipe the output of
head to tail -1.
For example head -8 database.txt | tail
-1 lists the 8th line of the file
database.txt. To set a variable to a given block of a text file:
var=$(head -$m $filename | tail -$n)
# filename = name of file
# m = from beginning of file, number of lines to end of block
# n = number of lines to set variable to (trim from end of block) |
|
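The head and tail combination described above can be tried on a small scratch file. This is only a sketch: the file contents and the names tmpfile, eighth, and block are made up for the illustration, and it uses the -n spelling of the line-count options, which is the POSIX form of -8 and -1.

```bash
#!/bin/bash
# Build a ten-line scratch file (the name is arbitrary).
tmpfile=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > "$tmpfile"

# Single line: the 8th line of the file.
eighth=$(head -n 8 "$tmpfile" | tail -n 1)
echo "$eighth"          # line 8

# Block: m=6 is the last line of the block, n=3 is its length,
# so this captures lines 4 through 6.
m=6
n=3
block=$(head -n "$m" "$tmpfile" | tail -n "$n")
echo "$block"

rm -f "$tmpfile"
```

Quoting "$block" when echoing it preserves the newlines between the captured lines.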
See also Example 12-5, Example 12-35 and
Example 29-6. - grep
A multi-purpose file search tool that uses
Regular Expressions.
It was originally a command/filter in the
venerable ed line editor:
g/re/p -- global -
regular expression - print. grep pattern [file...] Search the target file(s) for
occurrences of pattern, where
pattern may be literal text
or a Regular Expression. bash$ grep '[rst]ystem.$' osinfo.txt
The GPL governs the distribution of the Linux operating system.
|
If no target file is specified, grep
works as a filter on stdin, as in
a pipe. bash$ ps ax | grep clock
765 tty1 S 0:00 xclock
901 pts/1 S 0:00 grep clock
|
The -i option causes a case-insensitive
search. The -w option matches only whole
words. The -l option lists only the files in which
matches were found, but not the matching lines. The -r (recursive) option searches files in
the current working directory and all subdirectories below
it. The -n option lists the matching lines,
together with line numbers. bash$ grep -n Linux osinfo.txt
2:This is a file containing information about Linux.
6:The GPL governs the distribution of the Linux operating system.
|
The -v (or --invert-match)
option filters out matches.
grep pattern1 *.txt | grep -v pattern2
# Matches all lines in "*.txt" files containing "pattern1",
# but ***not*** "pattern2". |
The -c (--count)
option gives a numerical count of matching lines, rather than
actually listing the matches.
grep -c txt *.sgml # (number of lines containing "txt" in each "*.sgml" file)
# grep -cz .
# ^ dot
# means count (-c) zero-separated (-z) items matching "."
# that is, non-empty ones (containing at least 1 character).
#
printf 'a b\nc d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz . # 4
printf 'a b\nc d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '$' # 5
printf 'a b\nc d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '^' # 5
#
printf 'a b\nc d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -c '$' # 9
# By default, newline chars (\n) separate items to match.
# Note that the -z option is GNU "grep" specific.
# Thanks, S.C. |
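The options described above can be compared side by side on one small file. The following sketch assumes a scratch file loosely modeled on osinfo.txt; its contents and the variable names are invented for the illustration. Note that -c counts matching lines, not individual occurrences.

```bash
#!/bin/bash
# Scratch file modeled loosely on the osinfo.txt used above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
This is a file containing information about Linux.
linux is not UNIX.
The GPL governs the distribution of the Linux operating system.
Linuxes differ.
EOF

count_case=$(grep -c Linux "$sample")      # Lines containing "Linux".
count_nocase=$(grep -ci linux "$sample")   # -i: ignore case.
count_word=$(grep -cw Linux "$sample")     # -w: whole word only.
count_invert=$(grep -cv Linux "$sample")   # -v: lines NOT matching.
first_hit=$(grep -n Linux "$sample" | head -n 1)   # -n: prefix line number.

echo "$count_case $count_nocase $count_word $count_invert"   # 3 4 2 1
echo "$first_hit"

rm -f "$sample"
```

The -w count is lower than the plain count because "Linuxes" contains "Linux" only as part of a longer word.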
When invoked with more than one target file given,
grep specifies which file contains
matches. bash$ grep Linux osinfo.txt misc.txt
osinfo.txt:This is a file containing information about Linux.
osinfo.txt:The GPL governs the distribution of the Linux operating system.
misc.txt:The Linux operating system is steadily gaining in popularity.
|
| To force grep to show the filename
when searching only one target file, simply give
/dev/null as the second file. bash$ grep Linux osinfo.txt /dev/null
osinfo.txt:This is a file containing information about Linux.
osinfo.txt:The GPL governs the distribution of the Linux operating system.
|
|
If there is a successful match, grep
returns an exit status
of 0, which makes it useful in a condition test in a
script, especially in combination with the -q
option to suppress output.
SUCCESS=0 # if grep lookup succeeds
word=Linux
filename=data.file
grep -q "$word" "$filename" # The "-q" option causes nothing to echo to stdout.
if [ $? -eq $SUCCESS ]
# if grep -q "$word" "$filename" can replace the grep command and the test above.
then
echo "$word found in $filename"
else
echo "$word not found in $filename"
fi |
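Since grep returns 0 on a match and 1 when nothing matches, the command can serve directly as the condition of an if statement. Here is a minimal sketch of that form; the function name report_word and the scratch file are hypothetical, introduced only for this illustration.

```bash
#!/bin/bash
# Hypothetical helper: reports whether a word occurs in a file.
report_word () {
  local word=$1 filename=$2
  if grep -q -- "$word" "$filename"   # Exit status 0 means a match.
  then
    echo "$word found in $filename"
  else
    echo "$word not found in $filename"
  fi
}

datafile=$(mktemp)
echo "Linux is a kernel." > "$datafile"

found=$(report_word Linux "$datafile")
missing=$(report_word Hurd "$datafile")
echo "$found"
echo "$missing"

rm -f "$datafile"
```

The "--" ends option parsing, so a search word beginning with a dash is not mistaken for an option.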
Example 29-6 demonstrates how to use
grep to search for a word pattern in
a system logfile. Example 12-15. Emulating "grep" in a script #!/bin/bash
# grp.sh: Very crude reimplementation of 'grep'.
E_BADARGS=65
if [ -z "$1" ] # Check for argument to script.
then
echo "Usage: `basename $0` pattern"
exit $E_BADARGS
fi
echo
for file in * # Traverse all files in $PWD.
do
output=$(sed -n /"$1"/p $file) # Command substitution.
if [ ! -z "$output" ] # What happens if "$output" is not quoted?
then
echo -n "$file: "
echo $output
fi # sed -ne "/$1/s|^|${file}: |p" is equivalent to above.
echo
done
echo
exit 0
# Exercises:
# ---------
# 1) Add newlines to output, if more than one match in any given file.
# 2) Add features. |
How can grep search for two (or
more) separate patterns? What if you want
grep to display all lines in a file
or files that contain both "pattern1"
and "pattern2"? One method is to pipe the result of grep
pattern1 to grep pattern2. For example, given the following file: # Filename: tstfile
This is a sample file.
This is an ordinary text file.
This file does not contain any unusual text.
This file is not unusual.
Here is some text. |
Now, let's search this file for lines containing
both "file" and
"text" . . . bash$ grep file tstfile
# Filename: tstfile
This is a sample file.
This is an ordinary text file.
This file does not contain any unusual text.
This file is not unusual.
bash$ grep file tstfile | grep text
This is an ordinary text file.
This file does not contain any unusual text. |
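Piping one grep into another gives a logical AND. For comparison, the sketch below shows the same AND written as a single awk condition, and a logical OR using repeated -e options. It assumes a copy of the sample file without its "# Filename" header line; the variable names are illustrative.

```bash
#!/bin/bash
tstfile=$(mktemp)
cat > "$tstfile" <<'EOF'
This is a sample file.
This is an ordinary text file.
This file does not contain any unusual text.
This file is not unusual.
Here is some text.
EOF

# AND: lines containing both words, in either order.
both=$(grep file "$tstfile" | grep text)

# The same AND written as a single awk condition.
both_awk=$(awk '/file/ && /text/' "$tstfile")

# OR: lines containing at least one of the words.
either_count=$(grep -c -e file -e text "$tstfile")

echo "$both"
echo "$either_count"    # 5

rm -f "$tstfile"
```

The pipe form is usually the easiest to read; the awk form avoids a second process when that matters.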
-- egrep
- extended grep - is the same
as grep -E. This uses a somewhat
different, extended set of Regular
Expressions, which can make the search a bit more
flexible. fgrep - fast grep (more precisely, fixed-string grep)
- is the same as grep -F. It does
a literal string search (no Regular Expressions), which
usually speeds things up a bit. | On some Linux distros, egrep and
fgrep are symbolic links to, or aliases for
grep, but invoked with the
-E and -F options,
respectively. |
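The difference between the three matching modes shows up most clearly on patterns that contain metacharacters. The following is a small sketch using grep -E and grep -F rather than the egrep and fgrep names; the sample data and variable names are invented for the illustration.

```bash
#!/bin/bash
sample=$(mktemp)
cat > "$sample" <<'EOF'
price: $5.00
price: 5000
color or colour
a.b
axb
EOF

# -E: extended syntax, so ? and | need no backslashes.
ere_count=$(grep -cE 'colou?r|axb' "$sample")

# Basic regex: the dot matches any character.
bre_count=$(grep -c 'a.b' "$sample")

# -F: the pattern is a literal string, so the dot matches only a dot.
fixed_count=$(grep -cF 'a.b' "$sample")

# -F is also the simple way to search for text full of metacharacters.
dollar_line=$(grep -F '$5.00' "$sample")

echo "$ere_count $bre_count $fixed_count"   # 2 2 1
echo "$dollar_line"                         # price: $5.00

rm -f "$sample"
```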
Example 12-16. Looking up definitions in Webster's 1913 Dictionary #!/bin/bash
# dict-lookup.sh
# This script looks up definitions in the 1913 Webster's Dictionary.
# This Public Domain dictionary is available for download
#+ from various sites, including
#+ Project Gutenberg (http://www.gutenberg.org/etext/247).
#
# Convert it from DOS to UNIX format (only LF at end of line)
#+ before using it with this script.
# Store the file in plain, uncompressed ASCII.
# Set DEFAULT_DICTFILE variable below to path/filename.
E_BADARGS=65
MAXCONTEXTLINES=50 # Maximum number of lines to show.
DEFAULT_DICTFILE="/usr/share/dict/webster1913-dict.txt"
# Default dictionary file pathname.
# Change this as necessary.
# Note:
# ----
# This particular edition of the 1913 Webster's
#+ begins each entry with an uppercase letter
#+ (lowercase for the remaining characters).
# Only the *very first line* of an entry begins this way,
#+ and that's why the search algorithm below works.
if [[ -z $(echo "$1" | sed -n '/^[A-Z]/p') ]]
# Must at least specify word to look up, and
#+ it must start with an uppercase letter.
then
echo "Usage: `basename $0` Word-to-define [dictionary-file]"
echo
echo "Note: Word to look up must start with capital letter,"
echo "with the rest of the word in lowercase."
echo "--------------------------------------------"
echo "Examples: Abandon, Dictionary, Marking, etc."
exit $E_BADARGS
fi
if [ -z "$2" ] # May specify different dictionary
#+ as an argument to this script.
then
dictfile=$DEFAULT_DICTFILE
else
dictfile="$2"
fi
# ---------------------------------------------------------
Definition=$(fgrep -A $MAXCONTEXTLINES "$1 \\" "$dictfile")
# Definitions in form "Word \..."
#
# And, yes, "fgrep" is fast enough
#+ to search even a very large text file.
# Now, snip out just the definition block.
echo "$Definition" |
sed -n '1,/^[A-Z]/p' |
# Print from first line of output
#+ to the first line of the next entry.
sed '$d' | sed '$d'
# Delete last two lines of output
#+ (blank line and first line of next entry).
# ---------------------------------------------------------
exit 0
# Exercises:
# ---------
# 1) Modify the script to accept any type of alphabetic input
# + (uppercase, lowercase, mixed case), and convert it
# + to an acceptable format for processing.
#
# 2) Convert the script to a GUI application,
# + using something like "gdialog" . . .
# The script will then no longer take its argument(s)
# + from the command line.
#
# 3) Modify the script to parse one of the other available
# + Public Domain Dictionaries, such as the U.S. Census Bureau Gazetteer. |
agrep (approximate
grep) extends the capabilities of
grep to approximate matching. The search
string may differ by a specified number of characters
from the resulting matches. This utility is not part of
the core Linux distribution. | To search compressed files, use
zgrep, zegrep, or
zfgrep. These also work on non-compressed
files, though slower than plain grep,
egrep, fgrep.
They are handy for searching through a mixed set of files,
some compressed, some not. To search bzipped
files, use bzgrep. |
- look
The command look works like
grep, but does a lookup on
a "dictionary", a sorted word list.
By default, look searches for a match
in /usr/dict/words (/usr/share/dict/words on many current systems), but a different
dictionary file may be specified. Example 12-17. Checking words in a list for validity #!/bin/bash
# lookup: Does a dictionary lookup on each word in a data file.
file=words.data # Data file from which to read words to test.
echo
while [ "$word" != end ] # Last word in data file.
do
read word # From data file, because of redirection at end of loop.
look $word > /dev/null # Don't want to display lines in dictionary file.
lookup=$? # Exit status of 'look' command.
if [ "$lookup" -eq 0 ]
then
echo "\"$word\" is valid."
else
echo "\"$word\" is invalid."
fi
done <"$file" # Redirects stdin to $file, so "reads" come from there.
echo
exit 0
# ----------------------------------------------------------------
# Code below line will not execute because of "exit" command above.
# Stephane Chazelas proposes the following, more concise alternative:
while read word && [[ $word != end ]]
do if look "$word" > /dev/null
then echo "\"$word\" is valid."
else echo "\"$word\" is invalid."
fi
done <"$file"
exit 0 |
- sed, awk
Scripting languages especially suited for parsing text
files and command output. May be embedded singly or in
combination in pipes and shell scripts. - sed
Non-interactive "stream editor", permits using
many ex commands in batch mode. It
finds many uses in shell scripts. - awk
Programmable file extractor and formatter, good for
manipulating and/or extracting fields (columns) in
structured text files. Its syntax is similar to C. - wc
wc gives a "word count" on a file or I/O stream:
bash$ wc /usr/share/doc/sed-4.1.2/README
13 70 447 README
[13 lines 70 words 447 characters] |
wc -w gives only the word count.
wc -l gives only the line count.
wc -c gives only the byte count.
wc -m gives only the character count.
wc -L gives only the length of the longest line.
Using wc to count how many
.txt files are in current working directory:
$ ls *.txt | wc -l
# Will work as long as none of the "*.txt" files have a linefeed in their name.
# Alternative ways of doing this are:
# find . -maxdepth 1 -name \*.txt -print0 | grep -cz .
# (shopt -s nullglob; set -- *.txt; echo $#)
# Thanks, S.C. |
Using wc to total up the size of all the
files whose names begin with letters in the range d - h
bash$ wc [d-h]* | grep total | awk '{print $3}'
71832
|
Using wc to count the instances of the
word "Linux" in the main source file for
this book.
bash$ grep Linux abs-book.sgml | wc -l
50
|
See also Example 12-35 and Example 16-8. Certain commands include some of the
functionality of wc as options.
... | grep foo | wc -l
# This frequently used construct can be more concisely rendered.
... | grep -c foo
# Just use the "-c" (or "--count") option of grep.
# Thanks, S.C. |
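When the result of wc is to be stored in a variable, it helps to read from stdin so that the filename does not appear in the output. A minimal sketch, assuming a two-line scratch file; the variable names are illustrative.

```bash
#!/bin/bash
sample=$(mktemp)
printf 'one two three\nfour five\n' > "$sample"

# Reading from stdin keeps the filename out of the output.
# The arithmetic expansion strips the padding some versions of wc add.
lines=$(( $(wc -l < "$sample") ))
words=$(( $(wc -w < "$sample") ))
bytes=$(( $(wc -c < "$sample") ))

echo "$lines lines, $words words, $bytes bytes"   # 2 lines, 5 words, 24 bytes

# grep -c counts matching lines directly, without a trip through wc.
via_wc=$(( $(grep four "$sample" | wc -l) ))
via_grep=$(grep -c four "$sample")
echo "$via_wc $via_grep"                          # 1 1

rm -f "$sample"
```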
- tr
character translation filter. | Must use quoting and/or
brackets, as appropriate. Quotes prevent the
shell from reinterpreting the special characters in
tr command sequences. Brackets should be
quoted to prevent expansion by the shell. |
Either tr "A-Z" "*" <filename
or tr A-Z \* <filename changes
all the uppercase letters in filename
to asterisks (writes to stdout).
On some systems this may not work, but tr A-Z
'[**]' will. The -d option deletes a range of
characters.
echo "abcdef" # abcdef
echo "abcdef" | tr -d b-d # aef
tr -d 0-9 <filename
# Deletes all digits from the file "filename". |
The --squeeze-repeats (or
-s) option deletes all but the
first instance of a string of consecutive characters.
This option is useful for removing excess whitespace.
bash$ echo "XXXXX" | tr --squeeze-repeats 'X'
X |
The -c "complement"
option inverts the character set to
match. With this option, tr acts only
upon those characters not matching
the specified set. bash$ echo "acfdeb123" | tr -c b-d +
+c+d+b++++ |
Note that tr recognizes POSIX character classes.
bash$ echo "abcd2ef1" | tr '[:alpha:]' -
----2--1
|
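The tr options described above combine well. The sketch below runs each of them on a short literal string; the strings and variable names are made up for the illustration.

```bash
#!/bin/bash
# -d deletes every character in the set.
no_digits=$(echo "a1b2c3" | tr -d '0-9')

# -s squeezes runs of a repeated character down to one.
squeezed=$(echo "too    many   spaces" | tr -s ' ')

# -c complements the set; combined with -d, this keeps only digits.
# (The trailing newline is deleted too, since it is not a digit.)
only_digits=$(echo "tel: 555-0100" | tr -cd '0-9')

# POSIX classes do not depend on how the locale orders a-z.
upper=$(echo "mixed Case" | tr '[:lower:]' '[:upper:]')

echo "$no_digits"     # abc
echo "$squeezed"      # too many spaces
echo "$only_digits"   # 5550100
echo "$upper"         # MIXED CASE
```

Quoting the sets, as here, keeps the shell from treating the brackets as a filename pattern.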
Example 12-18. toupper: Transforms a file to all uppercase. #!/bin/bash
# Changes a file to all uppercase.
E_BADARGS=65
if [ -z "$1" ] # Standard check for command line arg.
then
echo "Usage: `basename $0` filename"
exit $E_BADARGS
fi
tr a-z A-Z <"$1"
# Same effect as above, but using POSIX character set notation:
# tr '[:lower:]' '[:upper:]' <"$1"
# Thanks, S.C.
exit 0
# Exercise:
# Rewrite this script to give the option of changing a file
#+ to *either* upper or lowercase. |
Example 12-19. lowercase: Changes all filenames in working directory to lowercase. #!/bin/bash
#
# Changes every filename in working directory to all lowercase.
#
# Inspired by a script of John Dubois,
#+ which was translated into Bash by Chet Ramey,
#+ and considerably simplified by the author of the ABS Guide.
for filename in * # Traverse all files in directory.
do
fname=`basename $filename`
n=`echo $fname | tr A-Z a-z` # Change name to lowercase.
if [ "$fname" != "$n" ] # Rename only files not already lowercase.
then
mv $fname $n
fi
done
exit $?
# Code below this line will not execute because of "exit".
#--------------------------------------------------------#
# To run it, delete script above line.
# The above script will not work on filenames containing blanks or newlines.
# Stephane Chazelas therefore suggests the following alternative:
for filename in * # Not necessary to use basename,
# since "*" won't return any file containing "/".
do n=`echo "$filename/" | tr '[:upper:]' '[:lower:]'`
# POSIX char set notation.
# Slash added so that trailing newlines are not
# removed by command substitution.
# Variable substitution:
n=${n%/} # Removes trailing slash, added above, from filename.
[[ $filename == $n ]] || mv "$filename" "$n"
# Checks if filename already lowercase.
done
exit $? |
Example 12-20. Du: DOS to UNIX text file conversion. #!/bin/bash
# Du.sh: DOS to UNIX text file converter.
E_WRONGARGS=65
if [ -z "$1" ]
then
echo "Usage: `basename $0` filename-to-convert"
exit $E_WRONGARGS
fi
NEWFILENAME=$1.unx
CR='\015' # Carriage return.
# 015 is octal ASCII code for CR.
# Lines in a DOS text file end in CR-LF.
# Lines in a UNIX text file end in LF only.
tr -d "$CR" < "$1" > "$NEWFILENAME"
# Delete CR's and write to new file.
echo "Original DOS text file is \"$1\"."
echo "Converted UNIX text file is \"$NEWFILENAME\"."
exit 0
# Exercise:
# --------
# Change the above script to convert from UNIX to DOS. |
Example 12-21. rot13: rot13, ultra-weak encryption. #!/bin/bash
# rot13.sh: Classic rot13 algorithm,
# encryption that might fool a 3-year old.
# Usage: ./rot13.sh filename
# or ./rot13.sh <filename
# or ./rot13.sh and supply keyboard input (stdin)
cat "$@" | tr 'a-zA-Z' 'n-za-mN-ZA-M' # "a" goes to "n", "b" to "o", etc.
# The 'cat "$@"' construction
#+ permits getting input either from stdin or from files.
exit 0 |
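Because rot13 shifts each letter by half the alphabet, applying it twice restores the original text. A short sketch of that property, wrapping the same tr command in a function (the function name and the test string are illustrative):

```bash
#!/bin/bash
# rot13 as a function; the name is illustrative.
rot13 () {
  tr 'a-zA-Z' 'n-za-mN-ZA-M'
}

plain="Hello, World"
cipher=$(echo "$plain" | rot13)
restored=$(echo "$cipher" | rot13)

echo "$cipher"      # Uryyb, Jbeyq
echo "$restored"    # Hello, World
```

This is why the same script serves for both encoding and decoding.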
Example 12-22. Generating "Crypto-Quote" Puzzles #!/bin/bash
# crypto-quote.sh: Encrypt quotes
# Will encrypt famous quotes in a simple monoalphabetic substitution.
# The result is similar to the "Crypto Quote" puzzles
#+ seen in the Op Ed pages of the Sunday paper.
key=ETAOINSHRDLUBCFGJMQPVWZYXK
# The "key" is nothing more than a scrambled alphabet.
# Changing the "key" changes the encryption.
# The 'cat "$@"' construction gets input either from stdin or from files.
# If using stdin, terminate input with a Control-D.
# Otherwise, specify filename as command-line parameter.
cat "$@" | tr "a-z" "A-Z" | tr "A-Z" "$key"
# | to uppercase | encrypt
# Will work on lowercase, uppercase, or mixed-case quotes.
# Passes non-alphabetic characters through unchanged.
# Try this script with something like:
# "Nothing so needs reforming as other people's habits."
# --Mark Twain
#
# Output is:
# "CFPHRCS QF CIIOQ MINFMBRCS EQ FPHIM GIFGUI'Q HETRPQ."
# --BEML PZERC
# To reverse the encryption:
# cat "$@" | tr "$key" "A-Z"
# This simple-minded cipher can be broken by an average 12-year old
#+ using only pencil and paper.
exit 0
# Exercise:
# --------
# Modify the script so that it will either encrypt or decrypt,
#+ depending on command-line argument(s). |
- fold
A filter that wraps lines of input to a specified width.
This is especially useful with the -s
option, which breaks lines at word spaces (see Example 12-23 and Example A-1). - fmt
Simple-minded file formatter, used as a filter in a
pipe to "wrap" long lines of text
output. Example 12-23. Formatted file listing. #!/bin/bash
WIDTH=40 # 40 columns wide.
b=`ls /usr/local/bin` # Get a file listing...
echo $b | fmt -w $WIDTH
# Could also have been done by
# echo $b | fold - -s -w $WIDTH
exit 0 |
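To see what fold -s and fmt do with the same input, it is enough to wrap one long line and measure the result. The following is a sketch under simple assumptions: a single sentence as input and a width of 20, both chosen arbitrarily, with awk used only to find the longest output line.

```bash
#!/bin/bash
text="The quick brown fox jumps over the lazy dog"
width=20

# fold -s breaks at the last blank that fits within the width.
folded=$(echo "$text" | fold -s -w "$width")

# fmt refills the paragraph so that no line exceeds the width.
filled=$(echo "$text" | fmt -w "$width")

# Length of the longest line in each result.
longest_fold=$(echo "$folded" |
  awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }')
longest_fmt=$(echo "$filled" |
  awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }')

echo "$folded"
echo "longest folded line: $longest_fold"
echo "$filled"
echo "longest filled line: $longest_fmt"
```

The two tools may break the text at different points, since fmt also tries to even out line lengths, but neither should exceed the requested width.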
See also Example 12-5. - col
This deceptively named filter removes reverse line feeds
from an input stream. It also attempts to replace
whitespace with equivalent tabs. The chief use of
col is in filtering the output
from certain text processing utilities, such as
groff and tbl. - column
Column formatter. This filter transforms list-type
text output into a "pretty-printed" table
by inserting tabs at appropriate places. Example 12-24. Using column to format a directory
listing #!/bin/bash
# This is a slight modification of the example file in the "column" man page.
(printf "PERMISSIONS LINKS OWNER GROUP SIZE MONTH DAY HH:MM PROG-NAME\n" \
; ls -l | sed 1d) | column -t
# The "sed 1d" in the pipe deletes the first line of output,
#+ which would be "total N",
#+ where "N" is the total number of files found by "ls -l".
# The -t option to "column" pretty-prints a table.
exit 0 |
- colrm
Column removal filter. This removes columns (characters)
from a file and writes the file, lacking the range of
specified columns, back to stdout.
colrm 2 4 <filename removes the
second through fourth characters from each line of the
text file filename. | If the file contains tabs or nonprintable
characters, this may cause unpredictable
behavior. In such cases, consider using
expand and
unexpand in a pipe preceding
colrm. |
- nl
Line numbering filter. nl filename
lists filename to
stdout, but inserts consecutive
numbers at the beginning of each non-blank line. If
filename omitted, operates on
stdin. The output of nl is very similar to
cat -n, however, by default
nl does not list blank lines. Example 12-25. nl: A self-numbering script. #!/bin/bash
# line-number.sh
# This script echoes itself twice to stdout with its lines numbered.
# 'nl' sees this as line 4 since it does not number blank lines.
# 'cat -n' sees the above line as number 6.
nl `basename $0`
echo; echo # Now, let's try it with 'cat -n'
cat -n `basename $0`
# The difference is that 'cat -n' numbers the blank lines.
# Note that 'nl -ba' will also do so.
exit 0
# ----------------------------------------------------------------- |
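The numbering difference between nl and cat -n is easier to see on a file with known blank lines. A minimal sketch, assuming a five-line scratch file in which lines 2 and 4 are empty; awk is used here only to pick out the number attached to the last line.

```bash
#!/bin/bash
sample=$(mktemp)
printf 'first\n\nsecond\n\nthird\n' > "$sample"

# The number attached to the last line under each tool.
last_nl=$(nl "$sample" | awk 'NF { n = $1 } END { print n }')
last_cat=$(cat -n "$sample" | awk 'NF { n = $1 } END { print n }')
last_nl_ba=$(nl -ba "$sample" | awk 'NF { n = $1 } END { print n }')

echo "nl:     $last_nl"       # 3  (blank lines skipped)
echo "cat -n: $last_cat"      # 5  (blank lines counted)
echo "nl -ba: $last_nl_ba"    # 5  (-ba numbers all lines)

rm -f "$sample"
```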
- pr
Print formatting filter. This will paginate files
(or stdout) into sections suitable for
hard copy printing or viewing on screen. Various options
permit row and column manipulation, joining lines, setting
margins, numbering lines, adding page headers, and merging
files, among other things. The pr
command combines much of the functionality of
nl, paste,
fold, column, and
expand. pr -o 5 --width=65 fileZZZ | more
gives a nice paginated listing to screen of
fileZZZ with margins set at 5 and
65. A particularly useful option is -d,
forcing double-spacing (same effect as sed
-G). - gettext
The GNU gettext package is a set of
utilities for localizing
and translating the text output of programs into foreign
languages. While originally intended for C programs, it
now supports quite a number of programming and scripting
languages. The gettext
program works on shell scripts. See
the info page. - msgfmt
A program for generating binary
message catalogs. It is used for localization. - iconv
A utility for converting file(s) to a different encoding
(character set). Its chief use is for localization. - recode
Consider this a fancier version of
iconv, above. This very versatile utility
for converting a file to a different encoding is not part
of the standard Linux installation. - TeX, gs
TeX and PostScript
are text markup languages used for preparing copy for
printing or formatted video display. TeX is Donald Knuth's elaborate
typesetting system. It is often convenient to write a
shell script encapsulating all the options and arguments
passed to one of these markup languages. Ghostscript
(gs) is a GPL-ed PostScript
interpreter. - enscript
Utility for converting a plain text file to PostScript. For example, enscript filename.txt -p filename.ps
produces the PostScript output file
filename.ps. - groff, tbl, eqn
Yet another text markup and display formatting language
is groff. This is the enhanced GNU version
of the venerable UNIX roff/troff display
and typesetting package. Manpages
use groff. The tbl table processing utility
is considered part of groff, as its
function is to convert table markup into
groff commands. The eqn equation processing utility
is likewise part of groff, and
its function is to convert equation markup into
groff commands. Example 12-26. manview: Viewing formatted manpages
#!/bin/bash
# manview.sh: Formats the source of a man page for viewing.
# This script is useful when writing man page source.
# It lets you look at the intermediate results on the fly
#+ while working on it.
E_WRONGARGS=65
if [ -z "$1" ]
then
echo "Usage: `basename $0` filename"
exit $E_WRONGARGS
fi
# ---------------------------
groff -Tascii -man $1 | less
# From the man page for groff.
# ---------------------------
# If the man page includes tables and/or equations,
#+ then the above code will barf.
# The following line can handle such cases.
#
# gtbl < "$1" | geqn -Tlatin1 | groff -Tlatin1 -mtty-char -man
#
# Thanks, S.C.
exit 0 |
- lex, yacc
The lex lexical analyzer produces
programs for pattern matching. This has been replaced
by the nonproprietary flex on Linux
systems. The yacc utility creates a
parser based on a set of specifications. This has been
replaced by the nonproprietary bison
on Linux systems.