Часть 4
Введение в модули Perl
Содержание
В этой части вы узнаете о модулях,пакетах и классах.
Перловый модуль похож на обычный набор библиотечных функций.
Термин module в Perl практически совпадает с термином package.
Пакеты появились в Perl 4, модули - в Perl 5.
Код модуля хранится в одном файле.
При этом приложение разбивается на блочные модули.
Вообще понятие module шире нежели package.
Модуль-это набор переменных,кода и других данных.
Например использование глобальных переменных с такими именами ,
как $k,
$j, или $i
- затея не очень хорошая.
Локальная переменная
$i, может быть где угодно.
А использование модулей эту проблему решает,
поскольку видимость переменных ограничена внутри одного модуля.
The symbols defined for your variables are stored in an associative
array, referred to as a symbol table. These symbol tables
are unique to a package. Therefore, variables of the same name
in two different packages can have different values.
Each module has its own symbol table of all symbols that are declared
within it. The symbol table basically isolates synonymous names
in one module from another. The symbol table defines a namespace,
that is, a space for independent variable names to exist in. Thus,
the use of modules, each with its own symbol table, prevents a
variable declared in one section from overwriting the values of
other variables with the same name declared elsewhere in the same
program.
As a matter of fact, all variables in Perl belong to a package.
The variables in a Perl program belong to the main
package. All other packages within a Perl program either are nested
within this main package or exist at the same level. There are
some truly global variables, such as the signal handler array
%SIG, that are available
to all other modules in an application program and cannot be isolated
via namespaces. Only those variable identifiers starting with
letters or an underscore are kept in a module's symbol table.
All other symbols, such as the names STDIN,
STDOUT, STDERR,
ARGV, ARGVOUT,
ENV, Inc,
and SIG are forced to be
in package _main.
Switching between packages affects only namespaces. All you are
doing when you use one package or another is declaring which symbol
table to use as the default symbol table for lookup of variable
names. Only dynamic variables are affected by the use of symbol
tables. Variables declared by the use of the my
keyword are still resolved with the code block they happen to
reside in and are not referenced through symbol tables. In fact,
the scope of a package declaration remains active only within
the code block it is declared in. Therefore, if you switch symbol
tables by using a package within a subroutine, the original symbol
table in effect when the call was made will be restored when the
subroutine returns.
Switching symbol tables affects only the default lookup of dynamic
variable names. You can still explicitly refer to variables, file
handles, and so on in a specific package by prepending a packageName::
to the variable name. You saw what a package context was
when using references in Chapter 3. A package
context simply implies the use of the symbol table by the Perl
interpreter for resolving variable names in a program. By switching
symbol tables, you are switching the package context.
Modules can be nested within other modules. The nested module
can use the variables and functions of the module it is nested
within. For nested modules, you would have to use moduleName::nestedModuleName
and so on. Using the double colon (::)
is synonymous with using a back quote (`).
However, the double colon is the preferred, future way of addressing
variables within modules.
Explicit addressing of module variables is always done with a
complete reference. For example, suppose you have a module, Investment,
which is the default package in use, and you want to address another
module, Bonds, which is nested
within the Investment module.
In this case, you cannot use Bond::.
Instead, you would have to use Investment::Bond::
to address variables and functions within the Bond
module. Using Bond:: would
imply the use of a package Bond
that is nested within the main
module and not within the Investment
module.
The symbol table for a module is actually stored in an associative
array of the module's names appended with two colons. The symbol
table for a module called Bond
will be referred to as the associative array %Bond::.
The name for the symbol table for the main
module is %main::, and can
even be shortened to %::.
Similarly, all nested packages have their symbols stored in associative
arrays with double colons separating each nesting level. For example,
in the Bond module that is
nested within the Investment
module, the associative array for the symbols in the Bond
module will be named %Investment::Bond::.
A typeglob is really a global
type for a symbol name. You can perform aliasing operations by
assigning to a typeglob.
One or more entries in an associative array for symbols will be
used when an assignment via a typeglob
is used. The actual value in each entry of the associative array
is what you are referring to when you use the *variableName
notation. Thus, there are two ways of referring to variable
names in a package:
*Investment::money = *Investment::bills;
$Investment::{'money'} = $Investment::{'bills'};
In the first method, you are referring to the variables via a
typeglob reference. The use
of the symbol table, %Investment::,
is implied here, and Perl will optimize the lookup for symbols
money and bills.
This is the faster and preferred way of addressing a symbol. The
second method uses a lookup for the value of a variable addressed
by 'money' and 'bills'
in the associative array used for symbols, %Investment::
explicitly. This lookup would be done dynamically and will not
be optimized by Perl. Therefore, the lookup will be forced to
check the associative array every time the statement is executed.
As a result, the second method is not efficient and should be
used only for demonstration of how the symbol table is implemented
internally.
Another example in this statement
*kamran = *husain;
causes variables, subroutines, and file handles that are named
via the symbol kamran to
also be addressed via the symbol husain.
That is, all symbol entries in the current symbol table with the
key kamran will now contain
references to those symbols addressed by the key husain.
To prevent such a global assignment, you can use explicit references.
For example, the following statement will let you address the
contents of $husain via the
variable $kamran:
*kamran = \$husain;
However, any arrays such @kamran
and @husain will not be the
same. Only what the references specified explicitly will be changed.
To summarize, when you assign one typeglob
to another, you affect all the entries in a symbol table regardless
of the type of variable being referred to. When you assign a reference
from one variable type to another, you are only affecting one
entry in the symbol table.
A Perl module file has the following format:
package ModuleName;
...
#### Insert module code ####
...
1;
The filename has to be called ModuleName.pm.
The name of a module must end in the string .pm
by convention. The package
statement is the first line of the file. The last line of the
file must contain the line with the 1;
statement. This in effect returns a true
value to the application program using the module. Not using the
1; statement will not let
the module be loaded correctly.
The package statement tells
the Perl interpreter to start with a new namespace domain. Basically,
all your variables in a Perl script belong to a package called
main. Every variable in the
main package can be referred
to as $main'variable.
Here's the syntax for such references:
$packageName'variableName
The single quote (') is synonymous
with the double colon (::)
operator. I cover more uses of the ::
operator in the next chapter. For the time being, you must remember
that the following two statements are equivalent:
$packageName'variableName;
$packageName::variableName;
The double-colon syntax is considered standard in the Perl world.
Therefore, to preserve readability, I use the double-colon syntax
in the rest of this book unless it's absolutely necessary to make
exceptions to prove a point.
The default use of a variable name defers to the current package
active at the time of compilation. Thus, if you are in the package
Finance.pm and specify a
variable $pv, the variable
is actually equal to $Finance::$pv.
You include Perl modules in your program by using the use
or the require statement.
Here's the way to use either of these statements:
use ModuleName;
require ModuleName;
Note that the .pm extension
is not used in the code shown above. Also note that neither statement
allows a file to be included more than once in a program. The
returned value of true (1;)
as the last statement is required to let Perl know that a required
or used module loaded correctly
and lets the Perl interpreter ignore any reloads. In general,
it's better to use the use Module;
statement than the require Module;
statement in a Perl program to remain compatible with future versions
of Perl.
For modules, you might want to consider continuing to use the
require statement. Here's
why: The use statement does
a little bit more work than the require
statement in that it alters the namespace of the module that includes
another module. You want this extra update of the namespace to
be done in a program. However, when writing code for a module,
you may not want the namespace to be altered unless it's explicitly
required. In this event, you will use the require
statement.
The require statement includes
the full pathname of a file in the @Inc
array so that the functions and variables in the module's file
are in a known location during execution time. Therefore, the
functions that are imported from a module are imported via an
explicit module reference at runtime with the require
statement. The use statement
does the same thing as the require
statement because it updates the @Inc
array with full pathnames of loaded modules. The code for the
use function also goes a
step further and calls an import
function in the module being used
to explicitly load the list of exported functions at compile time,
thus saving the time required for an explicit resolution of a
function name during execution.
Basically, the use statement
is equivalent to
require ModuleName; import ModuleName
[list of imported functions];
The use of the use statement
does change your program's namespace because the imported function
names are inserted in the symbol table. The require
statement does not alter your program's namespace. Therefore,
the following statement
use ModuleName ();
is equivalent to this statement:
require ModuleName;
Functions are imported from a module via a call to a function
called import. You can write
your own import function
in a module, or you can use the Exporter
module and use its import
function. In almost all cases, you will use the Exporter
module to provide an import
function instead of reinventing the wheel. (You'll learn more
on this in the next section.) Should you decide not to use the
Exporter module, you will
have to write your own import
function in each module that you write. It's much easier to simply
use the Exporter module and
let Perl do the work for you.
The best way to illustrate the semantics of how a module is used
in Perl is to write a simple module and show how to use it. Let's
take the example of a local loan shark, Rudious Maximus, who is
simply tired of typing the same "request for payment"
letters. Being an avid fan of computers and Perl, Rudious takes
the lazy programmer's approach and writes a Perl module to help
him generate his memos and letters.
Now, instead of typing within fields in a memo template file,
all he has to do is type a few lines to produce his nice, threatening
note. Listing 4.1 shows you what he has to type.
Listing 4.1. Using the Letter
module.
1 #!/usr/bin/perl -w
2 #
3 # Uncomment the line
below to include the current dir in @Inc.
4 # push (@Inc, 'pwd');
5 #
6 use Letter;
7
8 Letter::To("Mr. Gambling Man","The money
for Lucky Dog, Race 2");
9 Letter::ClaimMoneyNice();
10 Letter::ThankDem();
11 Letter::Finish();
The use Letter; statement
is present to force the Perl interpreter to include the code for
the module in the application program. The module should be located
in the /usr/lib/perl5/ directory,
or you can place it in any directory listed in the @Inc
array. The @Inc array is
the list of directories that the Perl interpreter will look for
when attempting to load the code for the named module. The commented
line (number 4) shows how to add the current working directory
to include the path. The next four lines in the file generate
the subject matter for the letter.
Here's the output from using the Letter
module:
To: Mr. Gambling Man
Fm: Rudious Maximus, Loan Shark
Dt: Wed Feb 7 10:35:51 CST 1996
Re: The money for Lucky Dog, Race 2
====================================================
It has come to my attention that your account is
way over due.
You gonna pay us soon?
Or would you like me to come ovah?
Thanks for your support.
Sincerely,
Rudious
The Letter module file is
shown in Listing 4.2. The name of the package is declared in the
first line. Because this module's functions will be exported,
I use the Exporter module.
Therefore, the statement use Exporter;
is required to inherit functionality from the Exporter
module. Another required step is putting the word Exported
in the @ISA array to allow
searching for Exported.pm.
Note |
The @ISA array is a special array within each package. Each item in the array lists where else to look for a method if it cannot be found in the current package. The order in which packages are listed in the @ISA array is the order in which Perl searches for unresolved symbols. A class that is listed in the @ISA array is referred to as the base class of that particular class. Perl will cache
missing methods found in base classes for future references. Modifying the @ISA array will flush the cache and cause Perl to look up all methods again.
|
Let's now look at the code for Letter.pm
in Listing 4.2.
Listing 4.2. The Letter.pm
module.
1 package Letter;
2
3 require Exporter;
4 @ISA = (Exporter);
5
6 =head1 NAME
7
8 Letter - Sample module to generate letterhead for you
9
10 =head1 SYNOPSIS
11
12 use Letter;
13
14 Letter::Date();
15 Letter::To($name,$company,$address);
16
17 Then one of the following:
18 Letter::ClaimMoneyNice() {
19 Letter::ClaimMoney();
20 Letter::ThreatBreakLeg();
21
22 Letter::ThankDem();
23 Letter::Finish();
24
25 =head1 DESCRIPTION
26
27 This module provides a short example of generating a letter
for a
28 friendly neighborbood loan shark.
29
30 The code begins after the "cut" statement.
31 =cut
32
33 @EXPORT = qw( Date,
34
To,
35
ClaimMoney,
36
ClaimMoneyNice,
37
ThankDem,
38
Finish );
39
40 #
41 # Print today's date
42 #
43 sub Letter::Date {
44
$date = 'date';
45
print "\n Today is $date";
46 }
47
48 sub Letter::To {
49
local($name) = shift;
50
local($subject) = shift;
51
print "\n To: $name";
52
print "\n Fm: Rudious Maximus, Loan Shark";
53
print "\n Dt: ", `date`;
54
print "\n Re: $subject";
55
print "\n\n";
56
print "\n====================================================\n";
57 }
58 sub Letter::ClaimMoney() {
59
print "\n You owe me money. Get your act together";
60
print "\n Do you want me to send Bruno over to ";
61
print "\n collect it , or are you gonna pay up?";
62 }
63
64 sub Letter::ClaimMoneyNice() {
65
print "\n It is come to my attention that your account is
";
66
print "\n way over due.";
67
print "\n You gonna pay us soon..";
68
print "\n or would you like me to come ovah?";
69 }
70
71 sub Letter::ThreatBreakLeg() {
72
print "\n apparently letters like these dont help";
73
print "\n I will have to make an example of you";
74
print "\n \n See you in the hospital, pal!";
75 }
76
77 sub Letter::ThankDem() {
78
print "\n\n Thanks for your support";
79 }
80
81 sub Letter::Finish(){
82
printf "\n\n\n\n Sincerely";
83
printf "\n Rudious \n ";
84 }
85
86 1;
Lines containing the equal sign are used for documentation. You
must document each module for your own reference; Perl modules
do not need to be documented, but it's a good idea to write a
few lines about what your code does. A few years from now, you
may forget what a module is about. Good documentation is always
a must if you want to remember what you did in the past!
I cover documentation styles used for Perl in Chapter 8,
"Documenting Perl Scripts." For this sample module,
the =head1 statement begins
the documentation. Everything up to the =cut
statement is ignored by the Perl interpreter.
Next, the module lists all the functions exported by this module
in the @EXPORT array. The
@EXPORT array defines all
the function names that can be called by outside code. If you
do not list a function in this @EXPORT
array, it won't be seen by external code modules.
Following the @EXPORT array
is the body of the code, one subroutine at a time. After all the
subroutines are defined, the final statement 1;
ends the module file. 1;
must be the last executable line in the file.
Let's look at some of the functions defined in this module. The
first function to look at is the simple Date
function, lines 43 to 46, which prints the current UNIX date and
time. There are no parameters to this function, and it doesn't
return anything meaningful back to the caller.
Note the use of my before
the $date variable in line
44. The my keyword is used
to limit the scope of the variable to within the Date
function's curly braces. Code between curly braces is referred
to as a block. Variables declared within a block are limited
in scope to within the curly braces. In 49 and 50, the local variables
$name and $subject
are visible to all functions.
You can also declare variables with the local
qualifier. The use of local
allows a variable to be in scope for the current block as well
as for other blocks of code called from within this block. Thus,
a local $x declared within
one block is visible to all subsequent blocks called from within
this block and can be referenced. In the following sample code,
the ToTitled function's $name
variable can be accessed but not the data in $iphone:
1 sub Letter::ToTitled {
2
local($name) = shift;
3
my($phone) = shift;
The sample code for Letter.pm
showed how to extract one parameter at a time. The subroutine
To() takes two parameters
to set up the header for the memo.
Using functions within a module is not any different than using
and defining Perl modules within the same code file. Parameters
are passed by reference unless otherwise specified. Multiple arrays
passed into a subroutine, if not explicitly dereferenced using
the backslash, are concatenated.
The @_ input array in a function
is always an array of scalar values. Passing values by reference
is the preferred way in Perl to pass a large amount of data into
a subroutine. (See Chapter 3, "References.")
The Finance module, shown
in Listing 4.3, is used to provide simple calculations for loan
values. Using the Finance
module is straightforward. All the functions are written with
the same parameters, as shown in the formula for the functions.
Let's look at how the future value of an investment can be calculated.
For example, if you invest some dollars, $pv,
in a bond that offers a fixed percentage rate, $r,
applied at known intervals for $n
time periods, what is the value of the bond at the time of its
expiration? In this case, you'll be using the following formula:
$fv = $pv * (1+$r) ** $n ;
The function to get the future value is declared as FutureValue.
Refer to Listing 4.3 to see how to use it.
Listing 4.3. Using the Finance
module.
1 #!/usr/bin/perl -w
2
3 push(@Inc,'pwd');
4 use Finance;
5
6 $loan = 5000.00;
7 $apr = 3.5; # APR
8 $year = 10; # in years.
9
10 # ----------------------------------------------------------------
11 # Calculate the value at the end of the loan if interest
12 # is applied every year.
13 # ----------------------------------------------------------------
14 $time = $year;
15 $fv1 = Finance::FutureValue($loan,$apr,$time);
16 print "\n If interest is applied at end of year";
17 print "\n The future value for a loan of \$" . $loan
. "\n";
18 print " at an APR of ", $apr , " for ", $time,
" years";
19 printf " is %8.2f \n" , $fv1;
20
21 # ----------------------------------------------------------------
22 # Calculate the value at the end of the loan if interest
23 # is applied every month.
24 # ----------------------------------------------------------------
25 $rate = $apr / 12; # APR
26 $time = $year * 12; # in months
27 $fv2 = Finance::FutureValue($loan,$rate,$time);
28
29 print "\n If interest is applied at end of each month";
30 print "\n The future value for a loan of \$" . $loan
. "\n";
31 print " at an APR of ", $apr , " for ", $time,
" months";
32 printf " is %8.2f \n" , $fv2;
33
34 printf "\n The difference in value is %8.2f", $fv2
- $fv1;
35 printf "\n Therefore by applying interest at shorter time
periods";
36 printf "\n we are actually getting more money in interest.\n";
Here is sample input and output of Listing 4.3.
$ testme
If interest is
applied at end of year
The future value for a loan of $5000
at an APR of 3.5 for
10 years is 7052.99
If interest is applied
at end of each month
The future value for a loan of $5000
at an APR of 3.5 for
120 months is 7091.72
The difference in value
is 38.73
Therefore by applying interest at shorter time periods
we are actually getting more money in interest.
The revelation in the output is the result of the comparison of
values between $fv1 and $fv2.
The $fv1 value is calculated
with the application of interest once every year over the life
of the bond. $fv2 is the
value if the interest is applied every month at the equivalent
monthly interest rate.
The Finance.pm package is
shown in Listing 4.4 in its early development stages.
Listing 4.4. The Finance.pm
package.
1 package Finance;
2
3 require Exporter;
4 @ISA = (Exporter);
5
6 =head1 Finance.pm
7
8 Financial Calculator - Financial calculations made
easy with Perl
9
10 =head 2
11 use Finance;
12
13 $pv = 10000.0;
14
15 $rate = 12.5 / 12;
# APR per month.
16
17 $time = 360 ; #
months for loan to mature
18
19 $fv = FutureValue();
20
21 print $fv;
22
23 =cut
24
25 @EXPORT = qw( FutureValue,
26
PresentValue,
27
FVofAnnuity,
28
AnnuityOfFV,
29
getLastAverage,
30
getMovingAverage,
31
SetInterest);
32
33 #
34 # Globals, if any
35 #
36
37 local $defaultInterest
= 5.0;
38
39 sub Finance::SetInterest($)
{
40
my $rate = shift(@_);
41
$defaultInterest = $rate;
42
printf "\n \$defaultInterest = $rate";
43 }
44
45 # --------------------------------------------------------------------
46 # Notes:
47 # 1. The interest
rate $r is given in a value of [0-100].
48 # 2. The $n given in the terms is the rate at which the
interest
49 # is
applied.
50 #
51 # --------------------------------------------------------------------
52
53 # --------------------------------------------------------------------
54 # Present value of an investment given
55 # fv - a future
value
56 # r - rate per period
57 # n -
number of period
58 # --------------------------------------------------------------------
59 sub Finance::FutureValue($$$)
{
60
my ($pv,$r,$n) = @_;
61
my $fv = $pv * ((1 + ($r/100)) ** $n);
62
return $fv;
63 }
64
65 # --------------------------------------------------------------------
66 # Present value of an investment given
67 # fv - a future
value
68 # r - rate per period
69 # n -
number of period
70 # --------------------------------------------------------------------
71 sub Finance::PresentValue($$$)
{
72
my $pv;
73
my ($fv,$r,$n) = @_;
74
$pv = $fv / ((1 + ($r/100)) ** $n);
75
return $pv;
76
77 }
78
79 # --------------------------------------------------------------------
80 # Get the future value of an annuity given
81 # mp - Monthly Payment
of Annuity
82 # r - rate per period
83 # n -
number of period
84 # --------------------------------------------------------------------
85
86 sub FVofAnnuity($$$) {
87
my $fv;
88
my $oneR;
89
my ($mp,$r,$n) = @_;
90
91
$oneR = ( 1 + $r) ** $n;
92
$fv = $mp * ( ($oneR - 1)/ $r);
93
return $fv;
94 }
95
96 # --------------------------------------------------------------------
97 # Get the annuity
from the following bits of information
98 # r - rate per period
99 # n -
number of period
100 # fv - Future Value
101 # --------------------------------------------------------------------
102
103 sub AnnuityOfFV($$$) {
104
my $mp; # mp - Monthly Payment of Annuity
105
my $oneR;
106
my ($fv,$r,$n) = @_;
107
108
$oneR = ( 1 + $r) ** $n;
109
$mp = $fv * ( $r/ ($oneR - 1));
110
return $mp;
111 }
112
113 # --------------------------------------------------------------------
114 # Get the average of the last "n" values in an array.
115 # --------------------------------------------------------------------
116 # The last $count number of elements from the array in @values
117 # The total number of elements in @values is in $number
118 #
119 sub getLastAverage($$@) {
120
my ($count, $number, @values) = @_;
121
my $i;
122
123
my $a = 0;
124
return 0 if ($count == 0);
125
for ($i = 0; $i< $count; $i++) {
126
$a += $values[$number - $i - 1];
127
}
128
return $a / $count;
129
}
130
131 # --------------------------------------------------------------------
132 # Get a moving average of the values.
133 # --------------------------------------------------------------------
134 # The window size is the first parameter, the number of items
in the
135 # passed array is next. (This can easily be calculated within
the
136 # function using the scalar() function, but the subroutine
shown here
137 # is also being used to illustrate how to pass pointers.)
The reference to the
138 # array of values is passed next, followed by a reference
to the place
139 # the return values are to be stored.
140 #
141 sub getMovingAve($$\@\@) {
142
my ($count, $number, $values, $movingAve) = @_;
143
my $i;
144
my $a = 0;
145
my $v = 0;
146
147
return 0 if ($count == 0);
148
return -1 if ($count > $number);
149
return -2 if ($count < 2);
150
151
$$movingAve[0] = 0;
152
$$movingAve[$number - 1] = 0;
153
for ($i=0; $i<$count;$i++) {
154
$v = $$values[$i];
155
$a += $v / $count;
156
$$movingAve[$i] = 0;
157
}
158
for ($i=$count; $i<$number;$i++) {
159
$v = $$values[$i];
160
$a += $v / $count;
161
$v = $$values[$i - $count - 1];
162
$a -= $v / $count;
163
$$movingAve[$i] = $a;
164
}
165
return 0;
166
}
167
168 1;
Look at the declaration of the function FutureValue
with ($$$). The three dollar
signs together signify three scalar numbers being passed into
the function. This extra scoping is present for validating the
type of the parameters passed into the function. If you were to
pass a string instead of a number into the function, you would
get a message very similar to this one:
Too many arguments for Finance::FutureValue
at ./f4.pl line 15, near "$time)"
Execution of ./f4.pl aborted due to compilation errors.
The use of prototypes when defining functions prevents you from
sending in values other than what the function expects. Use @
or % to pass in an array
of values. If you are passing by reference, use \@
or \% to show a scalar reference
to an array or hash, respectively. If you do not use the backslash,
all other types in the argument list prototype are ignored. Other
types of disqualifiers include an ampersand for a reference to
a function, an asterisk for any type, and a semicolon to indicate
that all other parameters are optional.
Now, let's look at the lastMovingAverage
function declaration, which specifies two integers in the front
followed by an array. The way the arguments are used in the function
is to assign a value to each of the two scalars, $count
and $number, whereas everything
else is sent to the array. Look at the function getMovingAverage()
to see how two arrays are passed in order to get the moving average
on a list of values.
The way to call the getMovingAverage
function is shown in Listing 4.5.
Listing 4.5. Using the moving average function.
1 #!/usr/bin/perl -w
2
3 push(@Inc,'pwd');
4 use Finance;
5
6 @values = ( 12,22,23,24,21,23,24,23,23,21,29,27,26,28
);
7 @mv = (0);
8 $size = scalar(@values);
9 print "\n Values
to work with = { @values } \n";
10 print " Number of values = $size \n";
11
12 # ----------------------------------------------------------------
13 # Calculate the average of the above function
14 # ----------------------------------------------------------------
15 $ave = Finance::getLastAverage(5,$size,@values);
16 print "\n Average of last 5 days = $ave \n";
17
18 Finance::getMovingAve(5,$size,@values,@mv);
19 print "\n Moving Average with 5 days window = \n { @mv
} \n";
Here's the output from Listing 4.5:
Values to work with = { 12 22 23 24 21
23 24 23 23 21 29 27 26 28 }
Number of values = 14
Average of last 5 days = 26.2
Moving Average with 5 days window =
{ 0 0 0 0 0 19.4 21.8 22 22 21.4 23 23.8 24.2 25.2 }
The getMovingAverage() function
takes two scalars and then two references to arrays as scalars.
Within the function, the two scalars to the arrays are dereferenced
for use as numeric arrays. The returned set of values is inserted
in the area passed in as the second reference. Had the input parameters
not been specified with \@
for each referenced array, the $movingAve
array reference would have been empty and would have caused errors
at runtime. In other words, the following declaration is not correct:
sub getMovingAve($$@@)
The resulting spew of error messages from a bad function prototype
is as follows:
Use of uninitialized value at Finance.pm
line 128.
Use of uninitialized value at Finance.pm line 128.
Use of uninitialized value at Finance.pm line 128.
Use of uninitialized value at Finance.pm line 128.
Use of uninitialized value at Finance.pm line 128.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Values to work with = { 12 22 23 24 21 23 24 23 23 21 29 27 26
28 }
Number of values = 14
Average of last 5 days = 26.2
Moving Average with 5 days window =
{ 0 }
This is obviously not the correct output. Therefore, it's critical
that you pass by reference when sending more than one array.
Global variables for use within the package can also be declared.
Look at the following segment of code from the Finance.pm
module to see what the default value of the Interest
variable would be if nothing was specified in the input. (The
current module requires the interest to be passed in, but you
can change this.)
Here's a little snippet of code that can be added to the end of
the program shown in Listing 4.5 to add the ability to set interest
rates.
20 local $defaultInterest = 5.0;
21 sub Finance::SetInterest($) {
22
my $rate = shift(@_);
23
$rate *= -1 if ($rate < 0);
24
$defaultInterest = $rate;
25
printf "\n \$defaultInterest = $rate";
26 }
The local variable $defaultInterest
is declared in line 20. The subroutine SetInterest
to modify the rate is declared in lines 21 through 26. The $rate
variable uses the values passed into the subroutine and simply
assigns a positive value for it. You can always add more error
checking if necessary.
To access the defaultInterest
variable's value, you could define either a subroutine that returns
the value or refer to the value directly with a call to the following
in your application program:
$Finance::defaultInterest;
The variable holding the return value from the module function
is declared as my variable.
The scope of this variable is within the curly braces of the function
only. When the called subroutine returns, the reference to my
variable is returned. If the calling program uses
this returned reference somewhere, the link counter on the variable
is not zero; therefore, the storage area containing the returned
values is not freed to the memory pool. Thus, the function that
declares
my $pv
and then later returns the value of $pv
returns a reference to the value stored at that location. If the
calling routine performs a call like this one:
Finance::FVofAnnuity($monthly,$rate,$time);
there is no variable specified here into which Perl stores the
returned reference; therefore, any returned value (or a list of
values) is destroyed. Instead, the call with the returned value
assigned to a local variable, such as this one:
$fv = Finance::FVofAnnuity($monthly,$rate,$time);
maintains the variable with the value. Consider the example shown
in Listing 4.6, which manipulates values returned by functions.
Listing 4.6. Sample usage of the my
function.
1 #!/usr/bin/perl -w
2
3 push(@Inc,'pwd');
4 use Finance;
5
6 $monthly = 400;
7 $rate = 0.2; #
i.e. 6 % APR
8 $time = 36; # in months
9
10 print "\n# ------------------------------------------------";
11 $fv = Finance::FVofAnnuity($monthly,$rate,$time);
12 printf "\n For a monthly %8.2f at a rate of %%%6.2f for
%d periods",
13
$monthly, $rate, $time;
14 printf "\n you get a future value of %8.2f ", $fv;
15
16 $fv *= 1.1; # allow 10 % gain in the house value.
17
18 $mo = Finance::AnnuityOfFV($fv,$rate,$time);
19
20 printf "\n To get 10 percent more at the end, i.e. %8.2f",$fv;
21 printf "\n you need a monthly payment value of %8.2f",$mo,$fv;
22
23 print "\n# ------------------------------------------------
\n";
Here is sample input and output for this function:
$ testme
# ------------------------------------------------
For a monthly 400.00
at a rate of % 0.20 for 36 periods
you get a future value of 1415603.75
To get 10 percent more
at the end, i.e. 1557164.12
you need a monthly payment value of 440.00
# ------------------------------------------------
Modules implement classes in a Perl program that uses the object-oriented
features of Perl. Included in object-oriented features is the
concept of inheritance. (You'll learn more on the object-oriented
features of Perl in Chapter 5, "Object-Oriented
Programming in Perl.") Inheritance means the process with
which a module inherits the functions from its base classes. A
module that is nested within another module inherits its parent
modules' functions. So inheritance in Perl is accomplished with
the :: construct. Here's
the basic syntax:
SuperClass::NextSubClass:: ... ::ThisClass.
The file for these is stored in ./SuperClass/NextSubClass/
.
Each double colon indicates a lower-level directory in which to
look for the module. Each module, in turn, declares itself as
a package with statements like the following:
package SuperClass::NextSubClass;
package SuperClass::NextSubClass::EvenLower;
For example, say that you really want to create a Money
class with two subclasses, Stocks
and Finance. Here's how to
structure the hierarchy, assuming you are in the /usr/lib/perl5
directory:
- Create a Money directory
under the /usr/lib/perl5
directory.
- Copy the existing Finance.pm
file into the Money subdirectory.
- Create the new Stocks.pm
file in the Money subdirectory.
- Edit the Finance.pm file
to use the line package Money::Finance
instead of package Finance;.
- Edit scripts to use Money::Finance
as the subroutine prefix instead of Finance::.
- Create a Money.pm file
in the /usr/lib/perl5 directory.
The Perl script that gets the moving average for a series of numbers
is presented in Listing 4.7.
Listing 4.7. Using inheriting modules.
1 #!/usr/bin/perl -w
2 $aa = 'pwd';
3 $aa .=
"/Money";
4 push(@Inc,$aa);
5 use Money::Finance;
6 @values = ( 12,22,23,24,21,23,24,23,23,21,29,27,26,28
);
7 @mv =
(0);
8 $size = scalar(@values);
9 print
"\n Values to work with = { @values } \n";
10 print " Number of values = $size \n";
11 # ----------------------------------------------------------------
12 # Calculate the average of the above function
13 # ----------------------------------------------------------------
14 $ave = Money::Finance::getLastAverage(5,$size,@values);
15 print "\n Average of last 5 days = $ave \n";
16 Money::Finance::getMovingAve(5,$size,@values,@mv);
17 # foreach $i (@values) {
18
# print "\n Moving with 5 days window = $mv[$i] \n";
19 # }
20 print "\n Moving Average with 5 days window = \n { @mv
} \n";
Lines 2 through 4 add the path to the Money
subdirectory. The use statement
in line 5 now addresses the Finance.pm
file in the ./Money subdirectory.
The calls to the functions within Finance.pm
are now called with the prefix Money::Finance::
instead of Finance::. Therefore,
a new subdirectory is shown via the ::
symbol when Perl is searching for modules to load.
The Money.pm file is not
required. Even so, you should create a template for future use.
Actually, the file would be required to put any special requirements
for initialization that the entire hierarchy of modules uses.
The code for initialization is placed in the BEGIN()
function. The sample Money.pm
file is shown in Listing 4.8.
Listing 4.8. The superclass module for Finance.pm.
1 package Money;
2 require Exporter;
3
4
BEGIN {
5
printf "\n Hello! Zipping into existence for you\n";
6
}
7 1;
To see the line of output from the printf
statement in line 5, you have to insert the following commands
at the beginning of your Perl script:
use Money;
use Money::Finance;
To use the functions in the Stocks.pm
module, you use this line:
use Money::Stocks;
The Stocks.pm file appears
in the Money subdirectory
and is defined in the same format as the Finance.pm
file, with the exceptions that use Stocks
is used instead of use Finance
and the set of functions to export is different.
A number of modules are included in the Perl distribution. Check
the /usr/lib/perl5/lib directory
for a complete listing after you install Perl. There are two kinds
of modules you should know about and look for in your Perl 5 release,
Pragmatic and Standard modules.
Pragmatic modules, which are also like pragmas in C compiler directives,
tend to affect the compilation of your program. They are similar
in operation to the preprocessor elements of a C program. Pragmas
are locally scoped so that they can be turned off with the no
command. Thus, the command
no POSIX ;
turns off the POSIX features
in the script. These features can be turned back on with the use
statement.
Standard modules bundled with the Perl package include several
functioning packages of code for you to use. Refer to appendix
B, "Perl Module Archives," for a complete list of these
standard modules.
To find out all the .pm modules
installed on your system, issue the following command. (If you
get an error, add the /usr/lib/perl5
directory to your path.)
find /usr/lib/perl5 -name perl "*.pm"
-print
Extension modules are written in C (or a mixture of Perl and C)
and are dynamically loaded into Perl if and when you need them.
These types of modules for dynamic loading require support in
the kernel. Solaris lets you use these modules. For a Linux machine,
check the installation pages on how to upgrade to the ELF format
binaries for your Linux kernel.
The term CPAN (Comprehensive Perl Archive Network) refers to all
the hosts containing copies of sets of data, documents, and Perl
modules on the Net. To find out about the CPAN site nearest you,
search on the keyword CPAN in search engines such as Yahoo!,
AltaVista, or Magellan. A good place to start is the www.metronet.com
site.
This chapter introduced you to Perl 5 modules and described what
they have to offer. A more comprehensive list is found on the
Internet via the addresses shown in the Web sites http://www.metronet.com
and http://www.perl.com.
A Perl package is a set of Perl code that looks like a library
file. A Perl module is a package that is defined in a library
file of the same name. A module is designed to be reusable. You
can do some type checking with Perl function prototypes to see
whether parameters are being passed correctly. A module has to
export its functions with the @EXPORT
array and therefore requires the Exporter
module. Modules are searched for in the directories listed in
the @Inc array.
Obviously, there is a lot more to writing modules for Perl than
what is shown in this chapter. The simple examples in this chapter
show you how to get started with Perl modules. In the rest of
the book I cover the modules and their features, so hang in there.
I cover Perl objects, classes, and related concepts in Chapter 5.
Chapter 5
Object-Oriented Programming in
Perl
CONTENTS
This chapter covers the object-oriented programming (OOP) features
of Perl. You'll see how to construct objects in Perl as well as
how to use the OOP features offered by Perl. You'll also learn
about inheritance, overriding of methods, and data encapsulation.
A module is also referred to as a package. Objects
in Perl are based on references to data items within a package.
An object in Perl is simply a reference to something that
knows which class it belongs to. For more information, you can
consult the perlmod and perlobj
text files at http://www.metronet.com.
These files are the primary source of information on the Internet
about Perl modules.
When performing object-oriented programming with other languages,
you declare a class and then create (instantiate) objects
of that class. All objects of a particular class behave in a certain
way, which is governed by the methods of the class to which
the object belongs. You create classes by defining new ones or
by inheriting properties from an existing class.
For programmers already familiar with object-oriented principles,
this will all seem familiar. Perl is, and pretty much always has
been, an object-oriented language. In Perl 4, the use of packages
gave you different symbol tables from which to choose your symbol
names. In Perl 5, the syntax has changed a bit, and the use of
objects has been formalized somewhat.
The following three declarations are extremely important to understanding
how objects, classes, and methods work in Perl. Each is covered
in more detail in the rest of the chapter.
- A class is a Perl package.
The package for a class provides the methods for objects.
- A method is a Perl subroutine.
The only catch with writing methods is that the name of the class
is the first argument.
- An object in Perl is a reference
to some data item within the class.
This point is important enough to repeat: A Perl class is simply
a package. When you see a Perl document referring to a class,
think package. Also, both package and module
mean the same thing. For C programmers, it's easy to use ::
notation for classes and ->
for pointing to structure elements and class members.
One of the key features of OOP in any object-oriented language
is that of inheritance. This is where new classes can be created
by adding new features to existing classes. The inheritance feature
offered by Perl is not the same as you would expect in other object-oriented
languages. Perl classes inherit methods only. You have to use
your own mechanisms to implement data inheritance.
Because each class is a package, it has its own name space with
its own associative array of symbol names. Therefore, each class
can have its own independent set of symbol names. As with package
references, you can address the variables in a class with the
' operator. Therefore, members
of a class are addressed as $class'$member.
In Perl 5, you can use the double colon instead of '
to get the reference. Thus, $class'member
is the same as $class::$member.
This section covers the requisite steps to take when you create
a new class. This chapter covers the semantics in the creation
of a very simple class, called Invest,
for printing the required parts of a simple Java application source
code file. No, you will not become a Java expert, nor does this
package require you to have any experience in Java. The concept
of creating a class is what you're concerned with. For example,
this chapter could just as easily have been on creating a phone
book application, but how many such examples have you seen to
date in books? You'll use a different example this time.
First of all, you need to create a package file called Cocoa.pm.
The .pm extension is the
default extension for packages; it stands for Perl Module. A module
is a package, and a package is a class for all practical purposes.
Before you do anything else to the file, place a 1;
in the file. As you add more lines to the package file, make sure
you have 1; as the last line
of this file. The basic structure of the file is shown in Listing
5.1.
Listing 5.1. The package template.
1 package Cocoa;
2 #
3 # Put "require"
statements in for all required,imported packages
4 #
5
6 #
7 # Just add code here
8 #
9
10 1; # terminate the package with the required
1;
It's important that you remember to always keep the required 1;
line as the last of the package file. This statement is required
for all packages in Perl.
Now you're ready to add your methods to this package and make
this a class. The first method you would probably want to add
is the new() method, which
should be called to create a new object. The new()
method is the constructor for the object.
A constructor is a Perl subroutine in a class that returns
a reference to something that has the class name attached to it.
Connecting a class name with a reference is referred to as blessing
an object. The function to establish this connection is bless.
Here's the syntax for the bless
function:
bless YeReference [,classname]
YeReference is the reference
to the object being blessed. classname
is optional and specifies the name of the package from which this
object will have methods. If classname
is not specified, the name of the currently used package is used
instead. Thus, the way to create a constructor in Perl is to return
a reference to an internal structure that has been blessed into
this class. The initial Cocoa.pm
package is shown in Listing 5.2.
Listing 5.2. The first pass at the new()
function.
1 package Cocoa;
2
3 sub new {
4 my $this = {}; # Create an
anonymous hash, and #self points to it.
5 bless $this;
# Connect the hash to the package Cocoa.
6 return $this;
# Return the reference to the hash.
7 }
8
9 1;
The {} constructs a reference
to an empty hash. The returned value to this hash is assigned
to the local variable $this.
The bless() function takes
that reference to $this and
tells the object it references that it's now Cocoa
and then returns the reference.
The returned value to the calling function now refers to this
anonymous hash. On returning from the new()
function, the $this reference
is destroyed, but the calling function keeps a reference to this
hash. Therefore, the reference count to the hash will not be zero,
and Perl keeps the hash in memory. You do not have to keep the
hash in memory, but it's nice to have it around for reference
later.
To create an object, you make a call like this one:
$cup = new Cocoa;
The code is Listing 5.3 shows you how to use this package to create
the constructor.
Listing 5.3. Using the Cocoa
class.
1 #!/usr/bin/perl
2 push (@Inc,'pwd');
3 use Cocoa;
4 $cup = new Cocoa;
The first line refers to the location of the Perl interpreter
to use. Your Perl interpreter may be located at /usr/local/bin/perl
or wherever you installed it.
In line 2, the local directory is added to the search path in
@Inc for the list of paths
to use when the Perl interpreter is looking for a package. You
can create your module in a different directory and specify the
path explicitly there. Had I created the package in /home/khusain/test/scripts/,
line 2 would read as this:
push (@Inc,"/home/khusain/test/scripts");
In line 3 you include the package Cocoa.pm
to get all the functionality in your script. The use
statement asks Perl to look in the @Inc
path for a file called Cocoa.pm
and include it in the copy of the source file being parsed. The
use statement is required
if you want to work with a class.
Line 4 creates the Cocoa
object by calling the new
function on it. Now comes the beautiful (yet confusing and powerful)
part of Perl. There is more than one way to do this. You can rewrite
line 3 as this:
$cup = Cocoa->new();
Or if you are a C-programming hack, you can use the double colons
(::) to force the function
new() from the Cocoa
package. Thus, line 4 could also be written as this:
$cup = Cocoa::new();
There is nothing preventing you from adding more code in the constructor
than what is shown here. For this Cocoa.pm
module, if you would like to print a disclaimer when each object
is created, you can. For example, you can add statements like
these in the constructor for debugging purposes:
print "Hey! I am alive" if
($debuglevel > 1);
This way, you can set a variable $debuglevel
to a numeric value of 2
or greater in your program to display the debug message shown
every time a new Cocoa object
is created. Usually, you would like to initialize variables in
a constructor before any processing is done with the object. For
example, if the object you are constructing will be writing a
log to disk, you would like to open the file it's writing to in
the constructor. (Of course, you would also have to remember to
close the file when the object is destroyed by placing the close()
function call in the destructor.)
Here's what the constructor looks like for the Cocoa.pm
module:
Listing 5.4. Expanding the constructor.
1 sub new {
2 my $this = {};
3 print "\n /* \n ** Created by Cocoa.pm
\n ** Use at own risk";
4 print "\n ** Did this code even
get past the javac compiler? ";
5 print "\n **/ \n";
6 bless $this;
7 return $this;
8 }
The output from running the test script (called testme)
on this bare bones class would look like this:
/*
** Created by Cocoa.pm
** Use at own risk
** Did this code even get past the javac compiler?
**/
Now, regardless of which of these three methods you used to create
the Cocoa object, you should
see the same output.
Some comments have been added at the start of the file with some
print statements. You can
just as easily call other functions in or outside of the package
to get more initialization functionality. You should allow any
given class to be inherited, however. You should be able to call
the new operator with the
class name as the first parameter. This ability to parse the class
name from the first argument causes the class to be inherited.
Thus, the new function becomes more or less like the function
shown in Listing 5.5.
Listing 5.5. The improved new()
function with class name recognition.
1 sub new {
2 my $class = shift; #
Get the request class name
3 my $this = {};
4 bless $this, $class
# Use class name to bless() reference
5 $this->doInitialization();
6 return $this;
7 }
However, this method will force your class users to make calls
in one of three ways:
- Cocoa::new()
- Cocoa->new()
- new Cocoa;
What if you wanted to use a reference to the object instead, such
as $obj->new()? The doInitialization()
method used will be of whatever $class
the object is blessed into. Listing 5.6 uses the function call
ref() to determine if the
class exists per se. The ref()
function returns true if
the item passed to it is a reference and null
if not a reference. In the case of classes, the true
value returned from the ref()
function is the name of the class.
Listing 5.6. The new()
function with the capability to inherit classes.
1 sub new {
2 my $this = shift; #
Get the class name
3
my $class = ref($this) || $this;
 #
If class exists, use it else use reference.
4
my $this = {};
5
6
bless $this, $class
7 $this->doInitialization();
8
9 return $this;
10 }
Within the class package, the methods typically deal with the
reference as an ordinary reference. Outside the class package,
the reference is generally treated as an opaque value that may
only be accessed through the class's methods. You can access the
values within a package directly, but it's not a good idea to
do so because such access defeats the whole purpose of object
orientation.
It's possible to bless a reference object more than once. However,
the caveat to such a task is that the new class must get rid of
the object at the previously blessed reference. For C and Pascal
programmers, this is like assigning a pointer to malloc-ed
memory and then assigning the same pointer to another location
without freeing the previous location. In effect, a Perl object
must belong to one and only one class at a time.
So what's the real difference between an object and a reference?
Perl objects are blessed to belong to a class. References are
not blessed; if they are, they belong to a class and are objects.
Objects know to which class they belong. References do not have
a class, as such, to which they belong.
The arguments to a new()
function for a constructor are called instance variables.
Instance variables are used for initializing each instance of
an object as it is created. For example, the new()
function could expect a name for each new instance of an object
created. Using instance variables allows the customization of
each object as it is created.
Either an anonymous array or an anonymous hash can be used to
hold instance variables. To use a hash to store the parameters
coming in, you would use code similar to what is shown in Listing
5.7.
Listing 5.7. Using instance variables.
1 sub new {
2 my $type
= shift;
3 my %parm
= @_;
4 my $this
= {};
5 $this->{'Name'}
= $parm{'Name'};
6 $this->{'x'} =
$parm{'x'};
7 $this->{'y'} =
$parm{'y'};
8 bless $this,
$type;
9 }
You can also use an array instead of a hash to store the instance
variables. See Listing 5.8 for an example.
Listing 5.8. Using hashes for instance variables.
1 sub new {
2 my $type
= shift;
3 my %parm
= @_;
4 my $this
= [];
5 $this->[0]
= $parm{'Name'};
6 $this->[1]
= $parm{'x'};
7 $this->[2]
= $parm{'y'};
8 bless $this,
$type;
9 }
To construct an object, you can pass the parameters with the new()
function call. For example, the call to create the Cocoa
object becomes this:
$mug = Cocoa::new( 'Name' => 'top',
'x' => 10,
'y' => 20 );
The => operator is just
like the comma operator, although it's a bit more readable. You
can write this code with commas instead of the =>
operator if that's what you prefer.
To access the variables as you would any other data members, you
can use the following statements:
print "Name=$mug->{'Name'}\n";
print "x=$mug->{'x'}\n";
print "y=$mug->{'y'}\n";
A method in a Perl class is simply a Perl subroutine. Perl doesn't
provide any special syntax for method definition. A method expects
its first argument to be the object or package on which it is
being invoked. Perl has just two types of methods: static and
virtual.
A static method expects a class name as the first argument.
A virtual method expects a reference
to an object as the first argument. Therefore, the way each method
handles the first argument determines whether the method is static
or virtual.
A static method applies functionality to the class as a whole
because it uses the name of the class. Therefore, functionality
in static methods is applicable to all objects of the class. Generally,
static methods ignore the first argument because they already
know which class they are in. Therefore, constructors are static
methods.
A virtual method expects a reference to an object as its first
argument. Typically the first thing a virtual method does is to
shift the first argument to a self
or this variable; it then
uses that shifted value as an ordinary reference. For example,
consider the code in Listing 5.9.
Listing 5.9. Listing data items in a class.
1 sub nameLister {
2 my $this = shift;
3 my ($keys ,$value );
4 while (($key, $value) = each (%$this))
{
5 print "\t$key
is $value.\n";
6 }
7 }
Line 2 in this listing is where the $this
variable is set to point to the object. In line 4, the $this
array is dereferenced at every $key
location.
If you tried to invoke the Cocoa.pm
package right now, you would get an error message from Perl at
compile time about the methods not being found. This is because
the Cocoa.pm methods have
not been exported. To export these functions, you need the Exporter
module. This is done by adding the following lines to the start
of code in the package:
require Exporter;
@ISA = qw(Exporter);
These two lines force the inclusion of the Exporter.pm
module and then set the @ISA
array with the name of the Exporter
class for which to look.
To export your own class methods, you would have to list them
in the @EXPORT array. For
example, to export the closeMain
and declareMain methods,
you would use the following statement:
@EXPORT(declareMain, closeMain);
Inheritance in a Perl class is accomplished via the @ISA
array. The @ISA array does
not have to be defined in every package; however, when it is defined,
Perl treats it as a special array of directory names. This is
akin to the @Inc array where
directories are searched for files to include. In the case of
the @ISA array, the paths
define the classes (packages) and where to look for other class
packages, if a method is not found in the current package. Thus,
the @ISA array contains the
names of the base classes from which the current class inherits.
The search is done in the order in which the classes are listed
in the @ISA arrays.
All methods called by a class do have to belong to the same class
or to the base classes defined in the @ISA
array. If a method isn't found in @ISA
array, Perl looks for an AUTOLOAD()
routine. This routine is defined as sub
in the current package and is optional. To use the AUTOLOAD
function, you have to use the autoload.pm
package with the use Autoload;
statement. The AUTOLOAD function
tries to load the called function from the installed Perl libraries.
If the AUTOLOAD call also
fails, Perl makes one final try at the UNIVERSAL
class, which is the catch-all for all methods not defined elsewhere.
Perl will generate an error about unresolved functions if this
step also fails.
Here are some simple rules when exporting methods. First, export
only those functions that you have to. Do not export every function
in your module because you will be increasing the likelihood of
conflicts with a program that is using your module. Use the @EXPORT_OK
array instead of the @EXPORT
array if you feel that the names of your methods may clash with
those in an application. Choosing long, descriptive names for
functions may help eliminate problems with synonymous variable
names.
Second, if you are going to have multiple versions of your module,
consider setting a variable called $VERSION
in your module to a numeric string; for example, "2.11"
or something. This version number will be exported for you automatically
and can be used with the require
statement. Remember to use two digits for all integers in the
version numbers because "1.10"
is interpreted lower than "1.9"
but higher than "1.09".
You will see some modules or programs with a statement of the
following form:
require 5.001;
The statement above indicates that Perl version 5.001 or greater
is required. The same analogy can be used for your module with
a call to a function called require_version
of the following form:
$moduleName->require_version($value);
A returned value of true
will indicate that it's okay to proceed. A returned value of false
will indicate that the version number of the module is less than
what is specified in the $value.
There are two ways to invoke a method for an object: one via a
reference to an object (virtual) and the other via explicitly
referring to the class name (static). A method has to be exported
for you to be able to call it. Let's add a few more methods to
the Cocoa class to get the
file to look like the one shown in Listing 5.10.
Listing 5.10. Adding methods to the Cocoa
class.
1 package Cocoa;
2 require Exporter;
3
4 @ISA = qw(Exporter);
5 @EXPORT = qw(setImports,
declareMain, closeMain);
6
7 #
8 # This routine creates the references for imports in Java
functions
9 #
10 sub setImports{
11 my $class = shift @_;
12 my @names = @_;
13
14 foreach (@names) {
15 print "import "
. $_ . ";\n";
16 }
17 }
18
19 #
20 # This routine declares the main function in a Java script
21 #
22 sub declareMain{
23 my $class = shift @_;
24 my ( $name, $extends, $implements)
= @_;
25
26 print "\n public class
$name";
27 if ($extends) {
28
print " extends " . $extends;
29 }
30 if ($implements) {
31
print " implements " . $implements;
32 }
33 print " { \n";
34 }
35
36 #
37 # This routine declares the main function in a Java script
38 #
39 sub closeMain{
40 print "} \n";
41 }
42
43 #
44 # This subroutine creates the header for the file.
45 #
46 sub new {
47 my $this = {};
48 print "\n /* \n ** Created by
Cocoa.pm \n ** Use at own risk \n */ \n";
49 bless $this;
50 return $this;
51 }
52
53 1;
Now let's write a simple Perl script to use the methods for this
class. Because you can only start and close the header, let's
see how the code for a script to create a skeleton Java applet
source looks. (See Listing 5.11.)
Listing 5.11. Using the methods just added in Listing 5.10.
1 #!/usr/bin/perl
2
3 use Cocoa;
4
5 $cup = new Cocoa;
6
7 $cup->setImports( 'java.io.InputStream', 'java.net.*');
8 $cup->declareMain( "Msg" , "java.applet.Applet",
"Runnable");
9 $cup->closeMain();
What we are doing in this script is generating code for a Java
applet called Msg, which
extends the java.applet.Applet
applet and implements functions that can be run. The function
is called with a function $cup->...
call. Lines 7 through 9 could be rewritten as functions, like
this:
Cocoa::setImports($cup, 'java.io.InputStream',
'java.net.*');
Cocoa::declareMain($cup, "Msg" , "java.applet.Applet",
"Runnable");
Cocoa::closeMain($cup);
This type of equivalence was shown in a previous section, "Blessing
a Constructor." In both cases, the first parameter is the
reference to the object itself. Running this test script generates
the following output:
/*
** Created by Cocoa.pm
** Use at own risk
*/
import java.io.InputStream;
import java.net.*;
public class Msg extends java.applet.Applet implements Runnable
{
}
There are a couple of points to note when calling the methods.
If you have any arguments to a method, use parentheses if you
are using the method ->
(also known as indirect). The parentheses are required to include
all the arguments with this statement:
$cup->setImports( 'java.io.InputStream',
'java.net.*');
However, this statement:
Cocoa::setImports($cup, 'java.io.InputStream',
'java.net.*');
can also be rewritten without parentheses.
Cocoa::setImports $cup, 'java.io.InputStream',
'java.net.*' ;
The choice is really yours as to how you intend to make your code
readable for other programmers. Use parentheses if you feel that
the code will be more readable.
There are times when you'll want to specify which class' method
to use, such as when the same-named method is specified in two
different classes. For example, if the function grind
is defined in both Espresso
and Qava classes, you can
specify which class' function to use with the use of the ::
operator. These two calls:
$mess = Espresso::grind("whole","lotta","bags");
Espresso::grind($mess, "whole","lotta","bags");
use the call in Espresso,
whereas the following calls use the grind()
function in the Qava class:
$mess = Qava::grind("whole","lotta","bags");
Qava::grind($mess, "whole","lotta","bags");
Sometimes you want to call a method based on some action that
the program you are writing has already taken. In other words,
you want to use the Qava
method for a certain condition and the Espresso
method for another. In this case, you can use symbolic references
to make the call to the required function. This is illustrated
in the following example:
$method = $local ? "Qava::"
: "Espresso::";
$cup->{$method}grind(@args);
Perl tracks the number of links to objects. When the last reference
to an object goes away, the object is automatically destroyed.
This destruction of an object may occur after your code stops
and the script is about to exit. For global variables, the destruction
happens after the last line in your code executes.
If you want to capture control just before the object is freed,
you can define a DESTROY()
method in your class. Note the use of all capitals in the name.
The DESTROY() method is called
just before the object is released, allowing you to do any cleanup.
The DESTROY() function does
not call other DESTROY()
functions. Perl doesn't do nested destruction for you. If your
constructor reblessed a reference from one of your base classes,
your DESTROY() may need to
call DESTROY() for any base
classes. All object references that are contained in a given object
are freed and destroyed automatically when the current object
is freed.
Normally, you don't have to define a DESTROY
function. However, when you do need it, its form is as follows:
sub DESTROY {
#
# Add code here.
#
}
For most purposes, Perl uses a simple reference-based garbage
collection system. The number of references to any given object
at the time of garbage collection has to be greater than zero,
or else the memory for that object is freed. When your program
exits, an exhaustive search-and-destroy function in Perl does
the garbage collection. Everything in the process is summarily
deleted. In UNIX or UNIX-like systems, this may seem like a waste,
but it is actually quite necessary in embedded systems or in a
multithreaded environment.
Methods in classes are inherited with the use of the paths in
the @ISA array. Variables
have to be inherited and set up explicitly for inheritance. Let's
say you define a new class called Bean.pm
to include some of the functionality that another class, Coffee.pm,
will inherit.
The example in this section demonstrates how to inherit instance
variables from one class (also referred to as a superclass
or base class). The steps in inheritance require calling
the superclass's constructor and adding one's own instance variables
to the new object.
In this example, the Coffee
class is the class that inherits values from the base class Bean.
The two files are called Coffee.pm
and Bean.pm, respectively.
The code for Bean.pm is shown
in Listing 5.12.
Listing 5.12. The Bean.pm
file.
1 package Bean;
2 require Exporter;
3
4 @ISA = qw(Exporter);
5 @EXPORT = qw(setBeanType);
6
7 sub new {
8 my $type = shift;
9
my $this = {};
10 $this->{'Bean'} = 'Colombian';
11 bless $this, $type;
12 return $this;
13 }
14
15 #
16 # This subroutine sets the
17 sub setBeanType{
18 my ($class, $name) = @_;
19 $class->{'Bean'} = $name;
20 print "Set bean to $name \n";
21 }
22 1;
In this listing, the $this
variable sets a value in the anonymous hash for the 'Bean'
class to be 'Colombian'.
The setBeanType function
method is also declared so that the item referred to by the word
'Bean' is set for any class
that is sent in as an argument. Therefore, you can use this setBeanType
function in other classes to set the value of any member whose
name is 'Bean'.
The subroutine for resetting the value of 'Bean'
uses the $class reference
to get to the anonymous hash for the object. Remember that it
is a reference to this anonymous hash that created the reference
in the first place with the new()
function.
The values in the Bean class
are inherited by the Coffee
class. The Coffee.pm file
is shown in Listing 5.13.
Listing 5.13. Using inheritance.
1 #
2 # The Coffee.pm file to illustrate
inheritance.
3 #
4 package Coffee;
5 require
Exporter;
6 require Bean;
7 @ISA
= qw(Exporter, Bean);
8 @EXPORT = qw(setImports, declareMain,
closeMain);
9
#
10 # set item
11 #
12 sub setCoffeeType{
13 my ($class,$name)
= @_;
14 $class->{'Coffee'}
= $name;
15 print
"Set coffee type to $name \n";
16 }
17 #
18 # constructor
19 #
20 sub new {
21 my $type =
shift;
22 my $this =
Bean->new(); ##### <-- LOOK HERE!!!
####
23 $this->{'Coffee'}
= 'Instant'; # unless told otherwise
24 bless
$this, $type;
25 return
$this;
26 }
27 1;
Note the use of the require Bean;
statement at line 6. (See Chapter 4, "Introduction
to Perl Modules," in the section titled "Using Perl
Modules" for the reasons why the require
statement is used instead of the use
statement.) This line forces the inclusion of the Bean.pm
file and all its related functions without importing the functions
until compile time. Lines 12 through 16 define a subroutine to
use when resetting the value of the local variable in $class->{'Coffee'}.
Look at the new() constructor
for the Coffee class. The
$this reference points to
the anonymous hash returned by Bean.pm,
not a hash created locally. In other words, the following statement
creates an entirely different hash that has nothing to do with
the hash created in the Bean.pm
constructor.
my $this = {}; # This is not the way
to do it for inheritance.
my $this = $theSuperClass->new(); # this is the way.
Listing 5.14 illustrates how to call these functions.
Listing 5.14. Using inheritance.
1 #!/usr/bin/perl
2 push (@Inc,'pwd');
3 use Coffee;
4 $cup = new Coffee;
5 print "\n --------------------
Initial values ------------ \n";
6 print "Coffee: $cup->{'Coffee'} \n";
7 print "Bean:
$cup->{'Bean'} \n";
8 print "\n -------------------- Change Bean Type ----------
\n";
9 $cup->setBeanType('Mixed');
10 print "Bean Type is now $cup->{'Bean'} \n";
11 print "\n ------------------ Change Coffee Type ----------
\n";
12 $cup->setCoffeeType('Instant');
13 print "Type of coffee: $cup->{'Coffee'} \n";
The initial values for the 'Bean'
and 'Coffee' indexes in the
anonymous hash for the object are printed first. The member functions
are called to set the values to different names and are printed
out.
Here is the output of the script:
-------------------- Initial values ------------
Coffee: Instant
Bean: Colombian
-------------------- Change Bean Type ----------
Set bean to Mixed
Bean Type is now Mixed
------------------ Change Coffee Type ----------
Set coffee type to Instant
Type of coffee: Instant
Methods can have several types of arguments. It's how you process
the arguments that counts. For example, you can add the method
shown in Listing 5.15 to the Coffee.pm
module.
Listing 5.15. Variable-length lists of parameters.
1 sub makeCup {
2 my ($class, $cream, $sugar,
$dope) = @_;
3 print "\n==================================
\n";
4 print "Making a cup
\n";
5 print "Add cream \n"
if ($cream);
6 print "Add $sugar sugar
cubes\n" if ($sugar);
7 print "Making some really
addictive coffee ;-) \n" if ($dope);
8 print "==================================
\n";
9 }
This function takes three arguments, but it processes them only
if it sees them. To test this functionality, consider the Perl
code shown in Listing 5.16.
Listing 5.16. Testing variable length lists.
1 #!/usr/bin/perl
2 push (@Inc,'pwd');
3 use
Coffee;
4 $cup = new Coffee;
5 #
6 # With no parameters
7 #
8 print "\n Calling with
no parameters: \n";
9 $cup->makeCup;
10 #
11 # With one parameter
12 #
13 print "\n Calling with
one parameter: \n";
14 $cup->makeCup('1');
15 #
16 # With two parameters
17 #
18 print "\n Calling with
two parameters: \n";
19 $cup->makeCup(1,'2');
20 #
21 # With all three parameters
22 #
23 print "\n Calling with
three parameters: \n";
24 $cup->makeCup('1',3,'1');
Line 9 calls the function with no parameters. In Line 14, the
call is with one parameter. The parameters are passed either as
strings or integers-something this particular method does not
care about (see lines 19 and 24). However, some methods you write
in the future may require this distinction.
Here's the output from this program:
Calling with no parameters:
==================================
Making a cup
==================================
Calling with
one parameter:
==================================
Making a cup
Add cream
==================================
Calling with two parameters:
==================================
Making a cup
Add cream
Add 2 sugar cubes
==================================
Calling with three parameters:
==================================
Making a cup
Add cream
Add 3 sugar cubes
Making some really addictive coffee ;-)
==================================
In any event, you can have default values in the function to set
if the expected parameter is not passed in. Thus, the behavior
of the method can be different depending on the number of arguments
you pass into it.
Inheriting functionality from another class is beneficial in that
you can get all the exported functionality of the base class in
your new class. To see an example of how this works, let's add
a function called printType
in the Bean.pm class. Here's
the subroutine:
sub printType {
my $class = shift @_;
print "The type of Bean is $class->{'Bean'}
\n";
}
Do not forget to update the @EXPORT
array by adding the name of the function to export. The new statement
should look like this:
@EXPORT = qw(setBeanType, printType,
printType);
Next, call the printType
function. The following three lines show three ways to call this
function:
$cup->Coffee::printType();
$cup->printType();
$cup->Bean::printType();
The output from all three lines is the same:
The type of Bean is Mixed
The type of Bean is Mixed
The type of Bean is Mixed
Why is this so? Because there is no printType()
function in the inheriting class, the printType()
function in the base class is used instead. Naturally, if you
want your own class to have its own printType
function, you would define its own printType
function.
In the Coffee.pm file, you
would add the following lines to the end of the file:
#
# This routine prints the type of $class->{'Coffee'}
#
sub printType {
my $class = shift @_;
print "The type of Coffee is $class->{'Coffee'}
\n";
}
@EXPORT would also have to
be modified to work with this function:
@EXPORT = qw(setImports, declareMain,
closeMain, printType);
The output from the three lines now looks like this:
The type of Coffee is Instant
The type of Coffee is Instant
The type of Bean is Mixed
Now the base class function is called only when the Bean::
override is given. In the other cases, only the inherited class
function is called.
What if you do not know what the base class name is or even where
the name is defined. In this case, you can use the SUPER::
pseudoclass reserved word. Using the SUPER::
override allows you to call an overridden superclass method without
actually knowing where that method is defined. The SUPER::
construct is meaningful only within the class.
If you're trying to control where the method search begins and
you're executing in the class itself, you can use the SUPER::
pseudoclass, which says to start looking in your base class's
@ISA list without having
to explicitly name it.
$this->SUPER::function( ... argument
list ... );
Therefore, instead of Bean::,
you can use SUPER::. The
call to the function printType()
becomes this:
$cup->SUPER::printType();
Here's the output:
The type of Bean is Mixed
One advertised strength of object-oriented languages is the ease
with which new code can use old code. Packages and modules in
Perl provide a great deal of data encapsulation. You are never
really guaranteeing that a class inheriting your code will not
attempt to access your class variables directly. They can if they
really want to. However, this type of procedure is considered
bad practice, and shame on you if you do it.
When writing a package, you should ensure that everything a method
needs is available via the object or is passed as a parameter
to the method. From within the package, access any global variables
only through references passed in via methods.
For static or global data to be used by the methods, you have
to define the context of the data in the base class using the
local() construct. The subclass
will then call the base class to get the data for it. On occasion,
a subclass may want to override that data and replace it with
new data. When this happens, the superclass may not know how to
find the new copy of the data. In such cases, it's best to define
a reference to the data and then have all base classes and subclasses
modify the variable via that reference.
Finally, you'll see references to objects and classes like this:
use Coffee::Bean;
This code is interpreted to mean "look for Bean.pm
in the Coffee subdirectory
in all the directories in the @Inc
array." So, if you were to move Bean.pm
into the ./Coffee directory,
all the previous examples would work with the new use
statement. The advantage to this approach is that you have one
file for the parent class in one directory and the files for each
base class in their own sub-directories. It helps keep code organized.
Therefore, to have a statement like this:
use Another::Sub::Menu;
you would see a directory subtree like this:
./Another/Sub/Menu.pm
Let's look at an example of a simple portfolio manager class called
Invest.pm. There are two
subclasses derived from it that manage the type of funds. The
three files are shown in Listings 5.17, 5.18, and 5.19. The test
code to use these modules is shown in Listing 5.20. The Invest.pm
file is placed in the current directory, and the Stock.pm
and Fund.pm files are placed
in the Invest subdirectory.
Listing 5.17. The ./Invest.pm
file.
1 package Invest;
2
3 require Exporter;
4 @ISA = (Exporter);
5
6 =head1 NAME
7
8 Letter - Sample module to simulate Bond behaviour
9
10 =head1 SYNOPSIS
11
12 use Invest;
13 use Invest::Fund;
14 use Invest::Stock;
15
16 $port = new Invest::new();
17
18 $i1 = Invest::Fund('symbol'
=> 'twcux');
19 $i2 = Invest::Stock('symbol'
=> 'INTC');
20 $i3 = Invest::Stock('symbol'
=> 'MSFT');
21
22 $port->Invest::AddItem($i1);
23 $port->Invest::AddItem($i2);
24 $port->Invest::AddItem($i3);
25
26 $port->ShowPortfolio();
27
28 =head1 DESCRIPTION
29
30 This module provides a short example of generating a letter
for a
31 friendly neighborbood loan shark.
32
33 The code begins after the "cut" statement.
34 =cut
35
36 @EXPORT = qw( new, AddItem, ShowPortfolio, PrintMe);
37
38 @portfolio = ();
39 $portIndex = 0;
40
41 sub Invest::new {
42 my $this
= shift;
43 my $class
= ref($this) || $this;
44 my $self
= {};
45 bless
$self, $class;
46 $portIndex = 0;
47
48 printf "\n Start portfolio";
49 return
$self;
50 }
51
52 sub Invest::AddItem {
53 my ($type,$stock) = @_;
54 $portfolio[$portIndex] =
$stock;
55 $portIndex++;
56 }
57
58 sub Invest::ShowPortfolio {
59 my $i;
60 printf "\n Our Portfolio
is:";
61 foreach $i (@portfolio)
{
62
print "\n ". $i->{'shares'} . " shares
of " . $i->{'symbol'};
63 }
64 print "\n";
65 }
66
67 sub PrintMe {
68 my $this = shift;
69 print "\n Class : $$this";
70 }
71 1;
Listing 5.18. The ./Invest/Stock.pm
file.
1 package Invest::Stock;
2
3 require Exporter;
4 @ISA = (Exporter);
5 @EXPORT = qw( new
);
6
7 sub new {
8 my
$this = shift;
9 my
$class = ref($this) || $this;
10 my $self
= {};
11 my %parm = @_;
12
13 bless
$self, $class;
14 $self->{'symbol'} = $parm{'symbol'};
15 $self->{'shares'} = $parm{'shares'};
16 printf "\n New stock
$parm{'symbol'} added";
17 return
$self;
18 }
19
20 1;
Listing 5.19. The ./Invest/Fund.pm
file.
1 package Invest::Fund;
2
3 require Exporter;
4 @ISA = (Exporter,Invest);
5
6 @EXPORT = qw( new );
7
8 sub new {
9 my
$this = shift;
10 my $class
= ref($this) || $this;
11 # my $self
= {};
12 my $self
= Invest::new();
13 my %parm = @_;
14
15 bless
$self, $class;
16 $self->{'symbol'} = $parm{'symbol'};
17 $self->{'shares'} = $parm{'shares'};
18 printf "\n new mutual
fund $parm{'symbol'} added";
19 return
$self;
20 }
21
22 1;
Listing 5.20. Using the Invest,
Fund, and Stock
files.
1 #!/usr/bin/perl
2
3 push(@Inc,'pwd');
4
5 use Invest;
6 use Invest::Fund;
7 use Invest::Stock;
8
9 $port = new Invest;
10
11 $i1 = new Invest::Fund('symbol' => 'TWCUX', 'shares' =>
'100');
12 $i2 = new Invest::Fund('symbol' => 'FIXLL', 'shares' =>
'200');
13
14 $i3 = new Invest::Stock('symbol' => 'INTC', 'shares' =>
'400');
15 $i4 = new Invest::Stock('symbol' => 'MSFT', 'shares' =>
'200');
16
17 print "\n";
18 $port->Invest::AddItem($i1);
19 $port->Invest::AddItem($i2);
20 $port->Invest::AddItem($i3);
21 $port->Invest::AddItem($i4);
22 print "\n";
23
24 $port->ShowPortfolio();
This chapter has provided a brief introduction to object-oriented
programming in Perl. Perl provides the OOP features of data encapsulation
and inheritance using modules and packages. A class in Perl is
simply a package. This class package for a class provides all
the methods for objects created for the class.
An object is simply a reference to data that knows to which class
it belongs. A method in a class is simply a subroutine. The only
catch with writing such methods is that the name of the class
is always the first argument in the method.
The bless() function is used
to tie a reference to a class name. The bless()
function is called in the constructor function new()
to create an object and then connect the reference to the object
with the name of the class.
In inheritance, the base class is the class from which methods
(and data) are inherited. The base class is also called the superclass.
The class that inherits these items from the superclass is called
the subclass. Multiple inheritance is allowed in Perl. Data inheritance
is the programmers' responsibility with the use of references.
The subclass is allowed to know things about its immediate superclass,
and the superclass is allowed to know nothing about a subclass.
Subclasses exist as .pm files
in subdirectories under the superclass directory name.
Chapter 6
Binding Variables to Objects
CONTENTS
This chapter is dedicated to describing how the tie()
function works in Perl. The tie()
function enables you to create variables that are tied to specific
methods called when a variable is written to or read from. Using
the tie() function, you can
eliminate the need for calling methods designed specifically for
certain types of variables.
The tie() function is used
to bind a variable to an object class. Here's the syntax:
tie ($variable,$classname,@list);
The tie() function binds
a variable to a class that then provides the methods for the variable.
The $variable has
to be set to the name of the variable to be tied. The $classname
is the name of the class implementing objects to which you want
the variable tied. The @list
is the list of arguments that may be required for this variable's
class methods.
The object returned by the tie()
function is just like the new()
function for an object. You can use the returned value from the
tie() function to access
other methods in the class you just tied the object to.
Once you tie a variable to a class, the class's behavior is reflected
in the way you access the variable. Each of the methods for the
type of object is called when the type of action for the variable
is seen by Perl. Three types of objects can be tied to classes:
scalars, arrays, and associative arrays.
It's often necessary to disassociate an object from a class. This
is done with the use of the untie()
function. Simply call the untie($object)
function and you're done. The untie()
function works whether you are tied to a scalar, array, or hash
object. The next three sections illustrate how to use tie()
on each of these objects.
A class implementing a scalar object that can be tied to must
implement these four methods:
- TIESCALAR classname, LIST
- FETch this
- STORE this, value
- DESTROY this
Think of these methods as events for the class. When a variable
is first tied to a class, the TIESCALAR
method is called. Every time the tied variable is read from, the
value from the FETch method
is returned. When the tied variable is assigned to, the STORE
method is called. Finally, when the tied variable loses scope,
the DESTROY method is called.
Listing 6.1 contains a script that uses the tie()
function for the Knot class.
Listing 6.1. Using the tie()
function.
1 #!/usr/bin/perl
2
3 push(@Inc,".");
4 use Knot;
5
6 #
7 # From now on the
variable $currentTime
8 # will behave as defined in functions in Knot.pm
9 #
10 tie $currentTime, 'Knot';
11
12 #
13 # Test the FETch method
14 #
15 $x = $currentTime;
16 print " x= $x\n";
17 print " current = $currentTime\n";
18
19 #
20 # Test the STORE method
21 #
22 # In Knot.pm we have defined the $currentTime
23 # variable to behave as a readonly object.
24 # The following message will bail out with an error message.
25 #
26 $currentTime = $x;
27
28
29 #
30 # As soon as we drop out of the script here, the DESTROY
31 # method will be called on $currentTime.
32 #
Let's examine the code in Listing 6.1 to see where the methods
for Knot.pm are invoked.
In line 3, the address of the current directory is added to search
for included modules. In line 4, the Knot.pm
module is requested with the use
statement. The Knot.pm file
contains the module code for the class that allows variables to
be tied.
In line 10, the variable $currentTime
is tied to the class Knot.
The TIESCALAR function in
Knot.pm is called at this
point. There are no additional arguments to be passed to the TIESCALAR
function, so only two parameters, the variable number and the
class name, are sent.
In line 15, the $currentTime
variable is read from and the value of $currentTime
is assigned to $x. Instead
of treating $currentTime
as a normal variable, Perl uses the FETch
method of the tied class. The FETch
method returns the current date in this example. You can write
your own function. In line 17, the $currentTime
variable is accessed again. This time, the FETch
method is called again.
The program must not attempt to assign a value to the $currentTime
variable. See line 26 in Listing 6.1. The Knot.pm
module is implemented to allow only read-only variables; therefore,
the FETch function will print
an error message when the code at line 26 is executed.
Finally, the DESTROY method
is called when the $currentTime
variable is destroyed. The destruction is done automatically by
Perl when the $currentTime
variable goes out of scope. In this example, the DESTROY
method simply prints an error message.
Here is the output from Listing 6.1.
x= Sat Jun 1 12:54:25
CDT 1996
current = Sat Jun 1
12:54:25 CDT 1996
Hey dude! We are making this readonly!
at Knot.pm line 54
Knot::STORE
called at ./6_1.pl line 26
Knot:: unknotted!
Now let's look at the Knot.pm
file in Listing 6.2.
Listing 6.2. The Knot.pm
file.
1 #!/usr/bin/perl
2
3 # ----------------------------------------------------------------
4 # Sample file that shows how to tie variables to classes.
5 #
6 # This library is hereby placed in the public domain.
Copy freely
7 # as long as you
give me credit for it!
8 # Kamran Husain. khusain@ikra.com
9 # ----------------------------------------------------------------
10 package Knot;
11 use Carp;
12 use strict;
13
14 #
15 # TIESCALAR classname, argument-list-here
16 # This is the constructor for the class. It
returns a reference
17 # to a new object for the class name.
18 #
19 sub TIESCALAR {
20 my $class
= shift;
21 my $this
= {};
22 #
23 # print "\n $class, $this";
24 #
25 return
bless \$this, $class;
26 }
27
28 #
29 # FETch this
30 # The FETch method will be triggered every time the tied variable
31 # is accessed.
32 # The only argument to this function is the object itself.
33 # In this case, we just return the date.
34 #
35 sub FETch {
36
my $self = shift;
37
confess "wrong type" unless ref $self;
38
croak "usage error" if @_;
39
my $override;
40
$override = 'date';
41
return $override;
42
}
43
44 #
45 # STORE this, value
46 # This method will be triggered every time the tied variable
is
47 # written to. It expects only two arguments: a reference to
itself
48 # and a value that is being assigned.
49 #
50 sub STORE {
51 my
$self = shift;
52 confess
"wrong type" unless ref $self;
53 my
$value = shift;
54 confess
"Hey dude! We are making this readonly!\n";
55 return
$value;
56 }
57
58 #
59 # DESTROY this
60 # This method will be triggered when the tied variable needs
to be
61 # destructed. This method can be just empty for most classes
since
62 # Perl's garbage collection will.
63 #
64
65 sub DESTROY {
66 my
$self = shift;
67 confess
"wrong type" unless ref $self;
68 print
"\nKnot:: unknotted!\n";
69 }
70
71 #
72 # The obligatory ending true statement.
73 #
74 1;
Knot.pm defines the package
in line 10 and imports the Carp
and strict packages in lines
11 and 12, respectively. Line 74 terminates the module.
Lines 19 through 26 define the TIESCALAR
function. The TIESCALAR function
behaves a lot like the constructor of an object in Perl. It creates
an associative array and returns a reference to this array after
a call to the bless() function.
(See Chapter 5, "Object-Oriented Programming
in Perl," for more information on bless-ing
objects.)
The FETch method starts at
line 35. The FETch method
is called every time the tied variable is read from. The only
argument to the FETch method
is a reference to the object itself. At line 37, the class type
is confirmed, although it's not absolutely necessary to do this.
Lines 39 through 41 return the current date and time for the value
of the tied variable.
The STORE method is defined
from line 50 through 56. In this case, we do not let values of
the arguments that are passed in be assigned to anything because
we want to make this value a read-only variable. You can easily
modify this function to take some other action than what's shown
in this example. The FETch
method takes only two arguments: a reference to itself and a value
that is being assigned. The confess()
call is from within the Carp
module.
The DESTROY method is called
when the tied variable is destroyed. Normally, this function is
empty. For this example, there is a print
statement to show where the DESTROY
function is called.
An array variable can be tied to a class in the same manner as
a scalar can be tied to a class. The real difference is that the
input parameters to the methods now need an index used to address
a value in array. A class implementing an ordinary array must
have these methods:
- TIEARRAY classname, LIST
- FETch this, key
- STORE this, key, value
- DESTROY this
The FETch, DESTROY,
and STORE methods have the
same names as those for scalars. However, the name of the constructor
is different-it's called TIEARRAY.
Let's define a new array type called Cuboid,
which has its first five indexes provide special functions. The
first three indexes are written to as the height, width, and depth
of a cuboid. The next two indexes contain the volume and surface
area of the cuboid and are made read-only. The rest of the array
can be made into a bounded array to allow a user to store his
or her own values. As soon as a value is stored in the Cuboid
array, the values of items at index 3
and 4 are recalculated to
provide the latest volume and surface area of a cuboid.
Listing 6.3 illustrates how to use this array.
Listing 6.3. Using the Cuboid.pm
module.
1 #!/usr/bin/perl
2
3 push(@Inc,".");
4 use Cuboid;
5
6 tie @myCube, 'Cuboid', 3;
7
8 $myCube[0] = 2;
9 $myCube[1] = 3;
10 $myCube[2] = 4;
11
12 for ($i=0; $i < 5; $i++) {
13 print " myCube[$i] = $myCube[$i]
\n";
14 }
15
Here is the output of this code.
array will be 8 elements long
[STORE 2 at 0]
[STORE 3 at 1]
[STORE 4 at 2]
myCube[0] = 2
myCube[1] = 3
myCube[2] = 4
myCube[3] = 24
myCube[4] = 52
Now let's examine the Cuboid.pm
module, which is presented in Listing 6.4.
Listing 6.4. The Cuboid.pm
module.
1 # ------------------------------------------------------------
2 package Cuboid;
3 use Carp;
4 use strict;
5
6 #
7 # The constructor
for this class.
8 # ------------------------------------------------------------
9 # Array[0] = ht;
read write
10 # Array[1] = wd;
read write
11 # Array[2] = dp;
read write
12 # Array[3] = volume; read
only
13 # Array[4] = surfaceArea; read only
14 # Array[5...maxsize] = read/write values for the user;
15 # ------------------------------------------------------------
16
17 my $SACRED = 5;
18
19 sub TIEARRAY {
20
21 my $class = shift;
22 my $maxsize = shift;
23
24 #
25 # Bailout if the array is not tied
correctly.
26 #
27 confess
"usage: tie(\@ary, 'Cuboid', maxsize)"
28 if
@_ || $maxsize =~ /\D/;
29 $maxsize
+= $SACRED;
30 print
"array will be $maxsize elements long\n";
31 return
bless {
32 MAXSIZE
=> $maxsize,
33 ARRAY
=> [0,0,0,0,0],
34 },
$class;
35 }
36
37 # FETch this, index
38 # This method will be triggered every time an individual element
the tied
39 # array is accessed (read). It takes one argument beyond its
self
40 # reference: the index whose value we're trying to fetch.
41 #
42 sub FETch {
43 my($self,$ndx)
= @_;
44 if
($ndx > $self->{MAXSIZE}) {
45 confess
"Error Out of Bounds: $ndx > $self->{MAXSIZE}";
46 }
47 return
$self->{ARRAY}[$ndx];
48 }
49
50 # STORE this, index, value
51 # This method will be called whenever an element in the tied
array
52 # is written to. It takes three arguments: a reference to itself,
53 # the index to store stuff at, and the value to store at the
index.
54 #
55 # The items at [3] and [4] are not allowed to be written to.
56 #
57 sub STORE {
58 my($self,
$ndx, $value) = @_;
59 print
"[STORE $value at $ndx]\n";
60 if
($ndx > $self->{MAXSIZE} ) {
61 confess
"Error Out Of Bounds: $ndx > $self->{MAXSIZE}";
62 }
63 if (($ndx ==
3) || ( $ndx == 4)) {
64 confess
"Cannot store in read only area: $ndx";
65 }
66 $self->{ARRAY}[$ndx]
= $value;
67
$self->{ARRAY}[3] =
68 ($self->{ARRAY}[0]
* $self->{ARRAY}[1] * $self->{ARRAY}[2]) ;
69
$self->{ARRAY}[4] =
70 ($self->{ARRAY}[0]
* $self->{ARRAY}[1]) +
71 ($self->{ARRAY}[1]
* $self->{ARRAY}[2]) +
72 ($self->{ARRAY}[0]
* $self->{ARRAY}[2]);
73
74
$self->{ARRAY}[4] *= 2;
75
76 return
$self->{ARRAY}[$ndx] ;
77 }
78
79 # DESTROY
80
81 sub DESTROY { }
82
83 1;
The Cuboid package is started
at line 2 and ends at line 83 with a required statement. The package
uses Carp and strict
packages at lines 3 and 4, respectively. Lines 9 through 14 describe
the structure of this object. The size of the sacrosanct indexes
is set at line 17.
The TIEARRAY constructor
starts at line 19. Note how the constructor takes two parameters:
one for the class and one for the maximum size the array can take.
Line 27 contains some error-checking routines using the confess()
function in the Carp module.
The maximum size is adjusted for the sacrosanct indexes, and an
appropriate message is printed out at line 30.
A reference to the newly created object is returned in lines 31
through 34. Note how the array is initialized and the member MAXSIZE
set at line 33.
The FETch method for the
array behaves in the same manner as for a scalar. The FETch
method is called every time an individual element in the tied
array is read from. The FETch
method takes two arguments: a reference to itself and the index
whose value is being fetched. Look at line 43 where these two
values are assigned to $self
and $ndx. Bounds are checked
at line 44, and we bail out in case of an error at line 45. If
the bounds are correct, the value is returned at the request index
via code at line 47.
The STORE method starts at
line 50 and takes three arguments: a reference to itself, the
index at which to store, and the value to store. In the STORE
method, the extracted values are printed at lines 58 and 59. Bounds
checking is done at line 60 with a bailout at line 61 in case
of errors. Lines 63 through 65 do not permit storing values at
index 3 or 4.
At line 66 the input value is assigned. At this point, you could
make the code faster by checking whether only indexes 0
to 1 are modified, but you'll
need to do that on your own. The area and volume results are stored
at index 3 and 4.
The assigned value is returned in line 76.
The DESTROY method is just
a dummy function that doesn't do much. You really don't need to
have a DESTROY method. The
one shown here is for illustration only.
Caution |
Only the read and write operations of an array are affected by the tie() operation. The functions $#,
push(), pop(), and so on of an array are not affected by the tie() function.
|
An associative array is more complicated to implement than either
a scalar or an array because of the extra functions that have
to be added to it. A class that can be tied to an associative
array should have the following methods:
- TIEHASH classname, LIST
- FETch this, key
- STORE this, key, value
- DELETE this, key
- EXISTS this, key
- FIRSTKEY this
- NEXTKEY this, lastkey
- DESTROY this
The next example is a simpler version of the one presented by
Tom Christiansen in his perltie.html
file, which is available on www.perl.com.
The example presented by Tom is a bit dangerous to use because
you can wipe out your .profile,
.history, .elm,
.term, and other "dot"
files. I wrote this example to map the text versions of the perltie.html
file into a hashed array to generate a table of contents for this
book based on the first line of each heading. The example presented
here simply lists the first line of each heading. There is no
reason why you cannot print other information in the file, however.
Listing 6.5 shows the script that uses the tie()
function on a hash. The module used for mirroring the contents
of each chapter file is called Mirror.pm.
It's used in line 4 of this code.
Listing 6.5. Using the tie()
function on a hash.
1 #!/usr/bin/perl
2
3 push(@Inc,".");
4 use Mirror;
5
6 tie %chapters, 'Mirror', chaps;
7
8 foreach $k (keys %chapters) {
9
print "$k is $chapters{$k}";
10 }
11 print " End of script\n"
Here is sample input and output from the code in Listing 6.6.
$ test3
ch25.txt is (a) Chapter 25
ch08.txt is (a) Chapter 8
ch28.txt is (a) Chapte 28
ch02.txt is (a) Chapter 2
ch29.txt is (a) Chapter 29
ch12.txt is (a) Chapter 12
ch15.txt is (a) Chapter 15
ch06.txt is (a) Chapter 6
All done!
End of script
$
What you see above is the first line of every file listed in the
hash instead of the filename! Had we not tied our own functions
to the %chapters hash, we
would be seeing the names of the file, such as ch29.txt,
ch12.txt, and so forth. Instead
of these names, when each element in the %chapters
hash is accessed, our own function is called and prints out the
first line in every file.
Of course, in your code, you would probably be using different
functions to implement your own strategy and functions with a
hash. For example, each access to a hash is tied to a record in
a file. The function behind the access to each element in the
hash would somehow take the record and format it to suit your
needs. The example shown here in Listing 6.6 is simple enough
to modify to fit into your own particular application.
Let's now look at how the code for Mirror.pm
is written. By examining this code, you will be able to see how
each function is defined for each type of action that you can
tie a function to.
Listing 6.6. The Mirror.pm
file.
1 # This file is hereby put
in the public domain. Copy freely.
2 # Just give me some credit for it ;-) if you like.
Kamran.
3 package Mirror;
4 use Carp;
5
6 # TIEHASH classname, $list
7 # This
is the constructor for the class. That means it is expected to
8 # return a blessed reference of a new
object.
9 #
10 sub TIEHASH {
11 my
$self = shift;
12 my
$dir = shift || 'pwd';
13 my
$node = {
14 HOME =>
$dir,
15 LIST =>
{},
16
17 };
18 #
19 #
print " Directory = $dir\n";
20 #
21 opendir(DIR,
$dir) || croak "Cannot open $dir: $!";
22 foreach
$item ( grep /\.*txt/, readdir(DIR)) {
23 $node->{LIST}{$item}
= 'head -1 $dir/$item';
24 print
"$node->{LIST}{$item} \n" if $debug;
25 }
26 closedir
DIR;
27 return
bless $node, $self;
28 }
29
30
31 # FETch this, key
32 # This method is called whenever an element in the tied
hash is
33 # being read. It
takes two arguments: a reference to itself and the
34 # key whose value is being asked.
35
36 sub FETch {
37 my
$self = shift;
38 my
$fname = shift;
39 my
$dir = $self->{HOME};
40 my
$file = "$dir/$fname";
41 unless
(exists $self->{LIST}->{$fname} || -f $file) {
42 carp
"ERROR: no such file $fname ";
43 return
undef;
44 }
45 if
(defined $self->{LIST}->{$fname}) {
46 return
$self->{LIST}->{$fname};
47 }
else {
48 return
$self->{LIST}->{$fname} = 'head -1 $file';
49 }
50
51 }
52
53
54 # STORE this, key, value
55 #
This method is called whenever an element in the hash is
56 # written to. It takes
three arguments: a reference to itself, the
57 #
index to store at, and the value to store.
58
59 sub STORE {
60 my
$self = shift; # this
61 my
$fname = shift; # the key
62 my
$value = shift; # the value
63 my
$dir = $self->{HOME};
64 my
$file = "$dir/$fname";
65 print
"Storing $fname, $value $file \n";
66 if
($value eq "done") {
67
print "Storing $fname, $value $file \n";
68 return
$self->{LIST}->{$fname} = 'head -1 $file';
69
}
70 else {
71 return
$self->{LIST}->{$fname} = $value;
72
}
73 }
74
75 # DELETE this, key
76 #
77 # This
method is called when an item is deleted from the hash.
78 #
79 sub DELETE {
80
81 my
$self = shift;
82 my
$fname = shift;
83 my
$dir = $self->{HOME};
84 my
$file = "$dir/$fname";
85 delete
$self->{LIST}->{$fname};
86 }
87
88 # CLEAR this
89 #
This method is called when the whole hash is set to an empty list.
90 #
91 sub CLEAR {
92 my
$self = shift;
93 my
$fname;
94 foreach
$fname ( keys %{$self->{LIST}}) {
95 $self->DELETE($fname);
96 }
97 }
98
99 #
100 # EXISTS this, key
101 # This method is called when the exists()
is called on a hash.
102 #
103 sub EXISTS {
104 my
$self = shift;
105 my
$dir = $self->{HOME};
106 my
$fname = shift;
107 my
$file = "$dir/$fname";
108 return
exists $self->{LIST}->{$file};
109 }
110
111 # FIRSTKEY this
112 # This method is called when you start
to iterate a list.
113 #
114 sub FIRSTKEY {
115 my
$self = shift;
116 my
$x = keys %{$self->{LIST}};
117 each
%{$self->{LIST}}
118 }
119
120
121 #
122 # NEXTKEY this, lastkey
123 # This method is called
during a keys() or each() iteration. The
124 # first argument is the object itself. The second argument
is the last
125 # key that was accessed.
126
127 sub NEXTKEY {
128 my
$self = shift;
129 return
each %{ $self->{LIST} }
130 }
131
132
133 #
134 # DESTROY the infamous epitaph!
135 #
136 sub DESTROY { print "\n All done!"; }
137
138 1;
The TIEHASH function definition
begins at line 10. The constructor takes two values as arguments:
the first is the name of the class, and the second is an optional
directory to work in. If the second parameter is not specified,
the current working directory is used. The $node
hash is used to store two parameters: HOME
for the working directory and LIST
for the list of items in this hash.
At line 21, the required information is collected for the hash
and is stored away in LIST
at line 23. The debug statement
at line 24 is a very valuable tool while debugging. At line 27,
the class is blessed and the reference to the $node
is returned.
The FETch method is called
whenever an element in the tied hash is being read. It takes two
arguments: a reference to itself and the key whose value is being
asked for. See lines 37 and 38, where the two parameters are extracted.
It would be prudent to add some error correction here lest we
look at the wrong path-this is done at line 41. At line 45 the
returned value as cached in the init
stages is returned. Had a new item to the hash been added with
an undefined value, the code at line 45 would assign it a value.
The assignment-handling function is the STORE
function. This function is called whenever an element in the hash
is written to. It takes three arguments: a reference to itself,
the index at which to store, and the value to store. The three
arguments are extracted at lines 60 through 62. The third argument
may be null, in which case the head of the file in the HOME
directory is used. (Look at lines 66 through 72.)
The DELETE function is called
when an item is deleted from the hash. The function is defined
at line 81. There are two arguments to this function: a reference
to the object and the index to remove. The delete()
function is called to remove the indexed item from the LIST
hash at
line 85.
The CLEAR function is called
when the whole hash is removed, possibly by assigning an empty
list to it. (The tied array has no such callback!) There is only
one argument to this function, and that is a reference to itself.
The CLEAR function is set
to call the DELETE function
in this example (see line 95). This call saves some code, but
we could have just as easily used the delete()
function in the same manner as DELETE.
The EXISTS function is called
to check whether an item exists in a hash (see line 103). There
are two arguments to this function: a reference to the object
and the index to remove. It simply re-creates the key and uses
this key to return a value from within the LIST
hash.
The FIRSTKEY and NEXTKEY
methods are called when the each()
and keys() methods are called.
The FIRSTKEY method is called
when you start to iterate a list. The NEXTKEY
method gets called during a keys()
or each() iteration. The
first argument to NEXTKEY
is a reference to the object itself. The second argument is the
last that was accessed.
For the file in Listing 6.6, you should now be able to derive
your own classes for mapping hashes to functions. The tie()
function, when used with hashes, provides more flexibility in
defining methods than what the tie()
function for arrays provides. However, using the hash is more
complex than the array function because you have to define more
methods with the hashing method.
There is supposedly some relief, though, with the use of the TieHash
module provided with the Perl distribution. The TieHash
module has predefined methods for you to pick and choose whatever
functions you want to implement, and the rest are defaulted. There
is a man page for the module
in the Perl distribution, but it did not provide much information
on how to actually use the module. Perhaps we'll see more documentation
on this tool in later releases.
This chapter has provided only some basic information on the use
of the tie() function. There
is an excellent document, called perltie.html,
by Tom Christiansen that is available at most of the Perl archive
sites. The perltie.html document
has more detailed information on how to use the tie()
functions. The tie() function
is also used in modules distributed with Perl. Two interesting
modules to look at are the Config
and DBM file modules, which
show interesting uses of the tie()
function. The DBM file modules
provide detailed examples of mapping records to disk with the
use of tie() functions on
hashes.
This chapter has provided the basic information on how to use
the tie() function to provide
an association between a Perl variable and executable functions.
Scalars, arrays, and hashes can be associated with methods having
special names such as FETch,
STORE, and so on. By designing
classes that provide methods these names, you can provide extra
functionality to map scalar, array, or hash objects to other objects,
processes, or disk files.
Chapter 7
String and Patterns
CONTENTS
This chapter covers some of the most important features of Perl:
its string- and pattern-manipulation routines. Most of the Perl
programming you do will involve strings in one form or another.
It's very important to learn how to use the string search and
replace operations efficiently in Perl. An inefficient search
pattern can slow a script down to a crawl.
Let's first start with the basic operations in Perl for working
with strings. Some of this chapter will be a rehash of what was
covered in Chapters 2 through 5.
Now it's time to cover the topic in detail given the background
information in these chapters. I cover the following string utility
functions in this chapter:
- The chop()
and length() functions
- Handling the case in strings
- Joining strings together
- Printing formatted numbers
- The substr()
function
- Special characters in Perl pattern searches
- Shortcuts for special words in Perl
- The quotemeta()
function
- Specifying the number of matches
- Specifying more than one choice in a pattern
- Searching a string with more than one
pattern
To find the length of a string, you can call the length($str)
function, which returns the number of characters in a string.
The chop function removes
the last character in a string. This is useful in removing the
carriage return from a user-entered string. For an example, see
Listing 7.1.
Listing 7.1. Using length()
and chop().
1 #!/usr/bin/perl
2
3 $input = <STDIN>
;
4
5 $len = length($input);
6 print "\nLength = $len of $input before the chopping
\n";
7 chop($input);
8 $len = length($input);
9 print "\nLength
= $len of $input after the chopping \n";
10
11 $ 7_1.pl
12 Hello! I am a Test!
13
14 Length = 20 of Hello! I am a Test! before the chopping
15
16 Length = 19 of Hello! I am a Test! after the chopping
Perl provides four functions to make your life easier when handling
the case of characters in a string:
lc($string)
| Converts a string to lowercase |
uc($string)
| Converts a string to uppercase |
lcfirst($string)
| Converts the first character of a string to uppercase
|
ucfirst($string)
| Converts the first character of a string to lowercase
|
Listing 7.2 presents sample code that illustrates how these functions
work.
Listing 7.2. Using the case functions.
1 #!/usr/bin/perl
2 $name = "tis a test OF THE sYSTEm" ;
3 $ucase = uc($name);
4 $lcase = lc($name);
5
6 print "$name \n";
7 print "$ucase
\n";
8 print "$lcase \n";
9
10 $nice = lcfirst($ucase);
11 print "lcfirst on $ucase = \n\t $nice \n";
12
13 $crooked = ucfirst($lcase);
14 print "ucfirst on $lcase = \n\t$crooked \n";
Here is the output from Listing 7.2.
tis a test OF THE sYSTEm
TIS A TEST OF THE SYSTEM
tis a test of the system
lcfirst on TIS A TEST OF THE SYSTEM =
tIS A TEST OF THE SYSTEM
ucfirst on tis a test of the system =
Tis a test of the system
The dot operator is great for connecting strings. For example,
the following statements will print John
Hancock plus a new line:
$first="John";
$last ="Hancock";
print $first . " " . $last . "\n" ;
To print the elements of an array in a string, you can use the
join function to create one
long string. Here's the syntax for the join
function:
join ($joinstr, @list);
The $joinstr variable is
the string to use when connecting the elements of @list
together. Refer to the following statements to see how to create
a line for the /etc/passwd
file:
@list = ("khusain","sdfsdew422dxd","501","100",
"Kamran Husain","/home/khusain","/bin/bash");
$passwdEntry = join (":", @list);
Perl provides two functions, printf
and sprintf, that behave
like the printf family of
functions in the C programming language. The printf
function sends its output to the current file. The sprintf
function takes at least two arguments, a string and a format string,
followed by any other arguments. The sprintf
function sends the formatted output to the string in the first
argument. For example, the string $answer
contains the result of the sprintf
statement:
$a= 10;
$b= 10;
sprintf $answer, "%d + %d is %d and in %x", $a,$b,$a+$b,$a+$b;
A quick way to find the location of a substring in a string is
to use the index function,
which searches from left to right. To search from right to left,
use the rindex function.
Here's the syntax for these functions:
position = index ($string, $substring,
[$offset]);
position = rindex ($string, $substring, [$offset]);
$string is the character
string to search the $substring
in. The $offset parameter
is optional and defaults to the start of the string when not provided
to the function. Listing 7.3 is a function that looks for the
position of the word two in each line of an input file
(just like grep would except
that we print out the position of the character, too).
Listing 7.3. Using the index
and rindex
functions.
1 #!/usr/bin/perl
2
3 %finds = ();
4 $line = 0;
5
6 print "\n Enter word to search for:";
7 $word = <STDIN>;
8 chop ($word);
9
10 print "\n Enter file to search in:";
11 $fname = <STDIN>;
12 chop($fname);
13 open (IFILE, $fname) || die "Cannot open $fname $!\n";
14
15 while (<IFILE>) {
16 $position = index($_,$word);
17 if ($position >= 0) {
18 $finds{"$line"}
= $position;
19 }
20 $line++;
21 }
22 close IFILE;
23 while(($key,$value) = each(%finds)) {
24 print " Line $key : $value \n";
25 }
This program searches for the first occurrence of the word in
the file specified by the user. Each line in the file is searched
for the pattern. If the pattern is found, the program prints the
location of the pattern at each line and column number. The first
while loop searches in a
given file, and the second while
lists all the items collected in the %finds
associative array.
Listing 7.3 finds only the first occurrence of a pattern in a
line. You can use the offset
argument to search for a pattern other than from the start. The
offset argument is specified
from 0 and up. Listing 7.4 presents another search program that
finds more than one occurrence on a line.
Listing 7.4. Searching more than once.
1 #!/usr/bin/perl
2
3 %finds = ();
4 $fname = "news.txt";
5 $word = "the";
6 open (IFILE, $fname) || die "Cannot open $fname $!\n";
7
8 print "Search for :$word: \n";
9 while (<IFILE>)
{
10 $thispos = 0;
11 $nextpos = 0;
12 while (1) {
13 $nextpos
= index($_,$word,$thispos);
14 last if
($nextpos == -1);
15 $count++;
16 $finds{"$count"}
= $nextpos;
17 $thispos
= $nextpos + 1;
18 }
19 }
20 close IFILE;
21 print "\nLn : Column";
22 while(($key,$value) = each(%finds)) {
23 print " $key : $value \n";
24 }
The output of Listing 7.4 on a sample file would be something
like this:
Ln : Column
1 : 31
2 : 54
3 : 38
4 : 53
The substr function is used
to extract parts of a string from other strings. Here's the syntax
for this function:
substr ($master, $offset,
$length);
$master is the string
from which a substring is to be copied, starting at the index
specified at $offset
and up to $length
characters. Listing 7.5 illustrates the use of this function.
Listing 7.5. Using the substr
function.
1 #!/usr/bin/perl
2 # Check out the substr function.
3 #
4 $quote = "No man but a blockhead ever wrote except
for money";
5 # quote
by Samuel Johnson
6
7 $sub[0] = substr
($quote, 9, 6);
8
9 $name = "blockhead"
;
10 $pos = index($quote,$name);
11 $len = length($name);
12 $sub[1] = substr ($quote, $pos, $len);
13 $pos = index($quote,"wrote");
14 $sub[2] = substr ($quote, $pos, 6);
15
16 for ($i = 0; $i < 3; $i++) {
17 print "\$sub[$i] is \""
. $sub[$i] . "\" \n";
18 }
19
20 #
21 # To replace a string, let's try substr on the left-hand side.
22 #
23 # Replace the words 'a blockhead', with the words 'an altruist'.
24 # (Sorry Sam.)
25 $name = "a blockhead" ;
26 $pos = index($quote,$name);
27 $len = length($name);
28
29 substr ($quote, $pos, $len) = "an altruist";
30 print "After substr = $quote \n";
The output from the code in Listing 7.5 is as follows:
$sub[0] is "t a bl"
$sub[1] is "blockhead"
$sub[2] is "wrote "
After substr = No man but an altruist ever wrote except for money
You can see how the substr
operator can be used to extract values from another string. Basically,
you tell the substr function
how many characters you need and from where, and the chopped off
portion is returned from the function.
The substr function can also
be used to make substitutions within a string. In this listing,
the words "a blockhead"
are replaced by "an altruist".
The part of the string specified by substr
is replaced by the value appearing to the right of the assignment
operator. Here's the syntax for these calls to substr:
substr ($master, $offset,
$length) = $newStr;
$master must be a
string that can be written to (that is, not a tied variable- >see Chapter 6,
"Binding Variables to Objects," for information on using
tie() on variables). $offset
is where the substitution begins for up to $length
characters. The value of $offset
+ $length must be less than the existing length
of the string. The $newStr
variable can be the empty string if you want to remove the substring
at the offset. To substitute the tail-end of the string starting
from the offset, do not specify the $length
argument.
For example, this line:
$len = 22; substr ($quote, $pos, $len)
= "an altruist";
prints the following line in the previous example:
After substr = No man but an altruist
The offset can be a negative number to specify counting from the
right side of the string. For example, the following line replaces
three characters at the fifth index from the right side in $quote
with the word "cash":
substr($quote, -5, 3) = "cash";
The substr function is great
when working with known strings that do cut and paste operations.
For more general strings, you have to work with patterns that
can be described using regular expressions. If you are familiar
with the grep command in
UNIX, you already know about regular expressions. Basically, a
regular expression is a way of specifying strings like
"all words beginning with the letter a" or "all
strings with an xy in the middle somewhere." The next section
illustrates how Perl can help make these types of search and replace
patterns easier.
Perl enables you to match patterns within strings with the =~
operator. To see whether a string has a certain pattern in it,
you use the following syntax:
$result = $variable =~ /pattern/
The value $result is true
if the pattern is found in $variable.
To check whether a string does not have a pattern, you have to
use the !~ operator, like
this:
$result = $variable !~ /pattern/
Listing 7.6 shows how to match strings literally. It prints a
message if the string Apple,
apple, or Orange
is found, or if the strings Grape
and grape are not found.
Listing 7.6. Substitution with patterns.
1 #!/usr/bin/perl
2
3 $input = <STDIN>
;
4 chop($input);
5 print "Orange
found! \n" if ( $input =~ /Orange/ );
6 print "Apple found! \n" if ( $input
=~ /[Aa]pple/ );
7 print "Grape not found! \n" if ( $input !~
/[Gg]rape/ );
So, how did you search for apple
and Apple in one statement?
This involves specifying a pattern to the search string. The syntax
for the =~ operator is this:
[$variable =~] [m]/PATTERN/[i][o][g]
$variable is searched for
the pattern in PATTERN. The
delimiter of the text being searched is a white space or an end-of-line
character. The i specifies
a case-insensitive search. The g
is used as an iterator to search more than once on the same string.
The o interpolates characters.
I cover all these options shortly.
Let's look at how the patterns in PATTERN
are defined. If you are already familiar with the grep
utility in UNIX, you are familiar with patterns.
A character is matched for the string verbatim when placed in
PATTERN. For example, /Orange/
matched the string Orange
only. To match a character other than a new line you can use the
dot (.) operator. For example,
to match Hat or Cat,
you would use the pattern:
/.at/
This also matches Bat, hat,
Mat, and so on. If you just
want to get Cat and Hat,
you can use a character class using the square brackets ([]).
For example, the pattern
/[ch]cat/
will match Cat or Hat,
but not cat, hat,
bat, and so on. The characters
in a class are case sensitive. So to allow the lowercase versions,
you would use the pattern:
/[cChH]cat/
It's cumbersome to list a lot of characters in the []
class, so the dash (-) operator
can define a range of characters to use. These two statements
look for a digit:
/[0-9]/
/[0123456789]/
The [] operator can be used
with other items in the pattern. Consider these two sample statements,
which do the same thing:
/a[0123456789]/ # matches a, followed
by any digit,
/a[0-9]/ # matches a, followed by any digit,
/[a-zA-Z]/ # a letter of the alphabet.
The range [a-z] matches any
lowercase letter, and the range [A-Z]
matches any uppercase letter. The following pattern matches aA,
bX, and so on:
/[a-z][A-Z]/
To match three or more letter matches, it would be very cumbersome
to write something
like this:
/[a-zA-Z][a-zA-Z][a-zA-Z]/
This is where the special characters in Perl pattern searching
come into play.
Here is a list of all the special characters in search strings
(I'll go into the detail of how they work later):
- The .
character matches one character.
- The +
character matches one or more occurrences of a character.
- The ?
character matches zero or one occurrences of a character.
- The *
character matches zero or more occurrences of a character.
- The -
character is used to specify ranges in characters.
- The []
characters define a class of characters.
- The ^
character matches the beginning of a line.
- The $
character matches the end of a line.
- The {}
characters specify the number of occurrences of a character.
- The |
character is the OR operator
for more than one pattern.
The plus (+) character specifies
"one or more of the preceding characters." Patterns
containing + always try to
match as many characters they can. For example, the pattern /ka+/
matches any of these strings:
kamran #
returns "ka"
kaamran # returns "kaa"
kaaaamran # returns "kaaaa"
Another way to use the +
operator is for matching more than one space. For example, Listing
7.7 takes an input line and splits the words into an array. Items
in the array generated by this code will not include any items
generated by matching more than one consecutive space. The match
/ +/ specifies "one
or more space(s)."
Listing 7.7. Using the pattern matching +
operator.
1 #!/usr/bin/perl
2 $input = <STDIN>;
3 chop ($input);
4 @words = split (/ +/, $input);
5 foreach $i (@words) {
6 print $i . "\n";
7 }
If you do not use the + sign
to signify more than one space in the pattern, you'll wind up
with an array item for each white space that immediately follows
a white space. The pattern / /
specifies the start of a new word as soon as it sees a white space.
If there are two spaces together, the next white space will trigger
the start of a new word. By using the +
sign, you are saying "one or more white space together"
is the start of a new word.
Tip |
If you are going to repeatedly search one scalar variable, call the study() function on the scalar. The syntax is study ($scalar);. Only one variable can be used with study() at one time.
|
The asterisk (*) special
character matches zero or more occurrences of any preceding character.
The asterisk can also be used with the []
classes:
/9*/ # matches
an empty word, 9, 99, 999, ... and so on
/79*/ # matches 7, 79, 799, 7999, ... and so
on
/ab*/ # matches a, ab, abb, abbb, ... and so
on
Because the asterisk matches zero or more occurrences, the pattern
/[0-9]*/
will match a number or an empty line! So do not confuse the asterisk
with the plus operator. Consider this statement:
@words = split (/[\t\n ]*/, $list);
This matches zero or more occurrences of the space, newline, or
tab character. What this translates to in Perl is "match
every character." You'll wind up with an array of strings,
each of them one character long, of the all the characters in
the input line.
The ? character matches zero
or one occurrence of any preceding character. For example, the
following pattern will match Apple
or Aple, but not Appple:
/Ap?le/
Let's look at a sample pattern that searches the use of hashes,
arrays, and possibly the use of handles. The code in Listing 7.8
will be enhanced in the next two sections. For the moment, let's
use the code in Listing 7.8 to see how the asterisk operator works
in pattern matches.
Listing 7.8. Using the asterisk operator.
1 #!/usr/bin/perl
2 # We will finish this program in the next section.
3 $scalars = 0;
4 $hashes = 0;
5 $arrays = 0;
6 $handles = 0;
7
8 while (<STDIN>) {
9
@words = split (/[\(\)\t ]+/);
10 foreach $token (@words) {
11 if ($token =~ /\$[_a-zA-Z][_0-9a-zA-Z]*/)
{
12 #
print ("$token is a legal scalar variable\n");
13 $scalars++;
14 } elsif ($token =~ /@[_a-zA-Z][_0-9a-zA-Z]*/)
{
15 #
print ("$token is a legal array variable\n");
16 $arrays++;
17 } elsif ($token =~ /%[_a-zA-Z][_0-9A-Z]*/)
{
18 #
print ("$token is a legal hash variable\n");
19 $hashes++;
20 } elsif ($token =~ /\<[A-Z][_0-9A-Z]*\>/)
{
21 #
print ("$token is probably a file handle\n");
22 $handles++;
23 }
24 }
25 }
26
27 print " This file used scalars $scalars times\n";
28 print " This file used arrays $arrays times\n";
29 print " This file used hashes $hashes times\n";
30 print " This file used handles $handles times\n";
Lines 9 and 10 split the incoming stream into words. Note how
the pattern in line 9 splits words at spaces, tabs, and in between
parentheses. At line 11, we are looking for a word that starts
with a $, has a non-numeric
character or underscore as the first character, and is followed
by an alphanumeric string or underscores.
At lines 14 and 17, the same pattern is applied, with the exception
of an at (@) sign and a hash
(#) sign are looked for instead
of a dollar ($) sign in order
to search for arrays and hashes, respectively. At line 20, the
file handle is assumed to a word in all caps, not starting with
an underscore, but with alphanumeric characters in it.
The previous listing can get legal names if the pattern is anywhere
in a word. However, we want the search to be limited to word boundaries.
For example, right now the script cannot distinguish between the
following three lines of input because they all match the /\$[a-zA-Z][_0-9a-zA-Z]*/
somewhere in them:
$catacomb
OBJ::$catacomb
#$catacomb#
White spaces do not include tabs, newlines, and so on. Here are
the special characters to use in pattern matching to signify these
characters:
\t |
Tab |
\n |
Newline |
\r |
Carriage return |
\f |
Form feed. |
\\ |
Backslash (\) |
\Q and \E
| Pattern delimiters |
In general, you can escape any special character in a pattern
with the backslash (\). The
backslash itself is escaped with another backslash. The \Q
and \E characters are used
in Perl to delimit the interpretation of any special characters.
When the Perl interpreter sees \Q,
every character following \Q
is not interpreted and is used literally until the pattern terminates
or Perl sees \E. Here are
a few examples:
/\Q^Section$/ # match the string "^Section$"
literally.
/^Section$/ # match a line with the solitary
word Section in it.
/\Q^Section$/ # match a line which ends with ^Section
To further clarify where the variable begins and ends, you can
use these anchors:
\A |
Match at beginning of string only |
\Z |
Match at end of string only |
\b |
Match on word boundary |
\B |
Match inside word |
Here are some examples and how they are interpreted given a string
with the word hello in it somewhere:
/\Ahel/ # match
only if the first three characters are "hel"
/llo\Z/ # match only if the last three
characters are "llo"
/llo$/ # matches only if the
last three characters are "llo"
/\Ahello\Z/ # same as /^hello$/ unless doing multiple line matching
/\bhello/ # matches "hello", not "Othello",
but also matches "hello."
/\bhello/ # matches "$hello" because
$ is not part of a word.
/hello\b/ # matches "hello", and "Othello",
but not "hello."
/\bhello\b/ # matches "hello", and not "Othello"
nor "hello."
A "word" for use with these anchors is assumed to contain
letters, digits, and underscore characters. No other characters,
such as the tilde (~), hash
(#), or exclamation point
(!) are part of the word.
Therefore, the pattern /\bhello/
will match the string "$hello",
because $ is not part of
a word.
The \B pattern anchor takes
the opposite action than that of \b.
It matches only if the pattern is contained in a word. For example,
the pattern below:
/\Bhello/
match "$hello"
and "Othello" but
not "hello" nor
"hello." Whereas,
the pattern here:
/hello\B/
will match "hello."
but not "hello",
"Othello" nor
"$hello". Finally this pattern
/\Bhello\B/
will match "Othello"
but not "hello",
"$hello" nor "hello.".
/\Bhello/ # match
"$hello" and "Othello" but not "hello"
nor "hello."
/hello\B/ # match "hello." but
not "hello", "Othello" nor "$hello".
/\Bhello\B/ # match "Othello" but not "hello",
"$hello" nor "hello.".
Listing 7.9 contains the code from Listing 7.8 with the addition
of the new word boundary functions.
Listing 7.9. Using the boundary characters.
1 #!/usr/bin/perl
2
3 $scalars = 0;
4 $hashes = 0;
5 $arrays = 0;
6 $handles = 0;
7
8 while (<STDIN>) {
9
@words = split (/[\t ]+/);
10 foreach $token (@words) {
11 if ($token =~ /\$\b[a-zA-Z][_0-9a-zA-Z]*\b/)
{
12 #
print ("$token is a legal scalar variable\n");
13 $scalars++;
14 } elsif ($token =~ /@\b[a-zA-Z][_0-9a-zA-Z]*\b/)
{
15 #
print ("$token is a legal array variable\n");
16 $arrays++;
17 } elsif ($token =~ /%\b[a-zA-Z][_0-9A-Z]*\b/)
{
18 #
print ("$token is a legal hash variable\n");
19 $hashes++;
20 } elsif ($token =~ /\<[A-Z][_0-9A-Z]*\>/)
{
21 #
print ("$token is probably a file handle\n");
22 $handles++;
23 }
24 }
25 }
26
27 print " This file used scalars $scalars times\n";
28 print " This file used arrays $arrays times\n";
29 print " This file used hashes $hashes times\n";
30 print " This file used handles $handles times\n";
Here is sample input and output for this program that takes an
existing script file in test.txt
and uses it as the input to the test.pl
program.
$ cat test.txt
#!/usr/bin/perl
$input = <STDIN>;
chop ($input);
@words = split (/ +/, $input);
foreach $i (@words) {
print
" [$i] \n";
}
$ test.pl < test.txt
This file used
scalars 5 times
This file used arrays 2 times
This file used hashes
0 times
This file used handles 1 times
Patterns do not have to be typed literally to be used in the /
/ search functions. You can also specify them
from within variables. Listing 7.10 is a modification of Listing
7.9, which uses three variables to hold the patterns instead of
specifying them in the if
statement.
Listing 7.10. Using pattern matches in variables.
1 #!/usr/bin/perl
2
3 $scalars = 0;
4 $hashes = 0;
5 $arrays = 0;
6 $handles = 0;
7
8 $sType = "\\\$\\b[a-zA-Z][_0-9a-zA-Z]*\\b";
9 $aType = "@\\b[a-zA-Z][_0-9a-zA-Z]*\\b";
10 $hType = "%\\b[a-zA-Z][_0-9A-Z]*\\b/";
11
12 while (<STDIN>) {
13 @words = split (/[\t ]+/);
14 foreach $token (@words) {
15 if ($token =~ /$sType/ ) {
16 #
print ("$token is a legal scalar variable\n");
17 $scalars++;
18 } elsif ($token =~ /$aType/ ) {
19 #
print ("$token is a legal array variable\n");
20 $arrays++;
21 } elsif ($token =~ /$hType/ ) {
22 #
print ("$token is a legal hash variable\n");
23 $hashes++;
24 } elsif ($token =~ /\<[A-Z][_0-9A-Z]*\>/)
{
25 #
print ("$token is probably a file handle\n");
26 $handles++;
27 }
28 }
29 }
30
31 print " This file used scalars $scalars times\n";
32 print " This file used arrays $arrays times\n";
33 print " This file used hashes $hashes times\n";
34 print " This file used handles $handles times\n";
In this code, the variables $aType,
$hType, and $sType
can be used elsewhere in the program verbatim. What you have to
do, though, is to escape the backslashes twice, once to get past
the Perl parser for the string and the other for the pattern searcher
if you are using double quotes. When using single quotes, you
can use the following line:
$sType = '\$\\b[a-zA-Z][_0-9a-zA-Z]*\b';
instead of this line:
$sType = "\\\$\\b[a-zA-Z][_0-9a-zA-Z]*\\b";
Make sure that you remember to include the enclosing /
characters when using a $variable
for a pattern. Forgetting to do this will give erroneous results.
Also, be sure you see how each backslash is placed to escape characters
correctly.
The [] classes for patterns
simplify searches quite a bit. In Perl, there are several shortcut
patterns that describe words or numbers. You have seen them already
in the previous examples and chapters.
Here are the shortcuts:
Shortcut | Description
| Pattern String |
\d |
Any digit | [0-9]
|
\D |
Anything other than a digit | [^0-9]
|
\w |
Any word character | [_0-9a-zA-Z]
|
\W |
Anything not a word character | [^_0-9a-zA-Z]
|
\s |
White space | [ \r\t\n\f]
|
\S |
Anything other than white space | [^ \r\t\n\f]
|
These escape sequences can be used anywhere ordinary characters
are used. For example, the pattern /[\da-z]/
matches any digit or lowercase letter.
The definition of word boundary as used by the \b
and \B special characters
is done with the use of \w
and \W. The patterns /\w\W/
and /\W\w/ can be used to
detect word boundaries. If the pattern /\w\W/
matches a pair of characters, it means that the first character
is part of a word and the second is not. This further means that
the first character is at the end of a matched word and that a
word boundary exists between the first and second characters matched
by the pattern and you are at the end of a word.
Conversely, if /\W\w/ matches
a pair of characters, the first character is not part of a word
and the second character is part of the word. This means that
the second character is the beginning of a word. Again, a word
boundary exists between the first and second characters matched
by the pattern. Therefore, you are at the start of a word.
The quotemeta function puts
a backslash in front of any non-word character in a given string.
Here's the syntax for quotemeta:
$
newstring = quotemeta($oldstring);
The action of the quotemeta
string can best be described using regular expressions as
$string =~ s/(\W)/\\$1/g;
Sometimes matching once, twice, or more than once is not sufficient
for a particular search. What if you wanted to match from two
to four times? In this case you can use the {
} operators in the search function. For example, in
the following pattern you can search for all words that begin
with ch followed by two or
three digits followed by .txt:
/ch[0-9]{2,3}.txt/
For exactly three digits after the ch
text, you can use this:
/ch[0-9]{ 3}.txt/
For three or more digits after the ch
text, you can use this:
/ch[0-9]{3,}.txt/
To match any three characters following the ch
text, you can use this:
/ch.{3,}.txt/
Perl enables you to specify more than one choice when attempting
to match a pattern. The pipe symbol (|)
works like an OR operator,
enabling you to specify two or more patterns to match. For example,
the pattern
/houston|rockets/
matches the string houston
or the string rockets, whichever
comes first. You can use special characters with the patterns.
For example, the pattern /[a-z]+|[0-9]+/
matches one or more lowercase letters or one or more digits. The
match for a valid integer in Perl is defined as this:
/\b\d+\b|\b0[xX][\da-fA-F]+\b/)
There are two alternatives to check for here. The first one is
^\d+ (that is, check for
one or more digits to cover both octal and decimal digits). The
second ^0[xX][\da-fA-F]+$
looks for 0x or 0X
followed by hex digits. Any other pattern is disregarded. The
delimiting \b tags limit
the search to word boundaries.
Sometimes it's necessary to search for occurrences for the same
pattern to match at more than one location. You saw earlier in
the example for using substr
how we kept the index around between successive searches on one
string. Perl offers another alternative to this problem: the pos()
function. The pos function
returns the location of the last pattern match in a string. You
can reuse the last match value when using the global (g)
pattern matching operator. The syntax for the pos
function is
$offset = pos($string);
where $string is the string
whose pattern is being matched. The returned $offset
is the number of characters already matched or skipped.
Listing 7.11 presents a simple script to search for the letter
n in Bananarama.
Listing 7.11. Using the pos
function.
1 #!/usr/bin/perl
2 $string = "Bananarama";
3 while ($string =~ /n/g) {
4 $offset
= pos($string);
5 print("Found
an n at $offset\n");
6 }
Here's the output for this program:
Found an n at 2
Found an n at 4
Found an n at 6
Found an n at 8
Found an n at 10
The starting position for pos()
to work does not have to start at 0. Like the substr()
function, you can use pos()
on the right side of the equal sign. To start a search at position
6, simply type this line before you process the string:
pos($string) = 5;
To restart searching from the beginning, reset the value of pos
to 0.
There will be times when you want to write patterns that address
groups of numbers. For example, a section of comma-delimited data
from the output of a spreadsheet is of this form:
digits,digits,digits,digits
A bit repetitive, isn't it? To extract this tidbit of information
from the middle of a document, you could use something like this:
/[\d]+[,.][\d]+[,.][\d]+[,.][\d]+/
What if there were 10 columns? The pattern would be long, and
you'd be prone to make mistakes.
Perl provides a macro substitution to allow repetitions of a known
sequence. Every pattern in a matched string that is enclosed in
memory is stored in memory in the order it is declared. To retrieve
a sequence from memory, use the special character \n,
where n is an integer
representing the nth pattern stored in memory.
For example, you can write the previous lines using these two
repetitive patterns:
([\d]+)
([,.])
The string that is used for matching the pattern would look like
this:
/([\d]+])([,.])\1\2\1\2\1\2/
The pattern matched by [\d]+
is stored in memory. When the Perl interpreter sees the escape
sequence \1, it matches the
first matched pattern. When it sees \2,
it matches the second pattern. Pattern sequences are stored in
memory from left to right. As another example, the following matches
a phone number in the United States, which is of the form ###-###-####,
where the # is a digit:
/\d{3}(\-))\d{3}\1\d{2}/
The pattern sequence memory is preserved only for the length of
the pattern. You can access these variables for a short time,
at least until another pattern match is hit, by examining the
special variables of the form $n.
The $n variables contain
the value of patterns matched in parentheses right after a match.
The special variable $&
contains the entire matched pattern.
In the previous snippet of code, to get the data matched in columns
into separate variables, you can use something like this excerpt
in a program:
if (/-?(\d+)\.?(\d+)/) {
$matchedPart = $&;
$col_1 = $1;
$col_2 = $2;
$col_3 = $3;
$col_4 = $4;
}
The order of precedence when using ()
is higher than that of other pattern-matching characters. Here
is the order of precedence from high to low:
() |
Pattern memory |
+ * ? {}
| Number of occurrences |
^ $ \b \B \W \w
| Pattern anchors |
| |
The OR operator
|
The pattern-memory special characters ()
serve as delimiters for the OR
operator. The side effect of this delimiting is that the parenthesized
part of the pattern is mapped into a $n
register. For example, in the following line, the \1
refers to (b|d), not the
(a|o) matching pattern:
/(b|d)(a|o)(rk).*\1\2\3/
There are several pattern-matching options in Perl to control
how strings are matched. You saw these options earlier when I
introduced the syntax for pattern matching. Here are the options:
g
| Match all possible patterns |
i
| Ignore case when matching strings |
m
| Treat string as multiple lines |
o
| Only evaluate once |
s
| Treat string as single line |
x
| Ignore white space in pattern |
All these pattern options must be specified immediately after
the option. For example, the following pattern uses the i
option to ignore case:
/first*name/i
More than one option can be specified at one time and can be specified
in any order.
The g operator tells the
Perl interpreter to match all the possible patterns in a string.
For example, if the string bananarama
is searched using the following pattern:
/.a/g
it will match ba, na,
na, ra,
and ma. You can assign the
return of all these matches to an array. Here's an example:
@words = "bananarama" =~ /.a/g;
for $i (@words) {
print
"$i \n";
}
You can use patterns with the g
option in loops. The returned value of the match is repeated until
it returns false. Inside
the loop you can use the &#
operator. For example, in the word Mississippi,
you can loop around looking for two characters together like this:
$string = "Mississippi";
while ($string =~ /([a-z]\1/g) {
$found
= $&;
print
("$found\n");
}
Tip |
Don't forget that you can use the pos() function in a while loop to see at what position the last match occurred.
|
The i option enables you
to perform a case-insensitive search. The match will be made regardless
of whether the string is uppercase or lowercase or a mixture of
cases.
The m option allows searching
on more than one line per match. When the m
option is specified, the ^
special character matches either the start of the string or the
start of any new line. Also, the $
character can match either the new line or the end of text.
The o option enables a pattern
to be evaluated only once. This is never really used in practice.
Basically, it forces Perl to disregard further matches on the
same input line.
Normally the dot (.) character
does not match the new line. When you specify the s
option, you allow the pattern to be matched across multiple lines
because this allows the dot character to be matched with a new
line.
The x operator tells Perl
to ignore any white spaces in the pattern match unless the white
space has been preceded by a backslash. The real benefit to using
the x option is to improve
readability because pattern specifications do not have to be crunched
together anymore. For example, these two patterns match the same
string:
/([\d]+])([,.])\1\2\1\2\1\2/
/([\d]+])([,.]) \1\2\ 1\2\ 1\2/x
You have already seen how to substitute text through the use of
the substr function. The
pattern-matching function can be extended to do string substitution
with the use of the s operator.
Here's the syntax:
s/pattern/replacement/[options]
The replacement string is interpreted literally and cannot have
a pattern. The Perl interpreter searches for the pattern specified
by the placeholder pattern. If it finds the pattern, it replaces
the pattern with the string represented by the placeholder replacement.
Here's an example:
$string = "cabi.net";
$string =~ s/cabi/sig/;
The contents of $string will
be sig.net instead of cabi.net.
The good news is that all the pattern matching stuff up to this
point in the chapter applies here! So, you can use any of the
pattern special characters in the substitution operator. For example,
the following replaces all words with one or more digits with
the letter X:
s/[\d]+/X/
Specify an empty string for the replacement if you just want to
delete a set of strings. For example, the following line replaces
all words with one or more digits in them:
s/[\d]+//
The pattern match memory sequence applies here. For example, to
swap the two columns of data, you can use this line:
s/(\d+)\s\1/$2 $1/
The substitution pattern matches a sequence of one or more digits,
followed by a space, followed by another set of digits. The output
is the values of the $1 and
$2 registers swapped in sequence.
The substitution operator supports several options just like the
match operator:
g
| Change all occurrences of the pattern |
i
| Ignore case in pattern |
e
| Evaluate replacement string as expression |
m
| Treat string to be matched as multiple lines
|
o
| Only evaluate once |
s
| Treat string to be matched as single line |
x
| Ignore white space in pattern |
As with pattern matching, options are appended to the end of the
operator. Most of these options work the same way as they did
for matching patterns during a search.
The g option changes all
occurrences of a pattern in a particular string. For instance,
the following substitution puts parentheses around all the numbers
in a line:
s/(\d+)/($1)/g
The i option ignores case
when substituting. For example, the substitution
s/\bweb\b/WEB/gi
replaces all occurrences of the words web,
WeB, wEB,
and so on with the word WEB.
Although you cannot put patterns in the replacement string, you
can run the eval() function
on it. The e option treats
the replacement string as an expression, which it evaluates before
replacing it in the original string. The results of the evaluation
are used instead. Suppose that you wanted to repeat a string twice
on a line. A common use is to redefine the values in a header
file to twice what they are. For example, the string
$define ABX 123
matches all the variables of the form and replaces the numeric
part of the line with twice its value. Listing 7.12 presents a
simple script to do this with a C header file.
Listing 7.12. Using the pattern replace to do simple operations.
1 #!/usr/bin/perl
2
3 open (FILE, "tt.h")
|| die $!;
4 $i = 0;
5 while (<FILE>)
{
6 $string
= $_;
7 if(/define/)
{
8
$string =~ s/(\d+)/$1 * 2/e;
9
print "$string \n";
10
$i++;
11 }
12 else {
13
print "$string \n";
14 }
15 }
16
17 close FILE;
The o option tells the Perl
interpreter to replace a scalar variable only on the first match.
All subsequent pattern matches are ignored.
The s option ensures that
the newline character \n
is matched by the . special
character:
With the m option, the ^
and $ characters match the
start and end of any line as they do in pattern matches.
The \A and \Z
escape sequences always match only the beginning and end of the
string. The actions taken by these options are not affected by
the s or m
options.
The x option causes the interpreter
to ignore all white spaces unless they are escaped by a backslash.
The only benefit gained from this operation is to make patterns
easier to read. See the example for using the x
option shown in the pattern-matching options section earlier in
this chapter.
The forward slash (/) delimiter
can be substituted with another character for showing where to
delimit text. For example, you can use <>,
# (hash), or ()
(parentheses) characters as delimiters, as illustrated in Listing
7.13.
Listing 7.13. Using a different delimiter for the forward slash.
1 #!/usr/bin/perl
2
3 $name = "/usr/local/lib";
4
5 $s1 = $name ;
6 $s1 =~ s#/usr/local/#/local/#;
7 print $s1 . "\n";
8
9 $s2 = $name ;
10 $s2 =~ s</usr/local/></local/>;
11 print $s2 . "\n";
12
13 $s3 = $name ;
14 $s3 =~ s(/usr/local/)(/local/);
15 print $s3 . "\n";
The UNIX tr command is also
available in Perl as the tr
function. The tr function
lets you substitute one group of characters with another. Here's
the syntax:
tr/string1/string2/
where string1 contains a
list of characters to be replaced, and string2
contains the characters that replace them. Each character in string2
is replaced with a character in the same position in string1.
If string1 is longer than
string2, the last character
of string1 is repeated to
pad the contents of string2.
If the same character appears more than once in string1,
the first replacement found will be used.
$string = "12345678901234567890";
$string =~ tr/2345/ABC/;
Here, all characters 2, 3,
4, and 5
in the string are replaced with A,
B, C,
and C, respectively. The
C is repeated here by Perl
as it makes the length of the replacement string equal to that
of the string being replaced. So, the replacement string is "ABccC"
for matching with "12345".
The most common use of the translation operator is to convert
a string from uppercase to lowercase, or vice versa.
while ($line = <STDIN>) {
$line =~
tr/A-Z/a-z/;
print
($line);
}
To convert all characters in a string to uppercase, here's another
sample function:
while ($line = <STDIN>) {
$line =~
tr/a-z/A-Z/;
print
($line);
}
There are a few things about the tr
operator that you should remember:
- If you do not specify a variable name,
the $_ variable is
used instead.
- The tr
function returns the number of characters replaced. That is, the
statement $num = $line =~ tr/a-z/A-Z/
will set the value of $num
to the number of characters replaced. Using tr
to get the length of a string is possible if you replace each
character by itself, but it's probably better to use length
instead.
- No pattern-matching special characters
are supported by tr.
- You can use y
in place of tr if you like.
That is, these two statements are the same: $string
=~ y/a-z/A-Z/; and $string
=~ tr/a-z/A-Z/;. Functionally, the y
is the same as tr. Why use
y? Using the y
function might impress your colleagues and get you a key to the
executive bathroom. Wow!
The program in Listing 7.14 tallies the number of times vowels
are used in a text file.
Listing 7.14. Tallying vowels in a text file.
1 #!/usr/bin/perl
2
3 $count = 0;
4
5 while ($input = <STDIN>)
{
6 chop
($input);
7 $total
+= length($input);
8 $_
= $input;
9 $count
+= tr/aeiou/aeiou/;
10 }
11
12 print ("In this file, there are: $count vowels \n";
The translation operator supports three options. These options
are specified after the patterns using this syntax:
tr/string1/string2/[cds]
Here are the options for tr:
c
| Translate all characters not specified |
d
| Delete all specified characters |
s
| Replace multiple identical output characters with a single character
|
The c operator stands for
complement. That is, it does the opposite of what the character
specifies. For example, the following line replaces all characters
that are not in [a-zA-Z0-9]
with a space:
$onlyDigits =~ tr/\w/ /c;
The d option deletes every
specified character:
$noDigits =~ tr/\d//d;
This deletes all the digits from $noDigits.
The s option stands for squeeze.
With the s option, tr
translates only one character if two or more consecutive characters
translate to the same output character. For example, the following
line replaces everything that is not a digit and outputs only
one space between digits.
Pattern-specific matching capabilities are possible with the use
of this operator:
(?ccpattern)
cc is a single
character representing the extended pattern-matching capability
being used for the pattern. cc
can be one of these values:
?: |
Do not store the pattern in parentheses in memory.
|
?o |
Where o can be an option to apply to the pattern and can be i for case insensitive, m for multiple lines, s for single line, or x for ignore white space.
|
?= |
Look ahead in buffer. |
?! |
Look back in buffer. |
?# |
Add comments. |
You have seen how () stores
a pattern match in memory. By using ?:
you can force the pattern not to be stored in memory. In the following
two statements, \1 points
to \d+ in the first and [a-z]
in the second:
/(\d+)([a-z]+/
/(?:\d+)([a-z]+/
The string, ?o, specifies
a pattern-matching option within the pattern itself. The o
could be i for ignore case.
For example, the following patterns are the same:
/[a-z]+/i
/(?i)[a-z]+/
You can specify different cases for different parts of the same
search pattern. Here's an example:
$pattern1 = "[A-Z]+";
$pattern2 = "(?i)[a-z0-9_]+";
if ($string =~ /$pattern1|$pattern2/) {
...
}
This pattern matches either any collection of uppercase letters
or any collection of letters with digits and an underscore.
You can use the ?= feature
to look ahead for a pattern. For example, the pattern
/123(?=XYZ)/
only matches 123 if it is
immediately followed by XYZ.
The matched string in $&
will be 123, not 123XYZ.
To look at the back of a string, use the ?!
operator. For example,
/(?!XYZ)123/
matches 123 only if it immediately
follows XYZ. The matched
pattern in $& will still
be 123.
Reading complicated patterns is not easy, even if you are the
author. Adding comments makes it easier to follow complicated
patterns. Finally, you can add comments about a pattern with the
?# operator. Here's an example:
/(?i)[a-z][\d]{2,3}(?
The above example will match two or three digits following a lowercase
letter.
With the function substr
you can extract a substring from a string or replace a portion
of a string or append to the front or back end of another string.
The lc and uc
functions convert strings to lowercase and uppercase. The first
letter of a string can be converted to lowercase or uppercase
using either lcfirst or ucfirst.
The quotemeta function places
a backslash in front of every nonword character in a string. New
character strings can be created using join,
which creates a string from the members of a list, and sprintf,
which works like printf except
that the output goes to a string. Functions that search character
strings include index, which
searches for a substring starting from the left of a string, and
rindex, which searches for
a substring starting from the right of a string. You can retrieve
the length of a character string using length.
The pos function enables
you to determine or set the current pattern-matching location
in a string. The tr function
replaces one set of characters with another.
|