Search     or:     and:
 LINUX 
 Language 
 Kernel 
 Package 
 Book 
 Test 
 OS 
 Forum 
iakovlev.org

Часть 4

Введение в модули Perl


Содержание


В этой части вы узнаете о модулях,пакетах и классах.

Что такое перловый модуль?

Перловый модуль похож на обычный набор библиотечных функций. Термин module в Perl практически совпадает с термином package. Пакеты появились в Perl 4, модули - в Perl 5.

Код модуля хранится в одном файле. При этом приложение разбивается на блочные модули. Вообще понятие module шире нежели package.

Модуль-это набор переменных,кода и других данных. Например использование глобальных переменных с такими именами , как $k, $j, или $i - затея не очень хорошая. Локальная переменная $i, может быть где угодно. А использование модулей эту проблему решает, поскольку видимость переменных ограничена внутри одного модуля.

The symbols defined for your variables are stored in an associative array, referred to as a symbol table. These symbol tables are unique to a package. Therefore, variables of the same name in two different packages can have different values.

Each module has its own symbol table of all symbols that are declared within it. The symbol table basically isolates synonymous names in one module from another. The symbol table defines a namespace, that is, a space for independent variable names to exist in. Thus, the use of modules, each with its own symbol table, prevents a variable declared in one section from overwriting the values of other variables with the same name declared elsewhere in the same program.

As a matter of fact, all variables in Perl belong to a package. The variables in a Perl program belong to the main package. All other packages within a Perl program either are nested within this main package or exist at the same level. There are some truly global variables, such as the signal handler array %SIG, that are available to all other modules in an application program and cannot be isolated via namespaces. Only those variable identifiers starting with letters or an underscore are kept in a module's symbol table. All other symbols, such as the names STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, Inc, and SIG are forced to be in package _main.

Switching between packages affects only namespaces. All you are doing when you use one package or another is declaring which symbol table to use as the default symbol table for lookup of variable names. Only dynamic variables are affected by the use of symbol tables. Variables declared by the use of the my keyword are still resolved with the code block they happen to reside in and are not referenced through symbol tables. In fact, the scope of a package declaration remains active only within the code block it is declared in. Therefore, if you switch symbol tables by using a package within a subroutine, the original symbol table in effect when the call was made will be restored when the subroutine returns.

Switching symbol tables affects only the default lookup of dynamic variable names. You can still explicitly refer to variables, file handles, and so on in a specific package by prepending a packageName:: to the variable name. You saw what a package context was when using references in Chapter 3. A package context simply implies the use of the symbol table by the Perl interpreter for resolving variable names in a program. By switching symbol tables, you are switching the package context.

Modules can be nested within other modules. The nested module can use the variables and functions of the module it is nested within. For nested modules, you would have to use moduleName::nestedModuleName and so on. Using the double colon (::) is synonymous with using a back quote (`). However, the double colon is the preferred, future way of addressing variables within modules.

Explicit addressing of module variables is always done with a complete reference. For example, suppose you have a module, Investment, which is the default package in use, and you want to address another module, Bonds, which is nested within the Investment module. In this case, you cannot use Bond::. Instead, you would have to use Investment::Bond:: to address variables and functions within the Bond module. Using Bond:: would imply the use of a package Bond that is nested within the main module and not within the Investment module.

The symbol table for a module is actually stored in an associative array of the module's names appended with two colons. The symbol table for a module called Bond will be referred to as the associative array %Bond::. The name for the symbol table for the main module is %main::, and can even be shortened to %::. Similarly, all nested packages have their symbols stored in associative arrays with double colons separating each nesting level. For example, in the Bond module that is nested within the Investment module, the associative array for the symbols in the Bond module will be named %Investment::Bond::.

A typeglob is really a global type for a symbol name. You can perform aliasing operations by assigning to a typeglob. One or more entries in an associative array for symbols will be used when an assignment via a typeglob is used. The actual value in each entry of the associative array is what you are referring to when you use the *variableName notation. Thus, there are two ways of referring to variable names in a package:

*Investment::money = *Investment::bills;

$Investment::{'money'} = $Investment::{'bills'};

In the first method, you are referring to the variables via a typeglob reference. The use of the symbol table, %Investment::, is implied here, and Perl will optimize the lookup for symbols money and bills. This is the faster and preferred way of addressing a symbol. The second method uses a lookup for the value of a variable addressed by 'money' and 'bills' in the associative array used for symbols, %Investment:: explicitly. This lookup would be done dynamically and will not be optimized by Perl. Therefore, the lookup will be forced to check the associative array every time the statement is executed. As a result, the second method is not efficient and should be used only for demonstration of how the symbol table is implemented internally.

Another example in this statement

*kamran = *husain;

causes variables, subroutines, and file handles that are named via the symbol kamran to also be addressed via the symbol husain. That is, all symbol entries in the current symbol table with the key kamran will now contain references to those symbols addressed by the key husain. To prevent such a global assignment, you can use explicit references. For example, the following statement will let you address the contents of $husain via the variable $kamran:

*kamran = \$husain;

However, any arrays such @kamran and @husain will not be the same. Only what the references specified explicitly will be changed. To summarize, when you assign one typeglob to another, you affect all the entries in a symbol table regardless of the type of variable being referred to. When you assign a reference from one variable type to another, you are only affecting one entry in the symbol table.

A Perl module file has the following format:

package ModuleName;
...
####  Insert module code ####
...
1;

The filename has to be called ModuleName.pm. The name of a module must end in the string .pm by convention. The package statement is the first line of the file. The last line of the file must contain the line with the 1; statement. This in effect returns a true value to the application program using the module. Not using the 1; statement will not let the module be loaded correctly.

The package statement tells the Perl interpreter to start with a new namespace domain. Basically, all your variables in a Perl script belong to a package called main. Every variable in the main package can be referred to as $main'variable.

Here's the syntax for such references:

$packageName'variableName

The single quote (') is synonymous with the double colon (::) operator. I cover more uses of the :: operator in the next chapter. For the time being, you must remember that the following two statements are equivalent:

$packageName'variableName;
$packageName::variableName;

The double-colon syntax is considered standard in the Perl world. Therefore, to preserve readability, I use the double-colon syntax in the rest of this book unless it's absolutely necessary to make exceptions to prove a point.

The default use of a variable name defers to the current package active at the time of compilation. Thus, if you are in the package Finance.pm and specify a variable $pv, the variable is actually equal to $Finance::$pv.

Using Perl Modules: use vs. require

You include Perl modules in your program by using the use or the require statement. Here's the way to use either of these statements:

use ModuleName;
require ModuleName;

Note that the .pm extension is not used in the code shown above. Also note that neither statement allows a file to be included more than once in a program. The returned value of true (1;) as the last statement is required to let Perl know that a required or used module loaded correctly and lets the Perl interpreter ignore any reloads. In general, it's better to use the use Module; statement than the require Module; statement in a Perl program to remain compatible with future versions of Perl.

For modules, you might want to consider continuing to use the require statement. Here's why: The use statement does a little bit more work than the require statement in that it alters the namespace of the module that includes another module. You want this extra update of the namespace to be done in a program. However, when writing code for a module, you may not want the namespace to be altered unless it's explicitly required. In this event, you will use the require statement.

The require statement includes the full pathname of a file in the @Inc array so that the functions and variables in the module's file are in a known location during execution time. Therefore, the functions that are imported from a module are imported via an explicit module reference at runtime with the require statement. The use statement does the same thing as the require statement because it updates the @Inc array with full pathnames of loaded modules. The code for the use function also goes a step further and calls an import function in the module being used to explicitly load the list of exported functions at compile time, thus saving the time required for an explicit resolution of a function name during execution.

Basically, the use statement is equivalent to

require ModuleName; import ModuleName [list of imported functions];

The use of the use statement does change your program's namespace because the imported function names are inserted in the symbol table. The require statement does not alter your program's namespace. Therefore, the following statement

use ModuleName ();

is equivalent to this statement:

require ModuleName;

Functions are imported from a module via a call to a function called import. You can write your own import function in a module, or you can use the Exporter module and use its import function. In almost all cases, you will use the Exporter module to provide an import function instead of reinventing the wheel. (You'll learn more on this in the next section.) Should you decide not to use the Exporter module, you will have to write your own import function in each module that you write. It's much easier to simply use the Exporter module and let Perl do the work for you.

The Sample Letter.pm Module

The best way to illustrate the semantics of how a module is used in Perl is to write a simple module and show how to use it. Let's take the example of a local loan shark, Rudious Maximus, who is simply tired of typing the same "request for payment" letters. Being an avid fan of computers and Perl, Rudious takes the lazy programmer's approach and writes a Perl module to help him generate his memos and letters.

Now, instead of typing within fields in a memo template file, all he has to do is type a few lines to produce his nice, threatening note. Listing 4.1 shows you what he has to type.


Listing 4.1. Using the Letter module.
 1 #!/usr/bin/perl -w
 2 #
 3 # Uncomment the line below to include the current dir in @Inc.
 4 # push (@Inc, 'pwd');
 5 #
 6 use Letter;
 7
 8 Letter::To("Mr. Gambling Man","The money for Lucky Dog, Race 2");
 9 Letter::ClaimMoneyNice();
10 Letter::ThankDem();
11 Letter::Finish();

The use Letter; statement is present to force the Perl interpreter to include the code for the module in the application program. The module should be located in the /usr/lib/perl5/ directory, or you can place it in any directory listed in the @Inc array. The @Inc array is the list of directories that the Perl interpreter will look for when attempting to load the code for the named module. The commented line (number 4) shows how to add the current working directory to include the path. The next four lines in the file generate the subject matter for the letter.

Here's the output from using the Letter module:

To: Mr. Gambling Man
Fm: Rudious Maximus, Loan Shark
Dt: Wed Feb  7 10:35:51 CST 1996

Re: The money for Lucky Dog, Race 2


====================================================

It has come to my attention that your account is
way over due.
You gonna pay us soon?
Or would you like me to come ovah?

Thanks for your support.



Sincerely,
Rudious

The Letter module file is shown in Listing 4.2. The name of the package is declared in the first line. Because this module's functions will be exported, I use the Exporter module. Therefore, the statement use Exporter; is required to inherit functionality from the Exporter module. Another required step is putting the word Exported in the @ISA array to allow searching for Exported.pm.

Note
The @ISA array is a special array within each package. Each item in the array lists where else to look for a method if it cannot be found in the current package. The order in which packages are listed in the @ISA array is the order in which Perl searches for unresolved symbols. A class that is listed in the @ISA array is referred to as the base class of that particular class. Perl will cache missing methods found in base classes for future references. Modifying the @ISA array will flush the cache and cause Perl to look up all methods again.

Let's now look at the code for Letter.pm in Listing 4.2.


Listing 4.2. The Letter.pm module.
 1 package Letter;
 2
 3 require Exporter;
 4 @ISA = (Exporter);
 5
 6 =head1 NAME
 7
 8 Letter - Sample module to generate letterhead for you
 9
10 =head1 SYNOPSIS
11
12     use Letter;
13
14     Letter::Date();
15     Letter::To($name,$company,$address);
16
17 Then one of the following:
18     Letter::ClaimMoneyNice()  {
19     Letter::ClaimMoney();
20     Letter::ThreatBreakLeg();
21
22     Letter::ThankDem();
23     Letter::Finish();
24
25 =head1 DESCRIPTION
26
27 This module provides a short example of generating a letter for a
28 friendly neighborbood loan shark.
29
30 The code begins after the "cut" statement.
31 =cut
32
33 @EXPORT = qw( Date,
34                         To,
35                         ClaimMoney,
36                         ClaimMoneyNice,
37                         ThankDem,
38                         Finish );
39
40 #
41 # Print today's date
42 #
43 sub Letter::Date {
44             $date = 'date';
45             print "\n Today is $date";
46 }
47
48 sub Letter::To {
49             local($name) = shift;
50             local($subject) = shift;
51             print "\n To: $name";
52             print "\n Fm: Rudious Maximus, Loan Shark";
53             print "\n Dt: ", `date`;
54             print "\n Re: $subject";
55             print "\n\n";
56             print "\n====================================================\n";
57 }
58 sub Letter::ClaimMoney()  {
59             print "\n You owe me money. Get your act together";
60             print "\n Do you want me to send Bruno over to ";
61             print "\n collect it , or are you gonna pay up?";
62 }
63
64 sub Letter::ClaimMoneyNice()  {
65             print "\n It is come to my attention that your account is ";
66             print "\n way over due.";
67             print "\n You gonna pay us soon..";
68             print "\n or would you like me to come ovah?";
69 }
70
71 sub Letter::ThreatBreakLeg() {
72             print "\n apparently letters like these dont help";
73             print "\n I will have to make an example of you";
74             print "\n \n See you in the hospital, pal!";
75 }
76
77 sub Letter::ThankDem() {
78             print "\n\n Thanks for your support";
79 }
80
81 sub Letter::Finish(){
82             printf "\n\n\n\n Sincerely";
83             printf "\n Rudious \n ";
84 }
85
86 1;

Lines containing the equal sign are used for documentation. You must document each module for your own reference; Perl modules do not need to be documented, but it's a good idea to write a few lines about what your code does. A few years from now, you may forget what a module is about. Good documentation is always a must if you want to remember what you did in the past!

I cover documentation styles used for Perl in Chapter 8, "Documenting Perl Scripts." For this sample module, the =head1 statement begins the documentation. Everything up to the =cut statement is ignored by the Perl interpreter.

Next, the module lists all the functions exported by this module in the @EXPORT array. The @EXPORT array defines all the function names that can be called by outside code. If you do not list a function in this @EXPORT array, it won't be seen by external code modules.

Following the @EXPORT array is the body of the code, one subroutine at a time. After all the subroutines are defined, the final statement 1; ends the module file. 1; must be the last executable line in the file.

Let's look at some of the functions defined in this module. The first function to look at is the simple Date function, lines 43 to 46, which prints the current UNIX date and time. There are no parameters to this function, and it doesn't return anything meaningful back to the caller.

Note the use of my before the $date variable in line 44. The my keyword is used to limit the scope of the variable to within the Date function's curly braces. Code between curly braces is referred to as a block. Variables declared within a block are limited in scope to within the curly braces. In 49 and 50, the local variables $name and $subject are visible to all functions.

You can also declare variables with the local qualifier. The use of local allows a variable to be in scope for the current block as well as for other blocks of code called from within this block. Thus, a local $x declared within one block is visible to all subsequent blocks called from within this block and can be referenced. In the following sample code, the ToTitled function's $name variable can be accessed but not the data in $iphone:

1 sub Letter::ToTitled {
2             local($name) = shift;
3             my($phone) = shift;

Subroutines and Passing Parameters

The sample code for Letter.pm showed how to extract one parameter at a time. The subroutine To() takes two parameters to set up the header for the memo.

Using functions within a module is not any different than using and defining Perl modules within the same code file. Parameters are passed by reference unless otherwise specified. Multiple arrays passed into a subroutine, if not explicitly dereferenced using the backslash, are concatenated.

The @_ input array in a function is always an array of scalar values. Passing values by reference is the preferred way in Perl to pass a large amount of data into a subroutine. (See Chapter 3, "References.")

Another Sample Module: Finance

The Finance module, shown in Listing 4.3, is used to provide simple calculations for loan values. Using the Finance module is straightforward. All the functions are written with the same parameters, as shown in the formula for the functions.

Let's look at how the future value of an investment can be calculated. For example, if you invest some dollars, $pv, in a bond that offers a fixed percentage rate, $r, applied at known intervals for $n time periods, what is the value of the bond at the time of its expiration? In this case, you'll be using the following formula:

$fv = $pv * (1+$r) ** $n ;

The function to get the future value is declared as FutureValue. Refer to Listing 4.3 to see how to use it.


Listing 4.3. Using the Finance module.
 1 #!/usr/bin/perl -w
 2
 3 push(@Inc,'pwd');
 4 use Finance;
 5
 6 $loan = 5000.00;
 7 $apr =  3.5;   #  APR
 8 $year = 10;   #  in years.
 9
10 # ----------------------------------------------------------------
11 # Calculate the value at the end of the loan if interest
12 # is applied every year.
13 # ----------------------------------------------------------------
14 $time = $year;
15 $fv1 = Finance::FutureValue($loan,$apr,$time);
16 print "\n If interest is applied at end of year";
17 print "\n The future value for a loan of \$" . $loan . "\n";
18 print " at an APR of ", $apr , " for ",  $time, " years";
19 printf " is %8.2f \n" , $fv1;
20
21 # ----------------------------------------------------------------
22 # Calculate the value at the end of the loan if interest
23 # is applied every month.
24 # ----------------------------------------------------------------
25 $rate = $apr / 12;   # APR
26 $time = $year * 12; # in months
27 $fv2 = Finance::FutureValue($loan,$rate,$time);
28
29 print "\n If interest is applied at end of each month";
30 print "\n The future value for a loan of \$" . $loan . "\n";
31 print " at an APR of ", $apr , " for ",  $time, " months";
32 printf " is %8.2f \n" , $fv2;
33
34 printf "\n The difference in value is %8.2f", $fv2 - $fv1;
35 printf "\n Therefore by applying interest at shorter time periods";
36 printf "\n we are actually getting more money in interest.\n";

Here is sample input and output of Listing 4.3.

$ testme

 If interest is applied at end of year
 The future value for a loan of $5000
 at an APR of 3.5 for 10 years is  7052.99

 If interest is applied at end of each month
 The future value for a loan of $5000
 at an APR of 3.5 for 120 months is  7091.72

 The difference in value is    38.73
 Therefore by applying interest at shorter time periods
 we are actually getting more money in interest.

The revelation in the output is the result of the comparison of values between $fv1 and $fv2. The $fv1 value is calculated with the application of interest once every year over the life of the bond. $fv2 is the value if the interest is applied every month at the equivalent monthly interest rate.

The Finance.pm package is shown in Listing 4.4 in its early development stages.


Listing 4.4. The Finance.pm package.
  1 package Finance;
  2
  3 require Exporter;
  4 @ISA = (Exporter);
  5
  6 =head1 Finance.pm
  7
  8 Financial Calculator - Financial calculations made easy with Perl
  9
 10 =head 2
 11 use Finance;
 12
 13 $pv = 10000.0;
 14
 15 $rate = 12.5 / 12; # APR per month.
 16
 17 $time = 360 ;      # months for loan to mature
 18
 19 $fv = FutureValue();
 20
 21 print $fv;
 22
 23 =cut
 24
 25 @EXPORT = qw(  FutureValue,
 26                         PresentValue,
 27                         FVofAnnuity,
 28                         AnnuityOfFV,
 29                         getLastAverage,
 30                         getMovingAverage,
 31                         SetInterest);
 32
 33 #
 34 # Globals, if any
 35 #
 36
 37 local $defaultInterest = 5.0;
 38
 39 sub Finance::SetInterest($) {
 40             my $rate = shift(@_);
 41             $defaultInterest = $rate;
 42             printf "\n \$defaultInterest = $rate";
 43 }
 44
 45 # --------------------------------------------------------------------
 46 # Notes:
 47 # 1. The interest rate $r is given in a value of [0-100].
 48 # 2. The $n given in the terms is the rate at which the interest
 49 #          is applied.
 50 #
 51 # --------------------------------------------------------------------
 52
 53 # --------------------------------------------------------------------
 54 # Present value of an investment given
 55 # fv - a future value
 56 # r  - rate per period
 57 # n  - number of period
 58 # --------------------------------------------------------------------
 59 sub Finance::FutureValue($$$) {
 60             my ($pv,$r,$n) = @_;
 61             my $fv = $pv * ((1 + ($r/100)) ** $n);
 62             return $fv;
 63 }
 64
 65 # --------------------------------------------------------------------
 66 # Present value of an investment given
 67 # fv - a future value
 68 # r  - rate per period
 69 # n  - number of period
 70 # --------------------------------------------------------------------
 71 sub Finance::PresentValue($$$) {
 72             my $pv;
 73             my ($fv,$r,$n) = @_;
 74             $pv = $fv / ((1 + ($r/100)) ** $n);
 75             return $pv;
 76
 77 }
 78
 79 # --------------------------------------------------------------------
 80 # Get the future value of an annuity given
 81 # mp - Monthly Payment of Annuity
 82 # r  - rate per period
 83 # n  - number of period
 84 # --------------------------------------------------------------------
 85
 86 sub FVofAnnuity($$$) {
 87             my $fv;
 88             my $oneR;
 89             my ($mp,$r,$n) = @_;
 90
 91             $oneR = ( 1 + $r) ** $n;
 92             $fv = $mp * ( ($oneR - 1)/ $r);
 93             return $fv;
 94 }
 95
 96 # --------------------------------------------------------------------
 97 # Get the annuity from the following bits of information
 98 # r  - rate per period
 99 # n  - number of period
100 # fv - Future Value
101 # --------------------------------------------------------------------
102
103 sub AnnuityOfFV($$$) {
104             my $mp; # mp - Monthly Payment of Annuity
105             my $oneR;
106             my ($fv,$r,$n) = @_;
107
108             $oneR = ( 1 + $r) ** $n;
109             $mp = $fv * ( $r/ ($oneR - 1));
110             return $mp;
111 }
112
113 # --------------------------------------------------------------------
114 # Get the average of the last "n" values in an array.
115 # --------------------------------------------------------------------
116 # The last $count number of elements from the array in @values
117 # The total number of elements in @values is in $number
118 #
119 sub getLastAverage($$@) {
120             my ($count, $number, @values) =  @_;
121             my $i;
122
123             my $a = 0;
124             return 0 if ($count == 0);
125             for ($i     = 0; $i< $count; $i++) {
126                         $a += $values[$number - $i - 1];
127                         }
128             return $a / $count;
129             }
130
131 # --------------------------------------------------------------------
132 # Get a moving average of the values.
133 # --------------------------------------------------------------------
134 # The window size is the first parameter, the number of items in the
135 # passed array is next. (This can easily be calculated within the
136 # function using the scalar() function, but the subroutine shown here
137 # is also being used to illustrate how to pass pointers.) The reference to the
138 # array of values is passed next, followed by a reference to the place
139 # the return values are to be stored.
140 #
141 sub getMovingAve($$\@\@) {
142             my ($count, $number, $values, $movingAve) =  @_;
143             my $i;
144             my $a = 0;
145             my $v = 0;
146
147             return 0 if ($count == 0);
148             return -1 if ($count > $number);
149             return -2 if ($count < 2);
150
151             $$movingAve[0] = 0;
152             $$movingAve[$number - 1] = 0;
153             for ($i=0; $i<$count;$i++) {
154                         $v = $$values[$i];
155                         $a += $v / $count;
156                         $$movingAve[$i] = 0;
157                         }
158             for ($i=$count; $i<$number;$i++) {
159                         $v = $$values[$i];
160                         $a += $v / $count;
161                         $v = $$values[$i - $count - 1];
162                         $a -= $v / $count;
163                         $$movingAve[$i] = $a;
164                         }
165             return      0;
166             }
167
168 1;

Look at the declaration of the function FutureValue with ($$$). The three dollar signs together signify three scalar numbers being passed into the function. This extra scoping is present for validating the type of the parameters passed into the function. If you were to pass a string instead of a number into the function, you would get a message very similar to this one:

Too many arguments for Finance::FutureValue at ./f4.pl line 15, near "$time)"
Execution of ./f4.pl aborted due to compilation errors.

The use of prototypes when defining functions prevents you from sending in values other than what the function expects. Use @ or % to pass in an array of values. If you are passing by reference, use \@ or \% to show a scalar reference to an array or hash, respectively. If you do not use the backslash, all other types in the argument list prototype are ignored. Other types of disqualifiers include an ampersand for a reference to a function, an asterisk for any type, and a semicolon to indicate that all other parameters are optional.

Now, let's look at the lastMovingAverage function declaration, which specifies two integers in the front followed by an array. The way the arguments are used in the function is to assign a value to each of the two scalars, $count and $number, whereas everything else is sent to the array. Look at the function getMovingAverage() to see how two arrays are passed in order to get the moving average on a list of values.

The way to call the getMovingAverage function is shown in Listing 4.5.


Listing 4.5. Using the moving average function.
 1 #!/usr/bin/perl -w
 2
 3 push(@Inc,'pwd');
 4 use Finance;
 5
 6 @values = ( 12,22,23,24,21,23,24,23,23,21,29,27,26,28 );
 7 @mv = (0);
 8 $size = scalar(@values);
 9 print "\n Values to work with = { @values } \n";
10 print " Number of values = $size \n";
11
12 # ----------------------------------------------------------------
13 # Calculate the average of the above function
14 # ----------------------------------------------------------------
15 $ave = Finance::getLastAverage(5,$size,@values);
16 print "\n Average of last 5 days = $ave \n";
17
18 Finance::getMovingAve(5,$size,@values,@mv);
19 print "\n Moving Average with 5 days window = \n { @mv } \n";

Here's the output from Listing 4.5:

Values to work with = { 12 22 23 24 21 23 24 23 23 21 29 27 26 28 }
Number of values = 14

Average of last 5 days = 26.2

Moving Average with 5 days window =
{ 0 0 0 0 0 19.4 21.8 22 22 21.4 23 23.8 24.2 25.2 }

The getMovingAverage() function takes two scalars and then two references to arrays as scalars. Within the function, the two scalars to the arrays are dereferenced for use as numeric arrays. The returned set of values is inserted in the area passed in as the second reference. Had the input parameters not been specified with \@ for each referenced array, the $movingAve array reference would have been empty and would have caused errors at runtime. In other words, the following declaration is not correct:

sub getMovingAve($$@@)

The resulting spew of error messages from a bad function prototype is as follows:

Use of uninitialized value at Finance.pm line 128.
Use of uninitialized value at Finance.pm line 128.
Use of uninitialized value at Finance.pm line 128.
Use of uninitialized value at Finance.pm line 128.
Use of uninitialized value at Finance.pm line 128.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.
Use of uninitialized value at Finance.pm line 133.
Use of uninitialized value at Finance.pm line 135.

Values to work with = { 12 22 23 24 21 23 24 23 23 21 29 27 26 28 }
Number of values = 14

Average of last 5 days = 26.2

Moving Average with 5 days window =
{ 0 }

This is obviously not the correct output. Therefore, it's critical that you pass by reference when sending more than one array.

Global variables for use within the package can also be declared. Look at the following segment of code from the Finance.pm module to see what the default value of the Interest variable would be if nothing was specified in the input. (The current module requires the interest to be passed in, but you can change this.)

Here's a little snippet of code that can be added to the end of the program shown in Listing 4.5 to add the ability to set interest rates.

20 local $defaultInterest = 5.0;
21 sub Finance::SetInterest($) {
22             my $rate = shift(@_);
23             $rate *= -1 if ($rate < 0);
24             $defaultInterest = $rate;
25             printf "\n \$defaultInterest = $rate";
26 }

The local variable $defaultInterest is declared in line 20. The subroutine SetInterest to modify the rate is declared in lines 21 through 26. The $rate variable uses the values passed into the subroutine and simply assigns a positive value for it. You can always add more error checking if necessary.

To access the defaultInterest variable's value, you could define either a subroutine that returns the value or refer to the value directly with a call to the following in your application program:

$Finance::defaultInterest;

Returned Values from Subroutines in a Package

The variable holding the return value from the module function is declared as my variable. The scope of this variable is within the curly braces of the function only. When the called subroutine returns, the reference to my variable is returned. If the calling program uses this returned reference somewhere, the link counter on the variable is not zero; therefore, the storage area containing the returned values is not freed to the memory pool. Thus, the function that declares

my $pv

and then later returns the value of $pv returns a reference to the value stored at that location. If the calling routine performs a call like this one:

Finance::FVofAnnuity($monthly,$rate,$time);

there is no variable specified here into which Perl stores the returned reference; therefore, any returned value (or a list of values) is destroyed. Instead, the call with the returned value assigned to a local variable, such as this one:

$fv = Finance::FVofAnnuity($monthly,$rate,$time);

maintains the variable with the value. Consider the example shown in Listing 4.6, which manipulates values returned by functions.


Listing 4.6. Sample usage of the my function.
 1 #!/usr/bin/perl -w
 2
 3 push(@Inc,'pwd');
 4 use Finance;
 5
 6 $monthly = 400;
 7 $rate = 0.2;   # i.e. 6 % APR
 8 $time = 36;    # in months
 9
10 print "\n# ------------------------------------------------";
11 $fv = Finance::FVofAnnuity($monthly,$rate,$time);
12 printf "\n For a monthly %8.2f at a rate of %%%6.2f for %d periods",
13                         $monthly, $rate, $time;
14 printf "\n you get a future value of %8.2f ", $fv;
15
16 $fv *= 1.1; # allow 10 % gain in the house value.
17
18 $mo = Finance::AnnuityOfFV($fv,$rate,$time);
19
20 printf "\n To get 10 percent more at the end, i.e. %8.2f",$fv;
21 printf "\n you need a monthly payment value of %8.2f",$mo,$fv;
22
23 print "\n# ------------------------------------------------ \n";

Here is sample input and output for this function:

$ testme
# ------------------------------------------------
 For a monthly   400.00 at a rate of %  0.20 for 36 periods
 you get a future value of 1415603.75
 To get 10 percent more at the end, i.e. 1557164.12
 you need a monthly payment value of   440.00
# ------------------------------------------------

Multiple Inheritance

Modules implement classes in a Perl program that uses the object-oriented features of Perl. Included in object-oriented features is the concept of inheritance. (You'll learn more on the object-oriented features of Perl in Chapter 5, "Object-Oriented Programming in Perl.") Inheritance means the process with which a module inherits the functions from its base classes. A module that is nested within another module inherits its parent modules' functions. So inheritance in Perl is accomplished with the :: construct. Here's the basic syntax:

SuperClass::NextSubClass:: ... ::ThisClass.

The file for these is stored in ./SuperClass/NextSubClass/…. Each double colon indicates a lower-level directory in which to look for the module. Each module, in turn, declares itself as a package with statements like the following:

package SuperClass::NextSubClass;
package SuperClass::NextSubClass::EvenLower;

For example, say that you really want to create a Money class with two subclasses, Stocks and Finance. Here's how to structure the hierarchy, assuming you are in the /usr/lib/perl5 directory:

  1. Create a Money directory under the /usr/lib/perl5 directory.
  2. Copy the existing Finance.pm file into the Money subdirectory.
  3. Create the new Stocks.pm file in the Money subdirectory.
  4. Edit the Finance.pm file to use the line package Money::Finance instead of package Finance;.
  5. Edit scripts to use Money::Finance as the subroutine prefix instead of Finance::.
  6. Create a Money.pm file in the /usr/lib/perl5 directory.

The Perl script that gets the moving average for a series of numbers is presented in Listing 4.7.


Listing 4.7. Using inheriting modules.
 1  #!/usr/bin/perl -w
 2  $aa = 'pwd';
 3  $aa .= "/Money";
 4  push(@Inc,$aa);
 5  use Money::Finance;
 6  @values = ( 12,22,23,24,21,23,24,23,23,21,29,27,26,28 );
 7  @mv = (0);
 8  $size = scalar(@values);
 9  print "\n Values to work with = { @values } \n";
10 print " Number of values = $size \n";
11 # ----------------------------------------------------------------
12 # Calculate the average of the above function
13 # ----------------------------------------------------------------
14 $ave = Money::Finance::getLastAverage(5,$size,@values);
15 print "\n Average of last 5 days = $ave \n";
16 Money::Finance::getMovingAve(5,$size,@values,@mv);
17 # foreach $i (@values) {
18             # print "\n Moving with 5 days window = $mv[$i] \n";
19 # }
20 print "\n Moving Average with 5 days window = \n { @mv } \n";

Lines 2 through 4 add the path to the Money subdirectory. The use statement in line 5 now addresses the Finance.pm file in the ./Money subdirectory. The calls to the functions within Finance.pm are now called with the prefix Money::Finance:: instead of Finance::. Therefore, a new subdirectory is shown via the :: symbol when Perl is searching for modules to load.

The Money.pm file is not required. Even so, you should create a template for future use. Actually, the file would be required to put any special requirements for initialization that the entire hierarchy of modules uses. The code for initialization is placed in the BEGIN() function. The sample Money.pm file is shown in Listing 4.8.


Listing 4.8. The superclass module for Finance.pm.
1 package Money;
2 require Exporter;
3
4             BEGIN {
5             printf "\n Hello! Zipping into existence for you\n";
6             }
7 1;

To see the line of output from the printf statement in line 5, you have to insert the following commands at the beginning of your Perl script:

use Money;
use Money::Finance;

To use the functions in the Stocks.pm module, you use this line:

use Money::Stocks;

The Stocks.pm file appears in the Money subdirectory and is defined in the same format as the Finance.pm file, with the exceptions that use Stocks is used instead of use Finance and the set of functions to export is different.

The Perl Module Libraries

A number of modules are included in the Perl distribution. Check the /usr/lib/perl5/lib directory for a complete listing after you install Perl. There are two kinds of modules you should know about and look for in your Perl 5 release, Pragmatic and Standard modules.

Pragmatic modules, which are also like pragmas in C compiler directives, tend to affect the compilation of your program. They are similar in operation to the preprocessor elements of a C program. Pragmas are locally scoped so that they can be turned off with the no command. Thus, the command

no POSIX ;

turns off the POSIX features in the script. These features can be turned back on with the use statement.

Standard modules bundled with the Perl package include several functioning packages of code for you to use. Refer to appendix B, "Perl Module Archives," for a complete list of these standard modules.

To find out all the .pm modules installed on your system, issue the following command. (If you get an error, add the /usr/lib/perl5 directory to your path.)

find /usr/lib/perl5 -name perl "*.pm" -print

Extension Modules

Extension modules are written in C (or a mixture of Perl and C) and are dynamically loaded into Perl if and when you need them. These types of modules for dynamic loading require support in the kernel. Solaris lets you use these modules. For a Linux machine, check the installation pages on how to upgrade to the ELF format binaries for your Linux kernel.

What Is CPAN?

The term CPAN (Comprehensive Perl Archive Network) refers to all the hosts containing copies of sets of data, documents, and Perl modules on the Net. To find out about the CPAN site nearest you, search on the keyword CPAN in search engines such as Yahoo!, AltaVista, or Magellan. A good place to start is the www.metronet.com site.

Summary

This chapter introduced you to Perl 5 modules and described what they have to offer. A more comprehensive list is found on the Internet via the addresses shown in the Web sites http://www.metronet.com and http://www.perl.com.

A Perl package is a set of Perl code that looks like a library file. A Perl module is a package that is defined in a library file of the same name. A module is designed to be reusable. You can do some type checking with Perl function prototypes to see whether parameters are being passed correctly. A module has to export its functions with the @EXPORT array and therefore requires the Exporter module. Modules are searched for in the directories listed in the @Inc array.

Obviously, there is a lot more to writing modules for Perl than what is shown in this chapter. The simple examples in this chapter show you how to get started with Perl modules. In the rest of the book I cover the modules and their features, so hang in there.

I cover Perl objects, classes, and related concepts in Chapter 5.

Chapter 5

Object-Oriented Programming in Perl


CONTENTS


This chapter covers the object-oriented programming (OOP) features of Perl. You'll see how to construct objects in Perl as well as how to use the OOP features offered by Perl. You'll also learn about inheritance, overriding of methods, and data encapsulation.

Introduction to Modules

A module is also referred to as a package. Objects in Perl are based on references to data items within a package. An object in Perl is simply a reference to something that knows which class it belongs to. For more information, you can consult the perlmod and perlobj text files at http://www.metronet.com. These files are the primary source of information on the Internet about Perl modules.

When performing object-oriented programming with other languages, you declare a class and then create (instantiate) objects of that class. All objects of a particular class behave in a certain way, which is governed by the methods of the class to which the object belongs. You create classes by defining new ones or by inheriting properties from an existing class.

For programmers already familiar with object-oriented principles, this will all seem familiar. Perl is, and pretty much always has been, an object-oriented language. In Perl 4, the use of packages gave you different symbol tables from which to choose your symbol names. In Perl 5, the syntax has changed a bit, and the use of objects has been formalized somewhat.

The Three Important Rules

The following three declarations are extremely important to understanding how objects, classes, and methods work in Perl. Each is covered in more detail in the rest of the chapter.

  • A class is a Perl package. The package for a class provides the methods for objects.
  • A method is a Perl subroutine. The only catch with writing methods is that the name of the class is the first argument.
  • An object in Perl is a reference to some data item within the class.

Classes in Perl

This point is important enough to repeat: A Perl class is simply a package. When you see a Perl document referring to a class, think package. Also, both package and module mean the same thing. For C programmers, it's easy to use :: notation for classes and -> for pointing to structure elements and class members.

One of the key features of OOP in any object-oriented language is that of inheritance. This is where new classes can be created by adding new features to existing classes. The inheritance feature offered by Perl is not the same as you would expect in other object-oriented languages. Perl classes inherit methods only. You have to use your own mechanisms to implement data inheritance.

Because each class is a package, it has its own name space with its own associative array of symbol names. Therefore, each class can have its own independent set of symbol names. As with package references, you can address the variables in a class with the ' operator. Therefore, members of a class are addressed as $class'$member. In Perl 5, you can use the double colon instead of ' to get the reference. Thus, $class'member is the same as $class::$member.

Creating a Class

This section covers the requisite steps to take when you create a new class. This chapter covers the semantics in the creation of a very simple class, called Invest, for printing the required parts of a simple Java application source code file. No, you will not become a Java expert, nor does this package require you to have any experience in Java. The concept of creating a class is what you're concerned with. For example, this chapter could just as easily have been on creating a phone book application, but how many such examples have you seen to date in books? You'll use a different example this time.

First of all, you need to create a package file called Cocoa.pm. The .pm extension is the default extension for packages; it stands for Perl Module. A module is a package, and a package is a class for all practical purposes. Before you do anything else to the file, place a 1; in the file. As you add more lines to the package file, make sure you have 1; as the last line of this file. The basic structure of the file is shown in Listing 5.1.


Listing 5.1. The package template.
 1 package Cocoa;
 2 #
 3 # Put "require" statements in for all required,imported packages
 4 #
 5
 6 #
 7 # Just add code here
 8 #
 9
10 1;   # terminate the package with the required 1;

It's important that you remember to always keep the required 1; line as the last of the package file. This statement is required for all packages in Perl.

Now you're ready to add your methods to this package and make this a class. The first method you would probably want to add is the new() method, which should be called to create a new object. The new() method is the constructor for the object.

Blessing a Constructor

A constructor is a Perl subroutine in a class that returns a reference to something that has the class name attached to it. Connecting a class name with a reference is referred to as blessing an object. The function to establish this connection is bless. Here's the syntax for the bless function:

bless YeReference [,classname]

YeReference is the reference to the object being blessed. classname is optional and specifies the name of the package from which this object will have methods. If classname is not specified, the name of the currently used package is used instead. Thus, the way to create a constructor in Perl is to return a reference to an internal structure that has been blessed into this class. The initial Cocoa.pm package is shown in Listing 5.2.


Listing 5.2. The first pass at the new() function.
1 package Cocoa;
2
3 sub new {
4     my $this = {};  # Create an anonymous hash, and #self points to it.
5     bless $this;       # Connect the hash to the package Cocoa.
6     return $this;     # Return the reference to the hash.
7     }
8
9 1;

The {} constructs a reference to an empty hash. The returned value to this hash is assigned to the local variable $this. The bless() function takes that reference to $this and tells the object it references that it's now Cocoa and then returns the reference.

The returned value to the calling function now refers to this anonymous hash. On returning from the new() function, the $this reference is destroyed, but the calling function keeps a reference to this hash. Therefore, the reference count to the hash will not be zero, and Perl keeps the hash in memory. You do not have to keep the hash in memory, but it's nice to have it around for reference later.

To create an object, you make a call like this one:

$cup = new Cocoa;

The code is Listing 5.3 shows you how to use this package to create the constructor.


Listing 5.3. Using the Cocoa class.
1 #!/usr/bin/perl
2 push (@Inc,'pwd');
3 use Cocoa;
4 $cup = new Cocoa;

The first line refers to the location of the Perl interpreter to use. Your Perl interpreter may be located at /usr/local/bin/perl or wherever you installed it.

In line 2, the local directory is added to the search path in @Inc for the list of paths to use when the Perl interpreter is looking for a package. You can create your module in a different directory and specify the path explicitly there. Had I created the package in /home/khusain/test/scripts/, line 2 would read as this:

push (@Inc,"/home/khusain/test/scripts");

In line 3 you include the package Cocoa.pm to get all the functionality in your script. The use statement asks Perl to look in the @Inc path for a file called Cocoa.pm and include it in the copy of the source file being parsed. The use statement is required if you want to work with a class.

Line 4 creates the Cocoa object by calling the new function on it. Now comes the beautiful (yet confusing and powerful) part of Perl. There is more than one way to do this. You can rewrite line 3 as this:

$cup = Cocoa->new();

Or if you are a C-programming hack, you can use the double colons (::) to force the function new() from the Cocoa package. Thus, line 4 could also be written as this:

$cup = Cocoa::new();

There is nothing preventing you from adding more code in the constructor than what is shown here. For this Cocoa.pm module, if you would like to print a disclaimer when each object is created, you can. For example, you can add statements like these in the constructor for debugging purposes:

print "Hey! I am alive" if ($debuglevel > 1);

This way, you can set a variable $debuglevel to a numeric value of  2 or greater in your program to display the debug message shown every time a new Cocoa object is created. Usually, you would like to initialize variables in a constructor before any processing is done with the object. For example, if the object you are constructing will be writing a log to disk, you would like to open the file it's writing to in the constructor. (Of course, you would also have to remember to close the file when the object is destroyed by placing the close() function call in the destructor.)

Here's what the constructor looks like for the Cocoa.pm module:


Listing 5.4. Expanding the constructor.
1 sub new {
2     my $this = {};
3     print "\n /* \n ** Created by Cocoa.pm \n ** Use at own risk";
4     print "\n ** Did this code even get past the javac compiler? ";
5     print "\n **/ \n";
6     bless $this;
7     return $this;
8     }

The output from running the test script (called testme) on this bare bones class would look like this:

/*
** Created by Cocoa.pm
** Use at own risk
** Did this code even get past the javac compiler?
**/

Now, regardless of which of these three methods you used to create the Cocoa object, you should see the same output.

Some comments have been added at the start of the file with some print statements. You can just as easily call other functions in or outside of the package to get more initialization functionality. You should allow any given class to be inherited, however. You should be able to call the new operator with the class name as the first parameter. This ability to parse the class name from the first argument causes the class to be inherited. Thus, the new function becomes more or less like the function shown in Listing 5.5.


Listing 5.5. The improved new() function with class name recognition.
1 sub new {
2     my $class = shift;        # Get the request class name
3     my $this = {};
4     bless $this, $class       # Use class name to bless() reference
5     $this->doInitialization();
6     return $this;
7 }

However, this method will force your class users to make calls in one of three ways:

  • Cocoa::new()
  • Cocoa->new()
  • new Cocoa;

What if you wanted to use a reference to the object instead, such as $obj->new()? The doInitialization() method used will be of whatever $class the object is blessed into. Listing 5.6 uses the function call ref() to determine if the class exists per se. The ref() function returns true if the item passed to it is a reference and null if not a reference. In the case of classes, the true value returned from the ref() function is the name of the class.


Listing 5.6. The new() function with the capability to inherit classes.
 1 sub new {
 2     my $this = shift;                # Get the class name
 3     my $class = ref($this) || $this;
 
   # If class exists, use it  else use reference.
 4     my $this = {};
 5
 6     bless $this, $class
 7     $this->doInitialization();
 8
 9     return $this;
10 }

Within the class package, the methods typically deal with the reference as an ordinary reference. Outside the class package, the reference is generally treated as an opaque value that may only be accessed through the class's methods. You can access the values within a package directly, but it's not a good idea to do so because such access defeats the whole purpose of object orientation.

It's possible to bless a reference object more than once. However, the caveat to such a task is that the new class must get rid of the object at the previously blessed reference. For C and Pascal programmers, this is like assigning a pointer to malloc-ed memory and then assigning the same pointer to another location without freeing the previous location. In effect, a Perl object must belong to one and only one class at a time.

So what's the real difference between an object and a reference? Perl objects are blessed to belong to a class. References are not blessed; if they are, they belong to a class and are objects. Objects know to which class they belong. References do not have a class, as such, to which they belong.

Instance Variables

The arguments to a new() function for a constructor are called instance variables. Instance variables are used for initializing each instance of an object as it is created. For example, the new() function could expect a name for each new instance of an object created. Using instance variables allows the customization of each object as it is created.

Either an anonymous array or an anonymous hash can be used to hold instance variables. To use a hash to store the parameters coming in, you would use code similar to what is shown in Listing 5.7.


Listing 5.7. Using instance variables.
1 sub new {
2         my $type = shift;
3         my %parm = @_;
4         my $this = {};
5         $this->{'Name'} = $parm{'Name'};
6         $this->{'x'}  = $parm{'x'};
7         $this->{'y'}  = $parm{'y'};
8         bless $this, $type;
9 }

You can also use an array instead of a hash to store the instance variables. See Listing 5.8 for an example.


Listing 5.8. Using hashes for instance variables.
1 sub new {
2         my $type = shift;
3         my %parm = @_;
4         my $this = [];
5         $this->[0] = $parm{'Name'};
6         $this->[1] = $parm{'x'};
7         $this->[2] = $parm{'y'};
8         bless $this, $type;
9 }

To construct an object, you can pass the parameters with the new() function call. For example, the call to create the Cocoa object becomes this:

$mug = Cocoa::new( 'Name' => 'top',
  'x' => 10,
  'y' => 20 );

The => operator is just like the comma operator, although it's a bit more readable. You can write this code with commas instead of the => operator if that's what you prefer.

To access the variables as you would any other data members, you can use the following statements:

print "Name=$mug->{'Name'}\n";
print "x=$mug->{'x'}\n";

print "y=$mug->{'y'}\n";

Methods

A method in a Perl class is simply a Perl subroutine. Perl doesn't provide any special syntax for method definition. A method expects its first argument to be the object or package on which it is being invoked. Perl has just two types of methods: static and virtual.

A static method expects a class name as the first argument. A virtual  method expects a reference to an object as the first argument. Therefore, the way each method handles the first argument determines whether the method is static or virtual.

A static method applies functionality to the class as a whole because it uses the name of the class. Therefore, functionality in static methods is applicable to all objects of the class. Generally, static methods ignore the first argument because they already know which class they are in. Therefore, constructors are static methods.

A virtual method expects a reference to an object as its first argument. Typically the first thing a virtual method does is to shift the first argument to a self or this variable; it then uses that shifted value as an ordinary reference. For example, consider the code in Listing 5.9.


Listing 5.9. Listing data items in a class.
1 sub nameLister {
2     my $this = shift;
3     my ($keys ,$value );
4     while (($key, $value) = each (%$this)) {
5         print "\t$key is $value.\n";
6     }
7 }

Line 2 in this listing is where the $this variable is set to point to the object. In line 4, the $this array is dereferenced at every $key location.

Exporting Methods with Exporter.pm

If you tried to invoke the Cocoa.pm package right now, you would get an error message from Perl at compile time about the methods not being found. This is because the Cocoa.pm methods have not been exported. To export these functions, you need the Exporter module. This is done by adding the following lines to the start of code in the package:

require Exporter;
@ISA = qw(Exporter);

These two lines force the inclusion of the Exporter.pm module and then set the @ISA array with the name of the Exporter class for which to look.

To export your own class methods, you would have to list them in the @EXPORT array. For example, to export the closeMain and declareMain methods, you would use the following statement:

@EXPORT(declareMain, closeMain);

Inheritance in a Perl class is accomplished via the @ISA array. The @ISA array does not have to be defined in every package; however, when it is defined, Perl treats it as a special array of directory names. This is akin to the @Inc array where directories are searched for files to include. In the case of the @ISA array, the paths define the classes (packages) and where to look for other class packages, if a method is not found in the current package. Thus, the @ISA array contains the names of the base classes from which the current class inherits. The search is done in the order in which the classes are listed in the @ISA arrays.

All methods called by a class do have to belong to the same class or to the base classes defined in the @ISA array. If a method isn't found in @ISA array, Perl looks for an AUTOLOAD() routine. This routine is defined as sub in the current package and is optional. To use the AUTOLOAD function, you have to use the autoload.pm package with the use Autoload; statement. The AUTOLOAD function tries to load the called function from the installed Perl libraries. If the AUTOLOAD call also fails, Perl makes one final try at the UNIVERSAL class, which is the catch-all for all methods not defined elsewhere. Perl will generate an error about unresolved functions if this step also fails.

Here are some simple rules when exporting methods. First, export only those functions that you have to. Do not export every function in your module because you will be increasing the likelihood of conflicts with a program that is using your module. Use the @EXPORT_OK array instead of the @EXPORT array if you feel that the names of your methods may clash with those in an application. Choosing long, descriptive names for functions may help eliminate problems with synonymous variable names.

Second, if you are going to have multiple versions of your module, consider setting a variable called $VERSION in your module to a numeric string; for example, "2.11" or something. This version number will be exported for you automatically and can be used with the require statement. Remember to use two digits for all integers in the version numbers because "1.10" is interpreted lower than "1.9" but higher than "1.09". You will see some modules or programs with a statement of the following form:

require 5.001;

The statement above indicates that Perl version 5.001 or greater is required. The same analogy can be used for your module with a call to a function called require_version of the following form:

$moduleName->require_version($value);

A returned value of true will indicate that it's okay to proceed. A returned value of false will indicate that the version number of the module is less than what is specified in the $value.

Invoking Methods

There are two ways to invoke a method for an object: one via a reference to an object (virtual) and the other via explicitly referring to the class name (static). A method has to be exported for you to be able to call it. Let's add a few more methods to the Cocoa class to get the file to look like the one shown in Listing 5.10.


Listing 5.10. Adding methods to the Cocoa class.
 1 package Cocoa;
 2 require Exporter;
 3
 4 @ISA = qw(Exporter);
 5 @EXPORT = qw(setImports, declareMain, closeMain);
 6
 7 #
 8 # This routine creates the references for imports in Java functions
 9 #
10 sub setImports{
11     my $class = shift @_;
12     my @names = @_;
13
14     foreach (@names) {
15      print "import " .  $_ . ";\n";
16      }
17     }
18
19 #
20 # This routine declares the main function in a Java script
21 #
22 sub declareMain{
23     my $class = shift @_;
24     my ( $name, $extends, $implements) = @_;
25
26      print "\n public class $name";
27      if ($extends) {
28           print " extends " . $extends;
29      }
30      if ($implements) {
31           print " implements " . $implements;
32      }
33    print " { \n";
34 }
35
36 #
37 # This routine declares the main function in a Java script
38 #
39 sub closeMain{
40    print "} \n";
41 }
42
43 #
44 #  This subroutine creates the header for the file.
45 #
46 sub new {
47     my $this = {};
48     print "\n /* \n ** Created by Cocoa.pm \n ** Use at own risk \n */ \n";
49     bless $this;
50     return $this;
51     }
52
53 1;

Now let's write a simple Perl script to use the methods for this class. Because you can only start and close the header, let's see how the code for a script to create a skeleton Java applet source looks. (See Listing 5.11.)


Listing 5.11. Using the methods just added in Listing 5.10.
1 #!/usr/bin/perl
2
3 use Cocoa;
4
5 $cup = new Cocoa;
6
7 $cup->setImports( 'java.io.InputStream', 'java.net.*');
8 $cup->declareMain( "Msg" , "java.applet.Applet", "Runnable");
9 $cup->closeMain();

What we are doing in this script is generating code for a Java applet called Msg, which extends the java.applet.Applet applet and implements functions that can be run. The function is called with a function $cup->... call. Lines 7 through 9 could be rewritten as functions, like this:

Cocoa::setImports($cup, 'java.io.InputStream', 'java.net.*');
Cocoa::declareMain($cup, "Msg" , "java.applet.Applet", "Runnable");
Cocoa::closeMain($cup);

This type of equivalence was shown in a previous section, "Blessing a Constructor." In both cases, the first parameter is the reference to the object itself. Running this test script generates the following output:

/*
** Created by Cocoa.pm
** Use at own risk
*/
import java.io.InputStream;
import java.net.*;

public class Msg extends java.applet.Applet implements Runnable {
}

There are a couple of points to note when calling the methods. If you have any arguments to a method, use parentheses if you are using the method -> (also known as indirect). The parentheses are required to include all the arguments with this statement:

$cup->setImports( 'java.io.InputStream', 'java.net.*');

However, this statement:

Cocoa::setImports($cup, 'java.io.InputStream', 'java.net.*');

can also be rewritten without parentheses.

Cocoa::setImports $cup, 'java.io.InputStream', 'java.net.*' ;

The choice is really yours as to how you intend to make your code readable for other programmers. Use parentheses if you feel that the code will be more readable.

Overrides

There are times when you'll want to specify which class' method to use, such as when the same-named method is specified in two different classes. For example, if the function grind is defined in both Espresso and Qava classes, you can specify which class' function to use with the use of the :: operator. These two calls:

$mess = Espresso::grind("whole","lotta","bags");
Espresso::grind($mess, "whole","lotta","bags");

use the call in Espresso, whereas the following calls use the grind() function in the Qava class:

$mess = Qava::grind("whole","lotta","bags");
Qava::grind($mess, "whole","lotta","bags");

Sometimes you want to call a method based on some action that the program you are writing has already taken. In other words, you want to use the Qava method for a certain condition and the Espresso method for another. In this case, you can use symbolic references to make the call to the required function. This is illustrated in the following example:

$method = $local ? "Qava::" : "Espresso::";
$cup->{$method}grind(@args);

Destructors

Perl tracks the number of links to objects. When the last reference to an object goes away, the object is automatically destroyed. This destruction of an object may occur after your code stops and the script is about to exit. For global variables, the destruction happens after the last line in your code executes.

If you want to capture control just before the object is freed, you can define a DESTROY() method in your class. Note the use of all capitals in the name. The DESTROY() method is called just before the object is released, allowing you to do any cleanup. The DESTROY() function does not call other DESTROY() functions. Perl doesn't do nested destruction for you. If your constructor reblessed a reference from one of your base classes, your DESTROY() may need to call DESTROY() for any base classes. All object references that are contained in a given object are freed and destroyed automatically when the current object is freed.

Normally, you don't have to define a DESTROY function. However, when you do need it, its form is as follows:

sub DESTROY {
#
# Add code here.
#
}

For most purposes, Perl uses a simple reference-based garbage collection system. The number of references to any given object at the time of garbage collection has to be greater than zero, or else the memory for that object is freed. When your program exits, an exhaustive search-and-destroy function in Perl does the garbage collection. Everything in the process is summarily deleted. In UNIX or UNIX-like systems, this may seem like a waste, but it is actually quite necessary in embedded systems or in a multithreaded environment.

Inheritance

Methods in classes are inherited with the use of the paths in the @ISA array. Variables have to be inherited and set up explicitly for inheritance. Let's say you define a new class called Bean.pm to include some of the functionality that another class, Coffee.pm, will inherit.

The example in this section demonstrates how to inherit instance variables from one class (also referred to as a superclass or base class). The steps in inheritance require calling the superclass's constructor and adding one's own instance variables to the new object.

In this example, the Coffee class is the class that inherits values from the base class Bean. The two files are called Coffee.pm and Bean.pm, respectively. The code for Bean.pm is shown in Listing 5.12.


Listing 5.12. The Bean.pm file.
 1 package Bean;
 2 require Exporter;
 3
 4 @ISA = qw(Exporter);
 5 @EXPORT = qw(setBeanType);
 6
 7 sub new {
 8     my $type = shift;
 9     my $this = {};
10     $this->{'Bean'} = 'Colombian';
11     bless $this, $type;
12     return $this;
13     }
14
15 #
16 # This subroutine sets the
17 sub setBeanType{
18     my ($class, $name) =  @_;
19     $class->{'Bean'} = $name;
20     print "Set bean to $name \n";
21     }
22 1;

In this listing, the $this variable sets a value in the anonymous hash for the 'Bean' class to be 'Colombian'. The setBeanType function method is also declared so that the item referred to by the word 'Bean' is set for any class that is sent in as an argument. Therefore, you can use this setBeanType function in other classes to set the value of any member whose name is 'Bean'.

The subroutine for resetting the value of 'Bean' uses the $class reference to get to the anonymous hash for the object. Remember that it is a reference to this anonymous hash that created the reference in the first place with the new() function.

The values in the Bean class are inherited by the Coffee class. The Coffee.pm file is shown in Listing 5.13.


Listing 5.13. Using inheritance.
 1    #
 2    # The Coffee.pm file to illustrate inheritance.
 3    #
 4    package Coffee;
 5    require Exporter;
 6    require Bean;
 7    @ISA = qw(Exporter, Bean);
 8    @EXPORT = qw(setImports, declareMain, closeMain);
 9     #
10     # set item
11     #
12     sub setCoffeeType{
13         my ($class,$name) =  @_;
14         $class->{'Coffee'} = $name;
15         print "Set coffee type to $name \n";
16         }
17     #
18     #  constructor
19     #
20     sub new {
21         my $type  = shift;
22         my $this  = Bean->new();     ##### <-- LOOK HERE!!! ####
23         $this->{'Coffee'} = 'Instant';  # unless told otherwise
24         bless $this, $type;
25         return $this;
26         }
27     1;

Note the use of the require Bean; statement at line 6. (See Chapter 4, "Introduction to Perl Modules," in the section titled "Using Perl Modules" for the reasons why the require statement is used instead of the use statement.) This line forces the inclusion of the Bean.pm file and all its related functions without importing the functions until compile time. Lines 12 through 16 define a subroutine to use when resetting the value of the local variable in $class->{'Coffee'}.

Look at the new() constructor for the Coffee class. The $this reference points to the anonymous hash returned by Bean.pm, not a hash created locally. In other words, the following statement creates an entirely different hash that has nothing to do with the hash created in the Bean.pm constructor.

my $this = {}; # This is not the way to do it for inheritance.
my $this = $theSuperClass->new(); # this is the way.

Listing 5.14 illustrates how to call these functions.


Listing 5.14. Using inheritance.
 1 #!/usr/bin/perl
 2 push (@Inc,'pwd');
 3 use Coffee;
 4 $cup = new Coffee;
 5 print "\n -------------------- Initial values ------------ \n";
 6 print "Coffee: $cup->{'Coffee'} \n";
 7 print "Bean: $cup->{'Bean'} \n";
 8 print "\n -------------------- Change Bean Type ---------- \n";
 9 $cup->setBeanType('Mixed');
10 print "Bean Type is now $cup->{'Bean'} \n";
11 print "\n ------------------ Change Coffee Type ---------- \n";
12 $cup->setCoffeeType('Instant');
13 print "Type of coffee: $cup->{'Coffee'} \n";

The initial values for the 'Bean' and 'Coffee' indexes in the anonymous hash for the object are printed first. The member functions are called to set the values to different names and are printed out.

Here is the output of the script:

-------------------- Initial values ------------
Coffee: Instant
Bean: Colombian

-------------------- Change Bean Type ----------
Set bean to Mixed
Bean Type is now Mixed

------------------ Change Coffee Type ----------
Set coffee type to Instant
Type of coffee: Instant

Methods can have several types of arguments. It's how you process the arguments that counts. For example, you can add the method shown in Listing 5.15 to the Coffee.pm module.


Listing 5.15. Variable-length lists of parameters.
1 sub makeCup {
2      my ($class, $cream, $sugar, $dope) = @_;
3      print "\n================================== \n";
4      print "Making a cup \n";
5      print "Add cream \n" if ($cream);
6      print "Add $sugar sugar cubes\n" if ($sugar);
7      print "Making some really addictive coffee ;-) \n" if ($dope);
8      print "================================== \n";
9 }

This function takes three arguments, but it processes them only if it sees them. To test this functionality, consider the Perl code shown in Listing 5.16.


Listing 5.16. Testing variable length lists.
 1    #!/usr/bin/perl
 2    push (@Inc,'pwd');
 3    use Coffee;
 4    $cup = new Coffee;
 5    #
 6    #  With no parameters
 7    #
 8    print "\n Calling  with no parameters: \n";
 9    $cup->makeCup;
10    #
11    #  With one parameter
12    #
13    print "\n Calling  with one parameter: \n";
14    $cup->makeCup('1');
15    #
16    #  With two parameters
17    #
18    print "\n Calling  with two parameters: \n";
19    $cup->makeCup(1,'2');
20    #
21    #  With all three parameters
22    #
23    print "\n Calling  with three parameters: \n";
24    $cup->makeCup('1',3,'1');

Line 9 calls the function with no parameters. In Line 14, the call is with one parameter. The parameters are passed either as strings or integers-something this particular method does not care about (see lines 19 and 24). However, some methods you write in the future may require this distinction.

Here's the output from this program:

 Calling  with no parameters:

==================================
Making a cup
==================================

 Calling  with one parameter:

==================================
Making a cup
Add cream
==================================

 Calling  with two parameters:

==================================
Making a cup
Add cream
Add 2 sugar cubes
==================================

 Calling with three parameters:

==================================
Making a cup
Add cream
Add 3 sugar cubes
Making some really addictive coffee ;-)
==================================

In any event, you can have default values in the function to set if the expected parameter is not passed in. Thus, the behavior of the method can be different depending on the number of arguments you pass into it.

Overriding Methods

Inheriting functionality from another class is beneficial in that you can get all the exported functionality of the base class in your new class. To see an example of how this works, let's add a function called printType in the Bean.pm class. Here's the subroutine:

sub printType {
    my $class =  shift @_;
    print "The type of Bean is $class->{'Bean'} \n";
}

Do not forget to update the @EXPORT array by adding the name of the function to export. The new statement should look like this:

@EXPORT = qw(setBeanType, printType, printType);

Next, call the printType function. The following three lines show three ways to call this function:

$cup->Coffee::printType();
$cup->printType();
$cup->Bean::printType();

The output from all three lines is the same:

The type of Bean is Mixed
The type of Bean is Mixed
The type of Bean is Mixed

Why is this so? Because there is no printType() function in the inheriting class, the printType() function in the base class is used instead. Naturally, if you want your own class to have its own printType function, you would define its own printType function.

In the Coffee.pm file, you would add the following lines to the end of the file:

#
# This routine prints the type of $class->{'Coffee'}
#
sub printType {
    my $class =  shift @_;
    print "The type of Coffee is $class->{'Coffee'} \n";
}

@EXPORT would also have to be modified to work with this function:

@EXPORT = qw(setImports, declareMain, closeMain, printType);

The output from the three lines now looks like this:

The type of Coffee is Instant
The type of Coffee is Instant
The type of Bean is Mixed

Now the base class function is called only when the Bean:: override is given. In the other cases, only the inherited class function is called.

What if you do not know what the base class name is or even where the name is defined. In this case, you can use the SUPER:: pseudoclass reserved word. Using the SUPER:: override allows you to call an overridden superclass method without actually knowing where that method is defined. The SUPER:: construct is meaningful only within the class.

If you're trying to control where the method search begins and you're executing in the class itself, you can use the SUPER:: pseudoclass, which says to start looking in your base class's @ISA list without having to explicitly name it.

$this->SUPER::function( ... argument list ... );

Therefore, instead of Bean::, you can use SUPER::. The call to the function printType() becomes this:

$cup->SUPER::printType();

Here's the output:

The type of Bean is Mixed

A Few Comments About Classes and Objects in Perl

One advertised strength of object-oriented languages is the ease with which new code can use old code. Packages and modules in Perl provide a great deal of data encapsulation. You are never really guaranteeing that a class inheriting your code will not attempt to access your class variables directly. They can if they really want to. However, this type of procedure is considered bad practice, and shame on you if you do it.

When writing a package, you should ensure that everything a method needs is available via the object or is passed as a parameter to the method. From within the package, access any global variables only through references passed in via methods.

For static or global data to be used by the methods, you have to define the context of the data in the base class using the local() construct. The subclass will then call the base class to get the data for it. On occasion, a subclass may want to override that data and replace it with new data. When this happens, the superclass may not know how to find the new copy of the data. In such cases, it's best to define a reference to the data and then have all base classes and subclasses modify the variable via that reference.

Finally, you'll see references to objects and classes like this:

use Coffee::Bean;

This code is interpreted to mean "look for Bean.pm in the Coffee subdirectory in all the directories in the @Inc array." So, if you were to move Bean.pm into the ./Coffee directory, all the previous examples would work with the new use statement. The advantage to this approach is that you have one file for the parent class in one directory and the files for each base class in their own sub-directories. It helps keep code organized. Therefore, to have a statement like this:

use Another::Sub::Menu;

you would see a directory subtree like this:

./Another/Sub/Menu.pm

Let's look at an example of a simple portfolio manager class called Invest.pm. There are two subclasses derived from it that manage the type of funds. The three files are shown in Listings 5.17, 5.18, and 5.19. The test code to use these modules is shown in Listing 5.20. The Invest.pm file is placed in the current directory, and the Stock.pm and Fund.pm files are placed in the Invest subdirectory.


Listing 5.17. The ./Invest.pm file.
 1 package Invest;
 2
 3 require Exporter;
 4 @ISA = (Exporter);
 5
 6 =head1 NAME
 7
 8 Letter - Sample module to simulate Bond behaviour
 9
10 =head1 SYNOPSIS
11
12     use Invest;
13     use Invest::Fund;
14     use Invest::Stock;
15
16     $port = new Invest::new();
17
18      $i1 = Invest::Fund('symbol' => 'twcux');
19      $i2 = Invest::Stock('symbol' => 'INTC');
20      $i3 = Invest::Stock('symbol' => 'MSFT');
21
22      $port->Invest::AddItem($i1);
23      $port->Invest::AddItem($i2);
24      $port->Invest::AddItem($i3);
25
26      $port->ShowPortfolio();
27
28 =head1 DESCRIPTION
29
30 This module provides a short example of generating a letter for a
31 friendly neighborbood loan shark.
32
33 The code begins after the "cut" statement.
34 =cut
35
36 @EXPORT = qw( new, AddItem, ShowPortfolio, PrintMe);
37
38 @portfolio = ();
39 $portIndex = 0;
40
41 sub Invest::new {
42         my $this = shift;
43         my $class = ref($this) || $this;
44         my $self = {};
45         bless $self, $class;
46      $portIndex = 0;
47
48      printf "\n Start portfolio";
49         return $self;
50 }
51
52 sub Invest::AddItem {
53      my ($type,$stock) = @_;
54      $portfolio[$portIndex] = $stock;
55      $portIndex++;
56 }
57
58 sub Invest::ShowPortfolio  {
59      my $i;
60      printf "\n Our Portfolio is:";
61      foreach $i (@portfolio) {
62           print "\n ".  $i->{'shares'} . " shares of " . $i->{'symbol'};
63      }
64      print "\n";
65 }
66
67 sub PrintMe {
68      my $this = shift;
69      print "\n Class : $$this";
70 }
71 1;


Listing 5.18. The ./Invest/Stock.pm file.
 1 package Invest::Stock;
 2
 3 require Exporter;
 4 @ISA = (Exporter);
 5 @EXPORT = qw( new );
 6
 7 sub new {
 8         my $this = shift;
 9         my $class = ref($this) || $this;
10         my $self = {};
11      my %parm = @_;
12
13         bless $self, $class;
14      $self->{'symbol'} = $parm{'symbol'};
15      $self->{'shares'} = $parm{'shares'};
16      printf "\n New stock $parm{'symbol'} added";
17         return $self;
18 }
19
20 1;


Listing 5.19. The ./Invest/Fund.pm file.
 1 package Invest::Fund;
 2
 3 require Exporter;
 4 @ISA = (Exporter,Invest);
 5
 6 @EXPORT = qw( new );
 7
 8 sub new {
 9         my $this = shift;
10         my $class = ref($this) || $this;
11         # my $self = {};
12         my $self = Invest::new();
13      my %parm = @_;
14
15         bless $self, $class;
16      $self->{'symbol'} = $parm{'symbol'};
17      $self->{'shares'} = $parm{'shares'};
18      printf "\n new mutual fund $parm{'symbol'} added";
19         return $self;
20 }
21
22 1;


Listing 5.20. Using the Invest, Fund, and Stock files.
 1 #!/usr/bin/perl
 2
 3 push(@Inc,'pwd');
 4
 5 use Invest;
 6 use Invest::Fund;
 7 use Invest::Stock;
 8
 9 $port = new Invest;
10
11 $i1 = new Invest::Fund('symbol' => 'TWCUX', 'shares' => '100');
12 $i2 = new Invest::Fund('symbol' => 'FIXLL', 'shares' => '200');
13
14 $i3 = new Invest::Stock('symbol' => 'INTC', 'shares' => '400');
15 $i4 = new Invest::Stock('symbol' => 'MSFT', 'shares' => '200');
16
17 print "\n";
18 $port->Invest::AddItem($i1);
19 $port->Invest::AddItem($i2);
20 $port->Invest::AddItem($i3);
21 $port->Invest::AddItem($i4);
22 print "\n";
23
24 $port->ShowPortfolio();

Summary

This chapter has provided a brief introduction to object-oriented programming in Perl. Perl provides the OOP features of data encapsulation and inheritance using modules and packages. A class in Perl is simply a package. This class package for a class provides all the methods for objects created for the class.

An object is simply a reference to data that knows to which class it belongs. A method in a class is simply a subroutine. The only catch with writing such methods is that the name of the class is always the first argument in the method.

The bless() function is used to tie a reference to a class name. The bless() function is called in the constructor function new() to create an object and then connect the reference to the object with the name of the class.

In inheritance, the base class is the class from which methods (and data) are inherited. The base class is also called the superclass. The class that inherits these items from the superclass is called the subclass. Multiple inheritance is allowed in Perl. Data inheritance is the programmers' responsibility with the use of references. The subclass is allowed to know things about its immediate superclass, and the superclass is allowed to know nothing about a subclass. Subclasses exist as .pm files in subdirectories under the superclass directory name.

Chapter 6

Binding Variables to Objects


CONTENTS


This chapter is dedicated to describing how the tie() function works in Perl. The tie() function enables you to create variables that are tied to specific methods called when a variable is written to or read from. Using the tie() function, you can eliminate the need for calling methods designed specifically for certain types of variables.

The tie() Function

The tie() function is used to bind a variable to an object class. Here's the syntax:

tie ($variable,$classname,@list);

The tie() function binds a variable to a class that then provides the methods for the variable. The $variable has to be set to the name of the variable to be tied. The $classname is the name of the class implementing objects to which you want the variable tied. The @list is the list of arguments that may be required for this variable's class methods.

The object returned by the tie() function is just like the new() function for an object. You can use the returned value from the tie() function to access other methods in the class you just tied the object to.

Once you tie a variable to a class, the class's behavior is reflected in the way you access the variable. Each of the methods for the type of object is called when the type of action for the variable is seen by Perl. Three types of objects can be tied to classes: scalars, arrays, and associative arrays.

It's often necessary to disassociate an object from a class. This is done with the use of the untie() function. Simply call the untie($object) function and you're done. The untie() function works whether you are tied to a scalar, array, or hash object. The next three sections illustrate how to use tie() on each of these objects.

Tying Scalars

A class implementing a scalar object that can be tied to must implement these four methods:

  • TIESCALAR classname, LIST
  • FETch this
  • STORE this, value
  • DESTROY this

Think of these methods as events for the class. When a variable is first tied to a class, the TIESCALAR method is called. Every time the tied variable is read from, the value from the FETch method is returned. When the tied variable is assigned to, the STORE method is called. Finally, when the tied variable loses scope, the DESTROY method is called.

Listing 6.1 contains a script that uses the tie() function for the Knot class.


Listing 6.1. Using the tie() function.
 1 #!/usr/bin/perl
 2
 3 push(@Inc,".");
 4 use Knot;
 5
 6 #
 7 # From now on the variable $currentTime
 8 # will behave as defined in functions in Knot.pm
 9 #
10 tie $currentTime, 'Knot';
11
12 #
13 # Test the FETch method
14 #
15 $x = $currentTime;
16 print " x= $x\n";
17 print " current = $currentTime\n";
18
19 #
20 # Test the STORE method
21 #
22 # In Knot.pm we have defined the $currentTime
23 # variable to behave as a readonly object.
24 # The following message will bail out with an error message.
25 #
26 $currentTime = $x;
27
28
29 #
30 # As soon as we drop out of the script here, the DESTROY
31 # method will be called on $currentTime.
32 #

Let's examine the code in Listing 6.1 to see where the methods for Knot.pm are invoked. In line 3, the address of the current directory is added to search for included modules. In line 4, the Knot.pm module is requested with the use statement. The Knot.pm file contains the module code for the class that allows variables to be tied.

In line 10, the variable $currentTime is tied to the class Knot. The TIESCALAR function in Knot.pm is called at this point. There are no additional arguments to be passed to the TIESCALAR function, so only two parameters, the variable number and the class name, are sent.

In line 15, the $currentTime variable is read from and the value of $currentTime is assigned to $x. Instead of treating $currentTime as a normal variable, Perl uses the FETch method of the tied class. The FETch method returns the current date in this example. You can write your own function. In line 17, the $currentTime variable is accessed again. This time, the FETch method is called again.

The program must not attempt to assign a value to the $currentTime variable. See line 26 in Listing 6.1. The Knot.pm module is implemented to allow only read-only variables; therefore, the FETch function will print an error message when the code at line 26 is executed.

Finally, the DESTROY method is called when the $currentTime variable is destroyed. The destruction is done automatically by Perl when the $currentTime variable goes out of scope. In this example, the DESTROY method simply prints an error message.

Here is the output from Listing 6.1.

 x= Sat Jun  1 12:54:25 CDT 1996

 current = Sat Jun  1 12:54:25 CDT 1996

Hey dude! We are making this readonly!
 at Knot.pm line 54
    Knot::STORE called at ./6_1.pl line 26

Knot::  unknotted!

Now let's look at the Knot.pm file in Listing 6.2.


Listing 6.2. The Knot.pm file.
 1 #!/usr/bin/perl
 2
 3 # ----------------------------------------------------------------
 4 # Sample file that shows how to tie variables to classes.
 5 #
 6 # This library is hereby placed in the public domain. Copy freely
 7 # as long as you give me credit for it!
 8 # Kamran Husain. khusain@ikra.com
 9 # ----------------------------------------------------------------
10 package Knot;
11 use Carp;
12 use strict;
13
14 #
15 # TIESCALAR classname, argument-list-here
16 #     This is the constructor for the class.  It returns a reference
17 #     to a new object for the class name.
18 #
19 sub TIESCALAR {
20         my $class = shift;
21         my $this = {};
22     #
23     # print "\n $class, $this";
24     #
25         return bless \$this, $class;
26         }
27
28 #
29 # FETch this
30 # The FETch method will be triggered every time the tied variable
31 # is accessed.
32 # The only argument to this function is the object itself.
33 # In this case, we just return the date.
34 #
35 sub FETch {
36           my $self = shift;
37           confess "wrong type" unless ref $self;
38           croak "usage error" if @_;
39           my $override;
40           $override = 'date';
41           return $override;
42           }
43
44 #
45 # STORE this, value
46 # This method will be triggered every time the tied variable is
47 # written to. It expects only two arguments: a reference to itself
48 # and a value that is being assigned.
49 #
50 sub STORE {
51              my $self = shift;
52              confess "wrong type" unless ref $self;
53              my $value = shift;
54          confess "Hey dude! We are making this readonly!\n";
55              return $value;
56          }
57
58 #
59 # DESTROY this
60 # This method will be triggered when the tied variable needs to be
61 # destructed. This method can be just empty for most classes since
62 # Perl's garbage collection will.
63 #
64
65 sub DESTROY {
66              my $self = shift;
67              confess "wrong type" unless ref $self;
68          print "\nKnot::  unknotted!\n";
69          }
70
71 #
72 # The obligatory ending true statement.
73 #
74 1;

Knot.pm defines the package in line 10 and imports the Carp and strict packages in lines 11 and 12, respectively. Line 74 terminates the module.

Lines 19 through 26 define the TIESCALAR function. The TIESCALAR function behaves a lot like the constructor of an object in Perl. It creates an associative array and returns a reference to this array after a call to the bless() function. (See Chapter 5, "Object-Oriented Programming in Perl," for more information on bless-ing objects.)

The FETch method starts at line 35. The FETch method is called every time the tied variable is read from. The only argument to the FETch method is a reference to the object itself. At line 37, the class type is confirmed, although it's not absolutely necessary to do this. Lines 39 through 41 return the current date and time for the value of the tied variable.

The STORE method is defined from line 50 through 56. In this case, we do not let values of the arguments that are passed in be assigned to anything because we want to make this value a read-only variable. You can easily modify this function to take some other action than what's shown in this example. The FETch method takes only two arguments: a reference to itself and a value that is being assigned. The confess() call is from within the Carp module.

The DESTROY method is called when the tied variable is destroyed. Normally, this function is empty. For this example, there is a print statement to show where the DESTROY function is called.

Tying to an Array

An array variable can be tied to a class in the same manner as a scalar can be tied to a class. The real difference is that the input parameters to the methods now need an index used to address a value in array. A class implementing an ordinary array must have these methods:

  • TIEARRAY classname, LIST
  • FETch this, key
  • STORE this, key, value
  • DESTROY this

The FETch, DESTROY, and STORE methods have the same names as those for scalars. However, the name of the constructor is different-it's called TIEARRAY. Let's define a new array type called Cuboid, which has its first five indexes provide special functions. The first three indexes are written to as the height, width, and depth of a cuboid. The next two indexes contain the volume and surface area of the cuboid and are made read-only. The rest of the array can be made into a bounded array to allow a user to store his or her own values. As soon as a value is stored in the Cuboid array, the values of items at index 3 and 4 are recalculated to provide the latest volume and surface area of a cuboid.

Listing 6.3 illustrates how to use this array.


Listing 6.3. Using the Cuboid.pm module.
 1 #!/usr/bin/perl
 2
 3 push(@Inc,".");
 4 use Cuboid;
 5
 6 tie @myCube, 'Cuboid', 3;
 7
 8 $myCube[0] = 2;
 9 $myCube[1] = 3;
10 $myCube[2] = 4;
11
12 for ($i=0; $i < 5; $i++) {
13     print " myCube[$i] = $myCube[$i] \n";
14 }
15

Here is the output of this code.

array will be 8 elements long
[STORE 2 at 0]
[STORE 3 at 1]
[STORE 4 at 2]
 myCube[0] = 2
 myCube[1] = 3
 myCube[2] = 4
 myCube[3] = 24
 myCube[4] = 52

Now let's examine the Cuboid.pm module, which is presented in Listing 6.4.


Listing 6.4. The Cuboid.pm module.
 1 # ------------------------------------------------------------
 2 package Cuboid;
 3 use Carp;
 4 use strict;
 5
 6 #
 7 # The constructor for this class.
 8 # ------------------------------------------------------------
 9 # Array[0] = ht;           read write
10 # Array[1] = wd;           read write
11 # Array[2] = dp;           read write
12 # Array[3] = volume;       read only
13 # Array[4] = surfaceArea;  read only
14 # Array[5...maxsize] = read/write values for the user;
15 # ------------------------------------------------------------
16
17 my $SACRED = 5;
18
19 sub TIEARRAY {
20
21     my $class = shift;
22     my $maxsize = shift;
23
24     #
25     # Bailout if the array is not tied correctly.
26     #
27              confess "usage: tie(\@ary, 'Cuboid', maxsize)"
28                  if @_ || $maxsize =~ /\D/;
29          $maxsize += $SACRED;
30         print "array will be $maxsize elements long\n";
31              return bless {
32                  MAXSIZE => $maxsize,
33                  ARRAY => [0,0,0,0,0],
34              }, $class;
35          }
36
37 # FETch this, index
38 # This method will be triggered every time an individual element the tied
39 # array is accessed (read). It takes one argument beyond its self
40 # reference: the index whose value we're trying to fetch.
41 #
42 sub FETch {
43            my($self,$ndx) = @_;
44            if ($ndx > $self->{MAXSIZE}) {
45                  confess "Error Out of Bounds: $ndx > $self->{MAXSIZE}";
46                }
47            return $self->{ARRAY}[$ndx];
48          }
49
50 # STORE this, index, value
51 # This method will be called whenever an element in the tied array
52 # is written to. It takes three arguments: a reference to itself,
53 # the index to store stuff at, and the value to store at the index.
54 #
55 # The items at [3] and [4] are not allowed to be written to.
56 #
57 sub STORE {
58            my($self, $ndx, $value) = @_;
59            print "[STORE $value at $ndx]\n";
60            if ($ndx > $self->{MAXSIZE} ) {
61              confess "Error Out Of Bounds: $ndx > $self->{MAXSIZE}";
62            }
63        if (($ndx == 3) || ( $ndx == 4))  {
64              confess "Cannot store in read only area: $ndx";
65            }
66            $self->{ARRAY}[$ndx] = $value;
67           $self->{ARRAY}[3]  =
68         ($self->{ARRAY}[0] * $self->{ARRAY}[1] * $self->{ARRAY}[2]) ;
69           $self->{ARRAY}[4] =
70         ($self->{ARRAY}[0] * $self->{ARRAY}[1])  +
71         ($self->{ARRAY}[1] * $self->{ARRAY}[2])  +
72         ($self->{ARRAY}[0] * $self->{ARRAY}[2]);
73
74           $self->{ARRAY}[4] *=  2;
75
76            return $self->{ARRAY}[$ndx] ;
77          }
78
79 # DESTROY
80
81 sub DESTROY { }
82
83 1;

The Cuboid package is started at line 2 and ends at line 83 with a required statement. The package uses Carp and strict packages at lines 3 and 4, respectively. Lines 9 through 14 describe the structure of this object. The size of the sacrosanct indexes is set at line 17.

The TIEARRAY constructor starts at line 19. Note how the constructor takes two parameters: one for the class and one for the maximum size the array can take. Line 27 contains some error-checking routines using the confess() function in the Carp module. The maximum size is adjusted for the sacrosanct indexes, and an appropriate message is printed out at line 30.

A reference to the newly created object is returned in lines 31 through 34. Note how the array is initialized and the member MAXSIZE set at line 33.

The FETch method for the array behaves in the same manner as for a scalar. The FETch method is called every time an individual element in the tied array is read from. The FETch method takes two arguments: a reference to itself and the index whose value is being fetched. Look at line 43 where these two values are assigned to $self and $ndx. Bounds are checked at line 44, and we bail out in case of an error at line 45. If the bounds are correct, the value is returned at the request index via code at line 47.

The STORE method starts at line 50 and takes three arguments: a reference to itself, the index at which to store, and the value to store. In the STORE method, the extracted values are printed at lines 58 and 59. Bounds checking is done at line 60 with a bailout at line 61 in case of errors. Lines 63 through 65 do not permit storing values at index 3 or 4.

At line 66 the input value is assigned. At this point, you could make the code faster by checking whether only indexes 0 to 1 are modified, but you'll need to do that on your own. The area and volume results are stored at index 3 and 4. The assigned value is returned in line 76.

The DESTROY method is just a dummy function that doesn't do much. You really don't need to have a DESTROY method. The one shown here is for illustration only.

Caution
Only the read and write operations of an array are affected by the tie() operation. The functions $#, push(), pop(), and so on of an array are not affected by the tie() function.

Tying to an Associative Array

An associative array is more complicated to implement than either a scalar or an array because of the extra functions that have to be added to it. A class that can be tied to an associative array should have the following methods:

  • TIEHASH classname, LIST
  • FETch this, key
  • STORE this, key, value
  • DELETE this, key
  • EXISTS this, key
  • FIRSTKEY this
  • NEXTKEY this, lastkey
  • DESTROY this

The next example is a simpler version of the one presented by Tom Christiansen in his perltie.html file, which is available on www.perl.com. The example presented by Tom is a bit dangerous to use because you can wipe out your .profile, .history, .elm, .term, and other "dot" files. I wrote this example to map the text versions of the perltie.html file into a hashed array to generate a table of contents for this book based on the first line of each heading. The example presented here simply lists the first line of each heading. There is no reason why you cannot print other information in the file, however.

Listing 6.5 shows the script that uses the tie() function on a hash. The module used for mirroring the contents of each chapter file is called Mirror.pm. It's used in line 4 of this code.


Listing 6.5. Using the tie() function on a hash.
 1 #!/usr/bin/perl
 2
 3 push(@Inc,".");
 4 use Mirror;
 5
 6 tie %chapters, 'Mirror', chaps;
 7
 8 foreach $k (keys %chapters) {
 9     print "$k is  $chapters{$k}";
10     }
11 print " End of script\n"

Here is sample input and output from the code in Listing 6.6.

$ test3
ch25.txt is  (a) Chapter 25
ch08.txt is  (a) Chapter 8
ch28.txt is  (a) Chapte 28
ch02.txt is  (a) Chapter 2
ch29.txt is  (a) Chapter 29
ch12.txt is  (a) Chapter 12
ch15.txt is  (a) Chapter 15
ch06.txt is  (a) Chapter 6

 All done!
 End of script
$

What you see above is the first line of every file listed in the hash instead of the filename! Had we not tied our own functions to the %chapters hash, we would be seeing the names of the file, such as ch29.txt, ch12.txt, and so forth. Instead of these names, when each element in the %chapters hash is accessed, our own function is called and prints out the first line in every file.

Of course, in your code, you would probably be using different functions to implement your own strategy and functions with a hash. For example, each access to a hash is tied to a record in a file. The function behind the access to each element in the hash would somehow take the record and format it to suit your needs. The example shown here in Listing 6.6 is simple enough to modify to fit into your own particular application.

Let's now look at how the code for Mirror.pm is written. By examining this code, you will be able to see how each function is defined for each type of action that you can tie a function to.


Listing 6.6. The Mirror.pm file.
  1 # This file is hereby put in the public domain. Copy freely.
  2 # Just give me some credit for it ;-) if you like. Kamran.
  3 package Mirror;
  4 use Carp;
  5
  6 # TIEHASH classname, $list
  7 #  This is the constructor for the class. That means it is expected to
  8 #  return a blessed reference of a new object.
  9 #
 10 sub TIEHASH {
 11              my $self = shift;
 12              my $dir  = shift || 'pwd';
 13              my $node = {
 14                  HOME    => $dir,
 15                  LIST    => {},
 16
 17              };
 18          #
 19          # print " Directory = $dir\n";
 20          #
 21              opendir(DIR, $dir) || croak "Cannot open $dir: $!";
 22              foreach $item ( grep /\.*txt/, readdir(DIR)) {
 23                  $node->{LIST}{$item} = 'head -1 $dir/$item';
 24          print "$node->{LIST}{$item} \n" if $debug;
 25              }
 26              closedir DIR;
 27              return bless $node, $self;
 28          }
 29
 30
 31 # FETch this, key
 32 # This method is called whenever an element in the tied hash is
 33 # being read. It takes two arguments: a reference to itself and the
 34 # key whose value is being asked.
 35
 36 sub FETch {
 37              my $self = shift;
 38          my $fname = shift;
 39              my $dir = $self->{HOME};
 40              my $file = "$dir/$fname";
 41              unless (exists $self->{LIST}->{$fname} || -f $file) {
 42                  carp "ERROR: no such file $fname ";
 43                  return undef;
 44              }
 45              if (defined $self->{LIST}->{$fname}) {
 46                  return $self->{LIST}->{$fname};
 47              } else {
 48                  return $self->{LIST}->{$fname} = 'head -1 $file';
 49              }
 50
 51          }
 52
 53
 54 # STORE this, key, value
 55 #     This method is called whenever an element in the hash is
 56 #     written to.  It takes three arguments: a reference to itself, the
 57 #     index to store at, and the value to store.
 58
 59 sub STORE {
 60              my $self = shift;  # this
 61              my $fname = shift;    # the key
 62              my $value = shift; # the value
 63              my $dir = $self->{HOME};
 64              my $file = "$dir/$fname";
 65          print "Storing $fname, $value $file \n";
 66         if ($value eq "done") {
 67             print "Storing $fname, $value $file \n";
 68                      return $self->{LIST}->{$fname} = 'head -1 $file';
 69             }
 70         else    {
 71                      return $self->{LIST}->{$fname} =  $value;
 72             }
 73          }
 74
 75 # DELETE this, key
 76 #
 77 #    This method is called when an item is deleted from the hash.
 78 #
 79 sub DELETE   {
 80
 81              my $self = shift;
 82              my $fname = shift;
 83              my $dir = $self->{HOME};
 84              my $file = "$dir/$fname";
 85              delete $self->{LIST}->{$fname};
 86          }
 87
 88 # CLEAR this
 89 #     This method is called when the whole hash is set to an empty list.
 90 #
 91 sub CLEAR    {
 92              my $self = shift;
 93              my $fname;
 94              foreach $fname ( keys %{$self->{LIST}}) {
 95                  $self->DELETE($fname);
 96              }
 97          }
 98
 99 #
100 # EXISTS this, key
101 #     This method is called when the exists() is called on a hash.
102 #
103 sub EXISTS   {
104              my $self = shift;
105              my $dir = $self->{HOME};
106              my $fname = shift;
107              my $file = "$dir/$fname";
108              return exists $self->{LIST}->{$file};
109          }
110
111 # FIRSTKEY this
112 #     This method is called when you start to iterate a list.
113 #
114 sub FIRSTKEY {
115              my $self = shift;
116              my $x  = keys %{$self->{LIST}};
117              each %{$self->{LIST}}
118          }
119
120
121 #
122 # NEXTKEY this, lastkey
123 #      This method is called during a keys() or each() iteration. The
124 # first argument is the object itself. The second argument is the last
125 # key that was accessed.
126
127 sub NEXTKEY  {
128              my $self = shift;
129              return each %{ $self->{LIST} }
130          }
131
132
133 #
134 # DESTROY  the infamous epitaph!
135 #
136 sub DESTROY  { print "\n All done!"; }
137
138 1;

The TIEHASH function definition begins at line 10. The constructor takes two values as arguments: the first is the name of the class, and the second is an optional directory to work in. If the second parameter is not specified, the current working directory is used. The $node hash is used to store two parameters: HOME for the working directory and LIST for the list of items in this hash.

At line 21, the required information is collected for the hash and is stored away in LIST at line 23. The debug statement at line 24 is a very valuable tool while debugging. At line 27, the class is blessed and the reference to the $node is returned.

The FETch method is called whenever an element in the tied hash is being read. It takes two arguments: a reference to itself and the key whose value is being asked for. See lines 37 and 38, where the two parameters are extracted. It would be prudent to add some error correction here lest we look at the wrong path-this is done at line 41. At line 45 the returned value as cached in the init stages is returned. Had a new item to the hash been added with an undefined value, the code at line 45 would assign it a value.

The assignment-handling function is the STORE function. This function is called whenever an element in the hash is written to. It takes three arguments: a reference to itself, the index at which to store, and the value to store. The three arguments are extracted at lines 60 through 62. The third argument may be null, in which case the head of the file in the HOME directory is used. (Look at lines 66 through 72.)

The DELETE function is called when an item is deleted from the hash. The function is defined at line 81. There are two arguments to this function: a reference to the object and the index to remove. The delete() function is called to remove the indexed item from the LIST hash at
line 85.

The CLEAR function is called when the whole hash is removed, possibly by assigning an empty list to it. (The tied array has no such callback!) There is only one argument to this function, and that is a reference to itself. The CLEAR function is set to call the DELETE function in this example (see line 95). This call saves some code, but we could have just as easily used the delete() function in the same manner as DELETE.

The EXISTS function is called to check whether an item exists in a hash (see line 103). There are two arguments to this function: a reference to the object and the index to remove. It simply re-creates the key and uses this key to return a value from within the LIST hash.

The FIRSTKEY and NEXTKEY methods are called when the each() and keys() methods are called. The FIRSTKEY method is called when you start to iterate a list. The NEXTKEY method gets called during a keys() or each() iteration. The first argument to NEXTKEY is a reference to the object itself. The second argument is the last that was accessed.

For the file in Listing 6.6, you should now be able to derive your own classes for mapping hashes to functions. The tie() function, when used with hashes, provides more flexibility in defining methods than what the tie() function for arrays provides. However, using the hash is more complex than the array function because you have to define more methods with the hashing method.

There is supposedly some relief, though, with the use of the TieHash module provided with the Perl distribution. The TieHash module has predefined methods for you to pick and choose whatever functions you want to implement, and the rest are defaulted. There is a man page for the module in the Perl distribution, but it did not provide much information on how to actually use the module. Perhaps we'll see more documentation on this tool in later releases.

For More Information

This chapter has provided only some basic information on the use of the tie() function. There is an excellent document, called perltie.html, by Tom Christiansen that is available at most of the Perl archive sites. The perltie.html document has more detailed information on how to use the tie() functions. The tie() function is also used in modules distributed with Perl. Two interesting modules to look at are the Config and DBM file modules, which show interesting uses of the tie() function. The DBM file modules provide detailed examples of mapping records to disk with the use of tie() functions on hashes.

Summary

This chapter has provided the basic information on how to use the tie() function to provide an association between a Perl variable and executable functions. Scalars, arrays, and hashes can be associated with methods having special names such as FETch, STORE, and so on. By designing classes that provide methods these names, you can provide extra functionality to map scalar, array, or hash objects to other objects, processes, or disk files.

Chapter 7

String and Patterns


CONTENTS


This chapter covers some of the most important features of Perl: its string- and pattern-manipulation routines. Most of the Perl programming you do will involve strings in one form or another. It's very important to learn how to use the string search and replace operations efficiently in Perl. An inefficient search pattern can slow a script down to a crawl.

Basic String Operations

Let's first start with the basic operations in Perl for working with strings. Some of this chapter will be a rehash of what was covered in Chapters 2 through 5. Now it's time to cover the topic in detail given the background information in these chapters. I cover the following string utility functions in this chapter:

  • The chop() and length() functions
  • Handling the case in strings
  • Joining strings together
  • Printing formatted numbers
  • The substr() function
  • Special characters in Perl pattern searches
  • Shortcuts for special words in Perl
  • The quotemeta() function
  • Specifying the number of matches
  • Specifying more than one choice in a pattern
  • Searching a string with more than one pattern

The chop() and length() Functions

To find the length of a string, you can call the length($str) function, which returns the number of characters in a string. The chop function removes the last character in a string. This is useful in removing the carriage return from a user-entered string. For an example, see Listing 7.1.


Listing 7.1. Using length() and chop().
 1 #!/usr/bin/perl
 2
 3 $input = <STDIN> ;
 4
 5 $len = length($input);
 6 print "\nLength = $len of $input before the chopping \n";
 7 chop($input);
 8 $len = length($input);
 9 print "\nLength = $len of $input after the chopping \n";
10
11 $ 7_1.pl
12 Hello! I am a Test!
13
14 Length = 20 of Hello! I am a Test! before the chopping
15
16 Length = 19 of Hello! I am a Test! after the chopping

Handling the Case in Strings

Perl provides four functions to make your life easier when handling the case of characters in a string:

lc($string) Converts a string to lowercase
uc($string) Converts a string to uppercase
lcfirst($string) Converts the first character of a string to uppercase
ucfirst($string) Converts the first character of a string to lowercase

Listing 7.2 presents sample code that illustrates how these functions work.


Listing 7.2. Using the case functions.
 1 #!/usr/bin/perl
 2 $name = "tis a test OF THE sYSTEm" ;
 3 $ucase = uc($name);
 4 $lcase = lc($name);
 5
 6 print "$name  \n";
 7 print "$ucase \n";
 8 print "$lcase \n";
 9
10 $nice = lcfirst($ucase);
11 print "lcfirst on $ucase = \n\t $nice \n";
12
13 $crooked = ucfirst($lcase);
14 print "ucfirst on $lcase = \n\t$crooked \n";

Here is the output from Listing 7.2.

tis a test OF THE sYSTEm
TIS A TEST OF THE SYSTEM
tis a test of the system
lcfirst on TIS A TEST OF THE SYSTEM =
     tIS A TEST OF THE SYSTEM
ucfirst on tis a test of the system =
    Tis a test of the system

Joining Strings Together

The dot operator is great for connecting strings. For example, the following statements will print John Hancock plus a new line:

$first="John";
$last ="Hancock";
print $first . " " . $last . "\n" ;

To print the elements of an array in a string, you can use the join function to create one long string. Here's the syntax for the join function:

join ($joinstr, @list);

The $joinstr variable is the string to use when connecting the elements of @list together. Refer to the following statements to see how to create a line for the /etc/passwd file:

@list = ("khusain","sdfsdew422dxd","501","100",
        "Kamran Husain","/home/khusain","/bin/bash");
$passwdEntry = join (":", @list);

Printing Formatted Numbers

Perl provides two functions, printf and sprintf, that behave like the printf family of functions in the C programming language. The printf function sends its output to the current file. The sprintf function takes at least two arguments, a string and a format string, followed by any other arguments. The sprintf function sends the formatted output to the string in the first argument. For example, the string $answer contains the result of the sprintf statement:

$a= 10;
$b= 10;
sprintf $answer, "%d + %d is %d and in %x", $a,$b,$a+$b,$a+$b;

Finding Substrings

A quick way to find the location of a substring in a string is to use the index function, which searches from left to right. To search from right to left, use the rindex function. Here's the syntax for these functions:

position = index ($string, $substring, [$offset]);
position = rindex ($string, $substring, [$offset]);

$string is the character string to search the $substring in. The $offset parameter is optional and defaults to the start of the string when not provided to the function. Listing 7.3 is a function that looks for the position of the word two in each line of an input file (just like grep would except that we print out the position of the character, too).


Listing 7.3. Using the index and rindex functions.
 1 #!/usr/bin/perl
 2
 3 %finds = ();
 4 $line  = 0;
 5
 6 print "\n Enter word to search for:";
 7 $word = <STDIN>;
 8 chop ($word);
 9
10 print "\n Enter file to search in:";
11 $fname = <STDIN>;
12 chop($fname);
13 open (IFILE, $fname) || die "Cannot open $fname $!\n";
14
15 while (<IFILE>) {
16     $position = index($_,$word);
17     if ($position >= 0) {
18         $finds{"$line"} = $position;
19     }
20     $line++;
21 }
22 close IFILE;
23 while(($key,$value) = each(%finds)) {
24     print " Line $key : $value \n";
25     }

This program searches for the first occurrence of the word in the file specified by the user. Each line in the file is searched for the pattern. If the pattern is found, the program prints the location of the pattern at each line and column number. The first while loop searches in a given file, and the second while lists all the items collected in the %finds associative array.

Listing 7.3 finds only the first occurrence of a pattern in a line. You can use the offset argument to search for a pattern other than from the start. The offset argument is specified from 0 and up. Listing 7.4 presents another search program that finds more than one occurrence on a line.


Listing 7.4. Searching more than once.
 1 #!/usr/bin/perl
 2
 3 %finds = ();
 4 $fname = "news.txt";
 5 $word = "the";
 6 open (IFILE, $fname) || die "Cannot open $fname $!\n";
 7
 8 print "Search for :$word: \n";
 9 while (<IFILE>) {
10     $thispos = 0;
11     $nextpos = 0;
12     while (1) {
13         $nextpos = index($_,$word,$thispos);
14         last if ($nextpos == -1);
15         $count++;
16         $finds{"$count"} = $nextpos;
17         $thispos = $nextpos + 1;
18         }
19 }
20 close IFILE;
21 print "\nLn : Column";
22 while(($key,$value) = each(%finds)) {
23     print " $key : $value \n";
24     }

The output of Listing 7.4 on a sample file would be something like this:

Ln : Column
 1 : 31
 2 : 54
 3 : 38
 4 : 53

The substr Function

The substr function is used to extract parts of a string from other strings. Here's the syntax for this function:

substr ($master, $offset, $length);

$master is the string from which a substring is to be copied, starting at the index specified at $offset and up to $length characters. Listing 7.5 illustrates the use of this function.


Listing 7.5. Using the substr function.
 1 #!/usr/bin/perl
 2 #  Check out the substr function.
 3 #
 4 $quote = "No man but a blockhead ever wrote except for money";
 5 #  quote by Samuel Johnson
 6
 7 $sub[0] = substr ($quote, 9, 6);
 8
 9 $name = "blockhead" ;
10 $pos = index($quote,$name);
11 $len = length($name);
12 $sub[1] = substr ($quote, $pos, $len);
13 $pos = index($quote,"wrote");
14 $sub[2] = substr ($quote, $pos, 6);
15
16 for ($i = 0; $i < 3; $i++) {
17     print "\$sub[$i] is \"" .  $sub[$i] . "\" \n";
18 }
19
20 #
21 # To replace a string, let's try substr on the left-hand side.
22 #
23 # Replace the words 'a blockhead', with the words 'an altruist'.
24 # (Sorry Sam.)
25 $name = "a blockhead" ;
26 $pos = index($quote,$name);
27 $len = length($name);
28
29 substr ($quote, $pos, $len) = "an altruist";
30 print "After substr = $quote \n";

The output from the code in Listing 7.5 is as follows:

$sub[0] is "t a bl"
$sub[1] is "blockhead"
$sub[2] is "wrote "

After substr = No man but an altruist ever wrote except for money

You can see how the substr operator can be used to extract values from another string. Basically, you tell the substr function how many characters you need and from where, and the chopped off portion is returned from the function.

The substr function can also be used to make substitutions within a string. In this listing, the words "a blockhead" are replaced by "an altruist". The part of the string specified by substr is replaced by the value appearing to the right of the assignment operator. Here's the syntax for these calls to substr:

substr ($master, $offset, $length) = $newStr;

$master must be a string that can be written to (that is, not a tied variable- >see Chapter 6, "Binding Variables to Objects," for information on using tie() on variables). $offset is where the substitution begins for up to $length characters. The value of $offset + $length must be less than the existing length of the string. The $newStr variable can be the empty string if you want to remove the substring at the offset. To substitute the tail-end of the string starting from the offset, do not specify the $length argument.

For example, this line:

$len = 22; substr ($quote, $pos, $len) = "an altruist";

prints the following line in the previous example:

After substr = No man but an altruist

The offset can be a negative number to specify counting from the right side of the string. For example, the following line replaces three characters at the fifth index from the right side in $quote with the word "cash":

substr($quote, -5, 3) = "cash";

The substr function is great when working with known strings that do cut and paste operations. For more general strings, you have to work with patterns that can be described using regular expressions. If you are familiar with the grep command in UNIX, you already know about regular expressions. Basically, a regular expression is a way of specifying strings like "all words beginning with the letter a" or "all strings with an xy in the middle somewhere." The next section illustrates how Perl can help make these types of search and replace patterns easier.

String Searching with Patterns

Perl enables you to match patterns within strings with the =~ operator. To see whether a string has a certain pattern in it, you use the following syntax:

$result = $variable =~ /pattern/

The value $result is true if the pattern is found in $variable. To check whether a string does not have a pattern, you have to use the !~ operator, like this:

$result = $variable !~ /pattern/

Listing 7.6 shows how to match strings literally. It prints a message if the string Apple, apple, or Orange is found, or if the strings Grape and grape are not found.


Listing 7.6. Substitution with patterns.
 1 #!/usr/bin/perl
 2
 3 $input = <STDIN> ;
 4 chop($input);
 5 print "Orange found! \n" if ( $input =~ /Orange/ );
 6 print "Apple found! \n" if (  $input =~ /[Aa]pple/ );
 7 print "Grape not found! \n" if ( $input !~ /[Gg]rape/ );

So, how did you search for apple and Apple in one statement? This involves specifying a pattern to the search string. The syntax for the =~ operator is this:

[$variable =~] [m]/PATTERN/[i][o][g]

$variable is searched for the pattern in PATTERN. The delimiter of the text being searched is a white space or an end-of-line character. The i specifies a case-insensitive search. The g is used as an iterator to search more than once on the same string. The o interpolates characters. I cover all these options shortly.

Let's look at how the patterns in PATTERN are defined. If you are already familiar with the grep utility in UNIX, you are familiar with patterns.

A character is matched for the string verbatim when placed in PATTERN. For example, /Orange/ matched the string Orange only. To match a character other than a new line you can use the dot (.) operator. For example, to match Hat or Cat, you would use the pattern:

/.at/

This also matches Bat, hat, Mat, and so on. If you just want to get Cat and Hat, you can use a character class using the square brackets ([]). For example, the pattern

/[ch]cat/

will match Cat or Hat, but not cat, hat, bat, and so on. The characters in a class are case sensitive. So to allow the lowercase versions, you would use the pattern:

/[cChH]cat/

It's cumbersome to list a lot of characters in the [] class, so the dash (-) operator can define a range of characters to use. These two statements look for a digit:

/[0-9]/
/[0123456789]/

The [] operator can be used with other items in the pattern. Consider these two sample statements, which do the same thing:

/a[0123456789]/ # matches a, followed by any digit,
/a[0-9]/ # matches a, followed by any digit,
/[a-zA-Z]/ # a letter of the alphabet.

The range [a-z] matches any lowercase letter, and the range [A-Z] matches any uppercase letter. The following pattern matches aA, bX, and so on:

/[a-z][A-Z]/

To match three or more letter matches, it would be very cumbersome to write something
like this:

/[a-zA-Z][a-zA-Z][a-zA-Z]/

This is where the special characters in Perl pattern searching come into play.

Special Characters in Perl Pattern Searches

Here is a list of all the special characters in search strings (I'll go into the detail of how they work later):

  • The . character matches one character.
  • The + character matches one or more occurrences of a character.
  • The ? character matches zero or one occurrences of a character.
  • The * character matches zero or more occurrences of a character.
  • The - character is used to specify ranges in characters.
  • The [] characters define a class of characters.
  • The ^ character matches the beginning of a line.
  • The $ character matches the end of a line.
  • The {} characters specify the number of occurrences of a character.
  • The | character is the OR operator for more than one pattern.

The plus (+) character specifies "one or more of the preceding characters." Patterns containing + always try to match as many characters they can. For example, the pattern /ka+/ matches any of these strings:

kamran        # returns "ka"
kaamran       # returns "kaa"
kaaaamran     # returns "kaaaa"

Another way to use the + operator is for matching more than one space. For example, Listing 7.7 takes an input line and splits the words into an array. Items in the array generated by this code will not include any items generated by matching more than one consecutive space. The match / +/ specifies "one or more space(s)."


Listing 7.7. Using the pattern matching + operator.
1 #!/usr/bin/perl
2 $input = <STDIN>;
3 chop ($input);
4 @words = split (/ +/, $input);
5 foreach $i (@words) {
6     print $i . "\n";
7     }

If you do not use the + sign to signify more than one space in the pattern, you'll wind up with an array item for each white space that immediately follows a white space. The pattern / / specifies the start of a new word as soon as it sees a white space. If there are two spaces together, the next white space will trigger the start of a new word. By using the + sign, you are saying "one or more white space together" is the start of a new word.

Tip
If you are going to repeatedly search one scalar variable, call the study() function on the scalar. The syntax is study ($scalar);. Only one variable can be used with study() at one time.

The asterisk (*) special character matches zero or more occurrences of any preceding character. The asterisk can also be used with the [] classes:

/9*/    # matches an empty word, 9, 99, 999, ... and so on
/79*/   # matches 7, 79, 799, 7999, ... and so on
/ab*/   # matches a, ab, abb, abbb, ... and so on

Because the asterisk matches zero or more occurrences, the pattern

/[0-9]*/

will match a number or an empty line! So do not confuse the asterisk with the plus operator. Consider this statement:

@words = split (/[\t\n ]*/, $list);

This matches zero or more occurrences of the space, newline, or tab character. What this translates to in Perl is "match every character." You'll wind up with an array of strings, each of them one character long, of the all the characters in the input line.

The ? character matches zero or one occurrence of any preceding character. For example, the following pattern will match Apple or Aple, but not Appple:

/Ap?le/

Let's look at a sample pattern that searches the use of hashes, arrays, and possibly the use of handles. The code in Listing 7.8 will be enhanced in the next two sections. For the moment, let's use the code in Listing 7.8 to see how the asterisk operator works in pattern matches.


Listing 7.8. Using the asterisk operator.
 1 #!/usr/bin/perl
 2 # We will finish this program in the next section.
 3 $scalars =  0;
 4 $hashes =  0;
 5 $arrays =  0;
 6 $handles =  0;
 7
 8 while (<STDIN>) {
 9     @words = split (/[\(\)\t ]+/);
10     foreach $token (@words) {
11     if ($token =~ /\$[_a-zA-Z][_0-9a-zA-Z]*/) {
12               # print ("$token is a legal scalar variable\n");
13         $scalars++;
14     } elsif ($token =~ /@[_a-zA-Z][_0-9a-zA-Z]*/) {
15               # print ("$token is a legal array variable\n");
16         $arrays++;
17     } elsif ($token =~ /%[_a-zA-Z][_0-9A-Z]*/) {
18               # print ("$token is a legal hash variable\n");
19         $hashes++;
20     } elsif ($token =~ /\<[A-Z][_0-9A-Z]*\>/) {
21               # print ("$token is probably a file handle\n");
22         $handles++;
23     }
24    }
25 }
26
27 print " This file used scalars $scalars times\n";
28 print " This file used arrays  $arrays  times\n";
29 print " This file used hashes $hashes times\n";
30 print " This file used handles $handles times\n";

Lines 9 and 10 split the incoming stream into words. Note how the pattern in line 9 splits words at spaces, tabs, and in between parentheses. At line 11, we are looking for a word that starts with a $, has a non-numeric character or underscore as the first character, and is followed by an alphanumeric string or underscores.

At lines 14 and 17, the same pattern is applied, with the exception of an at (@) sign and a hash (#) sign are looked for instead of a dollar ($) sign in order to search for arrays and hashes, respectively. At line 20, the file handle is assumed to a word in all caps, not starting with an underscore, but with alphanumeric characters in it.

The previous listing can get legal names if the pattern is anywhere in a word. However, we want the search to be limited to word boundaries. For example, right now the script cannot distinguish between the following three lines of input because they all match the /\$[a-zA-Z][_0-9a-zA-Z]*/ somewhere in them:

$catacomb
OBJ::$catacomb
#$catacomb#

White spaces do not include tabs, newlines, and so on. Here are the special characters to use in pattern matching to signify these characters:

\t Tab
\n Newline
\r Carriage return
\f Form feed.
\\ Backslash (\)
\Q and \E Pattern delimiters

In general, you can escape any special character in a pattern with the backslash (\). The backslash itself is escaped with another backslash. The \Q and \E characters are used in Perl to delimit the interpretation of any special characters. When the Perl interpreter sees \Q, every character following \Q is not interpreted and is used literally until the pattern terminates or Perl sees \E. Here are a few examples:

/\Q^Section$/ # match the string "^Section$" literally.
/^Section$/   # match a line with the solitary word Section in it.
/\Q^Section$/ # match a line which ends with ^Section

To further clarify where the variable begins and ends, you can use these anchors:

\A Match at beginning of string only
\Z Match at end of string only
\b Match on word boundary
\B Match inside word

Here are some examples and how they are interpreted given a string with the word hello in it somewhere:

/\Ahel/     # match only if the first three characters are "hel"
/llo\Z/     # match only if the last three characters are "llo"
/llo$/      # matches only if the last three characters are "llo"
/\Ahello\Z/ # same as /^hello$/ unless doing multiple line matching
/\bhello/   # matches "hello", not "Othello", but also matches "hello."
/\bhello/   # matches "$hello" because $ is not part of a word.
/hello\b/   # matches "hello", and "Othello", but not "hello."
/\bhello\b/ # matches "hello", and not "Othello" nor "hello."

A "word" for use with these anchors is assumed to contain letters, digits, and underscore characters. No other characters, such as the tilde (~), hash (#), or exclamation point (!) are part of the word. Therefore, the pattern /\bhello/ will match the string "$hello", because $ is not part of a word.

The \B pattern anchor takes the opposite action than that of \b. It matches only if the pattern is contained in a word. For example, the pattern below:

/\Bhello/    

match "$hello" and "Othello" but not "hello" nor "hello." Whereas, the pattern here:

/hello\B/   

will match "hello." but not "hello", "Othello" nor "$hello". Finally this pattern

/\Bhello\B/

will match "Othello" but not "hello", "$hello" nor "hello.".

/\Bhello/    # match "$hello" and "Othello" but not "hello" nor "hello."
/hello\B/    # match "hello." but not "hello", "Othello" nor "$hello".
/\Bhello\B/  # match "Othello" but not "hello", "$hello" nor "hello.".

Listing 7.9 contains the code from Listing 7.8 with the addition of the new word boundary functions.


Listing 7.9. Using the boundary characters.
 1 #!/usr/bin/perl
 2
 3 $scalars =  0;
 4 $hashes =  0;
 5 $arrays =  0;
 6 $handles =  0;
 7
 8 while (<STDIN>) {
 9     @words = split (/[\t ]+/);
10     foreach $token (@words) {
11     if ($token =~ /\$\b[a-zA-Z][_0-9a-zA-Z]*\b/) {
12               # print ("$token is a legal scalar variable\n");
13         $scalars++;
14     } elsif ($token =~ /@\b[a-zA-Z][_0-9a-zA-Z]*\b/) {
15               # print ("$token is a legal array variable\n");
16         $arrays++;
17     } elsif ($token =~ /%\b[a-zA-Z][_0-9A-Z]*\b/) {
18               # print ("$token is a legal hash variable\n");
19         $hashes++;
20     } elsif ($token =~ /\<[A-Z][_0-9A-Z]*\>/) {
21               # print ("$token is probably a file handle\n");
22         $handles++;
23     }
24    }
25 }
26
27 print " This file used scalars $scalars times\n";
28 print " This file used arrays  $arrays  times\n";
29 print " This file used hashes $hashes times\n";
30 print " This file used handles $handles times\n";

Here is sample input and output for this program that takes an existing script file in test.txt and uses it as the input to the test.pl program.

$ cat test.txt
#!/usr/bin/perl

$input = <STDIN>;
chop ($input);

@words = split (/ +/, $input);
foreach $i (@words) {
    print " [$i] \n";
    }

$ test.pl  < test.txt
 This file used scalars 5 times
 This file used arrays  2  times
 This file used hashes 0 times
 This file used handles 1 times

Patterns do not have to be typed literally to be used in the / / search functions. You can also specify them from within variables. Listing 7.10 is a modification of Listing 7.9, which uses three variables to hold the patterns instead of specifying them in the if statement.


Listing 7.10. Using pattern matches in variables.
 1 #!/usr/bin/perl
 2
 3 $scalars =  0;
 4 $hashes =  0;
 5 $arrays =  0;
 6 $handles =  0;
 7
 8 $sType = "\\\$\\b[a-zA-Z][_0-9a-zA-Z]*\\b";
 9 $aType = "@\\b[a-zA-Z][_0-9a-zA-Z]*\\b";
10 $hType = "%\\b[a-zA-Z][_0-9A-Z]*\\b/";
11
12 while (<STDIN>) {
13     @words = split (/[\t ]+/);
14     foreach $token (@words) {
15     if ($token =~ /$sType/ ) {
16               # print ("$token is a legal scalar variable\n");
17         $scalars++;
18     } elsif ($token =~ /$aType/ ) {
19               # print ("$token is a legal array variable\n");
20         $arrays++;
21     } elsif ($token =~ /$hType/ ) {
22               # print ("$token is a legal hash variable\n");
23         $hashes++;
24     } elsif ($token =~ /\<[A-Z][_0-9A-Z]*\>/) {
25               # print ("$token is probably a file handle\n");
26         $handles++;
27     }
28    }
29 }
30
31 print " This file used scalars $scalars times\n";
32 print " This file used arrays  $arrays  times\n";
33 print " This file used hashes $hashes times\n";
34 print " This file used handles $handles times\n";

In this code, the variables $aType, $hType, and $sType can be used elsewhere in the program verbatim. What you have to do, though, is to escape the backslashes twice, once to get past the Perl parser for the string and the other for the pattern searcher if you are using double quotes. When using single quotes, you can use the following line:

$sType = '\$\\b[a-zA-Z][_0-9a-zA-Z]*\b';

instead of this line:

$sType = "\\\$\\b[a-zA-Z][_0-9a-zA-Z]*\\b";

Make sure that you remember to include the enclosing / characters when using a $variable for a pattern. Forgetting to do this will give erroneous results. Also, be sure you see how each backslash is placed to escape characters correctly.

Shortcuts for Words in Perl

The [] classes for patterns simplify searches quite a bit. In Perl, there are several shortcut patterns that describe words or numbers. You have seen them already in the previous examples and chapters.

Here are the shortcuts:

Shortcut
Description Pattern String
\d
Any digit[0-9]
\D
Anything other than a digit[^0-9]
\w
Any word character[_0-9a-zA-Z]
\W
Anything not a word character[^_0-9a-zA-Z]
\s
White space [ \r\t\n\f]
\S
Anything other than white space[^ \r\t\n\f]

These escape sequences can be used anywhere ordinary characters are used. For example, the pattern /[\da-z]/ matches any digit or lowercase letter.

The definition of word boundary as used by the \b and \B special characters is done with the use of \w and \W. The patterns /\w\W/ and /\W\w/ can be used to detect word boundaries. If the pattern /\w\W/ matches a pair of characters, it means that the first character is part of a word and the second is not. This further means that the first character is at the end of a matched word and that a word boundary exists between the first and second characters matched by the pattern and you are at the end of a word.

Conversely, if /\W\w/ matches a pair of characters, the first character is not part of a word and the second character is part of the word. This means that the second character is the beginning of a word. Again, a word boundary exists between the first and second characters matched by the pattern. Therefore, you are at the start of a word.

The quotemeta Function

The quotemeta function puts a backslash in front of any non-word character in a given string. Here's the syntax for quotemeta:

$
newstring = quotemeta($oldstring);

The action of the quotemeta string can best be described using regular expressions as

$string =~ s/(\W)/\\$1/g;

Specifying the Number of Matches

Sometimes matching once, twice, or more than once is not sufficient for a particular search. What if you wanted to match from two to four times? In this case you can use the { } operators in the search function. For example, in the following pattern you can search for all words that begin with ch followed by two or three digits followed by .txt:

/ch[0-9]{2,3}.txt/

For exactly three digits after the ch text, you can use this:

/ch[0-9]{ 3}.txt/

For three or more digits after the ch text, you can use this:

/ch[0-9]{3,}.txt/

To match any three characters following the ch text, you can use this:

/ch.{3,}.txt/

Specifying More Than One Choice

Perl enables you to specify more than one choice when attempting to match a pattern. The pipe symbol (|) works like an OR operator, enabling you to specify two or more patterns to match. For example, the pattern

/houston|rockets/

matches the string houston or the string rockets, whichever comes first. You can use special characters with the patterns. For example, the pattern /[a-z]+|[0-9]+/ matches one or more lowercase letters or one or more digits. The match for a valid integer in Perl is defined as this:

/\b\d+\b|\b0[xX][\da-fA-F]+\b/)

There are two alternatives to check for here. The first one is ^\d+ (that is, check for one or more digits to cover both octal and decimal digits). The second ^0[xX][\da-fA-F]+$ looks for 0x or 0X followed by hex digits. Any other pattern is disregarded. The delimiting \b tags limit the search to word boundaries.

Searching a String for More Than One Pattern to Match

Sometimes it's necessary to search for occurrences for the same pattern to match at more than one location. You saw earlier in the example for using substr how we kept the index around between successive searches on one string. Perl offers another alternative to this problem: the pos() function. The pos function returns the location of the last pattern match in a string. You can reuse the last match value when using the global (g) pattern matching operator. The syntax for the pos function is

$offset = pos($string);

where $string is the string whose pattern is being matched. The returned $offset is the number of characters already matched or skipped.

Listing 7.11 presents a simple script to search for the letter n in Bananarama.


Listing 7.11. Using the pos function.
1 #!/usr/bin/perl
2 $string = "Bananarama";
3 while ($string =~ /n/g) {
4         $offset = pos($string);
5         print("Found an n at $offset\n");
6 }

Here's the output for this program:

Found an n at 2
Found an n at 4
Found an n at 6
Found an n at 8
Found an n at 10

The starting position for pos() to work does not have to start at 0. Like the substr() function, you can use pos() on the right side of the equal sign. To start a search at position 6, simply type this line before you process the string:

pos($string) = 5;

To restart searching from the beginning, reset the value of pos to 0.

Reusing Portions of Patterns

There will be times when you want to write patterns that address groups of numbers. For example, a section of comma-delimited data from the output of a spreadsheet is of this form:

digits,digits,digits,digits

A bit repetitive, isn't it? To extract this tidbit of information from the middle of a document, you could use something like this:

/[\d]+[,.][\d]+[,.][\d]+[,.][\d]+/

What if there were 10 columns? The pattern would be long, and you'd be prone to make mistakes.

Perl provides a macro substitution to allow repetitions of a known sequence. Every pattern in a matched string that is enclosed in memory is stored in memory in the order it is declared. To retrieve a sequence from memory, use the special character \n, where n is an integer representing the nth pattern stored in memory.

For example, you can write the previous lines using these two repetitive patterns:

([\d]+)
([,.])

The string that is used for matching the pattern would look like this:

/([\d]+])([,.])\1\2\1\2\1\2/

The pattern matched by [\d]+ is stored in memory. When the Perl interpreter sees the escape sequence \1, it matches the first matched pattern. When it sees \2, it matches the second pattern. Pattern sequences are stored in memory from left to right. As another example, the following matches a phone number in the United States, which is of the form ###-###-####, where the # is a digit:

/\d{3}(\-))\d{3}\1\d{2}/

The pattern sequence memory is preserved only for the length of the pattern. You can access these variables for a short time, at least until another pattern match is hit, by examining the special variables of the form $n. The $n variables contain the value of patterns matched in parentheses right after a match. The special variable $& contains the entire matched pattern.

In the previous snippet of code, to get the data matched in columns into separate variables, you can use something like this excerpt in a program:

if (/-?(\d+)\.?(\d+)/) {
$matchedPart = $&;
$col_1 = $1;
$col_2 = $2;
$col_3 = $3;
$col_4 = $4;
}

The order of precedence when using () is higher than that of other pattern-matching characters. Here is the order of precedence from high to low:

() Pattern memory
+ * ? {} Number of occurrences
^ $ \b \B \W \w Pattern anchors
| The OR operator

The pattern-memory special characters () serve as delimiters for the OR operator. The side effect of this delimiting is that the parenthesized part of the pattern is mapped into a $n register. For example, in the following line, the \1 refers to (b|d), not the (a|o) matching pattern:

/(b|d)(a|o)(rk).*\1\2\3/

Pattern-Matching Options

There are several pattern-matching options in Perl to control how strings are matched. You saw these options earlier when I introduced the syntax for pattern matching. Here are the options:

g
Match all possible patterns
i
Ignore case when matching strings
m
Treat string as multiple lines
o
Only evaluate once
s
Treat string as single line
x
Ignore white space in pattern

All these pattern options must be specified immediately after the option. For example, the following pattern uses the i option to ignore case:

/first*name/i

More than one option can be specified at one time and can be specified in any order.

The g operator tells the Perl interpreter to match all the possible patterns in a string. For example, if the string bananarama is searched using the following pattern:

/.a/g

it will match ba, na, na, ra, and ma. You can assign the return of all these matches to an array. Here's an example:

@words = "bananarama" =~ /.a/g;
for $i (@words) {
    print "$i \n";
}

You can use patterns with the g option in loops. The returned value of the match is repeated until it returns false. Inside the loop you can use the &# operator. For example, in the word Mississippi, you can loop around looking for two characters together like this:

$string = "Mississippi";
while ($string =~ /([a-z]\1/g) {
          $found = $&;
          print ("$found\n");
}

Tip
Don't forget that you can use the pos() function in a while loop to see at what position the last match occurred.

The i option enables you to perform a case-insensitive search. The match will be made regardless of whether the string is uppercase or lowercase or a mixture of cases.

The m option allows searching on more than one line per match. When the m option is specified, the ^ special character matches either the start of the string or the start of any new line. Also, the $ character can match either the new line or the end of text.

The o option enables a pattern to be evaluated only once. This is never really used in practice. Basically, it forces Perl to disregard further matches on the same input line.

Normally the dot (.) character does not match the new line. When you specify the s option, you allow the pattern to be matched across multiple lines because this allows the dot character to be matched with a new line.

The x operator tells Perl to ignore any white spaces in the pattern match unless the white space has been preceded by a backslash. The real benefit to using the x option is to improve readability because pattern specifications do not have to be crunched together anymore. For example, these two patterns match the same string:

/([\d]+])([,.])\1\2\1\2\1\2/
/([\d]+])([,.]) \1\2\ 1\2\ 1\2/x

Substituting Text Through Pattern Matching

You have already seen how to substitute text through the use of the substr function. The pattern-matching function can be extended to do string substitution with the use of the s operator. Here's the syntax:

s/pattern/replacement/[options]

The replacement string is interpreted literally and cannot have a pattern. The Perl interpreter searches for the pattern specified by the placeholder pattern. If it finds the pattern, it replaces the pattern with the string represented by the placeholder replacement. Here's an example:

$string = "cabi.net";
$string =~ s/cabi/sig/;

The contents of $string will be sig.net instead of cabi.net.

The good news is that all the pattern matching stuff up to this point in the chapter applies here! So, you can use any of the pattern special characters in the substitution operator. For example, the following replaces all words with one or more digits with the letter X:

s/[\d]+/X/

Specify an empty string for the replacement if you just want to delete a set of strings. For example, the following line replaces all words with one or more digits in them:

s/[\d]+//

The pattern match memory sequence applies here. For example, to swap the two columns of data, you can use this line:

s/(\d+)\s\1/$2 $1/

The substitution pattern matches a sequence of one or more digits, followed by a space, followed by another set of digits. The output is the values of the $1 and $2 registers swapped in sequence.

The substitution operator supports several options just like the match operator:

g
Change all occurrences of the pattern
i
Ignore case in pattern
e
Evaluate replacement string as expression
m
Treat string to be matched as multiple lines
o
Only evaluate once
s
Treat string to be matched as single line
x
Ignore white space in pattern

As with pattern matching, options are appended to the end of the operator. Most of these options work the same way as they did for matching patterns during a search.

The g option changes all occurrences of a pattern in a particular string. For instance, the following substitution puts parentheses around all the numbers in a line:

s/(\d+)/($1)/g

The i option ignores case when substituting. For example, the substitution

s/\bweb\b/WEB/gi

replaces all occurrences of the words web, WeB, wEB, and so on with the word WEB.

Although you cannot put patterns in the replacement string, you can run the eval() function on it. The e option treats the replacement string as an expression, which it evaluates before replacing it in the original string. The results of the evaluation are used instead. Suppose that you wanted to repeat a string twice on a line. A common use is to redefine the values in a header file to twice what they are. For example, the string

$define ABX 123

matches all the variables of the form and replaces the numeric part of the line with twice its value. Listing 7.12 presents a simple script to do this with a C header file.


Listing 7.12. Using the pattern replace to do simple operations.
 1 #!/usr/bin/perl
 2
 3 open (FILE, "tt.h") || die $!;
 4 $i = 0;
 5 while (<FILE>) {
 6         $string = $_;
 7         if(/define/) {
 8             $string  =~ s/(\d+)/$1 * 2/e;
 9             print "$string \n";
10             $i++;
11         }
12         else {
13             print "$string \n";
14         }
15         }
16
17 close FILE;

The o option tells the Perl interpreter to replace a scalar variable only on the first match. All subsequent pattern matches are ignored.

The s option ensures that the newline character \n is matched by the . special character:

With the m option, the ^ and $ characters match the start and end of any line as they do in pattern matches.

The \A and \Z escape sequences always match only the beginning and end of the string. The actions taken by these options are not affected by the s or m options.

The x option causes the interpreter to ignore all white spaces unless they are escaped by a backslash. The only benefit gained from this operation is to make patterns easier to read. See the example for using the x option shown in the pattern-matching options section earlier in this chapter.

The forward slash (/) delimiter can be substituted with another character for showing where to delimit text. For example, you can use <>, # (hash), or () (parentheses) characters as delimiters, as illustrated in Listing 7.13.


Listing 7.13. Using a different delimiter for the forward slash.
 1 #!/usr/bin/perl
 2
 3 $name = "/usr/local/lib";
 4
 5 $s1 = $name ;
 6 $s1 =~ s#/usr/local/#/local/#;
 7 print $s1 . "\n";
 8
 9 $s2 = $name ;
10 $s2 =~ s</usr/local/></local/>;
11 print $s2 . "\n";
12
13 $s3 = $name ;
14 $s3 =~ s(/usr/local/)(/local/);
15 print $s3 . "\n";

The Translation Operator

The UNIX tr command is also available in Perl as the tr function. The tr function lets you substitute one group of characters with another. Here's the syntax:

tr/string1/string2/

where string1 contains a list of characters to be replaced, and string2 contains the characters that replace them. Each character in string2 is replaced with a character in the same position in string1.

If string1 is longer than string2, the last character of string1 is repeated to pad the contents of string2. If the same character appears more than once in string1, the first replacement found will be used.

$string = "12345678901234567890";
$string =~ tr/2345/ABC/;

Here, all characters 2, 3, 4, and 5 in the string are replaced with A, B, C, and C, respectively. The C is repeated here by Perl as it makes the length of the replacement string equal to that of the string being replaced. So, the replacement string is "ABccC" for matching with "12345".

The most common use of the translation operator is to convert a string from uppercase to lowercase, or vice versa.

while ($line = <STDIN>) {
         $line =~ tr/A-Z/a-z/;
         print ($line);
}

To convert all characters in a string to uppercase, here's another sample function:

while ($line = <STDIN>) {
         $line =~ tr/a-z/A-Z/;
         print ($line);
}

There are a few things about the tr operator that you should remember:

  • If you do not specify a variable name, the $_ variable is used instead.
  • The tr function returns the number of characters replaced. That is, the statement $num = $line =~ tr/a-z/A-Z/ will set the value of $num to the number of characters replaced. Using tr to get the length of a string is possible if you replace each character by itself, but it's probably better to use length instead.
  • No pattern-matching special characters are supported by tr.
  • You can use y in place of tr if you like. That is, these two statements are the same: $string =~ y/a-z/A-Z/; and $string =~ tr/a-z/A-Z/;. Functionally, the y is the same as tr. Why use y? Using the y function might impress your colleagues and get you a key to the executive bathroom. Wow!

The program in Listing 7.14 tallies the number of times vowels are used in a text file.


Listing 7.14. Tallying vowels in a text file.
 1 #!/usr/bin/perl
 2
 3 $count = 0;
 4
 5 while ($input = <STDIN>) {
 6         chop ($input);
 7         $total += length($input);
 8         $_ = $input;
 9         $count += tr/aeiou/aeiou/;
10 }
11
12 print ("In this file, there are: $count vowels \n";

The translation operator supports three options. These options are specified after the patterns using this syntax:

tr/string1/string2/[cds]

Here are the options for tr:

c
Translate all characters not specified
d
Delete all specified characters
s
Replace multiple identical output characters with a single character

The c operator stands for complement. That is, it does the opposite of what the character specifies. For example, the following line replaces all characters that are not in [a-zA-Z0-9] with a space:

$onlyDigits =~ tr/\w/ /c;

The d option deletes every specified character:

$noDigits =~ tr/\d//d;

This deletes all the digits from $noDigits.

The s option stands for squeeze. With the s option, tr translates only one character if two or more consecutive characters translate to the same output character. For example, the following line replaces everything that is not a digit and outputs only one space between digits.

Extended Pattern Matching

Pattern-specific matching capabilities are possible with the use of this operator:

(?ccpattern)

cc is a single character representing the extended pattern-matching capability being used for the pattern. cc can be one of these values:

?: Do not store the pattern in parentheses in memory.
?o Where o can be an option to apply to the pattern and can be i for case insensitive, m for multiple lines, s for single line, or x for ignore white space.
?= Look ahead in buffer.
?! Look back in buffer.
?# Add comments.

You have seen how () stores a pattern match in memory. By using ?: you can force the pattern not to be stored in memory. In the following two statements, \1 points to \d+ in the first and [a-z] in the second:

/(\d+)([a-z]+/
/(?:\d+)([a-z]+/

The string, ?o, specifies a pattern-matching option within the pattern itself. The o could be i for ignore case. For example, the following patterns are the same:

/[a-z]+/i
/(?i)[a-z]+/

You can specify different cases for different parts of the same search pattern. Here's an example:

$pattern1 = "[A-Z]+";
$pattern2 = "(?i)[a-z0-9_]+";
if ($string =~ /$pattern1|$pattern2/) {
        ...
}

This pattern matches either any collection of uppercase letters or any collection of letters with digits and an underscore.

You can use the ?= feature to look ahead for a pattern. For example, the pattern

/123(?=XYZ)/

only matches 123 if it is immediately followed by XYZ. The matched string in $& will be 123, not 123XYZ.

To look at the back of a string, use the ?! operator. For example,

/(?!XYZ)123/

matches 123 only if it immediately follows XYZ. The matched pattern in $& will still be 123.

Reading complicated patterns is not easy, even if you are the author. Adding comments makes it easier to follow complicated patterns. Finally, you can add comments about a pattern with the ?# operator. Here's an example:

/(?i)[a-z][\d]{2,3}(?

The above example will match two or three digits following a lowercase letter.

Summary

With the function substr you can extract a substring from a string or replace a portion of a string or append to the front or back end of another string. The lc and uc functions convert strings to lowercase and uppercase. The first letter of a string can be converted to lowercase or uppercase using either lcfirst or ucfirst. The quotemeta function places a backslash in front of every nonword character in a string. New character strings can be created using join, which creates a string from the members of a list, and sprintf, which works like printf except that the output goes to a string. Functions that search character strings include index, which searches for a substring starting from the left of a string, and rindex, which searches for a substring starting from the right of a string. You can retrieve the length of a character string using length. The pos function enables you to determine or set the current pattern-matching location in a string. The tr function replaces one set of characters with another.

Оставьте свой комментарий !

Ваше имя:
Комментарий:
Оба поля являются обязательными

 Автор  Комментарий к данной статье