Search     or:     and:
 LINUX 
 Language 
 Kernel 
 Package 
 Book 
 Test 
 OS 
 Forum 
 iakovlev.org 
 Books
  Краткое описание
 Linux
 W. R. Стивенс TCP 
 W. R. Стивенс IPC 
 A.Rubini-J.Corbet 
 K. Bauer 
 Gary V. Vaughan 
 Д Вилер 
 В. Сталлинг 
 Pramode C.E. 
 Steve Pate 
 William Gropp 
 K.A.Robbins 
 С Бекман 
 Р Стивенс 
 Ethereal 
 Cluster 
 Languages
 C
 Perl
 M.Pilgrim 
 А.Фролов 
 Mendel Cooper 
 М Перри 
 Kernel
 C.S. Rodriguez 
 Robert Love 
 Daniel Bovet 
 Д Джеф 
 Максвелл 
 G. Kroah-Hartman 
 B. Hansen 
NEWS
Последние статьи :
  Тренажёр 16.01   
  Эльбрус 05.12   
  Алгоритмы 12.04   
  Rust 07.11   
  Go 25.12   
  EXT4 10.11   
  FS benchmark 15.09   
  Сетунь 23.07   
  Trees 25.06   
  Apache 03.02   
 
TOP 20
 Linux Kernel 2.6...5168 
 Trees...938 
 Максвелл 3...869 
 Go Web ...821 
 William Gropp...801 
 Ethreal 3...785 
 Gary V.Vaughan-> Libtool...772 
 Ethreal 4...769 
 Rodriguez 6...763 
 Ext4 FS...753 
 Clickhouse...753 
 Steve Pate 1...752 
 Ethreal 1...741 
 Secure Programming for Li...730 
 C++ Patterns 3...716 
 Ulrich Drepper...696 
 Assembler...694 
 DevFS...660 
 Стивенс 9...649 
 MySQL & PosgreSQL...630 
 
  01.01.2024 : 3621733 посещений 

iakovlev.org
 

Chapter 4. Content Handlers

Первый хэндлер , который начинает обрабатывать внешний запрос - mod_alias . Если он почему-то не проинсталлирован , что вряд ли , тогда http_core хэндлер. Затем апач разбивает полученный оттранслированный путь на 2 части : "filename" и "additional path information" . Пусть рутовый каталог есть /home/www и получен запрос на URI /abc/def/ghi. Процесс трансляции можно отследить по следующей таблице :

Physical Directory

Translated Filename

Additional Path Information

/home/www

/home/www/abc

/def/ghi

/home/www/abc

/home/www/abc/def

/ghi

/home/www/abc/def

/home/www/abc/def/ghi

empty

/home/www/abc/def/ghi

/home/www/abc/def/ghi

empty

Затем начинает работать handler mod_mime , который определяет , является ли расширение файла типом MIME . В конфиге с помощью директивы AddHandler можно хэндлер привязать к обработке конкретного расширения : AddHandler server-parsed .shtml Сам хэндлер определен в mod_include . Директива SetHandler используется внутри директив <Directory>, <Location>, <Files> для привязки хэндлера к конкретному виртуальному каталогу : <Location /shtml> SetHandler server-parsed </Location>   <Files map-*> SetHandler imap-file </Files> Кстати , директивы AddHandler и SetHandler реализованы не в самом ядре апача , а в модуле mod_actions. При использовании mod_perl нужно использовать комбинацию из 2-х директив - SetHandler и PerlHandler : <Location /graph> SetHandler perl-script PerlHandler Apache::Graph </Location> Если нужно привязать Perl content handler к расширению , используется директива <Files>: <Files ~ "\.graph$"> SetHandler perl-script PerlHandler Apache::Graph </Files> Для отмены действия SetHandler можно написать в конфиге : <Location /graph/tutorial> SetHandler default-handler </Location>

Добавление футера на веб-странице

Для демонстрации работы контент-хэндлера разработаем модуль , который добавляет футер на все страницы данного каталога . Пример 4-1 включает код Apache::Footer . Рассмотрим его подробнее : package Apache::Footer; use strict; use Apache::Constants qw(:common); use Apache::File (); Прагма use strict защищает модуль от таких ошибок , как конфликт имен переменных , неправильный синтаксис написания функций без круглых скобок и т.д. Собственно функция handler и генерит контент : sub handler { my $r = shift; return DECLINED unless $r->content_type() eq 'text/html'; my $file = $r->filename;   unless (-e $r->finfo) { $r->log_error("File does not exist: $file"); return NOT_FOUND; } unless (-r _) { $r->log_error("File permissions deny access: $file"); return FORBIDDEN; } Сначала мы получаем имя файла , потом проверяем его наличие и право на обработку . Получим время изменения файла : my $modtime = localtime((stat _)[9]); Пытаемся открыть файл , и если что , возвращаем 500-ю ошибку : my $fh; unless ($fh = Apache::File->new($file)) { $r->log_error("Couldn't open $file for reading: $!"); return SERVER_ERROR; } Строим футер : my $footer = <<END; <hr> &copy; 2001 <a href="http://www.ora.com/">O'Reilly &amp; Associates</a><br> <em>Last Modified: $modtime</em> END Переписываем документ : $r->send_http_header; while (<$fh>) { s!(</BODY>)!$footer$1!oi; } continue { $r->print($_); } В конце перлу нужно послать сигнал о том , что все прошло корректно : return OK; }   1; Полный вариант : package Apache::Footer; # file: Apache/Footer.pm   use strict; use Apache::Constants qw(:common); use Apache::File ();   sub handler { my $r = shift; return DECLINED unless $r->content_type() eq 'text/html';   my $file = $r->filename;   unless (-e $r->finfo) { $r->log_error("File does not exist: $file"); return NOT_FOUND; } unless (-r _) { $r->log_error("File permissions deny access: $file"); return FORBIDDEN; }   my $modtime = localtime((stat _)[9]);   my $fh; unless ($fh = Apache::File->new($file)) { $r->log_error("Couldn't open $file for reading: $!"); return SERVER_ERROR; } my $footer = <<END; <hr> &copy; 2001 <a href=">http://www.ora.com/">O'Reilly &amp; Associates</a><br> <em>Last Modified: $modtime</em> END   $r->send_http_header; while (<$fh>) { s!(</BODY>)!$footer$1!oi; } continue { $r->print($_); }   return OK; }   1; _ _END_ _ Если все файлы лежат в одном каталоге , то футер можно прописать в конфиге так : <Location /footer> SetHandler perl-script PerlHandler Apache::Footer </Location> Если файлы раскиданы по дереву , прикрутить Apache::Footer можно с помощью уникального расширения .footer : AddType text/html .footer <Files ~ "\.footer$"> SetHandler perl-script PerlHandler Apache::Footer </Files> Реализовать футер можно также с помощью .htaccess . При этом не нужно перезагружать сервер : SetHandler perl-script PerlHandler Apache::Footer

SSI

Предыдущий вариант футера имеет один недостаок - для того , чтобы изменить футер , нужно изменить код , что ни есть хорошо . Подобная технология еще называется hardcode . Более гибким методом является использование модуля Apache::Sandwich. Еще более мощным инструментом является SSI. SSI - это фрагмент кода внутри HTML , обрамленный комментарием . Например , текущие время и дата с помощью SSI : Today is <!--#echo var="DATE_LOCAL"-->. Мы будем использовать mod_perl для разработки наших собственных SSI . Базовая идея : <!--#DIRECTIVE PARAM1 PARAM2 PARAM3 PARAM4...--> Имеется директива и набор параметров . Можно использовать одинарные и двойные кавычки . Директива грузится не из модуля , а из конфига . Директива : sub HELLO { "Hello World!"; } SSI: I said <!--#HELLO--> Следующая более сложная подпрограмма возвращает хидер из внешнего запроса : sub HTTP_HEADER { my ($r,$field) = @_; $r->header_in($field); } SSI : You are using the browser <!-- #HTTP_HEADER User-Agent -->. Пример HTML-файла с SSI : <html> <head> <title>Server-Side Includes</title></head> <body bgcolor=white> <h1>Server-Side Includes</h2> This is some straight text.<p> This is a "<!-- #HELLO -->" include.<p> The file size is <strong><!-- #FSIZE --></strong>, and it was last modified on <!-- #MODTIME %x --><p> Today is <!-- #DATE "%A, in <em>anno domini</em> %Y"-->.<p> The user agent is <em><!--#HTTP_HEADER User-Agent--></em>.<p> Oops: <!--#OOPS 0--><p> Here is an included file: <pre> <!--#INCLUDE /include.txt 1--> </pre> <!--#FOOTER--> </body> </html> Реализация SSI выполняется с помощью модуля Apache::ESSI : package Apache::ESSI; use strict; use Apache::Constants qw(:common); use Apache::File (); use Text::ParseWords qw(quotewords); my (%MODIFIED, %SUBSTITUTION); sub handler { my $r = shift; $r->content_type() eq 'text/html' || return DECLINED; my $fh = Apache::File->new($r->filename) || return DECLINED; my $sub = read_definitions($r) || return SERVER_ERROR; $r->send_http_header; $r->print($sub->($r, $fh)); return OK; } sub read_definitions { my $r = shift; my $def = $r->dir_config('ESSIDefs'); return unless $def; return unless -e ($def = $r->server_root_relative($def)); Вначале read_definitions( ) возвращает путь к файлу , в котором находятся определения SSI . В данном случае это переменная ESSIDefs , определенная в конфиге с помощью PerlSetVar-директивы. Путь к этому файлу может быть относительным или абсолютным . Путь передается методу server_root_relative( ). Затем мы проверяем кэш и время изменения файла . Используются 2 хэша : %SUBSTITUTION хранит откомпилированный код и %MODIFIED хранит дату . Компиляция SSI-функции : my $package = join "::", _ _PACKAGE_ _, $def; $package =~ tr/a-zA-Z0-9_/_/c; eval "package $package; do '$def'"; if($@) { $r->log_error("Eval of $def did not return true: $@"); return; } $SUBSTITUTION{$def} = sub { do_substitutions($package, @_); }; $MODIFIED{$def} = -M $def; # store modification date return $SUBSTITUTION{$def}; } Результат компиляции сохраняем в %SUBSTITUTION , изменяем %MODIFIED. Добавляем содержимое переменной $package в do_substitutions( ) sub do_substitutions { my $package = shift; my($r, $fh) = @_; # Make sure that eval() errors aren't trapped. local $SIG{_ _WARN_ _}; local $SIG{_ _DIE_ _}; local $/; #slurp $fh my $data = <$fh>; $data =~ s/<!--\s*\#(\w+) # start of a function name \s*(.*?) # optional parameters \s*--> # end of comment /call_sub($package, $1, $r, $2)/xseg; $data; } Полный пример : package Apache::ESSI; # file: Apache/ESSI.pm use strict; use Apache::Constants qw(:common); use Apache::File (); use Text::ParseWords qw(quotewords); my (%MODIFIED, %SUBSTITUTION); sub handler { my $r = shift; $r->content_type() eq 'text/html' || return DECLINED; my $fh = Apache::File->new($r->filename) || return DECLINED; my $sub = read_definitions($r) || return SERVER_ERROR; $r->send_http_header; $r->print($sub->($r, $fh)); return OK; } sub read_definitions { my $r = shift; my $def = $r->dir_config('ESSIDefs'); return unless $def; return unless -e ($def = $r->server_root_relative($def)); return $SUBSTITUTION{$def} if $MODIFIED{$def} && $MODIFIED{$def} <= -M _; my $package = join "::", _ _PACKAGE_ _, $def; $package =~ tr/a-zA-Z0-9_/_/c; eval "package $package; do '$def'"; if($@) { $r->log_error("Eval of $def did not return true: $@"); return; } $SUBSTITUTION{$def} = sub { do_substitutions($package, @_); }; $MODIFIED{$def} = -M $def; # store modification date return $SUBSTITUTION{$def}; } sub do_substitutions { my $package = shift; my($r, $fh) = @_; # Make sure that eval() errors aren't trapped. local $SIG{_ _WARN_ _}; local $SIG{_ _DIE_ _}; local $/; #slurp $fh my $data = <$fh>; $data =~ s/<!--\s*\#(\w+) # start of a function name \s*(.*?) # optional parameters \s*--> # end of comment /call_sub($package, $1, $r, $2)/xseg; $data; } sub call_sub { my($package, $name, $r, $args) = @_; my $sub = \&{join '::', $package, $name}; $r->chdir_file; my $res = eval { $sub->($r, quotewords('[ ,]',0,$args)) }; return "<em>[$@]</em>" if $@; return $res; } 1; _ _END_ _ Файл определения SSI-функций : # This file is require'd, and therefore must end with # a true value. use Apache::File (); use Apache::Util qw(ht_time size_string); # insert the string "Hello World!" sub HELLO { my $r = shift; "Hello World!"; } # insert today's date possibly modified by a strftime() format # string sub DATE { my ($r,$format) = @_; return scalar(localtime) unless $format; return ht_time(time, $format, 0); } # insert the modification time of the document, possibly modified # by a strftime() format string. sub MODTIME { my ($r,$format) = @_; my $mtime = (stat $r->finfo)[9]; return localtime($mtime) unless $format; return ht_time($mtime, $format, 0); } # insert the size of the current document sub FSIZE { my $r = shift; return size_string -s $r->finfo; } # divide 10 by the argument (used to test runtime error trapping) sub OOPS { 10/$_[1]; } # insert a canned footer sub FOOTER { my $r = shift; my $modtime = MODTIME($r); return <<END; <hr> &copy; 2001 <a href="http://www.ora.com/">O'Reilly &amp; Associates</a><br> <em>Last Modified: $modtime</em> END } # insert the named field from the incoming request sub HTTP_HEADER { my ($r,$h) = @_; $r->header_in($h); } #ensure that path is relative, and does not contain ".." sub is_below_only { $_[0] !~ m:(^/|(^|/)\.\.(/|$)): } # Insert the contents of a file. If the $virtual flag is set # does a document-root lookup, otherwise treats filename as a # physical path. sub INCLUDE { my ($r,$path,$virtual) = @_; my $file; if($virtual) { $file = $r->lookup_uri($path)->filename; } else { unless(is_below_only($path)) { die "Can't include $path\n"; } $file = $path; } my $fh = Apache::File->new($file) || die "Couldn't open $file: $!\n"; local $/; return <$fh>; } 1; Для нормальной работы SSI необходимы также Apache Embperl и ePerl пакеты .

Динамический бар навигации

Многие сайты используют навигационный бар для помощи пользователям. Обычно там присутствует ссылки и кнопки в форме картинок. На некоторых сайтах используются скрипты для придания кнопочного эффекта "rollover" , когда кнопка меняет цвет при наведении мыши. Если содержимое сайта динамически меняется , то соответственное добавление бара ко вновь добавленной странице может превратиться в проблему, поскольку прийдется постоянно апгрэйдить ее содержимое.

Мы используем Apache content handlers. Мы разработаем бар-модуль под названием Apache::NavBar. При активации , этот модуль автоматом добавит бар ко всем страницам на сайте. На нем будет размещаться линк. (Figure 4-3).

Бар будет строиться динамически с помощью конфига. Например , как это делается на сайте http://stein.cshl.org:

# Конфиг 
 /index.html             Home 
 /jade/                  Jade 
 /AcePerl/               AcePerl 
 /software/boulder/      BoulderIO 
 /software/WWW/          WWW 
 /linux/                 Linux
 

Справа в конфиге - 6 значений : "Home," "Jade," "AcePerl," "BoulderIO," "WWW," , "Linux" (имена пакетов). Слева - соответствующие урлы.

Пример 4-6 дает полный код для Apache::NavBar. В конце примера - perl.conf (или httpd.conf если хотите), который и активирует этот бар.

package Apache::NavBar; 
 # file Apache/NavBar.pm 
   
 use strict; 
 use Apache::Constants qw(:common); 
 use Apache::File (); 
   
 my %BARS = (); 
 my $TABLEATTS   = 'WIDTH="100%" BORDER=1'; 
 my $TABLECOLOR  = '#C8FFFF'; 
 my $ACTIVECOLOR = '#FF0000'; 
   
 

The preamble brings in the usual modules and defines some constants that will be used later in the code. Among the constants are ones that control the color and size of the navigation bar.

 sub handler { 
     my $r = shift; 
     my $bar = read_configuration($r) || return DECLINED;
 

The handler( ) function starts by calling an internal function named read_configuration( ), which, as its name implies, parses the navigation bar configuration file. If successful, the function returns a custom-designed NavBar object that implements the methods we need to build the navigation bar on the fly. As in the server-side includes example, we cache NavBar objects in the package global %BARS and only re-create them when the configuration file changes. The cache logic is all handled internally by read_configuration( ).

If, for some reason, read_configuration( ) returns an undefined value, we decline the transaction by returning DECLINED. Apache will display the page, but the navigation bar will be missing.

    $r->content_type eq 'text/html'  || return DECLINED; 
     my $fh = Apache::File->new($r->filename) || return DECLINED;
 

As in the server-side include example, we check the MIME type of the requested file. If it isn't of type text/html, then we can't add a navigation bar to it and we return DECLINED to let Apache take its default actions. Otherwise, we attempt to open the file by calling Apache::File 's new( ) method. If this fails, we again return DECLINED to let Apache generate the appropriate error message.

    my $navbar = make_bar($r, $bar);
 

Having successfully processed the configuration file and opened the requested file, we call an internal subroutine named make_bar( ) to create the HTML text for the navigation bar. We'll look at this subroutine momentarily. This fragment of HTML is stored in a variable named $navbar.

    $r->send_http_header; 
     return OK if $r->header_only; 
   
     local $/ = ""; 
     while (<$fh>) { 
        s:(</BODY>):$navbar$1:i; 
        s:(<BODY.*?>):$1$navbar:si; 
     } continue {  
        $r->print($_);  
     } 
   
     return OK; 
 }
 

The remaining code should look familiar. We send the HTTP header and loop through the text in paragraph-style chunks looking for all instances of the <BODY> and </BODY> tags. When we find either tag we insert the navigation bar just below or above it. We use paragraph mode (by setting $/ to the empty string) in order to catch documents that have spread the initial <BODY> tag among multiple lines.

sub make_bar { 
     my($r, $bar) = @_; 
     # create the navigation bar 
     my $current_url = $r->uri; 
     my @cells;
 

The make_bar( ) function is responsible for generating the navigation bar HTML code. First, it recovers the current document's URI by calling the Apache request object's uri( ) method. Next, it calls $bar->urls( ) to fetch the list of partial URIs for the site's major areas and iterates over the areas in a for( ) loop:

   for my $url ($bar->urls) { 
       my $label = $bar->label($url); 
       my $is_current = $current_url =~ /^$url/; 
       my $cell = $is_current ? 
           qq(<FONT COLOR="$ACTIVECOLOR">$label</FONT>) 
               : qq(<A HREF="$url">$label</A>); 
       push @cells,  
       qq(<TD CLASS="navbar" ALIGN=CENTER BGCOLOR="$TABLECOLOR">$cell</TD>\n); 
    }
 

For each URI, the code fetches its human-readable label by calling $bar->label( ) and determines whether the current document is part of the area using a pattern match. What happens next depends on whether the current document is part of the area or not. In the former case, the code generates a label enclosed within a <FONT> tag with the COLOR attribute set to red. In the latter case, the code generates a hypertext link. The label or link is then pushed onto a growing array of HTML table cells.

    return qq(<TABLE $TABLEATTS><TR>@cells</TR></TABLE>\n); 
 }
 

At the end of the loop, the code incorporates the table cells into a one-row table and returns the HTML to the caller.

We next look at the read_configuration( ) function:

sub read_configuration { 
     my $r = shift; 
     my $conf_file; 
     return unless $conf_file = $r->dir_config('NavConf'); 
     return unless -e ($conf_file = $r->server_root_relative($conf_file));
 

Potentially there can be several configuration files, each one for a different part of the site. The path to the configuration file is specified by a per-directory Perl configuration variable named NavConf. We retrieve the path to the configuration file with dir_config( ), convert it into an absolute path name with server_root_relative( ), and test that the file exists with the -e operator.

    my $mod_time = (stat _)[9]; 
     return $BARS{$conf_file} if $BARS{$conf_file}  
       && $BARS{$conf_file}->modified >= $mod_time; 
     return $BARS{$conf_file} = NavBar->new($conf_file); 
 }
 

Because we don't want to reparse the configuration each time we need it, we cache the NavBar object in much the same way we did with the server-side include example. Each NavBar object has a modified( ) method that returns the time that its configuration file was modified. The NavBar objects are held in a global cache named %BARS and indexed by the name of the configuration files. The next bit of code calls stat( ) to return the configuration file's modification time--notice that we can stat( ) the _ filehandle because the foregoing -e operation will have cached its results. We then check whether there is already a ready-made NavBar object in the cache, and if so, whether its modification date is not older than the configuration file. If both tests are true, we return the cached object; otherwise, we create a new one by calling the NavBar new( ) method.

You'll notice that we use a different technique for finding the modification date here than we did in Apache::ESSI (Example 4-3). In the previous example, we used the -M file test flag, which returns the relative age of the file in days since the Perl interpreter was launched. In this example, we use stat( ) to determine the absolute age of the file from the filesystem timestamp. The reason for this will become clear later, when we modify the module to handle If-Modified-Since caching.

Toward the bottom of the example is the definition for the NavBar class. It defines three methods named new( ), urls( ), and label( ) :

package NavBar;
  
 # create a new NavBar object 
 sub new { 
     my ($class,$conf_file) = @_; 
     my (@c,%c); 
     my $fh = Apache::File->new($conf_file) || return; 
     while (<$fh>) { 
        chomp; 
        s/^\s+//; s/\s+$//;   # fold leading and trailing whitespace 
        next if /^#/ || /^$/; # skip comments and empty lines 
        next unless my($url, $label) = /^(\S+)\s+(.+)/; 
        push @c, $url;     # keep the url in an ordered array 
        $c{$url} = $label; # keep its label in a hash 
     } 
     return bless {'urls'  => \@c, 
                  'labels' => \%c, 
                  'modified' => (stat $conf_file)[9]}, $class; 
 }
 

The new( ) method is called to parse a configuration file and return a new NavBar object. It opens up the indicated configuration file, splits each row into the URI and label parts, and stores the two parts into a hash. Since the order in which the various areas appear in the navigation bar is significant, this method also saves the URIs to an ordered array.

# return ordered list of all the URIs in the navigation bar 
 sub urls  { return @{shift->{'urls'}}; } 
   
 # return the label for a particular URI in the navigation bar 
 sub label { return $_[0]->{'labels'}->{$_[1]} || $_[1]; } 
   
 # return the modification date of the configuration file 
 sub modified { return $_[0]->{'modified'}; } 
   
 1;
 

The urls( ) method returns the ordered list of areas, and the label( ) method uses the NavBar object's hash to return the human-readable label for the given URI. If none is defined, it just returns the URL. modified( ) returns the modification time of the configuration file.

Example 4-6: A Dynamic Navigation Bar
package Apache::NavBar; 
 # file Apache/NavBar.pm 
   
 use strict; 
 use Apache::Constants qw(:common); 
 use Apache::File (); 
   
 my %BARS = (); 
 my $TABLEATTS   = 'WIDTH="100%" BORDER=1'; 
 my $TABLECOLOR  = '#C8FFFF'; 
 my $ACTIVECOLOR = '#FF0000'; 
   
 sub handler { 
     my $r = shift; 
     my $bar = read_configuration($r) || return DECLINED; 
     $r->content_type eq 'text/html'  || return DECLINED; 
     my $fh = Apache::File->new($r->filename) || return DECLINED; 
     my $navbar = make_bar($r, $bar); 
      
     $r->send_http_header; 
     return OK if $r->header_only; 
   
     local $/ = ""; 
     while (<$fh>) { 
        s:(</BODY>):$navbar$1:oi; 
        s:(<BODY.*?>):$1$navbar:osi; 
     } continue {  
        $r->print($_);  
     } 
   
     return OK; 
 } 
   
 sub make_bar { 
     my($r, $bar) = @_; 
     # create the navigation bar 
     my $current_url = $r->uri; 
     my @cells; 
     for my $url ($bar->urls) { 
        my $label = $bar->label($url); 
        my $is_current = $current_url =~ /^$url/; 
        my $cell = $is_current ? 
            qq(<FONT COLOR="$ACTIVECOLOR">$label</FONT>) 
                : qq(<A HREF="$url">$label</A>); 
        push @cells,  
        qq(<TD CLASS="navbar" ALIGN=CENTER BGCOLOR="$TABLECOLOR">$cell</TD>\n); 
     } 
     return qq(<TABLE $TABLEATTS><TR>@cells</TR></TABLE>\n); 
 } 
   
 # read the navigation bar configuration file and return it as a hash.
 sub read_configuration { 
     my $r = shift; 
     my $conf_file; 
     return unless $conf_file = $r->dir_config('NavConf'); 
     return unless -e ($conf_file = $r->server_root_relative($conf_file)); 
     my $mod_time = (stat _)[9]; 
     return $BARS{$conf_file} if $BARS{$conf_file}  
       && $BARS{$conf_file}->modified >= $mod_time; 
     return $BARS{$conf_file} = NavBar->new($conf_file); 
 } 
   
 package NavBar;
   
 # create a new NavBar object 
 sub new { 
     my ($class,$conf_file) = @_; 
     my (@c,%c); 
     my $fh = Apache::File->new($conf_file) || return; 
     while (<$fh>) { 
        chomp; 
        s/^\s+//; s/\s+$//;   # fold leading and trailing whitespace 
        next if /^#/ || /^$/; # skip comments and empty lines 
        next unless my($url, $label) = /^(\S+)\s+(.+)/; 
        push @c, $url;     # keep the url in an ordered array 
        $c{$url} = $label; # keep its label in a hash 
     } 
     return bless {'urls' => \@c, 
                  'labels' => \%c, 
                  'modified' => (stat $conf_file)[9]}, $class; 
 } 
   
 # return ordered list of all the URIs in the navigation bar 
 sub urls  { return @{shift->{'urls'}}; } 
   
 # return the label for a particular URI in the navigation bar 
 sub label { return $_[0]->{'labels'}->{$_[1]} || $_[1]; } 
   
 # return the modification date of the configuration file 
 sub modified { return $_[0]->{'modified'}; } 
   
 1; 
 _    _END_    _
 

A configuration file section to go with Apache::NavBar might read:

<Location /> 
   SetHandler  perl-script 
   PerlHandler Apache::NavBar 
   PerlSetVar  NavConf conf/navigation.conf 
 </Location>
 

Because so much of what Apache::NavBar and Apache:ESSI do is similar, you might want to merge the navigation bar and server-side include examples. This is just a matter of cutting and pasting the navigation bar code into the server-side function definitions file and then writing a small stub function named NAVBAR( ). This stub function will call the subroutines that read the configuration file and generate the navigation bar table. You can then incorporate the appropriate navigation bar into your pages anywhere you like with an include like this one:

<!--#NAVBAR-->
 

Handling If-Modified-Since

One of us (Lincoln) thought the virtual navigation bar was so neat that he immediately ran out and used it for all documents on his site. Unfortunately, he had some pretty large (>400 MB) files there, and he soon noticed something interesting. Before installing the navigation bar handler, browsers would cache the large HTML files locally and only download them again when they had changed. After installing the handler, however, the files were always downloaded. What happened?

When a browser is asked to display a document that it has cached locally, it sends the remote server a GET request with an additional header field named If-Modified-Since. The request looks something like this:

GET /index.html HTTP/1.0 
 If-Modified-Since: Tue, 24 Feb 1998 11:19:03 GMT 
 User-Agent: (etc. etc. etc.)

The server will compare the document's current modification date to the time given in the request. If the document is more recent than that, it will return the whole document. Otherwise, the server will respond with a 304 "not modified" message and the browser will display its cached copy. This reduces network bandwidth usage dramatically.

When you install a custom content handler, the If-Modified-Since mechanism no longer works unless you implement it. In fact, you can generally ignore If-Modified-Since because content handlers usually generate dynamic documents that change from access to access. However, in some cases the content you provide is sufficiently static that it pays to cache the documents. The navigation bar is one such case because even though the bar is generated dynamically, it rarely changes from day to day.

In order to handle If-Modified-Since caching, you have to settle on a definition for the document's most recent modification date. In the case of a static document, this is simply the modification time of the file. In the case of composite documents that consist equally of static file content and a dynamically generated navigation bar, the modification date is either the time that the HTML file was last changed or the time that the navigation bar configuration file was changed, whichever happens to be more recent. Fortunately for us, we're already storing the configuration file's modification date in the NavBar object, so finding this aggregate modification time is relatively simple.

To use these routines, simply add the following just before the call to $r->send_http_header in the handler( ) subroutine:

$r->update_mtime($bar->modified);
 $r-.set_last_modified;
 my $rc = $r-> meets_conditions
 return $rc unless $rc == OK;
 

We first call the update_mtime( ) function with the navigation bar's modification date. This function will compare the specified date with the modification date of the request document and update the request's internal mtime field to the most recent of the two. We then call set_last_modified( ) to copy the mtime field into the outgoing Last-Modified header. If a synthesized document depends on several configuration files, you should call update_mtime( ) once for each configuration file, followed by set_last_modified( ) at the very end.

The complete code for the new and improved Apache::NavBar, with the If-Modified-Since improvements, can be found at this book's companion web site.

If you think carefully about this module, you'll see that it still isn't strictly correct. There's a third modification date that we should take into account, that of the module source code itself. Changes to the source code may affect the appearance of the document without changing the modification date of either the configuration file or the HTML file. We could add a new update_mtime( ) with the modification time of the Apache::NavBar module, but then we'd have to worry about modification times of libraries that Apache::NavBar depends on, such as Apache::File. This gets hairy very quickly, which is why caching becomes a moot issue for any dynamic document much more complicated than this one. See "The Apache::File Class" in Chapter 9, Perl API Reference Guide, for a complete rundown of the methods that are available to you for controlling HTTP/1.1 caching.

Sending Static Files

If you want your content handler to send a file through without modifying it, the easiest way is to let Apache do all the work for you. Simply return DECLINED from your handler (before you send the HTTP header or the body) and the request will fall through to Apache's default handler. This is a lot easier, not to mention faster, than opening up the file, reading it line by line, and transmitting it unchanged. In addition, Apache will automatically handle a lot of the details for you, first and foremost of which is handling the If-Modified-Since header and other aspects of client-side caching.

If you have a compelling reason to send static files manually, see Using Apache::File to Send Static Files in Chapter 9 for a full description of the technique. Also see "Redirection," later in this chapter, for details on how to direct the browser to request a different URI or to make Apache send the browser a different document from the one that was specifically requested.

Virtual Documents

The previous sections of this chapter have been concerned with transforming existing files. Now we turn our attention to spinning documents out of thin air. Despite the fact that these two operations seem very different, Apache content handlers are responsible for them both. A content handler is free to ignore the translation of the URI that is passed to it. Apache neither knows nor cares that the document produced by the content handler has no correspondence to a physical file.

We've already seen an Apache content handler that produces a virtual document. Chapter 2, A First Module, gave the code for Apache::Hello, an Apache Perl module that produces a short HTML document. For convenience, we show it again in Example 4-7. This content handler is essentially identical to the previous content handlers we've seen. The main difference is that the content handler sets the MIME content type itself, calling the request object's content_type( ) method to set the MIME type to type text/html. This is in contrast to the idiom we used earlier, where the handler allowed Apache to choose the content type for it. After this, the process of emitting the HTTP header and the document itself is the same as we've seen before.

After setting the content type, the handler calls send_http_header( ) to send the HTTP header to the browser, and immediately exits with an OK status code if header_only( ) returns true (this is a slight improvement over the original Chapter 2 version of the program). We call get_remote_host( ) to get the DNS name of the remote host machine, and incorporate the name into a short HTML document that we transmit using the request object's print( ) method. At the end of the handler, we return OK.

There's no reason to be limited to producing virtual HTML documents. You can just as easily produce images, sounds, and other types of multimedia, provided of course that you know how to produce the file format that goes along with the MIME type.

Example 4-7: "Hello World" Redux
package Apache::Hello; 
 # file: Apache/Hello.pm
   
 use strict; 
 use Apache::Constants qw(:common);
   
 sub handler { 
     my $r = shift; 
     $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK unless $r->header_only; 
     my $host = $r->get_remote_host; 
     $r->print(<<END); 
 <HTML> 
 <HEAD> 
 <TITLE>Hello There</TITLE> 
 </HEAD> 
 <BODY> 
 <H1>Hello $host</h2> 
 "Hello world" is a terribly overused phrase in programming books, 
 don't you think? 
 </BODY> 
 </HTML> 
 END 
     return OK; 
 }
   
 1;
 

Redirection

Instead of synthesizing a document, a content handler has the option of redirecting the browser to fetch a different URI using the HTTP redirect mechanism. You can use this facility to randomly select a page or picture to display in response to a URI request (many banner ad generators work this way) or to implement a custom navigation system.

Redirection is extremely simple with the Apache API. You need only add a Location field to the HTTP header containing the full or partial URI of the desired destination, and return a REDIRECT result code. A complete functional example using mod_perl is only a few lines (Example 4-8). This module, named Apache::GoHome, redirects users to the hardcoded URI http://www.ora.com/. When the user selects a document or a portion of the document tree that this content handler has been attached to, the browser will immediately jump to that URI.

The module begins by importing the REDIRECT error code from Apache::Constants (REDIRECT isn't among the standard set of result codes imported with :common). The handler( ) method then adds the desired location to the outgoing headers by calling Apache::header_out( ). header_out( ) can take one or two arguments. Called with one argument, it returns the current value of the indicated HTTP header field. Called with two arguments, it sets the field indicated by the first argument to the value indicated by the second argument. In this case, we use the two-argument form to set the HTTP Location field to the desired URI.

The final step is to return the REDIRECT result code. There's no need to generate an HTML body, since most HTTP-compliant browsers will take you directly to the Location URI. However, Apache adds an appropriate body automatically in order to be HTTP-compliant. You can see the header and body message using telnet:

% telnet localhost 80 
 Trying 127.0.0.1... 
 Connected to localhost. 
 Escape character is '^]'. 
 GET /gohome HTTP/1.0  
 HTTP/1.1 302 Moved Temporarily 
 Date: Mon, 05 Oct 1998 22:15:17 GMT 
 Server: Apache/1.3.3-dev (Unix) mod_perl/1.16 
 Location: http://www.ora.com/ 
 Connection: close 
 Content-Type: text/html 
   
 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 
 <HTML><HEAD> 
 <TITLE>302 Moved Temporarily</TITLE> 
 </HEAD><BODY> 
 <H1>Moved Temporarily</h2> 
 The document has moved <A HREF="http://www.ora.com/">here</A>.<P> 
 </BODY></HTML> 
 Connection closed by foreign host.
 

You'll notice from this example that the REDIRECT status causes a "Moved Temporarily" message to be issued. This is appropriate in most cases because it makes no warrants to the browser that it will be redirected to the same location the next time it tries to fetch the desired URI. If you wish to redirect permanently, you should use the MOVED status code instead, which results in a "301 Moved Permanently" message. A smart browser might remember the redirected URI and fetch it directly from its new location the next time it's needed.

Example 4-8: Generating a Redirect from a Content Handler
package Apache::GoHome; 
 # file: Apache/GoHome.pm
   
 use strict; 
 use Apache::Constants qw(REDIRECT);
   
 sub handler { 
   my $r = shift; 
   $r->content_type('text/html'); 
   $r->header_out(Location => 'http://www.ora.com/'); 
   return REDIRECT; 
 }
   
 1; 
 _    _END_    _
 

As a more substantial example of redirection in action, consider Apache::RandPicture (Example 4-9) which randomly selects a different image file to display each time it's called. It works by selecting an image file from among the contents of a designated directory, then redirecting the browser to that file's URI. In addition to demonstrating a useful application of redirection, it again shows off the idiom for interconverting physical file names and URIs.

The handler begins by fetching the name of a directory to fetch the images from, which is specified in the server configuration file by the Perl variable PictureDir. Because the selected image has to be directly fetchable by the browser, the image directory must be given as a URI rather than as a physical path.

The next task is to convert the directory URI into a physical directory path. The subroutine adds a / to the end of the URI if there isn't one there already (ensuring that Apache treats the URI as a directory), then calls the request object's lookup_uri( ) and filename( ) methods in order to perform the URI translation steps. The code looks like this:

    my $subr = $r->lookup_uri($dir_uri); 
     my $dir = $subr->filename; 
 

Now we need to obtain a listing of image files in the directory. The simple way to do this would be to use the Perl glob operator, for instance:

chdir $dir; 
 @files = <*.{jpg,gif}>;
 

However, this technique is flawed. First off, on many systems the glob operation launches a C subshell, which sends performance plummeting and won't even work on systems without the C shell (like Win32 platforms). Second, it makes assumptions about the extension types of image files. Your site may have defined an alternate extension for image files (or may be using a completely different system for keeping track of image types, such as the Apache MIME magic module), in which case this operation will miss some images.

Instead, we create a DirHandle object using Perl's directory handle object wrapper. We call the directory handle's read( ) method repeatedly to iterate through the contents of the directory. For each item we ask Apache what it thinks the file's MIME type should be, by calling the lookup_uri( ) method to turn the filename into a subrequest and content_type( ) to fetch the MIME type information from the subrequest. We perform a pattern match on the returned type and, if the file is one of the MIME image types, add it to a growing list of image URIs. The subrequest object's uri( ) method is called to return the absolute URI for the image. The whole process looks like this:

    my @files; 
     for my $entry ($dh->read) { 
        # get the file's MIME type 
        my $rr = $subr->lookup_uri($entry); 
        my $type = $rr->content_type; 
        next unless $type =~ m:^image/:; 
        push @files, $rr->uri; 
     }
 

Note that we look up the directory entry's filename by calling the subrequest object's lookup_uri( ) method rather than using the main request object stored in $r. This takes advantage of the fact that subrequests will look up relative paths relative to their own URI.

The next step is to select a member of this list randomly, which we do using this time-tested Perl idiom:

 my $lucky_one = $files[rand @files];
 

The last step is to set the Location header to point at this file (being sure to express the location as a URI) and to return a REDIRECT result code. If you install the module using the sample configuration file and <IMG> tag shown at the bottom of the listing, a different picture will be displayed every time you load the page.

Example 4-9: Redirecting the Browser to a Randomly Chosen Picture
package Apache::RandPicture; 
 # file: Apache/RandPicture.pm 
   
 use strict; 
 use Apache::Constants qw(:common REDIRECT); 
 use DirHandle (); 
   
 sub handler { 
     my $r = shift; 
     my $dir_uri = $r->dir_config('PictureDir'); 
     unless ($dir_uri) { 
        $r->log_reason("No PictureDir configured"); 
        return SERVER_ERROR; 
     } 
     $dir_uri .= "/" unless $dir_uri =~ m:/$:; 
   
     my $subr = $r->lookup_uri($dir_uri); 
     my $dir = $subr->filename; 
     # Get list of images in the directory. 
     my $dh = DirHandle->new($dir); 
     unless ($dh) { 
        $r->log_error("Can't read directory $dir: $!"); 
        return SERVER_ERROR; 
     } 
   
     my @files; 
     for my $entry ($dh->read) { 
        # get the file's MIME type 
        my $rr = $subr->lookup_uri($entry); 
        my $type = $rr->content_type; 
        next unless $type =~ m:^image/:; 
        push @files, $rr->uri; 
     } 
     $dh->close; 
     unless (@files) { 
        $r->log_error("No image files in directory"); 
        return SERVER_ERROR; 
     } 
   
     my $lucky_one = $files[rand @files]; 
     $r->header_out(Location => $lucky_one); 
     return REDIRECT; 
 } 
   
 1; 
 _    _END_    _
 

A configuration section to go with Apache::RandPicture might be:

<Location /random/picture> 
    SetHandler  perl-script 
    PerlHandler Apache::RandPicture 
    PerlSetVar  PictureDir   /banners 
 </Location>
 

And you'd use it in an HTML document like this:

<image src="/random/picture" alt="[Our Sponsor]">
 

Although elegant, this technique for selecting a random image file suffers from a bad performance bottleneck. Instead of requiring only a single network operation to get the picture from the server to the browser, it needs two round-trips across the network: one for the browser's initial request and redirect and one to fetch the image itself.

You can eliminate this overhead in several different ways. The more obvious technique is to get rid of the redirection entirely and simply send the image file directly. After selecting the random image and placing it in the variable $lucky_one, we replace the last two lines of the handler( ) subroutine with code like this:

    $subr = $r->lookup_uri($lucky_one); 
     $r->content_type($subr->content_type); 
     $r->send_http_header; 
     return OK unless $r->header_only; 
     my $fh = Apache::File->new($subr->filename) || return FORBIDDEN; 
     $r->send_fd($fh);
 

We create yet another subrequest, this one for the selected image file, then use information from the subrequest to set the outgoing content type. We then open up the file and send it with the send_fd( ) method.

However, this is still a little wasteful because it requires you to open up the file yourself. A more subtle solution would be to let Apache do the work of sending the file by invoking the subrequest's run( ) method. run( ) invokes the subrequest's content handler to send the body of the document, just as if the browser had made the request itself. The code now looks like this:

   my $subr = $r->lookup_uri($lucky_one); 
    unless ($subr->status == DOCUMENT_FOLLOWS) { 
        $r->log_error("Can't lookup file $lucky_one}: $!");      
        return SERVER_ERROR; 
    } 
    $r->content_type($subr->content_type); 
    $r->send_http_header; 
    return OK if $r->header_only; 
    $subr->run; 
    return OK;
 

We call lookup_uri( ) and check the value returned by its status( ) method in order to make sure that it is DOCUMENT_FOLLOWS (status code 200, the same as HTTP_OK). This constant is not exported by Apache::Constants by default but has to be imported explicitly. We then set the main request's content type to the same as that of the subrequest, and send off the appropriate HTTP header. Finally, we call the subrequest's run( ) method to invoke its content handler and send the contents of the image to the browser.

Internal Redirection

The two Apache::RandPicture optimizations that we showed in the previous section involve a lot of typing, and the resulting code is a bit obscure. A far more elegant solution is to let Apache do all the work for you with its internal redirection mechanism. In this scheme, Apache handles the entire redirection internally. It pretends that the web browser made the request for the new URI and sends the contents of the file, without letting the browser in on the secret. It is functionally equivalent to the solution that we showed at the end of the preceding section.

To invoke the Apache internal redirection system, modify the last two lines of Apache::RandPicture 's handler( ) subroutine to read like this:

    $r->internal_redirect($lucky_one); 
     return OK;
 

The request object's internal_redirect( ) method takes a single argument consisting of an absolute local URI (one starting with a / ). The method does all the work of translating the URI, invoking its content handler, and returning the file contents, if any. Unfortunately internal_redirect( ) returns no result code, so there's no way of knowing whether the redirect was successful (you can't do this from a conventional redirect either). However, the call will return in any case, allowing you to do whatever cleanup is needed. You should exit the handler with a result code of OK.

In informal benchmarks, replacing the basic Apache::RandPicture with a version that uses internal redirection increased the throughput by a factor of two, exactly what we'd expect from halving the number of trips through the network. In contrast, replacing all the MIME type lookups with a simpler direct grep for image file extensions had negligible effect on the speed of the module. Apache's subrequest mechanism is very efficient.

If you have very many images in the random pictures directory (more than a few hundred), iterating through the directory listing each time you need to fetch an image will result in a noticeable performance hit. In this case, you'll want to cache the directory listing in a package variable the first time you generate it and only rebuild the listing when the directory's modification time changes (or just wait for a server restart, if the directory doesn't change often). You could adapt the Apache::ESSI caching system for this purpose.

Internal redirection is a win for most cases when you want to redirect the browser to a different URI on your own site. Be careful not to use it for external URIs, however. For these, you must either use standard redirection or invoke Apache's proxy API (Chapter 7).

When you use internal redirection to pass control from one module to another, the second module in the chain can retrieve the original query string, the document URI, and other information about the original request by calling the request object's prev( ) method or, in Apache::Registry scripts only, by examining certain environment variables. There is also a way, using Apache::err_header_out( ) for the original module to set various HTTP header fields, such as cookies, that will be transferred to the second across the internal redirect. Because internal redirects are most commonly used in error handlers, these techniques are discussed in the section "Handling Errors" later in this chapter.

Processing Input

You can make the virtual documents generated by the Apache API interactive in exactly the way that you would documents generated by CGI scripts. Your module will generate an HTML form for the user to fill out. When the user completes and submits the form, your module will process the parameters and generate a new document, which may contain another fill-out form that prompts the user for additional information. In addition, you can store information inside the URI itself by placing it in the additional path information part.

CGI Parameters

When a fill-out form is submitted, the contents of its fields are turned into a series of name=value parameter pairs that are available for your module's use. Unfortunately, correctly processing these parameter pairs is annoying because, for a number of historical reasons, there are a variety of formats that you must know about and deal with. The first complication is that the form may be submitted using either the HTTP GET or POST method. If the GET method is used, the URI encoded parameter pairs can be found separated by ampersands in the "query string," the part of the URI that follows the ? character:

http://your.site/uri/path?name1=val1&name2=val2&name3=val3...
 

To recover the parameters from a GET request, mod_perl users should use the request object's args( ) method. In a scalar context this method returns the entire query string, ampersands and all. In an array context, this method returns the parsed name=value pairs; however, you will still have to do further processing in order to correctly handle multivalued parameters. This feature is only found in the Perl API. Programmers who use the C API must recover the query string from the request object's args field and do all the parsing manually.

If the client uses the POST method to submit the fill-out form, the parameter pairs can be found in something called the "client block." C API users must call three functions named setup_client_block( ), should_client_block( ), and get_client_block( ) in order to retrieve the information.

While these methods are also available in the Perl API, mod_perl users have an easier way: they need only call the request object's content( ) method to retrieve the preparsed list of name=value pairs. However, there's a catch: this only works for the older application/x-www-form-urlencoded style of parameter encoding. If the browser uses the newer multipart/form-data encoding (which is used for file uploads, among other things), then mod_perl users will have to read and parse the content information themselves. read( ) will fetch the unparsed content information by looping until the requested number of bytes have been read (or a predetermined timeout has occurred). Fortunately, there are a number of helpful modules that allow mod_perl programmers to accept file uploads without parsing the data themselves, including CGI.pm and Apache::Request, both of which we describe later.

To show you the general technique for prompting and processing user input, Example 4-10 gives a new version of Apache::Hello. It looks for a parameter named user_name and displays a customized welcome page, if present. Otherwise, it creates a more generic message. In both cases, it also displays a fill-out form that prompts the user to enter a new value for user_name. When the user presses the submission button labeled "Set Name," the information is POSTed to the module and the page is redisplayed (Figure 4-4).

Figure 4-4. The Apache::Hello2 module can process user input.

 

The code is very simple. On entry to handler( ) the module calls the request object's method( ) method to determine whether the handler was invoked using a POST request, or by some other means (usually GET). If the POST method was used, the handler calls the request object's content( ) method to retrieve the posted parameters. Otherwise, it attempts to retrieve the information from the query string by calling args( ). The parsed name=value pairs are now stuffed into a hash named %params for convenient access.

Having processed the user input, if any, the handler retrieves the value of the user_name parameter from the hash and stores it in a variable. If the parameter is empty, we default to "Unknown User."

The next step is to generate the document. We set the content type to text/html as before and emit the HTTP header. We again call the request object's header_only( ) to determine whether the client has requested the entire document or just the HTTP header information.

This is followed by a single long Apache::print( ) statement. We create the HTML header and body, along with a suitable fill-out form. Notice that we use the current value of the user name variable to initialize the appropriate text field. This is a frill that we have always thought was kind of neat.

Example 4-10: Processing User Input with the Apache Perl API
package Apache::Hello2; 
 # file: Apache/Hello2.pm 
 use strict; 
 use Apache::Constants qw(:common); 
   
 sub handler { 
     my $r = shift; 
     my %params = $r->method eq 'POST' ? $r->content : $r->args; 
     my $user_name = $params{'user_name'} || 'Unknown User';
   
     $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
   
     $r->print(<<END); 
 <HTML> 
 <HEAD> 
 <TITLE>Hello There</TITLE> 
 </HEAD> 
 <BODY> 
 <H1>Hello $user_name</h2> 
 Who would take this book seriously if the first example didn\'t 
 say "hello $user_name"? 
 <HR> 
 <FORM METHOD="POST"> 
 Enter your name: <INPUT TYPE="text" NAME="user_name" VALUE="$user_name"> 
 <INPUT TYPE="submit" VALUE="Set Name"> 
 </FORM> 
 </BODY> 
 </HTML> 
 END
   
     return OK; 
 } 
   
 1; 
 _    _END_    _
 

A perl.conf entry to go with it might read:

<Location /hello/friend> 
  SetHandler  perl-script 
  PerlHandler Apache::Hello2 
 </Location>
 

This method of processing user input is only one of several equally valid alternatives. For example, you might want to work with query string and POSTed parameters simultaneously, to accommodate this type of fill-out form:

<FORM ACTION="/hello/friend?day=saturday" METHOD="POST"> 
    <INPUT TYPE="text" NAME="user_name"> 
    <INPUT TYPE="submit"> 
 </FORM>
 

In this case, you could recover the values of both the day and user_name parameters using a code fragment like this one:

my %params = ($r->args, $r->content);
 

If the same parameter is present in both the query string and the POSTed values, then the latter will override the former. Depending on your application's logic, you might like this behavior. Alternatively, you could store the two types of parameter in different places or take different actions depending on whether the parameters were submitted via GET or POST. For example, you might want to use query string parameters to initialize the default values of the fill-out form and enter the information into a database when a POST request is received.

When you store the parsed parameters into a hash, you lose information about parameters that are present more than once. This can be bad if you are expecting multivalued parameters, such as those generated by a selection list or a series of checkboxes linked by the same name. To keep multivalued information, you need to do something like this:

my %params; 
 my @args = ($r->args, $r->content); 
 while (my($name,$value) = splice @args,0,2) { 
   push @{$params{$name}}, $value; 
 }
 

This bit of code aggregates the GET and POST parameters into a single array named @args. It then loops through each name=value pair, building up a hash in which the key is the parameter name and the value is an array reference containing all the values for that parameter. This way, if you have a selection list that generates query strings of the form:

vegetable=kale&vegetable=broccoli&vegetable=carrots
 

you can recover the complete vegetable list in this manner:

@vegetables = @{$params{'vegetable'}};
 

An alternative is to use a module that was still in development at the time this chapter was written. This module, named Apache::Request, uses the CGI.pm-style method calls to process user input but does so efficiently by going directly to the request object. With this module, the user input parameters are retrieved by calling param( ). Call param( ) without any arguments to retrieve a list of all the parameter names. Call param( ) with a parameter name to return a list of the values for that parameter in an array context, and the first member of the list in a scalar context. Unlike the vanilla request object, input of type multipart/form-data is handled correctly, and uploaded files can be recovered too (using the same API as CGI.pm).

To take advantage of Apache::Request in our "Hello World" module, we modify the top part of the module to read as follows:

package Apache::Hello3; 
 # file: Apache/Hello3.pm
   
 use strict; 
 use Apache::Constants qw(:common); 
 use Apache::Request;
   
 sub handler { 
    my $r = Apache::Request->new(shift); 
    my $user_name = $r->param('user_name') || 'Unknown User'; 
    $r->content_type('text/html'); 
    $r->print(<<END); 
 Who cares if every single example 
 says "Hello World"???! 
 END 
 ; 
 ...
 

The main detail here is that instead of retrieving the request object directly, we wrap it inside an Apache::Request object. Apache::Request adds param( ) and a few other useful methods and inherits all other method calls from the Apache class. More information will be found in the Apache::Request manual page when that package is officially released.

Like CGI.pm, Apache::Request allows you to handle browser file uploading, although it is somewhat different in detail from the interface provided in CGI.pm versions 2.46 and lower (the two libraries have been brought into harnony in Version 2.47). As in ordinary CGI, you create a file upload field by defining an <INPUT> element of type "file" within a <FORM> section of type "multipart/form-data". After the form is POSTed, you retrieve the file contents by reading from a filehandle returned by the Apache::Request upload( ) method. This code fragment illustrates the technique:

my $r = Apache::Request->new(shift);
 my $moose = 0;
 my $uploaded_file = $r->upload('uploaded-file');
 my $uploaded_name = $r->param('uploaded-file');
 while (<$uploaded_file>) {
     $moose++ if /moose/;
 }
 print "$moose moose(s) found in $uploaded_name\n";
 

Additional Path Information

Recall that after Apache parses an incoming URI to figure out what module to invoke, there may be some extra bits left over. This extra stuff becomes the "additional path information" and is available for your module to use in any way it wishes. Because it is hierarchical, the additional path information part of the URI follows all the same relative path rules as the rest of the URI. For example, .. means to move up one level. For this reason, additional path information is often used to navigate through a virtual document tree that is dynamically created and maintained by a CGI script or module. However, you don't have to take advantage of the hierarchical nature of path information. You can just use it as a handy place to store variables. In the next chapter, we'll use additional path information to stash a session identifier for a long-running web application.

Apache modules fetch additional path information by calling the request object's path_info( ) method. If desired, they can then turn the path information into a physical filename by calling lookup_uri( ).

An example of how additional path information can be used as a virtual document tree is shown in Example 4-11, which contains the code for Apache::TreeBrowser. This module generates a series of documents that are organized in a browseable tree hierarchy that is indistinguishable to the user from a conventional HTML file hierarchy. However, there are no physical files. Instead, the documents are generated from a large treelike Perl data structure that specifies how each "document" should be displayed. Here is an excerpt:

'bark'=>{ 
     -title=>'The Wrong Tree', 
     -contents=>'His bark was worse than his bite.', 
     'smooth'=>{ 
        -title=>'Like Butter', 
        -contents=>'As smooth as silk.' 
        }, 
     'rough'=>{ 
        -title=>'Ruffled', 
        -contents=>"Don't get rough with me." 
        }, 
  }...
 

In this bit of the tree, a document named "bark" has the title "The Wrong Tree" and the contents "His bark was worse than his bite." Beneath this document are two subdocuments named "smooth" and "rough." The "smooth" document has the title "Like Butter" and the contents "As smooth as silk." The "rough" document is similarly silly. These subdocuments can be addressed with the additional path information /bark/smooth and /bark/rough, respectively. The parent document, naturally enough, is addressed by /bark. Within the module, we call each chunk of this data structure a "node."

Using the information contained in the data structure, Apache::TreeBrowser constructs the document and displays its information along with a browseable set of links organized in hierarchical fashion (see Figure 4-5). As the user moves from document to document, the currently displayed document is highlighted--sort of a hierarchical navigation bar!

Figure 4-5. Apache::TreeBrowser creates a hierarchical navigation tree.

 

The source code listing is long, so we'll run through it a chunk at a time:

package Apache::TreeBrowser; 
   
 use strict; 
 use Apache::Constants qw(:common REDIRECT); 
   
 my $TREE = make_tree();
   
 sub handler { 
     my $r = shift;
 

The module starts by importing the usual Apache constants and the REDIRECT result code. It then creates the browseable tree by calling an internal subroutine named make_tree( ) and stores the information in a package global named $TREE. In a real-life application, this data structure would be created in some interesting way, for example, using a query on a database, but in this case make_tree( ) just returns the hardcoded data structure that follows the _ _DATA_ _ token at the end of the code.

    my $path_info = $r->path_info; 
     my $path_translated = $r->lookup_uri($path_info)->filename; 
     my $current_uri = $r->uri;
 

Now's the time to process the additional path information. The handler fetches the path information by calling the request object's path_info( ) method and fetches the module's base URI by calling uri( ). Even though we won't be using it, we transform the additional path information into a physical pathname by calling lookup_uri( ) and filename( ). This is useful for seeing how Apache does URI translation.

   unless ($path_info) { 
        $r->header_out(Location => "$current_uri/"); 
        return REDIRECT; 
    }
 

For this module to work correctly, some additional path information has to be provided, even if it's only a / character. If we find that the additional path information is empty, we rectify the situation by redirecting the browser to our URI with an additional / appended to the end. This is similar to the way that Apache redirects browsers to directories when the terminal slash is omitted.

    $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
     my($junk, @components) = split "/", $path_info;
   
     # follow the components down 
     my($node, $name) = ($TREE, ''); 
     foreach (@components) { 
        last unless $node->{$_}; 
        $name = $_; 
        $node = $node->{$_}; 
     }
 

At this point we begin to construct the document. We set the content type to text/html, send out the HTTP header, and exit if header_only( ) returns true. Otherwise, we split the path information into its components and then traverse the tree, following each component name until we either reach the last component on the list or come to a component that doesn't have a corresponding entry in the tree (which sometimes happens when users type in the URI themselves). By the time we reach the end of the tree traversal, the variable $node points to the part of the tree that is referred to by the additional path information or, if the path information wasn't entirely correct, to the part of the tree corresponding to the last valid path component.

    $r->print(<<END); 
 <HTML> 
 <HEAD> 
 <TITLE>$node->{-title}</TITLE> 
 </HEAD> 
 <BODY BGCOLOR="white"> 
 <H1>$node->{-title}</h2> 
   
 Contents = <b>$node->{-contents}</b> 
   
 <H2>Navigation Tree</H2> 
 END 
     my $prefix = "../" x @components; 
     print $prefix ?  
        qq(<H3><A HREF="$prefix">Tree Root</A></H3>\n) :  
        qq(<H3><FONT COLOR="red">Tree Root</FONT></H3>);
 

We now call print( ) to print out the HTML document. We first display the current document's title and contents. We then print a hyperlink that points back to the "root" (really the top level) of the tree. Notice how we construct this link by creating a relative URI based on the number of components in the additional path information. If the additional path information is currently /bark/rough/cork, we construct a link whose HREF is ../../../. Through the magic of relative addressing, this will take us back to the root / document.

    print_node('', $TREE, $node, $prefix); 
     print qq(<A HREF="../">Go up one level</A><P>) if $name;
 

The next task is to construct the hierarchical navigation system shown in Figure 4-5. We do this by calling print_node( ), an internal function. This is followed by a link to the next-higher document, which is simply the relative path ../.

    $r->print(<<END); 
 Node = <EM>$name</EM><br> 
 URI = <EM>$current_uri</EM><br> 
 Path information =<EM>$path_info</EM><br> 
 Translated path = <EM>$path_translated</EM> 
 </BODY> 
 </HTML> 
 END 
   
     return OK; 
 }
 

Last, we print out some more information about the current document, including the internal name of the document, the current URI, the additional path information, and the translated path information.

Let's now look at the print_node( ) subroutine:

sub print_node { 
     my ($name, $node, $current, $prefix) = @_; 
     my (@branches) = grep !/^-/, sort keys %$node; 
     if ($name) { 
        # print the node itself 
        print $node != $current ? 
            qq(<LI><A HREF="$prefix$name/">$name</A></LI>\n) :
               qq(<LI><FONT COLOR="red">$name</FONT></LI>\n); 
        # print branches underneath it 
        $prefix .= "$name/"; 
     } 
     return unless @branches; 
     print "<UL>\n"; 
     foreach (@branches) { 
        print_node($_, $node->{$_}, $current, $prefix); 
     } 
     print "</UL>\n"; 
 }
 

This subroutine is responsible for displaying a tree node as a nested list. It starts by finding all the branches beneath the requested node, which just happens to be all the hash keys that don't begin with a hyphen. It then prints out the name of the node. If the node being displayed corresponds to the current document, the name is surrounded by <FONT> tags to display it in red. Otherwise, the node name is turned into a hyperlink that points to the appropriate document. Then, for each subdocument beneath the current node, it invokes itself recursively to display the subdocument. The most obscure part of this subroutine is the need to append a $prefix variable to each URI the routine generates. $prefix contains just the right number of ../ sequences to make the URIs point to the root of the virtual document tree. This simplifies the program logic.

The last function in this module is make_tree( ). It simply reads in the text following the _ _DATA_ _ token and eval( ) s it, turning it into a Perl data structure:

sub make_tree { 
     local $/; 
     my $data = <DATA>; 
     eval $data; 
 }
   
 1; 
  _    _DATA_    _  
 
Example 4-11: Using Path Information to Browse a Tree
package Apache::TreeBrowser; 
 # file: Apache/TreeBrowser.pm 
   
 use strict; 
 use Apache::Constants qw(:common REDIRECT); 
   
 my $TREE = make_tree(); 
   
 sub handler { 
     my $r = shift; 
     my $path_info = $r->path_info; 
     my $path_translated = $r->lookup_uri($path_info)->filename; 
     my $current_uri = $r->uri; 
     unless ($path_info) { 
        $r->header_out(Location => "$current_uri/"); 
        return REDIRECT; 
     } 
      
     $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
     my($junk, @components) = split "/", $path_info; 
      
     # follow the components down 
     my($node, $name) = ($TREE, ''); 
     foreach (@components) { 
        last unless $node->{$_}; 
        $name = $_; 
        $node = $node->{$_}; 
     } 
      
     $r->print(<<END); 
 <HTML> 
 <HEAD> 
 <TITLE>$node->{-title}</TITLE> 
 </HEAD> 
 <BODY BGCOLOR="white"> 
 <H1>$node->{-title}</h2> 
   
 Contents = <b>$node->{-contents}</b> 
   
 <H2>Navigation Tree</H2> 
 END 
                                         
     my $prefix = "../" x @components; 
     print $prefix ?  
        qq(<H3><A HREF="$prefix">Tree Root</A></H3>\n) :  
        qq(<H3><FONT COLOR="red">Tree Root</FONT></H3>); 
      
     print_node('', $TREE, $node, $prefix); 
     print qq(<A HREF="../">Go up one level</A><P>) if $name; 
      
     $r->print(<<END); 
 Node = <EM>$name</EM><br> 
 URI = <EM>$current_uri</EM><br> 
 Path information =<EM>$path_info</EM><br> 
 Translated path = <EM>$path_translated</EM> 
 </BODY> 
 </HTML> 
 END 
   
     return OK; 
 } 
   
 sub print_node { 
     my ($name, $node, $current, $prefix) = @_; 
     my (@branches) = grep !/^-/, sort keys %$node; 
     if ($name) { 
        # print the node itself 
        print $node != $current ? 
            qq(<LI><A HREF="$prefix$name/">$name</A></LI>\n) : 
                qq(<LI><FONT COLOR="red">$name</FONT></LI>\n); 
        # print branches underneath it 
        $prefix .= "$name/"; 
     } 
     return unless @branches; 
     print "<UL>\n"; 
     foreach (@branches) { 
        print_node($_, $node->{$_}, $current, $prefix); 
     } 
     print "</UL>\n"; 
 } 
   
 # create a sample tree to browse 
 sub make_tree { 
     local $/; 
     my $data = <DATA>; 
     eval $data; 
 } 
   
 _    _DATA_    _ 
 return { 
     -title => 'The Root of All Evil', 
     -contents => 'And so it begins...', 
     'bark' => { 
        -title => 'The Wrong Tree', 
        -contents => 'His bark was worse than his bite.', 
        'smooth' => { 
            -title => 'Like Butter', 
            -contents => 'As smooth as silk.', 
        }, 
        'rough' => { 
            -title => 'Ruffled', 
            -contents => "Don't get rough with me.", 
            'cork' => { 
                -title => 'Corked', 
                -contents => "Corks don't grow on trees...or do they?", 
            }, 
            'cinnamon' => { 
                -title => 'The Cinnamon Tree', 
                -contents => 'Little bird, little bird in the cinnamon tree...', 
            }, 
        } 
     }, 
     'bough' => { 
        -title => 'Stealing a Bough', 
        -contents => "I've taken a bough of silence.", 
        'forked' => { 
            -title => 'Forked Boughs', 
            -contents => 'What lightning and snakes\' tongues have in common.', 
        }, 
        'straight' => { 
            -title => 'Single Boughs', 
            -contents => 'Straight, but not narrow.', 
        }, 
        'extra' => { 
            -title => 'Take a Bough', 
            -contents => 'Nothing beats that special feeling,  
                               when you are stealing that extra bough!', 
        }, 
     }, 
     'branch' => { 
        -title => 'The Branch Not Taken', 
        -contents => 'Branch or be branched.', 
        'twig' => { 
            -title => 'Twiggy', 
            -contents => 'Anorexia returns!', 
            'twiglet' => { 
                -title => 'The Leastest Node', 
                -contents => 'Winnie the Pooh, Eeyore, and Twiglet.', 
            }, 
        }, 
        'leaf' => { 
            -title => 'Leaf me Alone!', 
            -contents => 'Look back, Leaf Ericksonn.', 
        } 
     }, 
 }
 

Here is a sample configuration file entry to go with Apache::TreeBrowser :

<Location /virtual> 
 SetHandler perl-script 
 PerlHandler Apache::TreeBrowser 
 </Location>
 

Apache::Registry

If you are using mod_perl to write Apache modules, then you probably want to take advantage of Apache::Registry. Apache::Registry is a prewritten Apache Perl module that is a content handler for files containing Perl code. In addition to making it unnecessary to restart the server every time you revise a source file, Apache::Registry sets up a simulated CGI environment, so that programs that expect to get information about the transaction from environment variables can continue to do so. This allows legacy CGI applications to run under the Apache Perl API, and lets you use server-side code libraries (such as the original CGI.pm) that assume the script is running in a CGI environment.

Apache::Registry is similar in concept to the content filters we created earlier in this chapter, but instead of performing simple string substitutions on the contents of the requested file, Apache::Registry compiles and executes the code contained within it. In order to avoid recompiling the script each time it's requested, Apache::Registry caches the compiled code and checks the file modification time each time it's requested in order to determine whether it can safely use the cached code or whether it must recompile the file. Should you ever wish to look at its source code, Apache::Registry is a good example of a well-written Apache content handler that exercises much of the Perl API.

We created a typical configuration file entry for Apache::Registry in Chapter 2. Let's examine it in more detail now.

Alias /perl/ /usr/local/apache/perl/ 
 <Location /perl> 
    SetHandler     perl-script 
    PerlHandler    Apache::Registry 
    PerlSendHeader On 
    Options        +ExecCGI 
 </Location>
 

The Alias directive simply maps the physical directory /usr/local/apache/perl/ to a virtual directory named /perl. The <Location> section is more interesting. It uses SetHandler to make perl-script the content handler for this directory and sets Apache::Registry to be the module to handle requests for files within this part of the document tree.

The PerlSendHeader On line tells mod_perl to intercept anything that looks like a header line (such as Content-Type: text/html ) and to automatically turn it into a correctly formatted HTTP/1.0 header the way that Apache does with CGI scripts. This allows you to write scripts without bothering to call the request object's send_http_header( ) method. Like other Apache::Registry features, this option makes it easier to port CGI scripts to the Apache API. If you use CGI.pm's header( ) function to generate HTTP headers, you do not need to activate this directive because CGI.pm detects mod_perl and calls send_http_header( ) for you. However, it does not hurt to use this directive anyway.

Option +ExecCGI ordinarily tells Apache that it's all right for the directory to contain CGI scripts. In this case the flag is required by Apache::Registry to confirm that you really know what you're doing. In addition, all scripts located in directories handled by Apache::Registry must be executable--another check against accidentally leaving wayward nonscript files in the directory.

When you use Apache::Registry, you can program in either of two distinct styles. You can choose to ignore the Apache Perl API entirely and act as if your script were executed within a CGI environment, or you can ignore the CGI compatibility features and make Apache API calls. You can also combine both programming styles in a single script, although you run the risk of confusing yourself and anyone else who needs to maintain your code!

A typical example of the first style is the hello.pl script (Example 4-12), which you also saw in Chapter 2. The interesting thing about this script is that there's nothing Apache-specific about it. The same script will run as a standard CGI script under Apache or any other web server. Any library modules that rely on the CGI environment will work as well.

Example 4-12: An Apache::Registry Script That Uses CGI-Compatibility Mode
#!/usr/local/bin/perl 
 # file: hello.pl
   
 print "Content-Type: text/html\n\n";
   
 print <<END; 
 <HTML> 
 <HEAD> 
 <TITLE>Hello There</TITLE> 
 </HEAD> 
 <BODY> 
 <H1>Hello $ENV{REMOTE_HOST}</h2> 
 Who would take this book seriously if the examples 
 didn't say "hello world" in at least four different ways? 
 </BODY> 
 </HTML> 
 END
 

Example 4-13 shows the same script rewritten more compactly by taking advantage of the various shortcuts provided by the CGI.pm module.

Example 4-13: An Apache::Registry Script That Uses CGI.pm
#!/usr/local/bin/perl 
 # file: hello2.pl
   
 use CGI qw(:standard); 
 print header, 
    start_html('Hello There'), 
    h1('Hello',remote_host()), 
    'Who would take this book seriously if the examples', 
    'didn\'t say "hello world" in at least four different ways?', 
    end_html;
 

In contrast, Example 4-14 shows the script written in the Apache Perl API style. If you compare the script to Example 4-7, which used the vanilla API to define its own content handler, you'll see that the contents of this script (with the exception of the #! line at the top) are almost identical to the body of the handler( ) subroutine defined there. The main difference is that instead of retrieving the Apache request object from the subroutine argument list, we get it by calling Apache->request(). request( ) is a static (class) method in the Apache package where the current request object can always be found.

There are also some subtle differences between Apache::Registry scripts that make Apache API calls and plain content handlers. One thing to notice is that there is no return value from Apache::Registry scripts. Apache::Registry normally assumes an HTTP status code of 200 (OK). However, you can change the status code manually by calling the request object's status( ) method to change the status code before sending out the header:

$r->status(404);  # forbidden
 

Strictly speaking, it isn't necessary to call send_http_header( ) if you have PerlSendHeader On. However, it is good practice to do so, and it won't lead to redundant headers being printed.

Alternatively, you can use the CGI compatibility mode to set the status by printing out an HTTP header that contains a Status: field:

print "Status: 404 Forbidden\n\n";
 

Another subtle difference is that at least one of the command-line switches that may be found on the topmost #! line is significant. The -w switch, if present, will signal Apache::Registry to turn on Perl warnings by setting the $^W global to a true value. Another common switch used with CGI scripts is -T, which turns on taint checking. Currently, taint checking can be activated for the Perl interpreter as a whole only at server startup time by setting the configuration directive PerlTaintCheck On. However, if Apache::Registry notices -T on the #! line and taint checks are not activated, it will print a warning in the server error log.

Since Apache::Registry scripts can do double duty as normal CGI scripts and as mod_perl scripts, it's sometimes useful for them to check the environment and behave differently in the two situations. They can do this by checking for the existence of the environment variable MOD_PERL or for the value of GATEWAY_INTERFACE. When running under mod_perl, GATEWAY_INTERFACE will be equal to CGI-Perl/1.1. Under the normal CGI interface, it will be CGI/1.1.

Example 4-14: An Apache::Registry Script That Uses the Apache API
#!/usr/local/bin/perl 
 # file: hello3.pl
   
 use strict;
   
 my $r = Apache->request; 
 $r->content_type('text/html'); 
 $r->send_http_header;
 return OK unless $r->header_only; 
  
 my $host = $r->get_remote_host; 
 $r->print(<<END); 
 <HTML> 
 <HEAD> 
 <TITLE>Hello There</TITLE> 
 </HEAD> 
 <BODY> 
 <H1>Hello $host</h2> 
 Enough with the "Hello worlds" already! 
 </BODY> 
 </HTML> 
 END
 

A Useful Apache::Registry Application

All the Apache::Registry examples that we've seen so far have been short and, frankly, silly. Now let's look at an example of a real-world script that actually does something useful. The guestbook script (Example 4-15), as its name implies, manages a typical site guestbook, where visitors can enter their names, email addresses, and comments. It works well as both a standalone CGI script and a mod_perl Apache::Registry script, automatically detecting when it is running under the Apache Perl API in order to take advantage of mod_perl 's features. In addition to showing you how to generate a series of fill-out forms to handle a moderately complex user interaction, this script demonstrates how to read and update a file without the risk of several instances of the script trying to do so simultaneously.

Unlike some other guestbook programs, this one doesn't append users' names to a growing HTML document. Instead, it maintains a flat file in which each user's entry is represented as a single line in the file. Tabs separate the five fields, which are the date of the entry, the user's name, the user's email address, the user's location (e.g., city of residence), and comments. Nonalphanumeric characters are URL-escaped to prevent the format from getting messed up if the user enters newlines or tabs in the fields, giving records that look like:

05/07/98  JR  jr_ewing%40dallas.com  Dallas,%20TX  Like%20the%20hat
 

When the script is first called, it presents the user with the option of signing the guestbook file or looking at previous entries (Figure 4-6).

Figure 4-6. The Apache::Registry guestbook script generates its own fill-out form.

 

If the user presses the button labeled "Sign Guestbook," a confirmation page appears, which echoes the entry and prompts the user to edit or confirm it (Figure 4-7).

Figure 4-7. The confirmation page generated by guestbook

 

Pressing the "Change Entry" button takes the user back to the previous page with the fields filled in and waiting for the user's changes. Pressing "Confirm Entry" appends the user's entry to the guestbook file and displays the whole file (Figure 4-8).

Figure 4-8. The listing of previous guestbook entries generated by guestbook

 

Turning to the source code, the script begins by importing functions from a variety of modules, including CGI.pm, IO::File, Fcntl, and POSIX :

use strict; 
 use CGI qw(:standard :html3 escape unescape escapeHTML); 
 use IO::File (); 
 use Fcntl qw(:flock); 
 use POSIX qw(strftime); 
 use vars qw(@FIELDS %REQUIRED %BIG $GUESTBOOKFILE);
   
 @FIELDS = qw(name e-mail location comments); 
 %REQUIRED = ('name' => 1, 'e-mail' => 1); 
 %BIG = ('comments' => 1);
 

The script then defines some constants. @FIELDS is an array of all the fields known to the guestbook. By changing the contents of this array you can generate different fill-out forms. %REQUIRED is a hash that designates certain fields as required, in this case name and e-mail. The script will refuse to add an entry to the guestbook until these fields are filled out (however, no error checking on the contents of the fields is done). %BIG is a hash containing the names of fields that are displayed as large text areas, in this case comments. Other fields are displayed as one-line text entries.

if ($ENV{MOD_PERL}) { 
     $GUESTBOOKFILE = Apache->request->dir_config('GuestbookFile'); 
 } 
 $GUESTBOOKFILE ||= "/usr/tmp/guestbookfile.txt";
 

Next the script checks if it is running under mod_perl by checking for the MOD_PERL environment variable. If the script finds that it is running under mod_perl, it fetches the Apache request object and queries the object for a per-directory configuration variable named GuestbookFile. This contains the physical pathname of the file where the guestbook entries are stored. If the script is a standalone CGI script, or if no GuestbookFile configuration variable is defined, the script defaults to a hardcoded file path. In the case of Apache::Registry scripts, the PerlSetVar directive used to set per-directory configuration variables must be located in a .htaccess file in the same directory as the script.

print header, 
     start_html(-title => 'Guestbook', -bgcolor => 'silver'), 
     h1("Guestbook");
 

The script now begins to generate the document by calling shortcut functions defined in the CGI module to generate the HTTP header, the HTML header and title, and a level 1 heading of "Guestbook."

 CASE: { 
      $_ = param('action'); 
      /^sign/i and do    { sign_guestbook(); last CASE; }; 
      /^confirm/i and do { write_guestbook() and view_guestbook(); last CASE; }; 
      /^view/i and do    { view_guestbook(1); last CASE; }; 
      generate_form(); 
  }
 

We now enter the variable part of the script. Depending on what phase of the transaction the user is in, we either want to prompt the user to fill out the guestbook form, confirm an entered entry, or view the entire guestbook. We distinguish between the phases by looking at the contents of a script parameter named action. If action equals sign, we know that the user has just completed the fill-out form and pressed the "Sign Guestbook" button, so we jump to the routine responsible for this part of the transaction. Similarly, we look for action values of confirm and view, and jump to the appropriate routines for these actions. If action is missing, or if it has some value we don't expect, we take the default action of generating the fill-out form.

print end_html; 
 exit 0;
 

Having done its work, the script prints out the </HTML> tag and exits.

sub generate_form { 
     my @rows; 
     for my $field (@FIELDS) { 
        my $title = "Your $field"; 
        $title .= " (optional)" if !$REQUIRED{$field}; 
        my $element = $BIG{$field} ?  
            textarea(-name => $field, 
                     -rows => 4, 
                     -columns => 50, 
                     -wrap => 1) 
                : textfield(-name => $field, -size => 50); 
        push @rows, th($title) . td($element); 
     } 
     print start_form, 
     table(TR{-align => 'LEFT'}, \@rows), 
     br, 
     submit(-name => 'action', -value => 'View Guestbook'), 
     submit(-name => 'action', -value => 'Sign Guestbook'), 
     end_form; 
 }
 

The subroutine responsible for generating the form is named, appropriately enough, generate_form( ). It iterates over @FIELDS and dynamically generates a text label and a form element for each field, modifying the format somewhat based on whether the field is marked optional or big. Each label/field pair is pushed onto a list named @rows. When the loop is finished, @rows is turned into a nicely formatted table using CGI.pm's table-generation shortcuts. The "View Guestbook" and "Sign Guestbook" buttons are added to the form, and the routine finishes.

sub sign_guestbook { 
     my @missing = check_missing(@FIELDS); 
     if (@missing) { 
        print_warning(@missing); 
        generate_form(); 
        return; 
     }
 

sign_guestbook( ) has a slightly more complex job. Its first task is to check the submitted form for missing required fields by calling the internal subroutine check_missing( ). If any are missing, it displays the missing fields by calling another internal subroutine, print_warning( ), and then invokes generate_form( ) to redisplay the form with its current values. No particular hocus-pocus is required to display the partially completed form correctly; this is just one of the beneficial side effects of CGI.pm's "sticky forms" feature.

    my @rows; 
     foreach (@FIELDS) { 
        push @rows, TR(th({-align=>'LEFT'},$_),  
                       td(escapeHTML(param($_)))); 
     } 
     print "Here is your guestbook entry.  Press ", 
     em('Confirm')," to save it, or ",em('Change'), 
     " to change it.", 
     hr, 
     table(@rows), 
     hr;
 

If all the required fields are filled in, sign_guestbook( ) generates an HTML table to display the user's entries. The technique for generating the form is similar to that used in the previous subroutine, except that no special cases are needed for different types of fields. We do, however, have to be careful to call escapeHTML( ) (a function imported from CGI.pm) in order to prevent HTML entities and other funny characters that the user might have entered from messing up the page.

    print start_form; 
     foreach (@FIELDS) { 
        print hidden(-name => $_); 
     } 
     print submit(-name => 'action', 
                 -value => 'Change Entry'), 
     submit(-name => 'action', 
           -value => 'Confirm Entry'), 
     end_form; 
 }
 

We end the routine by creating a short fill-out form. This form contains the contents of the user's guestbook entry stashed into a series of hidden fields, and push buttons labeled "Change Entry" and "Confirm Entry." We hide the guestbook entry information in this way in order to carry the information forward to the next set of pages.

sub check_missing { 
     my %p; 
     for (@_) { ++$p{$_} if param($_) } 
     return grep !$p{$_}, keys %REQUIRED; 
 } 
   
 sub print_warning { 
     print font({-color => 'red'}, 
               'Please fill in the following fields: ', 
               em(join ', ', @_), 
               '.'); 
 }
 

The check_missing( ) and print_warning( ) subroutines are short and sweet. The first routine uses the Perl grep( ) function to check the list of provided fields against the list of required fields and returns a list of the truants, if any. The second routine accepts a list of missing fields and turns it into a warning of the form, "Please fill in the following fields: e-mail." For emphasis, the message is rendered in a red font (under browsers that understand the <FONT> extension).

The write_guestbook( ) and view_guestbook( ) subroutines are the most complex of the bunch. The main complication is that, on an active site, there's a pretty good chance that a second instance of the script may be invoked by another user before the first instance has completed opening and updating the guestbook file. If the writes overlap, the file could be corrupted and a guestbook entry lost or scrambled. For this reason, it's important for the script to lock the file before working with it.

POSIX-compliant systems (which include both Unix and Windows systems) offer a simple form of advisory file locking through the flock( ) system call. When a process opens a file and flock( ) s it, no other process can flock( ) it until the first process either closes the file or manually relinquishes the lock. There are actually two types of lock. A "shared" lock can be held by many processes simultaneously. An "exclusive" lock can only be held by one process at a time and prevents any other program from locking the file. Typically, a program that wants to read from a file will obtain a shared lock, while a program that wants to write to the file asks the system for an exclusive lock. A shared lock allows multiple programs to read from a file without worrying that some other process will change the file while they are still reading it. A program that wants to write to a file will call flock( ) to obtain an exclusive lock; the call will then block until all other processes have released their locks. After an exclusive lock is granted, no other program can lock the file until the writing process has finished its work and released the lock.

It's important to realize that the flock( ) locking mechanism is advisory. Nothing prevents a program from ignoring the flock( ) call and reading from or writing to a file without seeking to obtain a lock first. However, as long as only the programs you've written yourself attempt to access the file and you're always careful to call flock( ) before working with it, the system works just fine.

sub lock { 
     my $path = shift; 
     my $for_writing = shift; 
   
     my ($lock_type, $path_name, $description); 
     if ($for_writing) { 
        $lock_type = LOCK_EX;  
        $path_name = ">>$path";  
        $description = 'writing'; 
     }  
     else { 
        $lock_type = LOCK_SH;  
        $path_name = $path;  
        $description = 'reading'; 
     } 
   
     my $fh = IO::File->new($path_name) or 
        warn "Couldn't open $path for $description: $!", return; 
   
 # now try to lock it 
     my $success; 
     my $tries = 0; 
     while ($tries++ < 10) { 
        last if $success = flock($fh, $lock_type|LOCK_NB); 
        print p("Waiting for $description lock on guestbook file..."); 
        sleep(1);               # wait a second 
     } 
     unless ($success) { 
        warn("Couldn't get lock for $description");  
        return; 
     } 
     return $fh; 
 }
 

To make life a little simpler, the guestbook script defines a utility function named lock( ) that takes care of opening and locking the guestbook file (you'll find the definition at the bottom of the source listing). lock( ) takes two arguments: the name of the file to open and a flag indicating whether the file should be opened for writing. If the write flag is true, the function opens the file in append mode and then attempts to obtain an exclusive lock. Otherwise, it opens the file read only and tries to obtain a shared lock. If successful, the opened filehandle is returned to the caller.

The flock( ) function is used to obtain both types of lock. The first argument is the opened filehandle; the second is a constant indicating the type of lock to obtain. The constants for exclusive and shared locks are LOCK_EX and LOCK_SH, respectively. Both constants are imported from the Fcntl module using the :flock tag. We combine these constants with the LOCK_NB (nonblocking) constant, also obtained from Fcntl, in order to tell flock( ) to return if a lock cannot be obtained immediately. Otherwise, flock( ) will block indefinitely until the file is available. In order to avoid a long wait in which the script appears to be hung, we call flock( ) in a polling loop. If a lock cannot immediately be obtained, we print a warning message to the browser screen and sleep for 1 second. After 10 consecutive failed tries, we give up and exit the script. If the lock is successful, we return the filehandle.

sub write_guestbook { 
     my $fh = lock($GUESTBOOKFILE, 1); 
     unless ($fh) { 
        print strong('An error occurred: unable to open guestbook file.'),p(); 
        Delete('action'); 
        print a({-href => self_url}, 'Try again'); 
        return; 
     } 
     seek ($fh,0,2);  # paranoia: seek to end of file 
     my $date = strftime('%D',localtime); 
     print $fh join("\t", $date, map {escape(param($_))} (@FIELDS)),"\n"; 
     print "Thank you, ", param('name'),", for signing the guestbook.\n"; 
     $fh->close;         
     1; 
 }

To write a new entry into the guestbook, the write_guestbook( ) function calls lock( ) with the path to the guestbook file and a flag indicating we want write access. If the call fails, we display an appropriate error message and return. Otherwise, we seek to the end of the file, just in case someone else wrote to the file while we were waiting for the lock. We then join together the current date (obtained from the POSIX strftime( ) function) with the current values of the guestbook fields and write them out to the guestbook filehandle. To avoid the possibility of the user messing up our tab-delimited field scheme by entering tabs or newlines in the fill-out form, we're careful to escape the fields before writing them to the file. To do this, we use the map operator to pass the fields through CGI.pm's escape( ) function. This function is ordinarily used to make text safe for use in URIs, but it works just as well here.

After writing to the file, we're careful to close the filehandle. This releases the lock on the file and gives other processes access to it.

sub view_guestbook { 
     my $show_sign_button = shift; 
     print start_form, 
     submit(-name => 'Sign Guestbook'), 
     end_form 
        if $show_sign_button; 
     my $fh = lock($GUESTBOOKFILE, 0); 
   
     my @rows; 
     unless ($fh) { 
        print strong('An error occurred: unable to open guestbook file.'),br; 
        Delete('action'); 
        print a({-href => self_url},'Try again'); 
        return; 
     }
 

The view_guestbook( ) subroutine looks a lot like the one we just looked at but in reverse. It starts by creating a tiny fill-out form containing a single button labeled "Sign Guestbook." This button is only displayed when someone views the guestbook without signing it first and is controlled by the $show_sign_button flag. Next we obtain a read-only filehandle on the guestbook file by calling lock( ) with a false second argument. If lock( ) returns an undefined result, we print an error message and exit. Otherwise, we read the contents of the guestbook file line by line and split out the fields.

    while (<$fh>) { 
        chomp; 
        my @data = map {escapeHTML($_)} map {unescape($_)} split("\t"); 
        unshift @rows, td(\@data); 
     } 
     unshift @rows, th(['Date',@FIELDS]); 
     print p( 
            table({-border => ''}, 
                  caption(strong('Previous Guests')), 
                  TR(\@rows))); 
     $fh->close; 
     print a({-href => '/'}, 'Home');  
     1; 
 }
 

The fields are then processed through map( ) twice: once to unescape the URL escape characters using the CGI.pm unescape( ) function and once again to make them safe to display on an HTML page using CGI.pm's escapeHTML( ) function. The second round of escaping is to avoid problems with values that contain the <, >, and & symbols. The processed lines are turned into HTML table cells, and unshifted onto a list named @rows. The purpose of the unshift is to reverse the order of the lines, so that more recent guestbook entries appear at the top of the list. We add the headings for the table and turn the whole thing into an HTML table using the appropriate CGI.pm shortcuts. We close the filehandle and exit.

If we were not interested in running this script under standard CGI, we could increase performance slightly and reduce memory consumption substantially by replacing a few functions with their Apache:: equivalents:

IO::File        --> Apache::File 
 CGI::escape     --> Apache::Util::escape_uri 
 CGI::unescape   --> Apache::Util::unescape_uri 
 CGI::escapeHTML --> Apache::Util::escape_html 
 POSIX::strftime --> Apache::Util::ht_time
 

See the reference listings in Chapter 9 for the proper syntax for these replacements. You'll also find a version of the guestbook script that uses these lightweight replacements on this book's companion web site, http://www.modperl.com.

Example 4-15: A Guestbook Script
#!/usr/local/bin/perl 
 # guestbook.cgi 
 use strict; 
 use CGI qw(:standard :html3 escape unescape escapeHTML); 
 use IO::File (); 
 use Fcntl qw(:flock); 
 use POSIX qw(strftime); 
 use vars qw(@FIELDS %REQUIRED %BIG $GUESTBOOKFILE); 
   
 @FIELDS = qw(name e-mail location comments); 
 %REQUIRED = ('name' => 1, 'e-mail' => 1); 
 %BIG = ('comments' => 1); 
   
 if ($ENV{MOD_PERL}) { 
     $GUESTBOOKFILE = Apache->request->dir_config('GuestbookFile'); 
 } 
 $GUESTBOOKFILE ||= "/usr/tmp/guestbookfile.txt"; 
   
 print header, 
     start_html(-title => 'Guestbook', -bgcolor => 'silver'), 
     h1("Guestbook"); 
   
  CASE: { 
      $_ = param('action'); 
      /^sign/i and do    { sign_guestbook(); last CASE; }; 
      /^confirm/i and do { write_guestbook() and view_guestbook(); last CASE; }; 
      /^view/i and do    { view_guestbook(1); last CASE; }; 
      generate_form(); 
  } 
   
 print end_html; 
 exit 0; 
   
 sub generate_form { 
     my @rows; 
     for my $field (@FIELDS) { 
        my $title = "Your $field"; 
        $title .= " (optional)" if !$REQUIRED{$field}; 
        my $element = $BIG{$field} ?  
            textarea(-name => $field, 
                     -rows => 4, 
                     -columns => 50, 
                     -wrap => 1) 
                : textfield(-name => $field, -size => 50); 
        push @rows, th($title) . td($element); 
     } 
     print start_form, 
     table(TR{-align => 'LEFT'}, \@rows), 
     br, 
     submit(-name => 'action', -value => 'View Guestbook'), 
     submit(-name => 'action', -value => 'Sign Guestbook'), 
     end_form; 
 } 
   
 sub sign_guestbook { 
     my @missing = check_missing(@FIELDS); 
     if (@missing) { 
        print_warning(@missing); 
        generate_form(); 
        return; 
     } 
     my @rows; 
     foreach (@FIELDS) { 
        push @rows, TR(th({-align=>'LEFT'},$_),  
                       td(escapeHTML(param($_)))); 
     } 
     print "Here is your guestbook entry.  Press ", 
     em('Confirm')," to save it, or ",em('Change'), 
     " to change it.", 
     hr, 
     table(@rows), 
     hr; 
   
     print start_form; 
     foreach (@FIELDS) { 
        print hidden(-name => $_); 
     } 
     print submit(-name => 'action', 
                 -value => 'Change Entry'), 
     submit(-name => 'action', 
           -value => 'Confirm Entry'), 
     end_form; 
 } 
   
 sub check_missing { 
     my %p; 
     for (@_) { ++$p{$_} if param($_) } 
     return grep !$p{$_}, keys %REQUIRED; 
 } 
   
 sub print_warning { 
     print font({-color => 'red'}, 
               'Please fill in the following fields: ', 
               em(join ', ', @_), 
               '.'); 
 } 
   
 sub write_guestbook { 
     my $fh = lock($GUESTBOOKFILE, 1); 
     unless ($fh) { 
        print strong('An error occurred: unable to open guestbook file.'),p(); 
        Delete('action'); 
        print a({-href => self_url}, 'Try again'); 
        return; 
     } 
     seek ($fh,0,2);  # paranoia: seek to end of file 
     my $date = strftime('%D',localtime); 
     print $fh join("\t", $date, map {escape(param($_))} (@FIELDS)),"\n"; 
     print "Thank you, ", param('name'),", for signing the guestbook.\n"; 
     $fh->close;         
     1; 
 } 
   
 sub view_guestbook { 
     my $show_sign_button = shift; 
     print start_form, 
     submit(-name => 'Sign Guestbook'), 
     end_form 
        if $show_sign_button; 
     my $fh = lock($GUESTBOOKFILE, 0); 
   
     my @rows; 
     unless ($fh) { 
        print strong('An error occurred: unable to open guestbook file.'),br; 
        Delete('action'); 
        print a({-href => self_url},'Try again'); 
        return; 
     } 
     while (<$fh>) { 
        chomp; 
        my @data = map {escapeHTML($_)} map {unescape($_)} split("\t"); 
        unshift @rows, td(\@data); 
     } 
     unshift @rows, th(['Date',@FIELDS]); 
     print p( 
            table({-border => ''}, 
                  caption(strong('Previous Guests')), 
                  TR(\@rows))); 
     $fh->close; 
     print a({-href => '/'}, 'Home');  
     1; 
 } 
   
 sub lock { 
     my $path = shift; 
     my $for_writing = shift; 
   
     my ($lock_type, $path_name, $description); 
     if ($for_writing) { 
        $lock_type = LOCK_EX;  
        $path_name = ">>$path";  
        $description = 'writing'; 
     }  
     else { 
        $lock_type = LOCK_SH;  
        $path_name = $path;  
        $description = 'reading'; 
     } 
   
     my $fh = IO::File->new($path_name) or 
        warn "Couldn't open $path for $description: $!", return; 
   
 # now try to lock it 
     my $success; 
     my $tries = 0; 
     while ($tries++ < 10) { 
        last if $success = flock($fh, $lock_type|LOCK_NB); 
        print p("Waiting for $description lock on guestbook file..."); 
        sleep(1);               # wait a second 
     } 
     unless ($success) { 
        warn("Couldn't get lock for $description");  
        return; 
     } 
     return $fh; 
 }
 

A .htaccess file to go with the guestbook script might be:

PerlSetVar GuestbookFile /home/www/etc/guests.txt

Apache::Registry Traps

There are a number of traps and pitfalls that you can fall into when using Apache::Registry. This section warns you about them.

It helps to know how Apache::Registry works in order to understand why the traps are there. When the server is asked to return a file that is handled by the Apache::Registry content handler (in other words, a script!), Apache::Registry first looks in an internal cache of compiled subroutines that it maintains. If it doesn't find a subroutine that corresponds to the script file, it reads the contents of the file and repackages it into a block of code that looks something like this:

 package $mangled_package_name; 
  use Apache qw(exit); 
  sub handler { 
    #line 1 $original_filename 
    contents of the file 
  }
 

$mangled_package_name is a version of the script's URI which has been modified in such a way as to turn it into a legal Perl package name while keeping it distinct from all other compiled Apache::Registry scripts. For example, the guestbook.cgi script shown in the last section would be turned into a cached subroutine in the package Apache::ROOT::perl::guestbook_2ecgi. The compiled code is then cached for later use.

Before Apache::Registry even comes into play, mod_perl fiddles with the environment to make it appear as if the script were being called under the CGI protocol. For example, the $ENV{QUERY_STRING} environment variable is initialized with the contents of Apache::args( ), and $ENV{SERVER_NAME} is filled in from the value returned by Apache::server_hostname( ). This behavior is controlled by the PerlSetupEnv directive, which is On by default. If your scripts do not need to use CGI %ENV variables, turning this directive Off will reduce memory overhead slightly.

In addition to caching the compiled script, Apache::Registry also stores the script's last modification time. It checks the stored time against the current modification time before executing the cached code. If it detects that the script has been modified more recently than the last time it was compiled, it discards the cached code and recompiles the script.

The first and most common pitfall when using Apache::Registry is to forget that the code will be persistent across many sessions. Perl CGI programmers commonly make profligate use of globals, allocate mammoth memory structures without disposing of them, and open filehandles and never close them. They get away with this because CGI scripts are short-lived. When the CGI transaction is done, the script exits, and everything is cleaned up automatically.

Not so with Apache::Registry scripts (or any other Apache Perl module, for that matter). Globals persist from invocation to invocation, big data structures will remain in memory, and open files will remain open until the Apache child process has exited or the server itself it shut down.

Therefore, it is vital to code cleanly. You should never depend on a global variable being uninitialized in order to determine when a subroutine is being called for the first time. In fact, you should reduce your dependency on globals in general. Close filehandles when you are finished with them, and make sure to kill (or at least wait on) any child processes you may have launched.

Perl provides two useful tools for writing clean code. use strict turns on checks that make it harder to use global variables unintentionally. Variables must either be lexically scoped (with my ) or qualified with their complete package names. The only way around these restrictions is to declare variables you intend to use as globals at the top of the script with use vars. This code snippet shows how:

use strict; 
 use vars qw{$INIT $DEBUG @NAMES %HANDLES};
 

We have used strict in many of the examples in the preceding sections, and we strongly recommend it for any Perl script you write.

The other tool is Perl runtime warnings, which can be turned on in Apache::Registry scripts by including a -w switch on the #! line, or within other modules by setting the magic $^W variable to true. You can even enable warnings globally by setting $^W to true inside the server's Perl startup script, if there is one (see Chapter 2).

-w will catch a variety of errors, dubious programming constructs, typos, and other sins. Among other things, it will warn when a bareword (a string without surrounding quotation marks) conflicts with a subroutine name, when a variable is used only once, and when a lexical variable is inappropriately shared between an outer and an inner scope (a horrible problem which we expose in all its gory details a few paragraphs later).

-w may also generate hundreds of "Use of uninitialized value" messages at runtime, which will fill up your server error log. Many of these warnings can be hard to track down. If there is no line number reported with the warning, or if the reported line number is incorrect,[[2]] try using Perl's #line token described in the perlsyn manual page and in Chapter 9 under "Special Global Variables, Subroutines, and Literals."

It may also be helpful to see a full stack trace of the code which triggered the warning. The cluck( ) function found in the standard Carp module will give you this functionality. Here is an example:

use Carp (); 
 local $SIG{_    _WARN_    _} = \&Carp::cluck;
 

Note that -w checks are done at runtime, which may slow down script execution time. In production mode, you may wish to turn warnings off altogether or localize warnings using the $^W global variable described in the perlvar manpage.

Another subtle mod_perl trap that lies in wait for even experienced programmers involves the sharing of lexical variables between outer and inner named subroutines. To understand this problem, consider the following innocent-looking code:

#!/usr/local/bin/perl -w
   
 for (0..3) { 
     bump_and_print(); 
 }
   
 sub bump_and_print { 
     my $a = 1; 
     sub bump { 
        $a++; 
        print "In the inner scope, \$a is $a\n"; 
     } 
     print "In the outer scope, \$a is $a\n"; 
     bump(); 
 }
 

When you run this script, it generates the following inexplicable output:

Variable "$a" will not stay shared at ./test.pl line 12. 
 In the outer scope, $a is 1 
 In the inner scope, $a is 2 
 In the outer scope, $a is 1 
 In the inner scope, $a is 3 
 In the outer scope, $a is 1 
 In the inner scope, $a is 4 
 In the outer scope, $a is 1 
 In the inner scope, $a is 5
 

For some reason the variable $a has become "unstuck" from its my( ) declaration in bump_and_print( ) and has taken on a life of its own in the inner subroutine bump( ). Because of the -w switch, Perl complains about this problem during the compilation phase, with the terse warning that the variable "will not stay shared." This behavior does not happen if the inner subroutine is made into an anonymous subroutine. It only affects named inner subroutines.

The rationale for the peculiar behavior of lexical variables and ways to avoid it in conventional scripts are explained in the perldiag manual page. When using Apache::Registry this bug can bite you when you least expect it. Because Apache::Registry works by wrapping the contents of a script inside a handler( ) function, inner named subroutines are created whether you want them or not. Hence, this piece of code will not do what you expect:

 #!/usr/local/bin/perl 
  use CGI qw/param header/;
   
  my $name = param('name'); 
  print header('text/plain'); 
  print_body(); 
  exit 0;
   
  sub print_body { 
     print "The contents of \$name is $name.\n"; 
  }
 

The first time you run it, it will run correctly, printing the value of the name CGI parameter. However, on subsequent invocations the script will appear to get "stuck" and remember the values of previous invocations. This is because the lexically scoped $name variable is being referenced from within print_body( ), which, when running under Apache::Registry, is a named inner subroutine. Because multiple Apache processes are running, each process will remember a different value of $name, resulting in bizarre and arbitrary behavior.

Perl may be fixed someday to do the right thing with inner subroutines. In the meantime, there are several ways to avoid this problem. Instead of making the outer variable lexically scoped, you can declare it to be a package global, as this snippet shows:

use strict; 
 use vars '$name'; 
 $name = param('name');
 

Because globals are global, they aren't subject to weird scoping rules.

Alternatively, you can pass the variable to the subroutine as an argument and avoid sharing variables between scopes altogether. This example shows that variant:

my $name = param('name'); 
 print_body($name);
  
 sub print_body { 
    my $name = shift; 
    print "The contents of \$name is $name.\n";      
 }
 

Finally, you can put the guts of your application into a library and use or require it. The Apache::Registry then becomes only a hook that invokes the library:

#!/usr/local/bin/perl 
 require "my_application_guts"; 
 do_everything();
 

The shared lexical variable problem is a good reason to use the -w switch during Apache::Registry script development and debugging. If you see warnings about a variable not remaining shared, you have a problem, even if the ill effects don't immediately manifest themselves.

Another problem that you will certainly run into involves the use of custom libraries by Apache::Registry scripts. When you make an editing change to a script, the Apache::Registry notices the recent modification time and reloads the script. However, the same isn't true of any library file that you load into the script with use or require. If you make a change to a require d file, the script will continue to run the old version of the file until the script itself is recompiled for some reason. This can lead to confusion and much hair-tearing during development!

You can avoid going bald by using Apache::StatINC, a standard part of the mod_perl distribution. It watches over the contents of the internal Perl %INC array and reloads any files that have changed since the last time it was invoked. Installing Apache::StatINC is easy. Simply install it as the PerlInitHandler for any directory that is managed by Apache::Registry. For example, here is an access.conf entry that installs both Apache::Registry and Apache::StatINC :

Alias /perl/ /usr/local/apache/perl/ 
 <Location /perl> 
   SetHandler      perl-script 
   PerlHandler     Apache::Registry 
   PerlInitHandler Apache::StatINC 
   PerlSendHeader  On 
   Options         +ExecCGI 
 </Location>
 

Because Apache::StatINC operates at a level above the level of individual scripts, any nonstandard library locations added by the script with use lib or by directly manipulating the contents of @INC will be ignored. If you want these locations to be monitored by Apache::StatINC, you should make sure that they are added to the library search path before invoking the script. You can do this either by setting the PERL5LIB environment variable before starting up the Apache server (for instance, in the server startup script), or by placing a use lib line in your Perl startup file, as described in Chapter 2.

When you use Apache::StatINC, there is a slight overhead for performing a stat on each included file every time a script is run. This overhead is usually immeasurable, but it will become noticeable on a heavily loaded server. In this case, you may want to forego it and instead manually force the embedded Perl interpreter to reload all its compiled scripts by restarting the server with apachectl. In order for this to work, the PerlFreshRestart directive must be turned on in the Apache configuration file. If you haven't done so already, add this line to perl.conf or one of the other configuration files:

PerlFreshRestart On
 

You can try reloading compiled scripts in this way whenever things seem to have gotten themselves into a weird state. This will reset all scripts to known initial settings and allow you to investigate problems systematically. You might also want to stop the server completely and restart it using the -X switch. This forces the server to run as a single process in the foreground. Interacting with a single process rather than multiple ones makes it easier to debug misbehaving scripts. In a production environment, you'll want to do this on a test server in order to avoid disrupting web services.

Handling Errors

Errors in Apache modules do occur, and tracking them down is significantly trickier than in standalone Perl or C programs. Some errors are due to bugs in your code, while others are due to the unavoidable hazards of running in a networked environment. The remote user might cancel a form submission before it is entirely done, the connection might drop while you're updating a database, or a file that you're trying to access might not exist.

A virtuous Apache module must let at least two people know when a problem has occurred: you, the module's author, and the remote user. You can communicate errors and other exception conditions to yourself by writing out entries to the server log. For alerting the user when a problem has occurred, you can take advantage of the simple but flexible Apache ErrorDocument system, use CGI::Carp, or roll your own error handler.

Error Logging

We talked about tracking down code bugs in Chapter 2 and will talk more about C-language specific debugging in Chapter 10. This section focuses on defensive coding techniques for intercepting and handling other types of runtime errors.

The most important rule is to log everything. Log anything unexpected, whether it is a fatal error or a condition that you can work around. Log expected but unusual conditions too, and generate routine logging messages that can help you trace the execution of your module under normal conditions.

Apache versions 1.3 and higher offer syslog-like log levels ranging in severity from debug, for low-priority messages, through warn, for noncritical errors, to emerg, for fatal errors that make the module unusable. By setting the LogLevel directive in the server configuration file, you can adjust the level of messages that are written to the server error log. For example, by setting LogLevel to warn, messages with a priority level of warn and higher are displayed in the log; lower-priority messages are ignored.

To use this adjustable logging API, you must load the standard Apache::Log module. This adds a log( ) method to the Apache request object, which will return an Apache::Log object. You can then invoke this object's methods in order to write nicely formatted log entries to the server's error log at the priority level you desire. Here's a short example:

use Apache::Log ();
  
 my $log = $r->log; 
 $log->debug("Trying to lock guestbook file now"); 
 unless (lock($GUESTBOOKFILE,1)) { 
    $log->emerg("Can't get lock!"); 
    return SERVER_ERROR; 
 } 
 $log->debug("Got lock");
 

In this example, we first obtain a log object by calling the request object's log( ) method. We call the log object's debug( ) method to send a debug message to the error log and then try to perform a locking operation. If the operation fails, we log an error message at the emerg priority level using the log object's emerg( ) method and exit. Otherwise, we log another debugging message.

You'll find the full list of method calls made available by Apache::Log in Chapter 9, in the subsection "Logging Methods" under "The Apache Request Object." In addition, the Apache Perl API offers three simpler methods for entering messages into the log file. You don't have to import the Apache::Log module to use these methods, and they're appropriate for smaller projects (such as most of the examples in this book).

$r->log_error($message)
log_error( ) writes out a time-stamped message into the server error log using a facility of error. Use it for critical errors that make further normal execution of the module impossible. This method predates the 1.3 LogLevel API but still exists for backward compatibility and as a shortcut to $r->log->error.
$r->warn($message)
warn( ) will log an error message with a severity level of warn. You can use this for noncritical errors or unexpected conditions that you can work around. This method predates the 1.3 LogLevel API but still exists for backward compatibility and as a shortcut to $r->log->warn.
$r->log_reason($message,$file)
This is a special-purpose log message used for errors that occur when a content handler tries to process a file. It results in a message that looks something like this:
access to /usr/local/apache/htdocs/index.html failed for ppp12.yahoo.com, reason: user phyllis not authorized

You might also choose to include a $DEBUG global in your modules, either hard-coding it directly into the source, or by pulling its value out of the configuration file with Apache::dir_config( ). Your module can then check this global every time it does something significant. If set to a true value, your script should send verbose informational messages to the Apache error log (or to an alternative log file of your choice).

The ErrorDocument System

Apache provides a handy ErrorDocument directive that can be used to display a custom page when a handler returns a non-OK status code. The custom page can be any URI, including a remote web page, a local static page, a local server-side include document, or a CGI script or module. In the last three cases, the server generates an internal redirect, making the redirection very efficient.

For example, the configuration file for Lincoln's laboratory site contains this directive:

ErrorDocument 404 /perl/missing.cgi
 

When the server encounters a 404 "Not Found" status code, whether generated by a custom module or by the default content handler, it will generate an internal redirect to a mod_perl script named missing.cgi. Before calling the script, Apache sets some useful environment variables including the following:

REDIRECT_URL
The URL of the document that the user was originally trying to fetch.
REDIRECT_STATUS
The status code that caused the redirection to occur.
REDIRECT_REQUEST_METHOD
The method (GET or POST) that caused the redirection.
REDIRECT_QUERY_STRING
The original query string, if any.
REDIRECT_ERROR_NOTES
The logged error message, if any.

A slightly simplified version of missing.cgi that works with Apache::Registry (as well as a standalone CGI script) is shown in Example 4-16. For a screenshot of what the user gets when requesting a nonexistent URI, see Figure 4-9.

Figure 4-9. The missing.cgi script generates a custom page to display when a URI is not found.

 

Example 4-16: A Simple Apache::Registry ErrorDocument Handler
#!/usr/local/bin/perl 
 # file: missing.cgi 
 use CGI qw(:standard); 
 use strict; 
   
 print header, 
       start_html(-title => 'Missing Document', -bgcolor => 'white'), 
       h1(img({-src => '/icons/unknown.gif'}), 
       'Document Not Found'), 
       p("I'm sorry, but the document you requested,", 
         strong($ENV{REDIRECT_URL}), 
         "is not available.  Please try the", 
         a({-href => "/search.html"}, "search page"), 
         "for help locating the document."), 
       hr, 
       address(a({-href => "mailto:$ENV{SERVER_ADMIN}"}, 'webmaster')), 
       end_html;
 

If you want to implement the ErrorDocument handler as a vanilla Apache Perl API script, the various REDIRECT_ environment variables will not be available to you. However, you can get the same information by calling the request object's prev( ) method. This returns the request object from the original request. You can then query this object to recover the requested URI, the request method, and so forth.

Example 4-17 shows a rewritten version of missing.cgi that uses prev( ) to recover the URI of the missing document. The feature to note in this code is the call to $r->prev on the fifth line of the handler( ) subroutine. If the handler was invoked as the result of an internal redirection, this call will return the original request object, which we then query for the requested document by calling its uri( ) method. If the handler was invoked directly (perhaps by the user requesting its URI), the original request will be undefined and we use an empty string for the document URI.

Example 4-17: An ErrorDocument Handler Using the Vanilla Apache API
package Apache::Missing; 
 # File: Apache/Missing.pm
   
 use strict; 
 use Apache::Constants qw(:common); 
 use CGI qw(:html); 
   
 sub handler { 
     my $r = shift; 
     $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
   
     my $original_request = $r->prev; 
     my $original_uri = $original_request ? $original_request->uri : ''; 
     my $admin = $r->server->server_admin; 
   
     $r->print( 
              start_html(-title => 'Missing Document', 
                         -bgcolor => 'white'), 
              h1(img({-src => '/icons/unknown.gif'}), 
                 'Document Not Found'), 
              p( 
                "I'm sorry, but the document you requested,", 
                strong($original_uri), 
                ", is not available.  Please try the", 
                a({-href => "/search.html"}, "search page"), 
                "for help locating the document." 
                ), 
              hr, 
              address(a({-href => "mailto:$admin"}, 'webmaster')), 
              end_html 
              ); 
   
     return OK; 
 } 
   
 1; 
 _    _END_    _ 
 

Here's an example using Apache::Missing in the configuration file:

<Location /Missing> 
    SetHandler  perl-script 
    PerlHandler Apache::Missing 
 </Location>
 

If the static nature of the Apache ErrorDocument directive is inadequate for your needs, you can set the error document dynamically from within a handler by calling the request object's custom_response( ) method. This method takes two arguments: the status code of the response you want to handle and the URI of the document or module that you want to pass control to. This error document setting will persist for the lifetime of the current request only. After the handler exits, the setting returns to its default.

For example, the following code snippet sets up a custom error handler for the SERVER_ERROR error code (a generic error that covers a variety of sins). If the things_are_ok( ) subroutine (not implemented here) returns a true value, we do our work and return an OK status. Otherwise, we set the error document to point to a URI named /Carp and return a SERVER_ERROR status.

package Apache::GoFish; 
 # file: Apache/GoFish.pm
   
 use strict; 
 use Apache::Constants qw(:common);
   
 sub handler { 
    my $r = shift; 
    if (things_are_ok($r)) { 
      do_something(); 
      return OK; 
    } 
    $r->custom_response(SERVER_ERROR, "/Carp"); 
    return SERVER_ERROR; 
 }
   
 1; 
 _    _END_    _

HTTP Headers and Error Handling

You already know about using header_out( ) to set HTTP header fields. A properly formatted HTTP header is sent to the browser when your module explicitly calls send_http_header( ), or it is sent for you automatically if you are using Apache::Registry, the PerlSendHeader directive is set to On, and your script prints some text that looks like an HTTP header.

You have to be careful, however, if your module ever returns non-OK status codes. Apache wants to assume control over the header generation process in the case of errors; if your module has already sent the header, then Apache will send a redundant set of headers with unattractive results. This applies both to real HTTP errors, like BAD_REQUEST and NOT_FOUND, as well as to nonfatal conditions like REDIRECT and AUTH_REQUIRED.

Consider the following fishy example:

package Apache::Crash; 
 # File: Apache/Crash.pm
   
 use strict; 
 use Apache::Constants qw(:common); 
 use constant CRASH => 1; 
   
 sub handler { 
     my $r = shift; 
     $r->content_type('text/plain'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
     return SERVER_ERROR if CRASH; 
     $r->print('Half a haddock is better than none.'); 
     return OK; 
 } 
   
 1; 
 _    _END_    _
 

After setting the document MIME type, this module sends off the HTTP header. It then checks a constant named CRASH and if true, which it always is, returns a status code of SERVER_ERROR. Apache would ordinarily send a custom HTTP header in response to this status code, but because the module has already emitted a header, it's too late. Confusion results. If we map this module to the URI /Crash, we can telnet directly to the server to demonstrate the problem:

% telnet www.modperl.com 80 
 Trying 192.168.2.5... 
 Connected to modperl.com. 
 Escape character is '^]'. 
 GET /Crash HTTP/1.0  
 HTTP/1.1 200 OK 
 Date: Thu, 21 May 1998 11:31:40 GMT 
 Server: Apache/1.3b6 
 Connection: close 
 Content-Type: text/plain
   
 HTTP/1.1 200 OK 
 Date: Thu, 21 May 1998 11:31:40 GMT 
 Server: Apache/1.3b6 
 Connection: close 
 Content-Type: text/html
   
 <HTML><HEAD> 
 <TITLE>500 Internal Server Error</TITLE> 
 </HEAD><BODY> 
 <H1>Internal Server Error</h2> 
 The server encountered an internal error or 
 misconfiguration and was unable to complete 
 your request.<P> 
 </BODY></HTML> 
 Connection closed by foreign host.
 

Not only are there two HTTP headers here, but both of them indicate a status code of 200 OK, which is definitely not right. When displayed in the browser, the page will be marred by extraneous header lines at the top of the screen.

The cardinal rule is that you should never call Apache::send_http_header( ) until your module has completed all its error checking and has decided to return an OK status code. Here's a better version of Apache::Crash that avoids the problem:

package Apache::Crash; 
 # File: Apache/Crash.pm
   
 use strict; 
 use Apache::Constants qw(:common); 
 use constant CRASH => 1; 
   
 sub handler { 
     my $r = shift; 
     return SERVER_ERROR if CRASH; 
     $r->content_type('text/plain'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
     $r->print('Half a haddock is better than none.'); 
     return OK; 
 } 
   
 1; 
 _    _END_    _
 

Now when we telnet to the server, the server response looks the way it should:

(~) 103% telnet www.modperl.com 80 Trying 192.168.2.5... 
 Connected to modperl.com. 
 Escape character is '^]'. 
 GET /Crash HTTP/1.0  
  
 HTTP/1.1 500 Internal Server Error 
 Date: Thu, 21 May 1998 11:40:56 GMT 
 Server: Apache/1.3b6 
 Connection: close 
 Content-Type: text/html
  
 <HTML><HEAD> 
 <TITLE>500 Internal Server Error</TITLE> 
 </HEAD><BODY> 
 <H1>Internal Server Error</h2> 
 The server encountered an internal error or 
 misconfiguration and was unable to complete 
 your request.<P> 
 </BODY></HTML>
 

Another important detail about error handling is that Apache ignores the fields that you set with header_out( ) when your module generates an error status or invokes an internal redirect. This is usually not a problem, but there are some cases in which this restriction can be problematic. The most typical case is the one in which you want a module to give the browser a cookie and immediately redirect to a different URI. Or you might want to assign an error document to the UNAUTHORIZED status code so that a custom login screen appears when the user tries to access a restricted page. In both cases you need to manipulate the HTTP header fields prior to the redirect.

For these cases, call the request object's err_header_out( ) method. It has identical syntax to header_out( ), but the fields that you set with it are sent to the browser only when an error has occurred. Unlike ordinary headers, the fields set with err_header_out( ) persist across internal redirections, and so they are passed to Apache ErrorDocument handlers and other local URIs.

This provides you with a simple way to pass information between modules across internal redirects. Combining the example from this section with the example from the previous section gives the modules shown in Example 4-18. Apache::GoFish generates a SERVER_ERROR, which is intercepted and handled by the custom ErrorDocument handler named Apache::Carp (Example 4-19). Before relinquishing control, however, Apache::GoFish creates a custom HTTP field named X-Odor which gives the error handler something substantial to complain about. The end result is shown in Figure 4-10.

Figure 4-10. When Apache::GoFish generates a custom error document, it displays the contents of the custom X-Odor header.

 

The code should be fairly self-explanatory. The main point to notice is Apache::GoFish 's use of err_header_out( ) to set the value of the X-Odor field, and Apache::Carp 's use of the same function to retrieve it. Like header_out( ), when you call err_header_out( ) with a single argument, it returns the current value of the field and does not otherwise alter the header. When you call it with two arguments, it sets the indicated field.

An interesting side effect of this technique is that the X-Odor field is also returned to the browser in the HTTP header. This could be construed as a feature. If you wished to pass information between the content handler and the error handler without leaving tracks in the HTTP header, you could instead use the request object's "notes" table to pass messages from one module to another. Chapter 9 covers how to use this facility (see the description of the notes( ) method under "Server Core Functions").

Example 4-18: Invoking a Custom Error Handler Document
package Apache::GoFish; 
 # File: Apache/GoFish.pm
   
 use Apache::Constants qw(:common :response); 
 use constant CRASH=>1;
   
 sub handler { 
    my $r = shift; 
    $r->err_header_out('X-Odor'=>"something's rotten in Denmark"); 
    $r->custom_response(SERVER_ERROR, "/Carp"); 
    return SERVER_ERROR if CRASH; 
    $r->content_type('text/plain'); 
    $r->send_http_header; 
    return OK if $r->header_only; 
    $r->print('Half a haddock is better than none.'); 
    return OK; 
 } 
 1; 
 _    _END_    _ 
 

Here is a sample configuration entry:

<Location /GoFish> 
    SetHandler perl-script 
    PerlHandler Apache::GoFish 
 </Location> 
 
Example 4-19: An Error Handler to Complement the Previous Example
package Apache::Carp; 
 # File: Apache/Carp.pm 
 use strict; 
 use Apache::Constants qw(:common); 
 use CGI qw(:html); 
   
 sub handler { 
     my $r = shift; 
     my $odor = $r->err_header_out('X-Odor'); 
     $odor ||= 'unspecified odor'; 
     $r->content_type('text/html'); 
     $r->send_http_header; 
     return OK if $r->header_only; 
   
     my $original_request = $r->prev; 
     my $original_uri = $original_request ? $original_request->uri : ''; 
     my $admin = $r->server->server_admin; 
   
     $r->print( 
              start_html(-title => 'Phew!!', -bgcolor => 'white'), 
              h1('Phew!!'), 
              p("Something fishy happened while processing this request."), 
              p("The odor was ", strong($odor), '.'), 
              hr, 
              address(a({-href => "mailto:$admin"}, 'webmaster')), 
              end_html 
              ); 
   
     return OK; 
 } 
   
 1; 
 _    _END_    _ 
 

Here is a sample configuration entry:

<Location /Carp> 
    SetHandler  perl-script    PerlHandler Apache::Carp 
 </Location>
 

Chaining Content Handlers

The C-language Apache API only allows a single content handler to completely process a request. Several handlers may be given a shot at it, but the first one to return an OK status will terminate the content handling phase of the transaction.

There are times when it would be nice to chain handlers into a pipeline. For example, one handler could add canned headers and footers to the page, another could correct spelling errors, while a third could add trademark symbols to all proprietary names. Although the native C API can't do this yet,[[3]] the Perl API can, using a technique called "stacked handlers."

It is actually quite simple to stack handlers. Instead of declaring a single module or subroutine in the PerlHandler directive, you declare several. Each handler will be called in turn in the order in which it was declared. The exception to this rule is if one of the handlers in the series returns an error code (anything other than OK, DECLINED, or DONE). Handlers can adjust the stacking order themselves, or even arrange to process each other's output.

Simple Case of Stacked Handlers

Example 4-20 gives a very simple example of a stack of three content handlers. It's adapted slightly from the mod_perl manual page. For simplicity, all three handlers are defined in the same file, and are subroutines named header( ), body( ), and footer( ). As the names imply, the first handler is responsible for the top of the page (including the HTTP header), the second is responsible for the middle, and the third for the bottom.

A suitable configuration section looks like this:

PerlModule My 
 <Location /My> 
   SetHandler perl-script 
   PerlHandler My::header My::body My::footer 
 </Location>
 

We first load the whole module into memory using the PerlModule directive. We then declare a URI location /My and assign the perl-script handler to it. Perl in turn is configured to run the My::header, My::body, and My::footer subroutines by passing them as arguments to a PerlHandler directive. In this case, the /My location has no corresponding physical directory, but there's no reason that it couldn't.

After bringing in the OK constant from Apache::Constants, we define the subroutines header( ), body( ), and footer( ). header( ) sets the document's content type to plain text, sends the HTTP header, and prints out a line at the top of the document. body( ) and footer( ) both print out a line of text to identify themselves. The resulting page looks like this:

header text 
 body text 
 footer text
 
Example 4-20: A Simple Stacked Handler
package My;
   
 use strict; 
 use Apache::Constants 'OK';
   
 sub header { 
    my $r = shift; 
    $r->content_type('text/plain'); 
    $r->send_http_header; 
    $r->print("header text\n"); 
    OK; 
 } 
 sub body { 
    my $r = shift; 
    $r->print("body text\n"); 
    OK; 
 } 
 sub footer { 
    my $r = shift; 
    $r->print("footer text\n"); 
    OK; 
 } 
 1;
 

Coordinating Stacked Handlers

Stacked handlers often have to coordinate their activities. In the example of the previous section, the header( ) handler must be run before either of the other two in order for the HTTP header to come out correctly. Sometimes it's useful to make the first handler responsible for coordinating the other routines rather than relying on the configuration file. The request object's push_handlers( ) method will help you do this.

push_handlers( ) takes two arguments: a string representing the phase to handle, and a reference to a subroutine to handle that phase. For example, this code fragment will arrange for the footer( ) subroutine to be the next content handler invoked:

$r->push_handlers(PerlHandler => \&footer);
 

With this technique, we can rewrite the previous example along the lines shown in Example 4-21. In the revised module, we declare a subroutine named handler( ) that calls push_handlers( ) three times, once each for the header, body, and footer of the document. It then exits. The other routines are unchanged.

The revised configuration file entry looks like this:

<Location /MyChain> 
   SetHandler perl-script 
   PerlHandler My::Chain 
 </Location>
 

Because we followed the mod_perl convention of naming the first handler subroutine handler( ), there's now no need for a PerlModule statement to load the module into memory.

Example 4-21: Coordinated Stacked Handlers
package My::Chain; 
   
 use strict; 
 use Apache::Constants 'OK'; 
   
 sub handler { 
     my $r = shift; 
     for my $cv (\&header, \&body, \&footer) { 
        $r->push_handlers(PerlHandler => $cv); 
     } 
     OK; 
 } 
   
 sub header { 
     my $r = shift; 
     $r->content_type('text/plain'); 
     $r->send_http_header; 
     $r->print("header text\n"); 
     OK; 
 } 
   
 sub body { 
     my $r = shift; 
     $r->print("body text\n"); 
     OK; 
 } 
   
 sub footer { 
     my $r = shift; 
     $r->print("footer text\n"); 
     OK; 
 } 
   
 1; 
 _    _END_    _
 

Stacked Handler Pipelining

The stacked handlers we looked at in the previous example didn't interact. When one was finished processing, the next took over. A more sophisticated set of handlers might want to pipeline their results in such a way that the output of one handler becomes the input to the next. This would allow the handlers to modify each other's output in classic Unix filter fashion. This sounds difficult, but in fact it's pretty simple. This section will show you how to set up a filter pipeline. As an aside, it will also introduce you to the concept of Apache Perl API method handlers.

The trick to achieving a handler pipeline is to use "tied" filehandles to connect the neighbors together. In the event that you've never worked with a tied filehandle before, it's a way of giving a filehandle seemingly magic behavior. When you print( ) to a tied filehandle, the data is redirected to a method in a user-defined class rather than going through the usual filesystem routines. To create a tied filehandle, you simply declare a class that defines a method named TIEHANDLE( ) and various methods to handle the sorts of things one does with a filehandle, such as PRINT( ) and READ( ).

Here's a concrete example of a tied filehandle class that interfaces to an antique daisywheel printer of some sort:

package DaisyWheel;
  
 sub TIEHANDLE { 
   my($class, $printer_name) = @_; 
   open_daisywheel($printer_name); 
   bless { 'printer' => $printer_name }, $class; 
 }
  
 sub PRINT { 
   my $self = shift; 
   send_to_daisywheel($self->{'printer'}, @_); 
 }
  
 sub DESTROY { 
   my $self = shift; 
   close_daisywheel($self->{'printer'}); 
 }
  
 1; 
 _    _END_    _
 

The TIEHANDLE( ) method gets called first. It is responsible for opening the daisywheel printer driver (routine not shown here!) and returning a blessed object containing its instance variables. The PRINT( ) method is called whenever the main program prints to the tied filehandle. Its arguments are the blessed object and a list containing the arguments to print( ). It recovers the printer name from its instance variables and then passes it, and the items to print, to an internal routine that does the actual work. DESTROY( ) is called when the filehandle is untie( ) d or closed. It calls an internal routine that closes the printer driver.

To use this class, a program just has to call tie( ) with the name of an appropriate printer:

use DaisyWheel (); 
 tie *DAISY, 'DaisyWheel', 'dwj002'; 
 print DAISY "Daisy... Daisy... Daisy the Kangaroo.\n"; 
 print DAISY "She wanted to live in a private home,\n"; 
 print DAISY "So she ran away from the zoo!\n"; 
 close DAISY;
 

A more complete tied filehandle class might include a PRINTF( ) method, a READ( ) method, a READLINE( ) method, and a GETC( ) method, but for output-only filehandles PRINT( ) is usually enough.

Now back to Apache. The strategy will be for each filter in the pipeline, including the very first and last ones, to print to STDOUT, rather than directly invoking the Apache::print( ) method via the request object. We will arrange for STDOUT to be tied( ) in each case to a PRINT( ) method defined in the next filter down the chain. The whole scheme looks something like this:

filter1 -> filter2::PRINT()   [STDOUT tied to filter2] 
 filter2 -> filter3::PRINT()   [STDOUT tied to filter3] 
 filter3 -> filter4::PRINT()   [STDOUT tied to filter4] 
            .
            .
            .
 filterN -> Apache::PRINT()    [STDOUT tied to Apache]
 

Interestingly enough, the last filter in the chain doesn't have to get special treatment. Internally, the Apache request ties STDOUT to Apache::PRINT( ), which in turn calls Apache::print( ). This is why handlers can use $r->print('something') and print('something') interchangeably.

To simplify setting up these pipelines, we'll define a utility class called Apache::Forward.[[4]] Apache::Forward is a null filter that passes its input through to the next filter in the chain unmodified. Modules that inherit from this class override its PRINT( ) method to do something interesting with the data.

Example 4-22 gives the source code for Apache::Forward. We'll discuss the code one section at a time.

package Apache::Forward; 
   
 use strict; 
 use Apache::Constants qw(OK SERVER_ERROR); 
 use vars qw($VERSION); 
 $VERSION = '1.00'; 
   
 sub handler ($$) { 
     my($class, $r) = @_; 
     my $next = tied *STDOUT || return SERVER_ERROR; 
     tie *STDOUT, $class, $r, $next or return SERVER_ERROR; 
     $r->register_cleanup(sub { untie *STDOUT }); 
     OK; 
 }
 

Most of the work is done in the handler( ) subroutine, which is responsible for correctly tying the STDOUT filehandle. Notice that the function prototype for handler( ) is ($$), or two scalar arguments. This is a special signal to Apache to activate its method handler behavior. Instead of calling handler( ) like an ordinary subroutine, Apache calls handler( ) like this:

Apache::Forward->handler($r);
 

The result is that the handler( ) receives the class name as its first argument, and the request object as the second argument. This object-oriented calling style allows Apache::Forward to be subclassed.

The handler( ) subroutine begins by recovering the identity of the next handler in the pipeline. It does this by calling tied( ) on the STDOUT filehandle. tied( ) returns a reference to whatever object a filehandle is tied to. It will always return a valid object, even when the current package is the last filter in the pipeline. This is because Apache ties STDOUT to itself, so the last filter will get a reference to the Apache object. Nevertheless, we do check that tied( ) returns an object and error out if not--just in case.

Next the subroutine reties STDOUT to itself, passing tie( ) the request object and the reference to the next filter in the pipeline. This call shouldn't fail, but if it does, we return a server error at this point.

Before finishing up, the handler( ) method needs to ensure that the filehandle will be untied before the transaction terminates. We do this by registering a handler for the cleanup phase. This is the last handler to be called before a transaction terminates and is traditionally reserved for this kind of garbage collection. We use register_cleanup( ) to push an anonymous subroutine that unties STDOUT. When the time comes, the filehandle will be untied, automatically invoking the class's DESTROY( ) method. This gives the object a chance to clean up, if it needs to. Note that the client connection will be closed before registered cleanups are run, so class DESTROY( ) methods should not attempt to send any data to the client.

sub TIEHANDLE { 
     my($class, $r, $next) = @_; 
     bless { 'r' => $r,        # request object 
             'next' => $next   # next in the chain 
           }, $class; 
 }
 

The next routine to consider is TIEHANDLE( ), whose job is to return a new blessed object. It creates a blessed hash containing the keys r and next. r points to the request object, and next points to the next filter in the pipeline. Both of these arguments were passed to us by handler( ).

sub PRINT { 
     my $self = shift; 
     # Subclasses should do something interesting here 
     $self->forward(@_); 
 }
 

The PRINT( ) method is invoked whenever the caller wants to print something to the tied filehandle. The arguments consist of the blessed object and a list of data items to be processed. Subclasses will want to modify the data items in some way, but we just forward them unmodified to the next filter in line by calling an internal routine named forward( ).

#sub DESTROY { 
 #    my $self = shift; 
 #    # maybe clean up here 
 #}
 

DESTROY( ) is normally responsible for cleaning up. There's nothing to do in the general case, so we comment out the definition to avoid being called, saving a bit of overhead.

sub forward { 
     shift()->{'next'}->PRINT(@_); 
 }
 

forward( ) is called by PRINT( ) to forward the modified data items to the next filter in line. We shift the blessed object off the argument stack, find the next filter in line, and invoke its PRINT( ) method.

Example 4-22: A Chained Content Handler
package Apache::Forward; 
   
 use strict; 
 use Apache::Constants qw(OK SERVER_ERROR); 
 use vars qw($VERSION); 
 $VERSION = '1.00'; 
   
 sub handler ($$) { 
     my($class, $r) = @_; 
     my $next = tied *STDOUT || return SERVER_ERROR; 
     tie *STDOUT, $class, $r, $next or return SERVER_ERROR; 
     $r->register_cleanup(sub { untie *STDOUT }); 
     OK; 
 } 
   
 sub TIEHANDLE { 
     my($class, $r, $next) = @_; 
     bless { 'r' => $r,          # request object 
             'next' => $next     # next in the chain 
           }, $class; 
 } 
   
 sub PRINT { 
     my $self = shift; 
     # Subclasses should do something interesting here 
     $self->forward(@_); 
 } 
   
 #sub DESTROY { 
 #    my $self = shift; 
 #    # maybe clean up here 
 #} 
   
 sub forward { 
     shift()->{'next'}->PRINT(@_); 
 } 
   
 1; 
 _    _END_    _

Having defined the filter base class, we can now define filters that actually do something. We'll show a couple of simple ones to give you the idea first, then create a larger module that does something useful.

Apache::Upcase (Example 4-23) transforms everything it receives into uppercase letters. It inherits from Apache::Forward and then overrides the PRINT( ) method. PRINT( ) loops through the list of data items, calling uc( ) on each. It then forwards the modified data to the next filter in line by calling its forward( ) method (which we do not need to override).

Example 4-23: Apache::Upcase Transforms Its Input into Uppercase
package Apache::Upcase; 
   
 use strict; 
 use Apache::Forward (); 
 use vars qw(@ISA $VERSION); 
 @ISA = qw(Apache::Forward); 
 $VERSION = '1.00'; 
   
 sub PRINT { 
     my $self = shift; 
     $self->forward(map { uc $_ } @_); 
 } 
   
 1; 
 _    _END_    _
 

Along the same lines, Apache::Censor (Example 4-24) filters its input data to replace four-letter words with starred versions. It takes the definition of "four-letter word" a little liberally, transforming "sent" into "s**t." It is identical in every way to Apache::Upcase, except that PRINT( ) performs a global regular expression substitution on the input data. The transformed data is then forwarded to the next filter as before.

Example 4-24: A Handler that Removes Four-Letter Words
package Apache::Censor;
   
 use strict; 
 use Apache::Forward (); 
 use vars qw(@ISA $VERSION); 
 @ISA = qw(Apache::Forward); 
 $VERSION = '1.00'; 
   
 sub PRINT { 
     my($self, @data) = @_; 
     foreach (@data) { s/\b(\w)\w{2}(\w)\b/$1**$2/g; } 
     $self->forward(@data); 
 } 
   
 1; 
 _    _END_    _
 

To watch these filters in action, we need a data source. Here's a very simple content handler that emits a constant string. It is very important that the content be sent with a regular print( ) statement rather than the specialized $r->print( ) method. If you call Apache::print( ) directly, rather than through the tied STDOUT filehandle, you short-circuit the whole chain!

package Apache::TestFilter; 
   
 use strict; 
 use Apache::Constants 'OK'; 
   
 sub handler { 
     my $r = shift; 
     $r->content_type('text/plain'); 
     $r->send_http_header; 
     print(<<END); 
 This is some text that is being sent out with a print() 
 statement to STDOUT.  We do not know whether STDOUT is tied 
 to Apache or to some other source, and in fact it does not 
 really matter.  We are just the content source.  The filters 
 come later. 
 END 
     OK; 
 } 
   
 1; 
 _    _END_    _
 

The last step is to provide a suitable entry in the configuration file. The PerlHandler directive should declare the components of the pipeline in reverse order. As Apache works its way forward from the last handler in the pipeline to the first, each of the handlers unties and reties STDOUT. The last handler in the series is the one that creates the actual content. It emits its data using print( ) and the chained handlers do all the rest. Here's a sample entry:

<Location /Filter> 
   SetHandler  perl-script 
   PerlHandler Apache::Upcase Apache::Censor Apache::TestFilter 
 </Location>
 

Figure 4-11 shows the page that appears when the pipeline runs.

Figure 4-11. The final output from three chained content handlers

 

The last filter we'll show you is actually useful in its own right. When inserted into a filter pipeline, it compresses the data stream using the GZip protocol, and flags the browser that the data has been GZip-encoded by adding a Content-Encoding field to the HTTP header. Browsers that support on-the-fly decompression of GZip data will display the original document without any user intervention.[[5]]

This filter requires the zlib compression library and its Perl interface, Paul Marquess' Compress::Zlib. zlib, along with instructions on installing it, can be found at ftp://ftp.uu.net/pub/archiving/zip/zlib*. As usual, you can find Compress::Zlib at CPAN. Together these libraries provide both stream-based and in-memory compression/decompression services, as well as a high-level interface for creating and reading gzip files.

The filter is a little more complicated than the previous ones because GZip works best when the entire document is compressed in a single large segment. However, the filter will be processing a series of print( ) statements on data that is often as short as a single line. Although we could compress each line as a single segment, compression efficiency suffers dramatically. So instead we buffer the output, using zlib 's stream-oriented compression routines to emit the encoded data whenever zlib thinks enough data has been received to compress efficiently. We also have to take care of the details of creating a valid GZip header and footer. The header consists of the current date, information about the operating system, and some flags. The footer contains a CRC redundancy check and the size of the uncompressed file.

Example 4-25 gives the complete code for Apache::GZip. Although it inherits its core functionality from Apache::Forward, each subroutine has to be tweaked a bit to support the unique requirements of GZip compression.

package Apache::GZip; 
   
 use strict; 
 use Apache::Constants qw(:common); 
 use Compress::Zlib qw(deflateInit crc32 MAX_WBITS Z_DEFLATED); 
 use Apache::Forward (); 
 use vars qw($VERSION @ISA); 
   
 use constant GZIP_MAGIC => 0x1f8b; 
 use constant OS_MAGIC => 0x03; 
   
 $VERSION = '1.00'; 
 @ISA = qw(Apache::Forward);
 

After the usual preamble, we import the compression routines from Compress::Zlib, and bring in the Apache::Forward class. We then define a couple of constants needed for the GZip header (in case you're wondering, we got these constants by looking at the zlib C code).

sub handler ($$) { 
     my($class, $r) = @_; 
     #return DECLINED unless $r->header_in("Accept-Encoding") =~ /gzip/; 
     $r->content_encoding('gzip'); 
     $class->SUPER::handler($r); 
 }
 

In order for the browser to automatically decompress the data, it needs to see a Content-Encoding field with the value gzip in the HTTP header. In order to insert this field, we override the parent class's handler( ) subroutine and set the field using the request object's content_encoding( ) method. We then call our superclass's handler( ) method to do the rest of the work.

The commented line that comes before the call to content_encoding( ) is an attempt to "do the right thing." Browsers are supposed to send a header named Accept-Encoding if they can accept compressed or otherwise encoded data formats. This line tests whether the browser can accept the GZip format and declines the transaction if it can't. Unfortunately, it turns out that many Netscape browsers don't transmit this essential header, so we skip the test.[[6]]

sub TIEHANDLE { 
     my $class = shift; 
     my $r = shift;  
     my $self = $class->SUPER::TIEHANDLE($r, @_); 
     my $d = deflateInit(-WindowBits => -MAX_WBITS()) || return; 
     @{$self}{'crc','d','l','h'} = (crc32(undef),$d,0,0); 
     $r->push_handlers(PerlHandler => sub { $self->flush }); 
     return $self; 
 }
 

All the compression work is done in TIEHANDLE( ), PRINT( ), and flush( ). TIEHANDLE( ) begins by invoking the superclass's handler( ) method to create an object blessed into the current class. The method then creates a new Compress::Zlib deflation object by calling deflateInit( ), using an argument of -WindowBits that is appropriate for GZip files (again, we got this by reading the zlib C source code). Finally we add a few new instance variables to the object and return it to the caller. The instance variables include crc, for the cyclic redundancy check, d for the deflation object, l for the total length of the uncompressed data, and h for a flag that indicates whether the header has been printed.[[7]] Finally, TIEHANDLE( ) will call the push_handlers( ) method, installing our flush( ) method at the end of the output chain.

sub gzheader { 
     pack("nccVcc", GZIP_MAGIC, Z_DEFLATED, 0,time,0, OS_MAGIC) 
 } 
   
 sub PRINT { 
     my $self = shift; 
     $self->forward(gzheader()) unless $self->{'h'}++; 
     foreach (@_) { 
        my $data = $self->{d}->deflate($_); 
        $self->{l} += length($_); 
        $self->{crc} = crc32($_, $self->{crc}); 
        $self->forward($data); 
     } 
 } 
 

The PRINT( ) method is called once each time the previous filter in the pipeline calls print( ). It first checks whether the GZip header has already been sent, and sends it if not. The GZip header is created by the gzheader( ) routine and consists of a number of constants packed into a 10-byte string. It then passes each of its arguments to the deflation object's deflate( ) method to compress the information, then forwards whatever compressed data is returned to the next filter in the chain (or Apache, if this is the last filter). The subroutine also updates the running total of bytes compressed and calculates the CRC, using Compress::Zlib 's crc32( ) subroutine.

sub flush { 
     my $self = shift; 
     my $data = $self->{d}->flush; 
     return unless $self->{'h'}; 
     $self->forward($data); 
     $self->forward(pack("V V", $self->{'crc'}, $self->{'l'})); 
 } 
   
 

The flush( ) routine is called when the last of our chained handlers is run. Because zlib buffers its compressed data, there is usually some data left in its internal buffers that hasn't yet been printed. We call the deflation object's flush( ) method to obtain whatever is left and forward it onward. Lastly we forward the CRC and the total length of the uncompressed file, creating the obligatory GZip footer.

Apache::GZip will usually go last in the filter chain, like this:

<Location /Compressed> 
    SetHandler  perl-script 
    PerlHandler Apache::GZip OneFilter AnotherFilter 
 </Location>
 

You can use Apache::GZip with any content handler that prints directly to STDOUT. Most of the modules given in this chapter send data via $r->print( ). Simply delete the $r-> part to make them compatible with Apache::GZip and other chained content handlers.

Example 4-25: A Handler That Compresses Its Input Before Forwarding It
package Apache::GZip; 
   
 use strict; 
 use Apache::Constants qw(:common); 
 use Compress::Zlib qw(deflateInit crc32 MAX_WBITS Z_DEFLATED); 
 use Apache::Forward (); 
 use vars qw($VERSION @ISA); 
   
 use constant GZIP_MAGIC => 0x1f8b; 
 use constant OS_MAGIC => 0x03; 
   
 $VERSION = '1.00'; 
 @ISA = qw(Apache::Forward); 
   
 sub handler ($$) { 
     my($class, $r) = @_; 
     #return DECLINED unless $r->header_in("Accept-Encoding") =~ /gzip/; 
     $r->content_encoding('gzip'); 
     $class->SUPER::handler($r); 
 } 
   
 sub TIEHANDLE { 
     my $class = shift; 
     my $r = shift;  
     my $self = $class->SUPER::TIEHANDLE($r, @_); 
     my $d = deflateInit(-WindowBits => -MAX_WBITS()) || return; 
     @{$self}{'crc','d','l','h'} = (crc32(undef),$d,0,0); 
     $r->push_handlers(PerlHandler => sub { $self->flush }); 
     return $self; 
 } 
   
 sub gzheader { 
     pack("nccVcc", GZIP_MAGIC, Z_DEFLATED, 0,time,0, OS_MAGIC) 
 } 
   
 sub PRINT { 
     my $self = shift; 
     $self->forward(gzheader()) unless $self->{'h'}++; 
     foreach (@_) { 
        my $data = $self->{d}->deflate($_); 
        $self->{l} += length($_); 
        $self->{crc} = crc32($_, $self->{crc}); 
        $self->forward($data); 
     } 
 } 
   
 sub flush { 
     my $self = shift; 
     my $data = $self->{d}->flush; 
     return unless $self->{'h'}; 
     $self->forward($data); 
     $self->forward(pack("V V", $self->{'crc'}, $self->{'l'})); 
 } 
   
 1; 
 _    _END_    _
 

Readers who are interested in content handler pipelines should be aware of Jan Pazdziora's Apache::OutputChain module. It accomplishes the same thing as Apache::Forward but uses an object model that is less transparent than this one (among other things, the Apache::OutputChain module must always appear first on the PerlHandler list). You should also have a look at Andreas Koenig's Apache::PassFile and Apache::GZipChain modules. The former injects a file into an OutputChain and is an excellent way of providing the input to a set of filters. The latter implements compression just as Apache::GZip does but doesn't buffer the compression stream, losing efficiency when print( ) is called for multiple small data segments.

Just as this book was going to press, Ken Williams announced Apache::Filter, a chained content handler system that uses a more devious scheme than that described here. Among the advantages of this system is that you do not have to list the components of the pipeline in reverse order.

Other Types of Stacked Handlers

Content handlers aren't the only type of Apache Perl API handler that can be stacked. Translation handlers, type handlers, authorization handlers, and in fact all types of handlers can be chained using exactly the same techniques we used for the content phase.

A particularly useful phase for stacking is the cleanup handler. Your code can use this to register any subroutines that should be called at the very end of the transaction. You can deallocate resources, unlock files, decrement reference counts, or clear globals. For example, the CGI.pm module maintains a number of package globals controlling various programmer preferences. In order to continue to work correctly in the persistent environment of mod_perl, CGI.pm has to clear these globals after each transaction. It does this by arranging for an internal routine named _reset_globals( ) to be called at the end of each transaction using this line of code:

$r->push_handlers('PerlCleanupHandler',\&CGI::_reset_globals);
 

Your program can push as many handlers as it likes, but you should remember that despite its name, the handler stack doesn't act like the classic LIFO (last-in/first-out) stack. Instead it acts like a FIFO (first-in/first-out) queue. Also remember that if the same handler is pushed twice, it will be invoked twice.

Method Handlers

It should come as no surprise that between the Apache distribution and third-party modules, there exist dozens of authentication modules, several directory indexing modules, and a couple of extended server-side include modules. All of these modules contain code that was copied and pasted from each other. In some cases all but a minuscule portion of the module consists of duplicated code.

Code duplication is not bad in and of itself, but it is wasteful of memory resources and, more important, of developers' time. It would be much better if code could be reused rather than duplicated, by using a form of object-oriented subclassing. For the C-language API there's not much hope of this. Vanilla C doesn't provide object-oriented features, while C++ would require both the Apache core and every extension module to adopt the same class hierarchy--and it's a little late in the game for this to happen.

Fortunately, the Perl language does support a simple object-oriented model that doesn't require that everyone buy into the same class hierarchy. This section describes how these object-oriented features can be used by Perl API modules to reuse code instead of duplicating it.

We've already looked at piecing together documents in various ways. Here we will explore an implementation using method handlers. There are two classes involved with this example: My::PageBase and My::Page.

Example 4-26 shows the My::PageBase class, which provides the base functionality for the family of documents derived from this class. My::PageBase stitches together a document by calling four methods: the header( ) method sends the HTTP headers, the top( ) method emits the beginning of an HTML document, including the title, the body( ) method emits the main contents of the page, and the bottom( ) method adds a common footer. My::PageBase includes generic definitions for header( ), top( ), body( ), and bottom( ), each of which can be overridden by its subclasses. These are all very simple methods. See Example 4-26 for the definitions.

The My::PageBase handler( ) method looks like this:

sub handler ($$) { 
     my($self, $r) = @_; 
     unless (ref($self)) { 
        $self = $self->new; 
     } 
     for my $meth (qw(header top body bottom)) { 
        $self->$meth($r); 
     } 
     return OK; 
 }
 

The key to using My::PageBase in an object-oriented way is the handler( ) subroutine's use of the ($$) function prototype. This tells mod_perl that the handler wants two arguments: the static class name or object, followed by the Apache request object that is normally passed to handlers. When the handler is called, it retrieves its class name or object reference and stores it in the lexical variable $self. It checks whether $self is an object reference, and if not, it calls its own new( ) method to create a new object. It then invokes the header( ), top( ), body( ), and bottom( ) methods in turn.

The My::PageBase new( ) method turn the arguments passed to it into a blessed hash in the My::PageBase package. Each key in the hash is an attribute that can be used to construct the page. We do not define any default attributes:

sub new { 
     my $class = shift; 
     bless {@_}, $class; 
 }
 

We will see later why this method is useful.

As we saw in the section on the Apache::Forward module, method handlers are configured just like any other:

<Location /my> 
   PerlHandler My::PageBase 
   SetHandler perl-script 
 </Location>
 

However, for clarity's sake, or if you use a handler method named something other than handler( ), you can use Perl's standard -> method-calling notation. You will have to load the module first with the PerlModule directive:

PerlModule My::PageBase 
 <Location /my> 
   PerlHandler My::PageBase->handler 
   SetHandler perl-script 
 </Location>
 

When My::PageBase is installed in this way and you request URI /my, you will see the exciting screen shown in Figure 4-12.

Figure 4-12. The generic document produced by My::PageBase

 

Naturally, we'll want to add a bit more spice to this page. Because the page is modularized, we can do so one step at a time by subclassing Apache::PageBase 's methods. The My::Page class does so by inheriting from the My::PageBase class and simply overriding the body( ) method.

package My::Page; 
 # file: My/Page.pm
   
 use strict; 
 use vars qw(@ISA); 
 use My::PageBase (); 
 @ISA = qw(My::PageBase);
   
 sub body { 
     my($self, $r) = @_; 
     $r->print(<<END); 
 <p><img src="/icons/cover.gif" align=CENTER> 
 This is My homepage</p>  
 <br clear=all> 
 END 
 }
   
 1; 
 _    _END_    _
 

Then change the configuration to invoke the handler( ) method via My::Page rather than My::PageBase :

PerlModule My::Page 
 <Location /my> 
   PerlHandler My::Page->handler 
   SetHandler perl-script 
 </Location>
 

Things look almost the same, but the body text has changed (Figure 4-13).

Figure 4-13. My::Page overrides the body( ) method of My::PageBase, creating a more interesting document.

 

Now we need a better title for our document. We could override the top( ) method as we did for body( ), but that would involve cutting and pasting a significant amount of HTML (see Example 4-26). Instead, we can make use of the object's title attribute, which is used by the top( ) method in this way:

my $title = $self->{title} || "untitled document";
 

So how do we set the title attribute? This is where the My::PageBase new( ) method comes in. When it is called with a set of attribute=value pairs, it blesses them into a hash reference and returns the new object. To set the title attribute, we just have to call the new( ) method like this:

use My::Page (); 
 $My::Homepage = My::Page->new(title => 'My Homepage');
 

This will create a global scalar variable in the My namespace named $My::Homepage. It's most convenient to do this during server startup--for instance, in the Perl startup file.

Now we just change the configuration section to use the object as the handler rather than the class name:

<Location /my> 
   PerlHandler $My::Homepage->handler 
   SetHandler perl-script 
 </Location>
 

The object will be retrieved by mod_perl and used to invoke the handler, which will lead to the creation of the page shown in Figure 4-14.

Figure 4-14. After creating a My::Page object with a title attribute defined, the page displays a custom title and level 1 header.

 

Example 4-26: Using a Method Handler for Object-Oriented Programming
Techniques
package My::PageBase; 
 # file: My/PageBase.pm 
   
 use strict; 
 use Apache::Constants qw(:common); 
   
 sub new { 
     my $class = shift; 
     bless {@_}, $class; 
 } 
   
 sub handler ($$) { 
     my($self, $r) = @_; 
     unless (ref($self)) { 
        $self = $self->new; 
     } 
     for my $meth (qw(header top body bottom)) { 
        $self->$meth($r); 
     } 
     return OK; 
 } 
   
 sub header { 
     my($self, $r) = @_; 
     $r->content_type($self->{type} || "text/html"); 
     $r->send_http_header; 
 } 
   
 sub top { 
     my($self, $r) = @_; 
     my $title = $self->{title} || "untitled document"; 
     $r->print(<<EOF); 
 <html> 
 <head> 
 <title>$title</title> 
 </head> 
 <body> 
 <h1>$title</h2> 
 <hr> 
 EOF 
 } 
   
 sub bottom { 
     my($self, $r) = @_; 
     my $admin = $r->server->server_admin; 
     $r->print(<<EOF); 
 <hr> 
 <i><a href="mailto:$admin">$admin</a></i> 
 </body> 
 </html> 
 EOF 
 } 
   
 sub body { 
     my($self, $r) = @_; 
     $r->print("<p>This is the document body<p>"); 
 } 
   
 1; 
 _    _END_    _
 

This wraps up our discussion of the basic techniques for generating page content, filtering files, and processing user input. The next chapter ventures into the perilous domain of imposing state on the stateless HTTP protocol. You'll learn techniques for setting up user sessions, interacting with databases, and managing long-term relationships with users.


1. At least in theory, you can divine what MIME types a browser prefers by examining the contents of the Accept header with $r->header_in('Accept'). According to the HTTP protocol, this should return a list of MIME types that the browser can handle along with a numeric preference score. The CGI.pm module even has an accept( ) function that leverages this information to choose the best format for a given document type. Unfortunately, this part of the HTTP protocol has atrophied, and neither Netscape's nor Microsoft's browsers give enough information in the Accept header to make it useful for content negotiation.

2. Certain uses of the eval operator and "here" documents are known to throw off Perl's line numbering.

3. At the time this was written, the Apache developers were discussing a layered I/O system which will be part of the Apache 2.0 API.

4. The more obvious name, Apache::Filter, is already taken by a third-party module that does output chaining in a slightly different manner.

5. For historical reasons this facility is limited to Unix versions of Netscape Navigator, to PowerPC versions of Navigator on the Macintosh, and to some other Unix-based browsers such as W3-Emacs. However, now that Navigator's source code has been released to the developer community, we hope to see a more widespread implementation of this useful feature.

6. Andreas Koenig's Apache::GzipChain module, which does much the same thing as this one, contains a hardcoded pattern match for the browser type contained in the User-Agent field. You can add this sort of test yourself if you wish, or wait for the browser developers to implement Accept-Encoding correctly.

7. At the time this chapter was being prepared, the author of Compress::Zlib, Paul Marquess, was enhancing his library to make this manual manipulation of the compressed output stream unnecessary.

Back to: Writing Apache Modules with Perl and C


oreilly.com Home | O'Reilly Bookstores | How to Order | O'Reilly Contacts
International | About O'Reilly | Affiliated Companies | Privacy Policy

© 2001, O'Reilly & Associates, Inc.

Оставьте свой комментарий !

Ваше имя:
Комментарий:
Оба поля являются обязательными

 Автор  Комментарий к данной статье