or:

and:

LINUX

Language

Kernel

Package

Book

Test

Forum

iakovlev.org

Cute Tricks With Perl and Apache

Автор: Lincoln Stein

O'Reilly Perl Conference, August 17-20, 1998, San Jose.

PART I: WEB SITE - АНАЛИЗ и СТАТИСТИКА
PART II: CGI TRICKS
PART III: MOD_PERL

PART I : WEB SITE - АНАЛИЗ и СТАТИСТИКИ

Logs! Logs! Logs!

Логи на веб-сервере могут расти как на дрожжах и пожирать ресурсы. Но не торопитесь удалять их . Логи - наши друзья.

Скрипт I.1.1 показывает пример ротации лог-файлов . Он переименовывает текущий ``access_log'' в ``access_log.0'', ``access_log.0'' в ``access_log.1'', и т.д. самый старый удаляется. Запускайте его из-под крона.

---------------- Script I.1.1: Basic Log File Rotation ----------

 #!/usr/local/bin/perl
  $LOGPATH='/usr/local/apache/logs';
  @LOGNAMES=('access_log','error_log','referer_log','agent_log');
  $PIDFILE = 'httpd.pid';
  $MAXCYCLE = 4;

 chdir $LOGPATH;  # Change to the log directory
  foreach $filename (@LOGNAMES) {
     for (my $s=$MAXCYCLE; $s--; $s >= 0 ) {
         $oldname = $s ? "$filename.$s" : $filename;
         $newname = join(".",$filename,$s+1);
         rename $oldname,$newname if -e $oldname;
     }
  }
  kill 'HUP',`cat $PIDFILE`;

-----------------------------------------------------------------

Ротация и архивация логов.

Некоторые не хотят ничего удалять. Скрипт I.1.2 добавляет старейщий лог в zip-архив.

---------- Script I.1.2: Log File Rotation and Archiving ---------

 #!/usr/local/bin/perl
  $LOGPATH    = '/usr/local/apache/logs';
  $PIDFILE    = 'httpd.pid';
  $MAXCYCLE   = 4;
  $GZIP       = '/bin/gzip';

 @LOGNAMES=('access_log','error_log','referer_log','agent_log');
  %ARCHIVE=('access_log'=>1,'error_log'=>1);

 chdir $LOGPATH;  # Change to the log directory
  foreach $filename (@LOGNAMES) {
    system "$GZIP -c $filename.$MAXCYCLE >> $filename.gz" 
         if -e "$filename.$MAXCYCLE" and $ARCHIVE{$filename};
     for (my $s=$MAXCYCLE; $s--; $s >= 0 ) {
         $oldname = $s ? "$filename.$s" : $filename;
         $newname = join(".",$filename,$s+1);
         rename $oldname,$newname if -e $oldname;
     }
  }
  kill 'HUP',`cat $PIDFILE`;

-----------------------------------------------------------------

Ротация,криптование и архивация логов

Скрипт I.1.3 использует idea (часть пакета SSLEay ) для криптования логов перед компрессией для предотвращения доступа к ним . Для этого нужен архиватор tar.

---------- Script I.1.3: Log File Rotation and Encryption ---------

 #!/usr/local/bin/perl
  use POSIX 'strftime';
  
  $LOGPATH     = '/home/www/logs';
  $PIDFILE     = 'httpd.pid';
  $MAXCYCLE    = 4;
  $IDEA        = '/usr/local/ssl/bin/idea';
  $GZIP        = '/bin/gzip';
  $TAR         = '/bin/tar';
  $PASSWDFILE  = '/home/www/logs/secret.passwd';
  
  @LOGNAMES=('access_log','error_log','referer_log','agent_log');
  %ARCHIVE=('access_log'=>1,'error_log'=>1);
  
  chdir $LOGPATH;  # Change to the log directory
  foreach $filename (@LOGNAMES) {
      my $oldest = "$filename.$MAXCYCLE";
      archive($oldest) if -e $oldest and $ARCHIVE{$filename};
      for (my $s=$MAXCYCLE; $s--; $s >= 0 ) {
          $oldname = $s ? "$filename.$s" : $filename;
          $newname = join(".",$filename,$s+1);
          rename $oldname,$newname if -e $oldname;
      }
  }
  kill 'HUP',`cat $PIDFILE`;
  
  sub archive {
      my $f = shift;
      my $base = $f;
      $base =~ s/\.\d+$//;
      my $fn = strftime("$base.%Y-%m-%d_%H:%M.gz.idea",localtime);
      system "$GZIP -9 -c $f | $IDEA -kfile $PASSWDFILE > $fn";
      system "$TAR rvf $base.tar --remove-files $fn";
  }

-----------------------------------------------------------------

Парсинг логов

Скрипт I.1.4 показывает организацию доступа к логу с помощью регулярных выражений :

portio.cshl.org - - [03/Feb/1998:17:42:15 -0500] ``GET /pictures/small_logo.gif HTTP/1.0'' 200 2172

------------- Script I.1.4: Basic Log Parsing -------------

 #!/usr/local/bin/perl

 $REGEX=/^(\S+) (\S+) (\S+) \[([^]]+)\] "(\w+) (\S+).*" (\d+) (\S+)/;
  while (<>) {
     ($host,$rfc931,$user,$date,$request,$URL,$status,$bytes) = m/$REGEX/o;
      &collect_some_statistics;
  }
  &print_some_statistics;

 sub collect_some_statistics {
    # for you to fill in
  }

 sub print_some_statistics {
    # for you to fill in
  }

-----------------------------------------------------------

Скрипт I.1.5 сканирует лог на предмет статус-кодов и печатает урлы.

 find_status.pl -t10 200 ~www/logs/access_log

 TOP 10 URLS/HOSTS WITH STATUS CODE 200:

    REQUESTS  URL/HOST
     --------  --------
       1845    /www/wilogo.gif
       1597    /cgi-bin/contig/sts_by_name?database=release
       1582    /WWW/faqs/www-security-faq.html
       1263    /icons/caution.xbm
        930    /
        886    /ftp/pub/software/WWW/cgi_docs.html
        773    /cgi-bin/contig/phys_map
        713    /icons/dna.gif
        686    /WWW/pics/small_awlogo.gif
  
 ---------- Script I.1.5: Find frequent status codes ---------
  
  #!/usr/local/bin/perl
  # File: find_status.pl
  
  require "getopts.pl";
  &Getopts('L:t:h') || die <<USAGE;
  Usage: find_status.pl [-Lth] <code1> <code2> <code3> ...
         Scan Web server log files and list a summary
         of URLs whose requests had the one of the
         indicated status codes.
  Options:
         -L <domain>  Ignore local hosts matching this domain
         -t <integer> Print top integer URLS/HOSTS [10]
         -h           Sort by host rather than URL
  USAGE
      ;
  if ($opt_L) {
      $opt_L=~s/\./\\./g;
      $IGNORE = "(^[^.]+|$opt_L)\$";
  }
  $TOP=$opt_t || 10;
  
  while (@ARGV) {
      last unless $ARGV[0]=~/^\d+$/;
      $CODES{shift @ARGV}++;
  }
  
  while (<>) {
      ($host,$rfc931,$user,$date,$request,$URL,$status,$bytes) =
          /^(\S+) (\S+) (\S+) \[([^]]+)\] "(\w+) (\S+).*" (\d+) (\S+)/;
      next unless $CODES{$status};
      next if $IGNORE && $host=~/$IGNORE/io;
      $info = $opt_h ? $host : $URL;
      $found{$status}->{$info}++;
  }
  
  foreach $status (sort {$a<=>$b;} sort keys %CODES) {
      $info = $found{$status};
      $count = $TOP;
      foreach $i (sort {$info->{$b} <=> $info->{$a};} keys %{$info}) {
          write;
          last unless --$count;
      }
      $- = 0;  # force a new top-of-report
  }
  
  format STDOUT_TOP=
  
  TOP @## URLS/HOSTS WITH STATUS CODE @##:
      $TOP,                      $status
  
      REQUESTS  URL/HOST
      --------  --------
  .
  format STDOUT=
      @#####    @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      $info->{$i},$i
  .

-----------------------------------------------------------

DNS

В лог-файл включается ip-адреса , с которых пришел запрос. Скрипт I.1.6 делает перевод ip в DNS-имена . Скрипт поддерживает кэширование DNS. Если ip-шник не резолвится через 2 секунды, больше мы к этому не возвращаемся.

----------- Script I.1.6: Reverse DNS Resolution -----------

 #!/usr/local/bin/perl
  
  use constant TIMEOUT => 2;
  $SIG{ALRM} = sub {die "timeout"};
  
  while (<>) {
      s/^(\S+)/lookup($1)/e;
  } continue { 
      print;
  }
  
  sub lookup {
      my $ip = shift;
      return $ip unless $ip=~/\d+\.\d+\.\d+\.\d+/;
      return $CACHE{$ip} if exists $CACHE{$ip};
      my @h = eval <<'END';
      alarm(TIMEOUT);
      my @i = gethostbyaddr(pack('C4',split('\.',$ip)),2);
      alarm(0);
      @i;
  END
      $CACHE{$ip} = $h[0];
      return $CACHE{$ip} || $ip;
  }

-------------------------------------------------------

Роботы

Скрипт I.1.7 работает следующим образом :

         1. Мы заранее оговариваемся , что любой внешний запрос, который
         в течение 30 минут не меняет свой ip-шник и хидер "user agent" -
         это в общем одно и то же лицо (или робот)

         2. если вы фетчите /robots.txt - вы робот

        
         3. Мы считаем общее число запросов

        
        4. Считаем интервалы между запросами

        
 5.Подсчитываем число запросов в единицу времени.У роботов оно выше,чем у людей

        6. Все выводим в таблицу

Отсюда можно сделать о вывод о наличии "плохих" роботов.

------------------------- Script I.1.7: Robo-Cop ----------------------

 #!/usr/local/bin/perl
  
  use Time::ParseDate;
  use strict 'vars';
  
  # after 30 minutes, we consider this a new session
  use constant MAX_INTERVAL => 60*30;  
  my (%HITS,%INT_NUMERATOR,%INT_DENOMINATOR,%POLITE,%LAST,$HITS);
  
  # This uses a non-standard agent log with lines formatted like this:
  
  my $file = shift;
 open(IN,$file=~/\.gz$/ ? "zcat $file|":$file)||die "Can't open file/pipe: $!"; 
  
  while (<IN>) {
      my($date,$host,$agent,$URL) = /^\[(.+)\] (\S+) "(.*)" (\S+)$/;
      next unless $URL=~/\.(html|htm|txt)$/;
  
      $HITS++;
      $host = "$host:$agent"; # concatenate host and agent
      $HITS{$host}++;
      my $seconds = parsedate($date);
      if ($LAST{$host}) {
         my $interval = $seconds - $LAST{$host};
         if ($interval < MAX_INTERVAL) {
             $INT_NUMERATOR{$host} += $interval;
             $INT_DENOMINATOR{$host}++;
         }
      }
      $LAST{$host} = $seconds;
      $POLITE{$host}++ if $URL eq '/robots.txt';
      print STDERR $HITS,"\n" if ($HITS % 1000) == 0;
  }
  
  # print out, sorted by hits
  print join("\t",qw/Client Robot Hits Interval Hit_Percent Index/),"\n";
  foreach (sort {$HITS{$b}<=>$HITS{$a}} keys %HITS) {
 next unless $HITS{$_} >= 5;             # not enough total hits to mean much
 next unless $INT_DENOMINATOR{$_} >= 5;#not enough consecutive hits to mean much
  
      my $mean_interval = $INT_NUMERATOR{$_}/$INT_DENOMINATOR{$_};
      my $percent_hits = 100*($HITS{$_}/$HITS);
      my $index = $percent_hits/$mean_interval;
  
      print join("\t",
                $_,
                $POLITE{$_} ? 'yes' : 'no',
                $HITS{$_},
                $mean_interval,
                $percent_hits,
                $index
                ),"\n";
  }

-----------------------------------------------------------------

Работа с syslog

Если у вас большой кластерный сайт с разнесенными серверами , то у вас может возникнуть проблема с логами , которые разнесены , и которые надо сводить . Апач позволяет выводить лог не только в файл , но и в процесс (перловый скрипт). С помощью Syslog-модуля , написанного Tom Christiansen's , можно посылать информацию на удаленный syslog daemon.

В конфиге апача пишем следующее :

  <VirtualHost www.company2.com>
      CustomLog "| /usr/local/apache/bin/logger company2" common
      # blah blah
   </VirtualHost>

Если центральный хост называется ``loghost'' , то в syslog.con у других веб-серверов пишем:

  local0.info                   @loghost

У центрального веб-сервера пишем в syslog.conf:

  local0.info                   /var/log/web/access_log

Скрипт I.1.8 показывает код для ``logger'' программы:

------------------- Script I.1.8 ``logger'' ------------------

 #!/usr/local/bin/perl
  # script: logger

 use Sys::Syslog;

 $SERVER_NAME = shift || 'www';
  $FACILITY = 'local0';
  $PRIORITY = 'info';

 Sys::Syslog::setlogsock('unix');
  openlog ($SERVER_NAME,'ndelay',$FACILITY);
  while (<>) {
      chomp;
      syslog($PRIORITY,$_);
  }
  closelog;

-------------------------------------------------------------

Сервер упал и не встает

Веб-сайты могут стать неустойчивыми , если в их поддержке участвует большое число разработчиков.Скрипты в этой секции наблюдают за сервером и посылают e-mail в случае проблемы.

Мониторинг локального сервера.

Следующий скрипт проверяет основной серверный процесс и посылает SOS в случае аварии . Запускается под рутом .

------------------------ I.2.1 ``localSOS'' --------------------

 #!/usr/local/bin/perl
  # script: localSOS

 use constant PIDFILE  => '/usr/local/apache/var/run/httpd.pid';
  $MAIL                 =  '/usr/sbin/sendmail';
  $MAIL_FLAGS           =  '-t -oi';
  $WEBMASTER            =  'webmaster';

 open (PID,PIDFILE) || die PIDFILE,": $!\n";
  $pid = <PID>;  close PID;
  kill 0,$pid || sos();

 sub sos {
    open (MAIL,"| $MAIL $MAIL_FLAGS") || die "mail: $!";
    my $date = localtime();
    print MAIL <<END;
  To: $WEBMASTER
  From: The Watchful Web Server Monitor <nobody>
  Subject: Web server is down

 I tried to call the Web server at $date but there was
  no answer.

 Respectfully yours,

 The Watchful Web Server Monitor   
  END
    close MAIL;
  }

--------------------------------------------------------------

Мониторинг удаленного сервера

Скрипт I.2.2 использует LWP библиотеку для посылки HEAD-запроса . Запускается без привилегий .

------------------------ I.2.2 ``remoteSOS'' --------------------

 #!/usr/local/bin/perl
  # script: remoteSOS

 use LWP::Simple;
  %SERVERS = (
         "Fred's server"   => 'http://www.fred.com',
         "Martha's server" => 'http://www.stewart-living.com',
         "Bill's server"   => 'http://www.whitehouse.gov'
         );
  $MAIL                 =  '/usr/sbin/sendmail';
  $MAIL_FLAGS           =  '-t -oi';
  $WEBMASTER            =  'webmaster';

 foreach (sort keys %SERVERS) {
     sos($_) unless head($SERVERS{$_});
  }

 sub sos {
    my $server = shift;
    open (MAIL,"| $MAIL $MAIL_FLAGS") || die "mail: $!";
    my $date = localtime();
    print MAIL <<END;
  To: $WEBMASTER
  From: The Watchful Web Server Monitor <nobody>
  Subject: $server is down

 I tried to call $server at $date but there was
  no one at home.

 Respectfully yours,

 The Watchful Web Server Monitor   
  END
    close MAIL;
  }

--------------------------------------------------------------

Запуск сервера

Скрипт I.2.3 пытается перезапустить сервер . Запускается под рутом .

 ------------------------ I.2.2 "webLazarus" --------------------
  
  #!/usr/local/bin/perl
  # script: webLazarus
  
  use LWP::Simple;
  use constant URL       => 'http://presto.capricorn.com/';
  use constant APACHECTL => '/usr/local/apache/bin/apachectl';
  $MAIL                  =  '/usr/sbin/sendmail';
  $MAIL_FLAGS            =  '-t -oi';
  $WEBMASTER             =  'lstein@prego.capricorn.com';
  
  head(URL) || resurrect();
  
  sub resurrect {
      open (STDOUT,"| $MAIL $MAIL_FLAGS") || die "mail: $!";
      select STDOUT; $| = 1;
      open (STDERR,">&STDOUT");
  
      my $date = localtime();
      print <<END;
  To: $WEBMASTER
  From: The Watchful Web Server Monitor <nobody>
  Subject: Web server is down
 
  I tried to call the Web server at $date but there was
  no answer.  I am going to try to resurrect it now:
  
  Mumble, mumble, mumble, shazzzzammmm!
  
  END
      ;
  
      system APACHECTL,'restart';
      
      print <<END;
  
  That's the best I could do.  Hope it helped.
  
  Worshipfully yours,
  
  The Web Monitor
  END
      close STDERR;
      close STDOUT;
  }

--------------------------------------------------------------

Сообщение , которое может быть получено от скрипта :

 Date: Sat, 4 Jul 1998 14:55:38 -0400
  To: lstein@prego.capricorn.com
  Subject: Web server is down

 I tried to call the Web server at Sat Jul  4 14:55:37 1998 but there was
  no answer.  I am going to try to resurrect it now:

 Mumble, mumble, mumble, shazzzzammmm!

 /usr/local/apache/bin/apachectl restart: httpd not running, trying to start
  [Sat Jul  4 14:55:38 1998] [debug] mod_so.c(258): loaded module setenvif_module
  [Sat Jul  4 14:55:38 1998] [debug] mod_so.c(258): loaded module unique_id_module
  /usr/local/apache/bin/apachectl restart: httpd started

 That's the best I could do.  Hope it helped.

 Worshipfully yours,

 The Web Monitor

Репликация и зеркала .

Для того чтобы иметь зеркало на страницу другого сайта , можно использовать библиотеку LWP .

Зеркало для страницы

 % ./MirrorOne.pl
  cats.html: Not Modified
  dogs.html: OK
  gillie_fish.html: Not Modified

----------------------Script I.3.1 mirrorOne.pl--------------------

 #!/usr/local/bin/perl
  # mirrorOne.pl

 use LWP::Simple;
  use HTTP::Status;

 use constant DIRECTORY => '/local/web/price_lists';
  %DOCUMENTS = (
         'dogs.html'  => 'http://www.pets.com/dogs/price_list.html',
         'cats.html'  => 'http://www.pets.com/cats/price_list.html',
         'gillie_fish.html' => 'http://aquaria.com/prices.html'
         );
  chdir DIRECTORY;
  foreach (sort keys %DOCUMENTS) {
     my $status = mirror($DOCUMENTS{$_},$_);
     warn "$_: ",status_message($status),"\n";
  }

-------------------------------------------------------------------

Зеркало для дерева документов

С помощью рекурсии это можно сделать и для дерева документов . Скрипт I.3.2 делает зеркало с помощью модуля LWP HTML::LinkExtor для получения HTML links.

----------------------Script I.3.2 mirrorTree.pl--------------------

 #!/usr/local/bin/perl
  
  # File: mirrorTree.pl
  
  use LWP::UserAgent;
  use HTML::LinkExtor;
  use URI::URL;
  use File::Path;
  use File::Basename;
  %DONE    = ();
  
  my $URL = shift;
  
  $UA     = new LWP::UserAgent;
  $PARSER = HTML::LinkExtor->new();
  $TOP    = $UA->request(HTTP::Request->new(HEAD => $URL));
  $BASE   = $TOP->base;
  
  mirror(URI::URL->new($TOP->request->url));
  
  sub mirror {
      my $url = shift;
  
      # get rid of query string "?" and fragments "#"
      my $path = $url->path;
      my $fixed_url = URI::URL->new ($url->scheme . '://' . $url->netloc . $path);
  
      # make the URL relative
      my $rel = $fixed_url->rel($BASE);
      $rel .= 'index.html' if $rel=~m!/$! || length($rel) == 0;
  
      # skip it if we've already done it
      return if $DONE{$rel}++;
  
      # create the directory if it doesn't exist already
      my $dir = dirname($rel);
      mkpath([$dir]) unless -d $dir;
  
      # mirror the document
      my $doc = $UA->mirror($fixed_url,$rel);
      print STDERR "$rel: ",$doc->message,"\n";
      return if $doc->is_error;
  
      # Follow HTML documents
      return unless $rel=~/\.html?$/i;
      my $base = $doc->base;
      
      # pull out the links and call us recursively
      my @links = $PARSER->parse_file("$rel")->links;
      my @hrefs = map { url($_->[2],$base)->abs } @links;
  
      foreach (@hrefs) {
         next unless is_child($BASE,$_);
         mirror($_);
      }
  
  }
  
  sub is_child {
      my ($base,$url) = @_;
      my $rel = $url->rel($base);
      return ($rel ne $url) && ($rel !~ m!^[/.]!);
  }

 --------------------------------------------------------------

Плохие линки

Скрипт I.3.3 проверяет документ на предмет плохих http:, ftp: и gopher: линков .

  % find_bad_links http://prego/apache-1.2/
  checking http://prego/apache-1.2/...
  checking http://prego/apache-1.2/manual/...
  checking http://prego/apache-1.2/manual/misc/footer.html...
  checking http://prego/apache-1.2/manual/misc/header.html...
  checking http://prego/apache-1.2/manual/misc/nopgp.html...
  checking http://www.yahoo.com/Science/Mathematics/Security_and_Encryption/...
  checking http://www.eff.org/pub/EFF/Policy/Crypto/...
  checking http://www.quadralay.com/www/Crypt/Crypt.html...
  checking http://www.law.indiana.edu/law/iclu.html...
  checking http://bong.com/~brian...
  checking http://prego/apache-1.2/manual/cgi_path.html...
  checking http://www.ics.uci.edu/pub/ietf/http/...
    . 
    . 
    .
  BAD LINKS:
  manual/misc/known_bugs.html : http://www.apache.org/dist/patches/apply_to_1.2b6/
  manual/misc/fin_wait_2.html : http://www.freebsd.org/
  manual/misc/fin_wait_2.html : http://www.ncr.com/
  manual/misc/compat_notes.html : http://www.eit.com/
  manual/misc/howto.html : http://www.zyzzyva.com/robots/alert/
  manual/misc/perf.html : http://www.software.hp.com/internet/perf/tuning.html
  manual/misc/perf.html : http://www.qosina.com/~awm/apache/linux-tcp.html
  
  152 documents checked
  11 bad links

----------------------Script I.3.3 mirrorTree.pl--------------------

 #!/usr/local/bin/perl

 # File: find_bad_links.pl
  
  use LWP::UserAgent;
  use HTML::LinkExtor;
  use URI::URL;

 %CAN_HANDLE = ('http'=>1,
                'gopher'=>1,
                # 'ftp'=>1,   # timeout problems?
                );
  %OUTCOME = ();
  $CHECKED = $BAD = 0;
  @BAD = ();

 my $URL = shift;

 $UA     = new LWP::UserAgent;
  $PARSER = HTML::LinkExtor->new();
  $TOP    = $UA->request(HTTP::Request->new(HEAD => $URL));
  $BASE   = $TOP->base;

 check_links(URI::URL->new($TOP->request->url));
  if (@BAD) {
     print "\nBAD LINKS:\n";
     print join("\n",@BAD),"\n\n";
  }
  print "$CHECKED documents checked\n",scalar(@BAD)," bad links\n";

 sub check_links {
     my $url = shift;
     my $fixed_url = $url;
     $fixed_url =~ s/\#.+$//;

    return 1 unless $CAN_HANDLE{$url->scheme};

    # check cached outcomes
     return $OUTCOME{$fixed_url} if exists $OUTCOME{$fixed_url};

    print STDERR "checking $fixed_url...\n";
     $CHECKED++;

    my $rel = $url->rel($BASE) || 'index.html';
     my $child = is_child($BASE,$url);
     $UA->timeout(5);
 my $doc = $d = $UA->request(HTTP::Request->new(($child ? 'GET':'HEAD')=>$url));
     $OUTCOME{$fixed_url} = $doc->is_success;

    return $OUTCOME{$fixed_url} 
        unless $child && $doc->header('Content-type') eq 'text/html';

    # Follow HTML documents
     my $base = $doc->base;
     
     # pull out the links and call us recursively
     my @links = $PARSER->parse($doc->content)->links;
     my @hrefs = map { url($_->[2],$base)->abs } @links;

    foreach (@hrefs) {
         next if check_links($_);
         push (@BAD,"$rel : $_");
     }
     1;
  }

 sub is_child {
     my ($base,$url) = @_;
     my $rel = $url->rel($base);
     return ($rel ne $url) && ($rel !~ m!^[/.]!);
  }

--------------------------------------------------------------------

Загрузка

Что делать , если сервер не справляется с нагрузкой , которая начинает тормозить ?

Один из вариантов решения - несколько веб-серверов с разными хостами и ip-шниками . Следующий скрипт запускается на центральном сервере . Он использует IO::Socket и слушает запросы на 80 порту. Меняет свою привилегию на nobody.nogroup, словно он настоящий Web server. Затем переходит в цикл accept() . При каждом поступающем запросе он форкает дочерний процесс , который читает этот запрос и перенаправляет его на другой веб-сервер.

---------------- Script I.4.1: A Load Balancing ``Web Server'' ---------

 #!/usr/local/bin/perl
  
  # list of hosts to balance between
  @HOSTS = qw/www1.web.org www2.web.org www3.web.org www4.web.org/;
  
  use IO::Socket;
  $SIG{CHLD} = sub { wait() };
  $ENV{'PATH'}='/bin:/usr/bin';
  chomp($hostname = `/bin/hostname`);
  
  # Listen on port 80
  $sock = IO::Socket::INET->new(Listen  => 5,
                               LocalPort => 80,
                               LocalAddr => $hostname,
                               Reuse     => 1,
                               Proto    => 'tcp');
 
  # become "nobody"
  $nobody  = (getpwnam('nobody'))[2]  || die "nobody is nobody";
  $nogroup = (getgrnam('nogroup'))[2] || die "can't grok nogroup";
  ($<,$() = ($>,$)) = ($nobody,$nogroup); # get rid of root privileges!
  ($\,$/) = ("\r\n","\r\n\r\n");          # CR/LF on output/input
  
  # Go into server mode
  close STDIN; close STDOUT; close STDERR;
  
  # prefork -- gee is that all there is to it?
  fork() && fork() && fork() && fork() && exit 0;
  
  # start accepting connections
  while (my $s = $sock->accept()) {
      do { $s->close; next; } if fork();
      my $request = <$s>;
      redirect($1,$s) if $request=~/(?:GET|POST|HEAD|PUT)\s+(\S+)/;
      $s->flush;
      undef $s;
      exit 0;
  }
  
  sub redirect {
      my ($url,$s) = @_;
      my $host = $HOSTS[rand(@HOSTS)];
      print $s "HTTP/1.0 301 Moved Temporarily";
      print $s "Server: Lincoln's Redirector/1.0";
      print $s "Location: http://${host}${url}";
      print $s "";
  }

----------------------------------------------------------------------

Нагрузочное тестирование сервера

Любой сервер , написанный на си , рискует иметь переполнение буфера. Следующий скрипт I.2.3 выполняет нагрузочное тестирование сайта путем посылки большого количества данных .

Например , вот что можно увидеть при крахе сервера :

  % torture.pl -t 1000 -l 5000 http://www.capricorn.com
   torture.pl version 1.0 starting
   Base URL:               http://www.capricorn.com/cgi-bin/search
   Max random data length: 5000
   Repetitions:            1000
   Post:                   0
   Append to path:         0
   Escape URLs:            0

  200 OK
   200 OK
   200 OK
   200 OK
   200 OK
   500 Internal Server Error
   500 Could not connect to www.capricorn.com:80
   500 Could not connect to www.capricorn.com:80
   500 Could not connect to www.capricorn.com:80

---------------------Script I.5.1: torture tester------------------

 #!/usr/local/bin/perl
  
  # file: torture.pl
  # Torture test Web servers and scripts by sending them large arbitrary URLs
  # and record the outcome.
  
  use LWP::UserAgent;
  use URI::Escape 'uri_escape';
  require "getopts.pl";
  
  $USAGE = <<USAGE;
  Usage: $0 -[options] URL
  Torture-test Web servers and CGI scripts
  
  Options:
   -l <integer>  Max length of random URL to send [1024 bytes]
   -t <integer>  Number of times to run the test [1]
   -P            Use POST method rather than GET method
   -p            Attach random data to path rather than query string
   -e            Escape the query string before sending it
  USAGE
  ;
  $VERSION = '1.0';
  
  # process command line
  &Getopts('l:t:Ppe') || die $USAGE;
  
  # get parameters
  $URL    = shift || die $USAGE;
  $MAXLEN = $opt_l ne '' ? $opt_l : 1024;
  $TIMES  = $opt_t || 1;
  $POST   = $opt_P || 0;
  $PATH   = $opt_p || 0;
  $ESCAPE = $opt_e || 0;
  
  # cannot do both a post and a path at the same time
  $POST = 0 if $PATH;
  
  # create an LWP agent
  my $agent = new LWP::UserAgent;
  
  print <<EOF;
  torture.pl version $VERSION starting
  Base URL:               $URL
  Max random data length: $MAXLEN
  Repetitions:            $TIMES
  Post:                   $POST
  Append to path:         $PATH
  Escape URLs:            $ESCAPE
  
  EOF
  ;
  
  # Do the test $TIMES times
  while ($TIMES) {
      # create a string of random stuff
      my $garbage = random_string(rand($MAXLEN));
      $garbage = uri_escape($garbage) if $ESCAPE;
      my $url = $URL;
      my $request;
  
      if (length($garbage) == 0) { # if no garbage to add, just fetch URL
         $request = new HTTP::Request ('GET',$url);
      }
  
      elsif ($POST) {            # handle POST request
         my $header = new HTTP::Headers (
                   Content_Type => 'application/x-www-form-urlencoded',
                                         Content_Length => length($garbage)
                                         );
         # garbage becomes the POST content
         $request = new HTTP::Request ('POST',$url,$header,$garbage);
         
      } else {                   # handle GET request
         
         if ($PATH) {            # append garbage to the base URL
             chop($url) if substr($url,-1,1) eq '/'; 
             $url .= "/$garbage";
         } else {                # append garbage to the query string
             $url .= "?$garbage";
         }
         
         $request = new HTTP::Request('GET',$url);
      }
      
      # do the request and fetch the response
      my $response = $agent->request($request);
      
      # print the numeric response code and the message
      print $response->code,' ',$response->message,"\n";
  
  } continue { $TIMES-- }
  
  # return some random data of the requested length
  sub random_string {
      my $length = shift;
      return undef unless $length >= 1;
      return join('',map chr(rand(255)),0..$length-1);
  }
  
 --------------------------------------------------------------

PART II: CGI TRICKS

Эта секция включает набор скриптов , который иллюстрируют базовые типы CGI скриптования: динамические документы, фильтры, URL редирект.

Динамические документы

В CGI можно создавать документы на лету.

HTML-документы

<I> <hate> <HTML> <because> <it's> <ugly> <and> <has> <too> <many> <$#@*&> <angle> <brackets>.
Скрипт II.1.1 показывает работу с этим списком тегов с помощью CGI.pm.

--------------------Script II.1.1: vegetables1.pl ------------------

 #!/usr/bin/perl
  # Script: vegetables1.pl
  use CGI ':standard';
  print header,
     start_html('Vegetables'),
     h1('Eat Your Vegetables'),
     ol(
        li('peas'),
        li('broccoli'),
        li('cabbage'),
        li('peppers',
           ul(
              li('red'),
              li('yellow'),
              li('green')
              )
           ),
        li('kolrabi'),
        li('radishes')
        ),
     hr,
     end_html;

--------------------------------------------------------------------

Простой HTML

То же самое , но еще проще :

--------------------Script II.1.2: vegetables2.pl ------------------

 #!/usr/bin/perl
  # Script: vegetables2.pl
  use CGI ':standard';
  print header,
     start_html('Vegetables'),
     h1('Eat Your Vegetables'),
     ol(
        li(['peas',
            'broccoli',
            'cabbage',
            'peppers' .
            ul(['red','yellow','green']),
            'kolrabi',
            'radishes'
        ),
     hr,
     end_html;

--------------------------------------------------------------------

Или так :

--------------------Script II.1.3: vegetables3.pl ------------------

 #!/usr/bin/perl

 # Script: vegetables3.pl
  use CGI qw/:standard :html3/;

 print header,
     start_html('Vegetables'),
     h1('Vegetables are for the Strong'),
     table({-border=>undef},
           caption(strong('When Should You Eat Your Vegetables?')),
           Tr({-align=>CENTER,-valign=>TOP}, 
              [
               th(['','Breakfast','Lunch','Dinner']),
               th('Tomatoes').td(['no','yes','yes']),
               th('Broccoli').td(['no','no','yes']),
               th('Onions').td(['yes','yes','yes'])
               ]
              )
           ),
     end_html;

--------------------------------------------------------------------

Формы

CGI.pm имеет набор функций для генерации форм и чтении ее контента при сабмите. Скрипт II.1.4 создает строку радиобатонов разных цветов. После сабмита страница рефрешит себя с выбраным цветом :

------------------Script II.1.4: customizable.pl ------------------

 #!/usr/bin/perl
  # script: customizable.pl
  
  use CGI qw/:standard/;
  
  $color = param('color') || 'white';
  
  print header,
      start_html({-bgcolor=>$color},'Customizable Page'),
      h1('Customizable Page'),
      "Set this page's background color to:",br,
      start_form,
      radio_group(-name=>'color',
                     -value=>['white','red','green','black',
                              'blue','silver','cyan'],
                     -cols=>2),
      submit(-name=>'Set Background'),
      end_form,
      p,
      hr,
      end_html;
  
 --------------------------------------------------------------------
  
 =head3 Making Stateful Forms

Скрипт II.1.5 - пример приложения с 5-ю страницами. С помощью навигации можно перемещаться по ним.Состояние страниц запоминается с помощью скрытых полей :

------------------------Script II.1.5: loan.pl ---------------------

 #!/usr/local/bin/perl
  
  # script: loan.pl
  use CGI qw/:standard :html3/;
  
  # this defines the contents of the fill out forms
  # on each page.
  @PAGES = ('Personal Information','References','Assets','Review','Confirmation');
  %FIELDS = ('Personal Information' => ['Name','Address','Telephone','Fax'],
   'References'           => ['Personal Reference 1','Personal Reference 2'],
            'Assets'               => ['Savings Account','Home','Car']
            );
  # accumulate the field names into %ALL_FIELDS;
  foreach (values %FIELDS) {
      grep($ALL_FIELDS{$_}++,@$_);
  }
  
  
  # figure out what page we're on and where we're heading.
  $current_page = calculate_page(param('page'),param('go'));
  $page_name = $PAGES[$current_page];
  
  print_header();
  print_form($current_page)         if $FIELDS{$page_name};
  print_review($current_page)       if $page_name eq 'Review';
  print_confirmation($current_page) if $page_name eq 'Confirmation';
  print end_html;
  
  # CALCULATE THE CURRENT PAGE
  sub calculate_page {
      my ($prev,$dir) = @_;
      return 0 if $prev eq '';   # start with first page
      return $prev + 1 if $dir eq 'Submit Application';
      return $prev + 1 if $dir eq 'Next Page';
      return $prev - 1 if $dir eq 'Previous Page';
  }
  
  # PRINT HTTP AND HTML HEADERS
  sub print_header {
      print header,
      start_html("Your Friendly Family Loan Center"),
      h1("Your Friendly Family Loan Center"),
      h2($page_name);
  }
  
  # PRINT ONE OF THE QUESTIONNAIRE PAGES
  sub print_form {
      my $current_page = shift;
      print "Please fill out the form completely and accurately.",
         start_form,
         hr;
      draw_form(@{$FIELDS{$page_name}});
      print hr;
      print submit(-name=>'go',-value=>'Previous Page') 
         if $current_page > 0;
      print submit(-name=>'go',-value=>'Next Page'),
         hidden(-name=>'page',-value=>$current_page,-override=>1),
         end_form;
  }
  
  # PRINT THE REVIEW PAGE
  sub print_review {
      my $current_page = shift;
      print "Please review this information carefully before submitting it. ",
         start_form;
      my (@rows);
      foreach $page ('Personal Information','References','Assets') {
         push(@rows,th({-align=>LEFT},em($page)));
         foreach $field (@{$FIELDS{$page}}) {
             push(@rows,
                  TR(th({-align=>LEFT},$field),
                     td(param($field)))
                  );
             print hidden(-name=>$field);
         }
      }
      print table({-border=>1},caption($page),@rows),
         hidden(-name=>'page',-value=>$current_page,-override=>1),
         submit(-name=>'go',-value=>'Previous Page'),
         submit(-name=>'go',-value=>'Submit Application'),
         end_form;
  }
  
  # PRINT THE CONFIRMATION PAGE
  sub print_confirmation {
      print "Thank you. A loan officer will be contacting you shortly.",
         p,
         a({-href=>'../source.html'},'Code examples');
  }
  
  
  # CREATE A GENERIC QUESTIONNAIRE
  sub draw_form {
      my (@fields) = @_;
      my (%fields);
      grep ($fields{$_}++,@fields);
      my (@hidden_fields) = grep(!$fields{$_},keys %ALL_FIELDS);
      my (@rows);
      foreach (@fields) {
         push(@rows,
              TR(th({-align=>LEFT},$_),
                 td(textfield(-name=>$_,-size=>50))
                 )
              );
      }
      print table(@rows);
  
      foreach (@hidden_fields) {
          print hidden(-name=>$_);
      }
  }

--------------------------------------------------------------------

Куки

Скрипт II.1.6 запоминает имя пользователя , выбранный цвет и при перезагрузке восстанавливает их в течение 30 дней .

-------------------Script II.1.6: preferences.pl --------------------

 #!/usr/local/bin/perl
  
  # file: preferences.pl
  
  use CGI qw(:standard :html3);
  
  # Some constants to use in our form.
  @colors=qw/aqua black blue fuschia gray green lime maroon navy olive
      purple red silver teal white yellow/;
  @sizes=("<default>",1..7);
  
  # recover the "preferences" cookie.
  %preferences = cookie('preferences');
  
  # If the user wants to change the background color or her
  # name, they will appear among our CGI parameters.
  foreach ('text','background','name','size') {
      $preferences{$_} = param($_) || $preferences{$_};
  }
  
  # Set some defaults
  $preferences{'background'} = $preferences{'background'} || 'silver';
  $preferences{'text'} = $preferences{'text'} || 'black';
  
  # Refresh the cookie so that it doesn't expire.
  $the_cookie = cookie(-name=>'preferences',
                       -value=>\%preferences,
                       -path=>'/',
                       -expires=>'+30d');
  print header(-cookie=>$the_cookie);
  
  # Adjust the title to incorporate the user's name, if provided.
  $title = $preferences{'name'} ? 
      "Welcome back, $preferences{name}!" : "Customizable Page";
 
  # Create the HTML page.  We use several of the HTML 3.2
  # extended tags to control the background color and the
  # font size.  It's safe to use these features because
  # cookies don't work anywhere else anyway.
  print start_html(-title=>$title,
                   -bgcolor=>$preferences{'background'},
                   -text=>$preferences{'text'}
                  );
  
  print basefont({-size=>$preferences{size}}) if $preferences{'size'} > 0;
  
  print h1($title);
  
  # Create the form
  print hr,
      start_form,
      
      "Your first name: ",
      textfield(-name=>'name',
                -default=>$preferences{'name'},
                -size=>30),br,
      
      table(
           TR(
              td("Preferred"),
              td("Page color:"),
              td(popup_menu(-name=>'background',
                            -values=>\@colors,
                            -default=>$preferences{'background'})
                 ),
              ),
           TR(
              td(''),
              td("Text color:"),
              td(popup_menu(-name=>'text',
                            -values=>\@colors,
                            -default=>$preferences{'text'})
                 )
              ),
           TR(
              td(''),
              td("Font size:"),
              td(popup_menu(-name=>'size',
                            -values=>\@sizes,
                            -default=>$preferences{'size'})
                 )
              )
           ),
  
      submit(-label=>'Set preferences'),
      end_form,
      hr,
      end_html;

--------------------------------------------------------------------

Не-HTML типы

К ним относятся картинки, Postscript , звук и другие.

Скрипт II.1.7 создает набор кликаемых цветных кругов внутри квадрата . При этом создается форма , которая позволяет пользователю выбирать цвет и размер .

-------------------Script II.1.7: circle.pl --------------------

 #!/usr/local/bin/perl
  
  # script: circle.pl
  use GD;
  use CGI qw/:standard Map Area/;
  
  use constant RECTSIZE     => 100;
  use constant CIRCLE_RADIUS  => 40;
  %COLORS = (
            'white' => [255,255,255],
            'red'   => [255,0,0],
            'green' => [0,255,0],
            'blue'  => [0,0,255],
            'black' => [0,0,0],
            'bisque'=> [255,228,196],
            'papaya whip' => [255,239,213],
            'sienna' => [160,82,45]
            );
  
  my $draw          = param('draw');
  my $circle_color  = param('color') || 'bisque';
  my $mag           = param('magnification') || 1;
  
  if ($draw) {
      draw_image();
  } else {
      make_page()
  }
          
  sub draw_image {
      # create a new image
      my $im = new GD::Image(RECTSIZE*$mag,RECTSIZE*$mag);
  
      # allocate some colors
      my $white = $im->colorAllocate(@{$COLORS{'white'}});
      my $black = $im->colorAllocate(@{$COLORS{'black'}});       
      my $circlecolor = $im->colorAllocate(@{$COLORS{$circle_color}});
  
      # make the background transparent and interlaced
      $im->transparent($white);
      $im->interlaced('true');
  
      # Put a black frame around the picture
      $im->rectangle(0,0,RECTSIZE*$mag-1,RECTSIZE*$mag-1,$black);
  
      # Draw the circle
      $im->arc(RECTSIZE*$mag/2,
      RECTSIZE*$mag/2,CIRCLE_RADIUS*$mag*2,CIRCLE_RADIUS*$mag*2,0,360,$black);
  
      # And fill it with circlecolor
      $im->fill(RECTSIZE*$mag/2,RECTSIZE*$mag/2,$circlecolor);
  
      # Convert the image to GIF and print it
      print header('image/gif'),$im->gif;
  }
  
  sub make_page {
      param(-name=>'draw',-value=>1);
      print header(),
         start_html(-title=>'Feeling Circular',-bgcolor=>'white'),
         h1('A Circle is as a Circle Does'),
         img({-src=>self_url(),-alt=>'a circle',
         -align=>'LEFT',-usemap=>'#map',
         -border=>0});
 print em(param('message')) if param('message');
 Delete('draw'); 
 print  start_form,
 "Magnification: ",radio_group(-name=>'magnification',-values=>[1..4]),br,
 "Color: ",popup_menu(-name=>'color',-values=>[sort keys %COLORS]),
 submit(-value=>'Change'),
         end_form;
 print Map({-name=>'map'},
         Area({-shape=>'CIRCLE',
         -href=>param(-name=>'message',-value=>"You clicked in the circle")
                 ? self_url() : '',
         -coords=>join(',',RECTSIZE*$mag/2,RECTSIZE*$mag/2,CIRCLE_RADIUS*$mag),
         -alt=>'Circle'}),
 Area({-shape=>'RECT',
         -href=>param(-name=>'message',-value=>"You clicked in the square")
                 ? self_url() : '',
                 -coords=>join(',',0,0,RECTSIZE*$mag,RECTSIZE*$mag),
                 -alt=>'Square'}));
      print end_html;
  }
  
 ----------------------------------------------------------------------------

Скрипт II.1.8 создает гиф-анимацию.

-------------------Script II.1.8: animate.pl --------------------

 #!/usr/local/bin/perl
  
  # script: animated.pl
  use GD;
  use File::Path;
  
  use constant START      => 80;
  use constant END        => 200;
  use constant STEP       => 10;
  use constant COMBINE    => '/usr/local/bin/convert';
  @COMBINE_OPTIONS = (-delay => 5,
                     -loop  => 10000);
  
  @COLORS = ([240,240,240],
            [220,220,220],
            [200,200,200],
            [180,180,180],
            [160,160,160],
            [140,140,140],
            [150,120,120],
            [160,100,100],
            [170,80,80],
            [180,60,60],
            [190,40,40],
            [200,20,20],
            [210,0,0]);
  @COLORS = (@COLORS,reverse(@COLORS));
  
  my @FILES = ();
  my $dir = create_temporary_directory();
  my $index = 0;
  for (my $r = START; $r <= END; $r+=STEP) {
      draw($r,$index,$dir);
      $index++;
  }
  for (my $r = END; $r > START; $r-=STEP) {
      draw($r,$index,$dir);
      $index++;
  }
  
  # emit the GIF89a
  $| = 1;
  print "Content-type: image/gif\n\n";
  system COMBINE,@COMBINE_OPTIONS,@FILES,"gif:-";
  
  rmtree([$dir],0,1);
  
  sub draw{
      my ($r,$color_index,$dir) = @_;
      my $im = new GD::Image(END,END);
      my $white = $im->colorAllocate(255,255,255);
      my $black = $im->colorAllocate(0,0,0);
      my $color = $im->colorAllocate(@{$COLORS[$color_index % @COLORS]});
      $im->rectangle(0,0,END,END,$white);
      $im->arc(END/2,END/2,$r,$r,0,360,$black);
      $im->fill(END/2,END/2,$color);
      my $file = sprintf("%s/picture.%02d.gif",$dir,$color_index);
      open (OUT,">$file") || die "couldn't create $file: $!";
      print OUT $im->gif;
      close OUT;
      push(@FILES,$file);
  
  }
  
  sub create_temporary_directory {
      my $basename = "/usr/tmp/animate$$";
      my $counter=0;
      while ($counter < 100) {
         my $try = sprintf("$basename.%04d",$counter);
         next if -e $try;
         return $try if mkdir $try,0700;
      } continue { $counter++; }
      die "Couldn't make a temporary directory";
  }
  
 ----------------------------------------------------------------------------

Работа с файлами

Скрипт II.2.1 читает файл и находит в нем 4-буквенные слова .

----------------------Script II.2.1: naughty.pl ------------------------

 #!/usr/local/bin/perl
  # Script: naughty.pl

 use CGI ':standard';
  $file = path_translated() || 
         die "must be called with additional path info";
  open (FILE,$file) || die "Can't open $file: $!\n";
  print header('text/plain');
  while (<FILE>) {
     s/\b(\w)\w{2}(\w)\b/$1**$2/g;
     print;
  }
  close FILE;

----------------------------------------------------------------------------

Еще лучше можно сделать с помощью mod_perl.

--------------------Script II.2.2: naughty2.pl -----------------------

 #!/usr/local/bin/perl
  
  # Script: naughty2.pl
  package HTML::Parser::FixNaughty;
  
  require HTML::Parser;
  @ISA = 'HTML::Parser';
  
  sub start {
      my ($self,$tag,$attr,$attrseq,$origtext) = @_;
      print $origtext;
  }
  sub end {
      my ($self,$tag) = @_;
      print "</$tag>";
  }
  sub text {
      my ($self,$text) = @_;
      $text =~ s/\b(\w)\w{2}(\w)\b/$1**$2/g;    
      print $text;
  }
  
 
  package main;
  use CGI qw/header path_info redirect path_translated/;
  
  $file = path_translated() || 
      die "must be called with additional path info";
  $file .= "index.html" if $file=~m!/$!;
  
  unless ($file=~/\.html?$/) {
      print redirect(path_info());
      exit 0;
  }
  
  $parser = new HTML::Parser::FixNaughty;
  print header();
  $parser->parse_file($file);

----------------------------------------------------------------------------

Если на апаче установлен mod_perl , можно в конфиге прописать хэндлер :

   <Location />
      ... blah blah blah other stuff
      Action text/html /cgi-bin/naughty2.pl
    </Location>

После этого любой HTML-документ будет обрабатываться этим скриптом .

Редирект

Скрипт II.3.1 выбирает произвольную картинку из каталога и выводит ее .

        /cgi-bin/random_pict/banners/egregious_advertising

-------------------Script II.3.1 random_pict.pl -----------------------------

 #!/usr/local/bin/perl
  # script: random_pict.pl

 use CGI qw/:standard/;
  $PICTURE_PATH = path_translated();
  $PICTURE_URL = path_info();
  chdir $PICTURE_PATH
         or die "Couldn't chdir to pictures directory: $!";
  @pictures = <*.{jpg,gif}>;
  $lucky_one = $pictures[rand(@pictures)];
  die "Failed to pick a picture" unless $lucky_one;

 print redirect("$PICTURE_URL/$lucky_one");

---------------------------------------------------------------------

File Uploads

Скрипт II.4.1 читает загружаемый файл и печатает его длину MIME тип.

----------------Script II.4.1 upload.pl -----------------------------

 #!/usr/local/bin/perl
  #script: upload.pl
  
  use CGI qw/:standard/;
  
  print header,
      start_html('file upload'),
      h1('file upload');
  print_form()    unless param;
  print_results() if param;
  print end_html;
  
  sub print_form {
      print start_multipart_form(),
         filefield(-name=>'upload',-size=>60),br,
         submit(-label=>'Upload File'),
         end_form;
  }
 
  sub print_results {
      my $length;
      my $file = param('upload');
      if (!$file) {
         print "No file uploaded.";
         return;
      }
      print h2('File name'),$file;
      print h2('File MIME type'),
      uploadInfo($file)->{'Content-Type'};
      while (<$file>) {
         $length += length($_);
      }
      print h2('File length'),$length;
  }

---------------------------------------------------------------------

PART III: MOD_PERL

С установленным mod_perl нет необходимости ждать , пока перловый интерпретатор прочитает и откомпилирует скрипт.Однажды откомпилированный , скрипт хранится в памяти . mod_perl дает доступ к Apache API и позволяет управлять его поведением.

Создание динамических страниц

Скрипт III.1.1 hello world .

В конфиг-файл нужно добавить следующую секцию :

  <Location /hello/world>
     SetHandler  perl-script
     PerlHandler Apache::Hello
   </Location>

------------------Script III.1.1 Apache::Hello --------------------

 package Apache::Hello;
  # file: Apache::Hello.pm
  use strict vars;
  use Apache::Constants ':common';
    sub handler {
      my $r = shift;
      $r->content_type('text/html');
      $r->send_http_header;
      my $host = $r->get_remote_host;
      $r->print(<<END);
  <HTML>
  <HEADER>
  <TITLE>Hello There</TITLE>
  </HEADER>
  <BODY>
  <H1>Hello $host</H1>
  Hello to all the nice people at the Perl conference.  Lincoln is
  trying really hard.  Be kind.
  </BODY>
  </HTML>
  END
      return OK;
  }
  1;

----------------------------------------------------------------

Фильтры

С помощью install можно инсталлировать content handler .

Добавление футера

Скрипт III.2.1 добавляет футер в HTML файл.

Для установки хэндлера для всех файлов в каталоге в конфиге нужно прописать :

  <Location /footer>
     SetHandler perl-script
     PerlHandler Apache::Footer
   </Location>

Можно декларировать новое расширение ``.footer'' и ссылаться на него в footer module:

  AddType text/html .footer
   <Files ~ "\.footer$">
      SetHandler  perl-script
      PerlHandler Apache::Footer
   </Files>

------------------Script III.2.1 Apache::Footer --------------------

 package Apache::Footer;
  # file Apache::Footer.pm
  use strict vars;
  use Apache::Constants ':common';
  use IO::File;
  sub handler {
      my $r = shift;
      return DECLINED unless $r->content_type() eq 'text/html';
      my $file = $r->filename;
      return DECLINED unless $fh=IO::File->new($file);
      my $modtime = localtime((stat($file))[9]);
      my $footer=<<END;
  <hr>
  &copy; 1998 <a href="http://www.ora.com/">O\'Reilly &amp; Associates</a><br>
  <em>Last Modified: $modtime</em>
  END
  ;
      $r->send_http_header;
 
      while (<$fh>) {
          s!(</BODY>)!$footer$1!oi;
      } continue {
          $r->print($_);
      }
    return OK;
  }
  1;

------------------------------------------------------------------

SSI

В следующем примере мы реализуем собственную SSI-систему , которая будет выглядеть примерно так ,

   <!--#DIRECTIVE PARAM1 PARAM2 PARAM3 PARAM4...-->

Директива будет представлена функцией на перл в отдельном файле :

 use POSIX 'strftime';
  # insert the string "Hello World!"
  sub HELLO {
      my $r = shift;
      "Hello World!";
  }
  # insert today's date possibly modified by a strftime() format
  # string
  sub DATE {
      my ($r,$format) = @_;
      return scalar(localtime) unless $format;
      return strftime($format,localtime);
  }
  # insert the modification time of the document, possibly modified
  # by a strftime() format string.
  sub MODTIME {
      my ($r,$format) = @_;
      my $file = $r->filename;
      return localtime((stat($file))[9]) unless $format;
      return strftime($format,localtime((stat($file))[9]));
  }
  # insert a canned footer
  sub FOOTER {
      my $r = shift;
      my $modtime = MODTIME($r);
      return <<END;
      <hr>
  &copy; 1998 <a href="http://www.ora.com/">O\'Reilly &amp; Associates</a><br>
  <em>Last Modified: $modtime</em>
  END
  ;
  }
  # insert the named field from the incoming HTTP request
  sub HEADER_FIELD {
      my ($r,$h) = @_;
      $r->header_in($h);
  }
  1;

Теперь если в HTML мы вставим код  , будет распечатано Hello World! . Другой пример :  - распечатает версию броузера . Обычно в конфиге используется префикс ``.ehtml'' для обозначения SSI :

 <Files ~ "\.ehtml$">
    SetHandler  perl-script
    PerlHandler Apache::ESSI
    PerlSetVar  ESSIDefs conf/essi.defs
  </Files>
  AddType text/html .ehtml

------------------Script III.2.1 Apache::ESSI --------------------

 package Apache::ESSI;
  use strict vars;
  use Apache::Constants ':common';
  use IO::File;
  my (%MODIFIED,%SUBSTITUTION);
  sub handler {
     my $r = shift;
     $r->content_type() eq 'text/html' || return DECLINED;
     my $fh=IO::File->new($r->filename)|| return DECLINED;
     my $sub = read_definitions($r)    || return SERVER_ERROR;
     $r->send_http_header;
     $r->print($sub->(<$fh>));
     return OK;
  }
  sub read_definitions {
     my $r = shift;
     return undef unless my $def = $r->dir_config('ESSIDefs');
     return undef unless -e ($def = $r->server_root_relative($def));
     return $SUBSTITUTION{$def}
         if $MODIFIED{$def} && $MODIFIED{$def} <= -M _;
     my $package = "Apache::ESSI::$def";
     $package=~tr/a-zA-Z0-9_/_/c;
     $SUBSTITUTION{$def} = eval <<END;
  package $package;
  use Text::ParseWords 'quotewords';
  do '$def';
  sub {
     # Make sure that eval() errors aren't trapped.
     local \$SIG{__WARN__}= \\&CORE::warn;
     local \$SIG{__DIE__} = \\&CORE::die;
     my \@lines = \@_;
     my \$data = join('',\@lines);
     \$data =~ s/<!--\\s*\\#(\\w+) # start of a function name
                 \\s*(.*?)         # optional parameters
                 \\s*-->           # end of comment
                 /eval {&{\$1}(\$r,quotewords('[ ,]',0,\$2))}
                      || "<em>[\$@]<\\/em>"/xseg;
     \$data;
  };
  END
     unless ($SUBSTITUTION{$def}) {
         $r->log_error("Eval of $def did not return true: $@");
         return undef;
     }
     $MODIFIED{$def} = -M $def;  # store modification date
     return $SUBSTITUTION{$def};
  }
  1;
 
 -----------------------------------------------------------------------
 

 Компрессия на лету
 
 
 
 Скрипт III.2.4 - контент-фильтр , который архивирует
 с расширением gzips все файлы в каталоге и добавляет ``gzip'' 
 Content-Encoding хидер.
 В коде используется модуль Compress::Zlib . Этот код можно использовать 
 для криптования .
 В конфигурации настраиваем каталог /compressed :
 
 
 <Location /compressed>
     SetHandler  perl-script
     PerlHandler Apache::GZip
  </Location>
 
 
 ---------------- Script
 III.2.3: Apache::GZip -------------------------
 
  package Apache::GZip;
  #File: Apache::GZip.pm
  use strict vars;
  use Apache::Constants ':common';
  use Compress::Zlib;
  use IO::File;
  use constant GZIP_MAGIC => 0x1f8b;
  use constant OS_MAGIC => 0x03;
  sub handler {
      my $r = shift;
      my ($fh,$gz);
      my $file = $r->filename;
      return DECLINED unless $fh=IO::File->new($file);
      $r->header_out('Content-Encoding'=>'gzip');
      $r->send_http_header;
      return OK if $r->header_only;
      tie *STDOUT,'Apache::GZip',$r;
      print($_) while <$fh>;
      untie *STDOUT;
      return OK;
  }
  sub TIEHANDLE {
      my($class,$r) = @_;
      # initialize a deflation stream
      my $d = deflateInit(-WindowBits=>-MAX_WBITS()) || return undef;
      # gzip header -- don't ask how I found out
      $r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));
      return bless { r   => $r,
                     crc =>  crc32(undef),
                     d   => $d,
                     l   =>  0
                     },$class;
  }
  sub PRINT {
      my $self = shift;
      foreach (@_) {
         # deflate the data
         my $data = $self->{d}->deflate($_);
         $self->{r}->print($data);
         # keep track of its length and crc
         $self->{l} += length($_);
         $self->{crc} = crc32($_,$self->{crc});
      }
  }
  sub DESTROY {
     my $self = shift;
    # flush the output buffers
     my $data = $self->{d}->flush;
     $self->{r}->print($data);
 
     # print the CRC and the total length (uncompressed)
     $self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
  }
 
  1;

-----------------------------------------------------------------------

Access Control

Под контролем доступа подразумевается аутентификация / авторизация . О пользователе можно получить ограниченную информацию - ip,хост,тип броузера. Скрипт III.3.1 блокирует доступ для определенного типа броузеров .. Apache::BlockAgent читает информацию из файла ``bad agents''. Пример agents file:

   ^teleport pro\/1\.28
    ^nicerspro
    ^mozilla\/3\.0 \(http engine\)
    ^netattache
    ^crescent internet toolpak http ole control v\.1\.0
    ^go-ahead-got-it
    ^wget
    ^devsoft's http component v1\.0
    ^www\.pl
    ^digout4uagent

Конфигурация :

 <Location />
    PerlAccessHandler Apache::BlockAgent
    PerlSetVar BlockAgentFile /home/www/conf/bad_agents.txt
  </Location>

------------------Script III.3.1: Apache::BlockAgent-------------------

 package Apache::BlockAgent;
  # block browsers that we don't like
  use strict 'vars';
  use Apache::Constants ':common';
  use IO::File;
  my %MATCH_CACHE;
  my $DEBUG = 0;
  sub handler {
      my $r = shift;
 
      return DECLINED unless my $patfile = $r->dir_config('BlockAgentFile');
      return FORBIDDEN unless my $agent = $r->header_in('User-Agent');
      return SERVER_ERROR unless my $sub = get_match_sub($r,$patfile);
      return OK if $sub->($agent);
      $r->log_reason("Access forbidden to agent $agent",$r->filename);
      return FORBIDDEN;
  }
  # This routine creates a pattern matching subroutine from a
  # list of pattern matches stored in a file.
  sub get_match_sub {
      my ($r,$filename) = @_;
      my $mtime = -M $filename;
      # try to return the sub from cache
      return $MATCH_CACHE{$filename}->{'sub'} if
          $MATCH_CACHE{$filename} &&
              $MATCH_CACHE{$filename}->{'mod'} <= $mtime;
      # if we get here, then we need to create the sub
      return undef unless my $fh = new IO::File($filename);
      chomp(my @pats = <$fh>); # get the patterns into an array
      my $code = "sub { \$_ = shift;\n";
      foreach (@pats) {
          next if /^#/
          $code .= "return undef if /$_/i;\n";
      }
      $code .= "1; }\n";
      warn $code if $DEBUG;
      # create the sub, cache and return it
      my $sub = eval $code;
      unless ($sub) {
          $r->log_error($r->uri,": ",$@);
          return undef;
      }
      @{$MATCH_CACHE{$filename}}{'sub','mod'}=($sub,$modtime);
      return $MATCH_CACHE{$filename}->{'sub'};
  }
 
  1;

-----------------------------------------------------------------------

Authentication / Authorization

Модули DBI и DBM включают mod_perl authentication/authorization API . Аутентификация проверяет юзерские логин с паролем . Авторизация решает , имеет ли пользователь права на просмотр того или иного документа .

NIS Authentication

Скрипт III.4.1 показывает , как модуль Apache::AuthSystem проверяет пользовательские имя и пароль , сравнивает с системным паролем и принимает решение . Функция getpwnam() работает с базой данных NIS . Конфигурация :

 <Location /protected>
    AuthName Test
    AuthType Basic
    PerlAuthenHandler Apache::AuthSystem;
    require valid-user
  </Location>

--------------------Script III.4.1: Apache::AuthSystem-----------------

 package Apache::AuthSystem;
  # authenticate users on system password database
  use strict;
  use Apache::Constants ':common';
  sub handler {
     my $r = shift;
     my($res, $sent_pwd) = $r->get_basic_auth_pw;
     return $res if $res != OK;
     my $user = $r->connection->user;
     my $reason = "";
     my($name,$passwd) = getpwnam($user);
     if (!$name) {
         $reason = "user does not have an account on this system";
     } else {
         $reason = "user did not provide correct password"
             unless $passwd eq crypt($sent_pwd,$passwd);
     }
     if($reason) {
         $r->note_basic_auth_failure;
         $r->log_reason($reason,$r->filename);
         return AUTH_REQUIRED;
     }
     return OK;
  }
  1;

-----------------------------------------------------------------------

Anonymous Authentication

Here's a system that authenticates users the way anonymous FTP does. They have to enter a name like ``Anonymous'' (configurable) and a password that looks like a valid e-mail address. The system rejects the username and password unless they are formatted correctly.

In a real application, you'd probably want to log the password somewhere for posterity. Script III.4.2 shows the code for Apache::AuthAnon. To activate it, create a access.conf section like this one:

 <Location /protected>
  AuthName Anonymous
  AuthType Basic
  PerlAuthenHandler Apache::AuthAnon
  require valid-user

 PerlSetVar Anonymous anonymous|anybody
  </Location>

---------------Script III.4.2: Anonymous Authentication-----------------

 package Apache::AuthAnon;

 use strict;
  use Apache::Constants ':common';

 my $email_pat = '\w+\@\w+\.\w+';
  my $anon_id  = "anonymous";

 sub handler {
      my $r = shift;

     my($res, $sent_pwd) = $r->get_basic_auth_pw;
      return $res if $res != OK;

     my $user = lc $r->connection->user;
      my $reason = "";

     my $check_id = $r->dir_config("Anonymous") || $anon_id;

     unless($user =~ /^$check_id$/i) {
          $reason = "user did not enter a valid anonymous username";
      }

     unless($sent_pwd =~ /$email_pat/o) {
          $reason = "user did not enter an email address password";
      }

     if($reason) {
          $r->note_basic_auth_failure;
          $r->log_reason($reason,$r->filename);
          return AUTH_REQUIRED;
      }

     $r->notes(AuthAnonPassword => $sent_pwd);

     return OK;
  }

1;

-----------------------------------------------------------------------

Gender-Based Authorization

After authenticating, you can authorize. The most familiar type of authorization checks a group database to see if the user belongs to one or more privileged groups. But authorization can be anything you dream up.

Script III.4.3 shows how you can authorize users by their gender (or at least their apparent gender, by checking their names with Jon Orwant's Text::GenderFromName module. This must be used in conjunction with an authentication module, such as one of the standard Apache modules or a custom one.

This configuration restricts access to users with feminine names, except for the users ``Webmaster'' and ``Jeff'', who are allowed access.

 <Location /ladies_only>
    AuthName "Ladies Only"
    AuthType Basic
    AuthUserFile /home/www/conf/users.passwd
    PerlAuthzHandler  Apache::AuthzGender
    require gender F            # allow females
    require user Webmaster Jeff # allow Webmaster or Jeff
  </Location>

The script uses a custom error response to explain why the user was denied admittance. This is better than the standard ``Authorization Failed'' message.

------------------Script III.4.3: Apache::AuthzGender---------------

 package Apache::AuthzGender;

 use strict;
  use Text::GenderFromName;
  use Apache::Constants ":common";

 my %G=('M'=>"male",'F'=>"female");

 sub handler {
      my $r = shift;
     
      return DECLINED unless my $requires = $r->requires;
      my $user = lc($r->connection->user);
      substr($user,0,1)=~tr/a-z/A-Z/;
      my $guessed_gender = uc(gender($user)) || 'M';

     my $explanation = <<END;
  <HTML><HEAD><TITLE>Unauthorized</TITLE></HEAD><BODY>
  <H1>You Are Not Authorized to Access This Page</H1>
  Access to this page is limited to:
  <OL>
  END

     foreach (@$requires) {
          my ($requirement,@rest ) = split(/\s+/,$_->{requirement});
          if (lc $requirement eq 'user') {
              foreach (@rest) { return OK if $user eq $_; }
              $explanation .= "<LI>Users @rest.\n";
          } elsif (lc $requirement eq 'gender') {
              foreach (@rest) { return OK if $guessed_gender eq uc $_; }
              $explanation .= "<LI>People of the @G{@rest} persuasion.\n";
          } elsif (lc $requirement eq 'valid-user') {
              return OK;
          }
      }

     $explanation .= "</OL></BODY></HTML>";
     
      $r->custom_response(AUTH_REQUIRED,$explanation);
      $r->note_basic_auth_failure;
      $r->log_reason("user $user: not authorized",$r->filename);
      return AUTH_REQUIRED;
  }

1;

--------------------------------------------------------------------

Proxy Services

mod_perl gives you access to Apache's ability to act as a Web proxy. You can intervene at any step in the proxy transaction to modify the outgoing request (for example, stripping off headers in order to create an anonymizing proxy) or to modify the returned page.

A Banner Ad Blocker

Script III.5.1 shows the code for a banner-ad blocker written by Doug MacEachern. It intercepts all proxy requests, substituting its own content handler for the default. The content handler uses the LWP library to fetch the requested document. If the retrieved document is an image, and its URL matches the pattern (ads?|advertisement|banner), then the content of the image is replaced with a dynamically-generated GIF that reads ``Blocked Ad''. The generated image is exactly the same size as the original, preserving the page layout. Notice how the outgoing headers from the Apache request object are copied to the LWP request, and how the incoming LWP response headers are copied back to Apache. This makes the transaction nearly transparent to Apache and to the remote server.

In addition to LWP you'll need GD.pm and Image::Size to run this module. To activate it, add the following line to the configuration file:

 PerlTransHandler Apache::AdBlocker

Then configure your browser to use the server to proxy all its HTTP requests. Works like a charm! With a little more work, and some help from the ImageMagick module, you could adapt this module to quiet-down animated GIFs by stripping them of all but the very first frame.

---------------Script III.5.1: Apache::AdBlocker---------------------

 package Apache::AdBlocker;
  
  use strict;
  use vars qw(@ISA $VERSION);
  use Apache::Constants qw(:common);
  use GD ();
  use Image::Size qw(imgsize);
  use LWP::UserAgent ();
  
  @ISA = qw(LWP::UserAgent);
  $VERSION = '1.00';
  
  my $UA = __PACKAGE__->new;
  $UA->agent(join "/", __PACKAGE__, $VERSION);
  
  my $Ad = join "|", qw{ads? advertisement banner};
  
  sub handler {
      my($r) = @_;
      return DECLINED unless $r->proxyreq;
      $r->handler("perl-script"); #ok, let's do it
      $r->push_handlers(PerlHandler => \&proxy_handler);
      return OK;
  }
  
  sub proxy_handler {
      my($r) = @_;
  
      my $request =

--------------------------------------------------------------------

Another way of doing this module would be to scan all proxied HTML files for <IMG> tags containing one of the verboten URLs, then replacing the SRC attribute with a transparent GIF of our own. However, unless the <IMG> tag contained WIDTH and HEIGHT attributes, we wouldn't be able to return a GIF of the correct size -- unless we were to go hunting for the GIF with LWP, in which case we might as well do it this way.

Customized Logging

After Apache handles a transaction, it passes all the information about the transaction to the log handler. The default log handler writes out lines to the log file. With mod_perl, you can install your own log handler to do customized logging.

Send E-Mail When a Particular Page Gets Hit

Script III.6.1 installs a log handler which watches over a page or set of pages. When someone fetches a watched page, the log handler sends off an e-mail to notify someone (probably the owner of the page) that the page has been read.

To activate the module, just attach a PerlLogHandler to the <Location> or <File> you wish to watch. For example:

   <Location /~lstein>
       PerlLogHandler Apache::LogMail
       PerlSetVar mailto lstein@cshl.org
    </Location>

The ``mailto'' directive specifies the name of the recipient(s) to notify.

-------------------Script III.6.1: Apache::LogMail------------------

 package Apache::LogMail;
  use Apache::Constants ':common';

 sub handler {
      my $r = shift;
      my $mailto = $r->dir_config('mailto');
      return DECLINED unless $mailto
      my $request = $r->the_request;
      my $uri = $r->uri;
      my $agent = $r->header_in("User-agent");
      my $bytes = $r->bytes_sent;
      my $remote = $r->get_remote_host;
      my $status = $r->status_line;
      my $date = localtime;
      unless (open (MAIL,"|/usr/lib/sendmail -oi -t")) {
         $r->log_error("Couldn't open mail: $!");
         return DECLINED;
      }
      print MAIL <<END;
  To: $mailto
  From: Mod Perl <webmaster>
  Subject: Somebody looked at $uri

 At $date, a user at $remote looked at
  $uri using the $agent browser.

 The request was $request, 
  which resulted returned a code of $status.

 $bytes bytes were transferred.
  END
      close MAIL;
      return OK;
  }
  1;

--------------------------------------------------------------------

Writing Log Information Into a Relational Database

Coming full circle, Script III.6.2 shows a module that writes log information into a DBI database. The idea is similar to Script I.1.9, but there's now no need to open a pipe to an external process. It's also a little more efficient, because the log data fields can be recovered directly from the Apache request object, rather than parsed out of a line of text. Another improvement is that we can set up the Apache configuration files so that only accesses to certain directories are logged in this way.

To activate, add something like this to your configuration file: PerlLogHandler Apache::LogDBI

Or, to restrict special logging to accesses of files in below the URL ``/lincoln_logs'' add this:

 <Location /lincoln_logs>
    PerlLogHandler Apache::LogDBI  
  </Location>

-----------------Script III.6.2: Apache::LogDBI---------------------

 package Apache::LogDBI;
  use Apache::Constants ':common';
  
  use strict 'vars';
  use vars qw($DB $STH);
  use DBI;
  use POSIX 'strftime';
  
  use constant DSN       => 'dbi:mysql:www';
  use constant DB_TABLE  => 'access_log';
  use constant DB_USER   => 'nobody';
  use constant DB_PASSWD => '';
  
  $DB = DBI->connect(DSN,DB_USER,DB_PASSWD) || die DBI->errstr;
  $STH = $DB->prepare("INSERT INTO ${\DB_TABLE} VALUES(?,?,?,?,?,?,?,?,?)") 
       || die $DB->errstr;
  
  sub handler {
      my $r = shift;
      my $date    = strftime('%Y-%m-%d %H:%M:%S',localtime);
      my $host    = $r->get_remote_host;
      my $method  = $r->method;
      my $url     = $r->uri;
      my $user    = $r->connection->user;
      my $referer = $r->header_in('Referer');
      my $browser = $r->header_in("User-agent");
      my $status  = $r->status;
      my $bytes   = $r->bytes_sent;
      $STH->execute($date,$host,$method,$url,$user,
                    $browser,$referer,$status,$bytes);
      return OK;
  }
  
  1;

--------------------------------------------------------------------

Conclusion

That's as many tricks as I thought could squeeze into a three-hour session. Even so, we probably didn't have time to cover them all. You'll find more tricks in my books, articles and Web site. Here's where you can find them:

"How to Set Up and Maintain a Web Site"

General introduction to Web site care and feeding, with an emphasis on Apache. Addison-Wesley 1997.

Companion Web site at http://www.genome.wi.mit.edu/WWW/

"Web Security, a Step-by-Step Reference Guide"

How to keep your Web site free from thieves, vandals, hooligans and other yahoos. Addison-Wesley 1998.

Companion Web site at http://www.w3.org/Security/Faq/

"The Official Guide to Programming with CGI.pm"

Everything I know about CGI.pm (and some things I don't!). John Wiley & Sons, 1998.

Companion Web site at http://www.wiley.com/compbooks/stein/

"The Apache Module Book: Fast Dynamic Pages in Perl and C"

Co-authored with Doug MacEachern. Will be out sometime in the fall of 1998.

Companion Web site at http://www.modperl.com/

WebTechniques Columns

I write a monthly column for WebTechniques magazine. You can find back-issues and reprints at http://www.web-techniques.com/

The Perl Journal Columns

I write a quarterly column for TPJ. Source code listings are available at http://www.tpj.com/

Lincoln D. Stein
Cold Spring Harbor Laboratory
July 6, 1998

Оставьте свой комментарий !

Ваше имя:

Комментарий:

Оба поля являются обязательными

Автор	Комментарий к данной статье