Addendum - version 1.6!


3/3/03
 

Cached and archived pages can cause expiry reminder emails!

I was surprised to receive a reminder email message for a page that I had just updated, and realised that this must be due to an old version of the page stored in a cache somewhere being "hit". A few days later I received another reminder email message that identified a page with an unfamiliar URL as having expired. When I looked up this URL I found that it was for a version of my page that had been archived by Google!

The second of these problems seems easy enough to counter. All that's needed is for the expirychecker script to check the URL to make sure that it lies within the SBU domain. This will not only exclude copies archived by search engines, but will also exclude the possibility that the expirychecker could be activated by some other (possibly rogue) page on the Internet. As for the first problem, I think all I can do is include a warning about this within the email reminder message itself.
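
As a rough illustration of the domain check described above, the sketch below compares only the host part of the referring URL against the expected host. It is not the code used in version 1.6: the function name referrer_is_in_domain() and the sample URLs are made up for the example, and it assumes a PHP version in which parse_url() accepts a component argument (PHP 5.1.2 or later).

```php
<?php
// Hypothetical helper, not part of expirychecker.php3: returns true only
// when the host of the referring URL is exactly the expected host.
function referrer_is_in_domain($url, $expected_host = "www.sbu.ac.uk") {
    // extract just the host; null or false means no usable host was found
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    // host names are case-insensitive, so compare without regard to case
    return strcasecmp($host, $expected_host) === 0;
}

// illustrative URLs only - the second merely contains the SBU host name
var_dump(referrer_is_in_domain("http://www.sbu.ac.uk/index.html"));
var_dump(referrer_is_in_domain(
    "http://www.google.com/search?q=cache:www.sbu.ac.uk/page.html"));
```

Comparing the host alone, rather than searching the whole URL for the domain name, means that a copy held elsewhere is rejected even if its address happens to include "www.sbu.ac.uk". The referrer is supplied by the browser, so a check of this kind is a filter rather than a guarantee.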

Version 1.6

<?php

// Web Page Expiry Checker

// A script to automatically email web page content owners (at most
// once per day) to remind them when their web pages have reached/
// exceeded their expiry dates, and to email webmasters when pages
// expired more than a month ago (the latter limited to one email
// per month max).

// Author = Martin Bush, South Bank University  [bushm@sbu.ac.uk]
// Date = 3 March 2003
// Filename = expirychecker.php3 
// Version = 1.6
// Changes with respect to version 1.5 = 
//           This version checks the url of the expired page to
//           ensure that it is within the SBU domain to prevent
//           activation by copies of the expired page that have
//           been archived by search engines, or by some other
//           (possibly rogue) page on the Internet. Also, the
//           reminder email message now warns of the possibility
//           that the message may be due to a cached old version
//           of the page being viewed after the live version has
//           been updated.


// INSTALLATION INSTRUCTIONS:
// To install, put this script into the cgi-bin directory and edit
// the contents of the two email messages ($mail_message1 and 
// $mail_message2) and the email address of the webmaster as 
// appropriate. (See bottom of this script.) If you have created
// a new cgi-bin directory in which to put this script, remember to
// type the "wwwset" command to enable the script to be executed.
//
// Once installed, this script can be called from any web page by
// including the following (with "???" replaced by the appropriate
// userid) anywhere within the body of the web page:
//
// <!-- Insert values below for "owner", "expirydate" -->
// <!-- and "message" to activate the Expiry Checker. -->
// <img width="1" height="1" border="0" alt=""
// src="http://www.sbu.ac.uk/php-cgiwrap/???/expirychecker.php3?
// owner=email@ddress
// &
// expirydate=dd/mm/yy
// &
// message=a line of free text - no quotation marks please!
// ">
//
// Note that this will insert a one pixel image into the web page.
// This seems to be imperceptible with modern browsers, but a small
// dot is displayed when using both Netscape 4 and IE 4 on a Mac
// (at least). Not only that, but depending on where it is placed it
// can also cause a blank line to appear, in which case it really
// does have quite a noticeable effect. For example it'll cause a
// blank line if it's placed in between two tables. Ideally, it
// should be inserted immediately before or after some text so that
// it will just cause a small dot to be displayed in those browsers.
//
// If the value for the expiry date is left as the literal character
// string "dd/mm/yy" this shouldn't cause a problem, and no expiry 
// reminder emails will be sent. The script will accept dd/mm/yy
// dates containing single digits - e.g. either 15/6/03 or 15/06/03
// would be processed correctly.
//
// The Expiry Checker creates/maintains the following two "expiry
// reminders sent" files within the cgi-bin directory:
//   expiryrems_sent_today.txt
//   expiryrems_sent_yesterday.txt
// The second file is for information only.


// parse query_string to get $owner, $expirydate and $message
parse_str($QUERY_STRING);

// discover url of referrer
$url = $GLOBALS["HTTP_REFERER"];

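// if (url is within the SBU domain) then
// (the pattern is anchored to the start of the url and its dots are
// escaped, so that a url on another site that merely contains the
// text "www.sbu.ac.uk" - e.g. in its query string - is rejected)
if (eregi('^https?://www\.sbu\.ac\.uk(/|$)', $url)) {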

// one level of indentation deliberately skipped here!

//get today's date in dd/mm/yy format
$today = date("d/m/y");
// convert to yymmdd format - e.g. 24/08/02 becomes 020824
// *** end-of-century bug - will fail in the year 2100! ***
$today_day = substr($today,-8,2);
$today_month = substr($today,-5,2);
$today_year = substr($today,-2,2);
$today_yymmdd = $today_year.$today_month.$today_day;

// convert expiry date to yymmdd format
// handles dd/mm/yy dates including single digits for dd, mm, yy
// - e.g. 06/06/03, 6/6/03, 06/6/03, 6/06/03, 06/06/3 etc.
$position_of_first_slash = strpos($expirydate, "/");
$expiry_day = substr($expirydate, 0, $position_of_first_slash);
$expiry_mm_yy = substr($expirydate, $position_of_first_slash + 1, 
                                              strlen($expirydate));
$position_of_second_slash = strpos($expiry_mm_yy, "/");
$expiry_month = substr($expiry_mm_yy, 0, $position_of_second_slash);
$expiry_year = substr($expiry_mm_yy, $position_of_second_slash + 1, 
                                              strlen($expirydate));
// if any of dd, mm or yy are single digits then add a leading zero
if ( strlen($expiry_day) == 1) { $expiry_day = "0".$expiry_day; }
if ( strlen($expiry_month) == 1) { $expiry_month = "0".$expiry_month; }
if ( strlen($expiry_year) == 1) { $expiry_year = "0".$expiry_year; }
$expiry_yymmdd = $expiry_year.$expiry_month.$expiry_day;

// if (web page content has expired) then
if ( $expiry_yymmdd <= $today_yymmdd ) {

  // if (expiryrems_sent_today.txt doesn't exist) then
  $reminders_file = "expiryrems_sent_today.txt";
  if (!(file_exists($reminders_file))) {

    // create new reminders file containing date plus first
    // reminder entry, and make a note to send an email.
    //
    // create and open reminders file - "r+" for read/write
    touch ($reminders_file);
    $reminders_fp = fopen($reminders_file, "r+");
    // lock reminders file - "2" for exclusive writing lock
    $lock = flock($reminders_fp, 2);
    // continue when lock is obtained
    if ($lock) {
      // write today's date
      fwrite($reminders_fp, "$today\n");
      // prepare new reminder entry
      $new_reminder = "_url=".$url."&_exp=".$expirydate.
                             "&_own=".$owner."&_mes=".$message;
      // insert new reminder entry
      fwrite($reminders_fp, ("$new_reminder\n"));
      $email_needed = 1;
    }
    // unlock and close reminders file - "3" is for unlock
    $lock = flock($reminders_fp, 3);
    fclose($reminders_fp);

  } else {  // expiryrems_sent_today.txt already exists
  
    // assume that an email won't be necessary
    $email_needed = 0;
  
    // open and lock reminders file
    $reminders_fp = fopen($reminders_file, "r+");
    $lock = flock($reminders_fp, 2);
    // continue when lock is obtained
    if ($lock) {
    
      // read $date_of_file\n and convert to yymmdd format
      $date_of_file = fgets($reminders_fp, 256);
      $filedate_day = substr($date_of_file,-9,2);
      $filedate_month = substr($date_of_file,-6,2);
      $filedate_year = substr($date_of_file,-3,2);
      $filedate_yymmdd = $filedate_year.$filedate_month.
                                                 $filedate_day;
                                                 
      // if (filedate is older than today's date) then
      if ( $filedate_yymmdd < $today_yymmdd ) {
      
        // rename reminders file to yesterday's reminders file
        rename($reminders_file, "expiryrems_sent_yesterday.txt");
        
        // create, open and lock a fresh reminders file
        // ***Note - another process could potentially...
        // ...intervene here, but as this isn't a...
        // ...safety-critical system I'm ignoring it!***
        touch ($reminders_file);
        $reminders_fp = fopen($reminders_file, "r+");
        $lock = flock($reminders_fp, 2);
        if ($lock) {
        
          // write today's date plus new reminder entry
          fwrite($reminders_fp, "$today\n");
          $new_reminder = "_url=".$url."&_exp=".$expirydate
                            ."&_own=".$owner."&_mes=".$message;
          fwrite($reminders_fp, "$new_reminder\n");

          // make a note to send an email
          $email_needed = 1;
          
          // (unlocking & closing reminders file happens later)
					
        }
				
      } else {  // ($date_of_file = today's date)
      
        // start accumulating file contents
        $file_contents_so_far = $today."\n";
        
        // parse first entry to get $_url, $_exp, $_own, $_mes
        $next_entry = fgets($reminders_fp, 256);
        parse_str($next_entry);
        
        // if (first $_url is >= $url) then don't search
        if ($_url >= $url) {
          $need_next_entry = 0;
        } else { // else search is needed
          $need_next_entry = 1;
          // accumulate file contents
          $file_contents_so_far = $file_contents_so_far.
                                                $next_entry;
        } // endif
        
        // while ($need_next_entry) search for url in file
        while ($need_next_entry == 1) {
        
          // get and parse next entry
          $next_entry = fgets($reminders_fp, 256);
          
          // if (next entry was blank) then stop searching
          if (strlen($next_entry) < 2) {
            $need_next_entry = 0;
						
          } else {
            parse_str($next_entry);  // parse to get $_url
            if ($_url >= $url) {  // found or searched too far
              $need_next_entry = 0;
            } else {  // accumulate and keep searching
              $file_contents_so_far = $file_contents_so_far
                                               .$next_entry;
            } // endif
						
          } // endif
					
        } // endwhile
        
        // if ($url not found in file) then insert into file
        if ($_url != $url) {
				
          // store contents of file after point of insertion
          $file_contents_after = $next_entry.
                fread($reminders_fp,filesize($reminders_file));
          // rewind the file
          rewind($reminders_fp);
          // write accumulated entries up to point of insertion
          fwrite($reminders_fp, "$file_contents_so_far");
          // prepare new reminder entry
          $new_reminder = "_url=".$url."&_exp=".$expirydate.
                             "&_own=".$owner."&_mes=".$message;
          // insert new reminder entry
          fwrite($reminders_fp, "$new_reminder\n");
          // write remaining contents
          fwrite($reminders_fp, "$file_contents_after"); 
          
          // make a note to send an email
          $email_needed = 1;
          
        } // endif ($url not found in file)
        
      } // endif (filedate is older than today's date)
      
    }  // unlock and close reminders file
    $lock = flock($reminders_fp, 3);
    fclose($reminders_fp);
    
  } // endif (reminders file doesn't exist)
  
  // email url and message to owner if necessary
  if ($email_needed == 1) {
  $mail_message1 = "This is to remind you that this web page...
  \n  $url
  \n...expired on $expirydate. Here is the reminder message (if any):
  \n*** $message ***
  \nYou will receive a reminder each day that this page is hit. 
  Please update the page as necessary, and remember to specify a new 
  expiry date.

  Please note that it is possible for email reminder messages to be 
  sent for several days after the page has been updated - this will 
  happen if cached old versions of the page are viewed."; 
  mail($owner, "Expiry Checker: $url", $mail_message1);
  }

  // email webmaster if page has expired over a month ago, but limit
  // these emails to one a month max. If the expiry date is the nth
  // day of a certain month, then an email will be sent on the nth
  // day of every subsequent month (assuming page hits on those days).
  if ($email_needed == 1) {
    if ((($today_month - $expiry_month) > 0) ||
	                         (($today_year - $expiry_year) > 0)) {
      if ($expiry_day == $today_day) {
        $mail_message2 = "According to the Expiry Checker,
          this web page...
          \n$url
          \n...has exceeded its expiry date by at least a month. The 
          expiry date is: $expirydate. The email address of the owner 
          is: $owner.";
        // keep the subject on one line - a line break inside it
        // would corrupt the email headers
        mail("webmaster@wherever",
             "Expiry Checker - outdated page alert - $url",
             $mail_message2);
      }
    }
  }
  
} // endif (web page content has expired)
  
} // endif (url is within the SBU domain)

?>
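
The conversion of dd/mm/yy dates to yymmdd format can also be written more compactly. The following is a sketch only and is not part of version 1.6: to_yymmdd() is a hypothetical helper name, and treating anything other than three groups of one or two digits as "no date" is an assumption made for this example. It has the same end-of-century limitation as the script itself.

```php
<?php
// Hypothetical helper (not part of expirychecker.php3): convert a
// dd/mm/yy date, with or without leading zeros, to a yymmdd string
// that can be compared as text. Returns null for anything else,
// including the literal placeholder "dd/mm/yy".
function to_yymmdd($date) {
    $parts = explode("/", trim($date));
    if (count($parts) != 3) {
        return null;
    }
    $padded = array();
    foreach ($parts as $part) {
        // assumption: each part must be one or two decimal digits
        if (!ctype_digit($part) || strlen($part) > 2) {
            return null;
        }
        // add a leading zero to single digits
        $padded[] = str_pad($part, 2, "0", STR_PAD_LEFT);
    }
    // $padded holds (dd, mm, yy); join in reverse order to get yymmdd
    return $padded[2] . $padded[1] . $padded[0];
}

$expiry = to_yymmdd("6/6/03");
$today  = to_yymmdd("03/03/03");
// same test as the script: expired if expiry date <= today's date
if ($expiry !== null && strcmp($expiry, $today) <= 0) {
    echo "expired\n";
} else {
    echo "not expired\n";
}
```

Returning null for an unrecognised date makes the "no expiry date set" case explicit, instead of relying on how a non-numeric string happens to compare with today's date.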
 
 
 
