<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Caching</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style type="text/css">
<!--
-->
</style>
</head>
<body>
<p style="color:#FFF;font-weight:bold;background:#FFC000;padding:1em 1ex;text-shadow:0 0 3px #000">This article was last updated Tuesday, 8 February 2005. View a newer version at my site, <a style="color:inherit" href="http://hardanswers.net/dynamic-webpage-caching">hardanswers.net/dynamic-webpage-caching</a></p>
<h1>Caching</h1>
<p>Most browsers and many intermediate servers cache web content to speed up
its delivery and display. Having your content cached will usually make your
site appear faster and more responsive, and will lower your server’s bandwidth
requirements. Based on this, you would think that everyone would set up their
site to allow caching, but there’s a downside: dynamic content is
not usually suited to caching. Consider a login system. You probably don’t
want a browser to cache a page in its logged-in state; you would rather the
page be refreshed every time someone attempts to access it, so you can ensure
they’re still logged in. For this reason, <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> sends headers that disable caching
as soon as you start a session (with its default settings), and even without
sessions it sends none of the headers caches rely on unless you add them yourself.
Unfortunately, this means that even if your
site is suitable for caching, if you’ve used <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym>, the chances are it won’t be
cached: every time a user goes to a page, that page will be reloaded, with all the
extra time and overhead that requires, making your site seem slower than it
need be. Fortunately, it’s possible to use <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> to enable caching of your pages
in an intelligent manner. </p>
<p>Caching is controlled by the <acronym title="HyperText Transfer Protocol">HTTP</acronym> headers
sent with every <acronym title="HyperText Transfer Protocol">HTTP</acronym> response.
The basic logic is quite simple: if permitted to cache, a cache will store
the page for a specified time. This is controlled by the “Cache-Control” and
“Expires” headers. </p>
<p>In addition to this, when checking for new versions of the page, most browsers
will send an “If-Modified-Since” and/or “If-None-Match” header, if a “Last-Modified”
and/or “ETag” header was present in the <acronym title="HyperText Transfer Protocol">HTTP</acronym> headers
received from the web server. If these headers indicate that the page
hasn’t been modified, the server can return a “304 Not Modified” response with no body,
and the browser will continue to use and display its cached copy, thus saving
the server bandwidth and making the site appear more responsive. </p>
<p>The first step to getting your pages cached is to send the appropriate headers: </p>
<h2>ETag </h2>
<p>An “ETag” header contains a “strong” identifier – that is, an identifier that
is unique not only for a particular page or resource, but for the current state
of that particular page or resource. In other words, if the identifier has changed,
then the associated page or resource has also changed in some way. I do that
by taking an MD5 hash of the filename and its last modified date. This way,
if either the filename or last modified date is different, the ETag will also
be different. </p>
<p>
<textarea cols="82" rows="8" readonly="readonly" title="PHP code">
// $file contains the file name of the page being displayed (the actual
// content, not any templates you may be using). We take the last modified
// date of this file.
$mtime = filemtime($file);
// send a unique 'strong' identifier. This is always the same for this
// particular file while the file itself remains the same.
header('ETag: "'.md5($mtime.$file).'"');</textarea>
</p>
<h2>Last-Modified </h2>
<p>The “Last-Modified” header simply contains the time and date the resource
in question was last modified. </p>
<p>
<textarea cols="82" rows="10" readonly="readonly" title="PHP code">
// $file contains the file name of the page being displayed (the actual
// content, not any templates you may be using). We take the last modified
// date of this file.
$mtime = filemtime($file);
// Create a HTTP conformant date, example 'Mon, 22 Dec 2003 14:16:16 GMT'
$gmt_mtime = gmdate('D, d M Y H:i:s', $mtime).' GMT';
// output last modified header using the last modified date of the file.
header('Last-Modified: '.$gmt_mtime);
</textarea>
</p>
<h2>Cache-Control </h2>
<p>The “Cache-Control” header instructs modern caches on how they should behave,
although it is worth noting that older caches may not obey this field. “Cache-Control”
can take a variety of values, such as “private” and “no-cache”, but the one
we are interested in is “public”. A “public” field in a Cache-Control header
indicates that the resource may be cached by any cache, which is what we want.
“Private” indicates that the response should only be cached by non-shared
caches (such as your local browser). “No-cache”, despite its name, does not
forbid storing the response: it means a cache must check with the server before
reusing its stored copy. The value that forbids storing a copy anywhere is “no-store”. </p>
<p>
<textarea cols="82" rows="2" readonly="readonly" title="PHP code">
// tell all caches that this resource is publicly cacheable.
header('Cache-Control: public');
</textarea>
</p>
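<p>As an aside, HTTP/1.1 caches also understand a “max-age” directive, which gives
the lifetime of the response in seconds and takes precedence over “Expires” when both
are present. The sketch below builds such a header value. The function name
cacheControlValue() and its parameters are made up for this illustration; they are not
part of PHP or of the code in this article. </p>

```php
<?php
// Hypothetical helper: builds a Cache-Control header value.
// $visibility is 'public' or 'private'; $maxAge is the lifetime in seconds.
function cacheControlValue(string $visibility, int $maxAge): string
{
    if ($visibility !== 'public' && $visibility !== 'private') {
        throw new InvalidArgumentException('visibility must be public or private');
    }
    // A negative lifetime makes no sense, so clamp it to zero.
    return $visibility . ', max-age=' . max(0, $maxAge);
}

// Usage, in a page and before any output (30 days, expressed in seconds):
// header('Cache-Control: ' . cacheControlValue('public', 30 * 24 * 60 * 60));
?>
```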
<h2>Expires </h2>
<p>The Expires header gives the date and time after which a response is considered
stale, that is, after which a cached copy of a page should no longer be considered
valid. In other words, the Expires header indicates how long caches should store
a cached copy of a page. Here we indicate that pages can be cached for one month
from the current date, by specifying their expiry as a date one month in the
future. </p>
<p>
<textarea cols="82" rows="2" readonly="readonly" title="PHP code">
// this resource expires one month from now.
header('Expires: '.gmdate('D, d M Y H:i:s', strtotime('+1 month')).' GMT');
</textarea>
</p>
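<p>Both “Last-Modified” and “Expires” use the same date format, so it can help to
build it in one place. The following is only a sketch: httpDate() and expiresValue() are
hypothetical names, and the lifetime is given in seconds rather than as
“+1 month”, on the assumption that a fixed number of seconds is easier to reason
about and to test than calendar arithmetic. </p>

```php
<?php
// Hypothetical helper: formats a Unix timestamp as an HTTP date in GMT,
// for example 'Mon, 22 Dec 2003 14:16:16 GMT'.
function httpDate(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Hypothetical helper: the Expires value for a response generated at $now
// (a Unix timestamp) that may be cached for $lifetime seconds.
function expiresValue(int $lifetime, int $now): string
{
    return httpDate($now + $lifetime);
}

// Usage, in a page and before any output (30 days from now):
// header('Expires: ' . expiresValue(30 * 24 * 60 * 60, time()));
?>
```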
<p>The next step is to check if the page has been modified when a request is
made to the server, and if not, return a “304 Not Modified” status and stop
any further processing. This is simply done with two <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> if statements: </p>
<p>
<textarea name="textarea" cols="82" rows="24" readonly="readonly" title="PHP code">
// check if the last modified date sent by the client is the same as
// the last modified date of the requested file. If so, return 304 header
// and exit.
if(isset($_SERVER['HTTP_IF_MODIFIED_SINCE']))
{
if ($_SERVER['HTTP_IF_MODIFIED_SINCE'] == $gmt_mtime)
{
header('HTTP/1.1 304 Not Modified');
exit();
}
}
// check if the Etag sent by the client is the same as the Etag of the
// requested file. If so, return 304 header and exit.
if (isset($_SERVER['HTTP_IF_NONE_MATCH']))
{
if (str_replace('"', '', stripslashes($_SERVER['HTTP_IF_NONE_MATCH'])) == md5($mtime.$file))
{
header("HTTP/1.1 304 Not Modified");
// abort processing and exit
exit();
}
}</textarea>
</p>
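<p>The two checks above compare the raw header strings. A somewhat more tolerant version
can be sketched as a single function that takes the header values as arguments, which
also makes it easy to test outside a web server. The name isNotModified() and its
parameters are assumptions for this example. It accepts a comma-separated list of
ETags, ignores the “W/” prefix of weak ETags, and compares dates as timestamps
rather than as strings. It assumes the ETags themselves contain no commas, which holds
for the MD5 hashes used here. </p>

```php
<?php
// Hypothetical helper: decides whether a 304 response is appropriate.
// $ifNoneMatch and $ifModifiedSince are the raw request header values, or
// null when the client did not send them. $etag is the unquoted ETag of the
// current page and $mtime is its last modified time as a Unix timestamp.
function isNotModified(?string $ifNoneMatch, ?string $ifModifiedSince, string $etag, int $mtime): bool
{
    if ($ifNoneMatch !== null) {
        // When If-None-Match is present it decides the outcome on its own,
        // and If-Modified-Since is ignored.
        if (trim($ifNoneMatch) === '*') {
            return true;
        }
        foreach (explode(',', $ifNoneMatch) as $candidate) {
            $candidate = trim($candidate);
            // Drop the weak-validator prefix, then the surrounding quotes.
            if (strncmp($candidate, 'W/', 2) === 0) {
                $candidate = substr($candidate, 2);
            }
            if (trim($candidate, '"') === $etag) {
                return true;
            }
        }
        return false;
    }
    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        // strtotime() returns false for a date it cannot parse.
        if ($since === false) {
            return false;
        }
        // Not modified if the client's copy is at least as new as the file.
        return $since >= $mtime;
    }
    return false;
}
?>
```

<p>In a page this might be called with $_SERVER['HTTP_IF_NONE_MATCH'] ?? null and
$_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null, sending the “304 Not Modified” header
and exiting when it returns true. </p>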
<p>There’s one further caveat: headers must be sent before any other output.
This generally means that this code must go at the very top of your
<acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> code, before anything else.
One way around this is to buffer your page on the
server before outputting it, by calling ob_start(). I also use ob_start()
to provide gzip compression, which further increases the responsiveness of
text pages by transmitting them compressed, and decreases the server bandwidth
used even more. </p>
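<p>To illustrate what buffering makes possible, here is a small sketch that captures a
page’s output in a string instead of sending it straight to the client. The helpers
captureOutput() and contentETag() are hypothetical names. Hashing the finished page, as
contentETag() does, is a variation that this article does not use (it hashes the file
name and modification time instead); it is shown only as one thing you can do once the
output is held in a buffer. </p>

```php
<?php
// Hypothetical helper: runs $render with output buffering switched on and
// returns whatever it printed, instead of sending it to the client.
function captureOutput(callable $render): string
{
    ob_start();
    try {
        $render();
    } catch (Throwable $e) {
        // Discard the partial output so the buffer is not left open.
        ob_end_clean();
        throw $e;
    }
    return (string) ob_get_clean();
}

// Hypothetical helper: a quoted ETag value derived from the page content.
function contentETag(string $body): string
{
    return '"' . md5($body) . '"';
}

// Usage: headers can still be sent here, because nothing has been output yet.
// $body = captureOutput(function () { echo 'page content'; });
// header('ETag: ' . contentETag($body));
// echo $body;
?>
```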
<p>Putting all this together gives us: </p>
<p>
<textarea cols="82" rows="51" readonly="readonly" title="PHP code">
// $file contains the file name of the page being displayed (the actual
// content, not any templates you may be using). We take the last modified
// date of this file.
$mtime = filemtime($file);
// Create a HTTP conformant date, example 'Mon, 22 Dec 2003 14:16:16 GMT'
$gmt_mtime = gmdate('D, d M Y H:i:s', $mtime).' GMT';
// send a unique 'strong' identifier. This is always the same for this
// particular file while the file itself remains the same.
header('ETag: "'.md5($mtime.$file).'"');
// output last modified header using the last modified date of the file.
header('Last-Modified: '.$gmt_mtime);
// tell all caches that this resource is publicly cacheable.
header('Cache-Control: public');
// this resource expires one month from now.
header('Expires: '.gmdate('D, d M Y H:i:s', strtotime('+1 month')).' GMT');
// The caching headers above are sent before the checks below, so that a
// 304 response carries them as well.
// check if the last modified date sent by the client is the same as
// the last modified date of the requested file. If so, return 304 header
// and exit.
if(isset($_SERVER['HTTP_IF_MODIFIED_SINCE']))
{
if ($_SERVER['HTTP_IF_MODIFIED_SINCE'] == $gmt_mtime)
{
header('HTTP/1.1 304 Not Modified');
exit();
}
}
// check if the Etag sent by the client is the same as the Etag of the
// requested file. If so, return 304 header and exit.
if (isset($_SERVER['HTTP_IF_NONE_MATCH']))
{
if (str_replace('"', '', stripslashes($_SERVER['HTTP_IF_NONE_MATCH'])) == md5($mtime.$file))
{
header("HTTP/1.1 304 Not Modified");
// abort processing and exit
exit();
}
}
// set the content-type
header('Content-Type: text/html; charset=utf-8');
// start output.
// Note that no output can precede the headers unless you call ob_start().
// You don't have to use gzip, but it greatly saves on bandwidth (for text)
// at the cost of a little more processing.
ob_start ("ob_gzhandler");</textarea>
</p>
<p>For more information on:</p>
<ul>
<li> <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> headers,
see <a href="http://www.php.net/manual/en/function.header.php" title="External Site | www.php.net">http://www.php.net/manual/en/function.header.php</a></li>
<li><acronym title="HyperText Transfer Protocol">HTTP</acronym> headers, see <a href="http://www.freesoft.org/CIE/RFC/2068/155.htm" title="External Site | www.freesoft.org">http://www.freesoft.org/CIE/RFC/2068/155.htm</a></li>
<li>ob_start(), see <a href="http://www.php.net/manual/en/function.ob-start.php" title="External Site | www.php.net">http://www.php.net/manual/en/function.ob-start.php</a></li>
<li>viewing the <acronym title="HyperText Transfer Protocol">HTTP</acronym> headers sent by a server, see <a href="http://web-sniffer.net/" title="External Site | web-sniffer.net">http://web-sniffer.net/</a></li>
</ul>
</body>
</html>
Last updated Tuesday, 20 November 2012