However, there is one annoying side effect of everything being cached: Since that also includes the landing page, new posts could be invisible to recurring visitors for quite a while. In a bit more detail, here is what is going on at the HTTP level:
By default, my web server, lighttpd, delivers all static HTML pages without explicit caching headers, but it does include the modification time of the resource and an ETag (only the relevant headers are shown):
Date: Fri, 15 Mar 2013 10:03:43 GMT
ETag: "4531062"
Last-Modified: Thu, 14 Mar 2013 20:12:06 GMT
The ETag is good to have (browsers can use it to unambiguously revalidate cached content with the server, as I'll explain later), but the Last-Modified header, combined with no explicit statement about cacheability, triggers a heuristic described in the HTTP specification in most browsers. Basically, browsers calculate the difference between the time the resource was retrieved and the time it was last modified on the server, and treat the cached copy as fresh for 10% of that value without revalidating with the server.
This means that for a blog that is updated daily, returning visitors will see new posts within a few hours of their last visit, but for a blog that hasn't been updated for several weeks or months, ten percent of that time can be pretty significant.
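To make the heuristic concrete, here is a minimal Python sketch that applies the 10% rule to the headers shown above. The function name `heuristic_freshness` and its `percent` parameter are my own choices for illustration, and real browsers may cap or adjust the result, so treat this as an approximation rather than any browser's actual implementation.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_freshness(date_header: str, last_modified_header: str,
                        percent: int = 10) -> timedelta:
    """Estimate how long a response without explicit lifetime stays fresh.

    Hypothetical helper: takes the time between Last-Modified and Date
    and returns the given percentage of it.
    """
    retrieved = parsedate_to_datetime(date_header)
    modified = parsedate_to_datetime(last_modified_header)
    age_at_retrieval = retrieved - modified
    return age_at_retrieval * percent / 100


# Header values taken from the response shown above.
lifetime = heuristic_freshness(
    "Fri, 15 Mar 2013 10:03:43 GMT",
    "Thu, 14 Mar 2013 20:12:06 GMT",
)
print(lifetime)
```

For the page above, which was modified just under 14 hours before it was fetched, this works out to about 83 minutes. A page last modified three months ago would be reused for roughly nine days.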
A simple solution is to define the cache lifetime explicitly in the HTTP headers for some or all resources. lighttpd has the expire module (mod_expire), which does just that. Here is the relevant line in my lighttpd configuration:
expire.url = ( "/theme/" => "access plus 7 days", "" => "access plus 1 hours" )
The effect is that all resources in the subdirectory theme will have an Expires header 7 days in the future, and everything else will be valid for just an hour. This is a tradeoff between server and client resource usage on the one hand and immediate updates on the other: for me, an hour of delay is not a big deal, and users jumping back and forth between blog posts will be able to do so without any further HTTP requests. Here are the response headers of the main blog page:
Cache-Control: max-age=3600
Date: Fri, 15 Mar 2013 10:23:06 GMT
ETag: "4531062"
Expires: Fri, 15 Mar 2013 11:23:06 GMT
Last-Modified: Thu, 14 Mar 2013 20:12:06 GMT
As you can see, the max-age directive explicitly states a validity of 3600 seconds, and the Expires header also points to a time one hour in the future.
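If you want to check that the two headers really agree, a few lines of Python are enough. This is only a sketch: the helper names `max_age_seconds` and `expires_seconds` are made up for this example, and the Cache-Control parsing is deliberately simplistic (it ignores quoted values, for instance).

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value from a Cache-Control header, if present."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def expires_seconds(date_header: str, expires_header: str) -> int:
    """Return the lifetime implied by Expires, relative to the Date header."""
    delta = parsedate_to_datetime(expires_header) - parsedate_to_datetime(date_header)
    return int(delta.total_seconds())


# Header values taken from the response shown above.
headers = {
    "Cache-Control": "max-age=3600",
    "Date": "Fri, 15 Mar 2013 10:23:06 GMT",
    "Expires": "Fri, 15 Mar 2013 11:23:06 GMT",
}
from_max_age = max_age_seconds(headers["Cache-Control"])
from_expires = expires_seconds(headers["Date"], headers["Expires"])
print(from_max_age, from_expires)
```

Both values come out as 3600 seconds here. If they ever disagree, caches that understand Cache-Control are supposed to prefer max-age over Expires.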
Even when that time is reached, the whole resource doesn't have to be transferred again: browsers can just perform a conditional HTTP request using the ETag and Last-Modified values that they cache together with the resource itself, sending them back in the If-None-Match and If-Modified-Since request headers. If the content is still the same, the server will be able to deduce that from the headers and reply with a 304 Not Modified HTTP response. As long as your site does not get a great deal of traffic and does not reference many additional resources, cache revalidation is not too expensive.
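The decision the server makes during revalidation can be sketched roughly as follows. This is not lighttpd's code, just an illustration of the logic under simplifying assumptions: `revalidate` and `_opaque` are hypothetical names, header lookup is case-sensitive, malformed dates are not handled, and ETags containing commas would confuse the naive split.

```python
from email.utils import parsedate_to_datetime
from typing import Dict


def _opaque(tag: str) -> str:
    """Strip whitespace and a weak-validator prefix (W/) from an entity tag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def revalidate(request_headers: Dict[str, str], etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is ignored.
        candidates = [_opaque(tag) for tag in if_none_match.split(",")]
        if "*" in candidates or _opaque(etag) in candidates:
            return 304
        return 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Unchanged if the resource is no newer than the client's copy.
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200


# Current state of the resource, using the values from the headers above.
current_etag = '"4531062"'
current_modified = "Thu, 14 Mar 2013 20:12:06 GMT"

same = revalidate({"If-None-Match": '"4531062"'}, current_etag, current_modified)
changed = revalidate({"If-None-Match": '"0000000"'}, current_etag, current_modified)
print(same, changed)
```

The first request carries the matching ETag and gets a 304, so only headers travel over the wire. The second carries a stale ETag and gets the full 200 response.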
One thing that has also helped me tremendously in understanding HTTP caching was an answer on Stack Overflow that explains how to force the various browsers to revalidate a resource or to bypass the cache completely. For debugging, it's very useful to know that there is a big difference between a normal reload and pressing Ctrl + F5 in most browsers.