Today’s lesson: Implement a caching system to improve performance on slow, data-heavy pages — but only if they don’t change too often.
I felt that some pages were slow on one of my major web sites. I had already done as much of the front-end JS and CSS optimization recommended by YSlow and DynaTrace as I cared to. Besides, the slowest component was the primary page request itself, not the images and scripts.
I thought some of the slowness was genuine network latency at my good-but-cheap hosting company, but when I saved one of these slow pages as pure HTML, then served it to myself raw, it was very fast.
It had to be the processing time on my complex pages.
Identifying the slowness
I knew the data on some of these pages came from many tables joined by complex joins. The pages also showed many modules, each of which required its own database calls. Each of the many SQL queries was fast on its own, but in aggregate they were slow.
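One way to check a hunch like this is to time each module as it renders. The following is only a sketch, not something from my actual site: timed() is a hypothetical helper, the module names are made up, and usleep() stands in for the real database calls.

```php
<?php
// Runs $fn, adds its elapsed milliseconds to $timings[$label], returns its result.
function timed(array &$timings, string $label, callable $fn)
{
    $start = hrtime(true);                     // monotonic clock, in nanoseconds
    $result = $fn();
    $timings[$label] = ($timings[$label] ?? 0.0) + (hrtime(true) - $start) / 1e6;
    return $result;
}

$timings = [];

// Hypothetical modules; usleep() is a placeholder for their queries.
timed($timings, 'sidebar', function () { usleep(2000); return '<aside>...</aside>'; });
timed($timings, 'main', function () { usleep(5000); return '<main>...</main>'; });

arsort($timings);                              // slowest module first
foreach ($timings as $label => $ms) {
    printf("%-8s %6.1f ms\n", $label, $ms);
}
printf("%-8s %6.1f ms\n", 'total', array_sum($timings));
```

If no single module stands out but the total is large, that matches the "each one fast, slow in aggregate" picture.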
Rewriting the pages to be faster was going to be a big headache because it would mean rewriting everything on the site. Unfortunately, my refactor to the model-view-controller (MVC) pattern hadn’t happened yet; had it been done, rewriting the “M” component would have been discrete and relatively painless.
Luckily, I realized that the pages themselves rarely change (daily or less often). Since they are relatively static, I could write the flat HTML to a cache location once and, as long as that copy wasn’t too old, serve it instead of rebuilding the page. Waaay faster.
The squirrel’s dilemma: what to cache?
I wondered what I should cache, specifically. Many of my sidebar modules, for example, don’t change from page to page. I could cache the HTML for a module on one page and serve it up statically on another. I liked this idea of module caching because I thought it might be less tricky to implement in a single afternoon.
So I began, giving myself half a day to come up with a proof of concept.
My first thought was to write the cache as snippets in flat files. But I soon decided that, at least on my setup, a database call would be faster than going through the file system. So I made a three-column cache table: key, value, expiration. (My data object system made this super-easy.)
The “value” field could be a serialized anything — probably a blob of HTML, but the database wouldn’t care. If I decided to cache a module, a single variable, or a whole page – anything with a single reference – it would be a matter of 4 lines of code: Check the key; if it’s not expired, unserialize and return it; otherwise, do what the code used to do, and then store that in the cache table, serialized. All I needed were these functions:
getCache($key) // unserializes and returns the cached $value if it has not expired ($expiration > now)
setCache($key, $value, $expirationMinutes = 10) // serializes $value and stores it with $key
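Here is a minimal sketch of what those two functions could look like. It is not my site's actual code: it assumes PDO with an in-memory SQLite database instead of my data object system, passes the connection in as an extra $db argument, and uses made-up column names (cache_key, cache_value, expires_at). buildSidebar() is a hypothetical stand-in for slow module code.

```php
<?php
// Assumption: in-memory SQLite stands in for the real database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The three-column table: key, value, expiration (stored as a Unix timestamp).
$db->exec('CREATE TABLE cache (
    cache_key   TEXT PRIMARY KEY,
    cache_value TEXT NOT NULL,
    expires_at  INTEGER NOT NULL
)');

// Returns the unserialized value, or null if the key is missing or expired.
function getCache(PDO $db, string $key)
{
    $stmt = $db->prepare(
        'SELECT cache_value FROM cache WHERE cache_key = ? AND expires_at > ?'
    );
    $stmt->execute([$key, time()]);
    $row = $stmt->fetchColumn();
    return $row === false ? null : unserialize($row);
}

// Serializes $value and stores it under $key, replacing any older entry.
function setCache(PDO $db, string $key, $value, int $expirationMinutes = 10): void
{
    $stmt = $db->prepare(
        'REPLACE INTO cache (cache_key, cache_value, expires_at) VALUES (?, ?, ?)'
    );
    $stmt->execute([$key, serialize($value), time() + $expirationMinutes * 60]);
}

// Hypothetical slow module.
function buildSidebar(): string
{
    return '<ul><li>Recent posts</li></ul>';
}

// The "four lines" pattern: check, return on a hit, otherwise build and store.
$html = getCache($db, 'sidebar');
if ($html === null) {
    $html = buildSidebar();
    setCache($db, 'sidebar', $html);
}
echo $html;
```

One caveat of this sketch: a cached null looks the same as a miss, so it only suits values that are never null, which is fine for blobs of HTML.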
In the pudding
The proof of concept worked very well: it sped up the page noticeably and even made it “feel” faster to me.
In the end, I decided to cache entire pages – at least the popular ones – rather than individual modules, because that gave the fastest results. Serving a single page out of cache takes one lookup; serving a dozen individual modules takes a dozen.
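Whole-page caching in PHP is commonly done with output buffering: capture everything the page echoes, store it, and replay it on the next request. The sketch below works under stated assumptions: cachedPage() is a hypothetical helper, and a plain array stands in for the cache table so the example runs on its own.

```php
<?php
// Returns the page HTML, rendering it only on a miss or after expiry.
// $store is a plain array standing in for the cache table (an assumption).
function cachedPage(array &$store, string $key, callable $renderPage, int $minutes = 10): string
{
    $entry = $store[$key] ?? null;
    if ($entry !== null && $entry['expires_at'] > time()) {
        return $entry['html'];                 // hit: skip all the page's queries
    }

    ob_start();                                // capture everything the page echoes
    try {
        $renderPage();
        $html = ob_get_clean();
    } catch (Throwable $e) {
        ob_end_clean();                        // do not leak a half-built page
        throw $e;
    }

    $store[$key] = ['html' => $html, 'expires_at' => time() + $minutes * 60];
    return $html;
}

$store = [];
$renders = 0;

// Hypothetical page; the counter shows how often it really gets built.
$page = function () use (&$renders) {
    $renders++;
    echo '<h1>Popular page</h1>';
};

$first = cachedPage($store, '/popular', $page);
$second = cachedPage($store, '/popular', $page);   // served from the cache
echo $second;
```

The second call never runs the page code, which is where the time savings come from.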
I also realized I needed to give myself some cache-clearing tools. The admin section has a button that will clear the entire cache, and each page can be given a cache-clearing instruction that applies to anything on the page.
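Clearing tools like these can be sketched as a single function. This is an illustration, not my exact mechanism: clearCache() is a hypothetical name, the table layout is assumed, and the per-page clearing assumes every key on a page starts with that page's path (for example "/products:sidebar").

```php
<?php
// Assumption: in-memory SQLite with a three-column cache table.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE cache (
    cache_key TEXT PRIMARY KEY, cache_value TEXT NOT NULL, expires_at INTEGER NOT NULL
)');

// With no prefix, empties the whole cache (the admin button).
// With a prefix, removes only keys starting with it. Returns rows removed.
function clearCache(PDO $db, ?string $prefix = null): int
{
    if ($prefix === null) {
        return $db->exec('DELETE FROM cache');
    }
    // Escape LIKE wildcards so a literal % or _ in the prefix matches only itself.
    $pattern = str_replace(['!', '%', '_'], ['!!', '!%', '!_'], $prefix) . '%';
    $stmt = $db->prepare("DELETE FROM cache WHERE cache_key LIKE ? ESCAPE '!'");
    $stmt->execute([$pattern]);
    return $stmt->rowCount();
}

// Seed a few hypothetical entries.
$insert = $db->prepare('INSERT INTO cache VALUES (?, ?, ?)');
foreach (['/products:page', '/products:sidebar', '/about:page'] as $key) {
    $insert->execute([$key, serialize('<p>cached</p>'), time() + 600]);
}

$removed = clearCache($db, '/products:');      // clear one page's entries
$remaining = (int) $db->query('SELECT COUNT(*) FROM cache')->fetchColumn();
clearCache($db);                               // clear everything
```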