Let Them Eat Cache

Using a web server to dynamically serve content from a single geographical location is neither fast nor stable, to put it mildly. Even if the content is cached on the web server, the user is bound by the physics of the speed of light and cannot retrieve the resource any faster than that. The small DigitalOcean "Droplet" that this blog is hosted on is located in New York City; if the blog is being read from Hawaii or India, the page will take significantly longer to load. On top of that, if the web server becomes unreachable for any reason, the content becomes inaccessible to everyone until it is reachable again.

These problems can be solved by having a surrogate cache the HTTP responses. Ideally that cache is spread over multiple geographic locations. Of course, as the saying goes, now we have two problems: how do we purge the cache when content is updated?

When I started upgrading sites from Drupal 7 to 8, I noticed a concept of "cache tags" that came up frequently in the documentation. Instead of having to recall, by URL, every resource that needs to be purged from a surrogate cache, the resources can be tagged (typically with a custom header) and later purged by those same tags.

For example, an article like this one might have a response header like:

Cache-Tag: node:49,user:1,media:18

Tagging allows responses to be cached much longer than they otherwise could be, since a resource can be purged whenever any of its dependencies changes. When user:1 changes their display name, all of the articles they have written can be purged from the cache by that tag. I no longer have to guess at the time to live (TTL) of a response. I can effectively cache the resource forever and have the confidence that it will be purged when necessary.
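To make the idea concrete, here is a toy in-memory sketch of a tag-indexed cache. The class and its names are mine for illustration, not any real surrogate's API:

```javascript
// Toy surrogate cache that indexes stored URLs by their cache tags,
// so purging one tag evicts every response that carried it.
class TagCache {
    constructor() {
        this.byUrl = new Map(); // url -> cached body
        this.byTag = new Map(); // tag -> Set of urls carrying that tag
    }

    put(url, body, tags) {
        this.byUrl.set(url, body);
        for (const tag of tags) {
            if (!this.byTag.has(tag)) this.byTag.set(tag, new Set());
            this.byTag.get(tag).add(url);
        }
    }

    purgeTag(tag) {
        // Evict every URL that was tagged with this dependency.
        for (const url of this.byTag.get(tag) ?? []) this.byUrl.delete(url);
        this.byTag.delete(tag);
    }
}

const cache = new TagCache();
cache.put('/article/49', '<html>A</html>', ['node:49', 'user:1']);
cache.put('/article/50', '<html>B</html>', ['node:50', 'user:1']);

// The author's display name changed: purge everything tagged user:1.
cache.purgeTag('user:1');
console.log(cache.byUrl.size); // 0 — both articles evicted
```

Both articles can sit in the cache with an effectively infinite TTL, because the user:1 tag, not a timer, decides when they leave.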

While cache tagging can be applied to any HTTP response, Drupal supports it out of the box. Its extension system expects to be provided with appropriate cache tags for each dependency, which are bubbled up to the response. The Purge module provides a mechanism to expose these tags as custom headers and issue purge requests to surrogate caches.

Unfortunately, most Content Delivery Networks (CDNs) charge a premium for purging by tag. Cloudflare, for instance, only makes this feature available on its Enterprise tier.

To get around this, I started running Varnish HTTP Cache on my web server. The Drupal community has published a guide for how to configure Varnish for cache tag purging. Since the "Droplet" that I pay for has very little memory, I configured Varnish to store the cached responses on the server's SSD. I recently realized that this setup has little benefit over Drupal's built-in page caching system. For starters, the cache isn't geographically any closer to the user than the origin server. On top of that, since both Drupal and Varnish are configured to use the SSD for cache storage, there isn't any performance benefit to using Varnish over Drupal. Even if there is a slight benefit, it doesn't seem worth the additional overhead.

I came to the conclusion that I needed to move the cache out of Varnish and into a CDN. I took a look at alternatives to Cloudflare like Fastly, but at a $50/month minimum, they are way out of budget for this tiny blog. Cloudflare remained the most cost-effective option, but the lack of cache tag purging is rather annoying. Drupal does provide a workaround for providers like this with the URLs queuer module, which maintains a database of tags and URLs. When a tag is purged by Drupal, Purge looks up all of the tagged URLs in the database. Then a module like Cloudflare can issue a purge request to the Cloudflare Purge API.

On the surface, this seems like a great solution; however, I quickly ran into some problems with it. The first is that URLs queuer isn't aware of what is actually being cached. For instance, it may skip the 4xx-response cache tag while I have Cloudflare configured to cache 404s, leading to a mismatch where Purge doesn't purge resources that are actually in the cache. Another issue is that this solution is unique to Drupal. If I were to use the same strategy with another application, like WordPress, I would have to write a plugin to do so.

Then I had an idea: why not store the mapping of tags to URLs in Cloudflare? I could do this with either their KV or Durable Objects products. Since Durable Objects are still in beta and the pricing after beta ends isn't clear, I opted to use KV.

I thought this would be a straightforward design: I would use a tag for the key and an array of URLs for the value. Unfortunately, what I didn't realize about KV is that it is not transactionally safe:

Workers KV isn’t ideal for situations where you need support for atomic operations or where values must be read and written in a single transaction.

This is a problem because I needed to read an array of URLs and then add a URL to the list. Doing so isn't safe because it can take up to 60 seconds for changes to propagate across their network, so two concurrent writers can each read the same stale list and clobber each other's updates.
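The failure mode is the classic lost update. A minimal simulation (a plain Map standing in for KV, with both readers racing ahead of either write):

```javascript
// Simulate the tag -> [urls] design on an eventually consistent store.
// A plain Map stands in for KV; the bug is the read-modify-write pattern.
const kv = new Map();
kv.set('node:49', ['/article-49']);

// Two editors save at roughly the same time. Because propagation is slow,
// both read the list BEFORE either write has landed.
const readA = kv.get('node:49');
const readB = kv.get('node:49');

kv.set('node:49', [...readA, '/author/1']);
kv.set('node:49', [...readB, '/feed.xml']); // clobbers the first append

console.log(kv.get('node:49')); // ['/article-49', '/feed.xml'] — '/author/1' is lost
```

A URL silently dropped from the list means a cached page that never gets purged, which defeats the whole point of tagging.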

At this point I thought about ditching KV and hoping for the best with Durable Objects, which offer transactional safety, but that seemed like a huge unknown, and during the beta period there are no guarantees of persistence (which could lead to the cache not being purged correctly).

Then I thought: perhaps the problem isn't the design of KV, but how I'm using it. If I moved to a single value per key, I wouldn't need to read and write in the same transaction. I got this idea from the Set object in JavaScript, which behaves like a Map whose keys and values are the same. What I ended up with was this code for a Cloudflare Worker:

const cache = caches.default;
const cachePut = cache.put(request, response.clone());
const cacheKey = base64url.encode(request.url);

// The header may be absent entirely, so fall back to an empty string.
const tagHeader = response.headers.get('X-Cache-Tag') ?? '';

const tags = tagHeader.split(',').reduce((acc, tag) => {
    const trimmedTag = tag.trim();

    // Ignore empty tags.
    if (trimmedTag === '') {
        return acc;
    }

    // One KV key per (tag, URL) pair, so no read-modify-write is needed.
    acc.push(CACHE_TAG.put([trimmedTag, cacheKey].join('|'), request.url));

    return acc;
}, []);

event.waitUntil(Promise.all([
  cachePut,
  ...tags,
]));
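The base64url helper above comes from a library; a minimal Node-flavored equivalent built on Buffer (my assumption for illustration — a Worker would need its own implementation) shows why the encoding matters: the base64url alphabet contains no |, so the tag|key delimiter stays unambiguous even when the URL itself contains one:

```javascript
// Minimal base64url helpers (Node's Buffer supports the 'base64url'
// encoding natively; this is a sketch, not the worker's actual library).
const base64url = {
    encode: (s) => Buffer.from(s, 'utf8').toString('base64url'),
    decode: (s) => Buffer.from(s, 'base64url').toString('utf8'),
};

const url = 'https://example.com/article/49?view=full|print';
const cacheKey = base64url.encode(url);

// The encoded key is safe to join with '|' and round-trips losslessly.
console.log(base64url.decode(cacheKey) === url); // true
```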

This design also gained the benefit of being able to retrieve the URLs that need to be purged by listing the keys with the tag's prefix:

const { keys, list_complete, cursor } = await CACHE_TAG.list({
    prefix: `${tag}|`,
});

const urls = keys.map(({ name }) => {
    const [, encodedUrl] = name.split('|');

    return [name, base64url.decode(encodedUrl)];
});
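The list_complete and cursor fields matter once a tag maps to more keys than a single list() call returns. A hedged sketch of draining every page, exercised here against a fake two-page namespace (the real KV binding would be passed in its place):

```javascript
// Collect every key name for a prefix, following the cursor across pages.
// Assumes a KV-like namespace whose list() resolves to
// { keys, list_complete, cursor }, per the shape used above.
async function listAllKeys(namespace, prefix) {
    const names = [];
    let cursor;
    let done = false;

    while (!done) {
        const page = await namespace.list({ prefix, cursor });
        names.push(...page.keys.map(({ name }) => name));
        done = page.list_complete;
        cursor = page.cursor;
    }

    return names;
}

// Fake namespace returning two pages, just to exercise the loop.
const fakeNamespace = {
    pages: [
        { keys: [{ name: 'user:1|a' }, { name: 'user:1|b' }], list_complete: false, cursor: 'p2' },
        { keys: [{ name: 'user:1|c' }], list_complete: true, cursor: undefined },
    ],
    i: 0,
    async list() { return this.pages[this.i++]; },
};

listAllKeys(fakeNamespace, 'user:1|').then((names) => console.log(names));
```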

From there, I could issue a cache.delete() for each URL, but I figured it's possible that I could hit the 50-subrequest limit. I instead opted to use Cloudflare's Purge API, which allows purging up to 30 URLs per request. This gives me the ability to purge up to 1,500 URLs per purge request to my worker. Obviously this doesn't scale to large sites, but for a small blog like this one, it works well.
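The batching arithmetic can be sketched as follows; the chunk helper is mine, and each batch would become one subrequest to the purge endpoint (up to 50 subrequests × 30 URLs = 1,500 URLs per invocation):

```javascript
// Split a list of URLs into batches of at most `size`, one batch per
// purge subrequest. 30 matches the per-request URL limit described above.
function chunk(items, size) {
    const batches = [];
    for (let i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
    }
    return batches;
}

const urls = Array.from({ length: 65 }, (_, i) => `https://example.com/p/${i}`);
const batches = chunk(urls, 30);
console.log(batches.length); // 3 — two full batches of 30 plus one of 5
```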

Astute observers may have noticed in the code above that I am collecting tags from an X-Cache-Tag header rather than Cloudflare's Cache-Tag header. During development I discovered that the Cache-Tag header is stripped by Cloudflare before the response reaches a Cloudflare Worker. This is stated somewhat clearly in their documentation, but I completely missed it.

Solution

The Cloudflare Worker code I wrote is available on GitHub. I also wrote a small Drupal module to handle adding the X-Cache-Tag header as well as forwarding Drupal tag purge requests to a Cloudflare Worker in a format similar to what Cloudflare natively handles. 

Now that I have cache tag purging in place, I am seeing a lower Largest Contentful Paint (LCP), fewer requests to the origin server, and any changes I make in Drupal are reflected almost instantly.

I hope that Cloudflare makes cache tag purging available to everyone in the future, but until then, it is good to know that there is a free workaround for small sites like this one.

Alternatives

An alternative to doing all of this is to not have a dynamic web server in the first place. I could create a blog with a static site generator and deploy it using Cloudflare Pages, where the entire cache is purged anytime the site is updated. Honestly, that is a more straightforward solution, but it does come with trade-offs, such as non-technical users not being able to create content and not being able to edit content from a mobile device.