Let Them Eat Cache v2

By David Barratt, 27 September, 2024

A lot has changed since I wrote Let Them Eat Cache in 2021. Here are some highlights:

As I was writing this blog post, an even bigger change occurred: Cloudflare announced that in 2025 cache tag purging will be available (for free?) on all account tiers. I'm excited that this is happening, but I'm annoyed that I did all the work discussed below right before the announcement. C'est la vie. Even though this work is quickly becoming irrelevant, I'll document what I learned.

Bottom Line

If you're looking to perform cache tag purging for a production service, I would seriously consider waiting for Cloudflare's cache tag purging or using Fastly. If you cannot do either of those things, then I may have a solution for you.

Problems to Solve

I originally wrote Drupal Edge Worker to deal with the lack of Cloudflare Cache Rules. More specifically, when you're logged in, Drupal sets a session cookie (SSESS), and there wasn't a way to instruct Cloudflare to bypass the cache when that cookie was present. That meant that a logged-in user would randomly get the logged-out page if it happened to be in the cache. The only work-around was to implement the bypass logic within a Worker and rely on the Cache API.

At the time, I didn't fully understand that the Cache API does not work with Tiered Caching: it does not store cached content outside of the data center that accessed it. Since Cloudflare has over 330 data centers, this could result in 330+ cached copies of each resource. The unfortunate downside of this architecture was that there was no reliable way to know whether a resource was cached in another data center or whether the indexed tags were accurate. The result was a huge number of write operations to a Durable Object on every cache MISS.
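The bypass half of that Worker is simple enough to sketch. Here's roughly what the idea looks like, a minimal illustration rather than the actual Drupal Edge Worker code (the cookie pattern and the overall structure are assumptions on my part):

```ts
// Types like ExecutionContext come from @cloudflare/workers-types.
export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    const cookies = request.headers.get('Cookie') ?? '';

    // Logged-in users (Drupal SESS/SSESS session cookie) bypass the cache entirely.
    if (/(^|;\s*)S?SESS[^=]*=/.test(cookies)) {
      return fetch(request);
    }

    // Anonymous traffic: check the Cache API, which is local to this data center.
    const cache = caches.default;
    const cached = await cache.match(request);
    if (cached) {
      return cached;
    }

    // Cache MISS: fetch from the origin and store a copy in this data center only.
    const response = await fetch(request);
    if (response.ok) {
      ctx.waitUntil(cache.put(request, response.clone()));
    }
    return response;
  },
};
```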

Rewrite

I normally don't advocate for rewriting applications, but since this application is small, has nearly zero users, and the changes were substantial, it seemed like a good exception. In hindsight, this was the right call. Not only was I making a large number of changes, but Workers themselves have changed quite a bit, including major changes to Wrangler.

I decided to use D1 as the storage mechanism for the tag index. I also decided to use Queues for handling the capture and purge requests without blocking the response. Most of this worked fairly well, and I'm proud of what I wrote.
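As an example of the capture side, the front Worker can hand the URL-to-tag mapping off to a Queue once the response headers come back, so nothing blocks the client. This is a simplified sketch; the binding name, message shape, and reliance on the Cache-Tag header are my assumptions, not necessarily what the real Worker does:

```ts
interface Env {
  CAPTURE_QUEUE: Queue; // hypothetical queue binding name
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const response = await fetch(request);
    const tags = response.headers.get('Cache-Tag');

    if (tags) {
      // Enqueue the URL -> tags mapping without delaying the response.
      ctx.waitUntil(
        env.CAPTURE_QUEUE.send({
          url: request.url,
          tags: tags.split(',').map((tag) => tag.trim()),
        })
      );
    }

    return response;
  },
};
```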

D1 and Placements

D1 has been great to work with, and it was a much better system for this purpose than Durable Object storage. One weird "gotcha" that I ran into with D1, though, is that:

Workers with a D1 binding will always be placed in a data center near the location of the D1 database they are bound to. 

In other words, if you have a Worker and you bind it to a D1 database, then it's going to run in a single data center rather than being globally distributed like you might expect. To get around this, you can split the Worker into two and use either Service Bindings or Queues to communicate between them. I chose the latter since the client Worker doesn't need any response beyond confirmation that the request has been persisted to a Queue.
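Concretely, that means only the consuming Worker carries the D1 binding: the front Worker stays globally distributed and just produces Queue messages, while a separate Worker consumes them and writes to the tag index. A rough sketch of that consumer (the binding name, table, and message shape are all placeholders of mine):

```ts
interface Env {
  TAG_INDEX: D1Database; // hypothetical D1 binding name
}

interface CaptureMessage {
  url: string;
  tags: string[];
}

export default {
  async queue(batch: MessageBatch<CaptureMessage>, env: Env): Promise<void> {
    const insert = env.TAG_INDEX.prepare(
      'INSERT OR IGNORE INTO tags (tag, url) VALUES (?1, ?2)'
    );

    // One batched D1 write per Queue batch, instead of a write per request.
    const statements = batch.messages.flatMap((message) =>
      message.body.tags.map((tag) => insert.bind(tag, message.body.url))
    );

    if (statements.length > 0) {
      await env.TAG_INDEX.batch(statements);
    }
  },
};
```

Because only this Worker is bound to D1, it is the only one that gets pinned near the database.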

Twilight Zone

Another odd thing I ran into was that Workers operate at the account level, but the Cache is scoped to a specific Zone. Theoretically this wouldn't be a problem; you'd simply index the Zone along with the URL being cached. However, I couldn't find a way to determine the Zone of the request. I assumed it would be exposed as one of the properties on IncomingRequestCfProperties, an incoming request header, or something similar, but sadly it is not.

There were only two ways to get around this problem. The first method was to hard code the Zone into the Workers. I didn't particularly like this solution because it meant that you'd have to stand up the entire stack for every Zone, or you'd have to maintain some sort of mapping of Zones. I thought about fetching the Zones from the Cloudflare API and attempting to infer the Zone from the hostname, but that felt error-prone, to say the least.

The second method, and the one I ultimately implemented, was to use an intermediary Worker. When making a request to the intermediary Worker over the internet (i.e., using *.workers.dev, not Service Bindings), Cloudflare adds the CF-Worker header, which, of course, contains the Zone. It still feels a bit hacky to do this, but it works better than I expected.
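For illustration, reading the Zone out of that header is about as simple as it sounds. The response shape here is just an example, not what my Worker actually returns:

```ts
export default {
  async fetch(request: Request): Promise<Response> {
    // Cloudflare adds CF-Worker to requests a Worker makes over the internet
    // (e.g. to a *.workers.dev URL); it contains the calling Worker's Zone.
    const zone = request.headers.get('CF-Worker');

    if (!zone) {
      return new Response('Unable to determine the Zone', { status: 400 });
    }

    // From here the Zone can be stored alongside the URL and tags being indexed.
    return Response.json({ zone });
  },
};
```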

I'm still surprised there isn't a way to determine the Zone from an incoming request and I hope this is something Cloudflare could add in the future.

Drupal & Beyond

This blog runs on Drupal on a Raspberry Pi in my home. I mostly rewrote the Worker so that my blog would continue to be available even during an internet outage, which happens more often than you might expect. I'm very happy that Drupal has built-in cache tag support, and I updated the Cloudflare Worker Purge module to support API Tokens (rather than the old API Keys).

It's surprisingly straightforward to do this outside of Drupal, though. If you wanted to, you could add Cache-Control and Cache-Tag response headers to your backend service(s) and then issue purge requests any time tagged content needs to be invalidated.
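Here's a sketch of what that could look like. The header values, zone ID, and token are placeholders; the purge call uses Cloudflare's standard zone-level purge_cache endpoint with a tags payload:

```ts
// At the origin: tag cacheable responses.
export function tagResponse(body: string, tags: string[]): Response {
  return new Response(body, {
    headers: {
      'Cache-Control': 'public, max-age=31536000',
      'Cache-Tag': tags.join(','),
    },
  });
}

// Whenever tagged content changes: purge everything carrying those tags.
export async function purgeTags(zoneId: string, apiToken: string, tags: string[]): Promise<void> {
  const response = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ tags }),
    }
  );

  if (!response.ok) {
    throw new Error(`Purge failed with status ${response.status}`);
  }
}
```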

Ironically, I'm very excited to be able to completely deprecate the Workers I just wrote and migrate to using Cloudflare's APIs directly as soon as they are available. Doing so will hopefully result in a greater level of stability and less custom code that I need to maintain. Stay tuned for that update!