Statamic at SPIEGEL Scale (Part 1)

This is PART ONE of a two-part article, 18 months in the making. We've hinted and teased around the edge of it, but have never pulled back the curtain with any amount of detail, until now.

Enter SPIEGEL ONLINE

Back in October 2017 we scheduled what seemed to be a relatively routine video chat with a larger Statamic customer. They had done some custom stuff they were excited to show us and wanted to pick our brains about our upcoming features and roadmap. I'll be honest: as a monolingual American, I hadn't heard of SPIEGEL _techlab before and only Googled them as the call was ringing. These types of calls usually come from some beige-souled corporate IT solutions company running .NET and generally look like this:

"Hey there, we were wondering if you had any plans to incorporate ENTERPRISE FEATURES A, B, C, F, and N-Z. Oh, and e-commerce, SAP integration, and can you guarantee 24/7 phone support at risk of severe contract breach?"

We're a three-person company. The answer is probably nope.

Instead of this Groundhog Day scenario playing out yet again, I realized this was the dev team behind Der SPIEGEL, the huge German media and publishing company with a Top 500 site, right as the video screen popped out of black. André Basse and his energetic developers were excited to chat with us. With formalities and introductions concluded, they dove right into a demo of what can only be described as my grand vision for Statamic, something I've dreamed about since its earliest conception.

There was Statamic, running spiegel.de/plus and bento.de, two sites on a scale we had yet to encounter.

I knew almost immediately how they were doing it, at least at a high level. There's only one way to run at that scale. And no, it's not MySQL or PostgreSQL. We had been hoping to arrive there ourselves someday, and had carved out its shape as a trail of secret breadcrumbs, abstractions, stubbed-out hooks, and unfinished endpoints.

Software development is hard

I'm not sure if you knew that. Building the software is often the easy part. Making a business out of it without losing sight of the reason you built it in the first place is very hard. I'm sure many of our readers know it first-hand. It's a constant trade-off between immediate cashflow needs (payroll comes twice a month, did you know that?) and long-term goals. I never wanted to just bolt on custom services as the plan for profitability. I wanted to build a dream platform. The next level of CMS. I could see it. I could taste it.

I had been hoping to build out the rest of our dream platform for years, but we could never find enough time between the more immediate needs of our current customers and the marketing efforts needed to reach new customers in the short term. And so Statamic stayed a really great flat file CMS, capable of running small to medium sites (or large sites with good caching), but confined by preconceived notions and technical bottlenecks.

A quick tangent regarding flat files...

You see, flat file architecture is blazing fast...to a point. If you want to fetch single entries by URL or the last $n entries from a collection, it'll scale forever and stay quick. But the moment you want to sort, filter, search, or order, you need a lot of data. Maybe all of it.

Once you reach thousands of files in a folder, it takes a painfully noticeable amount of time to parse them all on the fly. We knew this of course, and as far back as Statamic 1.5 we built a proprietary caching system (called the Stache) to store pre-parsed data with indexes, keys, and other chunks of data used to help you perform those sorts, filters, and searches quickly. It works really well, and as long as you can throw some static caching on the front-end, your response times will be nice and fast.
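
For the curious, here is a rough sketch of the general idea in Python: parse every file once, save the metadata as an index, and answer sort queries from that index. To be clear, this is not the Stache's actual code or format. The function names, the JSON index file, and the tiny "key: value" header are all assumptions made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def parse_entry(path: Path) -> dict:
    """Parse a tiny 'key: value' header (a stand-in for real front matter)."""
    meta = {"slug": path.stem}
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip() == "---":
            break  # end of the header; the body follows
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


def build_index(content_dir: Path, index_file: Path) -> list:
    """Slow path: read every file once and persist the pre-parsed metadata."""
    entries = [parse_entry(p) for p in sorted(content_dir.glob("*.md"))]
    index_file.write_text(json.dumps(entries), encoding="utf-8")
    return entries


def latest(index_file: Path, n: int) -> list:
    """Fast path: sort using the cached index, without opening entry files."""
    entries = json.loads(index_file.read_text(encoding="utf-8"))
    entries.sort(key=lambda e: e["date"], reverse=True)  # ISO dates sort as text
    return [e["slug"] for e in entries[:n]]


def demo() -> list:
    """Write three hypothetical entries, index them, return the two newest."""
    with tempfile.TemporaryDirectory() as tmp:
        content = Path(tmp) / "content"
        content.mkdir()
        samples = {"alpha": "2017-10-01", "beta": "2019-04-05", "gamma": "2018-06-15"}
        for slug, date in samples.items():
            (content / f"{slug}.md").write_text(
                f"title: {slug.title()}\ndate: {date}\n---\nBody text.\n",
                encoding="utf-8",
            )
        index_file = Path(tmp) / "index.json"
        build_index(content, index_file)
        return latest(index_file, 2)


if __name__ == "__main__":
    print(demo())
```

The expensive part (opening and parsing every file) happens once in build_index, and every sort after that only reads one small file. The catch is that the index has to be rebuilt or patched whenever an entry changes, which is exactly where things get interesting on the control panel side.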

However, since files are the persistence layer for your data, the control panel side of the application can't work with statically cached data or rendered files. At some fairly hard-to-guesstimate tipping point in the amount of data in your site, the control panel will start to feel slow. It's usually somewhere around 2,500-5,000 entries, but it can be less or more depending on the raw amount of text.

Okay, back to SPIEGEL...

Two weeks later Jason and I were getting off a plane in Hamburg, Germany. It was clear from our chat that we would both have a lot to gain from spending some time together.

They had found our breadcrumbs, seen the potential, and built out the missing pieces. Excited by the possibilities, we spent the week in Hamburg cleaning up some abstractions to give them a cleaner integration with the custom side of their stack. We also built out some additional features that made a lot more sense in core than in a custom addon. The bulk of these features became Statamic 2.8 - namely custom publish forms, Bard, and database users.

As a bootstrapped company, it's not often you get to experience a living use case like this. It was surreal to see the Statamic control panel up on screens everywhere I looked. Developers, writers, you name it. There was Statamic. Running on the same playing field as the New York Times and The Guardian.

So why are we only talking about it now?

A fair question. Yes, SPIEGEL had the golden goose, and they even graciously offered us their code if it would help grow Statamic. However, Statamic v2's codebase was stretched too thin in too many places to roll out these kinds of big changes. Yes, it was totally feasible to do as a one-off, but plug-and-play friendly with a streamlined upgrade path and clear docs? Not so much. Rather than commission the hype train without a finished track, I decided it was best to bide our time, and we began working on Statamic 3.

18 months later...

Two weeks ago we got back from another trip to Germany, where we showed the SPIEGEL team what we've built. Within just a few hours we had Statamic 3 alpha running with millions of entries, blazing fast response times, and no SQL involved. Now that we know for sure we can pull this off, we're breaking the radio silence.

🔥 Tip: at this kind of scale, MySQL can simply fall apart.

I can hear your questions. So how does it work? What's the secret sauce? Is it in Statamic 3? Can we have it now? I know you all want to know more. I promise to get into all the glorious details NEXT WEEK IN PART 2. I don't mean to tease, but it's Friday after 5pm, and you should be ordering pizza. Or at least I should. Until next week...👋