Some people spend their free time watching movies, some follow sports, others drink at bars. I, apparently, sit at home and write YAML parsers.

Statamic Loves YAML

We use it all over the place, and we just couldn't be happier with it. Statamic currently comes with two different YAML parsers: SPYC and Symfony. The two parse for different specs. We like SPYC because it parses for a friendlier, looser version of YAML, but you can choose which one you want to use with the _yaml_mode setting.

The issue, however, is that neither is very quick at what it does. This isn't their fault; YAML is complex. Have you ever tried to read the entire spec? You can do a whole bunch of interesting things with it. But it occurred to me that I almost never do.

The Idea

Then I had this weird idea — what if I created a YAML parser that only parsed the bits of YAML that I use regularly? What do you use in your YAML? Strings, scalars, numbers, lists, named-lists… that's pretty much it. Could I make a YAML parser that only parsed for those things without wasting time checking for advanced features that I've never needed?
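To make the idea concrete, here's a rough sketch in PHP of what "only parse the bits I use" can look like. This is not Dipper's actual code: parse_tiny_yaml and cast_scalar are made-up names, and the sketch assumes a much smaller subset than Dipper handles (flat key: value pairs plus simple "- item" lists, no nesting, no inline comments).

```php
<?php
// Hypothetical sketch, not Dipper's implementation: handle only flat
// "key: value" pairs and "- item" lists, and ignore everything else.
function parse_tiny_yaml(string $yaml): array
{
    $result = [];
    $currentKey = null;

    foreach (explode("\n", $yaml) as $line) {
        $trimmed = trim($line);

        // Skip blank lines and full-line comments.
        if ($trimmed === '' || $trimmed[0] === '#') {
            continue;
        }

        // A list item belongs to the most recent key that had no value.
        if ($currentKey !== null && strpos($trimmed, '- ') === 0) {
            $result[$currentKey][] = cast_scalar(substr($trimmed, 2));
            continue;
        }

        $colon = strpos($trimmed, ':');
        if ($colon === false) {
            continue; // Not something this subset understands.
        }

        $key = trim(substr($trimmed, 0, $colon));
        $value = trim(substr($trimmed, $colon + 1));

        if ($value === '') {
            // A key with no value starts a list.
            $result[$key] = [];
            $currentKey = $key;
        } else {
            $result[$key] = cast_scalar($value);
            $currentKey = null;
        }
    }

    return $result;
}

// Turn numeric-looking values into numbers; strip quotes from the rest.
function cast_scalar(string $value)
{
    $value = trim($value);
    if (is_numeric($value)) {
        return $value + 0; // int or float, depending on the string
    }
    return trim($value, "\"'");
}
```

Every feature left out is a check that never has to run per line. The trade-off is that anything outside the subset gets skipped or misread instead of being parsed properly.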

The Result

The result: Dipper. We're calling it a demi-YAML parser in that it knowingly only parses a subset of the YAML spec, the parts that we use in our sites.

It turns out that it's pretty quick. I've spent some time micro-optimizing the calls it's making. Did you know that doing a foreach on an array is faster than array_map? I didn't either. It's a bunch of revelations like that: using string manipulation instead of regular expressions wherever possible, and changing the order of if/else statements.
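For illustration, here is the kind of swap I mean, sketched in PHP. The sample data and the is_comment_regex / is_comment_string helpers are invented for this example and aren't taken from Dipper. Whether either version is actually faster depends on your PHP version and workload, so measure before you trust it.

```php
<?php
// Illustrative only: two equivalent ways to trim every line of a file.
$lines = ["  title: Hello  ", "  count: 3", "# comment  "];

// Version 1: array_map invokes the callback once per element.
$mapped = array_map('trim', $lines);

// Version 2: a plain foreach loop that builds the same array.
$looped = [];
foreach ($lines as $line) {
    $looped[] = trim($line);
}

// Likewise, a comment check can use string functions instead of a regex.
function is_comment_regex(string $line): bool
{
    return preg_match('/^\s*#/', $line) === 1;
}

function is_comment_string(string $line): bool
{
    $trimmed = ltrim($line);
    return $trimmed !== '' && $trimmed[0] === '#';
}
```

Both pairs give the same answers for ordinary lines indented with spaces or tabs. The only question is which one gets there with less overhead.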

In some of my local testing, I've copied a couple of the bigger YAML files that Statamic parses (the language file, the settings file, a couple of other test items). All told, it's about 500 lines of YAML (including comments and such, which the parser needs to remove). Running each parser 15 times and taking the average time to parse the file, I get these times:

SPYC:     ~22ms   - the default Statamic parser
Symfony:  ~24ms
Dipper:    ~9ms

That makes Dipper about 2.5x faster than the default Statamic YAML parser (SPYC) in my test. Neat!
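If you want to run a similar comparison yourself, a minimal harness might look like the sketch below. It isn't the script I used: average_parse_ms is a hypothetical helper, and $parser stands in for whatever callable wraps the parser you're testing.

```php
<?php
// Hypothetical harness: $parser is any callable that accepts a YAML string.
function average_parse_ms(callable $parser, string $yaml, int $runs = 15): float
{
    $total = 0.0;

    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $parser($yaml);
        $total += microtime(true) - $start;
    }

    // Average per run, converted from seconds to milliseconds.
    return ($total / $runs) * 1000;
}

// Usage with a stand-in "parser" that only splits the input into lines.
$ms = average_parse_ms(function (string $yaml) {
    return explode("\n", $yaml);
}, "title: Hello\ncount: 3");
```

Averaging over several runs smooths out one-off spikes, but the numbers will still vary by machine, PHP version, and whether an opcode cache is warm.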

The Inclusion

I'm working on adding Dipper into an upcoming release of Statamic. At first it will probably be somewhat experimental: something you can turn on in your settings, click around your site to see if everything looks normal, and then report back "hey, this worked" or "hey, this broke." However, in my testing, it's pretty darn stable for what it's trying to do.

The License

We're releasing Dipper as open source. You can find it over on GitHub. It's currently at version 0.1, which is "the very first stable public release." I've been testing the output, making sure that everything that comes out is what I need it to be, and it's to the point where I'm happy enough to show everyone.

I know this is a weird project in that it purposefully doesn't fully follow a spec, but we think there are other projects out there that want to do YAML stuff similar to what we do. If so, maybe Dipper will be helpful.