Snapshot Testing For the Masses
May 14, 2024
Snapshot testing is a technique for the “assert” part of an “arrange, act, assert” test that dispenses with hand-written assertions and instead compares the result against a known good value, a snapshot. Crucially, in the case of a mismatch, a snapshot can update itself. A gif is worth a thousand words:
That is, we start with a test with an empty expected result:
The testing infrastructure automatically populates the value using the current code
After behavior change (e.g., a bug fix), the expected values are updated automatically
This post assumes that you are already familiar with the concept of snapshot testing (also known as testing with expectations), sold on the methodology, and don’t need further convincing that this is a good idea. You can read any of the following posts to learn about motivation behind snapshot testing:
But, just in case, here’s a short bullet list of the most important motivations for me:
Snapshots require that domain objects have rich and readable textual representations. The ability to dump a domain object to a string is invaluable during debugging.
Traditional tests often end up “freezing” the code, when significant behavioral changes require a lot of busy work to manually update all the tests. Snapshot tests are easily adaptable to changes in requirements.
Snapshot testing encourages a more fruitful mindset for testing, one that focuses on comparing input and output data rather than on the details of the particular code that implements the data transformation.
Textual comparison with a diff subsumes various fluent assertion libraries.
Rather than elaborating on the above properties, we will look at how one might implement a snapshot testing library oneself. The motivation is two-fold:
To dispel the magic surrounding mature snapshot testing libraries — they usually make use of fancy macros, require external tools or rely on deep integration with code editors. It might not be obvious, but none of this is essential! A simple version can be implemented in a pretty spartan way.
To show one particular implementation technique, which I think leads to a more flexible API than is typically available in popular libraries.
We’ll be using Zig — it’s a spartan language, so it’ll work perfectly to demonstrate how little is required!
What Is a Snapshot?
Usually, snapshot testing libraries start with an assertion macro for comparing expected value with a snapshot:
I think this is a suboptimal API: as soon as something is a macro, the only way to program it is with another macro. In other words, macros are an abstraction that composes poorly.
For our library, we start with a slightly different building block — a self-updating string literal, which we call a snapshot:
The secret sauce here is SourceLocation. It specifies the source file and line of the string literal that gives value to the .text field.
How to get a SourceLocation very much depends on a particular language. Often, there’s some sort of a “get current line” builtin or macro. For Zig, that would be the @src() builtin, which the caller passes to a snap function together with the string literal.
In a more expressive language, you might want to abstract the snap(@src() spell with a macro (you need a macro to make sure that the equivalent of @src() captures the call-site). But in Zig, the caller has to write this exact pattern, which actually isn’t all that bad!
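As a rough illustration of this building block, here is a minimal sketch in Zig. The names Snapshot and snap and the field names follow the prose above, but the exact shape is my assumption, not necessarily what any particular library uses. It assumes a reasonably recent Zig (0.11 or later).

```zig
const std = @import("std");

/// A string literal that remembers where in the source code it was written.
pub const Snapshot = struct {
    source_location: std.builtin.SourceLocation,
    text: []const u8,
};

/// Hypothetical constructor: the caller passes `@src()` explicitly, so the
/// recorded location is the call site rather than this function.
pub fn snap(source_location: std.builtin.SourceLocation, text: []const u8) Snapshot {
    return .{ .source_location = source_location, .text = text };
}

test "snap records the call site" {
    const want = snap(@src(),
        \\hello
        \\world
    );
    try std.testing.expectEqualStrings("hello\nworld", want.text);
    try std.testing.expect(want.source_location.line > 0);
}
```

Because @src() is evaluated where it is written, the line stored in the snapshot is the line of the snap call, which is what a later update step needs.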
Comparison Functions
Now, once you have a location-tagged string literal, you might add various comparison functions to it. The fundamental one is of course direct string comparison:
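A direct string comparison could be sketched roughly like this. The method name diff matches the name used later in the post; the error name SnapDiff and the message format are assumptions made for the example.

```zig
const std = @import("std");

pub const Snapshot = struct {
    source_location: std.builtin.SourceLocation,
    text: []const u8,

    /// Compares `got` with the snapshot text byte for byte.
    /// On mismatch, prints both values and fails with a hypothetical error name.
    pub fn diff(snapshot: *const Snapshot, got: []const u8) !void {
        if (std.mem.eql(u8, got, snapshot.text)) return;
        std.debug.print(
            "snapshot mismatch at {s}:{d}\nwant:\n{s}\ngot:\n{s}\n",
            .{ snapshot.source_location.file, snapshot.source_location.line, snapshot.text, got },
        );
        return error.SnapDiff;
    }
};

pub fn snap(source_location: std.builtin.SourceLocation, text: []const u8) Snapshot {
    return .{ .source_location = source_location, .text = text };
}

test "diff passes on equal text" {
    const want = snap(@src(), "4");
    try want.diff("4");
}
```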
A note about line endings: on Windows, git sometimes “helpfully” changes \n in the source code to \r\n. This of course breaks direct string comparison. You can fix this by either making the comparison line-ending agnostic, or just by telling git to not mess with line endings using the following .gitattributes file:
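One .gitattributes setting that has this effect is shown below. Treat it as an assumption about a reasonable configuration rather than the exact file from any particular repository: it asks git to detect text files and to always check them out with \n line endings.

```
* text=auto eol=lf
```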
On top of this primitive string comparison, you can build higher-level utils to, e.g., compare debug string representation or JSON serialization:
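For example, a formatting-based helper might look like the sketch below. The name diff_fmt is hypothetical, and using std.testing.allocator is an assumption that the helper only ever runs under zig test.

```zig
const std = @import("std");

pub const Snapshot = struct {
    source_location: std.builtin.SourceLocation,
    text: []const u8,

    /// The primitive: plain string comparison.
    pub fn diff(snapshot: *const Snapshot, got: []const u8) !void {
        if (std.mem.eql(u8, got, snapshot.text)) return;
        std.debug.print("want:\n{s}\ngot:\n{s}\n", .{ snapshot.text, got });
        return error.SnapDiff;
    }

    /// Hypothetical higher-level helper: formats the arguments to a string
    /// and delegates to `diff`.
    pub fn diff_fmt(snapshot: *const Snapshot, comptime fmt: []const u8, args: anytype) !void {
        const got = try std.fmt.allocPrint(std.testing.allocator, fmt, args);
        defer std.testing.allocator.free(got);
        try snapshot.diff(got);
    }
};

pub fn snap(source_location: std.builtin.SourceLocation, text: []const u8) Snapshot {
    return .{ .source_location = source_location, .text = text };
}

test "diff_fmt compares the formatted value" {
    const want = snap(@src(), "2 + 2 = 4");
    try want.diff_fmt("{d} + {d} = {d}", .{ 2, 2, 2 + 2 });
}
```

A JSON-based helper would follow the same pattern: serialize to a string, then call diff.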
Using Snapshots
When designing an API, you should always think not about what code you, as the implementer, are writing, but rather about the user of the API. So let’s test-drive what we have so far. One example from TigerBeetle where we use snapshot testing is our CLI argument parsing library. The way these tests work is that we have an example CLI that exercises various features of the library:
In the tests, we want to run this CLI against a specific array of arguments, and verify that it parses the arguments correctly, providing appropriate error messages. We can write a helper check function such that we can express these requirements directly. That is, the check function takes only two arguments: the input data, and the result (the error message or parsed arguments):
Here’s how check can be implemented:
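To make the shape of such a helper concrete, here is a self-contained toy version. The parse_port function is a made-up stand-in for the real argument parser, and this check takes a single argument rather than an array, purely to keep the sketch short.

```zig
const std = @import("std");

pub const Snapshot = struct {
    source_location: std.builtin.SourceLocation,
    text: []const u8,

    pub fn diff(snapshot: *const Snapshot, got: []const u8) !void {
        if (std.mem.eql(u8, got, snapshot.text)) return;
        std.debug.print("want:\n{s}\ngot:\n{s}\n", .{ snapshot.text, got });
        return error.SnapDiff;
    }
};

pub fn snap(source_location: std.builtin.SourceLocation, text: []const u8) Snapshot {
    return .{ .source_location = source_location, .text = text };
}

/// Hypothetical code under test: parses a single `--port=<number>` argument.
fn parse_port(arg: []const u8) !u16 {
    const prefix = "--port=";
    if (!std.mem.startsWith(u8, arg, prefix)) return error.UnknownFlag;
    return std.fmt.parseInt(u16, arg[prefix.len..], 10);
}

/// Takes only the input and the expected output. Both the success and the
/// error case are rendered to text and compared against the snapshot.
fn check(arg: []const u8, want: Snapshot) !void {
    var buffer: [64]u8 = undefined;
    const got = if (parse_port(arg)) |port|
        try std.fmt.bufPrint(&buffer, "port={d}", .{port})
    else |err|
        try std.fmt.bufPrint(&buffer, "error: {s}", .{@errorName(err)});
    try want.diff(got);
}

test "parse_port" {
    try check("--port=8080", snap(@src(), "port=8080"));
    try check("--port=http", snap(@src(), "error: InvalidCharacter"));
    try check("--verbose", snap(@src(), "error: UnknownFlag"));
}
```

Each test case is a single line pairing an input with its expected output, and the snapshot is passed to check like any other value.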
The high order bit: because Snapshot is a first-class value, we can pass it around!
Updating Source Code
So far, our library is just string comparison with more steps. The distinguishing feature of snapshot testing is the ability to update “gold” values in place. We have already solved the hard part of this problem: our snapshots know the source file and line that need an update. Let’s do the rest!
First, when should the test update the gold value, instead of just failing on mismatch? One surprisingly simple approach is to just update unconditionally (still failing the test). This works great if you store the code in source control, and can always revert to an earlier version.
But a more conservative approach is to require an opt-in via an environment variable, or even a specific builder method on the snapshot itself:
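Both opt-in mechanisms can be sketched in a few lines. The field name update_this, the method names update and should_update, and the variable name SNAP_UPDATE are assumptions chosen for this example.

```zig
const std = @import("std");

pub const Snapshot = struct {
    source_location: std.builtin.SourceLocation,
    text: []const u8,
    update_this: bool = false,

    /// Builder method: returns a copy of the snapshot that asks to be
    /// rewritten on mismatch.
    pub fn update(snapshot: *const Snapshot) Snapshot {
        return .{
            .source_location = snapshot.source_location,
            .text = snapshot.text,
            .update_this = true,
        };
    }

    /// True if this snapshot opted in, or if the (assumed) SNAP_UPDATE
    /// environment variable is set for the whole test run.
    pub fn should_update(snapshot: *const Snapshot) bool {
        return snapshot.update_this or std.process.hasEnvVarConstant("SNAP_UPDATE");
    }
};

pub fn snap(source_location: std.builtin.SourceLocation, text: []const u8) Snapshot {
    return .{ .source_location = source_location, .text = text };
}

test "update marks a single snapshot" {
    const want = snap(@src(), "8").update();
    try std.testing.expect(want.update_this);
    try std.testing.expect(want.should_update());
}
```

The builder method is convenient when only one snapshot should change: append .update() to that snapshot, run the test, and remove the call again.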
The fundamental .diff method then becomes:
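The control flow can be sketched as follows. The actual file rewriting is replaced by a hypothetical stub that only counts its calls, and the error names are assumptions. This sketch also chooses to keep failing the test after an update, so that a stale snapshot never passes silently.

```zig
const std = @import("std");

/// Counts calls to the stub below so that the control flow can be observed.
var update_calls: u32 = 0;

/// Hypothetical stand-in for the code that rewrites the source file.
fn update_source_stub(location: std.builtin.SourceLocation, new_text: []const u8) void {
    _ = location;
    _ = new_text;
    update_calls += 1;
}

pub const Snapshot = struct {
    source_location: std.builtin.SourceLocation,
    text: []const u8,
    update_this: bool = false,

    pub fn diff(snapshot: *const Snapshot, got: []const u8) !void {
        // Matching snapshots return early and never reach the updater.
        if (std.mem.eql(u8, got, snapshot.text)) return;

        if (!snapshot.update_this) {
            std.debug.print("snapshot mismatch at {s}:{d}\n", .{
                snapshot.source_location.file,
                snapshot.source_location.line,
            });
            return error.SnapDiff;
        }

        update_source_stub(snapshot.source_location, got);
        return error.SnapUpdated;
    }
};

test "only an opted-in mismatch triggers an update" {
    update_calls = 0;
    const plain = Snapshot{ .source_location = @src(), .text = "a" };
    try plain.diff("a");
    try std.testing.expectError(error.SnapDiff, plain.diff("b"));
    try std.testing.expect(update_calls == 0);

    const marked = Snapshot{ .source_location = @src(), .text = "a", .update_this = true };
    try std.testing.expectError(error.SnapUpdated, marked.diff("b"));
    try std.testing.expect(update_calls == 1);
}
```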
The above snippet oversimplifies things a bit — here, one has to write some amount of messy code:
To extract the range of the original snapshot string literal, one might want to reach for a real lexer for the programming language in question. However, given that you already know the start line, and that you can require that the snapshots are written in a particular consistent style, just some ad-hoc string processing works.
When splicing in the new value, it can’t be pasted as is! You need to convert it to a string literal, adding quotes around, and you also want to compute the correct indentation.
Finally, if several snapshots are updated in a row, the original source lines become incorrect! So you’ll also need to keep a bit of state around to keep track of how many lines were removed and added so far, and adjust source_location.line appropriately. Alternatively, you can require that the snapshots are updated one-at-a-time.
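Two of these pieces are small enough to sketch in isolation. The helper names render_literal and line_delta are hypothetical, and the sketch assumes that snapshots are written as Zig multiline string literals (lines starting with \\) and that Zig 0.11 or later is used.

```zig
const std = @import("std");

/// Renders `text` as the lines of a Zig multiline string literal. Each line is
/// prefixed with `indent` and `\\` and written into `buffer`.
/// Returns the written slice, or an error if the buffer is too small.
fn render_literal(buffer: []u8, indent: []const u8, text: []const u8) ![]const u8 {
    var len: usize = 0;
    var lines = std.mem.splitScalar(u8, text, '\n');
    while (lines.next()) |line| {
        for ([_][]const u8{ indent, "\\\\", line, "\n" }) |part| {
            if (len + part.len > buffer.len) return error.NoSpaceLeft;
            @memcpy(buffer[len..][0..part.len], part);
            len += part.len;
        }
    }
    return buffer[0..len];
}

/// Number of lines by which later snapshots in the same file shift
/// after `old_text` is replaced with `new_text`.
fn line_delta(old_text: []const u8, new_text: []const u8) i64 {
    const old_lines: i64 = @intCast(std.mem.count(u8, old_text, "\n") + 1);
    const new_lines: i64 = @intCast(std.mem.count(u8, new_text, "\n") + 1);
    return new_lines - old_lines;
}

test "render_literal adds indentation and literal prefix" {
    var buffer: [64]u8 = undefined;
    const rendered = try render_literal(&buffer, "    ", "a\nb");
    try std.testing.expectEqualStrings("    \\\\a\n    \\\\b\n", rendered);
}

test "line_delta counts added lines" {
    try std.testing.expect(line_delta("a", "a\nb\nc") == 2);
}
```

The accumulated line_delta values for a file are what would be added to source_location.line of any snapshot located further down in that file.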
When updating a source file, there’s an important invariant to uphold: you should touch the file system only when the test would fail otherwise. It is often useful to run the test suite outside of the original repository, without access to the source code. This use case should continue to work, unless the tests are going to fail anyway.
Extras
In case of a mismatch, it is helpful to print a proper diff. There’s a cheat code here: leveraging git. That is, just update the snapshots in-place and use git diff to show a nicely colored diff to the user. But, if you want to use a real diff, it’s useful to know that a good quality one isn’t that much code.
Often, the data to be snapshotted includes some volatile parts, like timestamps, which you would like to exclude from comparison. One possible approach here is to explicitly mark ignorable parts of the snapshots. This is what it could look like:
The magic is <snap:ignore>: this is a special string recognized by the library. Specifically, the comparison function splits the snapshot on the <snap:ignore> substring, and then checks that each of the remaining fragments can be found in the original string, in that order.
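The matching rule described above fits in a short function. The name matches_with_ignores is hypothetical, and the sketch implements the rule exactly as stated, so it does not additionally require the first fragment to sit at the very start of the string or the last one at the very end.

```zig
const std = @import("std");

const ignore_marker = "<snap:ignore>";

/// Returns true if every fragment of `want` between ignore markers
/// occurs in `got`, in the same order and without overlap.
fn matches_with_ignores(want: []const u8, got: []const u8) bool {
    var rest = got;
    var fragments = std.mem.splitSequence(u8, want, ignore_marker);
    while (fragments.next()) |fragment| {
        const index = std.mem.indexOf(u8, rest, fragment) orelse return false;
        // Continue the search after this fragment to preserve the order.
        rest = rest[index + fragment.len ..];
    }
    return true;
}

test "volatile parts are skipped" {
    const want = "time=<snap:ignore> status=ok";
    try std.testing.expect(matches_with_ignores(want, "time=12:30:01 status=ok"));
    try std.testing.expect(!matches_with_ignores(want, "time=12:30:01 status=failed"));
}
```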
Conclusion
That’s all for today! Now you can write your very own snapshot testing library, or make a more informed choice among the existing offerings. Key points:
Snapshot testing is a useful technique! Keep it in mind for cases where you feel like tests make the code less changeable.
The basic primitive of snapshot testing is a self-aware string literal, which remembers its position in the original source code and can update itself.
Text is a surprisingly powerful primitive: if you can compare text, you can compare complex, structured data by serializing it to text.
A minimalist, but useful snapshot testing library can be implemented in less than 500 lines of code! See, for example, our TigerBeetle microlibrary.