It Takes Two to Contract
Today's topic: design by contract. To be honest, I was skeptical of DbC: it feels like it boils down to:
- Use types.
- Use assertions.
Which I still believe is mostly true! To clarify, types and assertions are a very capable mechanism to catch bugs earlier and to limit the consequences of bugs that do slip through. It's just that you don't really need any syntactic mechanisms to use these tools effectively, you don't need first class support for design by contract in your language. Just write more assertions!
For preconditions, assert
function works perfectly, and
for postconditions, there's defer
:
!grid.free_set.opened);
assert(defer assert(grid.free_set.opened);
One thing that might occasionally be necessary is the ability to name the result of the function, so that it's possible to write:
defer assert(result > 0);
But:
- It is almost always possible to just move the postcondition to be right before the return statement.
- A canonical name for the result variable is a generally useful language feature unrelated to assertions, it enables placement: https://github.com/ziglang/zig/issues/2765.
One area where you do need first-class support for DbC is inheritance in object-oriented languages: derived classes can weaken the preconditions and strengthen the postconditions, and coding that manually does add boilerplate. Which is perhaps another reason to avoid inheritance-based polymorphism?
As an exception, contracts absolutely do pull their weight in the Nickel configuration language, which demonstrates rather remarkable synergy between gradual typing, contracts, and lazy evaluation.
But, for a run-of-the-mill general purpose programming language, assertions and types are all you need, so there's no excuse to not start doing more contract programming today.
If DbC is just asserts, they why I am writing this post today? Well, it turns out that there is something new about contract programming I've learned while hacking on TigerBeetle, and I want to share it:
Contracts always involve two parties.
That is, something is a real contract only when you have at least
two assertions checking the relevant condition: you do one
check at the call site, and one check at the definition site. Let's pull
an example from today's code! Here's a call site of the
fill_immutable_values
function, which fills the provided
buffer with some values:
const filled = compaction.fill_immutable_values(target);
<= target.len);
assert(filled if (filled == 0) assert(Table.usage == .secondary_index);
At the call site, we don't blindly trust the return value, but rather assert some invariants: that it couldn't have copied more values than the length of the provided buffer and that, if the the number of copied values was zero, we are indeed in the special case where that could happen.
And then, we assert the same two conditions at the definition site:
fn fill_immutable_values(compaction: *Compaction, target: []Value) usize {
...
<= source_count);
assert(target_count if (target_count == 0) {
.usage == .secondary_index);
assert(Tablereturn 0;
}return target_count;
}
Now this is curious. Isn't this just code duplication? The lesson I've learned is that yes, it absolutely is often a duplication in practice, but a useful one.
The philosophy here is that these are just accidentally identical assertions. The function postcondition might actually be stronger than what is required at the call site, so assertion after-the-call could be a different, weaker one.
But let's talk pragmatics. First, having asserts in both places helps readability tremendously. Just compare
const filled = compaction.fill_immutable_values(target);
with
const filled = compaction.fill_immutable_values(target);
<= target.len); assert(filled
In the second snippet, the semantics of filled
is
immediately clear. And, with a matching assertion at the definition
site, you can now (at a glance!) check that the assumption at the call
site holds.
An assertion pair forms an airlock --- to reason about the correctness of a particular function call, you don't need to keep both the definition and the call site in mind. Rather, you could verify just that the assertions between the two are compatible, and then reason separately that each party indeed provides the guarantees it asserts.
We love the readability benefit of asserts so much that we even have
a special maybe(condition);
assertion, which is a no-op: it
signals that the condition might be true at runtime, shining a spotlight
on an otherwise non-obvious aspect of code:
pub fn write_sectors(
self: *Storage,
: *const fn (write: *Storage.Write) void,
callback: *Storage.Write,
write: []const u8,
buffer: vsr.Zone,
zone: u64,
offset_in_zonevoid {
) .verify_iop(buffer, offset_in_zone);
zone== .grid_padding); // Padding is zeroed during format.
maybe(zone ...
}
Besides readability, another pragmatic reason to pair assertions is
robustness and defense in depth: as code is evolving and preconditions
are strengthened or relaxed over time, having a separate set of
assertions at each call site helps to ensure that refactoring around one
call site doesn't break the other. And then there's an argument that
repeating an assertion using a slightly different vocabulary of local
variables maximizes the chance that at least one instance is not buggy.
For example, did you notice in the fill_immutable_values
example that the assert(filled <= target.len);
condition
isn't actually checked for at the definition site? I didn't
myself, until I started drafting this post!
But, although checking return values of functions comes straight from NASA's Power of 10 Rules for Safety-Critical Code, which TigerBeetle follows, this actually is the more trivial benefit of paired assertions. The real superpower is unlocked once you get to the same logical assertion through two different code paths.
Consider a consensus protocol, which relies on many subtle invariants. Whenever a replica sends a message, it of course asserts that the resulting message satisfies the invariants, at least to the extent the replica is aware about global state of the cluster. Upon receiving the message, the recipient replica checks the same invariants, but it has different local knowledge about the state of the cluster, and the resulting checks become meaningfully different.
The best example of this in TigerBeetle is in pervasive use of hash-chaining --- everything in TigerBeetle carries a checksum, and includes checksums of related data. So, although two replicas might not have all pieces of the shared cluster state at the same time, they can and do check that the parts of the state they do have connect correctly according to hashes, and sound alarm if divergence is detected. We think of this as binding intent cryptographically, as cryptographic scaffolding for what would otherwise be simple (easy to conflate!) operation numbers in our consensus.
This actually is a trick from property-based testing --- there, one possible property is to carry out the same computation in two different ways and check that the result is indeed the same. Turns out, in real systems the "two paths" can happen naturally, and the technique can be lifted straight out of tests into the production code.
As another example of two paths, there generally can be two ways a replica might end up in a particular state:
- Either it started from an empty data file and processed a bunch of events normally.
- Or, it crashed at some point (or joined the cluster later), and started from a specific snapshot, rather than from zero.
The state should be the same, although the paths are different. Recording the checksum of the state into a snapshot and asserting that it is the same across lucky and unlucky replicas is a way to ensure that the two paths are indeed equivalent.
To sum up:
- Assertions often come in pairs.
- Whenever you assert something, think about which distant part of the code base relies on the assertion you just wrote, and add an equivalent assertion there.
- This is worth doing even in the trivial case, where the two parties are a function and its caller.
- But be on lookout for more interesting cases, where the two halves of an assertion pair are separated by different implementations, or a process and time boundary.