February in TigerLand
Dear friends,
Hope your February was phenomenal! We unlocked “the power of zero”, added metrics for observability, and started testing misdirected I/O. We also spoke at HYTRADBOI, shone a spotlight on the foundations of DBMS durability, and announced Systems Distributed!
Let’s go!
Bruce Eckel
TigerBeetle is one of the few production databases (that we know of) that applies the end-to-end principle to define (and survive) an explicit storage fault model, expecting almost nothing from the underlying hardware/software storage stack and treating everything as commodity. Disks (as well as faulty vendor firmware and rare filesystem bugs along the way) may read from or write to the wrong place (misdirected I/O), arbitrarily corrupt data, or experience arbitrary gray failure (slowing down suddenly, temporarily or permanently). The VOPR tests that TigerBeetle correctly detects and recovers from these storage faults, and last month we:
Added specific fault injection for misdirected writes (beyond only corrupted writes). A misdirected write is a disk fault where the data is not written to the intended location (i.e. a “lost write”), but is instead written to a different, mistaken location. As a result of this new fault, we ended up finding and fixing two correctness bugs along with a few overtight assertions!
Changed the VOPR to inject storage corruptions at the byte level, as opposed to merely the sector level. This strategy revealed another two bugs:
We asserted that a grid or prepare block’s sector padding is zeroed out, but this doesn’t hold true if only the block padding is corrupted! TigerBeetle can now detect corruption in the prepare and grid sector padding and repair the entire block if the padding is read as nonzero.
The VOPR further checks that every prepare’s trailing sector padding is zeroed.
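To make the two fault classes above concrete, here is a minimal sketch of how a block check could tell a misdirected block apart from a corrupted one, and how byte-level corruption in zero padding can be caught. This is an illustration only, not TigerBeetle's on-disk format: the block size, header layout, CRC32 checksum, and the names `encode_block` and `check_block` are all assumptions made for the example.

```python
import struct
import zlib

BLOCK_SIZE = 64                 # hypothetical; tiny so the example is easy to read
HEADER = struct.Struct("<IQI")  # checksum, intended address, payload length


def encode_block(address: int, payload: bytes) -> bytes:
    """Build a block whose checksum covers the intended address and payload,
    then zero-pad it to BLOCK_SIZE."""
    assert len(payload) <= BLOCK_SIZE - HEADER.size
    body = struct.pack("<QI", address, len(payload)) + payload
    block = struct.pack("<I", zlib.crc32(body)) + body
    return block.ljust(BLOCK_SIZE, b"\x00")


def check_block(expected_address: int, block: bytes) -> str:
    """Classify a block read from `expected_address`."""
    checksum, address, length = HEADER.unpack_from(block)
    if length > BLOCK_SIZE - HEADER.size:
        return "corrupt"
    end = HEADER.size + length
    # The checksum covers everything after the checksum field, up to the padding.
    if zlib.crc32(block[4:end]) != checksum:
        return "corrupt"
    # A valid block that names a different address was written to the wrong place.
    if address != expected_address:
        return "misdirected"
    # The padding is outside the checksum, so it has to be checked byte by byte.
    if any(block[end:]):
        return "corrupt padding"
    return "ok"
```

For example, `check_block(8, encode_block(7, b"hello"))` reports `"misdirected"`, while flipping a single byte in the trailing padding reports `"corrupt padding"`. Note that this sketch only recognizes a block that turns up in the wrong place; noticing that the intended location still holds stale data needs additional information that is not modeled here.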
Imagine you’re working on a shiny new feature for your software. You run fuzz tests on your code changes, encounter a failure, and know for certain it’s due to your code changes, not existing bugs! We set out on a mission to get to “the power of zero” – a complete absence of bugs in our VSR implementation. The road has been long, winding and immensely rewarding! Some final fixes to achieve “VOPR green”:
Solve the last known liveness bug in TigerBeetle’s consensus protocol, triggered by a subtle interaction between a single corrupted block and our asynchronous checkpointing. The solution includes a new on-disk CheckpointState format.
Fix known false positives in the VOPR – cases where our consensus protocol was behaving correctly, but our understanding led to false positives in the simulator.
Fix a crash in VSR where a replica sends a prepare_ok to itself.
One of our three design goals is experience – making TigerBeetle delightful for both developers and operators. For example, for developers, we do little things like keep the number of files and directories in the root of our repository minimal to reduce cognitive overhead (Miller’s Law). And for operators, we provide a single, small, statically linked binary with extremely predictable performance. This month, in the realm of experience, we landed:
Support for exporting TigerBeetle metrics! Operators now have visibility into the state of their cluster at runtime, for instance the minimum, maximum and average timings for commit, compaction, and queries. These metrics are periodically emitted via statsd if --statsd=: is specified in the tigerbeetle start command.
Fixes for a few language client bugs:
A crash in the Node.js client when a client instance is closed while there are outstanding function invocations on that client.
A bug in the Python client wherein long address strings could overflow and overwrite the memory allocated for the cluster ID.
Some major refactors to tb_client (the API atop which all our language clients are built) to make it completely thread safe! Previously, only submit was explicitly thread safe.
Enhancements to make our documentation snappier: keyboard navigation for search results, prioritizing results whose page title contains the search term, and loading a fresh table of contents in every browser tab (as opposed to retaining the expanded table of contents from another tab). After rewriting our docs to own our docs infrastructure, it feels good to own the kitchen you cook in!
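Returning to the metrics export mentioned above: the sketch below shows what minimum, maximum and average timings could look like on the wire in the plain statsd line format (`name:value|type`). The metric name `tb.commit`, the choice of gauges (`|g`), and the default host and port are assumptions for illustration; they are not taken from TigerBeetle's actual output.

```python
import socket


def summarize(name: str, samples_ms: list[float]) -> list[str]:
    """Return statsd gauge lines for the min, max and average of the samples."""
    if not samples_ms:
        return []
    stats = {
        "min": min(samples_ms),
        "max": max(samples_ms),
        "avg": sum(samples_ms) / len(samples_ms),
    }
    return [f"{name}.{key}:{value:g}|g" for key, value in stats.items()]


def emit(lines: list[str], host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send the lines as one UDP datagram (host and port are assumed defaults)."""
    if not lines:
        return
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto("\n".join(lines).encode("ascii"), (host, port))
```

Calling `summarize("tb.commit", [2.0, 4.0, 9.0])` yields three lines, one each for the minimum, maximum and average, which `emit` can then forward to a statsd-compatible agent.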
TigerBeetle organizes its replicas in a ring topology for replication, wherein each replica replicates to the next replica in the ring. As opposed to the star topology, wherein the primary broadcasts prepares, the ring topology ensures that each replica contributes its bandwidth to replication (and reduces multi-AZ replication costs by crossing network boundaries less). Last month, we decreased the minimum exponential backoff delay to be proportional to the time that it takes to replicate a prepare across the ring in a fast network. This enables the primary to detect breaks in the replication ring sooner and re-replicate prepares eagerly, leading to more predictable performance in the face of failures.
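As a rough illustration of the idea, the sketch below computes the path a prepare takes around the ring and a capped exponential backoff whose minimum delay is proportional to the time needed to cross the ring. The function names, the per-hop latency parameter, and the cap are hypothetical; TigerBeetle's actual timeouts may be computed differently (for instance, with jitter).

```python
def next_in_ring(replica: int, replica_count: int) -> int:
    """Each replica forwards to its successor, wrapping around at the end."""
    return (replica + 1) % replica_count


def ring_path(primary: int, replica_count: int) -> list[int]:
    """Replicas in the order a prepare visits them, starting after the primary."""
    path = []
    replica = primary
    for _ in range(replica_count - 1):
        replica = next_in_ring(replica, replica_count)
        path.append(replica)
    return path


def backoff_delay_ms(attempt: int, hop_latency_ms: float,
                     replica_count: int, max_delay_ms: float) -> float:
    """Exponential backoff whose floor is the time to cross the ring once."""
    min_delay_ms = hop_latency_ms * (replica_count - 1)
    return min(max_delay_ms, min_delay_ms * 2 ** attempt)
```

With six replicas and an assumed 1 ms per hop, the first retry fires after 5 ms, so a break in the ring is noticed after roughly one ring traversal rather than after a fixed, larger timeout.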
The service that schedules and runs all of TigerBeetle’s fuzzers is called the Continuous Fuzzing Orchestrator (CFO). Our CFO runs across multiple servers with a total of 1,000 CPU cores, continuously testing various components of TigerBeetle and pushing failed seeds to our DevHub (you should take a look!). The efficiency with which these fuzzers are scheduled directly impacts the rate at which they uncover bugs:
We implemented a fair scheduler in the CFO to avoid starvation of short running fuzzers. Earlier, long running LSM fuzzers ended up spending more than their fair share of time on the CPU, with only 1-10% of CFO time being spent on short running VOPR fuzzers.
Additionally, we improved CPU utilization of the CFO from 25% to 80% by spawning fuzzers more frequently.
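One simple way to get this kind of fairness is to always run next whichever fuzzer has accumulated the least CPU time so far. The sketch below simulates that policy on a single core; it is an assumption about how such a scheduler could work, not the CFO's actual implementation, and the fuzzer names and run durations are made up.

```python
import heapq


def schedule(run_seconds: dict[str, float], budget_seconds: float) -> dict[str, float]:
    """Simulate one core, always running the fuzzer with the least time used.

    `run_seconds` maps a fuzzer name to the duration of a single run.
    Returns the total CPU time each fuzzer received.
    """
    assert all(duration > 0 for duration in run_seconds.values())
    used = {name: 0.0 for name in run_seconds}
    # Min-heap keyed by accumulated time; ties are broken by name.
    heap = [(0.0, name) for name in sorted(run_seconds)]
    heapq.heapify(heap)
    clock = 0.0
    while clock < budget_seconds:
        spent, name = heapq.heappop(heap)
        duration = run_seconds[name]
        clock += duration
        used[name] = spent + duration
        heapq.heappush(heap, (used[name], name))
    return used
```

With one long-running fuzzer (100 s per run) and one short-running fuzzer (10 s per run), alternating runs one-for-one would give the short fuzzer only about 9% of the CPU time, whereas this policy lets it run several times in a row until both have received a similar share.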
Raghunandan Bhat enhanced the REPL with shortcuts to move forward a word (Alt+F), move back a word (Alt+B), and forward delete (Ctrl+D). Huge thanks for the contribution, Raghunandan!
Nikita spotted that our Node.js client samples use equal (which has been deprecated since Node v9), and corrected all occurrences to use strictEqual. Bravo, Nikita!
Kirill Bezuglyi changed the REPL to use StaticAllocator, a wrapper around Zig’s ArenaAllocator that allows disabling dynamic allocations after a particular point in the execution (REPL init, in this case). Static memory allocation for the win, kudos Kirill!
Ike corrected some typos in the new documentation that we are currently in the process of rolling out. Thank you, Ike!
On IronBeetle, matklad worked on building a brand new code review system, while Brian’s TigerBeetle Stream revolved around addressing code review comments on his open PR for TigerBeetle’s Rust client! Join us every week on twitch.tv/tigerbeetle or catch up on the TigerTube.
We were at HYTRADBOI in the form of Talk #18! matklad presented (remotely) Rocket Science of Simulation Testing!, covering recent fuzzing lessons learned at TigerBeetle.
The hard part about DistSys is not the algorithms or coding, but the years (!) spent testing. You can speed this up (literally) with Deterministic Simulation Testing. But how? Joran and Dominik met on the airwaves to share lessons learned building their own simulators at Resonate and TigerBeetle.
The third edition of Systems Distributed will take place in Amsterdam, 19-20 June! We can’t wait to be together again in person with worldwide friends of distributed systems. Our venue is booked, our speakers are prepping, and Amsterdam’s watery wonders await. With SIGMOD taking place 22-27 June in Berlin, the time is right!
‘Till next time… play it live!
The TigerBeetle Team












