December in TigerLand

Dec 23, 2024

Dear friends,

Wishing you a very merry December. This month, we published our brand new Python client, introduced more fault injection to VOPR and Vöʀᴛᴇx, and improved our logging, CLI, and REPL for a better developer experience. We also discovered and fixed a subtle memory swapping issue that would have potentially allowed the Linux kernel to undermine TigerBeetle’s storage fault tolerance.

Let’s go!

“The silver lining of correctness bugs: they are fun to debug.”

TigerBeetle adopts an explicit storage fault model—it detects and recovers from latent sector errors, disk corruption, and misdirected I/O where firmware or filesystem bugs might read or write the wrong sector. This behaviour is continuously tested by the VOPR, which uses Deterministic Simulation Testing to precisely control (and accelerate) time, concurrency, and fault injection, even knowing how to push faults close to the theoretical limit of TigerBeetle.
We recently discovered that it was possible to circumvent our storage fault model on systems where swap is enabled. Swap is a section of a disk that the OS uses to store inactive data from memory. With swap enabled, corrupt data on disk could be swapped into memory by the kernel!
This behaviour could not have been detected by the VOPR as it tests whether TigerBeetle correctly adheres to its storage, process and network fault models, but does not inject memory faults! TigerBeetle’s memory fault model explicitly recommends using memory protected with error-correcting codes (ECC), and we explicitly make no mitigations against memory corruptions (this would require byzantine fault-tolerant consensus).
To prevent swap from bypassing TigerBeetle’s storage fault tolerance, we now invoke the Linux mlockall syscall on startup. This locks all of TigerBeetle’s virtual address space into RAM, preventing the memory from being paged out to the swap area on disk, where it might be vulnerable to storage faults. Similarly, we invoke SetProcessWorkingSetSize on Windows. However, TigerBeetle running on macOS is still vulnerable to swap, as an equivalent syscall isn’t available on macOS! (At present, we support only Linux for production, with Windows and macOS for development.)
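As a rough illustration of the Linux side of this fix, here is a minimal Python sketch that calls mlockall through ctypes. It is not TigerBeetle's code (which is written in Zig); the function name lock_all_memory is hypothetical, and the flag values are assumed from the x86-64 and AArch64 Linux headers.

```python
import ctypes
import os
import sys

# Flag values from <sys/mman.h> on x86-64 and AArch64 Linux.
# Assumption: other architectures may define different values.
MCL_CURRENT = 1  # lock pages that are currently mapped
MCL_FUTURE = 2   # lock pages that get mapped later


def lock_all_memory() -> bool:
    """Try to pin the whole address space in RAM so it cannot be swapped out.

    Returns True on success, False if unsupported or not permitted.
    """
    if not sys.platform.startswith("linux"):
        return False
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        err = ctypes.get_errno()
        print(f"mlockall failed: {os.strerror(err)}", file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    print("memory locked:", lock_all_memory())
```

The call commonly fails for unprivileged processes whose RLIMIT_MEMLOCK is smaller than their address space, so a real service would have to decide whether to abort or continue in that case.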
For predictable latencies of queries over two (or more!) indexes, TigerBeetle employs the Zig Zag Merge Join algorithm – a technique that intersects correlated values across indexes, zigging and zagging between them, without having to buffer or sort anything in memory. As a result, no pathological query can “explode” in time (or space) or take forever to execute!
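To make the zigging and zagging concrete, here is a minimal Python sketch of the general technique over two sorted lists of unique keys. It is an illustration only, not TigerBeetle's implementation: the function name zig_zag_intersect is hypothetical, and in-memory lists with binary search stand in for index scans with a seek operation.

```python
from bisect import bisect_left


def zig_zag_intersect(a, b):
    """Intersect two sorted lists of unique keys by seeking, not scanning.

    Nothing is buffered or sorted: each side only ever jumps forward to the
    other side's current key.
    """
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Both indexes agree on this key: emit it and advance both.
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # "Zig": seek a forward to the first key >= b's current key.
            i = bisect_left(a, b[j], i)
        else:
            # "Zag": seek b forward to the first key >= a's current key.
            j = bisect_left(b, a[i], j)
    return result


if __name__ == "__main__":
    print(zig_zag_intersect([1, 3, 5, 7, 9], [3, 4, 5, 9, 10]))
```

Each seek skips every key that cannot be in the intersection, which is why the work is bounded by the indexes themselves rather than by an intermediate result held in memory.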
We recently discovered and fixed a correctness bug in our Zig Zag Merge Join implementation wherein some queries would erroneously end a scan too early, truncating the query results. This bug could manifest through the use of get_account_balances, get_account_transfers, query_accounts, and query_transfers when querying over multiple secondary indexes. To learn more, do read through the phenomenal PR description that explains not only the bug in greater detail, but also how it made it past all four (!) of the fuzzers that test TigerBeetle’s scans. 🤯
At TigerBeetle, we continue to remind ourselves of the need to practice the principle of defense-in-depth. This is why critical components of the code are tested using test suites of varying granularity. For example, Scans (the component that powers queries in TigerBeetle) are tested via unit tests, a dedicated scan fuzzer, auxiliary LSM tree and forest fuzzers, and the VOPR (via this exhaustive workload). This month, we continued to invest in testing, from improving unit tests and fuzzers, to adding new failure modes to VOPR and Vöʀᴛᴇx.
We added a new network failure mode to Vöʀᴛᴇx – packet corruption. Vöʀᴛᴇx now randomly shuffles & zeroes out bytes in network packets, all while asserting correctness – by performing a set of reconciliation checks that assert creation of all accounts & transfers is successful, and querying accounts & transfers fetches expected results.
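A packet corruption fault of this kind might look roughly like the following Python sketch. The function name corrupt_packet, the 50/50 choice between zeroing and shuffling, and the way the byte range is picked are all assumptions made for illustration, not a description of how Vöʀᴛᴇx does it.

```python
import random


def corrupt_packet(packet: bytes, rng: random.Random) -> bytes:
    """Return a copy of packet with one random byte range zeroed or shuffled."""
    data = bytearray(packet)
    if not data:
        return bytes(data)
    # Pick a non-empty range [start, end) inside the packet.
    start = rng.randrange(len(data))
    end = rng.randrange(start, len(data)) + 1
    if rng.random() < 0.5:
        # Zero out the range.
        data[start:end] = bytes(end - start)
    else:
        # Shuffle the bytes within the range.
        window = list(data[start:end])
        rng.shuffle(window)
        data[start:end] = bytes(window)
    return bytes(data)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded, so a failing run can be replayed
    print(corrupt_packet(b"hello, tigerbeetle", rng))
```

Passing in a seeded random.Random keeps the fault reproducible, which matters when a corrupted packet uncovers a bug that has to be replayed.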
Currently, VOPR tests for safety and liveness of our VSR implementation in the presence of processes crashing (and restarting). We added a new process fault mode to the VOPR – pauses. The motivation behind adding this new failure mode is to simulate the real-world scenario of VM migration and to unveil new, interesting interleaving of events!
Motivated by a correctness bug that we fixed in our Zig Zag Merge Join algorithm, we rewrote our Scan fuzzer. The fuzzer now tests the Zig Zag Merge Join implementation more effectively by choosing random objects, as opposed to carefully choosing the prefix of generated objects based on what query they should match.
We now assert that the CacheMap, a hybrid between a SetAssociativeCache and a HashMap (stash), is exclusive; keys in the cache must not be present in the stash.
Enabling assertions in production means that TigerBeetle either runs correctly according to the expectations encoded in the code, or else crashes to remain safe, as opposed to remaining available but compromising correctness. However, it is important to exercise caution while choosing which asserts to enable in production. Since the data plane is performance-critical, we gate this new assertion behind the constants.verify flag. To learn more about how we decide whether an assert should be enabled in production, do read through this amazing comment! 🚀
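The exclusivity invariant and its gating can be sketched in Python as follows. This is a toy model under stated assumptions: plain dicts stand in for the SetAssociativeCache and the stash, the oldest-entry eviction policy is invented for the example, and the VERIFY constant is a hypothetical stand-in for constants.verify.

```python
VERIFY = True  # hypothetical stand-in for the constants.verify flag


class CacheMap:
    """Toy two-level map: a bounded cache that spills evicted entries into a stash."""

    def __init__(self, cache_capacity: int):
        self.cache_capacity = cache_capacity
        self.cache = {}  # stands in for the set-associative cache
        self.stash = {}  # stands in for the hash map (stash)

    def upsert(self, key, value):
        # The newest value always lives in the cache, so drop any stale stash copy.
        self.stash.pop(key, None)
        if key not in self.cache and len(self.cache) >= self.cache_capacity:
            # Evict the oldest cache entry into the stash (assumed policy).
            evicted_key = next(iter(self.cache))
            self.stash[evicted_key] = self.cache.pop(evicted_key)
        self.cache[key] = value
        self.verify()

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        return self.stash.get(key)

    def verify(self):
        # This check walks the keys on every call, so it is gated behind the flag.
        if VERIFY:
            assert self.cache.keys().isdisjoint(self.stash.keys())


if __name__ == "__main__":
    m = CacheMap(cache_capacity=2)
    for k, v in [(1, "a"), (2, "b"), (3, "c"), (1, "z")]:
        m.upsert(k, v)
    print(m.get(1), m.get(2), m.get(3))
```

With the flag off, the check costs nothing on the hot path; with it on, any code change that leaves a key in both places fails immediately at the upsert that caused it.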
We finally landed the heavily requested Python client for TigerBeetle, which supports both synchronous and asynchronous usage for all APIs (create_accounts, create_transfers, get_account_transfers, get_account_balances, query_accounts, query_transfers, lookup_accounts, and lookup_transfers). It has no runtime dependencies, and bundles in the C libraries, much like the Go client. Be sure to read through the documentation, and start building! 👨‍💻 🐍
Vöʀᴛᴇx is our full-system integration test that also covers the language clients. It runs the TigerBeetle binary on actual infrastructure (real OS, network, and storage) and stresses it using real client libraries while injecting crashes and faults.
Currently, Vöʀᴛᴇx only runs on Linux, but we are taking steps towards making it platform independent. To that effect, this month we added custom network fault injection to Vöʀᴛᴇx. This change involves placing a TCP proxy in front of each replica, which intercepts all communication and injects basic network faults (packet loss, corruption, and delays). This rids us of our dependency on Linux tc/netem!
We also fixed a resource leak in Vöʀᴛᴇx wherein successfully initialized proxies weren’t deinitialized in the case where some proxies couldn’t be successfully initialized.
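The shape of this kind of leak, and one way to avoid it, can be shown with a short Python sketch. The Proxy class and start_proxies function are hypothetical and only model the pattern: if any initialization fails partway through, everything initialized so far must be torn down.

```python
from contextlib import ExitStack


class Proxy:
    """Hypothetical proxy that records whether it was closed."""

    instances = []  # every successfully constructed proxy, for inspection

    def __init__(self, name: str, fail: bool = False):
        if fail:
            raise RuntimeError(f"could not initialize proxy {name}")
        self.name = name
        self.closed = False
        Proxy.instances.append(self)

    def close(self):
        self.closed = True


def start_proxies(specs):
    """Initialize all proxies, or none: a failure closes the ones already started."""
    with ExitStack() as stack:
        proxies = []
        for name, fail in specs:
            proxy = Proxy(name, fail)
            # Register cleanup right after each successful initialization.
            stack.callback(proxy.close)
            proxies.append(proxy)
        # Everything succeeded: cancel the cleanups and hand ownership to the caller.
        stack.pop_all()
        return proxies


if __name__ == "__main__":
    try:
        start_proxies([("a", False), ("b", False), ("c", True)])
    except RuntimeError as error:
        print(error, [p.closed for p in Proxy.instances])
```

Registering the cleanup immediately after each successful step is the important part; the leak described above is what happens when cleanup only exists for the fully initialized case.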
TigerBeetle cluster upgrades are designed to happen without external operator coordination. Multiple versions of the code are compiled into a single TigerBeetle binary, so that a replica itself can decide which version should be running, and deploy its own upgrades. Specifically, once operators place a new TigerBeetle binary on enough replicas, they coordinate the release amongst themselves. This approach optimizes for safety (eliminating a large category of storage non-determinism) and operational simplicity.
We changed the count of replicas considered enough by the cluster to start coordinating an upgrade. Specifically, we now require all replicas in the cluster to agree to upgrade (as opposed to the old value of all - 1). The motivation behind this change is that in most cases, not upgrading all replicas together would be a mistake, leading to replicas lagging and needing to state sync. If an upgrade is needed while the cluster is compromised, then it should be a hotfix upgrade, which is a build tagged with the same release.
This month, we landed enhancements to operator and developer experience.
--cluster is now an optional parameter for the tigerbeetle format command. This allows operators to opt into a randomly generated cluster ID, which is logged by the format command.
Operators can now pass --log-debug --experimental to the tigerbeetle start command and enable debug logging at runtime, without recompiling the binary.
Cluster logs are now prepended with a UTC timestamp, allowing for easier correlation of logs between TigerBeetle replicas. 🕒
We improved the error logged when an operator accidentally runs a TigerBeetle client and cluster with incompatible releases (client_release_too_low/client_release_too_high).
We fixed a bug in the REPL wherein attempting to use multiple objects on operations that do not support them (for example get_account_transfers account_id=1, account_id=2) would cause it to panic.

Join us live every Thursday to walk through TigerBeetle.

Thursday / 9am PST / 12pm EST / 5pm UTC twitch.tv/tigerbeetle

Be sure to check out episodes 51, 52, and 53 from this month, where matklad covers the theory and practice of State Sync—the mechanism which lagging replicas use to catch up to the rest of the cluster!

IronBeetle YouTube Playlist Episode 001: Intro and (Absence of) Message Parsing

Looking back over the year, there’s much to be grateful for.

We raised a $24 million Series A led by Natalie Vais of Spark Capital, with participation from Lenny Pruss of Amplify Partners and Stefan Thomas of Coil.
Cut our first production release for Linux (version 0.15.3) in March!
Launched and distributed the Golden Disk – TigerBeetle on a floppy disk.
Saw OKTO, Thundr, Fynbos, Wallet Guru, Super Payments, GTXN, Pesawise, and many others go into production with TigerBeetle. TigerBeetle clusters around the world now process hundreds of millions of transactions a month.
Featured on the Nasdaq tower in Times Square as part of Redpoint’s 2024 InfraRed list of the top 100 transformative companies in cloud infrastructure.
Welcomed Chaitanya Bhandari, Joy Machs, Fabian Ruehle, Fabio Arnold, and Martijn Gonlag to the team.
Bid farewell to King Butcher.
Presented the 2nd Systems Distributed conference in New York City.
Hosted the 2nd TigerBeetle Hackathon, co-located with the Interledger Hackathon in Cape Town.
Upgraded our simulation infrastructure to run across 1000 CPU cores and simulate 2 millennia of TigerBeetle runtime every 24 hours.
Deployed TigerBeetle DevHub to track bugs found by our fuzzers, as well as key release metrics.
Spoke at the Interledger Summit in Cape Town.
Spoke at Software You Can Love in Milan, and demo’d an Italian version of the simulator in a cinema (hats off to Sergio Leone and Ennio Morricone!).
Spoke about TigerBeetle’s Just In Time LSM Compaction at ScyllaDB’s P99 conference.
Spoke about Durability and the Art of Consensus at Systems Distributed in New York City.
Spoke about The Next 30 Years of Transaction Processing at Money 20/20 Amsterdam, on the Mastercard Stage.
Spoke about Modern Systems Programming: Rust and Zig at the SEi conference in Brava.
Spoke about TigerBeetle: Database Design From First Principles at FOSSASIA Summit 2024 in Hanoi.
Spoke about the Performance And Reliability of TigerBeetle (audio only) at a meeting hosted by the CNCF Storage Technical Advisory Group.
Appeared live on The Primeagen to talk about TigerBeetle in front of a live audience of 2,000 people, with the video receiving a quarter of a million views thereafter.
Were interviewed on several podcasts by Anish Badri, Software Engineering Daily, FutureMoney, The Geek Narrator, Resonate Vibrations, A Thousand Features, Underwater and Open Source Startup Podcast.
Were invited to attend the Jim Gray-founded OLTP conference ♥️ the 20th International Workshop on High Performance Transaction Systems in Carmel, California, catching up with Tobias Ziegler, Viktor Leis, Colin Breck, Marc Brooker, Reuben Bond, Michael Cahill, Pat Helland, Shel Finkelstein, James Hamilton, and friends and heroes.
Sponsored the first TigerBeetle athlete, Joran’s sister Ghita, who competed in the IRONMAN Women’s World Championship in Nice, France.
Featured in Matt Blewitt’s 7 Databases in 7 Weeks for 2025 as “The Obsessively Correct Database”.
Featured in Kris Hansen’s whitepaper Architecting High Performance Financial Ledgers: Patterns and Practices for Modern Applications as the “High Performance Transaction Layer”.
Got matching hand drawn TigerTattoos.
And got together as a team in Milan, New York City, and Cape Town to offsite, including hiking up to spend the night teaching databases on top of Table Mountain (just look at those city lights!). 🌎