July in TigerLand
Dear friends,
We hope your July was a joy! TigerBeetle turned 4 years old: the first few lines of code were written in July 2020. This July, we shipped multiversion binary upgrades and powerful new query methods, announced our $24M Series A led by Natalie Vais of Spark Capital, and had more than 2,000 people join us live on Twitch for IronBeetle.
Let’s go!
- Here’s a story from one of our “Walk and Talk” calls, which took place a few weeks before TigerBeetle’s first production release was cut. The problem under discussion was how to upgrade a TigerBeetle cluster whenever a new version is released.
It was a tricky problem that had been keeping us up for months.
If different replicas run different binaries, no matter how good our testing might be, subtle business logic changes across versions might still lead to split brain.
This is a risk faced by every distributed database, and the typical solution is to replay a stream of historical traffic against the new binary, compare with the old, and hope that will catch any difference. However, for TigerBeetle, we didn’t want to violate a powerful guarantee that TigerBeetle gives the operator: that data files across the cluster can be verified to be logically byte-for-byte identical.
We also considered scenarios where, even if an upgrade were impeccable, a lagging replica might rejoin after the upgrade and replay old requests already executed by the rest of the cluster on the previous version, but now on the new version!
To top it off, we wanted the user experience to be as simple as dropping in a new binary on each machine… and that’s it. We didn’t want to delegate upgrades to tools, scripts, or checklists, or have the user coordinate anything.
So we asked ourselves: “If we’re going to be non-deterministic, how can we be non-deterministic deterministically (and so still be deterministic!)?”
We decided to co-design the upgrade process into TigerBeetle’s VSR, making the consensus protocol itself aware of upgrades, and upgrading only once a majority of replicas have the new version available, from exactly the same client request onwards. This ensures that differences between versions cannot do any harm, since a given request will always be executed by the exact same binary version across the cluster, not only during an upgrade, but even when the log is replayed.
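A toy model may help make the idea concrete. This is a simplified sketch in Python with hypothetical names, not TigerBeetle’s actual VSR code: the cluster agrees on a single operation number at which to switch versions, and the version executing any given operation is then a pure function of the log.

```python
# Simplified sketch of consensus-coordinated upgrades (hypothetical names,
# not TigerBeetle's actual VSR implementation).

def upgrade_op(advertised, new_version, next_op, replica_count):
    """Return the op at which the cluster switches to new_version, or None
    while fewer than a majority of replicas have it available."""
    ready = sum(1 for versions in advertised if new_version in versions)
    majority = replica_count // 2 + 1
    return next_op if ready >= majority else None

def version_for_op(op, upgrade_points):
    """Pick the version that must execute `op`, given the agreed (op, version)
    switch points: the same answer during live execution and during replay."""
    version = upgrade_points[0][1]
    for switch_op, v in upgrade_points:
        if op >= switch_op:
            version = v
    return version
```

Because `version_for_op` depends only on the switch points recorded in the log, a lagging replica replaying old requests picks exactly the binary version the rest of the cluster used for them.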
With this plan in mind, there was a deliberate pause between our first release 0.15.3 and the recent 0.15.4 release that resumed our weekly release cadence. We used the time to design the new protocol for version upgrades and a novel multiversion binary format that conveniently packs all interim versions necessary for upgrading a running cluster (from one version to another) into a single downloadable executable (or “multiversion binary”).
- To upgrade a cluster on Linux, simply replace the existing TigerBeetle executable with the new version. There’s no need to restart the process. It will continue running and automatically detect and advertise the new binary. Once the cluster confirms that enough replicas have the new version, it will coordinate the restart. Downtime is limited to the second or two it takes to start the new version and allocate memory, appearing as a minor latency spike to clients (around a second if your grid cache is 1 GiB).
Check out the documentation for details.
To further enhance the developer experience running on Windows and macOS, the multiversion binary format allows the same upgrade path for development clusters. However, the fully automated process (drop in the new binary) is only supported on Linux. Development environments on other platforms will still require the user to restart the replicas manually to trigger the upgrade 🤓.
We’ve mentioned before how TigerBeetle’s storage engine was designed from the ground up to use the disk as a raw block device rather than relying on commodity filesystems (which typically don’t handle storage faults safely). Now, we’ve taken the idea of being a filesystem further by introducing our own inspection tool (à la fsck and chkdsk) to diagnose problems in the wild, and allow curious users to explore TigerBeetle’s on-disk data format!
You can try it with “tigerbeetle inspect --help”. ⌨️
- Speaking of storage, TigerBeetle’s LSM-Forest design excels in processing write-intensive OLTP workloads thanks to how its compaction algorithm works:
“The secret to this performance is a process called compaction, which moves data from the top of the tree further down, with nice sequential writes.”
However, the fastest compaction algorithm may not be the most efficient for disk space usage. We might choose to optimize for speed (low write amplification) or space efficiency (low space amplification), or find a balance between the two. But, as with many things in life, you can’t have it all!
Here’s an example of recent experiments we’ve been doing with 975 different compaction strategies over ~50 different workloads. The exact algorithms in this chart aren’t important. The x-axis represents writes and the y-axis represents space; in both cases, higher is worse.
As we can see (while the graph does make a beautiful “T”!), it’s still more complex than simply picking the best algorithm: there isn’t a single algorithm that excels in both respects! Instead, we plan to apply different strategies depending on the information at hand. For instance, we might prioritize low write amplification for hot/small data and low space amplification for cold/big data.
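To make the tradeoff concrete, here is a back-of-the-envelope model in Python using the textbook approximations for leveled vs. tiered compaction (illustrative only; these are not TigerBeetle’s actual strategies):

```python
# Textbook back-of-the-envelope LSM amplification model (illustrative only;
# not TigerBeetle's actual compaction strategies). `levels` is the tree depth,
# `fanout` the growth factor between adjacent levels.

def leveled(levels, fanout):
    # Leveled compaction rewrites each level's data ~fanout times as it fills:
    # low space amplification, but high write amplification.
    return levels * fanout, 1 + 1 / fanout  # (write_amp, space_amp)

def tiered(levels, fanout):
    # Tiered compaction defers merging, writing each entry roughly once per
    # level: low write amplification, but high space amplification.
    return levels, fanout  # (write_amp, space_amp)
```

With `levels=4` and `fanout=10`, leveled gives roughly (40, 1.1) and tiered (4, 10): each wins on exactly one axis, which is the “T” shape in the chart.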
We landed a stopgap fix to address a pathological case of high amounts of space amplification when processing tiny batches. It’s already proving quite effective!
- In a major step for TigerBeetle’s query engine, you can now query Accounts and Transfers by intersecting any combination of secondary indexes (user_data_128, user_data_64, user_data_32, code, and ledger) along with a timestamp range. These fields are typically associated with real-world business events, unlocking some of the most common (and interesting) use cases for querying TigerBeetle directly.
These new query methods were built upon the foundation we’ve been developing for several months. We were able to validate the design, and guarantee predictable query execution time that scales linearly with the number of results, regardless of the number of indexes involved. In other words, no query (no matter how complex, or how big your data) can “explode” or take forever to execute!
This is a valuable guarantee.
- TigerBeetle’s continuous fuzzing orchestrator (or CFO) extensively tested all the building blocks that enabled querying the database, including the awesome Zig-Zag merge join algorithm which is what enables TigerBeetle to stream the above index set operations so predictably.
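For a flavor of how a zig-zag merge join keeps costs proportional to the result set, here is a minimal sketch in Python over sorted, duplicate-free lists of integer timestamps (a simplified illustration, not TigerBeetle’s implementation):

```python
# Zig-zag (a.k.a. leapfrog) intersection of sorted index postings: instead of
# scanning each list fully, every list repeatedly seeks forward to the largest
# candidate key seen so far, so work tracks the number of matches rather than
# the size of any single index. Simplified sketch, not TigerBeetle's code.
import bisect

def zigzag_intersect(lists):
    if not lists or any(not l for l in lists):
        return []
    results = []
    positions = [0] * len(lists)
    candidate = lists[0][0]
    matched = 0  # how many consecutive lists agree on `candidate`
    i = 0
    while True:
        l = lists[i]
        # Seek this list forward to the first key >= candidate.
        pos = bisect.bisect_left(l, candidate, positions[i])
        positions[i] = pos
        if pos == len(l):
            return results  # one list is exhausted: no further matches
        if l[pos] == candidate:
            matched += 1
            if matched == len(lists):  # all lists agree: emit a result
                results.append(candidate)
                candidate += 1  # move past the match
                matched = 0
        else:
            candidate = l[pos]  # leapfrog to the larger key
            matched = 1
        i = (i + 1) % len(lists)
```

The `bisect_left` calls stand in for seeks within an index: each list only ever moves forward, and a miss in any list immediately advances the candidate past whole runs of non-matching keys.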
Fuzzing databases, and especially queries, can require plenty of memory and complexity: the fuzzer has to recreate the database logic in memory and compare its expected results against the matches returned by the database.
So we introduced a novel fuzzing approach that starts with a randomized, arbitrarily complex query and then back-populates the database with a deterministic mass of tests. Since everything can be deterministically validated from a seed, the only state we need to keep in memory is the expectation of what should match the predicate (positive space) and what shouldn’t (negative space).
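A tiny sketch of that idea in Python (hypothetical field names; an illustration of the approach, not the actual fuzzer): derive a random predicate from a seed, back-populate records that must match it (positive space) and records that must not (negative space), then check that the “query” returns exactly the positive space.

```python
# Query-first fuzzing sketch: everything is derived deterministically from a
# seed, so the only state kept in memory is the expected positive/negative
# split. Hypothetical fields; not the actual CFO code.
import random

def fuzz_query_roundtrip(seed, n=100):
    rng = random.Random(seed)
    # A randomized predicate over two hypothetical indexed fields.
    code, ledger = rng.randrange(1, 10), rng.randrange(1, 10)
    predicate = lambda r: r["code"] == code and r["ledger"] == ledger
    # Back-populate: half the records are built to match, half to miss.
    positive = [{"code": code, "ledger": ledger} for _ in range(n // 2)]
    negative = [{"code": code + 1, "ledger": ledger} for _ in range(n - n // 2)]
    database = positive + negative
    rng.shuffle(database)
    # "Query" the database and check it returns exactly the positive space.
    matches = [r for r in database if predicate(r)]
    assert len(matches) == len(positive)
    return True
```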
We increased fuzzer coverage in general by mocking TigerBeetle’s entire IO layer to fuzz the real storage implementation, instead of only running tests over mocked storage.
We also landed many fixes and improvements, with some highlights:
Our friend Igor Kolomiets noticed inconsistent balances when waiting for pending transfers to expire and contributed a detailed bug report outlining his findings step-by-step. We discovered that the trigger for executing periodic tasks, such as expiring timed-out pending transfers, was functioning correctly during normal operation. However, it relied on incoming requests to fire, meaning that when the cluster was idle, the task wouldn’t run until a new request arrived.
Thanks to this report, we not only fixed the bug but also learned how to improve our deterministic simulator to cover scenarios where the cluster is completely idle.
We fixed a liveness bug where a replica could get stuck in state sync or repair. Normally, messages exchanged between replicas are tagged with the sender’s replica index, but some messages, like repair blocks received from another replica’s storage, aren’t tied to a specific replica index, so the replica tag is set to zero. This caused the receiving replica to drop such messages with an “unexpected peer” error whenever the sender was not actually replica 0.
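In essence, the fix is to skip the peer check for replica-agnostic messages; a minimal sketch in Python (hypothetical names, not the real message-handling code):

```python
# Peer check sketch: messages carrying a replica index are still validated
# against the actual sender, but repair blocks are replica-agnostic (the tag
# arrives zeroed) and must not be dropped as an "unexpected peer".

def accept_message(command, message_replica, actual_sender):
    if command == "block":
        return True  # repair blocks aren't tied to a replica: skip the check
    return message_replica == actual_sender
```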
As data is persisted in TigerBeetle’s LSM-Forest, entries are removed from the WAL (the write-ahead log, essentially a ring buffer on disk) to make room for new incoming operations. We recently discovered a scenario where an operation could be stored across different replicas in a mixture of WALs and checkpoints (LSM-Forest), potentially impairing physical durability under storage fault conditions. This could happen if the data was spread across a commit-quorum of replicas but in different forms, making the repair process more heterogeneous than it optimally should be, given the actual number of storage faults.
To address this, we ensured that each replica retains the WAL entry even after it has been persisted in the LSM-Forest, at least until a majority of replicas in the cluster have confirmed that they have also persisted the operation.
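The retention rule itself is simple; a sketch in Python (hypothetical API, simplified from the real implementation):

```python
# A WAL entry may be released only once a majority of replicas confirm they
# have persisted the same operation in their LSM-Forest. Simplified sketch.

def can_release_wal_entry(persisted_acks, replica_count):
    """persisted_acks: set of replicas (including self) that checkpointed
    the operation past the WAL and into the LSM-Forest."""
    majority = replica_count // 2 + 1
    return len(persisted_acks) >= majority
```

Until that majority is reached, the entry survives in the WAL as an extra durable copy, keeping the repair process homogeneous under storage faults.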
We spotted and improved a suboptimal approach during replica repair. Previously, a replica would wait until all repairable headers were fixed before attempting to commit new operations. However, if all the headers after the checkpoint were present, the system could still proceed with committing, even if some headers from before the checkpoint were missing. This change allows the system to move forward without waiting for all preceding headers to be repaired, maximizing availability.
All TigerBeetle language clients (Java, Go, C#, and Node.js) received a much-needed refactor on how the FFI client handles concurrency from the language runtime. Previously, we statically allocated an array for all concurrent requests the client could possibly submit, lending these to the application during requests (limited by the concurrency_max parameter during client initialization). However, accessing a static array from multiple runtime threads caused significant friction, requiring synchronization when a client acquired and released a request. Given that dynamic allocations were inevitably occurring on the language side (such as Channels in Go or Tasks in C#), it made more sense to redesign the interface rather than maintain this statically allocated buffer.
Now, the interface between the FFI client and the language runtime is simpler, since each platform can allocate its own request and pass it intrusively through the FFI client, removing the need for a hard limit on the maximum number of concurrent calls (and eliminating the annoying ConcurrencyMaxExceeded exception altogether).
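The shape of the change can be sketched in Python (hypothetical types; the real clients implement this in their FFI layers): each call allocates its own request and links it intrusively into the submission queue, so there is no shared slot pool, no lock to acquire a slot, and no hard concurrency ceiling.

```python
# Intrusive, caller-allocated requests (hypothetical types, not the real FFI
# bindings). Each request owns its payload and carries its own queue link, so
# submission is just pointer threading: no static pool, no slot acquisition.

class Request:
    def __init__(self, payload):
        self.payload = payload  # owned by this call alone
        self.next = None        # intrusive link used by the submission queue

def submit(queue_head, request):
    """Link a caller-allocated request into the intrusive submission queue;
    returns the new head."""
    request.next = queue_head
    return request
```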
- We also improved how inflight requests are canceled when closing/disposing the client, introducing a new error code ClientShutdown that might be returned when awaiting a request.
Please refer to the documentation for more details about the client libraries!
Speaking of TigerBeetle clients, starting from version 0.15.4, we began enforcing hardware acceleration (AES-NI / AVX2) for fast AEGIS checksums. Previously, this requirement was only enforced for the TigerBeetle cluster, but the performance penalty for using a software AES implementation for AEGIS checksums was significant enough that we decided to extend the requirement to the clients as well. While most modern CPUs already support these instructions, very old CPUs might no longer be able to run the TigerBeetle client.
Every day is a TigerStyle day 🎉! We enforce a maximum line length of 100 columns to:
- Make code (and especially diffs) more readable.
- Stay within the human field of view, reducing the chance of misreading a line due to code being hidden behind a horizontal scroll bar.
- Ensure a well-defined limit that allows us to configure our code editors for the best experience (font size, column wrapping, etc.).
We completely cleaned up the codebase, eliminating all special cases (aka the “naughty” list) where we previously allowed more than 100 columns. Now, we are aiming to do the same by limiting the maximum number of lines per function!
- Thanks to Jora Troosh for adding a custom formatter for displaying units in error messages and for correcting typos and misspellings!
We’ve known Natalie Vais as a database engineer (former Oracle, Google Firestore) and technical investor since before TigerBeetle was a company, with Joran having met Natalie through Jamie Brandon’s HYTRADBOI DBMS conference, connecting on the design of TB’s LSM-Forest. Natalie was instrumental in our Series Seed, and has been a big believer in TigerBeetle since.
It’s therefore a tremendous privilege to welcome Natalie to the board and have Spark Capital lead TigerBeetle’s Series A, with participation from Lenny Pruss of Amplify Partners and Stefan Thomas of Coil, as well as angels including 🚀 Alexander Gallego, Founder and CEO of Redpanda; Uriel Cohen, Co-Founder and Executive Chairman of Clear Street; Sachin Kumar, Co-Founder and CTO of Clear Street; and Alex Rattray, former Stripe and Founder of Stainless.
The journey of TigerBeetle has been an incredible story for us as a team. As we’ve come to appreciate, it’s the story of rediscovering transaction processing from history and first principles.
News of the round was covered by TechCrunch, Finextra, Fintech Futures, and made the front page of Lobsters and Hacker News.
IronBeetle is our weekly live stream into TigerBeetle internals… and it’s taking off!
The last two episodes both had more than 2,000 people joining matklad live on Twitch to walk through TigerBeetle code.
Mark your calendars🗓️ and join us live every Thursday.
Thursdays / 10am PT / 1pm ET / 5pm UTC
Stay tuned for the live premieres of SD’24 talks coming to TigerBeetle’s YouTube channel. The post-production is a wrap, and we’re excited to share the talks with you any day now!

Till next time… “much obliged”!
The TigerBeetle Team

An early sketch by Joy for our Series A illustration.