October in TigerLand
Dear friends,
Hope your October went swimmingly! This month, we deployed our VOPR simulators on 1000 dedicated CPU cores to simulate up to 2 millennia of database runtime every day. We also doubled down on consensus safety and availability, optimized compaction sort performance, released two TigerTalks (Architecture of Trust and Just In Time LSM Compaction), and took TigerBeetle on tour to the Interledger Hackathon and Summit.
Let’s go!
“The Only Constant in Life Is Change.”
Heraclitus
It’s been a year since we introduced CHANGELOG.md as part of our release process! Each week, a different team member steps in as release manager, capturing the motivation and backstory behind each PR in the release. It’s a great way to share not only knowledge within the team, but also our musical taste through TigerTracks! 🎧
We’re excited about the boost we got by upgrading the VOPR – TigerBeetle’s own deterministic simulator – to run on a thousand dedicated CPU cores (previously, it ran on 100 cores)!
Hetzner sent us a special email asking if we were sure we wanted so many cores. That’s right: with VOPR-1000, we now explore 1,000 different deterministic scenarios simultaneously, 24x7x365, to catch rare conditions as early as possible, long before they could reach production.
With time abstracted deterministically, and accelerated in the simulator by a factor of roughly 700x, this adds up to nearly 2 millennia of simulated database runtime per day.
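The arithmetic behind that claim checks out. A minimal back-of-the-envelope calculation, using the figures from this post (1,000 cores, roughly 700x acceleration):

```python
# Back-of-the-envelope check of the "nearly 2 millennia per day" claim,
# using the figures quoted above (both are the post's numbers, not measurements).
cores = 1_000
acceleration = 700  # simulated time vs. wall-clock time, roughly

simulated_days_per_day = cores * acceleration            # 700,000 simulated days
simulated_years_per_day = simulated_days_per_day / 365.25
print(round(simulated_years_per_day))                    # 1916, nearly two millennia
```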
We’re already feeling the increased velocity in discovering interesting simulation seeds.
TigerBeetle was designed to exchange replicated durability for optimal availability. Building on this, we optimized TigerBeetle’s VSR consensus protocol to make view changes quicker, reducing downtime visible to users when an old primary becomes unavailable.
We realized that view changes could start earlier in new-primary recovery, with the replica only having to repair the essential portion of the WAL needed to become primary rather than the entire WAL (write-ahead log)! The VOPR then extensively tested this new strategy.
We fixed several liveness issues found by the VOPR: cases where a corrupt replica (or the entire cluster) might become permanently unavailable despite having enough cluster durability to recover.
If you’re into consensus protocols, you’ll want to check out the awesome write-ups in the PR descriptions! 👀
TigerBeetle DevHub got a fresh coat of paint! 🎨
We did extensive refactoring to clarify the storage engine’s compaction scheduler logic! Compaction is the process that moves data from the top of the LSM-Tree further down, to free up space for new insertions as the tree grows. In traditional LSM-Tree designs, if compaction lags behind insertions, this can cause undesirable write stalls, forcing clients to wait for compaction to catch up. However, because TigerBeetle knows the maximum number of insertions the next request might carry (thanks to explicit limits on everything), it breaks compaction down into incremental units of work and schedules them to match insertions with predictable latencies, even under intense OLTP workloads.
The new compaction scheduler has decoupled CPU, I/O, and memory, allowing it to schedule more aggressively and making it easier to run multiple compactions concurrently. This refactor even boosted TPS performance by ~13%, even though that wasn’t the primary goal of this PR! 💪
If you’re curious about how TigerBeetle compacts LSM-Trees during insertions without a background worker, check out Matklad’s IronBeetle episodes Compaction Strikes Again, Part I, and Part II. 📺
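The pacing idea above can be sketched in a few lines. This is a hypothetical illustration in Python, not TigerBeetle’s actual Zig implementation; the function name and numbers are invented for clarity:

```python
# Hypothetical sketch: spread a known upper bound of compaction work evenly
# across the beats of a bar, so compaction keeps pace with insertions and
# writers never stall. Names and figures are illustrative only.

def compaction_work_per_beat(total_work_units: int, beats_per_bar: int) -> list[int]:
    """Divide total_work_units across beats_per_bar incremental steps."""
    base, remainder = divmod(total_work_units, beats_per_bar)
    # Front-load the remainder so work finishes no later than an even pace would.
    return [base + (1 if beat < remainder else 0) for beat in range(beats_per_bar)]

schedule = compaction_work_per_beat(total_work_units=10, beats_per_bar=4)
print(schedule)  # [3, 3, 2, 2]
```

Because every request has an explicit upper bound on insertions, the scheduler can compute such a schedule up front instead of reacting to backlog after the fact.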
We significantly reduced P100 latency by changing how we sort values in memory before LSM compaction. Just as in music, we call each unit of work a beat and a group of beats a bar. Instead of sorting all values at once at the end of a bar, we now incrementally sort small chunks within each beat, leveraging the fact that sort algorithms are optimized for handling sequences of already sorted sub-arrays, just as a musician can play faster when all the notes fall within a familiar scale. 🎶
To ensure the correctness of this approach, we added new tests to Zig’s standard library sort algorithm. These tests cover the presorted subarray case and check arrays that might be larger than the sort algorithm’s on-stack cache.
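As an illustration of the idea (in Python, whose Timsort is likewise optimized for presorted runs; this is not the Zig code described above, and the chunk size is invented):

```python
import heapq

# Illustrative sketch: sort small chunks as they arrive in each "beat",
# then combine the presorted runs at the end of the "bar".
values = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
chunk_size = 4  # hypothetical beat size

# During each beat: sort only the chunk that just arrived.
runs = [sorted(values[i:i + chunk_size]) for i in range(0, len(values), chunk_size)]

# End of bar: merging k sorted runs is O(n log k), cheaper than a cold full sort.
merged = list(heapq.merge(*runs))
print(merged)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The final pass only merges already-sorted runs, which is exactly the case the presorted-subarray tests mentioned above exercise.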
We periodically run scale tests, inserting massive amounts of data (e.g., 100 billion transfers over 10 days) to evaluate how the database handles demanding OLTP workloads and continuously growing files.
This time, we improved the error message for cases where a data file becomes too large, providing clearer instructions for the operator to increase the memory allocated for the manifest log. We also simplified the code path in this scenario by making Grid.reserve() abort rather than return null when a reservation would exceed the data file size limit. Previously, we would panic by unwrapping a null result, but now the exit has clearer semantics and a helpful error message.
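For a sense of the sustained load this implies, a quick calculation from the post’s own figures (100 billion transfers over 10 days):

```python
# Sustained throughput implied by the scale test described above.
# Both inputs are the post's figures, not independent measurements.
transfers = 100_000_000_000
seconds = 10 * 24 * 60 * 60          # 864,000 seconds in 10 days
print(round(transfers / seconds))    # 115741 transfers per second, sustained
```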
The client libraries received improvements and bug fixes, notably for an issue in the MessageBus where connections weren’t being terminated during client teardown, eventually exhausting the process’s TCP connection limit.
Huge thanks to Phil Davies for providing a great script to reproduce the problem!
TigerBeetle clients internally batch operations for improved performance. We discovered and fixed a bug where an unclosed link chain could be batched before another linked chain, causing them to be treated as a single long chain. Additionally, we now ensure that non-batchable requests don’t share packets at all, improving batching consistency.
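The invariant behind that fix can be sketched as follows. This is a hypothetical Python illustration (the `linked` field mirrors the flag that chains a transfer to its successor; the helper name is invented):

```python
# Hypothetical sketch of the batching invariant: a batch must never end with
# an unclosed linked chain, or the next chain appended would fuse with it
# into one long chain. Not the actual client implementation.

def chain_is_closed(batch: list[dict]) -> bool:
    """The last transfer in a batch must not link forward to a successor."""
    return not batch or not batch[-1]["linked"]

batch = [{"id": 1, "linked": True}, {"id": 2, "linked": False}]  # chain is closed
assert chain_is_closed(batch)

batch.append({"id": 3, "linked": True})  # open chain: unsafe to append more
assert not chain_is_closed(batch)
```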
The REPL also received enhancements, including support for the AMOUNT_MAX sentinel and other maximum integer values represented as -0. We also added support for hexadecimal input, making it more convenient to work with GUID/UUID literals (e.g., 0xa1a2a3a4_b1b2_c1c2_d1d2_e1e2e3e4e5e6).
You can try the TigerBeetle REPL using the command line: tigerbeetle repl --cluster=0 --addresses=3000.
Each new TigerBeetle version maintains backwards compatibility with certain previous client releases, ensuring a smooth upgrade path for production environments. First, deploy the new binary version to the replicas, and the TigerBeetle cluster will reach consensus and upgrade itself with minimal downtime. After that, you can upgrade the application to use the latest TigerBeetle client at your own pace.
This month, we released extra binaries outside our weekly schedule to address availability issues caused by the interaction between older TigerBeetle clients and newly introduced features after the upgrade.
One hotfix relaxed an overly tight assertion that was triggered when transfers that had failed due to transient errors were retried by clients predating this feature.
Another hotfix ensured the correct size is expected for AccountFilter in get_account_transfers and get_account_balances calls from clients prior to the addition of filters by user_data_* and code.
To streamline the process and make hotfixes easier to release, we now allow the git tag to be bumped while keeping the release number unchanged.
Please refer to our release notes for more details.
Client documentation was revamped, and code examples were standardized across all languages to ensure that each snippet is self-contained (no undeclared variables, and no mutable variables used in place of constants).
Big thanks to Michiel de Jong for pointing this out!
We also fixed the documentation to add missing links for the query_accounts and query_transfers operations, and included the declarations for QueryFilter and QueryFilterFlags in the C tb_client.h header file.
After some meticulous research, our benchmark now supports Zipfian-distributed random numbers, allowing us to better simulate varying conditions and laying the groundwork for approximating YCSB.
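To show why Zipfian access patterns matter for a benchmark, here is a minimal inverse-CDF sampler in Python. This is an illustration only, not the benchmark’s actual implementation (which may use a faster method); `n` and `theta` are illustrative parameters:

```python
import random

# Minimal sketch of a Zipfian sampler via inverse-CDF lookup. Rank 1 is the
# hottest key; theta controls the skew (theta near 1 is heavily skewed).
def zipf_sampler(n, theta, rng):
    weights = [1.0 / (rank ** theta) for rank in range(1, n + 1)]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w
        cdf.append(acc / total)

    def sample():
        u = rng.random()
        # Linear scan for clarity; bisect.bisect_left(cdf, u) would be O(log n).
        for rank, p in enumerate(cdf, start=1):
            if u <= p:
                return rank
        return n

    return sample

sample = zipf_sampler(n=1000, theta=0.99, rng=random.Random(42))
hot = sum(1 for _ in range(10_000) if sample() <= 10)
print(f"{hot / 100:.0f}% of accesses hit the 10 hottest of 1,000 keys")
```

Under such skew, a small set of hot accounts absorbs a large share of all accesses, which is exactly the contention profile an OLTP benchmark needs to reproduce.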
Curious? Don’t miss the excellent write-up in the PR! 🤓
Our friend Phil Davies discovered a discrepancy in the definition of AMOUNT_MAX (the sentinel value used to represent “transfer as much as possible”) in the Java client. Thank you for submitting a fix for this, Phil!
Thanks to Liam Swayne for contributing a formatting improvement for keywords to TigerStyle.
Joran was invited onto the Future Money podcast to share his journey from a childhood passion for business and coding, to creating TigerBeetle and leading an international team. The conversation delves not only into the technical design and development of TigerBeetle as a financial transactions database, but the one word that brings everything (and everyone) together: trust.
At this year’s P99 conference, matklad explained how TigerBeetle’s Just-in-Time LSM-Tree compaction algorithm:
- uses only static memory allocation,
- paces work perfectly to solve write stalls for predictable P100s,
- and guarantees deterministically identical data files across all replicas for faster recovery.
The 2nd TigerBeetle Hackathon took place during the Interledger Hackathon in Cape Town on October 19-20! It was awesome to interact with so many brilliant people tackling real-world problems, coding, and brainstorming through the night, unstoppable!
🥈Second Place: The Usizo team leveraged TigerBeetle to power a microlending/crowdfunding platform within a community.

Andre, Jayden, Ethan, and Luke each received a Raspberry Pi Kit.
🥇First Place: The RootBank team implemented a cross-border insurance system using TigerBeetle!

Jordan, Daniil, Conner, and Josh each received a Steam Deck portable gaming console. 🎮
Join us live every Thursday to walk through TigerBeetle.
Thursday / 9am PST / 12pm EST / 5pm UTC twitch.tv/tigerbeetle
IronBeetle YouTube Playlist
Till next time… to the station!
The TigerBeetle Team
Is CAP’s definition of Availability in terms of “Physical (or Total) Availability” or “Logical Availability”? 🖖