November in TigerLand

    Dear friends,

    Hope you had a splendid November. This month, we landed Vöʀᴛᴇx, a new full-system integration test that covers language clients, fixed a few subtle bugs in our networking & IO implementations, kickstarted Anish Bhadri’s new podcast, and saw TigerBeetle featured in Kris Hansen’s new white paper on modern high performance financial ledger architecture.

    Let’s go!

    Whenever you find yourself on the side of the majority, it is time to pause and reflect.

    Mark Twain

    • Deterministic Simulation Testing (DST) has been invaluable for TigerBeetle in uncovering countless edge cases and rare failures by simulating the entire system with precise control over timing, concurrency, and fault injection. We’ve dedicated 1,000 CPU cores to running the VOPR, to detect bugs as quickly as possible. 🐞

    Nevertheless, TigerBeetle’s philosophy of craft is that it’s not only machine, but human and machine. So we added this to TigerStyle to ensure defense-in-depth through both autonomous testing and human reasoning:

    “Recently the VOPR has gotten too good, and it’s very tempting to switch to an empirical mode of coding: write some code and let the VOPR figure out whether it is correct or not. This is suboptimal - silence of the VOPR doesn’t guarantee total absence of bugs, and safety comes in layers and cross-checks”

    Reinforcing the need for defense-in-depth, we’re building an exciting new testing tool: Vöʀᴛᴇx, a full-system integration test that also covers the language clients. If we think of deterministic simulation testing as a flight simulator that recreates real-world conditions in a virtual environment, then Vortex would be more like a wind tunnel, pushing the real plane to its physical limits. 🌪️

    • Vortex runs the TigerBeetle binary on actual infrastructure (real OS, network, and storage) and stresses it using real client libraries while injecting crashes and faults. Unlike the VOPR, we can’t accelerate time in Vortex, but we gain the ability to test our expectations of how the underlying OS components actually behave.

    • We fixed a bug that only manifests with asymmetrical TCP disconnections—where the client detects the disconnection and reconnects, but where the replica hasn’t processed the event yet. This left the replica holding a connection object for the existing client ID, tripping an assertion and crashing the replica.

    • TigerBeetle uses io_uring on Linux for async I/O, enabling zero-copy memory through user-space buffers while utilizing the kernel as a thread pool to execute operations.

    Another find was a subtle use-after-free bug when closing a client or replica (and freeing the memory) while buffers are still referenced by a running I/O operation!

    We addressed this by canceling all submitted operations and waiting for their completion (which might quickly “complete” as ECANCELED) before freeing the memory. In the future, if we bump our minimum supported Linux kernel version to 5.19 or later, we can simplify this logic by using the IORING_ASYNC_CANCEL_ALL and IORING_ASYNC_CANCEL_ANY flags to cancel all I/O operations with a single call. 🚀
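    The ordering described above (cancel, drain every completion, then free) can be illustrated with a toy model. This is only a sketch of the lifecycle rule, not TigerBeetle's code and not real io_uring: every name here (toy_connection, toy_submit, toy_reap_one, toy_close) is hypothetical, and "reaping" is simulated in-process rather than read from a completion queue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_OPS 8

/* One simulated I/O operation (hypothetical; stands in for a submitted request). */
typedef struct {
    bool in_flight;
    bool cancel_requested;
    int result; /* Set on completion: 0 on success, -ECANCELED if canceled. */
} toy_op;

/* A connection whose buffer may still be referenced by in-flight operations. */
typedef struct {
    char *buffer;
    toy_op ops[TOY_MAX_OPS];
    int in_flight_count;
} toy_connection;

toy_connection *toy_open(size_t buffer_size) {
    toy_connection *conn = calloc(1, sizeof(*conn));
    assert(conn != NULL);
    conn->buffer = malloc(buffer_size);
    assert(conn->buffer != NULL);
    return conn;
}

/* Marks a free slot as submitted; returns the slot index, or -1 if full. */
int toy_submit(toy_connection *conn) {
    for (int i = 0; i < TOY_MAX_OPS; i++) {
        if (!conn->ops[i].in_flight) {
            conn->ops[i] = (toy_op){ .in_flight = true };
            conn->in_flight_count += 1;
            return i;
        }
    }
    return -1;
}

/* Simulates reaping one completion; returns false if nothing is in flight. */
static bool toy_reap_one(toy_connection *conn) {
    for (int i = 0; i < TOY_MAX_OPS; i++) {
        toy_op *op = &conn->ops[i];
        if (op->in_flight) {
            op->result = op->cancel_requested ? -ECANCELED : 0;
            op->in_flight = false;
            conn->in_flight_count -= 1;
            return true;
        }
    }
    return false;
}

/* Cancels everything, waits until nothing is in flight, and only then frees.
 * Returns how many operations completed as -ECANCELED. */
int toy_close(toy_connection *conn) {
    for (int i = 0; i < TOY_MAX_OPS; i++) {
        if (conn->ops[i].in_flight) conn->ops[i].cancel_requested = true;
    }
    while (conn->in_flight_count > 0) {
        bool reaped = toy_reap_one(conn);
        assert(reaped);
    }
    int canceled = 0;
    for (int i = 0; i < TOY_MAX_OPS; i++) {
        if (conn->ops[i].result == -ECANCELED) canceled += 1;
    }
    /* Safe now: no operation can still reference the buffer. */
    free(conn->buffer);
    free(conn);
    return canceled;
}
```

    The point of the model is the invariant in toy_close: memory is released only after in_flight_count reaches zero, so a canceled operation that completes late can never touch a freed buffer.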

    • TigerBeetle now supports running on XFS! While TigerBeetle is designed to run on any filesystem (or without a filesystem on raw block devices), we caught XFS presenting an unexpected behavior, returning EAGAIN during reads, even when reading from a blocking file without flags like RWF_NOWAIT—have you ever!

    The issue was discovered while running a scale test of 100 billion transfers. Since ext4 doesn’t support single files larger than 16 TiB, we had to move to XFS, and are setting our sights on 1 trillion transfers next. 😎

    • The integration tests also brought improvements to the client libraries:

    • We added a log handler to the tb_client library (the underlying C library from which all language libraries are built).

    Instead of logging to stderr, the new log handler will allow applications to capture TigerBeetle events within their preferred logging framework (such as Log4j for Java).

    • A feature we deliberately left unimplemented was handling client evictions on the client side.

    TigerBeetle enforces strict limits, including a maximum number of clients. A client eviction typically points to a latent deployment issue (e.g., creating too many clients). Early on, we chose to let such failures be loud and clear (to fail fast and crash the client process) rather than masking potentially serious issues behind automatic reconnection attempts.

    Now, with our improved logging infrastructure and testing, the time has come to introduce a proper mechanism for applications to handle client evictions. Applications can now catch evictions and decide whether to reconnect or take alternative actions, providing more flexibility while preserving visibility into underlying issues.

    • We fixed an issue in the tb_client library where uninitialized fields could be accessed when closing clients concurrently. This bug was uncovered and reproduced thanks to integration tests! 🛠️

    • With applications now able to reconnect after evictions, a resource leak in the Java client was exposed when creating/closing many clients in a loop. This issue was caused by how we (incorrectly 🤦‍♂️) managed the lifecycle of the internal tb_client thread in the Java Virtual Machine (JVM).

    The JVM requires native threads to be “attached” before they can manipulate Java objects, and also mandates that they be properly “detached” when finished. Otherwise, orphaned thread handlers accumulate on the Java side, causing memory consumption to grow over time.

    Since detaching the native thread must be performed by the thread itself (it cannot be done by the user thread calling client.close()), the Java client now taps into the operating system’s thread-local storage finalizer to perform the cleanup when the native thread exits: pthread_key_create on Linux/macOS and FlsAlloc on Windows.
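    As a rough illustration of the POSIX half of this mechanism, the sketch below registers a destructor with pthread_key_create and lets it run when a thread exits. It is a minimal example under stated assumptions, not the Java client's actual code: thread_state, on_thread_exit, and run_native_thread_once are hypothetical names, and the "detach" step is only simulated by clearing a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state that only the owning thread may release
 * (a stand-in for a JVM attachment). */
typedef struct {
    int attached;
} thread_state;

static pthread_key_t state_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static int cleanup_calls = 0; /* Written by the exiting thread, read after join. */

/* Runs on the exiting thread itself, because its slot holds a non-NULL value. */
static void on_thread_exit(void *value) {
    thread_state *state = value;
    state->attached = 0; /* Assumption: a real client would detach from the JVM here. */
    cleanup_calls += 1;
    free(state);
}

/* Creates the key once and attaches the destructor to it. */
static void create_key(void) {
    int rc = pthread_key_create(&state_key, on_thread_exit);
    assert(rc == 0);
}

static void *native_thread_main(void *arg) {
    (void)arg;
    pthread_once(&key_once, create_key);

    thread_state *state = malloc(sizeof(*state));
    assert(state != NULL);
    state->attached = 1;

    /* Storing a non-NULL value arms the destructor for this thread. */
    int rc = pthread_setspecific(state_key, state);
    assert(rc == 0);

    return NULL; /* No explicit cleanup: the destructor runs during thread exit. */
}

/* Starts one native thread, waits for it to exit, and returns how many times
 * the destructor has run so far. */
int run_native_thread_once(void) {
    pthread_t thread;
    int rc = pthread_create(&thread, NULL, native_thread_main, NULL);
    assert(rc == 0);
    rc = pthread_join(thread, NULL);
    assert(rc == 0);
    return cleanup_calls;
}
```

    Calling run_native_thread_once() from a main function (compiled with -pthread) shows the counter increasing by one per thread, without the joining thread ever performing the cleanup itself. That is the property the newsletter relies on: the work happens on the thread that is exiting.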

    • Our friend Phil Davies discovered a discrepancy in the definition of AMOUNT_MAX (the sentinel value used to represent “transfer as much as possible”) in the Java client. Thank you for submitting a fix for this, Phil!

    • Thanks to Liam Swayne for contributing a formatting improvement for keywords to TigerStyle.

    Matthew Tolman wrote a thoughtful essay comparing TigerBeetle’s approach to reliability (through NASA’s The Power of Ten Rules) with that of Erlang.

    The main reason I started with Zig is because of TigerBeetle and their methodology for making reliable systems, which is based on what NASA does (or at least did). Their approach is the polar opposite of Erlang’s approach. Instead of automated error recovery through subsystem restarts, they use strict programming patterns to build analyzable systems which operate in knowable bounds and memory spaces. With these limitations, they’re able to prevent many classes of errors.

    Matt Blewitt put together a list of 7 databases he found interesting, to challenge his readers to spend a week playing with each of them, including TigerBeetle. Thank you, Matt!

    Kris Hansen, CTO of Sagard, has decades of experience in core banking, including roles as CTO of Synctera, director of solutions for the Royal Bank of Canada, and Chief Architect at ATB Financial.

    Based on his experience, Kris wrote a white paper on patterns and practices for architecting high performance financial ledgers, to describe a modern reference architecture powered by TigerBeetle.

    We love to paddle for the backline and catch the swell before it breaks. In this spirit, we helped kickstart Anish Bhadri’s new Stack Analysis podcast!

    Joran joined as featured guest in the podcast’s debut episode, to share the origin story of TigerBeetle’s journey, with a behind the scenes look at our philosophy of people, business and engineering, and the lessons learned to make it possible.

    The Anatomy of a TigerBeetle - Ep01

    Missed any of our live streams? Catch up on all the episodes in our IronBeetle YouTube playlist!

    Episode 001: Intro and (Absence of) Message Parsing

    Highlights this month:

    • In Episode 048, Matklad and Tobias reviewed TigerBeetle’s main performance bottleneck.
    • In Episodes 049 and 050, Matklad explored everything you ever wanted to know (or not know!) about time.

    Join us live every Thursday to walk through TigerBeetle.

    Thursday / 9am PST / 12pm EST / 5pm UTC twitch.tv/tigerbeetle


    Till next time…“hello, and welcome to our jazz club”!

    The TigerBeetle Team

    Stanley Kubrick is a favorite director. His 2001: A Space Odyssey (more than half a century ago) remains timeless, and a reference in this sketch by Joy Machs announcing the accelerated time travel of VOPR1000—TigerBeetle’s deterministic simulator now running on 1000 dedicated CPU cores, autonomously testing 2 millennia of simulated DBMS runtime, each and every day.
