February in TigerLand
Dear friends,
We hope you had a first-rate February! Last month, we reduced view change latency by an order of magnitude with a new failure detector, doubled down on Vörtex, and boosted end-to-end performance by ~4%. Meanwhile, TigerStyle got a new visual home, and we made progress towards solving one of the hardest problems in Computer Science…
Let’s go!
Simplicity is a prerequisite for reliability
Edsger W. Dijkstra
At TigerBeetle, we believe in doing things right the first time, the best we know how. If we encounter a problem again, further down the road, then we redo it the best we know how.
- Failure detection is a fundamental problem in consensus-based systems. We had relied on a fixed heartbeat timeout to detect primary failure, after which backups would initiate a view change. However, this put a suboptimal lower bound of 5 seconds on detection latency. So we redesigned our failure detector to rely on the flow of prepares from the primary. We arrived at an elegant solution devoid of any fixed timeouts: the backups maintain a sliding window of the rate of incoming prepares, and initiate a view change if the flow stops. As a result, view change latency was reduced from 5s to 500ms!
- We revisited the LSM iteration and key comparison logic to raise its abstraction level, which in turn simplified our ZigZag merge and scan logic!
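To make the idea of detecting failure from the flow of prepares more concrete, here is a minimal sketch. It is not TigerBeetle's implementation: the class name, the window size, the slack factor, and the rule of comparing the current silence against the mean recent interval are all assumptions chosen for illustration.

```python
from collections import deque


class PrepareFlowDetector:
    """Hypothetical sketch: suspects the primary when prepares stop flowing."""

    def __init__(self, window_size=8, slack=4.0):
        # window_size and slack are illustrative assumptions, not TigerBeetle values.
        self.intervals = deque(maxlen=window_size)  # Sliding window of arrival gaps.
        self.slack = slack
        self.last_prepare = None

    def on_prepare(self, now):
        """Record the arrival time (in ticks) of a prepare from the primary."""
        if self.last_prepare is not None:
            self.intervals.append(now - self.last_prepare)
        self.last_prepare = now

    def should_start_view_change(self, now):
        """True when the silence is long relative to the recent arrival rate."""
        if not self.intervals:
            return False  # No rate observed yet, so nothing to compare against.
        mean_interval = sum(self.intervals) / len(self.intervals)
        return (now - self.last_prepare) > self.slack * mean_interval


detector = PrepareFlowDetector()
for tick in range(0, 100, 10):  # Prepares arrive every 10 ticks: 0, 10, ..., 90.
    detector.on_prepare(tick)

print(detector.should_start_view_change(100))  # False: 10 ticks of silence
print(detector.should_start_view_change(135))  # True: 45 ticks exceeds 4 x 10
```

Because the threshold is derived from the observed rate rather than a constant, a busy cluster would notice a stalled primary quickly, while an idle one would not raise false alarms. How the real detector handles an idle primary is not covered here.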
In the spirit of measuring one level deeper, we dug into hardware counters to profile transaction execution, spotting an opportunity to improve end-to-end performance by ~4% and add new metrics to TigerBeetle.
- After noticing high volumes of Translation Lookaside Buffer (TLB) misses during TigerBeetle transaction execution, we enabled Transparent Huge Pages on Linux which reduced TLB overhead and significantly boosted performance.
- In our IO subsystem, we now capture both logical bytes produced and physical bytes written to disk during LSM compaction, making it possible to estimate the write-amplification per tree in our ~30 tree LSM forest! We added tracing to our event loop to measure the disparity between a tick (our logical unit of time) and the actual (physical) execution time of that tick.
- A metric for MessageBus connections is now emitted, letting users track clients as they connect to and disconnect from their TigerBeetle cluster. We also realized that we were recording metrics more frequently than we were emitting them, and fixed this in two separate places.
- The computation for the number of index blocks was fixed, reducing excess reservation during compaction.
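As a rough illustration of how the two compaction byte counts could be turned into a per-tree estimate, the sketch below divides physical bytes written by logical bytes produced. The ratio's direction, the function name, the tree names, and the numbers are assumptions for illustration, not TigerBeetle's actual metric names or data.

```python
def write_amplification(logical_bytes, physical_bytes):
    """Estimate write amplification as physical bytes per logical byte."""
    if logical_bytes == 0:
        return None  # Nothing was compacted, so the ratio is undefined.
    return physical_bytes / logical_bytes


# Hypothetical per-tree counters: (logical bytes produced, physical bytes written).
compaction_bytes = {
    "tree_a": (1_000_000, 3_500_000),
    "tree_b": (2_000_000, 3_000_000),
    "tree_c": (0, 0),
}

estimates = {
    tree: write_amplification(logical, physical)
    for tree, (logical, physical) in compaction_bytes.items()
}
print(estimates)
```

Tracking the ratio per tree, rather than for the whole forest, makes it easier to see which tree is responsible for most of the extra disk traffic.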
We doubled down on our non-deterministic testing harness, Vörtex, which tests TigerBeetle from the outside-in. A cluster is spun up using real binaries and exercised via our language clients, and replicas communicate over the network and interact with real storage while asserting safety and availability in the presence of various faults.
- The final liveness bug that we know of, found by Vörtex, has been squashed: it involved an asymmetric partition between the primary and backups (even though we weren’t explicitly injecting asymmetric partitions!).
- We added a VSR unit test that asserts that a cluster becomes correctly unavailable during an asymmetric partition where the primary cannot receive any request messages.
Finally, we improved our language clients:
- With only lightweight callbacks and modest computational needs, the stack size of the background IO thread was reduced to 512 KiB.
- The semantics of the callback invoked on request completion or cancellation have been clarified: packet->userdata identifies a specific submission, and result must be non-null when the packet’s status is TB_PACKET_OK.
- An eviction test case for the Rust client was added, where a new session evicts an existing one when the cluster’s concurrent client limit (64 by default) is reached.
Last month on IronBeetle, we revisited a fundamental problem in distributed systems – failure detection. We discussed our learnings from The ϕ Accrual Failure Detector (modeling failure as a probability as opposed to a binary) and how we designed a new failure detection algorithm from first principles based on the quality of service of the primary. We Zig-Zag Merge’d, attempted several live refactors, and learned an important meta lesson about homing in on the complicated parts of the code that make us uncomfortable.
Over 100 episodes of IronBeetle have now taken to the skies. Join us live every Thursday at 5pm UTC on Twitch, YouTube and X!
TigerStyle.dev, Feb 4
TigerStyle, the software engineering methodology developed by
TigerBeetle to produce safer, faster software in less time, has a new visual home. If you want to be
remembered, be remarkable!
The Founder-led Sales Blueprint, Feb 11
In the same way that TigerBeetle’s engineering was shaped by TigerStyle,
we’re rethinking TigerSales from first principles. And here, for
early-stage founders, it can be tempting to overcompensate in the sales
pitch. TigerBeetle’s Chief Customer Officer, Peter Ahn, shares how to
build trust, avoid common traps, and lean into your biggest
advantage: being human.
Index, Count, Offset, Size, Feb 16
Wherein we make progress towards solving one of the most vexing problems
of Computer Science — naming
things.
Building a Tigerbeetle Client in Unison, Feb 26
Kaushik Chakraborty went yak-shaving to create a shim-library,
tb-unison-shim, that wraps the original TB C-library to expose a
callback-free C ABI for the Unison FFI to work.
Monster Scale Summit, March 11-12 (online)
Monster Scale is here! There’s still time to register
and join us online tomorrow; Joran will open for Antirez (whose writings
fanned the flames of our understanding of “code as art form”) to speak
to the seven stages of DBMS survivability and… demo processing
1,000,000,000,000 transactions with TigerBeetle thanks to our new
“diagonal scaling” to object storage (yes, that’s a trillion
transactions). If you’re looking for petabyte-scale for your business,
and want to connect your TigerBeetle to object storage in the same way,
speak to Peter!
QCon, March 16–18
Chaitanya Bhandari will host the Modern
Performance Optimization track at QCon London next week. Speakers from
industry and academia will share performance-engineering insights from
the application all the way down to the kernel: Thea Klaeboe Aarrestad
from CERN’s Large Hadron Collider, Prof. Peter Boncz on vector search
for columnar storage, Orson Peters from Polars, Prof. Holger Pirk on
systems performance tricks, and Prof. Laurence Tratt, the Shopify /
Royal Academy of Engineering Research Chair in Language Engineering at
King’s College London.
Future of Payments NL, March 26
Joris Portegies Zwart of Ximedes will lead a breakout session at this
year’s Future of Payments
conference in Amsterdam: Process your Transactions 1000× Faster: Why
Legacy Databases are Hitting Hard Limits, in which he will dive
into TigerBeetle internals and where an OLTP database, purpose-built for
payments and accounting, fits within the existing DBMS landscape.
‘Till next time… there’s no stopping (you)!
The TigerBeetle Team





