Writing High-Performance Clients for TigerBeetle
In this post, I'll explore TigerBeetle's strategy for delivering high-performance clients by seamlessly integrating its unique threading and memory model into each target ecosystem, such as Java, .NET, Go, and Node.js.
The TigerBeetle protocol format is simple: a header followed by a payload consisting of one or many fixed-length structs. It's so straightforward that it's tempting for most developers interested in building a TigerBeetle client to start writing the wire protocol directly in their favorite programming language. In fact, I fell into this temptation myself, writing a TigerBeetle client in pure C# for learning purposes (and for fun, of course).
Database clients written entirely in a high-level language can offer several benefits because they have no foreign or platform-specific dependencies. Tightly integrated with each ecosystem, these clients are simpler to build and install, and are sometimes even preferred for security reasons. Examples of pure client implementations include the JDBC Driver for MS SQL Server, the .NET data provider for PostgreSQL, and the Go PostgreSQL driver.
In TigerBeetle, clients and replicas communicate with each other using the rock-solid Viewstamped Replication (VSR) protocol and a combination of features such as io_uring and static memory allocation. It would require too much effort to rewrite and maintain all these components for each targeted programming language, risking introducing bugs and divergent behavior each time a new implementation is added. Instead, it makes sense for TigerBeetle to rely on a single well-tested client written in Zig (a.k.a. tb_client) as the foundation for all other client implementations through an FFI (Foreign Function Interface) API.
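To make the FFI approach concrete, here is a minimal sketch of what such a binding could look like from the Java side. The class, method, and library names are illustrative only, not the actual TigerBeetle Java binding:

// Hypothetical JNI bridge: each native method is implemented by a thin
// C/Zig shim that forwards the call to tb_client.
public final class NativeClient {
    static {
        // The shim library name is an assumption for this sketch.
        System.loadLibrary("tb_jniclient");
    }

    // Initializes a native client instance and returns an opaque handle.
    public static native long clientInit(byte[] clusterId, String addresses);

    // Submits a batch of fixed-size records; completion is signaled via a callback.
    public static native void submit(long clientHandle, java.nio.ByteBuffer batch);

    // Releases the native client.
    public static native void clientDeinit(long clientHandle);
}

Every language wrapper follows the same pattern: a thin native bridge plus an idiomatic API on top of it.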
TigerBeetle's tb_client does zero-deserialization, which allows the application to provide its own memory in the most efficient way. However, this requires the target programming language to have a certain level of control over memory layout. For languages such as Go and C#, which have this capability, using the tb_client API requires no special treatment, since the memory representation of the application data is already in the expected binary format. On the other hand, languages that mask memory layout from the programmer, such as Java and JavaScript, may need additional steps to convert data between the application and the tb_client API, sometimes imposing serialization and deserialization costs.
Here is a simplified representation of the memory layout of a TigerBeetle request for creating a batch of accounts:
Request:
+-------------+-------------+-------------+-------------+
|  ACCOUNT 1  |  ACCOUNT 2  |  ACCOUNT 3  |  ACCOUNT N  |
+-------------+-------------+-------------+-------------+
In Java, for example, an Object is just a pointer, and the underlying data stored in the elements of an array (e.g. Account[]) will not be placed together in a contiguous memory area, requiring each element to be copied between the application and the client.
The same request, represented by the Java memory model as an array:
Account[] batch = new Account[N];
+----------------+----------------+----------------+----------------+
|    Object 1    |    Object 2    |    Object 3    |    Object N    |
+----\-----------+-----\----------+------\---------+--------\-------+
      \                 \                 \                  \
       \ ref             \ ref             \ ref              \ ref
     +--\--------+     +--\--------+     +--\--------+     +---\-------+
     | Account 1 | ... | Account 2 | ... | Account 3 | ... | Account N |
     +---------/-+     +--/--------+     +--/--------+     +---/-------+
              /          /                 /                  /
Request:     / copy     / copy            / copy             / copy
+-----------/----+-----/----------+------/---------+--------/-------+
|    ACCOUNT 1   |    ACCOUNT 2   |    ACCOUNT 3   |    ACCOUNT N   |
+----------------+----------------+----------------+----------------+
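In code, the copy step from the diagram above could look roughly like this. It is only a sketch: the Account getters, the record size, and the field offsets are illustrative, not TigerBeetle's actual wire layout:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Copies every element of the Java array into one contiguous request buffer.
static final int RECORD_SIZE = 128;  // assumed fixed size of each record

static ByteBuffer copyToRequest(Account[] batch) {
    // One contiguous, little-endian buffer holding the whole request payload.
    ByteBuffer request = ByteBuffer.allocateDirect(batch.length * RECORD_SIZE)
            .order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < batch.length; i++) {
        int base = i * RECORD_SIZE;
        // Each object must be serialized field by field into the buffer.
        request.putLong(base, batch[i].getDebitsPosted());      // offsets are
        request.putLong(base + 8, batch[i].getCreditsPosted()); // assumptions
        request.putInt(base + 16, batch[i].getLedger());
        // ... remaining fields at their fixed offsets.
    }
    return request;
}

Every new request pays this per-element copy, which is exactly the overhead the Batch design below avoids.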
Instead of using arrays and multiple object instances, the TigerBeetle Java client utilizes a single Batch object backed by Java's ByteBuffer class to represent the application data. By using this approach, the Java Native Interface (JNI) module can directly access raw memory in the layout expected by the tb_client API, avoiding the cost of serialization and minimizing the workload on the JVM's garbage collector.
The same request, now using the AccountBatch class:
AccountBatch batch = new AccountBatch(N);
+----------------+
|  Object + JNI  |
+---------|------+
          | ref each element by calling get/set + index
Request:  |
+---------|---+-------------+-------------+-------------+
|  ACCOUNT 1  |  ACCOUNT 2  |  ACCOUNT 3  |  ACCOUNT N  |
+-------------+-------------+-------------+-------------+
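A rough sketch of that idea is shown below. The field offsets, record size, and method names are illustrative and do not mirror the actual AccountBatch implementation:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// A whole batch represented by a single object backed by one off-heap buffer
// that is already in the wire layout, so JNI can pass it to tb_client directly.
final class AccountBatchSketch {
    private static final int RECORD_SIZE = 128;  // assumed fixed record size
    private final ByteBuffer buffer;

    AccountBatchSketch(int capacity) {
        // One contiguous allocation for the entire batch; no per-element objects.
        this.buffer = ByteBuffer.allocateDirect(capacity * RECORD_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
    }

    // Setters and getters address elements by index * RECORD_SIZE + field offset.
    void setLedger(int index, int ledger) {
        buffer.putInt(index * RECORD_SIZE + 16, ledger);  // offset 16 is an assumption
    }

    int getLedger(int index) {
        return buffer.getInt(index * RECORD_SIZE + 16);
    }
}

Because the data never exists as individual Java objects, there is nothing to serialize and almost nothing extra for the garbage collector to track.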
Another of TigerBeetle's most distinctive characteristics is its single-threaded design, which allows it to handle concurrent requests efficiently by avoiding the cost of coordinating multi-threaded access to data.
Many database clients expose their API through some sort of Connection abstraction familiar to most software developers. The key aspect of this abstraction is that a single database connection is designed to be used by only one application thread at a time and is typically accompanied by a ConnectionPool (or a multiplexer) that allows multiple threads to share existing connections.
The tb_client starts a dedicated thread to process application requests, allowing a single client instance to be used by multiple application threads concurrently through a function pointer callback that notifies the caller when the reply arrives. Although this approach is very efficient, it may appear less ergonomic to developers, depending on the programming language being used. As a result, each client implementation uses the threading primitives available in its ecosystem to hide this complexity from the API.
In C and other programming languages that use the FFI API directly, it is the user's responsibility to properly handle callback events without blocking TigerBeetle's internal thread. This can be achieved by dispatching the execution to another thread (asynchronous completion) or waking up the caller thread that was waiting for the reply (synchronous completion).
void on_completion(uintptr_t context, tb_client_t client, tb_packet_t* packet, const uint8_t* data, uint32_t size) {
    // This callback function runs on TigerBeetle's internal thread.
    // The user should not block the execution here,
    // e.g. processing the reply, writing to files or network, etc.
}

int main(int argc, char **argv) {
    // The client and its packets are assumed to be initialized beforehand.
    // Submits the request and returns immediately:
    tb_client_submit(client, &packets);
}
In Go, each request is processed by a goroutine that is paused and resumed by the callback when the reply arrives.
// Pauses the goroutine until the reply arrives.
res, err := client.CreateAccounts(accounts)
In C#, the implementation takes advantage of the language's async/await mechanisms to abstract those callbacks into tasks that can be naturally invoked by async methods. Also, a blocking version of the same API is available for those who don't want to introduce asynchronous functions into their applications.
// Blocking usage:
// Blocks the current thread until the reply arrives.
var errors = client.CreateAccounts(accounts);
// Async usage:
// The async state machine will yield and resume when the reply arrives.
var errors = await client.CreateAccountsAsync(accounts);
In Java, even though there is no async/await support built into the language, there are two versions of the same API: the traditional blocking one and an asynchronous implementation on top of the CompletableFuture<> class.
// Blocking usage:
// Blocks the current thread until the reply arrives.
CreateAccountResultBatch errors = client.createAccounts(accounts);

// Async usage:
// Submits the batch and returns immediately.
CompletableFuture<CreateTransferResultBatch> request = client.createTransfersAsync(transfers);

// Waits for completion until the reply arrives.
CreateTransferResultBatch errors = request.get();
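Under the hood, one possible way to bridge the native callback into both styles (purely illustrative, not necessarily how the actual client is written) is to complete a CompletableFuture from the callback and let the caller either block on it or compose it asynchronously:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: track in-flight requests and complete their futures when the native
// callback fires. The names and types here are assumptions for illustration.
final class RequestBridge {
    private static final ConcurrentHashMap<Long, CompletableFuture<byte[]>> inFlight =
            new ConcurrentHashMap<>();

    static CompletableFuture<byte[]> submit(long packetHandle /*, batch... */) {
        CompletableFuture<byte[]> future = new CompletableFuture<>();
        inFlight.put(packetHandle, future);
        // The native submit call (see the FFI sketch earlier) would go here.
        return future;
    }

    // Invoked (via JNI) from tb_client's internal thread when the reply arrives.
    static void onCompletion(long packetHandle, byte[] reply) {
        CompletableFuture<byte[]> future = inFlight.remove(packetHandle);
        if (future != null) {
            // A real implementation would complete the future on a separate
            // executor so dependent callbacks never run on the native thread.
            future.complete(reply);
        }
    }
}

The blocking API is then just future.get(), while the asynchronous API returns the future (or a CompletableFuture derived from it) to the caller.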
This benchmark compares the throughput and latency of the TigerBeetle client implemented in Zig with other programming language implementations using the tb_client API, showing how the natural runtime overhead of FFI calls is minimized.
The code consists of submitting one million transfers to the TigerBeetle cluster. Since the focus is benchmarking only the client side, all transfers are sent with an invalid ID to ensure that they will be immediately rejected. It's enough work to stress the client without much server-side measurement noise.
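For reference, the client-side loop of such a benchmark could look roughly like the following sketch. The batch size, batch type, and setter calls are illustrative assumptions, not the exact benchmark code:

// Submits 1,000,000 transfers in batches and tracks throughput and the
// worst per-batch latency. BATCH_SIZE is an assumption for this sketch.
final int BATCH_SIZE = 8190;
final int TOTAL = 1_000_000;

long maxLatencyNanos = 0;
long started = System.nanoTime();

for (int sent = 0; sent < TOTAL; sent += BATCH_SIZE) {
    int count = Math.min(BATCH_SIZE, TOTAL - sent);
    TransferBatch batch = new TransferBatch(count);
    for (int i = 0; i < count; i++) {
        batch.add();
        batch.setId(0, 0);  // invalid (zero) ID, so the cluster rejects it immediately
    }

    long before = System.nanoTime();
    client.createTransfers(batch);  // blocking call; returns the rejection results
    maxLatencyNanos = Math.max(maxLatencyNanos, System.nanoTime() - before);
}

double elapsedSeconds = (System.nanoTime() - started) / 1e9;
System.out.printf("%.0f transfers/s, max latency per batch: %d ms%n",
        TOTAL / elapsedSeconds, maxLatencyNanos / 1_000_000);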
Client | Transfers / s | Max latency per batch
Zig    | 1,563,167     | 7 ms
Go     | 1,471,084     | 7 ms
Java   | 1,273,476     | 9 ms
C#     | 1,521,359     | 9 ms
All TigerBeetle clients are high-level wrappers for the tb_client implemented in Zig, which ensures that they offer the same performance, consistency, maintainability, and quality without sacrificing the developer experience.
Using platform-specific libraries and FFI calls comes at the cost of requiring specific steps to integrate and build the software for each target platform, such as Go's CGO, .NET P/Invoke, and custom modules for Java's JNI and Node's N-API. Nevertheless, Zig's excellent cross-compilation capabilities can significantly mitigate this trade-off, making it easy to build tb_client for all major operating systems and processor architectures.
For me personally, writing open source TigerBeetle clients has opened so many doors, and I would encourage you to take a look at the code and consider which language you'll port next! Will it be Python, Ruby, Elixir, or... Rust?!