Writing High-Performance Clients for TigerBeetle
In this post, I'll explore TigerBeetle's strategy for delivering high-performance clients by seamlessly integrating its unique threading and memory model into each target ecosystem, such as Java, .NET, Go, and Node.js.
The TigerBeetle protocol format is simple: a header followed by a payload consisting of one or many fixed-length structs. It's so straightforward that it's tempting for most developers interested in building a TigerBeetle client to start writing the wire protocol directly in their favorite programming language. In fact, I fell into this temptation myself, writing a TigerBeetle client in pure C# for learning purposes (and for fun, of course).
Database clients written entirely in a high-level language can offer several benefits because they have no foreign or platform-specific dependencies. Tightly integrated with each ecosystem, these clients are simpler to build and install, and are sometimes even preferred for security reasons. Examples of pure client implementations include the JDBC Driver for MS SQL Server, the .NET data provider for PostgreSQL, and the Go PostgreSQL driver.
In TigerBeetle, clients and replicas communicate with each other using the rock-solid Viewstamped Replication (VSR) protocol and a combination of features such as io_uring and static memory allocation. It would require too much effort to rewrite and maintain all these components for each targeted programming language, risking introducing bugs and divergent behavior each time a new implementation is added. Instead, it makes sense for TigerBeetle to rely on a single well-tested client written in Zig (a.k.a. tb_client) as the foundation for all other client implementations through an FFI (Foreign Function Interface) API.
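To make the FFI approach concrete, here is a minimal sketch of what such a binding could look like from the Java side. The class, method, and library names are illustrative only, not the actual TigerBeetle Java binding:

// Hypothetical JNI bridge: each native method is implemented by a thin
// C/Zig shim that forwards the call to tb_client.
public final class NativeClient {
    static {
        // The shim library name is an assumption for this sketch.
        System.loadLibrary("tb_jniclient");
    }

    // Initializes a native client instance and returns an opaque handle.
    public static native long clientInit(byte[] clusterId, String addresses);

    // Submits a batch of fixed-size records; completion is signaled via a callback.
    public static native void submit(long clientHandle, java.nio.ByteBuffer batch);

    // Releases the native client.
    public static native void clientDeinit(long clientHandle);
}

Every language wrapper follows the same pattern: a thin native bridge plus an idiomatic API on top of it.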
TigerBeetle's tb_client does zero-deserialization, which allows the application to provide its own memory in the most efficient way. However, this requires the target programming language to have a certain level of control over memory layout. For languages such as Go and C#, which have this capability, using the tb_client API requires no special treatment, since the memory representation of the application data is already in the expected binary format. On the other hand, languages that mask memory layout from the programmer, such as Java and JavaScript, may need additional steps to convert data between the application and the tb_client API, sometimes imposing serialization and deserialization costs.
Here is a simplified representation of the memory layout of a TigerBeetle request for creating a batch of accounts:
Request:
+-------------+-------------+-------------+-------------+
|  ACCOUNT 1  |  ACCOUNT 2  |  ACCOUNT 3  |  ACCOUNT N  |
+-------------+-------------+-------------+-------------+
In Java, for example, an Object is just a pointer, and the underlying data stored in the elements of an array (e.g. Account[]) will not be placed together in a contiguous memory area, requiring each element to be copied between the application and the client.
The same request, represented by the Java memory model as an array:
Account[] batch = new Account[N];
+----------------+----------------+----------------+----------------+
|    Object 1    |    Object 2    |    Object 3    |    Object N    |
+----\-----------+-----\----------+------\---------+--------\-------+
      \                 \                 \                  \
       \ ref             \ ref             \ ref              \ ref
     +--\--------+     +--\--------+     +--\--------+     +---\-------+
     | Account 1 | ... | Account 2 | ... | Account 3 | ... | Account N |
     +---------/-+     +--/--------+     +--/--------+     +---/-------+
              /          /                 /                  /
Request:     / copy     / copy            / copy             / copy
+-----------/----+-----/----------+------/---------+--------/-------+
|    ACCOUNT 1   |    ACCOUNT 2   |    ACCOUNT 3   |    ACCOUNT N   |
+----------------+----------------+----------------+----------------+
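In code, the copy step from the diagram above could look roughly like this. It is only a sketch: the Account getters, the record size, and the field offsets are illustrative, not TigerBeetle's actual wire layout:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Copies every element of the Java array into one contiguous request buffer.
static final int RECORD_SIZE = 128;  // assumed fixed size of each record

static ByteBuffer copyToRequest(Account[] batch) {
    // One contiguous, little-endian buffer holding the whole request payload.
    ByteBuffer request = ByteBuffer.allocateDirect(batch.length * RECORD_SIZE)
            .order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < batch.length; i++) {
        int base = i * RECORD_SIZE;
        // Each object must be serialized field by field into the buffer.
        request.putLong(base, batch[i].getDebitsPosted());      // offsets are
        request.putLong(base + 8, batch[i].getCreditsPosted()); // assumptions
        request.putInt(base + 16, batch[i].getLedger());
        // ... remaining fields at their fixed offsets.
    }
    return request;
}

Every new request pays this per-element copy, which is exactly the overhead the Batch design below avoids.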
Instead of using arrays and multiple object instances, the TigerBeetle Java client utilizes a single Batch object backed by Java's ByteBuffer class to represent the application data. By using this approach, the Java Native Interface (JNI) module can directly access raw memory in the layout expected by the tb_client API, avoiding the cost of serialization and minimizing the workload on the JVM's garbage collector.
The same request, now using the AccountBatch class:
AccountBatch batch = new AccountBatch(N);
+----------------+
|  Object + JNI  |
+---------|------+
          | ref each element by calling get/set + index
Request:  |
+---------|---+-------------+-------------+-------------+
|  ACCOUNT 1  |  ACCOUNT 2  |  ACCOUNT 3  |  ACCOUNT N  |
+-------------+-------------+-------------+-------------+
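A rough sketch of that idea is shown below. The field offsets, record size, and method names are illustrative and do not mirror the actual AccountBatch implementation:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// A whole batch represented by a single object backed by one off-heap buffer
// that is already in the wire layout, so JNI can pass it to tb_client directly.
final class AccountBatchSketch {
    private static final int RECORD_SIZE = 128;  // assumed fixed record size
    private final ByteBuffer buffer;

    AccountBatchSketch(int capacity) {
        // One contiguous allocation for the entire batch; no per-element objects.
        this.buffer = ByteBuffer.allocateDirect(capacity * RECORD_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
    }

    // Setters and getters address elements by index * RECORD_SIZE + field offset.
    void setLedger(int index, int ledger) {
        buffer.putInt(index * RECORD_SIZE + 16, ledger);  // offset 16 is an assumption
    }

    int getLedger(int index) {
        return buffer.getInt(index * RECORD_SIZE + 16);
    }
}

Because the data never exists as individual Java objects, there is nothing to serialize and almost nothing extra for the garbage collector to track.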
Another of TigerBeetle's most distinctive characteristics is its single-threaded design, which allows it to handle concurrent requests efficiently by avoiding the cost of coordinating multi-threaded access to data.
Many database clients expose their API through some sort of Connection abstraction familiar to most software developers. The key aspect of this abstraction is that a single database connection is designed to be used by only one application thread at a time and is typically accompanied by a ConnectionPool (or a multiplexer) that allows multiple threads to share existing connections.
The tb_client starts a dedicated thread to process application requests, allowing a single client instance to be used by multiple application threads concurrently through a function pointer callback that notifies the caller when the reply arrives. Although this approach is very efficient, it may appear less ergonomic to developers, depending on the programming language being used. As a result, each client implementation uses the threading primitives available in its ecosystem to hide this complexity from the API.
In C and other programming languages that use the FFI API directly, it is the user's responsibility to properly handle callback events without blocking TigerBeetle's internal thread. This can be achieved by dispatching the execution to another thread (asynchronous completion) or waking up the caller thread that was waiting for the reply (synchronous completion).
void on_completion(uintptr_t context, tb_client_t client, tb_packet_t* packet, const uint8_t* data, uint32_t size) {
    // This callback function runs on TigerBeetle's internal thread.
    // The user should not block the execution here,
    // e.g. processing the reply, writing to files or network, etc.
}

int main(int argc, char **argv) {
    // The client and its packets are assumed to be initialized beforehand.
    // Submits the request and returns immediately:
    tb_client_submit(client, &packets);
}
In Go, each request is processed by a goroutine that is paused and resumed by the callback when the reply arrives.
// Pauses the goroutine until the reply arrives.
res, err := client.CreateAccounts(accounts)
In C#, the implementation takes advantage of the language's async/await mechanisms to abstract those callbacks into tasks that can be naturally invoked by async methods. Also, a blocking version of the same API is available for those who don't want to introduce asynchronous functions into their applications.
// Blocking usage:
// Blocks the current thread until the reply arrives.
var errors = client.CreateAccounts(accounts);
// Async usage:
// The async state machine will yield and resume when the reply arrives.
var errors = await client.CreateAccountsAsync(accounts);
In Java, even though there is no async/await support built into the language, there are two versions of the same API: the traditional blocking one and an asynchronous implementation on top of the CompletableFuture<> class.
// Blocking usage:
// Blocks the current thread until the reply arrives.
CreateAccountResultBatch errors = client.createAccounts(accounts);

// Async usage:
// Submits the batch and returns immediately.
CompletableFuture<CreateTransferResultBatch> request = client.createTransfersAsync(transfers);

// Waits for completion until the reply arrives.
CreateTransferResultBatch errors = request.get();
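Under the hood, one possible way to bridge the native callback into both styles (purely illustrative, not necessarily how the actual client is written) is to complete a CompletableFuture from the callback and let the caller either block on it or compose it asynchronously:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: track in-flight requests and complete their futures when the native
// callback fires. The names and types here are assumptions for illustration.
final class RequestBridge {
    private static final ConcurrentHashMap<Long, CompletableFuture<byte[]>> inFlight =
            new ConcurrentHashMap<>();

    static CompletableFuture<byte[]> submit(long packetHandle /*, batch... */) {
        CompletableFuture<byte[]> future = new CompletableFuture<>();
        inFlight.put(packetHandle, future);
        // The native submit call (see the FFI sketch earlier) would go here.
        return future;
    }

    // Invoked (via JNI) from tb_client's internal thread when the reply arrives.
    static void onCompletion(long packetHandle, byte[] reply) {
        CompletableFuture<byte[]> future = inFlight.remove(packetHandle);
        if (future != null) {
            // A real implementation would complete the future on a separate
            // executor so dependent callbacks never run on the native thread.
            future.complete(reply);
        }
    }
}

The blocking API is then just future.get(), while the asynchronous API returns the future (or a CompletableFuture derived from it) to the caller.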
This benchmark compares the throughput and latency of the TigerBeetle client implemented in Zig with other programming language implementations using the tb_client API, showing how the natural runtime overhead of FFI calls is minimized.
The code consists of submitting one million transfers to the TigerBeetle cluster. Since the focus is benchmarking only the client side, all transfers are sent with an invalid ID to ensure that they will be immediately rejected. It's enough work to stress the client without much server-side measurement noise.
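For reference, the client-side loop of such a benchmark could look roughly like the following sketch. The batch size, batch type, and setter calls are illustrative assumptions, not the exact benchmark code:

// Submits 1,000,000 transfers in batches and tracks throughput and the
// worst per-batch latency. BATCH_SIZE is an assumption for this sketch.
final int BATCH_SIZE = 8190;
final int TOTAL = 1_000_000;

long maxLatencyNanos = 0;
long started = System.nanoTime();

for (int sent = 0; sent < TOTAL; sent += BATCH_SIZE) {
    int count = Math.min(BATCH_SIZE, TOTAL - sent);
    TransferBatch batch = new TransferBatch(count);
    for (int i = 0; i < count; i++) {
        batch.add();
        batch.setId(0, 0);  // invalid (zero) ID, so the cluster rejects it immediately
    }

    long before = System.nanoTime();
    client.createTransfers(batch);  // blocking call; returns the rejection results
    maxLatencyNanos = Math.max(maxLatencyNanos, System.nanoTime() - before);
}

double elapsedSeconds = (System.nanoTime() - started) / 1e9;
System.out.printf("%.0f transfers/s, max latency per batch: %d ms%n",
        TOTAL / elapsedSeconds, maxLatencyNanos / 1_000_000);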
Client | Transfers / s | Max latency per batch
Zig    | 1,563,167     | 7 ms
Go     | 1,471,084     | 7 ms
Java   | 1,273,476     | 9 ms
C#     | 1,521,359     | 9 ms
All TigerBeetle clients are high-level wrappers for the tb_client implemented in Zig, which ensures that they offer the same performance, consistency, maintainability, and quality without sacrificing the developer experience.
Using platform-specific libraries and FFI calls comes at the cost of requiring specific steps to integrate and build the software for each target platform, such as Go's CGO, .NET P/Invoke, and custom modules for Java's JNI and Node's N-API. Nevertheless, Zig's excellent cross-compilation capabilities can significantly mitigate this trade-off, making it easy to build tb_client for all major operating systems and processor architectures.
For me personally, writing open source TigerBeetle clients has opened so many doors, and I would encourage you to take a look at the code and consider which language you'll port next! Will it be Python, Ruby, Elixir, or... Rust?!