In this post, I’ll explore TigerBeetle’s strategy for delivering high-performance clients by seamlessly integrating its unique threading and memory model into each target ecosystem, such as Java, .NET, Go, and Node.js.
One code base to rule them all
The TigerBeetle protocol format is simple: a header followed by a payload consisting of one or many fixed-length structs. It’s so straightforward that it’s tempting for most developers interested in building a TigerBeetle client to start writing the wire protocol directly in their favorite programming language. In fact, I fell into this temptation myself, writing a TigerBeetle client in pure C# for learning purposes (and for fun, of course).
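To make that concrete, here is a minimal, hypothetical sketch of assembling such a request in Java; the header fields, offsets, and sizes below are illustrative placeholders, not the actual VSR header definition:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of the wire format: one fixed-size header followed by
// N fixed-length structs, laid out back to back in a single buffer.
// The header fields and offsets are placeholders for illustration only.
final class WireSketch {
    static final int HEADER_SIZE = 128;  // headers are fixed-size
    static final int ACCOUNT_SIZE = 128; // each Account struct is fixed-length

    static ByteBuffer encodeRequest(byte operation, int accountCount) {
        ByteBuffer request = ByteBuffer
                .allocate(HEADER_SIZE + accountCount * ACCOUNT_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
        request.put(0, operation);                      // placeholder "operation" field
        request.putInt(4, accountCount * ACCOUNT_SIZE); // placeholder "body size" field
        request.position(HEADER_SIZE);                  // the payload starts after the header
        return request; // the caller writes the fixed-length structs into the payload
    }
}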
Database clients written entirely in a high-level language offer several benefits precisely because they have no foreign or platform-specific dependencies. Tightly integrated with each ecosystem, these clients are simpler to build and install, and are sometimes even preferred for security reasons. Examples of pure client implementations include the JDBC Driver for MS SQL Server, the .NET data provider for PostgreSQL, and the Go PostgreSQL driver.
In TigerBeetle, clients and replicas communicate with each other using the rock-solid Viewstamped Replication (VSR) protocol and a combination of features such as io_uring and static memory allocation. Rewriting and maintaining all these components for each target programming language would require too much effort and would risk introducing bugs and divergent behavior each time a new implementation is added. Instead, it makes sense for TigerBeetle to rely on a single well-tested client written in Zig (a.k.a. tb_client) as the foundation for all other client implementations through an FFI (Foreign Function Interface) API.
The memory model
TigerBeetle’s tb_client does zero-deserialization, which allows the application to provide its own memory in the most efficient way. However, this requires the target programming language to have a certain level of control over memory layout. For languages such as Go and C#, which have this capability, using the tb_client API requires no special treatment, since the memory representation of the application data is already in the expected binary format.
On the other hand, languages that mask memory layout from the programmer, such as Java and JavaScript, may need additional steps to convert data between the application and the tb_client API, sometimes imposing costs for serialization and deserialization.
Here is a simplified representation of the memory layout of a TigerBeetle request for creating a batch of accounts:
Request:
+-------------+-------------+-------------+-------------+
| ACCOUNT 1 | ACCOUNT 2 | ACCOUNT 3 | ACCOUNT N |
+-------------+-------------+-------------+-------------+
In Java, for example, an Object is just a pointer, and the underlying data stored in the elements of an array (e.g. Account[]) will not be placed together in a contiguous memory area, requiring each element to be copied between the application and the client.
The same request, represented by the Java memory model as an array:
Account[] batch = new Account[N];
+----------------+----------------+----------------+----------------+
| Object 1 | Object 2 | Object 3 | Object N |
+----\-----------+-----\----------+------\---------+--------\-------+
\ \ \ \
\ ref \ ref \ ref \ ref
+--\--------+ +--\--------+ +--\--------+ +---\-------+
| Account 1 | ... | Account 2 | ... | Account 3 | ... | Account N |
+---------/-+ +--/--------+ +--/--------+ +---/-------+
/ / / /
Request: / copy / copy / copy / copy
+-----------/----+-----/----------+------/---------+--------/-------+
| ACCOUNT 1 | ACCOUNT 2 | ACCOUNT 3 | ACCOUNT N |
+----------------+----------------+----------------+----------------+
Instead of using arrays and multiple object instances, the TigerBeetle Java client utilizes a single Batch object backed by Java’s ByteBuffer class to represent the application data. By using this approach, the Java Native Interface (JNI) module can directly access raw memory in the layout expected by the tb_client API, avoiding the cost of serialization and minimizing the workload on the JVM’s garbage collector.
The same request, now using the AccountBatch class:
AccountBatch batch = new AccountBatch(N);
+----------------+
| Object + JNI |
+---------|------+
| ref each element by calling get/set + index
Request: |
+---------|---+--------------+------------+-------------+
| ACCOUNT 1 | ACCOUNT 2 | ACCOUNT 3 | ACCOUNT N |
+-------------+-------------+-------------+-------------+
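As a rough illustration of the pattern (the accessor name and field offset below are hypothetical, not the actual AccountBatch implementation):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of the single-object batch: one contiguous direct buffer holds all
// N accounts, and accessors address fields as (index * struct size + offset).
final class BatchSketch {
    static final int ACCOUNT_SIZE = 128; // fixed-length Account struct
    static final int LEDGER_OFFSET = 80; // illustrative field offset

    private final ByteBuffer buffer;

    BatchSketch(int capacity) {
        // A direct buffer lives outside the Java heap, so the JNI module can
        // hand its address straight to tb_client: no per-element copies and
        // nothing extra for the garbage collector to trace.
        this.buffer = ByteBuffer
                .allocateDirect(capacity * ACCOUNT_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
    }

    void setLedger(int index, int ledger) {
        buffer.putInt(index * ACCOUNT_SIZE + LEDGER_OFFSET, ledger);
    }

    int getLedger(int index) {
        return buffer.getInt(index * ACCOUNT_SIZE + LEDGER_OFFSET);
    }
}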
The threading model
Another of TigerBeetle’s most distinctive characteristics is its single-threaded design, which handles concurrent requests efficiently by avoiding the cost of coordinating multi-threaded access to data.
Many database clients expose their API through some sort of Connection abstraction familiar to most software developers. The key aspect of this abstraction is that a single database connection is designed to be used by only one application thread at a time and is typically accompanied by a ConnectionPool (or a multiplexer) that allows multiple threads to share existing connections.
The tb_client starts a dedicated thread to process application requests, allowing a single client instance to be used by multiple application threads concurrently through a function pointer callback that notifies the caller when the reply arrives. Although this approach is very efficient, it may appear less ergonomic to developers depending on the programming language being used. As a result, each client implementation uses the threading primitives available in its ecosystem to hide this complexity from the API.
In C and other programming languages that use the FFI API directly, it is the user’s responsibility to properly handle callback events without blocking TigerBeetle’s internal thread. This can be achieved by dispatching the execution to another thread (asynchronous completion) or waking up the caller thread that was waiting for the reply (synchronous completion).
void on_completion(uintptr_t context, tb_client_t client, tb_packet_t* packet, const uint8_t* data, uint32_t size) {
    // This callback function runs on TigerBeetle's internal thread.
    // The user should not block execution here,
    // e.g. by processing the reply, writing to files or the network, etc.
}

int main(int argc, char **argv) {
    // (Client and packet initialization via the tb_client API is elided.)

    // Submits the request and returns immediately:
    tb_client_submit(client, &packets);
    return 0;
}
In Go, each request is processed by a goroutine that is paused and resumed by the callback when the reply arrives.
// Pauses the goroutine until the reply arrives.
res, err := client.CreateAccounts(accounts)
In C#, the implementation takes advantage of the language’s async/await mechanisms to abstract those callbacks into tasks that can be naturally invoked by async methods. Also, a blocking version of the same API is available for those who don’t want to introduce asynchronous functions in their applications.
// Blocking usage:
// Blocks the current thread until the reply arrives.
var errors = client.CreateAccounts(accounts);
// Async usage:
// The async state machine will yield and resume when the reply arrives.
var errors = await client.CreateAccountsAsync(accounts);
In Java, even though there is no async/await support built into the language, there are two versions of the same API: the traditional blocking one and an asynchronous implementation on top of the CompletableFuture<> class.
// Blocking usage:
// Blocks the current thread until the reply arrives.
CreateAccountResultBatch errors = client.createAccounts(accounts);
// Async usage:
// Submits the batch and returns immediately.
CompletableFuture<CreateTransferResultBatch> request = client.createTransfersAsync(transfers);
// Waits for completion until the reply arrives.
CreateTransferResultBatch errors = request.get();
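Under the hood, the pattern for both versions is the same: the callback, which fires on tb_client’s internal thread, completes a future that the application thread either awaits or blocks on. Here is a minimal sketch of that bridge (the names are illustrative, not the actual client code):

import java.util.concurrent.CompletableFuture;

// Illustrative sketch: bridging a completion callback delivered on
// tb_client's internal thread into a CompletableFuture that application
// threads can block on or compose with.
final class RequestSketch {
    private final CompletableFuture<byte[]> reply = new CompletableFuture<>();

    // Invoked (via JNI) on TigerBeetle's internal thread when the reply
    // arrives. It only hands the result off; it never does blocking work.
    void onCompletion(byte[] replyBytes) {
        reply.complete(replyBytes);
    }

    // Asynchronous API: returns immediately.
    CompletableFuture<byte[]> async() {
        return reply;
    }

    // Blocking API: parks the calling application thread until the
    // internal thread completes the future.
    byte[] join() {
        return reply.join();
    }
}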
Benchmarks
This benchmark compares the throughput and latency of the TigerBeetle client implemented in Zig with other programming language implementations using the tb_client API, to show how the natural runtime overhead of FFI calls is minimized.
The code consists of submitting one million transfers to the TigerBeetle cluster. Since the focus is benchmarking only the client side, all transfers are sent with an invalid ID to ensure that they will be immediately rejected. It’s enough work to stress the client without much server-side measurement noise.
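The loop is roughly equivalent to the following sketch (the batch size, accessor names, and the initialized `client` are illustrative assumptions; see the repository for the actual harness):

// `client` is assumed to be an initialized com.tigerbeetle.Client.
final int TOTAL = 1_000_000;
final int BATCH_SIZE = 8_190; // illustrative batch size

long maxBatchLatencyMs = 0;
long start = System.nanoTime();
for (int sent = 0; sent < TOTAL; sent += BATCH_SIZE) {
    int n = Math.min(BATCH_SIZE, TOTAL - sent);
    TransferBatch transfers = new TransferBatch(n);
    for (int i = 0; i < n; i++) {
        transfers.add();
        transfers.setId(0, 0); // invalid id: rejected immediately by the server
    }
    long batchStart = System.nanoTime();
    client.createTransfers(transfers); // blocks until the reply arrives
    maxBatchLatencyMs = Math.max(maxBatchLatencyMs,
            (System.nanoTime() - batchStart) / 1_000_000);
}
double transfersPerSecond = TOTAL / ((System.nanoTime() - start) / 1e9);

The measured results: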
| Client | Transfers / s | Max latency per batch |
|--------|---------------|------------------------|
| Zig    | 1,563,167     | 7 ms                   |
| Go     | 1,471,084     | 7 ms                   |
| Java   | 1,273,476     | 9 ms                   |
| C#     | 1,521,359     | 9 ms                   |
Conclusion
All TigerBeetle clients are high-level wrappers for the tb_client implemented in Zig, which ensures that they offer the same performance, consistency, maintainability, and quality without sacrificing the developer experience.
Using platform-specific libraries and FFI calls comes at the cost of requiring specific steps to integrate and build the software for each target platform, such as Go’s CGO, .NET’s P/Invoke, and custom modules for Java’s JNI and Node’s N-API. Nevertheless, Zig’s excellent cross-compilation capabilities can significantly mitigate this trade-off, making it easy to build tb_client for all major operating systems and processor architectures.
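As an example of what those platform-specific steps look like, the Java binding’s native boundary boils down to a handful of declarations like the following (the names are hypothetical, not the actual JNI module):

import java.nio.ByteBuffer;

// Hypothetical sketch of the JNI boundary: the Java side declares native
// methods, and a platform-specific shared library (cross-compiled with Zig
// for each OS/architecture) implements them on top of tb_client.
final class NativeBinding {
    static {
        // Loads the prebuilt shared library bundled with the package
        // (library name is illustrative).
        System.loadLibrary("tb_jniclient");
    }

    static native long clientInit(long clusterId, String addresses);

    // Submits a request; the direct ByteBuffer's address is passed through
    // to tb_client without copying.
    static native void submit(long clientHandle, ByteBuffer batch);

    static native void clientDeinit(long clientHandle);
}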
For me personally, writing open source TigerBeetle clients has opened so many doors, and I would encourage you to take a look at the code and consider which language you’ll port next! Will it be Python, Ruby, Elixir, or… Rust?!