Fixisoft - IdeaFIX - High-Frequency Trading made simple

Benchmarks

Find out how fast IdeaFIX can run

Methodology

There are many ways to benchmark a network application, but the best approach is to focus on a realistic scenario and to make sure the numbers are comparable across similar products. For a trading application, the key metric is response time, i.e. end-to-end latency.

The method we used involves writing a client that sends one order (NewOrderSingle, MsgType 35=D) and receives two execution reports (MsgType 35=8): one acknowledgement and one fill report. It then sends another order, and so on until the timer expires. Server and client run on the same host.

It’s a simple but realistic ping-pong setup that captures all sources of latency and allows a platform-neutral comparison between FIX engines.
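For illustration, here is a minimal sketch of such a ping-pong client written against the QuickFIX/J 2.x API. It is not the SDK's benchmark code, and the instrument, quantity and price are placeholders:

import quickfix.*;
import quickfix.field.*;
import quickfix.fix44.NewOrderSingle;

import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Ping-pong benchmark client: send a NewOrderSingle (35=D), wait for the two
// ExecutionReports (35=8) that acknowledge and fill it, record the round trip
// time, then immediately send the next order.
public class PingPongClient implements Application {

    private final List<Long> rtts = new ArrayList<>(); // collected RTTs in nanoseconds
    private long sentAt;                               // timestamp of the last order sent
    private int reportsForCurrentOrder;
    private long orderId;

    @Override
    public void onLogon(SessionID sessionID) {
        sendOrder(sessionID);                          // kick off the ping-pong loop
    }

    @Override
    public void fromApp(Message message, SessionID sessionID)
            throws FieldNotFound, IncorrectDataFormat, IncorrectTagValue, UnsupportedMessageType {
        String msgType = message.getHeader().getString(MsgType.FIELD);
        if (!MsgType.EXECUTION_REPORT.equals(msgType)) {
            return;
        }
        // The second execution report (the fill) completes the round trip.
        if (++reportsForCurrentOrder == 2) {
            rtts.add(System.nanoTime() - sentAt);
            reportsForCurrentOrder = 0;
            sendOrder(sessionID);
        }
    }

    private void sendOrder(SessionID sessionID) {
        NewOrderSingle order = new NewOrderSingle(
                new ClOrdID(Long.toString(++orderId)),
                new Side(Side.BUY),
                new TransactTime(LocalDateTime.now()),
                new OrdType(OrdType.LIMIT));
        order.set(new Symbol("TEST"));                 // placeholder instrument
        order.set(new OrderQty(100));
        order.set(new Price(1.0));
        sentAt = System.nanoTime();
        try {
            Session.sendToTarget(order, sessionID);
        } catch (SessionNotFound e) {
            throw new IllegalStateException(e);
        }
    }

    // Remaining Application callbacks are left empty for brevity.
    @Override public void onCreate(SessionID sessionID) {}
    @Override public void onLogout(SessionID sessionID) {}
    @Override public void toAdmin(Message message, SessionID sessionID) {}
    @Override public void fromAdmin(Message message, SessionID sessionID) {}
    @Override public void toApp(Message message, SessionID sessionID) {}
}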

An analysis script then processes the message history to estimate the mean round trip time (RTT), its distribution, standard deviation and percentiles, using a bootstrap method.
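As an illustration of the idea (this is not the SDK's analysis script; the resample count and sample data below are arbitrary), a bootstrap estimate resamples the observed RTTs with replacement and averages the statistic of interest across the resamples:

import java.util.Arrays;
import java.util.Random;

// Bootstrap estimate of RTT statistics: draw many resamples (with replacement)
// from the observed round trip times, compute the statistic on each resample,
// then average the results across resamples.
public class BootstrapRtt {

    public static void main(String[] args) {
        long[] rtts = loadRttsFromHistory();          // observed RTTs in nanoseconds
        int resamples = 10_000;                       // arbitrary resample count
        Random rnd = new Random();

        double meanEstimate = 0;
        double p99Estimate = 0;
        long[] resample = new long[rtts.length];

        for (int i = 0; i < resamples; i++) {
            for (int j = 0; j < resample.length; j++) {
                resample[j] = rtts[rnd.nextInt(rtts.length)];  // draw with replacement
            }
            Arrays.sort(resample);
            meanEstimate += Arrays.stream(resample).average().orElse(0);
            p99Estimate += resample[(int) (0.99 * (resample.length - 1))];
        }

        System.out.printf("bootstrap mean            = %.2f ns%n", meanEstimate / resamples);
        System.out.printf("bootstrap 99th percentile = %.2f ns%n", p99Estimate / resamples);
    }

    private static long[] loadRttsFromHistory() {
        // Placeholder: the real script extracts RTTs from the recorded message history.
        return new long[] {8850, 8870, 8820, 8900, 8860, 9320, 8840, 8660};
    }
}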

The SDK contains a benchmark folder with two scripts, one for QuickFIX/J and one for IdeaFIX. With them, users can reproduce the performance numbers in their own environment before moving forward.

The study is divided into two steps:

  1. IdeaFIX vs QuickFIX/J on a medium-powered spec
  2. In-depth benchmarking of IdeaFIX on a high-end spec

IdeaFIX vs QuickFIX/J

IdeaFIX

Here are the numbers for IdeaFIX on a medium bare-metal instance at OVH. Unix Domain Sockets (UDS) were used, with OS tuning:

population mean = 8878.71 ns
sampled mean = 8862.52 ns
std dev. = 185.71 ns
sample size = 626
Percentiles:
        1.00th percentile = 8613.19 ns
        5.00th percentile = 8665.63 ns
        50.00th percentile = 8843.82 ns
        90.00th percentile = 9026.54 ns
        95.00th percentile = 9117.75 ns
        99.00th percentile = 9319.91 ns
        99.99th percentile = 12896.14 ns
IdeaFIX RTT histogram

QuickFIX/J

Here are the numbers for QuickFIX/J on the same instance (TCP):

population mean = 60155.44 ns
sampled mean = 60341.26 ns
std dev. = 4656.07 ns
sample size = 104
Percentiles:
	1.00th percentile = 56689.39 ns
	5.00th percentile = 57100.56 ns
	50.00th percentile = 58578.18 ns
	90.00th percentile = 67866.25 ns
	95.00th percentile = 69053.48 ns
	99.00th percentile = 74685.56 ns
	99.99th percentile = 116496.26 ns
QuickFIX/J RTT histogram

Conclusions

Two main observations can be made:

A lot of effort has been put into IdeaFIX’s threading model, lock-free architecture and low allocation rate, precisely to ensure predictable response times. QuickFIX/J, on the other hand, uses coarse-grained locks, thread sleeps and a high allocation rate with more GC activity; these are the main causes of its RTT jitter.

On a last note, this also demonstrates that, with current JVMs, GC activity is only a moderate source of latency (in the ~µs range) compared to I/O (mostly network), thread sleeps and context switching. The opposite belief is a widespread misunderstanding in financial technology.

The IdeaFIX run above is, in fact, running the GC at a low frequency, but the OS scheduler, helped by the spare cores, manages to mitigate its effect: the RTT variance stays very low.

That said, IdeaFIX does include a GC-free mode. It means, however, that the client code also has to be garbage-free, which brings significant design challenges.
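To give an idea of those challenges, garbage-free client code means allocating everything up front and reusing it on the hot path. The sketch below illustrates the general pattern only; the handler and its fields are hypothetical, not part of the IdeaFIX API:

// Illustration of a garbage-free handler: every object is allocated once and
// reused for each message, so the steady state produces nothing for the GC to
// collect. This is a hypothetical example, not IdeaFIX client code.
public final class GcFreeFillHandler {

    // Pre-allocated, mutable scratch objects reused for every message.
    private final StringBuilder clOrdId = new StringBuilder(32);
    private final long[] lastFill = new long[2];       // {price in ticks, quantity}

    // Called for every fill; must not allocate on the hot path.
    public void onFill(long priceTicks, long quantity, CharSequence orderId) {
        clOrdId.setLength(0);                          // reuse the builder instead of creating a String
        clOrdId.append(orderId);
        lastFill[0] = priceTicks;
        lastFill[1] = quantity;
        // ... update positions using primitives only: no boxing, no String
        // concatenation, no iterators, no logging calls that allocate.
    }
}

In practice this style rules out most of the standard library’s convenience APIs, which is where the design challenges come from.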

IdeaFIX on high-end specs

Configuration

For this in-depth benchmarking, I chose a dedicated bare-metal AX102 server at Hetzner running Debian 12:

AMD Ryzen 9 7950X3D 16-Core Processor
128GiB System Memory
1920GB NVMe disk SAMSUNG MZQL21T9HCJR-00A07

I’ll follow the instructions in the debian_setup_script install script shipped with the SDK.

Out of the box, the Debian Linux kernel is not fully optimised for this recent hardware. A back-ported kernel is an easy way to get a significant boost, but I chose to go for a fully re-compiled kernel, using the linux-tkg repo:

git clone https://github.com/Frogging-Family/linux-tkg.git
cd linux-tkg
./install.sh install

The default build parameters are very good; the CPU generation is zen4 and I went for a fully tickless kernel. Now it’s time to add a few boot-time parameters by editing /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 quiet threadirqs transparent_hugepage=never nosoftlockup audit=0 mitigations=off processor.max_cstate=1 mce=ignore_ce preempt=none"

It’s relatively safe to switch off CPU mitigations on recent hardware, since the vulnerability fixes are included directly in silicon; preempt=none switches the preemption model to server. After editing, run update-grub so the new command line is picked up at the next boot. I’ve also added a few runtime Linux parameters to /etc/sysctl.conf:

vm.nr_hugepages = 16384
vm.dirty_background_ratio = 3
vm.dirty_ratio = 6
vm.swappiness = 10
vm.vfs_cache_pressure = 50
vm.stat_interval = 120
vm.max_map_count = 262144

Huge pages offer a ~5% performance boost, so they’re worth a try. The dirty-page settings keep the amount of dirty page cache small, so disk writeback happens in small increments rather than long, blocking bursts during heavy I/O.
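One detail worth noting: on HotSpot JVMs, the reserved huge pages are only used if large-page support is enabled when the process is launched, for example (the flags are standard HotSpot options; the jar name and heap size are placeholders):

java -XX:+UseLargePages -XX:+AlwaysPreTouch -Xms4g -Xmx4g -jar ideafix-benchmark.jar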

This analysis is divided into four independent runs, selected through the parameters of the benchmark script:

  1. Low-GC IdeaFIX using Unix Domain Sockets (default)
  2. No-GC IdeaFIX using UDS
  3. Low-GC IdeaFIX using TCP
  4. Low-GC IdeaFIX using UDS and SSL (encryption)

Low-GC IdeaFIX using UDS

This is the default and, in theory, offers the best average latencies:

population mean = 5595.74 ns
sampled mean = 5594.67 ns
std dev. = 98.90 ns
sample size = 993
Percentiles:
	1.00th percentile = 5485.81 ns
	5.00th percentile = 5507.53 ns
	50.00th percentile = 5593.06 ns
	90.00th percentile = 5645.66 ns
	95.00th percentile = 5657.65 ns
	99.00th percentile = 5714.02 ns
	99.99th percentile = 9132.94 ns
IdeaFIX Ryzen UDS RTT histogram

This option offers the best performance, with an outstanding 5.5 µs round trip time! The 99.99th percentile is also excellent, at about 9 µs.

No-GC IdeaFIX using UDS

Can we improve on the 99.99th percentile? For a start, let’s run the no-GC variant of the benchmark:

population mean = 5520.84 ns
sampled mean = 5512.61 ns
std dev. = 60.46 ns
sample size = 1006
Percentiles:
	1.00th percentile = 5429.23 ns
	5.00th percentile = 5449.58 ns
	50.00th percentile = 5506.13 ns
	90.00th percentile = 5555.12 ns
	95.00th percentile = 5571.92 ns
	99.00th percentile = 5673.63 ns
	99.99th percentile = 7661.13 ns
IdeaFIX Ryzen no-GC UDS RTT histogram

The 99.99th percentile is down to 7.6 µs, while the average latency stays roughly the same. The benefits are modest and should be weighed against the increased complexity of no-GC code.

Low-GC IdeaFIX using TCP

Now let’s estimate the overhead of TCP in a real-world scenario:

population mean = 6506.12 ns
sampled mean = 6508.47 ns
std dev. = 128.65 ns
sample size = 854
Percentiles:
	1.00th percentile = 6408.59 ns
	5.00th percentile = 6423.55 ns
	50.00th percentile = 6516.38 ns
	90.00th percentile = 6552.45 ns
	95.00th percentile = 6562.94 ns
	99.00th percentile = 6591.95 ns
	99.99th percentile = 13743.49 ns
IdeaFIX Ryzen TCP RTT histogram

Compared to the first run, TCP incurs a cost of less than 1 µs! On high-end hardware, IdeaFIX matches the performance of the best FIX engines.

Low-GC IdeaFIX using UDS and SSL

And finally, let’s have a look at SSL (end-to-end encryption). Does SSL really slow IdeaFIX down?

population mean = 7927.58 ns
sampled mean = 7920.51 ns
std dev. = 140.72 ns
sample size = 701
Percentiles:
	1.00th percentile = 7726.79 ns
	5.00th percentile = 7803.39 ns
	50.00th percentile = 7917.46 ns
	90.00th percentile = 8000.27 ns
	95.00th percentile = 8026.28 ns
	99.00th percentile = 8076.17 ns
	99.99th percentile = 12203.67 ns
IdeaFIX Ryzen UDS SSL RTT histogram

Results are excellent in this benchmark! Compared to the first run, the SSL overhead comes out at roughly 2.3 µs (7.9 µs vs 5.6 µs mean RTT).

Conclusions

IdeaFIX results are excellent on high-end hardware, with a round trip time that puts it in the top spots among publicly available benchmarks. TCP overhead is minimal at less than 1 µs, and SSL only adds about 2.3 µs to the RTT.

What would be interesting to do next?

  1. Investigate the effect of kernel-bypass technologies such as OpenOnload, which requires dedicated network cards
  2. Develop support for binary message formats such as FIX/FAST, SBE, OUCH, etc.
Download · View on GitHub

The zip archive includes the development kit, documentation, benchmark scripts and examples.
