Benchmarks
Find out how fast IdeaFIX can run
Methodology
There are many ways to benchmark a network application, but the best approach is to focus on a realistic scenario and to make sure the numbers are comparable across similar products. For a trading application, the focus is on response time, i.e. end-to-end latency.
The method we used involves writing a client that sends one order (NewOrderSingle, type 35=D) and receives two execution reports (type 35=8): one acknowledgement and one fill report. Then another order is sent, and so on until the timer expires. Server and client run on the same host.
It’s a simple but realistic ping-pong setup that captures all sources of latency, and it allows a platform-neutral comparison between FIX engines.
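To make the setup concrete, here is a minimal sketch of the client loop in Java. The FixClient interface is hypothetical (a stand-in for the engine under test, not IdeaFIX’s or QuickFIX/J’s actual API); it only illustrates the ping-pong pattern and where the timestamps are taken.

// Hypothetical stand-in for the engine under test; not IdeaFIX's
// or QuickFIX/J's real API.
interface FixClient {
    void send(String msgType);   // e.g. "D" for NewOrderSingle
    String receive();            // blocks until the next message arrives
}

final class PingPongBenchmark {
    // Runs the ping-pong loop until the deadline and returns one RTT
    // sample (in nanoseconds) per order/ack/fill round trip.
    static long[] run(FixClient client, long durationNanos) {
        long[] samples = new long[1_000_000];
        int n = 0;
        long deadline = System.nanoTime() + durationNanos;
        while (System.nanoTime() < deadline && n < samples.length) {
            long start = System.nanoTime();
            client.send("D");      // NewOrderSingle (35=D)
            client.receive();      // ExecutionReport (35=8): acknowledgement
            client.receive();      // ExecutionReport (35=8): fill
            samples[n++] = System.nanoTime() - start;
        }
        return java.util.Arrays.copyOf(samples, n);
    }
}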
An analysis script then processes the message history to estimate the mean round-trip time (RTT), its distribution, standard deviation and percentiles using a bootstrap method.
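For illustration, the core of such an analysis could look like this in Java (a minimal sketch, not the SDK’s actual analysis script): resample the RTTs with replacement many times, and read the standard error off the resulting distribution of means; tail latencies come straight from the sorted samples.

import java.util.Arrays;
import java.util.Random;

final class RttBootstrap {
    // Resamples the RTTs with replacement `rounds` times and returns
    // the sorted bootstrap means; their spread estimates the standard
    // error of the mean RTT.
    static double[] bootstrapMeans(long[] rtts, int rounds, long seed) {
        Random rnd = new Random(seed);
        double[] means = new double[rounds];
        for (int r = 0; r < rounds; r++) {
            long sum = 0;
            for (int i = 0; i < rtts.length; i++)
                sum += rtts[rnd.nextInt(rtts.length)];
            means[r] = (double) sum / rtts.length;
        }
        Arrays.sort(means);
        return means;
    }

    // Percentile of the raw samples, e.g. p = 99.99 for the tail latency.
    static long percentile(long[] rtts, double p) {
        long[] sorted = rtts.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }
}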
The SDK contains a benchmark folder with two scripts, one for QuickFIX/J and one for IdeaFIX. With them, users can reproduce the performance numbers in their own environment before moving forward.
The study is divided into two steps:
IdeaFIX vs QuickFIX/J
IdeaFIX
Here are the numbers for IdeaFIX on a medium bare-metal instance at OVH. Unix domain sockets (UDS) were used, with OS tuning:
population mean = 8878.71 ns
sampled mean = 8862.52 ns
std dev. = 185.71 ns
sample size = 626
Percentiles:
1.00th percentile = 8613.19 ns
5.00th percentile = 8665.63 ns
50.00th percentile = 8843.82 ns
90.00th percentile = 9026.54 ns
95.00th percentile = 9117.75 ns
99.00th percentile = 9319.91 ns
99.99th percentile = 12896.14 ns

IdeaFIX RTT histogram
QuickFIX/J
Here are the numbers for QuickFIX/J on the same instance (TCP):
population mean = 60155.44 ns
sampled mean = 60341.26 ns
std dev. = 4656.07 ns
sample size = 104
Percentiles:
1.00th percentile = 56689.39 ns
5.00th percentile = 57100.56 ns
50.00th percentile = 58578.18 ns
90.00th percentile = 67866.25 ns
95.00th percentile = 69053.48 ns
99.00th percentile = 74685.56 ns
99.99th percentile = 116496.26 ns

QuickFIX/J RTT histogram
Conclusions
Three main observations can be made:
- IdeaFIX’s mean RTT is ~7x lower than QuickFIX/J’s!
- IdeaFIX’s response times are predictable and centered around the mean.
- IdeaFIX’s worst-case latencies (99.99th percentile) are very low, at ~13µs.
A lot of effort has been put into the threading model, the lock-free architecture and the low allocation rate, precisely to ensure predictable response times. QuickFIX/J, on the other hand, uses coarse-grained locks, thread sleeps and a high allocation rate with heavier GC activity. These are the main causes of RTT jitter.
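As an illustration of the kind of structure involved (a generic sketch, not IdeaFIX’s actual internals), here is a single-producer/single-consumer lock-free ring buffer in Java: messages move between threads over preallocated slots, with no lock and no per-message allocation.

import java.util.concurrent.atomic.AtomicLong;

// Generic illustration, not IdeaFIX's actual code: a single-producer/
// single-consumer lock-free ring buffer over a preallocated array.
final class SpscRingBuffer<T> {
    private final Object[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    SpscRingBuffer(int capacityPowerOfTwo) {
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Called by the single producer thread; never blocks, never locks.
    boolean offer(T value) {
        long t = tail.get();
        if (t - head.get() == slots.length) return false; // buffer full
        slots[(int) (t & mask)] = value;
        tail.lazySet(t + 1); // ordered store publishes the slot to the consumer
        return true;
    }

    // Called by the single consumer thread.
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) return null; // buffer empty
        T value = (T) slots[(int) (h & mask)];
        slots[(int) (h & mask)] = null;
        head.lazySet(h + 1);
        return value;
    }
}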
One last note: this also demonstrates that, with current JVMs, GC activity is a moderate source of latency (in the ~µs range) compared to I/O (mostly network), thread sleeps and context switches. Believing otherwise is a widespread misunderstanding in financial technology.
The IdeaFIX run above does, in fact, run the GC at a low frequency, but the OS scheduler, helped by the spare cores, mitigates its effect: the RTT variance stays very low.
That said, IdeaFIX does include a GC-free mode. It means, however, that the client code also has to be garbage-free, which brings significant design challenges.
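To give an idea of what garbage-free client code means in practice, here is a hypothetical sketch (names and layout are illustrative, not IdeaFIX’s API): the hot path reuses one preallocated direct buffer and encodes numbers as ASCII by hand, so nothing is allocated per message and the GC never runs.

import java.nio.ByteBuffer;

// Hypothetical illustration of garbage-free client code: one reusable
// buffer, no String, no boxing, no per-message allocation.
final class OrderWriter {
    private final ByteBuffer buf = ByteBuffer.allocateDirect(256);

    // Encodes the variable fields into the reused buffer; the caller
    // must send the returned buffer before the next call.
    ByteBuffer encode(long orderId, long priceMicros, long qty) {
        buf.clear();
        putAsciiLong(buf, orderId);     buf.put((byte) '|');
        putAsciiLong(buf, priceMicros); buf.put((byte) '|');
        putAsciiLong(buf, qty);
        buf.flip();
        return buf;
    }

    // Writes a non-negative long as ASCII digits without allocating.
    private static void putAsciiLong(ByteBuffer b, long v) {
        if (v == 0) { b.put((byte) '0'); return; }
        long div = 1;
        while (v / div >= 10) div *= 10;
        for (; div > 0; div /= 10) b.put((byte) ('0' + (int) (v / div % 10)));
    }
}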
IdeaFIX on high-end specs
Configuration
For this in-depth benchmark, I chose a dedicated bare-metal AX102 server at Hetzner running Debian 12:
AMD Ryzen 9 7950X3D 16-Core Processor
128GiB System Memory
1920GB NVMe disk SAMSUNG MZQL21T9HCJR-00A07
I’ll follow the instructions in the SDK’s install script, debian_setup_script.
Out of the box, the Debian Linux kernel is not fully optimised for this recent hardware. A back-ported kernel is an easy way to get a significant boost, but I chose to go for a fully recompiled kernel instead, using the linux-tkg repo:
git clone https://github.com/Frogging-Family/linux-tkg.git
cd linux-tkg
./install.sh install
The default parameters are very good. The CPU generation is zen4 and I went for a fully tickless kernel.
Now it’s time to add a few boot-time parameters by editing /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 quiet threadirqs transparent_hugepage=never nosoftlockup audit=0 mitigations=off processor.max_cstate=1 mce=ignore_ce preempt=none"
It’s relatively safe to switch off CPU mitigations on recent hardware, since the vulnerability fixes are built into the silicon itself. preempt=none switches the preemption model to the server one. Run update-grub and reboot for the changes to take effect.
I’ve added more runtime Linux parameters to /etc/sysctl.conf:
vm.nr_hugepages = 16384
vm.dirty_background_ratio = 3
vm.dirty_ratio = 6
vm.swappiness = 10
vm.vfs_cache_pressure = 50
vm.stat_interval = 120
vm.max_map_count = 262144
Huge pages offer a ~5% performance boost, so they’re worth a try. The dirty-page settings make the kernel write data back to disk early and in small batches, so that heavy I/O never triggers a long, system-wide flush. Run sysctl -p to apply the settings without rebooting.
This analysis is divided into four independent runs, using the parameters of the benchmark script:
- Low-GC IdeaFIX using Unix Domain Sockets (default)
- No-GC IdeaFIX using UDS
- Low-GC IdeaFIX using TCP
- Low-GC IdeaFIX using UDS and SSL (encryption)
Low-GC IdeaFIX using UDS
This is the default and theoretically offers the best average latencies:
population mean = 5595.74 ns
sampled mean = 5594.67 ns
std dev. = 98.90 ns
sample size = 993
Percentiles:
1.00th percentile = 5485.81 ns
5.00th percentile = 5507.53 ns
50.00th percentile = 5593.06 ns
90.00th percentile = 5645.66 ns
95.00th percentile = 5657.65 ns
99.00th percentile = 5714.02 ns
99.99th percentile = 9132.94 ns

IdeaFIX Ryzen UDS RTT histogram
This option offers the best performance, with an outstanding ~5.6µs round-trip time! The 99.99th percentile is also excellent, at ~9.1µs.
No-GC IdeaFIX using UDS
Can we improve on the 99.99th percentile? For a start, let’s run the no-GC variant of the benchmark:
population mean = 5520.84 ns
sampled mean = 5512.61 ns
std dev. = 60.46 ns
sample size = 1006
Percentiles:
1.00th percentile = 5429.23 ns
5.00th percentile = 5449.58 ns
50.00th percentile = 5506.13 ns
90.00th percentile = 5555.12 ns
95.00th percentile = 5571.92 ns
99.00th percentile = 5673.63 ns
99.99th percentile = 7661.13 ns

IdeaFIX Ryzen no-GC UDS RTT histogram
The 99.99th percentile is down to ~7.7µs, while the average latencies stay the same. The benefits are modest and should be weighed against the increased complexity of no-GC code.
Low-GC IdeaFIX using TCP
Now let’s estimate the overhead of TCP in a real-world scenario:
population mean = 6506.12 ns
sampled mean = 6508.47 ns
std dev. = 128.65 ns
sample size = 854
Percentiles:
1.00th percentile = 6408.59 ns
5.00th percentile = 6423.55 ns
50.00th percentile = 6516.38 ns
90.00th percentile = 6552.45 ns
95.00th percentile = 6562.94 ns
99.00th percentile = 6591.95 ns
99.99th percentile = 13743.49 ns

IdeaFIX Ryzen TCP RTT histogram
Compared to the first run, TCP incurs a cost of less than 1µs! On high-end specs, IdeaFIX matches the performance of the best FIX engines.
Low-GC IdeaFIX using UDS and SSL
And finally, let’s have a look at SSL (end-to-end encryption). Does SSL really slow down IdeaFIX?
population mean = 7927.58 ns
sampled mean = 7920.51 ns
std dev. = 140.72 ns
sample size = 701
Percentiles:
1.00th percentile = 7726.79 ns
5.00th percentile = 7803.39 ns
50.00th percentile = 7917.46 ns
90.00th percentile = 8000.27 ns
95.00th percentile = 8026.28 ns
99.00th percentile = 8076.17 ns
99.99th percentile = 12203.67 ns

IdeaFIX Ryzen UDS SSL RTT histogram
Results are excellent in this benchmark! Compared to the first run, the SSL overhead is around 2.3µs.
Conclusions
IdeaFIX results are excellent on high-end specs, with a round-trip time that lands it in the top spots among publicly available benchmarks. TCP overhead is minimal at less than 1µs, and SSL only adds ~2.3µs to the RTT.
What would be interesting to do next?
- Investigate the effect of kernel-bypass technologies such as OpenOnload, although this requires dedicated network cards.
- Develop support for binary message formats such as FIX/FAST, SBE, OUCH, etc.