Performance

Measured, not promised.

Gateway overhead measured on identical hardware against one keep-alive upstream, every run verified 100% HTTP 200. The gate costs one millisecond at the median; against a live model, nothing measurable.

4,046 RPSPeak throughput
+1 msMedian gate overhead
13.6×vs. LiteLLM throughput

The whole data plane is written in Rust (axum, tokio, no garbage collector), so at 256 concurrent connections the compliance gate adds one millisecond at the median, and against a live model it disappears into provider noise.

One upstream, identical hardware, the same load.

Every gateway fronts the same in-network keep-alive mock upstream (a canned OpenAI completion behind one fixed 60 ms delay), hit by the same tool, oha, at the same concurrency for the same window. That isolates gateway overhead from provider variance. A second, rate-capped lane runs against a live provider to check the story holds outside the lab.

  • One shared mock upstream

    All four gateways route to the same in-network Node mock upstream, with keep-alive on, the same fixed 60 ms delay and the same canned body, so no one gets a faster backend. The latency differences are the gateways' own overhead.

  • Isolated, identical hardware

    Each gateway runs in a CPU/memory-pinned container, hit by oha at 256 connections for the same window, after an equal discarded warmup. Same load, same limits, every time.

  • Gate-off baseline

    Allow-all policy, DLP in observe-only mode, audit content off: Sluis as a plain OpenAI-compatible proxy.

  • Gate-on: the real product

    EU residency policy, DLP masking, content-retention and a hash-chained audit ledger: every gate running on the hot path.

Built so the result can't be quietly rigged.

Every config is pinned and identical across gateways, so the comparison can't be quietly tilted.

  • All gateways pinned to identical CPU and memory cgroups.
  • The mock upstream is byte-identical and shared; one fixed delay for everyone.
  • LiteLLM and Bifrost run identical configs, with no hidden flags.
  • Equal discarded warmup and equal measurement window per gateway.
  • Every run is gated on its status codes: anything but 100% HTTP 200 is discarded, never reported.
Results

The numbers, exactly as measured.

Two Sluis columns: gate-off, an apples-to-apples proxy comparison; gate-on, the compliance overhead against our own baseline.

Measured 2026-07-04 on identical hardware (one Apple Silicon Mac, Docker) at 256 connections; a relative comparison. Every run verified 100% HTTP 200, repeat variance under 1%. Peak RSS wasn't captured this run, so that row stays TBD.
MetricSluis gate-offSluis gate-onLiteLLMBifrost
Throughput (RPS)4,046
3,678
298
3,148
p50 latency (ms)62
63
743
80
p99 latency (ms)77
130
4,584
118
Peak RSS (MiB)TBD
TBD
TBD
TBD
Success rate100%
100%
100%
100%

Latency is end-to-end through the gateway over the shared fixed-latency mock: the differences are the gateways' own overhead.

The gate-on column is compliance overhead versus gate-off, not a head-to-head.

Against a live model (rate-capped at 10 RPS so the provider never throttles), the gate adds nothing at the median: 289 ms gate-on versus 291 ms gate-off. The p99 tails in that lane belong to the provider, not the gateways.

An honest claim, or none at all.

A benchmark you can't trust is worse than no benchmark. So we bind ourselves to a few rules, written down before any number ships.

  • The head-to-head “fastest” claim is gate-off only: Sluis as a plain proxy versus LiteLLM and Bifrost, which have no compliance gate.
  • Gate-on is never dressed up as a win over competitors. It's overhead against our own baseline, stated plainly: one millisecond at the median, nine percent of throughput given up at full synthetic saturation, and a p99 of 130 ms versus 77 ms.
  • The live lane against a real EU provider is rate-capped and labelled honestly: the gate adds nothing at the median, and the p99 tails are provider variance, not gateway overhead. A sanity check, not the headline.
  • If gate-off Sluis isn't clearly faster, the headline becomes “compliance at negligible cost”. The honest framing, not a louder one.

Built for the regulator in the room, not just the load test.

Read how the gateway works, then see how the compliance gate performs.