Budgets & caching

Per-key rate limits and budgets, organisation caps, and the two response caches.

Budgets & limits

Each key carries rate limits (requests per minute and tokens per minute) and a spend budget that applies over one of three periods: total, daily or monthly. An organisation-wide aggregate cap applies across all keys, on top of each key's own limits.
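The budget periods and the organisation cap can be illustrated with a minimal sketch. It is not the gateway's implementation: the names KeyLimits, period_start and remaining_spend are hypothetical, it assumes daily and monthly windows reset at UTC midnight and on the first of the month, and it assumes the organisation cap is a spend cap checked alongside the key's budget.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Literal, Optional

Period = Literal["total", "daily", "monthly"]


@dataclass(frozen=True)
class KeyLimits:
    """Hypothetical per-key settings mirroring the limits described above."""
    requests_per_minute: int
    tokens_per_minute: int
    budget: Decimal  # spend allowed per period, in currency units
    period: Period


def period_start(period: Period, now: datetime) -> Optional[datetime]:
    """Start of the current budget window; None means the budget never resets.

    Assumption: windows are aligned to UTC boundaries.
    """
    if period == "total":
        return None
    if period == "daily":
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "monthly":
        return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown period: {period}")


def remaining_spend(limits: KeyLimits, key_spent: Decimal,
                    org_cap: Decimal, org_spent: Decimal) -> Decimal:
    """Headroom for the next request: the tighter of the key budget and the org cap.

    key_spent is the key's spend within its current period (assumption).
    """
    key_left = limits.budget - key_spent
    org_left = org_cap - org_spent
    return max(Decimal("0"), min(key_left, org_left))
```

For example, a monthly key with a 50.00 budget that has spent 48.50, in an organisation with only 1.00 left under its cap, has 1.00 of headroom: whichever limit is tighter wins.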

Enforcement happens at the gateway, in real money: every request is priced from the live price book, and its cost is debited before the request is dispatched. A request over the rate limit returns HTTP 429 (Too Many Requests); a request over the budget returns HTTP 402 (Payment Required). In either case the request never reaches a provider. If the counters are ever unreachable, the check fails closed and reconciles from the immutable audit ledger instead of guessing.
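The order of operations (price, check the rate limit, check the budget, debit, then dispatch) can be sketched as follows. Everything here is an assumption for illustration: the price-book shape, pricing the worst case from max_output_tokens because output length is unknown before dispatch, simple counters for the current minute, and the names admit and Counters. This page does not say what status a fail-closed rejection returns, so the sketch leaves it as None, and the organisation cap is not modelled.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class PriceBookEntry:
    """Hypothetical price-book row: prices per million tokens."""
    input_per_mtok: Decimal
    output_per_mtok: Decimal


@dataclass
class Decision:
    allowed: bool
    status: Optional[int]  # 429 or 402 on a documented rejection; None otherwise
    reason: str
    cost: Decimal = Decimal("0")


class CountersUnavailable(Exception):
    """Stands in for a failure to reach the counter store."""


@dataclass
class Counters:
    """In-memory stand-in for one key's counters in the current minute window."""
    requests_this_minute: int = 0
    tokens_this_minute: int = 0
    spent: Decimal = Decimal("0")
    reachable: bool = True

    def ensure_reachable(self) -> None:
        if not self.reachable:
            raise CountersUnavailable()


def estimate_cost(entry: PriceBookEntry, input_tokens: int, max_output_tokens: int) -> Decimal:
    # Assumption: price the worst case, since output length is unknown before dispatch.
    total = entry.input_per_mtok * input_tokens + entry.output_per_mtok * max_output_tokens
    return total / Decimal(1_000_000)


def admit(counters: Counters, entry: PriceBookEntry, input_tokens: int,
          max_output_tokens: int, rpm: int, tpm: int, budget: Decimal) -> Decision:
    try:
        counters.ensure_reachable()
    except CountersUnavailable:
        # Fail closed: refuse rather than guess. Reconciling the counters from
        # the audit ledger would happen separately and is not modelled here.
        return Decision(False, None, "counters unavailable")

    tokens = input_tokens + max_output_tokens
    if (counters.requests_this_minute + 1 > rpm
            or counters.tokens_this_minute + tokens > tpm):
        return Decision(False, 429, "rate limit exceeded")

    cost = estimate_cost(entry, input_tokens, max_output_tokens)
    if counters.spent + cost > budget:
        return Decision(False, 402, "budget exceeded", cost)

    # Debit before dispatch: only requests already paid for reach a provider.
    counters.requests_this_minute += 1
    counters.tokens_this_minute += tokens
    counters.spent += cost
    return Decision(True, None, "ok", cost)
```

Because the debit happens inside the admission check, a rejected request leaves the counters untouched, and a request that is admitted has already been paid for when it is dispatched.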

Caching

There are two opt-in response caches: exact, which matches a normalised request, and semantic, which matches by vector similarity above a conservative threshold. Both are strictly isolated per organisation, encrypted at rest, and require content retention to be enabled. Enable them in the Console's Cache view.
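One way to build an exact-match key is to normalise the request, serialise it canonically, and hash it together with the organisation's ID, so identical requests from different organisations can never share an entry. The normalisation rules below (trimming whitespace in message text, sorting JSON keys) and the names normalise_request and exact_cache_key are assumptions for illustration; this page does not document the gateway's actual rules.

```python
import hashlib
import json


def normalise_request(request: dict) -> dict:
    """Hypothetical normalisation: trim whitespace around each message's text.

    Assumes message content is a plain string. The input dict is not mutated.
    """
    normalised = dict(request)
    if "messages" in normalised:
        normalised["messages"] = [
            {**msg, "content": msg["content"].strip()} for msg in normalised["messages"]
        ]
    return normalised


def exact_cache_key(org_id: str, request: dict) -> str:
    """SHA-256 of the org ID plus canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(normalise_request(request), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{org_id}\n{canonical}".encode("utf-8")).hexdigest()
```

With these rules, two requests that differ only in key order or in leading and trailing whitespace map to the same key within one organisation, and to different keys across organisations.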

Every audit row records how the call was served: none (not served from a cache), exact, or semantic.
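A per-organisation lookup that yields the value recorded in the audit row might look like the sketch below. It is illustrative only: it assumes the exact cache is consulted before the semantic one, uses plain cosine similarity over precomputed embeddings, and picks 0.97 as a stand-in for the conservative threshold. OrgCache and lookup are hypothetical names, and the exact key here is simplified to sorted-key JSON without further normalisation.

```python
import json
import math
from dataclasses import dataclass, field
from typing import Literal, Optional

ServedBy = Literal["none", "exact", "semantic"]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class OrgCache:
    """One organisation's caches; entries are never shared across organisations."""
    threshold: float = 0.97  # hypothetical stand-in for a "conservative" cut-off
    exact: dict[str, str] = field(default_factory=dict)
    semantic: list[tuple[list[float], str]] = field(default_factory=list)

    def lookup(self, request: dict, embedding: list[float]) -> tuple[ServedBy, Optional[str]]:
        # Assumption: exact match first, since it is cheaper and unambiguous.
        key = json.dumps(request, sort_keys=True)  # simplified normalisation
        if key in self.exact:
            return "exact", self.exact[key]

        # Otherwise take the most similar cached entry, if it clears the threshold.
        best_score, best_response = 0.0, None
        for vector, response in self.semantic:
            score = cosine(embedding, vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            return "semantic", best_response

        # No hit: the request goes to a provider and the audit row records "none".
        return "none", None
```

The first element of the returned pair is the value that would be written to the audit row.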