API reference
Authentication, the OpenAI-compatible endpoint surface, streaming, reasoning, and the error taxonomy.
Authentication
Every request to /v1/* authenticates with a virtual key in the Authorization: Bearer header. Mint keys in the Console; the secret is shown once and only its hash is stored. A key carries its own rate limits, budget, tags, and an optional model allow-list (empty = any model).
A request whose model is not on the key's non-empty allow-list is refused with a sealed 403 permission_error before anything is dispatched; the refusal itself lands in the audit chain.
Minting a production key requires a verified email address.
curl https://api.sluis.ai/v1/models \
  -H "Authorization: Bearer $SLUIS_KEY"
# a model outside the key's allow-list never dispatches:
# → 403 permission_error · the refusal is sealed in the audit chain
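The allow-list rule and the Bearer header can be pictured client-side with a few lines of Python. This is a minimal sketch rather than the gateway's implementation: `model_allowed` and `build_models_request` are hypothetical helper names, the key value is a placeholder, and the request is only constructed, never sent.

```python
import urllib.request

API_BASE = "https://api.sluis.ai/v1"  # base URL taken from the curl example


def model_allowed(model: str, allow_list: list[str]) -> bool:
    """Mirror the documented rule: an empty allow-list permits any model."""
    return not allow_list or model in allow_list


def build_models_request(key: str) -> urllib.request.Request:
    """Build (but do not send) GET /v1/models with the Bearer header."""
    return urllib.request.Request(
        f"{API_BASE}/models",
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )
```

A pre-flight check like `model_allowed` only saves a round trip; the gateway remains the authority and still answers 403 permission_error for a model outside the key's allow-list.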
Endpoints
Sluis exposes the OpenAI-compatible surface below. Endpoints without a first-class handler are proxied verbatim to the routed provider, streaming included, so OpenAI-compatible upstreams keep full fidelity.
| Endpoint | Purpose |
|---|---|
| POST /v1/chat/completions | Chat completions, the primary surface: routing, data protection, caching, streaming. |
| POST /v1/completions | Legacy text completions. |
| POST /v1/embeddings | Embeddings; also feeds the semantic cache. |
| POST /v1/moderations | Moderation classification. |
| POST /v1/responses | The OpenAI Responses API. |
| GET /v1/models | The models your policy and credentials can actually reach, nothing hypothetical. |
| GET /v1/models/{id} | One model's metadata. |
| POST /v1/audio/* | Transcription, translation, speech · proxied to the routed provider. |
| POST /v1/images/* | Image generation and edits · proxied. |
| POST /v1/video/generations | Video generation · proxied. |
| /v1/files | File operations · proxied. |
Streaming
Set stream: true and the response arrives as server-sent events: each frame is a chat.completion.chunk delta and the stream ends with data: [DONE]. Streams are never buffered in the gateway: they tee through it, and the audit seal and metering happen even if the client hangs up early.
Ask for stream_options.include_usage and the final frame carries exact token usage, the same numbers the gateway meters and bills.
# Python
stream = client.chat.completions.create(
    model="mistral/mistral-large-latest",
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    # The final usage frame may carry no choices, so guard before indexing.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")

// JavaScript
const stream = await client.chat.completions.create({
  model: "mistral/mistral-large-latest",
  messages: [{ role: "user", content: "Write a haiku" }],
  stream: true,
  stream_options: { include_usage: true },
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
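For clients that read the event stream without an SDK, the frames described above can be consumed with the standard library alone. The sketch below is illustrative: `consume_sse` is a hypothetical helper, and it assumes the usage frame follows the OpenAI convention of an empty `choices` list with a `usage` object.

```python
import json
from typing import Iterable, Optional, Tuple


def consume_sse(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Collect delta text and the final usage object from SSE lines."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        # The usage frame may carry no choices, so iterate instead of indexing.
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                text_parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage
```

Feed it the decoded lines of the HTTP response body; it returns the assembled text and, when stream_options.include_usage was requested, the usage object from the last frame.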
Reasoning
Pass the OpenAI reasoning_effort parameter (minimal | low | medium | high) on any thinking-capable model. Sluis translates it per provider (Gemini's thinking level, Claude's adaptive thinking effort) and omits it where a model would reject it, so one parameter works across the whole catalog.
resp = client.chat.completions.create(
    model="vertex/claude-opus-4-8",
    messages=[{"role": "user", "content": "Prove it step by step…"}],
    reasoning_effort="high",  # minimal | low | medium | high
)
Error codes
Errors use the OpenAI error envelope. Each HTTP status pairs with the error.type value shown below, so your SDK's error handling keeps working unchanged.
| Status and error.type | When |
|---|---|
| 400 invalid_request_error | Malformed request body or parameters. |
| 400 invalid_request_error | Model id missing its provider prefix. Every callable id is provider/model, e.g. mistral/mistral-large-latest; the body reads: model must be provider-prefixed. |
| 401 authentication_error | Missing or unknown API key. |
| 402 insufficient_quota | Free allowance spent or budget reached. Activate a plan or raise the budget; the request never reaches a provider. |
| 403 permission_error | The key lacks permission, for example the model is not on its allow-list. |
| 422 invalid_request_error | Refused before dispatch, for example data protection in block mode matched the request. |
| 429 rate_limit_error | Rate limit reached. Enforced at the gateway; the request never hits a provider. |
| 451 permission_error | Blocked by residency policy: no allowed jurisdiction serves the request. The body includes the reason. |
| 5xx api_error | Upstream provider failure after retries; the circuit breaker steers traffic around unhealthy providers. |
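One way to act on the table above is to separate errors worth retrying from those that need a changed request or key. The sketch below is a hypothetical client-side policy, not something Sluis prescribes: `should_retry`, `has_provider_prefix`, and the choice to retry only rate limits and 5xx failures are all assumptions for illustration.

```python
import re

# Assumed policy: only these error types are worth retrying with backoff.
RETRYABLE_TYPES = {"rate_limit_error", "api_error"}

# Callable ids are provider/model, e.g. mistral/mistral-large-latest.
_MODEL_ID = re.compile(r"^[^/\s]+/\S+$")


def has_provider_prefix(model: str) -> bool:
    """Catch the 'model must be provider-prefixed' 400 before sending."""
    return bool(_MODEL_ID.match(model))


def should_retry(status: int, body: dict) -> bool:
    """Decide from the HTTP status and the OpenAI-style error envelope."""
    error_type = (body.get("error") or {}).get("type", "")
    return status >= 500 or error_type in RETRYABLE_TYPES
```

Everything else in the table (400, 401, 402, 403, 422, 451) describes a request the gateway will refuse again until the body, key, budget, or policy changes, so retrying it unchanged only burns rate limit.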