API reference

Authentication, the OpenAI-compatible endpoint surface, streaming, reasoning, and the error taxonomy.

Authentication

Every request to /v1/* authenticates with a virtual key in the Authorization: Bearer header. Mint keys in the Console; the secret is shown once and only its hash is stored. A key carries its own rate limits, budget, tags, and an optional model allow-list (empty = any model).

A request whose model is not on the key's non-empty allow-list is refused with a sealed 403 permission_error before anything is dispatched; the refusal itself lands in the audit chain.

Minting a production key requires a verified email address.

curl https://api.sluis.ai/v1/models \
  -H "Authorization: Bearer $SLUIS_KEY"

# a model outside the key's allow-list never dispatches:
# → 403 permission_error · the refusal is sealed in the audit chain
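
The allow-list rule above (empty = any model) can be sketched client-side as a pre-flight check. This is an illustration only: the gateway enforces the rule itself, and the helper names model_allowed and auth_headers are hypothetical, not part of any Sluis SDK.

```python
from typing import Iterable


def model_allowed(model: str, allow_list: Iterable[str]) -> bool:
    """Mirror the documented rule: an empty allow-list permits any model."""
    allowed = set(allow_list)
    return not allowed or model in allowed


def auth_headers(key: str) -> dict[str, str]:
    """Build the Authorization header that every /v1/* request carries."""
    return {"Authorization": f"Bearer {key}"}
```

A check like this only saves a round trip; a refusal from the gateway is still the authoritative answer.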

Endpoints

Sluis exposes the OpenAI-compatible surface below. Endpoints without a first-class handler are proxied verbatim to the routed provider, streaming included, so OpenAI-compatible upstreams keep full fidelity.

Endpoint	Purpose
POST /v1/chat/completions	Chat completions, the primary surface: routing, data protection, caching, streaming.
POST /v1/completions	Legacy text completions.
POST /v1/embeddings	Embeddings; also feeds the semantic cache.
POST /v1/moderations	Moderation classification.
POST /v1/responses	The OpenAI Responses API.
GET /v1/models	The models your policy and credentials can actually reach, nothing hypothetical.
GET /v1/models/{id}	One model's metadata.
POST /v1/audio/*	Transcription, translation, speech · proxied to the routed provider.
POST /v1/images/*	Image generation and edits · proxied.
POST /v1/video/generations	Video generation · proxied.
/v1/files	File operations · proxied.

Streaming

Set stream: true and the response arrives as server-sent events: each frame is a chat.completion.chunk delta and the stream ends with data: [DONE]. Streams are never buffered in the gateway: they tee through it, and the audit seal and metering happen even if the client hangs up early.

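Set stream_options.include_usage to true and the last chunk before [DONE] carries exact token usage, the same numbers the gateway meters and bills. In the OpenAI wire format that usage chunk has an empty choices array, so check choices before indexing into it.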

stream = client.chat.completions.create(
    model="mistral/mistral-large-latest",
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True,
    stream_options={"include_usage": True},
)
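for chunk in stream:
    # With include_usage, the final chunk has usage and an empty choices list.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    elif chunk.usage:
        print(f"\n[{chunk.usage.total_tokens} tokens]")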

const stream = await client.chat.completions.create({
  model: "mistral/mistral-large-latest",
  messages: [{ role: "user", content: "Write a haiku" }],
  stream: true,
  stream_options: { include_usage: true },
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
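
For clients that read the event stream without an SDK, the framing described above can be sketched as a small parser. It assumes one JSON chunk per data: line and a usage chunk with an empty choices array, as in the OpenAI wire format; read_sse and the sample frames are hypothetical and made up for illustration.

```python
import json
from typing import Iterable, Optional, Tuple


def read_sse(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Collect streamed text and the optional usage payload from SSE lines."""
    text_parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        # The usage chunk has no choices, so this loop simply does not run for it.
        for choice in chunk.get("choices", []):
            text_parts.append(choice.get("delta", {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage


# Illustrative frames, not captured from a real response.
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [], "usage": {"prompt_tokens": 3, "completion_tokens": 2, "total_tokens": 5}}',
    "",
    "data: [DONE]",
]
text, usage = read_sse(sample)
print(text, usage)
```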

Reasoning

Pass the OpenAI reasoning_effort parameter (minimal | low | medium | high) on any thinking-capable model. Sluis translates it per provider (Gemini's thinking level, Claude's adaptive thinking effort) and omits it where a model would reject it, so one parameter works across the whole catalog.

resp = client.chat.completions.create(
    model="vertex/claude-opus-4-8",
    messages=[{"role": "user", "content": "Prove it step by step…"}],
    reasoning_effort="high",  # minimal | low | medium | high
)
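
Because the parameter takes one of four documented values, a caller may want to validate it before sending. A minimal sketch, assuming a hypothetical chat_request helper that builds the request body as a plain dict; how Sluis translates the value per provider is not modeled here.

```python
from typing import Optional

# The four values listed above.
REASONING_EFFORTS = ("minimal", "low", "medium", "high")


def chat_request(model: str, prompt: str, reasoning_effort: Optional[str] = None) -> dict:
    """Build a chat-completions body, adding reasoning_effort only when set."""
    if reasoning_effort is not None and reasoning_effort not in REASONING_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {REASONING_EFFORTS}")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if reasoning_effort is not None:
        body["reasoning_effort"] = reasoning_effort
    return body
```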

Error codes

Errors use the OpenAI error envelope, and each HTTP status maps to a fixed error.type value, so your SDK's error handling keeps working unchanged.

Code	When
400 invalid_request_error	Malformed request body or parameters.
400 invalid_request_error	Model id missing its provider prefix. Every callable id is provider/model, e.g. mistral/mistral-large-latest; the body reads: model must be provider-prefixed.
401 authentication_error	Missing or unknown API key.
402 insufficient_quota	Free allowance spent or budget reached. Activate a plan or raise the budget; the request never reaches a provider.
403 permission_error	The key lacks permission, for example the model is not on its allow-list.
422 invalid_request_error	Refused before dispatch, for example data protection in block mode matched the request.
429 rate_limit_error	Rate limit reached. Enforced at the gateway; the request never hits a provider.
451 permission_error	Blocked by residency policy: no allowed jurisdiction serves the request. The body includes the reason.
5xx api_error	Upstream provider failure after retries; the circuit breaker steers traffic around unhealthy providers.
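
The table above can be turned into a small client-side handler. This sketch assumes the standard OpenAI envelope shape ({"error": {"type": ..., "message": ...}}); the retry policy (429 and 5xx only) is an assumed client choice, not something the gateway prescribes, and all names here are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class GatewayError:
    status: int
    type: str
    message: str


def parse_error(status: int, body: str) -> GatewayError:
    """Read the OpenAI-style envelope from a non-2xx response body."""
    err = json.loads(body).get("error", {})
    return GatewayError(status, err.get("type", ""), err.get("message", ""))


def should_retry(err: GatewayError) -> bool:
    """Assumed policy: retry rate limits and upstream failures, nothing else."""
    return err.status == 429 or 500 <= err.status <= 599


def has_provider_prefix(model: str) -> bool:
    """Pre-flight check for the provider/model shape the table describes."""
    provider, sep, name = model.partition("/")
    return bool(provider and sep and name)
```

Statuses such as 402, 403, 422 and 451 describe a policy or account state, so retrying the same request unchanged would not be expected to help.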
