Examples

Short, copy-pasteable end-to-end snippets for each feature. The curl examples assume ACS_API_KEY and ACS_API_BASE are exported (see Quick start); the Python examples use the same openai SDK client.

Each page below shows one shared example response after the curl and Python snippets: the two calls return the same JSON, and the SDK just wraps it in typed objects. A note on reproducibility: the example responses were captured with seed=0 appended to the request, but the displayed requests don't pin a seed, so under the default temperature=1.0 your output will differ from the response shown. Add a seed to your own request to reproduce a specific run, or set temperature=0 for greedy decoding. Always-null fields (service_tier, system_fingerprint, kv_transfer_params, etc.) are omitted from the shown responses for brevity.
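As a rough illustration of the reproducibility note, the sketch below adds either a seed or temperature=0 to a request payload before it is sent. The helper name pin_sampling and the payload contents are hypothetical; only the seed and temperature parameters come from the note above.

```python
from typing import Optional


def pin_sampling(payload: dict, seed: Optional[int] = 0, greedy: bool = False) -> dict:
    """Return a copy of a request payload with reproducibility settings added.

    Hypothetical helper: `payload` is whatever JSON body you would otherwise send.
    """
    pinned = dict(payload)  # shallow copy so the caller's dict is left untouched
    if greedy:
        pinned["temperature"] = 0  # greedy decoding
    elif seed is not None:
        pinned["seed"] = seed  # fixed seed, sampling still at the default temperature
    return pinned


# "example-model" is a placeholder, not a real model name.
request = pin_sampling({"model": "example-model", "prompt": "Hello"}, seed=0)
```

The same keys can be passed as keyword arguments to the SDK call instead of being merged into a dict.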

Token-level inspection

logprobs — top-k logprobs per generated token.
prompt_logprobs — top-k logprobs at each prompt position (gotchas around rank-vs-actual-token).
echo — include the prompt in the response.

Streaming + concurrency

stream — SSE token-by-token.
Batch rollouts — 8 concurrent per key, with the canonical asyncio.gather pattern.
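For orientation, here is a minimal sketch of keeping asyncio.gather within a concurrency budget. It assumes the figure of 8 is a per-key cap on in-flight requests, and it uses a stand-in coroutine (fake_rollout) instead of a real API call; the Batch rollouts page remains the reference for the canonical pattern.

```python
import asyncio

MAX_IN_FLIGHT = 8  # assumed to be the per-key concurrency cap


async def gather_bounded(factories, limit=MAX_IN_FLIGHT):
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def run(factory):
        async with sem:  # wait here while `limit` calls are already running
            return await factory()

    # gather returns results in the order the factories were given
    return await asyncio.gather(*(run(f) for f in factories))


async def demo(n=20):
    """Drive the helper with a fake rollout and record peak concurrency."""
    in_flight = 0
    peak = 0

    async def fake_rollout(i):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for the network round trip
        in_flight -= 1
        return i * i

    results = await gather_bounded([lambda i=i: fake_rollout(i) for i in range(n)])
    return results, peak
```

Calling asyncio.run(demo()) returns the results in submission order along with the highest number of rollouts that were running at once.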

Recovery patterns

Cold-boot recovery — handle 503 modal_cold_boot + Retry-After.
Budget-cap recovery — handle 429 budget_exceeded (the non-retryable kind).
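The two recovery cases differ mainly in whether a retry makes sense. The sketch below shows one way to encode that decision, assuming the error code string (modal_cold_boot or budget_exceeded) has already been extracted from the response and that Retry-After carries a whole number of seconds. The function name, the fallback wait, and the way the code is passed in are all hypothetical.

```python
from typing import Mapping, Optional

DEFAULT_COLD_BOOT_WAIT = 5.0  # hypothetical fallback when Retry-After is missing or unparsable


def retry_delay(status: int, error_code: str, headers: Mapping[str, str]) -> Optional[float]:
    """Return seconds to wait before retrying, or None if this helper would not retry."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive

    if status == 503 and error_code == "modal_cold_boot":
        try:
            return float(max(0, int(lowered.get("retry-after", "").strip())))
        except ValueError:
            return DEFAULT_COLD_BOOT_WAIT

    if status == 429 and error_code == "budget_exceeded":
        return None  # non-retryable: waiting does not restore the budget

    return None  # other errors are left to the caller's own policy
```

A caller would sleep for the returned delay and resend on a cold boot, and surface the error to the user when the result is None.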