Examples
Short, copy-pasteable end-to-end snippets for each feature. The curl examples assume ACS_API_KEY + ACS_API_BASE are exported (see Quick start); the Python examples use the same openai SDK client.
Each page below shows one shared example response after the curl and Python snippets — the two calls return the same JSON; the SDK just wraps it in typed objects. Heads-up on reproducibility: the example responses were captured with seed=0 appended to the request, but the displayed requests don't pin a seed, so under the default temperature=1.0 your output will differ from token to token. Add a seed to your own request to reproduce a specific run, or temperature=0 for greedy decoding. Always-null fields (service_tier, system_fingerprint, kv_transfer_params, etc.) are omitted from the shown responses for brevity.
Token-level inspection
logprobs— top-k logprobs per generated token.prompt_logprobs— top-k logprobs at each prompt position (gotchas around rank-vs-actual-token).echo— include the prompt in the response.
Streaming + concurrency
stream— SSE token-by-token.- Batch rollouts — 8 concurrent per key, with the canonical
asyncio.gatherpattern.
Recovery patterns
- Cold-boot recovery — handle
503 modal_cold_boot+Retry-After. - Budget-cap recovery — handle
429 budget_exceeded(the non-retryable kind).