Cold-boot recovery
If a model has been idle, the first request returns 503 modal_cold_boot with a Retry-After header and a retry_after_seconds body field. Wait and retry — a few minutes is normal, 5–13 min can happen for 405B.
curl
# curl with manual retry — read Retry-After from -D headers.txt:
until curl -fs -D headers.txt "$ACS_API_BASE/completions" \
-H "Authorization: Bearer $ACS_API_KEY" -H "Content-Type: application/json" \
-d '{"model": "llama-405b", "prompt": "Hello", "max_tokens": 4}'; do
sleep "$(awk '/^[Rr]etry-[Aa]fter:/ {print $2+0}' headers.txt)"
done
Python
import time
from openai import OpenAI, APIStatusError
while True:
try:
resp = client.completions.create(model="llama-405b", prompt="Hello", max_tokens=4)
break
except APIStatusError as e:
if e.status_code == 503 and (e.response.headers.get("retry-after") or "").isdigit():
time.sleep(int(e.response.headers["retry-after"]))
continue
raise
Gotcha
Retry indefinitely on 503 modal_cold_boot, but cap retries on 503 circuit_open — that one means an underlying outage and the breaker may stay open for minutes.