Budget-cap recovery

When your monthly token budget runs out, requests return 429 budget_exceeded. There is no automatic refill mid-month — the cap resets at the 1st (UTC).

curl

curl -i "$ACS_API_BASE/completions" \
  -H "Authorization: Bearer $ACS_API_KEY" -H "Content-Type: application/json" \
  -d '{"model": "llama-8b", "prompt": "Hi", "max_tokens": 8}'
# HTTP/1.1 429 Too Many Requests
# {"error": {"code": "budget_exceeded", "message": "Monthly token budget exhausted.", ...}}

Python

from openai import APIStatusError

try:
    resp = client.completions.create(model="llama-8b", prompt="Hi", max_tokens=8)
except APIStatusError as e:
    # The wrapper nests the error code under body["error"]["code"], not at
    # the top level — so e.code (which the openai SDK reads from
    # body["code"]) is None here. Dig into e.body instead.
    err = (e.body or {}).get("error", {}) if isinstance(e.body, dict) else {}
    if e.status_code == 429 and err.get("code") == "budget_exceeded":
        # Don't retry — the cap is sticky until the 1st of next month UTC.
        # Email base-models@acsresearch.org to raise the budget mid-month.
        raise SystemExit("Budget exhausted — emailing the team for a raise.")
    raise

Gotcha

429 budget_exceeded is not a transient rate-limit — exponential backoff just burns time. Inspect error.code and short-circuit non-retryable cases (concurrency limits, by contrast, simply queue).