logprobs

Set logprobs to an integer k (k ≤ 20) to get the k most likely tokens, with their log probabilities, at each generated position. Useful for calibration, classification by next-token probabilities, or just inspecting what the model "almost said".

curl

curl -s "$ACS_API_BASE/completions" \
  -H "Authorization: Bearer $ACS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-8b", "prompt": "The capital of France is", "max_tokens": 1, "logprobs": 5}'

Python

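import os
from openai import OpenAI

# Assumption: the endpoint is OpenAI-compatible, so the openai SDK can talk to it.
client = OpenAI(base_url=os.environ["ACS_API_BASE"], api_key=os.environ["ACS_API_KEY"])

resp = client.completions.create(
    model="llama-8b",
    prompt="The capital of France is",
    max_tokens=1,
    logprobs=5,  # integer top-k, not a boolean
)
choice = resp.choices[0]
# top_logprobs has one dict per generated token; index 0 is the first (and only) token here.
print(choice.text, choice.logprobs.top_logprobs[0])  # {' a': -1.74, ' Paris': -1.99, ...}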
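
The "classification by next-token probabilities" use case can be sketched as follows. This is a minimal illustration, not part of the API: label_probabilities is a hypothetical helper, the label tokens and logprob values are made up, and it assumes the logprobs are natural logs and that each label is a single token with a leading space.

```python
import math

def label_probabilities(top_logprobs, labels):
    """Renormalize the logprobs of the candidate labels so they sum to 1.

    Labels missing from the top-k dict are skipped: their probability is
    unknown, only bounded above by the smallest returned entry.
    """
    found = {label: top_logprobs[label] for label in labels if label in top_logprobs}
    if not found:
        return {}
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(found.values())
    log_total = peak + math.log(sum(math.exp(lp - peak) for lp in found.values()))
    return {label: math.exp(lp - log_total) for label, lp in found.items()}

# Hypothetical top_logprobs[0] for a sentiment prompt (values are made up).
example = {" positive": -0.4, " negative": -1.6, " neutral": -2.9, " the": -4.0}
probs = label_probabilities(example, [" positive", " negative"])
best = max(probs, key=probs.get)
print(best, probs)
```

Renormalizing over the label set ignores the probability mass the model put on non-label tokens, so it is worth checking how much of the total the labels cover before trusting the result.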

Example response

{
  "id": "cmpl-91ea94c6ed0e5b75",
  "object": "text_completion",
  "model": "meta-llama/Llama-3.1-8B",
  "choices": [
    {
      "index": 0,
      "text": " a",
      "logprobs": {
        "text_offset": [0],
        "tokens": [" a"],
        "token_logprobs": [-1.7437232732772827],
        "top_logprobs": [
          {
            " a": -1.7437232732772827,
            " Paris": -1.9937232732772827,
            " one": -2.5562233924865723,
            " the": -2.5562233924865723,
            " also": -3.1187233924865723
          }
        ]
      },
      "finish_reason": "length"
    }
  ],
  "usage": {"prompt_tokens": 6, "completion_tokens": 1, "total_tokens": 7}
}

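Note that base llama-8b picked " a" over " Paris". The model is a continuation predictor, not a Q&A assistant, so it ranks plausible continuations rather than answers. " Paris" is the second-most likely token, trailing " a" by 0.25 in log probability (about 14% versus 17% if the values are natural logs). Greedy decoding (temperature=0) would therefore return " a" every time; " Paris" shows up only when sampling at a nonzero temperature, and a chat-tuned model would be expected to land on it more often.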
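
To make the gap concrete, the logprobs from the example response can be converted to probabilities. This is a small sketch that assumes the values are natural-log probabilities, which is the usual convention for OpenAI-compatible APIs but is not stated in this section.

```python
import math

# top_logprobs[0] copied from the example response.
top = {
    " a": -1.7437232732772827,
    " Paris": -1.9937232732772827,
    " one": -2.5562233924865723,
    " the": -2.5562233924865723,
    " also": -3.1187233924865723,
}

# exp() turns a natural-log probability back into a probability.
probs = {token: math.exp(lp) for token, lp in top.items()}
for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token!r:10} {p:.3f}")

# Probability mass on tokens outside the five that were returned.
remaining = 1.0 - sum(probs.values())
# A 0.25 logprob gap is a probability ratio of exp(-0.25).
ratio = probs[" Paris"] / probs[" a"]
print(f"remaining mass: {remaining:.3f}, Paris/a ratio: {ratio:.3f}")
```

The five returned tokens cover only about half of the distribution, which is typical for a base model on an open-ended prompt.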

Gotcha

logprobs is an integer (top-k), not a boolean: passing true returns 400 invalid_request. The same rule applies to every numeric sampler param. max_tokens, n, best_of, seed, top_k, temperature, top_p, min_p, presence_penalty, frequency_penalty, repetition_penalty, and prompt_logprobs all reject true/false, so a typo like max_tokens: true fails loudly instead of being silently interpreted as 1. (echo and stream are the only sampler fields that legitimately take booleans.)
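
The same mistake can be caught before the request is sent. The sketch below is a hypothetical client-side check, not the server's validation logic; find_boolean_numerics and NUMERIC_PARAMS are illustrative names, and the parameter list is taken from the paragraph above.

```python
# Numeric sampler params that must not be sent as true/false.
NUMERIC_PARAMS = {
    "logprobs", "max_tokens", "n", "best_of", "seed", "top_k", "temperature",
    "top_p", "min_p", "presence_penalty", "frequency_penalty",
    "repetition_penalty", "prompt_logprobs",
}

def find_boolean_numerics(payload):
    """Return the names of numeric params whose value is a bool."""
    # bool is a subclass of int in Python, so isinstance(True, int) is True.
    # The check must test for bool explicitly.
    return sorted(
        key for key, value in payload.items()
        if key in NUMERIC_PARAMS and isinstance(value, bool)
    )

payload = {
    "model": "llama-8b",
    "prompt": "The capital of France is",
    "max_tokens": True,  # typo: should be an integer
    "logprobs": 5,
    "stream": False,     # booleans are fine here
}
print(find_boolean_numerics(payload))  # ['max_tokens']
```

The explicit bool check matters in Python in particular, because a plain isinstance(value, int) test would accept True as the integer 1, which is exactly the silent coercion the server is rejecting.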