prompt_logprobs

Return the model's top-k logprob predictions at each prompt position. This is useful for inspecting what the model expected at each step and, with the extra care described in the Gotcha, as a building block for scoring how likely a given piece of text is under the model.
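For reference, the "score" meant here is the log-likelihood of the prompt given its first token. Write the tokenized prompt as x_0, x_1, ..., x_{n-1}; this notation is ours, not the API's. The score is then the sum of the actual tokens' logprobs at positions 1 through n-1. Position 0 is skipped because it has no context:

```latex
\log p(x_1, \dots, x_{n-1} \mid x_0) \;=\; \sum_{i=1}^{n-1} \log p(x_i \mid x_0, \dots, x_{i-1})
```

Each term on the right is the logprob stored in prompt_logprobs[i] under the entry for the actual token x_i.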

curl

curl -s "$ACS_API_BASE/completions" \
  -H "Authorization: Bearer $ACS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-8b",
       "prompt": "The capital of France is",
       "max_tokens": 1,
       "prompt_logprobs": 5}' | jq '.choices[0].prompt_logprobs[:2]'

Python

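import os

from openai import OpenAI

# Point the OpenAI SDK at the same endpoint and key as the curl example.
client = OpenAI(base_url=os.environ["ACS_API_BASE"], api_key=os.environ["ACS_API_KEY"])

resp = client.completions.create(
    model="llama-8b",
    prompt="The capital of France is",
    max_tokens=1,  # the wrapper requires >= 1; the generated token is ignored here
    extra_body={"prompt_logprobs": 5},  # not part of the SDK's typed signature
)

# resp.choices[0].prompt_logprobs[i] is a {token_id: {logprob, rank, decoded_token}}
# dict for prompt position i (token_id keys are strings, as in the JSON), or None
# for the very first position (no conditioning context). Print the top-5 at the
# first few positions.
for i, top_k in enumerate(resp.choices[0].prompt_logprobs[:3]):
    if top_k is None:
        continue
    print(f"position {i}:")
    for token_id, info in top_k.items():
        print(f"  rank={info['rank']:>2}  logprob={info['logprob']:+.3f}  {info['decoded_token']!r}")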

Example response

Only choices[0].prompt_logprobs[:2], the same slice the curl jq filter prints:

[
  null,
  {
    "14924": {"logprob": -1.179518699645996, "rank": 1, "decoded_token": "Question"},
    "755":   {"logprob": -2.179518699645996, "rank": 2, "decoded_token": "def"},
    "2":     {"logprob": -2.742018699645996, "rank": 3, "decoded_token": "#"},
    "791":   {"logprob": -3.679518699645996, "rank": 4, "decoded_token": "The"},
    "16309": {"logprob": -4.179518699645996, "rank": 5, "decoded_token": "Tags"}
  }
]
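Each position is an ordinary JSON object. Its token_id keys therefore arrive as strings, and JSON objects carry no ordering guarantee. Below is a minimal sketch for reading one position, using the position-1 data above. ranked_candidates is a hypothetical helper of our own, and its probability column is just exp(logprob):

```python
import json
import math

# Position 1 of the example response above, as raw JSON text.
POSITION_1_JSON = """
{
  "14924": {"logprob": -1.179518699645996, "rank": 1, "decoded_token": "Question"},
  "755":   {"logprob": -2.179518699645996, "rank": 2, "decoded_token": "def"},
  "2":     {"logprob": -2.742018699645996, "rank": 3, "decoded_token": "#"},
  "791":   {"logprob": -3.679518699645996, "rank": 4, "decoded_token": "The"},
  "16309": {"logprob": -4.179518699645996, "rank": 5, "decoded_token": "Tags"}
}
"""


def ranked_candidates(position: dict) -> list:
    """Return (rank, token_id, decoded_token, probability) tuples, best rank first."""
    rows = [
        # JSON object keys are strings, so convert token IDs back to int.
        (info["rank"], int(token_id), info["decoded_token"], math.exp(info["logprob"]))
        for token_id, info in position.items()
    ]
    return sorted(rows)


for rank, token_id, text, prob in ranked_candidates(json.loads(POSITION_1_JSON)):
    print(f"rank={rank}  id={token_id:>5}  {text!r:<10}  p={prob:.3f}")
```

Together, the five probabilities come to only about half of the total mass. The top-k list covers just part of the model's distribution.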

Position 0 is null because the first prompt token has nothing before it to condition on. With this Llama model, position 0 is most likely the BOS token the tokenizer prepends, which is why "The" is scored at position 1. At position 1 the model's top-1 prediction is "Question", not the actual prompt token "The", which here appears at rank 4. If a prompt token isn't in the top-k, vLLM still appends it as an extra entry with rank > k, so you can recover its logprob. See the Gotcha for how to use that to score a sequence.
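Here is a minimal sketch of that lookup for a single position. It assumes you already know the actual prompt token's text at that position. prompt_token_entry is a hypothetical helper of our own. It relies on the behavior described above: the appended entry is the only one ranked beyond k.

```python
def prompt_token_entry(position: dict, actual_token: str, k: int) -> dict:
    """Return the {logprob, rank, decoded_token} entry for the actual prompt token.

    position:     one non-None element of choices[0].prompt_logprobs
    actual_token: the prompt token's text at this position (assumed known)
    k:            the prompt_logprobs value sent in the request
    """
    # An entry ranked beyond k can only be the appended actual prompt token.
    for info in position.values():
        if info["rank"] > k:
            return info
    # Otherwise the actual token is inside the top-k: match it by its text.
    for info in position.values():
        if info["decoded_token"] == actual_token:
            return info
    raise KeyError(f"{actual_token!r} not found at this position")


# Position 1 of the example response above (requested with prompt_logprobs=5).
position_1 = {
    "14924": {"logprob": -1.179518699645996, "rank": 1, "decoded_token": "Question"},
    "755": {"logprob": -2.179518699645996, "rank": 2, "decoded_token": "def"},
    "2": {"logprob": -2.742018699645996, "rank": 3, "decoded_token": "#"},
    "791": {"logprob": -3.679518699645996, "rank": 4, "decoded_token": "The"},
    "16309": {"logprob": -4.179518699645996, "rank": 5, "decoded_token": "Tags"},
}
entry = prompt_token_entry(position_1, "The", k=5)
print(entry["rank"], entry["logprob"])  # prints: 4 -3.679518699645996
```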

Gotcha

prompt_logprobs isn't in the OpenAI SDK's typed signature, so pass it via extra_body={...}. It comes back as a vLLM-native top-level field on each choice (alongside logprobs). Each non-None entry is a {token_id: {logprob, rank, decoded_token}} dict with the token ID as a string key. The first position is None because there is nothing to condition on.

rank=1 is the model's top-1 prediction at that position, not necessarily the actual prompt token. If the prompt token isn't in the top-k, vLLM appends it as an extra entry with rank > k. The actual token's logprob is therefore present at every non-None position.

To compute the sequence log-likelihood of your prompt, set echo=true to recover the prompt tokens. Then, at each position, take the entry for the actual prompt token and sum those logprobs. If a position has an extra entry with rank > k, that entry is the prompt token. Otherwise, pick the top-k entry whose decoded_token matches it. Raising prompt_logprobs (for example to 20) only adds more alternatives to inspect; it isn't needed to recover the actual token.

The wrapper schema requires max_tokens >= 1, so set it to 1 and ignore the one extra generated token.
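The scoring procedure above can be sketched as follows, under stated assumptions. score_prompt is a hypothetical helper of our own, not part of any SDK. It assumes you already have the prompt's token strings, including any BOS token, aligned one-to-one with prompt_logprobs. The text above suggests echo=true for this, but extracting them from this wrapper's response isn't shown here. The helper checks for the appended rank > k entry first and falls back to decoded_token matching only when the prompt token is inside the top-k.

```python
def score_prompt(prompt_logprobs: list, prompt_tokens: list, k: int) -> float:
    """Sum the actual prompt tokens' logprobs over positions 1..n-1.

    prompt_logprobs: choices[0].prompt_logprobs from the response
    prompt_tokens:   the prompt's token strings, aligned one-to-one with
                     prompt_logprobs (assumed recovered via echo=true)
    k:               the prompt_logprobs value sent in the request
    """
    if len(prompt_logprobs) != len(prompt_tokens):
        raise ValueError("prompt_logprobs and prompt_tokens must be aligned")
    total = 0.0
    for i, (position, token) in enumerate(zip(prompt_logprobs, prompt_tokens)):
        if position is None:  # position 0 has nothing to condition on
            continue
        # Prefer the appended extra entry (rank > k): it is always the prompt token.
        extra = [info for info in position.values() if info["rank"] > k]
        # Otherwise the prompt token is in the top-k: match it by decoded_token.
        matches = extra or [
            info for info in position.values() if info["decoded_token"] == token
        ]
        if not matches:
            raise KeyError(f"{token!r} not found at position {i}")
        total += matches[0]["logprob"]
    return total


# The two positions from the example response. "<bos>" is a placeholder:
# position 0's token is never looked up.
example = [
    None,
    {
        "14924": {"logprob": -1.179518699645996, "rank": 1, "decoded_token": "Question"},
        "755": {"logprob": -2.179518699645996, "rank": 2, "decoded_token": "def"},
        "2": {"logprob": -2.742018699645996, "rank": 3, "decoded_token": "#"},
        "791": {"logprob": -3.679518699645996, "rank": 4, "decoded_token": "The"},
        "16309": {"logprob": -4.179518699645996, "rank": 5, "decoded_token": "Tags"},
    },
]
print(score_prompt(example, ["<bos>", "The"], k=5))  # prints: -3.679518699645996
```

Divide the total by the number of scored positions to get the mean per-token logprob. The exponential of its negation is the prompt's perplexity.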