prompt_logprobs
Returns the model's top-k logprob predictions at each prompt position. This is useful for inspecting what the model expected at each step and, with extra care (see the Gotcha), as a building block for scoring how likely a given piece of text is under the model.
curl
curl -s "$ACS_API_BASE/completions" \
  -H "Authorization: Bearer $ACS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-8b",
       "prompt": "The capital of France is",
       "max_tokens": 1,
       "prompt_logprobs": 5}' | jq '.choices[0].prompt_logprobs[:2]'
Python
import os
from openai import OpenAI

# Assumes the same ACS_API_BASE / ACS_API_KEY environment variables as the curl example.
client = OpenAI(base_url=os.environ["ACS_API_BASE"], api_key=os.environ["ACS_API_KEY"])

resp = client.completions.create(
    model="llama-8b",
    prompt="The capital of France is",
    max_tokens=1,
    extra_body={"prompt_logprobs": 5},
)
# resp.choices[0].prompt_logprobs[i] is a {token_id: {logprob, rank, decoded_token}}
# dict for prompt position i, or None for the very first position (no
# conditioning context). Here we just print the returned entries at the first
# few positions.
for i, choices in enumerate(resp.choices[0].prompt_logprobs[:3]):
    if choices is None:
        continue
    print(f"position {i}:")
    for token_id, info in choices.items():
        print(f"  rank={info['rank']:>2} logprob={info['logprob']:+.3f} {info['decoded_token']!r}")
Example response
Just choices[0].prompt_logprobs[:2] (what the curl jq filter shows):
[
  null,
  {
    "14924": {"logprob": -1.179518699645996, "rank": 1, "decoded_token": "Question"},
    "755": {"logprob": -2.179518699645996, "rank": 2, "decoded_token": "def"},
    "2": {"logprob": -2.742018699645996, "rank": 3, "decoded_token": "#"},
    "791": {"logprob": -3.679518699645996, "rank": 4, "decoded_token": "The"},
    "16309": {"logprob": -4.179518699645996, "rank": 5, "decoded_token": "Tags"}
  }
]
Position 0 is null because the first token has no conditioning context. At position 1 the model's top-1 prediction is "Question", not the actual prompt token "The", which here happens to show up at rank 4. If your prompt token isn't in the top-k, vLLM still appends it as an extra entry with rank > k so you can recover its logprob; see the Gotcha for how to use that to score a sequence.
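As a small illustration of reading a single position, the sketch below copies position 1 from the example response and looks up the actual prompt token by its decoded_token, then converts its logprob to a probability. The helper name find_token is hypothetical, and matching on decoded_token assumes that string is unambiguous among the entries at that position.

```python
import math

# Position 1 copied from the example response (keys are token IDs as JSON strings).
position_1 = {
    "14924": {"logprob": -1.179518699645996, "rank": 1, "decoded_token": "Question"},
    "755": {"logprob": -2.179518699645996, "rank": 2, "decoded_token": "def"},
    "2": {"logprob": -2.742018699645996, "rank": 3, "decoded_token": "#"},
    "791": {"logprob": -3.679518699645996, "rank": 4, "decoded_token": "The"},
    "16309": {"logprob": -4.179518699645996, "rank": 5, "decoded_token": "Tags"},
}


def find_token(entries, decoded_token):
    """Hypothetical helper: return (token_id, info) for the first entry whose
    decoded_token matches, or None if no entry matches."""
    for token_id, info in entries.items():
        if info["decoded_token"] == decoded_token:
            return token_id, info
    return None


match = find_token(position_1, "The")
if match is not None:
    token_id, info = match
    prob = math.exp(info["logprob"])  # logprobs are natural logs
    print(f"token_id={token_id} rank={info['rank']} p={prob:.4f}")
```

The same lookup works when the prompt token falls outside the top-k, since the appended entry sits in the same dict with a larger rank.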
Gotcha
prompt_logprobs isn't on the OpenAI SDK's typed signature, so pass it via extra_body={...}. It appears as a vLLM-native top-level field on each choice (alongside logprobs); each non-None entry is a {token_id: {logprob, rank, decoded_token}} dict, and the first position is None (nothing to condition on).
rank=1 is the model's top-1 prediction at that position, not necessarily the actual prompt token. If your prompt token wasn't in the top-k, vLLM appends it as an extra entry with rank > k.
To compute the actual sequence log-likelihood of your prompt, set echo=true to recover the prompt tokens, then for each position pick the entry whose decoded_token matches the prompt token and sum those logprobs. Requesting a generous k (for example prompt_logprobs=20) keeps most prompt tokens inside the top-k; when one is not, use the appended extra entry.
The wrapper schema requires max_tokens >= 1, so set it to 1 and ignore the one extra generated token.
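A minimal sketch of the scoring step, under one stated assumption: you already have the prompt's token IDs (for example from running the same tokenizer locally, which this section does not cover). It looks each token up by ID rather than by decoded_token, and it assumes the dict keys are token IDs serialized as strings, as in the example response. The function name sequence_logprob and the toy values are illustrative only, not real model output.

```python
def sequence_logprob(prompt_logprobs, prompt_token_ids):
    """Sum the logprob of each actual prompt token.

    prompt_logprobs: list with one entry per prompt position, either None
        (first position) or a {token_id_str: {"logprob": ..., ...}} dict.
    prompt_token_ids: the prompt's token IDs, one per position (assumed known).
    """
    if len(prompt_logprobs) != len(prompt_token_ids):
        raise ValueError("expected one token ID per prompt position")
    total = 0.0
    for position, (entries, token_id) in enumerate(zip(prompt_logprobs, prompt_token_ids)):
        if entries is None:
            continue  # first position: nothing to condition on
        info = entries.get(str(token_id))
        if info is None:
            raise KeyError(f"token {token_id} not found at position {position}")
        total += info["logprob"]
    return total


# Toy data for illustration only.
toy_prompt_logprobs = [
    None,
    {"14924": {"logprob": -1.0, "rank": 1}, "791": {"logprob": -3.5, "rank": 4}},
    {"6864": {"logprob": -0.5, "rank": 1}},
]
toy_token_ids = [1, 791, 6864]

total = sequence_logprob(toy_prompt_logprobs, toy_token_ids)
print(f"sequence logprob: {total:.3f}")
```

Raising on a missing token, rather than skipping it, avoids silently reporting a likelihood that is too high when the token IDs and the response are out of step.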