Private beta — access is gated. Email base-models@acsresearch.org with a brief note on how you'd like to use it; we review requests individually. If it's a fit, you'll get an invite link to create an API key from the dashboard.

echo

Prepends the prompt to the generated text in choices[0].text. Combined with logprobs, prompt_logprobs, and max_tokens=1 (the wrapper's minimum), this gives you a request that is almost entirely about scoring existing text rather than generating new text: only one token is actually generated.
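As a rough illustration of the scoring setup described above, the sketch below only builds the request body; it does not send it. build_scoring_payload is a hypothetical helper name, and passing logprobs as an integer is an assumption borrowed from the OpenAI-style completions format, so check the logprobs section of these docs for the exact accepted values.

```python
import json


def build_scoring_payload(model: str, text: str, top_logprobs: int = 1) -> dict:
    """Build a /completions body aimed at scoring `text` rather than extending it.

    Hypothetical helper for illustration; not part of the API.
    """
    return {
        "model": model,
        "prompt": text,
        "max_tokens": 1,           # the wrapper's minimum, so one token is still generated
        "echo": True,              # return the prompt as part of choices[0].text
        "logprobs": top_logprobs,  # assumption: integer, as in OpenAI-style completions
    }


payload = build_scoring_payload("llama-8b", "Once upon a time")
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed as the JSON body of the same POST request used in the curl example.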

curl

curl -s "$ACS_API_BASE/completions" \
  -H "Authorization: Bearer $ACS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-8b", "prompt": "Once upon a time", "max_tokens": 8, "echo": true}'

Python

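import os

from openai import OpenAI

# Assumes the OpenAI Python SDK pointed at this API, using the same
# environment variables as the curl example.
client = OpenAI(base_url=os.environ["ACS_API_BASE"], api_key=os.environ["ACS_API_KEY"])

resp = client.completions.create(
    model="llama-8b",
    prompt="Once upon a time",
    max_tokens=8,
    echo=True,  # prepend the prompt to choices[0].text
)
print(resp.choices[0].text)  # "Once upon a time there was a man who was very poor"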

Example response

{
  "id": "cmpl-b986142b3fd7d215",
  "object": "text_completion",
  "model": "meta-llama/Llama-3.1-8B",
  "choices": [
    {
      "index": 0,
      "text": "Once upon a time there was a man who was very poor",
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {"prompt_tokens": 5, "completion_tokens": 8, "total_tokens": 13}
}

Note that text contains both the prompt and the 8 generated tokens, concatenated with no separator. usage.completion_tokens counts only the generated portion, so you can slice by token count or, more simply, by the length of your original prompt string (in Python, text[len(prompt):]). (prompt_tokens is 5, not 4, because the tokenizer automatically prepends a <|begin_of_text|> BOS token to Llama prompts.)
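The slicing described above can be sketched as follows. strip_echo is a hypothetical helper name, not part of the API, and the startswith check is a defensive assumption in case the echoed prompt ever differs from what you sent.

```python
def strip_echo(text: str, prompt: str) -> str:
    """Return only the generated continuation from an echoed completion."""
    if not text.startswith(prompt):
        # Don't slice blindly if the echoed text doesn't begin with our prompt.
        raise ValueError("echoed text does not start with the original prompt")
    return text[len(prompt):]


prompt = "Once upon a time"
# choices[0].text from the example response above
echoed = "Once upon a time there was a man who was very poor"

continuation = strip_echo(echoed, prompt)
print(repr(continuation))  # ' there was a man who was very poor'
```

The continuation keeps its leading space, because the prompt and the generated text are joined with no separator.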

Gotcha

echo only mirrors the prompt back; it doesn't add a separator, so strip the first len(prompt) characters of text if you need just the continuation.