Private beta — access is gated. Email base-models@acsresearch.org with a brief note on how you'd like to use it; we review requests individually. If it's a fit, you'll get an invite link to create an API key from the dashboard.

Models

The wrapper currently fronts three base-model checkpoints. Pick llama-8b if you're just trying it out — it has the shortest cold-boot.

Available models

modelCheckpointPrecision
llama-8bmeta-llama/Llama-3.1-8Bbf16
llama-405bmeta-llama/Llama-3.1-405Bbf16
trinity-basearcee-ai/Trinity-Large-TrueBasebf16

Availability & cold starts

Models that get steady use are usually kept warm, so requests start immediately. Less-used or larger models sleep when idle to save GPU time and cold-start on the first request after a quiet spell — usually a few minutes, worst case up to ~10 minutes (occasionally longer for the largest models when GPUs are scarce). Once warm, a model answers in seconds and stays warm while you keep using it.

If a first request returns modal_cold_boot, wait for the Retry-After hint and retry — see Cold-boot recovery for a copy-pasteable retry loop. For interactive work, send one short throwaway request to wake the model, then go; for batch jobs, let the first request absorb the wait.

Always-on (warm windows)

If you'd rather have a model kept always-on for a stretch — a work session, a deadline — tell us via the Feedback button or email, and we'll schedule a warm window.