Stop paying for context your model doesn't need
Your prompts carry far more tokens than the model actually reads. Compresr drops the rest—up to ~90% fewer tokens—so you cut cost and latency. At light compression it matches or beats full-context accuracy on public benchmarks.
Drop-in proxy · OpenAI & Anthropic compatible
How it works
A proxy that thinks about every token
Compresr sits between your app and your model provider. No retraining, no prompt rewrites, no new SDK to learn.
Point your SDK at Compresr
Swap your base URL—nothing else changes. We're wire-compatible with the OpenAI and Anthropic APIs, streaming included.
We compress in flight
Our model scores every span of your prompt and removes the tokens that don't change the answer—retrieval chunks, boilerplate, dead context.
Your model reads less, answers the same
The compressed prompt hits the model you already use. You keep your outputs, your evals, and your provider—just at a fraction of the tokens.
Benchmarks
Fewer tokens. Same answers.
At light compression, Compresr matches or beats full-context accuracy across public long-context benchmarks—while removing roughly half the tokens or more.
Scores are illustrative · run on your own evals during the beta.
Capabilities
Built for production inference
Tunable compression
Dial from light to aggressive per route or per request. Trade tokens for fidelity exactly where it matters.
Drop-in proxy
Change one base URL. Works with the OpenAI and Anthropic SDKs, function calling, and streaming.
Token analytics
See savings, latency, and quality deltas per model and endpoint in a live console.
Private by default
Prompts are processed in-memory and never used for training. SOC 2 Type II in progress.
Model agnostic
Compress once, route anywhere—GPT, Claude, Gemini, or your own open-weight deployment.
Deterministic & cacheable
Stable outputs for the same input so your caches, evals, and replays keep working.
Pricing
Pay for the tokens that matter
Plans are invite-only during the private beta. Join the waitlist to lock in launch pricing—checkout opens once your account is approved.
Hobby
For side projects and evaluation.
- 1M tokens / month compressed
- Light & balanced presets
- Community support
- Basic analytics
Pro
PopularFor teams shipping to production.
- 250M tokens / month included
- All compression presets
- Per-route tuning & caching
- Full token analytics console
- Email & Slack support
Enterprise
For high-volume, regulated workloads.
- Volume token pricing
- VPC & on-prem deployment
- SSO, audit logs, SOC 2
- Dedicated solutions engineer
Purchase unlocks after waitlist approval.
FAQ
Questions, answered
Cut your token bill, not your accuracy
Join the Compresr waitlist for early access to the API and console. We onboard new teams every week.