v0.9 private beta · 2026

Stop paying for context your model doesn't need

Your prompts carry far more tokens than the model actually reads. Compresr drops the rest—up to ~90% fewer tokens—so you cut cost and latency. At light compression it matches or beats full-context accuracy on public benchmarks.

See how it works

Drop-in proxy · OpenAI & Anthropic compatible

compresr · live preview
Original prompt18,420
Compressed5,526
-70%
12,894Tokens saved
$0.0166Cost / callwas $0.0553
~42% fasterLatency
~90%fewer tokens at max compression
0 lossaccuracy at light compression
<8msadded p50 latency per call
5 minto a drop-in integration

How it works

A proxy that thinks about every token

Compresr sits between your app and your model provider. No retraining, no prompt rewrites, no new SDK to learn.

01

Point your SDK at Compresr

Swap your base URL—nothing else changes. We're wire-compatible with the OpenAI and Anthropic APIs, streaming included.

02

We compress in flight

Our model scores every span of your prompt and removes the tokens that don't change the answer—retrieval chunks, boilerplate, dead context.

03

Your model reads less, answers the same

The compressed prompt hits the model you already use. You keep your outputs, your evals, and your provider—just at a fraction of the tokens.

Benchmarks

Fewer tokens. Same answers.

At light compression, Compresr matches or beats full-context accuracy across public long-context benchmarks—while removing roughly half the tokens or more.

Scores are illustrative · run on your own evals during the beta.

BenchmarkFullCompresrTokens
LongBench QA71.271.8-48%
HotpotQA65.465.1-62%
GovReport58.959.6-55%
TriviaQA88.188.0-71%

Capabilities

Built for production inference

Tunable compression

Dial from light to aggressive per route or per request. Trade tokens for fidelity exactly where it matters.

Drop-in proxy

Change one base URL. Works with the OpenAI and Anthropic SDKs, function calling, and streaming.

Token analytics

See savings, latency, and quality deltas per model and endpoint in a live console.

Private by default

Prompts are processed in-memory and never used for training. SOC 2 Type II in progress.

Model agnostic

Compress once, route anywhere—GPT, Claude, Gemini, or your own open-weight deployment.

Deterministic & cacheable

Stable outputs for the same input so your caches, evals, and replays keep working.

Pricing

Pay for the tokens that matter

Plans are invite-only during the private beta. Join the waitlist to lock in launch pricing—checkout opens once your account is approved.

Hobby

$0/mo

For side projects and evaluation.

  • 1M tokens / month compressed
  • Light & balanced presets
  • Community support
  • Basic analytics

Pro

Popular
$49/mo

For teams shipping to production.

  • 250M tokens / month included
  • All compression presets
  • Per-route tuning & caching
  • Full token analytics console
  • Email & Slack support

Enterprise

Custom

For high-volume, regulated workloads.

  • Volume token pricing
  • VPC & on-prem deployment
  • SSO, audit logs, SOC 2
  • Dedicated solutions engineer

Purchase unlocks after waitlist approval.

FAQ

Questions, answered

Cut your token bill, not your accuracy

Join the Compresr waitlist for early access to the API and console. We onboard new teams every week.