Schema-Safe Prompt Compression

Research · 2026-03-15 · 4 min read

Query-aware pruning works well on free text and degrades predictably on structured prompts. Notes on why entity and schema preservation often matter more than raw reduction ratios, and what the research suggests about measuring both.

The compression ratio trap

Prompt compression research is almost entirely focused on one metric: tokens saved. A technique that reduces a 4,000-token prompt to 2,400 tokens is reported as 40% compression, and it ranks above a technique that only gets to 2,600 tokens (35%). Simple comparison.

The problem is that 40% compression on a free-text summarization prompt and 40% compression on a structured data operation prompt are not equivalent reductions. One preserves the information the model needs. The other may not.

What gets lost in compression

Consider a prompt for a structured data operation:

original prompt (87 tokens)

You are a database assistant.
Update the record where user_id = 12847 in the accounts table.
Set the field notifications.email_digest to false.
Set the field notifications.push.marketing to false.
Keep all other fields unchanged.
Return the updated record as JSON.

aggressively compressed (51 tokens, 41% reduction)

Database assistant.
Update user 12847: disable email digest and marketing push notifications.
Return JSON.

The compression looks clean. But the original prompt contains three critical structured elements: the exact table name accounts, the exact field paths notifications.email_digest and notifications.push.marketing, and the constraint that other fields must not change. The compressed version loses all three.

A model that hallucinates a different field name on a write operation corrupts data. A model that assumes it should update additional fields corrupts more data. The compressed prompt did not just lose tokens. It lost safety constraints.

What schema-safe compression protects

Schema-safe compression preserves a set of token categories verbatim:

Exact identifiers: column names, field paths, table names, variable names.
Numeric values: IDs, counts, thresholds, coordinates.
Code syntax: brackets, operators, keywords, function signatures.
Named constraints: "must", "only", "exactly", "never".
URLs, file paths, and format specifiers.

Everything else can be compressed: verbose instructions can be condensed, repeated context can be deduplicated, soft qualifiers can be dropped.
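As a rough illustration of the categories above, here is a minimal rule-based sketch in Python. The pattern set and the name extract_protected are assumptions made for this example, not a reference implementation; it covers only URLs, snake_case or dotted identifiers, numbers, and the four constraint words listed, and leaves out code syntax and file paths.

```python
import re

# Hypothetical patterns, one per protected category. Real prompts need more.
PROTECTED_PATTERNS = {
    "url": re.compile(r"https?://\S+"),
    # snake_case names and dotted field paths such as notifications.email_digest
    "identifier": re.compile(r"\b[A-Za-z]+(?:[._][A-Za-z0-9]+)+\b"),
    "number": re.compile(r"\b\d+(?:\.\d+)?\b"),
    "constraint": re.compile(r"\b(?:must|only|exactly|never)\b", re.IGNORECASE),
}


def extract_protected(prompt: str) -> dict[str, list[str]]:
    """Return every match for each protected category, in order of appearance."""
    return {
        name: pattern.findall(prompt)
        for name, pattern in PROTECTED_PATTERNS.items()
    }


if __name__ == "__main__":
    prompt = (
        "Update the record where user_id = 12847 in the accounts table. "
        "Set notifications.email_digest to false. "
        "You must never change other fields."
    )
    for category, matches in extract_protected(prompt).items():
        print(category, matches)
```

Note that a plain-word identifier such as the table name accounts is not caught by these patterns, which is one reason a purely regex-based extractor may not be enough on its own.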

compression with and without schema preservation

| Prompt type | Std. compression (40% target) | Schema-safe compression (40% target) | Downstream accuracy |
|---|---|---|---|
| Free-text summarization | 40% reduction | 38% reduction | 97% vs 98% (parity) |
| Structured data update | 41% reduction | 22% reduction | 94% vs 61% (significant loss) |
| SQL generation | 39% reduction | 24% reduction | 91% vs 68% (significant loss) |
| Code generation | 38% reduction | 19% reduction | 89% vs 71% (significant loss) |
| Factual Q&A | 40% reduction | 37% reduction | 96% vs 95% (near parity) |

For free-text tasks, schema-safe and standard compression achieve similar token reduction with similar quality. For structured tasks, standard compression hits the target ratio by discarding schema elements, which causes large quality drops. Schema-safe compression gets less compression but much higher quality.

How to implement it

A schema-safe compressor operates in two passes:

two-pass compression (conceptual)

Pass 1: Extract and protect
  - Run a schema extractor over the prompt
  - Tag all identifiers, paths, values, constraints
  - Mark them as protected (do not compress)

Pass 2: Compress the remainder
  - Apply standard compression to unprotected text
  - Merge compressed segments back with protected elements
  - Verify: no protected element was altered or dropped

The schema extractor can be rule-based (regex for common patterns) or model-based (a small classifier trained on structured content). Rule-based works well for SQL, JSON, and code. Model-based handles more complex cases like natural language with embedded technical terms.
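The two passes can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a production compressor: the PROTECTED patterns are a small hypothetical rule set, toy_compress is a stand-in that only drops a few filler words, and the NUL-delimited placeholders assume the prompt itself contains no NUL characters.

```python
import re

# Illustrative patterns only; a real extractor would cover more categories.
PROTECTED = re.compile(
    r"https?://\S+"                           # URLs
    r"|\b[A-Za-z]+(?:[._][A-Za-z0-9]+)+\b"    # snake_case names and dotted paths
    r"|\b\d+(?:\.\d+)?\b"                     # numeric values
    r"|\b(?:must|only|exactly|never)\b"       # named constraints
)

# Stand-in for a real compressor: drop filler words, collapse whitespace.
FILLER = re.compile(r"\b(?:please|kindly|the|a|an|that|just)\b\s*", re.IGNORECASE)


def toy_compress(text: str) -> str:
    return re.sub(r"\s+", " ", FILLER.sub("", text)).strip()


def schema_safe_compress(prompt: str, compress=toy_compress) -> str:
    protected: list[str] = []

    def mask(match: re.Match) -> str:
        # Pass 1: swap each protected element for an opaque placeholder.
        protected.append(match.group(0))
        return f"\x00{len(protected) - 1}\x00"

    masked = PROTECTED.sub(mask, prompt)

    # Pass 2: compress only the unprotected remainder.
    compressed = compress(masked)

    # Verify: every placeholder must survive compression exactly once.
    for i, element in enumerate(protected):
        if compressed.count(f"\x00{i}\x00") != 1:
            raise ValueError(f"protected element {element!r} was altered or dropped")

    # Merge: put the original elements back verbatim.
    return re.sub(r"\x00(\d+)\x00", lambda m: protected[int(m.group(1))], compressed)


if __name__ == "__main__":
    prompt = (
        "Please update the record where user_id = 12847 in the accounts table. "
        "Set the field notifications.email_digest to false. "
        "You must keep all other fields unchanged."
    )
    print(schema_safe_compress(prompt))
```

Masking before compression means the compressor never sees the protected text, so the verification step only has to check that the placeholders survived. The regex rules here would miss a plain-word table name like accounts, which is the kind of case a model-based extractor is meant to handle.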

The compression ceiling

Schema-safe compression has a lower ceiling than unconstrained compression on structured prompts, because a large fraction of the tokens in structured prompts are identifiers and values that cannot be compressed without semantic loss.

On our production traffic, structured-task prompts compress by 15-25% with schema preservation versus 35-45% without. The right response to this is not to loosen the constraints. It is to accept that structured prompts are expensive and price them accordingly, rather than lying to yourself about the compression ratio.

The metric you actually need

Token reduction rate is not sufficient as a compression quality metric for structured workloads. You need to measure schema preservation rate separately: what fraction of identifiers, values, and constraints in the original prompt are present and unchanged in the compressed prompt. If that number is below 99%, your compressor is introducing correctness risk.
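One way to compute such a rate, sketched in Python. The ELEMENT pattern and the function name schema_preservation_rate are assumptions for illustration; the rate is taken over distinct elements so that legitimate deduplication is not penalized, and a real measurement would use the same extractor as the compressor.

```python
import re

# Hypothetical element pattern: identifiers, numbers, and constraint words.
ELEMENT = re.compile(
    r"\b[A-Za-z]+(?:[._][A-Za-z0-9]+)+\b"    # snake_case names and dotted paths
    r"|\b\d+(?:\.\d+)?\b"                     # numeric values
    r"|\b(?:must|only|exactly|never)\b"       # named constraints
)


def schema_preservation_rate(original: str, compressed: str) -> float:
    """Fraction of distinct schema elements in `original` found verbatim in `compressed`."""
    wanted = set(ELEMENT.findall(original))
    if not wanted:
        return 1.0  # nothing to preserve
    kept = wanted & set(ELEMENT.findall(compressed))
    return len(kept) / len(wanted)


if __name__ == "__main__":
    original = (
        "You are a database assistant.\n"
        "Update the record where user_id = 12847 in the accounts table.\n"
        "Set the field notifications.email_digest to false.\n"
        "Set the field notifications.push.marketing to false.\n"
        "Keep all other fields unchanged.\n"
        "Return the updated record as JSON."
    )
    compressed = (
        "Database assistant.\n"
        "Update user 12847: disable email digest and marketing push notifications.\n"
        "Return JSON."
    )
    print(schema_preservation_rate(original, compressed))
```

On the example prompt from earlier in this post, these patterns find four elements in the original (user_id, 12847, and the two field paths) and only 12847 in the compressed version, so the rate comes out at 0.25 despite the clean-looking 41% token reduction.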

When it matters most

Schema preservation matters most when the model output is used programmatically: database writes, API calls, code execution, form submissions. For these use cases, a small compression gain is not worth a correctness regression.

For conversational and summarization tasks, standard compression is fine. The model's task is to produce text that is semantically correct, not to reproduce exact identifiers. Use schema-safe compression selectively, not universally.
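Selective use can be as simple as a routing rule keyed on how the output will be consumed. The sketch below is hypothetical: the task-type labels and the choose_compressor name are invented for illustration, and a real system would need its own way of classifying requests.

```python
# Hypothetical task labels for outputs that are consumed programmatically.
PROGRAMMATIC_TASKS = {
    "database_write",
    "api_call",
    "code_execution",
    "form_submission",
}


def choose_compressor(task_type: str) -> str:
    """Pick a compression mode: schema-safe for programmatic output, standard otherwise."""
    return "schema_safe" if task_type in PROGRAMMATIC_TASKS else "standard"


if __name__ == "__main__":
    for task in ("database_write", "summarization", "conversation"):
        print(task, "->", choose_compressor(task))
```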
