Query-aware pruning works well on free text and degrades predictably on structured prompts. Notes on why entity and schema preservation often matter more than raw reduction ratios, and what the research suggests about measuring both.
The compression ratio trap
Prompt compression research is almost entirely focused on one metric: tokens saved. A technique that reduces a 4,000-token prompt to 2,400 tokens is reported as a 40% compression. Better than another technique that gets to 2,600 tokens. Simple comparison.
The problem is that 40% compression on a free-text summarization prompt and 40% compression on a structured data operation prompt are not equivalent reductions. One preserves the information the model needs. The other may not.
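To make the arithmetic behind the headline number concrete, here is a minimal sketch of how a reduction ratio is computed. The function name `reduction_ratio` is illustrative, not taken from any particular library.

```python
def reduction_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens removed by compression (0.0 means nothing removed)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1 - compressed_tokens / original_tokens


# The two techniques from the example above.
print(f"{reduction_ratio(4000, 2400):.0%}")  # 40%
print(f"{reduction_ratio(4000, 2600):.0%}")  # 35%
```

The function only sees token counts. Nothing in it can tell whether the removed tokens were filler or field names, which is why the ratio alone cannot distinguish the two cases.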
What gets lost in compression
Consider a prompt for a structured data operation:
You are a database assistant. Update the record where user_id = 12847 in the accounts table. Set the field notifications.email_digest to false. Set the field notifications.push.marketing to false. Keep all other fields unchanged. Return the updated record as JSON.
After compression:
Database assistant. Update user 12847: disable email digest and marketing push notifications. Return JSON.
The compression looks clean. But the original prompt contains three critical structured elements: the exact table name accounts, the exact field paths notifications.email_digest and notifications.push.marketing, and the constraint that other fields must not change. The compressed version loses all three.
A model that hallucinates a different field name on a write operation corrupts data. A model that assumes it should update additional fields corrupts more data. The compressed prompt did not just lose tokens. It lost safety constraints.
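The loss can be checked mechanically. The sketch below uses the two prompts from this section and a hand-written list of required elements (the `REQUIRED` list and the helper names are assumptions made for illustration). It uses a plain substring test, which is naive but sufficient for this example.

```python
ORIGINAL = (
    "You are a database assistant. Update the record where user_id = 12847 "
    "in the accounts table. Set the field notifications.email_digest to false. "
    "Set the field notifications.push.marketing to false. "
    "Keep all other fields unchanged. Return the updated record as JSON."
)
COMPRESSED = (
    "Database assistant. Update user 12847: disable email digest and "
    "marketing push notifications. Return JSON."
)

# Hand-listed elements that must survive verbatim (assumed for this example).
REQUIRED = [
    "12847",
    "accounts",
    "notifications.email_digest",
    "notifications.push.marketing",
    "Keep all other fields unchanged",
]


def missing_elements(text: str, required: list[str]) -> list[str]:
    """Return the required elements that do not appear verbatim in text."""
    return [item for item in required if item not in text]


def retention(text: str, required: list[str]) -> float:
    """Fraction of required elements that survived."""
    return 1 - len(missing_elements(text, required)) / len(required)


print(missing_elements(COMPRESSED, REQUIRED))
print(f"retention: {retention(COMPRESSED, REQUIRED):.0%}")  # retention: 20%
```

Only the user ID survives. Reporting this retention figure next to the reduction ratio shows what the ratio alone hides.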
What schema-safe compression protects
Schema-safe compression preserves a set of token categories verbatim:
- Exact identifiers: column names, field paths, table names, variable names.
- Numeric values: IDs, counts, thresholds, coordinates.
- Code syntax: brackets, operators, keywords, function signatures.
- Named constraints: "must", "only", "exactly", "never".
- URLs, file paths, and format specifiers.
Everything else can be compressed: verbose instructions can be condensed, repeated context can be deduplicated, soft qualifiers can be dropped.
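A rough, rule-based sketch of tagging some of these categories with regular expressions follows. The patterns and the name `extract_protected` are illustrative assumptions and far from exhaustive. In particular, a table name that looks like an ordinary word (such as accounts) is not caught by pattern matching and would need a known-schema list or a model-based extractor.

```python
import re

# One alternation so matches never overlap; earlier branches take priority.
PROTECTED = re.compile(
    r"""
    (?P<url>https?://\S+)                              # URLs
  | (?P<path>\b[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+)        # dotted field paths
  | (?P<identifier>\b[A-Za-z]+(?:_[A-Za-z0-9]+)+\b)    # snake_case names
  | (?P<number>\b\d+(?:\.\d+)?\b)                      # integers and decimals
  | (?P<constraint>\b(?:must|only|exactly|never)\b)    # named constraints
    """,
    re.VERBOSE | re.IGNORECASE,
)


def extract_protected(prompt: str) -> list[tuple[str, str]]:
    """Return (category, text) pairs for every protected span, in order."""
    return [(m.lastgroup, m.group()) for m in PROTECTED.finditer(prompt)]


sample = (
    "Update the record where user_id = 12847 in the accounts table. "
    "Set the field notifications.email_digest to false. Only touch that field."
)
for category, text in extract_protected(sample):
    print(f"{category:<11} {text}")
```

On the sample this tags user_id, 12847, notifications.email_digest, and Only, and misses accounts, which illustrates both what simple rules buy and where they stop.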
Compression with and without schema preservation
For free-text tasks, schema-safe and standard compression achieve similar token reduction with similar quality. For structured tasks, standard compression hits the target ratio by discarding schema elements, which causes large quality drops. Schema-safe compression removes fewer tokens but keeps quality much higher.
How to implement it
A schema-safe compressor operates in two passes:
Pass 1: Extract and protect
- Run a schema extractor over the prompt
- Tag all identifiers, paths, values, constraints
- Mark them as protected (do not compress)
Pass 2: Compress the remainder
- Apply standard compression to unprotected text
- Merge compressed segments back with protected elements
- Verify: no protected element was altered or dropped
The schema extractor can be rule-based (regex for common patterns) or model-based (a small classifier trained on structured content). Rule-based works well for SQL, JSON, and code. Model-based handles more complex cases like natural language with embedded technical terms.
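One way to realize the two passes is placeholder masking: swap each protected span for an opaque token, compress, verify the tokens survived, then restore the originals. The sketch below is a minimal illustration under stated assumptions, not a production design. The `<<P0>>` placeholder format, the small regex standing in for the extractor, and the toy `drop_filler` compressor are all hypothetical, and the sketch assumes the prompt itself never contains text that looks like a placeholder.

```python
import re
from typing import Callable

# Rule-based stand-in for the schema extractor (illustrative, not exhaustive).
PROTECTED = re.compile(
    r"\b[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+"  # dotted field paths
    r"|\b[A-Za-z]+(?:_\w+)+\b"            # snake_case identifiers
    r"|\b\d+(?:\.\d+)?\b"                 # integers and decimals
)
PLACEHOLDER = re.compile(r"<<P(\d+)>>")


def schema_safe_compress(prompt: str, compress: Callable[[str], str]) -> str:
    protected: list[str] = []

    def mask(match: re.Match) -> str:
        protected.append(match.group())
        return f"<<P{len(protected) - 1}>>"

    # Pass 1: replace each protected span with an opaque placeholder.
    masked = PROTECTED.sub(mask, prompt)

    # Pass 2: compress only the unprotected text.
    compressed = compress(masked)

    # Verify: every placeholder must survive compression exactly once.
    found = sorted(int(i) for i in PLACEHOLDER.findall(compressed))
    if found != list(range(len(protected))):
        raise ValueError("compressor altered or dropped a protected element")

    # Merge: restore the original protected text verbatim.
    return PLACEHOLDER.sub(lambda m: protected[int(m.group(1))], compressed)


# Toy compressor for demonstration: drops a couple of filler words.
FILLER = {"please", "the"}


def drop_filler(text: str) -> str:
    return " ".join(w for w in text.split() if w.lower() not in FILLER)


prompt = (
    "Please update the record where user_id = 12847 in the accounts table. "
    "Set notifications.email_digest to false."
)
print(schema_safe_compress(prompt, drop_filler))
```

Raising on a failed verification, rather than silently returning the damaged prompt, lets the caller fall back to the uncompressed original. Note that the regex does not protect the table name accounts; it survives here only because the toy compressor happens not to drop it.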
The compression ceiling
Schema-safe compression has a lower ceiling than unconstrained compression on structured prompts, because a large fraction of the tokens in structured prompts are identifiers and values that cannot be compressed without semantic loss.
On our production traffic, structured-task prompts shrink by 15-25% with schema preservation versus 35-45% without. The right response to this is not to loosen the constraints. It is to accept that structured prompts are expensive and price them accordingly, rather than lying to yourself about the compression ratio.
When it matters most
Schema preservation matters most when the model output is used programmatically: database writes, API calls, code execution, form submissions. For these use cases, a small compression gain is not worth a correctness regression.
For conversational and summarization tasks, standard compression is fine. The model's task is to produce text that is semantically correct, not to reproduce exact identifiers. Use schema-safe compression selectively, not universally.
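Selective use can be as simple as routing on task type. The sketch below is hypothetical: the task labels and mode names are made up for illustration. It also makes one design choice the text does not prescribe, which is to treat any unrecognized task type as structured, on the assumption that over-protecting costs tokens while under-protecting costs correctness.

```python
# Task types where the output is read by people, not parsed by programs.
FREE_TEXT_TASKS = {"conversation", "summarization"}


def choose_compression_mode(task_type: str) -> str:
    """Pick a compression mode; unknown task types fall back to schema-safe."""
    return "standard" if task_type in FREE_TEXT_TASKS else "schema_safe"


for task in ["summarization", "database_write", "api_call", "unlabeled"]:
    print(f"{task:<15} -> {choose_compression_mode(task)}")
```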