← Back to home
documentation

Build, route, verify.

Three install commands, one API key, and a routed call in under three minutes. BCU meter definitions are public. SDK and API references are drafting; the meter table and pricing examples below are the source of truth.

Quickstart

#quickstart

Install the CLI or SDK, set your provider keys, send a call. The drop-in client mirrors the OpenAI Chat Completions shape so existing code paths usually work without changes.

First call

python
from bytevion import Bytevion

client = Bytevion(api_key="bv-...")

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain recursion"}],
    optimize=True,
)
print(response.choices[0].message.content)

BCU meter reference

#bcu-meters

A Byte Compute Unit is a normalized unit of platform work. Every invoice itemizes BCUs across the meters below so customers can audit precisely what they are paying for. List price is $0.010 per BCU; enterprise committed pricing ranges from $0.004 to $0.007 per BCU.

Meter
Quantity
Purpose
Gateway request
0.020 per request
Authentication, policy enforcement, telemetry, billing trace.
Simple route decision
0.030 per request
Single route from a customer-defined policy.
Quality-constrained route
0.120 per request
Confidence scoring, workload class, or eval history.
Multi-candidate route
0.300 per request
Simulates or compares multiple route candidates.
Optimization input
0.040 per 1K optimized input tokens
Token accounting, transforms, policy validation.
Optimization output
0.060 per 1K output tokens
Output accounting, observability, quality metadata.
Cache read
0.010 per 1K cache read tokens
Cache lookup, retrieval, and validation.
Cache write
0.025 per 1K cache write tokens
Storage, replication, TTL management, invalidation.
Compression
0.080 per 1K raw input tokens entering compressor
Compressor execution and quality guardrail.
Shadow / eval
0.200 per 1K evaluated tokens plus provider passthrough
Baseline sampling and savings proof generation.

Pricing examples

#pricing-examples

Three worked examples at the list rate ($0.010 per BCU). The Cheap passthrough row is intentional. Bytevion does not claim savings on already-optimized workloads, and negative savings figures are preserved in invoice-grade ledgers as useful routing evidence.

Workload
Direct
Byte bill
BCU/req
Savings
Support chat
2K input, 500 output
$0.0135
$0.0068
0.308
-49.8%
Agentic code
20K input, 2K output, large reusable context
$0.0900
$0.0268
0.930
-70.2%
Document Q&A
8K retrieved context, 800 output
$0.0210
$0.0136
0.412
-54.8%
Cheap passthrough
already on a low-cost model, no optimization claim
$0.0006
$0.0008
0.020
+32.0%

Provider matrix

#providers

Bytevion supports 11 provider adapters out of the box, with BYOK and managed billing modes. Text, vision, document and audio inputs are routed through the same gateway.

OpenAIAnthropicGeminiGroqxAIOpenRouterOllamaMistralCohereBedrockLlama.cpp

API reference

#api

The full SDK and API reference is drafting. The OpenAI-shaped Chat Completions endpoint is the most stable surface and is documented inline in the SDK. If you are integrating today and want preview docs ahead of the public release, send a request to namaste@bytevion.com and we will share them.

> drafting
SDK reference, full endpoint list, error codes, rate limit semantics, webhook events.
target publish: 2026-Q3 with the public beta.

Deployment modes

#deploy
  • in-process· drop the SDK into your app. Routing and cache run locally; telemetry streams to the API portal.
  • gateway· run the FastAPI Docker gateway as a sidecar or shared service. Single endpoint, no app changes.
  • k8s / vpc· private deployment in your own VPC with data residency, private caches and custom runtime constraints. Enterprise only.
still reading?
The private beta is open. We respond within the day.
> request a key