documentation

Build, route, verify.

Three install commands, one API key, and a routed call in under three minutes. BCU meter definitions are public. SDK and API references are drafting; the meter table and pricing examples below are the source of truth.

Quickstart

#quickstart

Install the CLI or SDK, set your provider keys, send a call. The drop-in client mirrors the OpenAI Chat Completions shape so existing code paths usually work without changes.

First call

python

from bytevion import Bytevion

client = Bytevion(api_key="bv-...")

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain recursion"}],
    optimize=True,
)
print(response.choices[0].message.content)

BCU meter reference

#bcu-meters

A Byte Compute Unit is a normalized unit of platform work. Every invoice itemizes BCUs across the meters below so customers can audit precisely what they are paying for. List price is $0.010 per BCU; enterprise committed pricing ranges from $0.004 to $0.007 per BCU.

Meter

Quantity

Purpose

Gateway request

0.020 per request

Authentication, policy enforcement, telemetry, billing trace.

Simple route decision

0.030 per request

Single route from a customer-defined policy.

Quality-constrained route

0.120 per request

Confidence scoring, workload class, or eval history.

Multi-candidate route

0.300 per request

Simulates or compares multiple route candidates.

Optimization input

0.040 per 1K optimized input tokens

Token accounting, transforms, policy validation.

Optimization output

0.060 per 1K output tokens

Output accounting, observability, quality metadata.

Cache read

0.010 per 1K cache read tokens

Cache lookup, retrieval, and validation.

Cache write

0.025 per 1K cache write tokens

Storage, replication, TTL management, invalidation.

Compression

0.080 per 1K raw input tokens entering compressor

Compressor execution and quality guardrail.

Shadow / eval

0.200 per 1K evaluated tokens plus provider passthrough

Baseline sampling and savings proof generation.

Pricing examples

#pricing-examples

Three worked examples at the list rate ($0.010 per BCU). The Cheap passthrough row is intentional. Bytevion does not claim savings on already-optimized workloads, and negative savings figures are preserved in invoice-grade ledgers as useful routing evidence.

Workload

Direct

Byte bill

BCU/req

Savings

Support chat

2K input, 500 output

$0.0135

$0.0068

0.308

-49.8%

Agentic code

20K input, 2K output, large reusable context

$0.0900

$0.0268

0.930

-70.2%

Document Q&A

8K retrieved context, 800 output

$0.0210

$0.0136

0.412

-54.8%

Cheap passthrough

already on a low-cost model, no optimization claim

$0.0006

$0.0008

0.020

+32.0%

Provider matrix

#providers

Bytevion supports 11 provider adapters out of the box, with BYOK and managed billing modes. Text, vision, document and audio inputs are routed through the same gateway.

OpenAIAnthropicGeminiGroqxAIOpenRouterOllamaMistralCohereBedrockLlama.cpp

API reference

#api

The full SDK and API reference is drafting. The OpenAI-shaped Chat Completions endpoint is the most stable surface and is documented inline in the SDK. If you are integrating today and want preview docs ahead of the public release, send a request to namaste@bytevion.com and we will share them.

> drafting

SDK reference, full endpoint list, error codes, rate limit semantics, webhook events.

target publish: 2026-Q3 with the public beta.

Deployment modes

#deploy

in-process· drop the SDK into your app. Routing and cache run locally; telemetry streams to the API portal.
gateway· run the FastAPI Docker gateway as a sidecar or shared service. Single endpoint, no app changes.
k8s / vpc· private deployment in your own VPC with data residency, private caches and custom runtime constraints. Enterprise only.

still reading?

The private beta is open. We respond within the day.

> request a key