Tokenization

Token counting is the foundation of every operation in LLM Context Forge. Every chunking boundary, context window limit, and cost calculation depends on exact token counts — not heuristics.

Why Heuristics Fail

The common approximation token_count ≈ len(text) / 4 is dangerously inaccurate:

| Text | Actual Tokens (cl100k) | len/4 Estimate | Error |
| --- | --- | --- | --- |
| "Hello, world!" | 4 | 3 | -25% |
| "const x = 42;" | 5 | 3 | -40% |
| "日本語テスト" | 4 | 1 | -75% |
| "https://api.example.com/v1/chat" | 9 | 8 | -11% |

For CJK text and code, the error regularly exceeds 50%. This means your "safe" 4,000-token chunk might actually be 6,000 tokens — causing silent truncation or API errors.
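The errors in the table are easy to reproduce. In the sketch below, the actual cl100k counts are hardcoded from the table (computing them live requires tiktoken), and the estimate uses integer division, so the URL row rounds down to 7 where the table rounds 7.75 up to 8:

```python
# (text, actual cl100k tokens) — actual counts taken from the table above
samples = [
    ("Hello, world!", 4),
    ("const x = 42;", 5),
    ("日本語テスト", 4),
    ("https://api.example.com/v1/chat", 9),
]

for text, actual in samples:
    estimate = len(text) // 4  # the common len/4 heuristic
    error = (estimate - actual) / actual
    print(f"{text!r}: actual={actual} estimate={estimate} error={error:+.0%}")
```

Every row underestimates, and the shorter or less "English-like" the text, the worse the error gets.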

How Context Forge Counts Tokens

┌────────────────────────────────────────────────────┐
│                    TokenCounter                    │
│                                                    │
│  Input Text ──→ Resolve Model ──→ Select Encoder   │
│                           │                        │
│      ┌────────────────────┼───────────┐            │
│      │  tiktoken (OpenAI) │           │            │
│      │  transformers (HF) │ Fallback  │            │
│      │  mistral-common    │ Chain     │            │
│      │  anthropic         │           │            │
│      └────────────────────┼───────────┘            │
│                           ▼                        │
│                     Exact Count                    │
└────────────────────────────────────────────────────┘

Encoding Selection

Each model in the registry maps to a specific BPE encoding:

| Model Family | Encoding | Backend |
| --- | --- | --- |
| GPT-4o, GPT-4o-mini | o200k_base | tiktoken |
| GPT-4, GPT-3.5-turbo | cl100k_base | tiktoken |
| Claude 3.x / 3.5 | cl100k_base | tiktoken (compatible) |
| Gemini 1.5 / 2.0 | cl100k_base | tiktoken (heuristic) |
| Llama 3.x | HuggingFace tokenizer | transformers |
| Mistral / Mixtral | Mistral tokenizer | mistral-common |
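A registry lookup like this can be sketched as a longest-prefix match over model names. The `ENCODING_REGISTRY` dict and `resolve_encoding` helper below are illustrative only, mirroring the table above rather than Context Forge's actual registry structure:

```python
# Hypothetical model-prefix → (encoding, backend) map, mirroring the table above.
ENCODING_REGISTRY = {
    "gpt-4o": ("o200k_base", "tiktoken"),
    "gpt-4o-mini": ("o200k_base", "tiktoken"),
    "gpt-4": ("cl100k_base", "tiktoken"),
    "gpt-3.5-turbo": ("cl100k_base", "tiktoken"),
    "claude-3": ("cl100k_base", "tiktoken"),    # compatible approximation
    "gemini-1.5": ("cl100k_base", "tiktoken"),  # heuristic
    "llama-3": ("hf-tokenizer", "transformers"),
    "mistral": ("mistral-tokenizer", "mistral-common"),
}

def resolve_encoding(model: str) -> tuple:
    # Longest-prefix match so "gpt-4o-mini" wins over "gpt-4".
    for prefix in sorted(ENCODING_REGISTRY, key=len, reverse=True):
        if model.startswith(prefix):
            return ENCODING_REGISTRY[prefix]
    # Unknown model: baseline cl100k_base (see Fallback Strategy)
    return ("cl100k_base", "tiktoken")

print(resolve_encoding("gpt-4o-mini-2024-07-18"))  # → ('o200k_base', 'tiktoken')
```

Sorting prefixes by length before matching avoids the classic bug where a versioned model name like `gpt-4o-mini-2024-07-18` resolves to the shorter `gpt-4` entry.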

Fallback Strategy

If the primary tokenizer is unavailable (e.g., transformers not installed), the counter degrades gracefully:

  1. Primary — Model-specific encoder (exact)
  2. Secondary — Compatible BPE encoding (±1-2% variance)
  3. Baseline — cl100k_base with a 1.05× safety multiplier
Warning

The baseline fallback intentionally overestimates by 5%. This is a safety measure — it's always better to slightly underuse your context window than to overflow it.
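The chain can be sketched as a priority-ordered list of encoder callables, with the safety multiplier applied only at the baseline tier. The `count_with_fallback` helper and the toy `primary`/`baseline` encoders below are illustrative, not Context Forge's actual API:

```python
import math

SAFETY_MULTIPLIER = 1.05  # the baseline tier intentionally overestimates by 5%

def count_with_fallback(text, encoders):
    """Try each encoder in priority order. Only the last (baseline)
    tier gets the safety multiplier, since it is the least exact."""
    for i, encode in enumerate(encoders):
        try:
            n = len(encode(text))
        except Exception:
            continue  # encoder unavailable (e.g., library not installed)
        if i == len(encoders) - 1:  # baseline tier
            return math.ceil(n * SAFETY_MULTIPLIER)
        return n
    raise RuntimeError("no tokenizer available")

def primary(text):
    raise ImportError("transformers not installed")  # simulate a missing backend

def baseline(text):
    return text.split()  # toy stand-in for a real BPE encoding

print(count_with_fallback("one two three four", [primary, baseline]))  # → 5 (ceil(4 × 1.05))
```

Because the failed primary encoder is skipped silently, callers always get a count; the cost of degradation is a slightly padded number rather than an exception.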

Usage

from llm_context_forge import TokenCounter

counter = TokenCounter("gpt-4o")

# Basic counting
tokens = counter.count("Your prompt text here")
print(f"Exact token count: {tokens}")

# Check if content fits in the context window
fits = counter.fits_in_window(
    text="Large document...",
    reserve_output=1000,  # Reserve tokens for the model's response
)
if fits:
    print("Safe to send!")
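Under the hood, a fits-in-window check reduces to simple arithmetic over the exact count. The standalone function below is an illustrative sketch (Context Forge's real check lives on `TokenCounter`), using GPT-4o's 128,000-token context window:

```python
def fits_in_window(token_count, context_window, reserve_output):
    # The prompt fits only if it also leaves room for the reserved output tokens.
    return token_count + reserve_output <= context_window

# GPT-4o has a 128,000-token context window.
print(fits_in_window(120_000, 128_000, 1000))  # True  (121,000 <= 128,000)
print(fits_in_window(127_500, 128_000, 1000))  # False (128,500 >  128,000)
```

This is exactly why exact counts matter: with a len/4 estimate, the second call could report `True` and the request would be truncated or rejected at the API.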

Supported Models

LLM Context Forge ships with a registry of 15+ production models with verified context windows and pricing. See Model Registry for the complete list and how to add custom models.