# Tokenization
Token counting is the foundation of every operation in LLM Context Forge. Every chunking boundary, context window limit, and cost calculation depends on exact token counts — not heuristics.
## Why Heuristics Fail
The common approximation `token_count ≈ len(text) / 4` is dangerously inaccurate:
| Text | Actual Tokens (cl100k) | len/4 Estimate | Error |
|---|---|---|---|
| `"Hello, world!"` | 4 | 3 | -25% |
| `"const x = 42;"` | 5 | 3 | -40% |
| `"日本語テスト"` | 4 | 1 | -75% |
| `"https://api.example.com/v1/chat"` | 9 | 8 | -11% |
For CJK text and code, the error regularly exceeds 50%. This means your "safe" 4,000-token chunk might actually be 6,000 tokens — causing silent truncation or API errors.
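To make the error concrete, the `len/4` heuristic can be checked against the actual `cl100k_base` counts from the table above. The actual counts are hardcoded here for illustration; a real comparison would call the tokenizer itself:

```python
def estimate_tokens(text: str) -> int:
    # The common len/4 heuristic this section warns against.
    return max(1, len(text) // 4)

# Actual cl100k_base counts taken from the table above.
samples = {
    "Hello, world!": 4,
    "const x = 42;": 5,
    "日本語テスト": 4,
}

for text, actual in samples.items():
    est = estimate_tokens(text)
    error = (est - actual) / actual
    print(f"{text!r}: actual={actual}, estimate={est}, error={error:+.0%}")
```

For the CJK sample the heuristic sees 6 characters and guesses 1 token, while the real count is 4 — exactly the kind of gap that silently overflows a "safe" chunk.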
## How Context Forge Counts Tokens
```
┌────────────────────────────────────────────────────┐
│ TokenCounter                                       │
│                                                    │
│ Input Text ──→ Resolve Model ──→ Select Encoder    │
│                      │                             │
│  ┌───────────────────┼───────────┐                 │
│  │ tiktoken (OpenAI) │           │                 │
│  │ transformers (HF) │ Fallback  │                 │
│  │ mistral-common    │ Chain     │                 │
│  │ anthropic         │           │                 │
│  └───────────────────┼───────────┘                 │
│                      ▼                             │
│                 Exact Count                        │
└────────────────────────────────────────────────────┘
```
## Encoding Selection
Each model in the registry maps to a specific BPE encoding:
| Model Family | Encoding | Backend |
|---|---|---|
| GPT-4o, GPT-4o-mini | o200k_base | tiktoken |
| GPT-4, GPT-3.5-turbo | cl100k_base | tiktoken |
| Claude 3.x / 3.5 | cl100k_base | tiktoken (compatible) |
| Gemini 1.5 / 2.0 | cl100k_base | tiktoken (heuristic) |
| Llama 3.x | HuggingFace tokenizer | transformers |
| Mistral / Mixtral | Mistral tokenizer | mistral-common |
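The table above amounts to a registry lookup keyed by model name. A minimal sketch of that idea, using longest-prefix matching — the registry contents and matching rule here are illustrative assumptions, not the library's actual implementation:

```python
# Hypothetical model → encoding registry mirroring the table above.
ENCODING_REGISTRY = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "claude-3": "cl100k_base",    # compatible approximation
    "gemini-1.5": "cl100k_base",  # heuristic
}

def resolve_encoding(model: str) -> str:
    """Pick the encoding for the longest registry key that prefixes the model name."""
    matches = [key for key in ENCODING_REGISTRY if model.startswith(key)]
    if not matches:
        raise KeyError(f"Unknown model: {model}")
    return ENCODING_REGISTRY[max(matches, key=len)]

print(resolve_encoding("gpt-4o-mini"))  # o200k_base
print(resolve_encoding("gpt-4"))        # cl100k_base
```

Longest-prefix matching matters here: `"gpt-4o-mini"` must resolve through `"gpt-4o"` to `o200k_base`, not fall through to the shorter `"gpt-4"` entry.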
## Fallback Strategy
If the primary tokenizer is unavailable (e.g., transformers not installed), the counter degrades gracefully:
- Primary — Model-specific encoder (exact)
- Secondary — Compatible BPE encoding (±1-2% variance)
- Baseline — `cl100k_base` with a 1.05× safety multiplier
The baseline fallback intentionally overestimates by 5%. This is a safety measure — it's always better to slightly underuse your context window than to overflow it.
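The three-tier chain can be sketched as follows. The counter callables are hypothetical stand-ins for the real backends; the 1.05× multiplier is applied with integer ceiling arithmetic so the baseline always rounds up rather than undercounting:

```python
def count_with_fallback(text: str, primary, compatible, baseline) -> int:
    """Degrade gracefully through the three tiers described above.

    primary / compatible / baseline are token-counting callables
    (hypothetical API); each may raise if its backend is not installed.
    """
    try:
        return primary(text)      # exact, model-specific encoder
    except Exception:
        pass
    try:
        return compatible(text)   # compatible BPE encoding, ±1-2% variance
    except Exception:
        pass
    # Baseline: cl100k_base count padded by 5%. 1.05 == 21/20, so integer
    # ceiling division gives an exact overestimate with no float rounding.
    return (baseline(text) * 21 + 19) // 20
```

With a baseline count of 100 this returns 105; the padding guarantees the fallback errs toward leaving context headroom instead of overflowing the window.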
## Usage
**Python**

```python
from llm_context_forge import TokenCounter

counter = TokenCounter("gpt-4o")

# Basic counting
tokens = counter.count("Your prompt text here")
print(f"Exact token count: {tokens}")

# Check if content fits in the context window
fits = counter.fits_in_window(
    text="Large document...",
    reserve_output=1000,  # Reserve tokens for the model's response
)
if fits:
    print("Safe to send!")
```
**TypeScript**

```typescript
import { TokenCounter } from "llm-context-forge";

const counter = new TokenCounter("gpt-4o");

// Basic counting
const tokens = counter.count("Your prompt text here");
console.log(`Exact token count: ${tokens}`);

// Check if content fits in the context window
if (counter.fitsInWindow("Large document...", 1000)) {
  console.log("Safe to send!");
}
```
## Supported Models
LLM Context Forge ships with a registry of 15+ production models with verified context windows and pricing. See Model Registry for the complete list and how to add custom models.