# Tokenization
Token counting is the foundation of every operation in LLM Context Forge. Every chunking boundary, context window limit, and cost calculation depends on exact token counts — not heuristics.
## Why Heuristics Fail
The common approximation `token_count ≈ len(text) / 4` is dangerously inaccurate:
| Text | Actual Tokens (cl100k) | len/4 Estimate | Error |
|---|---|---|---|
| `"Hello, world!"` | 4 | 3 | -25% |
| `"const x = 42;"` | 5 | 3 | -40% |
| `"日本語テスト"` | 4 | 1 | -75% |
| `"https://api.example.com/v1/chat"` | 9 | 8 | -11% |
For CJK text and code, the error regularly exceeds 50%. This means your "safe" 4,000-token chunk might actually be 6,000 tokens — causing silent truncation or API errors.
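To make the error concrete, the `len/4` heuristic can be checked against the actual `cl100k_base` counts from the table above. The actual counts are hardcoded here for illustration; a real comparison would call the tokenizer itself:

```python
def estimate_tokens(text: str) -> int:
    # The common len/4 heuristic this section warns against.
    return max(1, len(text) // 4)

# Actual cl100k_base counts taken from the table above.
samples = {
    "Hello, world!": 4,
    "const x = 42;": 5,
    "日本語テスト": 4,
}

for text, actual in samples.items():
    est = estimate_tokens(text)
    error = (est - actual) / actual
    print(f"{text!r}: actual={actual}, estimate={est}, error={error:+.0%}")
```

For the CJK sample the heuristic sees 6 characters and guesses 1 token, while the real count is 4 — exactly the kind of gap that silently overflows a "safe" chunk.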
## How Context Forge Counts Tokens
```
┌────────────────────────────────────────────────────┐
│ TokenCounter                                       │
│                                                    │
│ Input Text ──→ Resolve Model ──→ Select Encoder    │
│                      │                             │
│  ┌───────────────────┼───────────┐                 │
│  │ tiktoken (OpenAI) │           │                 │
│  │ transformers (HF) │ Fallback  │                 │
│  │ mistral-common    │ Chain     │                 │
│  │ anthropic         │           │                 │
│  └───────────────────┼───────────┘                 │
│                      ▼                             │
│                 Exact Count                        │
└────────────────────────────────────────────────────┘
```
## Encoding Selection
Each model in the registry maps to a specific BPE encoding:
| Model Family | Encoding | Backend |
|---|---|---|
| GPT-4o, GPT-4o-mini | o200k_base | tiktoken |
| GPT-4, GPT-3.5-turbo | cl100k_base | tiktoken |
| Claude 3.x / 3.5 | cl100k_base | tiktoken (compatible) |
| Gemini 1.5 / 2.0 | cl100k_base | tiktoken (heuristic) |
| Llama 3.x | HuggingFace tokenizer | transformers |
| Mistral / Mixtral | Mistral tokenizer | mistral-common |
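The table above amounts to a registry lookup keyed by model name. A minimal sketch of that idea, using longest-prefix matching — the registry contents and matching rule here are illustrative assumptions, not the library's actual implementation:

```python
# Hypothetical model → encoding registry mirroring the table above.
ENCODING_REGISTRY = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "claude-3": "cl100k_base",    # compatible approximation
    "gemini-1.5": "cl100k_base",  # heuristic
}

def resolve_encoding(model: str) -> str:
    """Pick the encoding for the longest registry key that prefixes the model name."""
    matches = [key for key in ENCODING_REGISTRY if model.startswith(key)]
    if not matches:
        raise KeyError(f"Unknown model: {model}")
    return ENCODING_REGISTRY[max(matches, key=len)]

print(resolve_encoding("gpt-4o-mini"))  # o200k_base
print(resolve_encoding("gpt-4"))        # cl100k_base
```

Longest-prefix matching matters here: `"gpt-4o-mini"` must resolve through `"gpt-4o"` to `o200k_base`, not fall through to the shorter `"gpt-4"` entry.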
## Fallback Strategy
If the primary tokenizer is unavailable (e.g., transformers not installed), the counter degrades gracefully:
- Primary — Model-specific encoder (exact)
- Secondary — Compatible BPE encoding (±1-2% variance)
- Baseline — `cl100k_base` with a 1.05× safety multiplier
The baseline fallback intentionally overestimates by 5%. This is a safety measure — it's always better to slightly underuse your context window than to overflow it.
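The three-tier chain can be sketched as follows. The counter callables are hypothetical stand-ins for the real backends; the 1.05× multiplier is applied with integer ceiling arithmetic so the baseline always rounds up rather than undercounting:

```python
def count_with_fallback(text: str, primary, compatible, baseline) -> int:
    """Degrade gracefully through the three tiers described above.

    primary / compatible / baseline are token-counting callables
    (hypothetical API); each may raise if its backend is not installed.
    """
    try:
        return primary(text)      # exact, model-specific encoder
    except Exception:
        pass
    try:
        return compatible(text)   # compatible BPE encoding, ±1-2% variance
    except Exception:
        pass
    # Baseline: cl100k_base count padded by 5%. 1.05 == 21/20, so integer
    # ceiling division gives an exact overestimate with no float rounding.
    return (baseline(text) * 21 + 19) // 20
```

With a baseline count of 100 this returns 105; the padding guarantees the fallback errs toward leaving context headroom instead of overflowing the window.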
## Usage
**Python**

```python
from llm_context_forge import TokenCounter

counter = TokenCounter("gpt-4o")

# Basic counting
tokens = counter.count("Your prompt text here")
print(f"Exact token count: {tokens}")

# Check if content fits in the context window
fits = counter.fits_in_window(
    text="Large document...",
    reserve_output=1000,  # Reserve tokens for the model's response
)
if fits:
    print("Safe to send!")
```
**TypeScript**

```typescript
import { TokenCounter } from "llm-context-forge";

const counter = new TokenCounter("gpt-4o");

// Basic counting
const tokens = counter.count("Your prompt text here");
console.log(`Exact token count: ${tokens}`);

// Check if content fits in the context window
if (counter.fitsInWindow("Large document...", 1000)) {
  console.log("Safe to send!");
}
```
## Supported Models
LLM Context Forge ships with a registry of 15+ production models with verified context windows and pricing. See Model Registry for the complete list and how to add custom models.