# Getting Started
LLM Context Forge is production-grade infrastructure for managing LLM context windows. It provides deterministic token counting, intelligent document chunking, priority-based context packing, and pre-flight cost estimation — available as identical, cross-platform libraries for Python and TypeScript.
## The Problem
Every team building on LLMs hits the same walls:
| Problem | Consequence |
|---|---|
| Context overflow | Silent truncation or 400 errors from the API |
| Heuristic token counting | `len(text) / 4` is wrong 15–30% of the time |
| Naive chunking | Splitting mid-sentence destroys retrieval quality |
| Pricing surprises | $200 bills from untested prompt pipelines |
| Platform inconsistency | Python prototype ≠ TypeScript production behavior |
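To see why the "pricing surprises" row bites, here is a back-of-the-envelope sketch of pre-flight estimation. The rates and the `estimate_cost` helper are illustrative placeholders, not the library's API or real provider pricing:

```python
# ILLUSTRATIVE rates only -- check your provider's current price sheet.
PRICES_PER_1M = {  # hypothetical USD per 1M tokens: (input, output)
    "gpt-4o": (2.50, 10.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an estimated USD cost for a single request."""
    in_rate, out_rate = PRICES_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 1,000 requests of (3,000 prompt tokens + 500 completion tokens) adds up fast:
per_call = estimate_cost("gpt-4o", 3_000, 500)
print(f"${per_call:.4f} per call, ${per_call * 1_000:.2f} per 1,000 calls")
```

Multiplying a per-call estimate by expected traffic before shipping is exactly the check that prevents surprise bills.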
## The Solution
LLM Context Forge eliminates all five problems with a single, dependency-light package:
```text
┌─────────────────────────────────────────────────────┐
│                  LLM Context Forge                  │
├──────────┬──────────┬──────────┬────────────────────┤
│  Token   │  Smart   │ Context  │       Cost         │
│  Counter │  Chunker │  Packer  │     Estimator      │
├──────────┴──────────┴──────────┴────────────────────┤
│           Model Registry (15+ models)               │
│   OpenAI · Anthropic · Google · Meta · Mistral      │
└─────────────────────────────────────────────────────┘
```
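The Smart Chunker's core idea, never splitting mid-sentence, can be sketched in a few lines. This is a conceptual illustration, not the library's implementation: `chunk_by_sentence` is a hypothetical helper, and it budgets in characters where a real chunker would budget in tokens.

```python
import re

def chunk_by_sentence(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Conceptual sketch: the invariant is that a chunk boundary never
    falls mid-sentence. A single sentence longer than max_chars still
    becomes its own (oversized) chunk rather than being split.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # flush before overflowing the budget
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

doc = ("Retrieval quality depends on chunk boundaries. "
       "A chunk that ends mid-sentence embeds poorly. "
       "Sentence-aware splitting keeps each chunk self-contained.")
for chunk in chunk_by_sentence(doc, max_chars=100):
    print(repr(chunk))
```

Every emitted chunk ends on a sentence boundary, which is precisely what keeps embeddings (and therefore retrieval) coherent.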
## Quick Install

**Python**

```bash
pip install llm-context-forge
```

**TypeScript**

```bash
npm install llm-context-forge
```
## 30-Second Demo

**Python**

```python
from llm_context_forge import TokenCounter, ContextWindow, Priority

# Count tokens exactly
counter = TokenCounter("gpt-4o")
print(counter.count("Hello, world!"))  # deterministic result

# Build a context window with priorities
window = ContextWindow("gpt-4o")
window.add_block("You are an expert assistant.", Priority.CRITICAL, "system")
window.add_block("User question here...", Priority.HIGH, "query")
window.add_block("Retrieved document...", Priority.MEDIUM, "rag_0")

prompt = window.assemble(max_tokens=4000)
stats = window.usage()
print(f"Used {stats.tokens_used} tokens, dropped {stats.excluded} blocks")
```
**TypeScript**

```typescript
import { TokenCounter, ContextWindow } from "llm-context-forge";

// Count tokens exactly
const counter = new TokenCounter("gpt-4o");
console.log(counter.count("Hello, world!")); // deterministic result

// Build a context window with priorities
const window = new ContextWindow("gpt-4o");
window.addBlock("You are an expert assistant.", 0, "system"); // CRITICAL
window.addBlock("User question here...", 1, "query"); // HIGH
window.addBlock("Retrieved document...", 2, "rag_0"); // MEDIUM

const prompt = window.assemble({ maxTokens: 4000 });
const stats = window.usage();
console.log(`Used ${stats.tokensUsed} tokens, dropped ${stats.excluded} blocks`);
```
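Conceptually, priority-based packing is a greedy fit: visit blocks from most to least important, keep what fits the token budget, drop the rest. The sketch below illustrates the idea only; it is not the library's implementation, and `fake_token_count` stands in for a real tokenizer by counting whitespace-separated words.

```python
from dataclasses import dataclass

@dataclass
class Block:
    text: str
    priority: int  # lower number = more important (0 = CRITICAL)
    label: str

def fake_token_count(text: str) -> int:
    # Stand-in for a real tokenizer: whitespace words as "tokens".
    return len(text.split())

def pack(blocks: list[Block], max_tokens: int) -> tuple[list[Block], list[Block]]:
    """Greedily keep the highest-priority blocks that fit the budget."""
    kept, dropped = [], []
    budget = max_tokens
    # sorted() is stable, so equal-priority blocks keep insertion order
    for block in sorted(blocks, key=lambda b: b.priority):
        cost = fake_token_count(block.text)
        if cost <= budget:
            kept.append(block)
            budget -= cost
        else:
            dropped.append(block)
    return kept, dropped

blocks = [
    Block("You are an expert assistant.", 0, "system"),
    Block("User question here...", 1, "query"),
    Block("Retrieved document... " * 10, 2, "rag_0"),
]
kept, dropped = pack(blocks, max_tokens=12)
print([b.label for b in kept], [b.label for b in dropped])
```

Note that a production packer would re-emit the kept blocks in their original insertion order when assembling the final prompt; the greedy pass only decides inclusion.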
## What's Next?
- Core Concepts — Understand how tokenization, chunking, and packing work under the hood
- Python SDK — Full Python setup with CLI, REST API, and advanced tokenizers
- TypeScript SDK — Node.js/browser setup with full type safety
:::tip Cross-Platform Parity

The Python and TypeScript editions produce identical results for the same inputs. You can prototype in Python and ship in TypeScript (or vice versa) with zero behavioral drift.

:::