# Getting Started
LLM Context Forge is production-grade infrastructure for managing LLM context windows. It provides deterministic token counting, intelligent document chunking, priority-based context packing, and pre-flight cost estimation — available as identical, cross-platform libraries for Python and TypeScript.
## The Problem
Every team building on LLMs hits the same walls:
| Problem | Consequence |
|---|---|
| Context overflow | Silent truncation or 400 errors from the API |
| Heuristic token counting | `len(text) / 4` is wrong 15–30% of the time |
| Naive chunking | Splitting mid-sentence destroys retrieval quality |
| Pricing surprises | $200 bills from untested prompt pipelines |
| Platform inconsistency | Python prototype ≠ TypeScript production behavior |
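To see why the "pricing surprises" row bites, here is a back-of-the-envelope sketch of pre-flight estimation. The rates and the `estimate_cost` helper are illustrative placeholders, not the library's API or real provider pricing:

```python
# ILLUSTRATIVE rates only -- check your provider's current price sheet.
PRICES_PER_1M = {  # hypothetical USD per 1M tokens: (input, output)
    "gpt-4o": (2.50, 10.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an estimated USD cost for a single request."""
    in_rate, out_rate = PRICES_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 1,000 requests of (3,000 prompt tokens + 500 completion tokens) adds up fast:
per_call = estimate_cost("gpt-4o", 3_000, 500)
print(f"${per_call:.4f} per call, ${per_call * 1_000:.2f} per 1,000 calls")
```

Multiplying a per-call estimate by expected traffic before shipping is exactly the check that prevents surprise bills.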
## The Solution
LLM Context Forge eliminates all five problems with a single, dependency-light package:
```text
┌─────────────────────────────────────────────────────┐
│                  LLM Context Forge                  │
├──────────┬──────────┬──────────┬────────────────────┤
│  Token   │  Smart   │ Context  │       Cost         │
│  Counter │  Chunker │  Packer  │     Estimator      │
├──────────┴──────────┴──────────┴────────────────────┤
│           Model Registry (15+ models)               │
│   OpenAI · Anthropic · Google · Meta · Mistral      │
└─────────────────────────────────────────────────────┘
```
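The Smart Chunker's core idea, never splitting mid-sentence, can be sketched in a few lines. This is a conceptual illustration, not the library's implementation: `chunk_by_sentence` is a hypothetical helper, and it budgets in characters where a real chunker would budget in tokens.

```python
import re

def chunk_by_sentence(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Conceptual sketch: the invariant is that a chunk boundary never
    falls mid-sentence. A single sentence longer than max_chars still
    becomes its own (oversized) chunk rather than being split.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # flush before overflowing the budget
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

doc = ("Retrieval quality depends on chunk boundaries. "
       "A chunk that ends mid-sentence embeds poorly. "
       "Sentence-aware splitting keeps each chunk self-contained.")
for chunk in chunk_by_sentence(doc, max_chars=100):
    print(repr(chunk))
```

Every emitted chunk ends on a sentence boundary, which is precisely what keeps embeddings (and therefore retrieval) coherent.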
## Quick Install

**Python**

```bash
pip install llm-context-forge
```

**TypeScript**

```bash
npm install llm-context-forge
```
## 30-Second Demo

**Python**

```python
from llm_context_forge import TokenCounter, ContextWindow, Priority

# Count tokens exactly
counter = TokenCounter("gpt-4o")
print(counter.count("Hello, world!"))  # deterministic result

# Build a context window with priorities
window = ContextWindow("gpt-4o")
window.add_block("You are an expert assistant.", Priority.CRITICAL, "system")
window.add_block("User question here...", Priority.HIGH, "query")
window.add_block("Retrieved document...", Priority.MEDIUM, "rag_0")

prompt = window.assemble(max_tokens=4000)
stats = window.usage()
print(f"Used {stats.tokens_used} tokens, dropped {stats.excluded} blocks")
```
**TypeScript**

```typescript
import { TokenCounter, ContextWindow } from "llm-context-forge";

// Count tokens exactly
const counter = new TokenCounter("gpt-4o");
console.log(counter.count("Hello, world!")); // deterministic result

// Build a context window with priorities
const window = new ContextWindow("gpt-4o");
window.addBlock("You are an expert assistant.", 0, "system"); // CRITICAL
window.addBlock("User question here...", 1, "query"); // HIGH
window.addBlock("Retrieved document...", 2, "rag_0"); // MEDIUM

const prompt = window.assemble({ maxTokens: 4000 });
const stats = window.usage();
console.log(`Used ${stats.tokensUsed} tokens, dropped ${stats.excluded} blocks`);
```
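Conceptually, priority-based packing is a greedy fit: visit blocks from most to least important, keep what fits the token budget, drop the rest. The sketch below illustrates the idea only; it is not the library's implementation, and `fake_token_count` stands in for a real tokenizer by counting whitespace-separated words.

```python
from dataclasses import dataclass

@dataclass
class Block:
    text: str
    priority: int  # lower number = more important (0 = CRITICAL)
    label: str

def fake_token_count(text: str) -> int:
    # Stand-in for a real tokenizer: whitespace words as "tokens".
    return len(text.split())

def pack(blocks: list[Block], max_tokens: int) -> tuple[list[Block], list[Block]]:
    """Greedily keep the highest-priority blocks that fit the budget."""
    kept, dropped = [], []
    budget = max_tokens
    # sorted() is stable, so equal-priority blocks keep insertion order
    for block in sorted(blocks, key=lambda b: b.priority):
        cost = fake_token_count(block.text)
        if cost <= budget:
            kept.append(block)
            budget -= cost
        else:
            dropped.append(block)
    return kept, dropped

blocks = [
    Block("You are an expert assistant.", 0, "system"),
    Block("User question here...", 1, "query"),
    Block("Retrieved document... " * 10, 2, "rag_0"),
]
kept, dropped = pack(blocks, max_tokens=12)
print([b.label for b in kept], [b.label for b in dropped])
```

Note that a production packer would re-emit the kept blocks in their original insertion order when assembling the final prompt; the greedy pass only decides inclusion.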
## What's Next?
- Core Concepts — Understand how tokenization, chunking, and packing work under the hood
- Python SDK — Full Python setup with CLI, REST API, and advanced tokenizers
- TypeScript SDK — Node.js/browser setup with full type safety
:::tip Cross-Platform Parity

The Python and TypeScript editions produce identical results for the same inputs. You can prototype in Python and ship in TypeScript (or vice versa) with zero behavioral drift.

:::