# Priority-Based Context Packing

The `ContextWindow` is the most critical component for RAG implementations. It accepts content blocks with assigned priorities and packs them into a token-limited prompt, gracefully dropping lower-priority items when space runs out.

## Priority Levels

| Priority | Numeric | Guarantee | Typical Usage |
| --- | --- | --- | --- |
| `CRITICAL` | 0 | Always included (fails if it doesn't fit) | System prompt |
| `HIGH` | 1 | Included before `MEDIUM`/`LOW` | User query, essential context |
| `MEDIUM` | 2 | Included if space allows | RAG documents, history |
| `LOW` | 3 | First to be dropped | Nice-to-have context, examples |

## The Algorithm

1. Sort all blocks by priority (`CRITICAL` first).
2. Initialize `token_budget = max_tokens`.
3. For each block in priority order:
   - Count the tokens in the block.
   - If tokens ≤ remaining budget: include the block and subtract its tokens from the budget.
   - If tokens > remaining budget and the priority is `CRITICAL`: fail (critical content must fit).
   - If tokens > remaining budget and the priority is not `CRITICAL`: skip the block and mark it as excluded.
4. Concatenate the included blocks into the final prompt.
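The steps above can be sketched in plain Python. This is a simplified stand-alone version, not the library's implementation: the real `ContextWindow` counts tokens with a model-specific tokenizer, while this sketch counts whitespace-separated words.

```python
from enum import IntEnum

class Priority(IntEnum):
    CRITICAL = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3

def count_tokens(text: str) -> int:
    # Stand-in tokenizer for illustration; real packers use the model's tokenizer.
    return len(text.split())

def pack(blocks: list[tuple[str, Priority]], max_tokens: int) -> list[str]:
    """Greedily include blocks by priority; raise if a CRITICAL block won't fit."""
    budget = max_tokens
    included = []
    # sorted() is stable, so insertion order is preserved within each priority level.
    for text, priority in sorted(blocks, key=lambda b: b[1]):
        tokens = count_tokens(text)
        if tokens <= budget:
            included.append(text)
            budget -= tokens
        elif priority == Priority.CRITICAL:
            raise ValueError("CRITICAL block does not fit in the context window")
        # Otherwise the block is skipped (excluded).
    return included
```

Because the sort is stable, two blocks with the same priority are considered in the order they were added, which keeps RAG documents in their retrieval order.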

:::danger Critical Priority

If a `CRITICAL` block doesn't fit in the context window, the assembler will raise an error rather than silently drop it. This is intentional: your system prompt being truncated is never acceptable.

:::

## Usage

```python
from llm_context_forge import ContextWindow, Priority

window = ContextWindow("gpt-4o")

# System prompt — CRITICAL (never dropped)
window.add_block(
    "You are an expert Python developer. Answer concisely.",
    Priority.CRITICAL,
    "system",
)

# User's question — HIGH priority
window.add_block(
    "How do I implement a binary search tree in Python?",
    Priority.HIGH,
    "query",
)

# RAG documents — MEDIUM (dropped if space is tight)
for i, doc in enumerate(retrieved_documents):
    window.add_block(doc.text, Priority.MEDIUM, f"rag_{i}")

# Nice-to-have examples — LOW (first to go)
window.add_block(
    "Here's an example implementation...",
    Priority.LOW,
    "example",
)

# Build the final prompt
final_prompt = window.assemble(max_tokens=8000)
stats = window.usage()

print(f"Included: {stats.included} blocks")
print(f"Excluded: {stats.excluded} blocks")
print(f"Tokens used: {stats.tokens_used}")
```

## Best Practices

1. **Use `CRITICAL` sparingly** — only system prompts and absolute essentials. If you mark everything as `CRITICAL`, nothing can be dropped and you'll get overflow errors.

2. **Label your blocks** — the `id` parameter helps you debug which blocks were included or excluded:

   ```
   Included: system, query, rag_0, rag_1
   Excluded: rag_2, rag_3, example
   ```

3. **Reserve output tokens** — if your model needs 2,000 tokens for its response, set `max_tokens` to `context_window - 2000`, not the full context window.

4. **Combine with chunking** — if a single document is too large, chunk it first with `DocumentChunker`, then add each chunk as a separate `MEDIUM`-priority block. The packer will include as many chunks as fit.
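To illustrate the chunking pattern, here is a deliberately naive word-count chunker. The library's `DocumentChunker` is more sophisticated (the details are not shown on this page), so treat this as a sketch of the idea, not its implementation:

```python
def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

# Each chunk would then become its own MEDIUM-priority block, e.g.:
#
#     for i, chunk in enumerate(chunk_words(doc.text, 512)):
#         window.add_block(chunk, Priority.MEDIUM, f"doc_chunk_{i}")
#
# so the packer can include a prefix of the document when the full text won't fit.
```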