
REST API Server

LLM Context Forge includes a fast, standalone HTTP server built on FastAPI. This lets you deploy context management as an independent microservice, so clients written in any language — Go, Rust, Ruby, or internal tooling — can use it without maintaining multiple tokenizer implementations.

Starting the Server

The server is included in the base Python package.

# Start on default host (127.0.0.1) and port (8000)
python -m llm_context_forge.server

# Configure host and port
python -m llm_context_forge.server --host 0.0.0.0 --port 8080
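Once the server is running, any HTTP client can call it. Below is a minimal Python client sketch using only the standard library; the base URL matches the defaults above, and the helper names (`build_request`, `count_tokens`) are illustrative, not part of the package:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default host/port; adjust if you passed --host/--port


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body for the given endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def count_tokens(model: str, text: str) -> int:
    """Call POST /v1/tokens/count and return the token count."""
    req = build_request("/v1/tokens/count", {"model": model, "text": text})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["tokens"]
```

The same `build_request` helper works for the other POST endpoints documented below; only the path and payload change.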

Endpoints

POST /v1/tokens/count

Returns the exact token count for the given text under the specified model's tokenizer.

Request:

{
  "model": "gpt-4o",
  "text": "Hello, world!"
}

Response (200 OK):

{
  "tokens": 4,
  "model": "gpt-4o",
  "encoder": "o200k_base"
}

POST /v1/documents/chunk

Splits text into token-bounded chunks using the requested strategy, with optional overlap between consecutive chunks.

Request:

{
  "model": "claude-3-5-sonnet",
  "text": "Very long document...",
  "strategy": "paragraph",
  "max_tokens": 1000,
  "overlap": 100
}

Response (200 OK):

{
  "total_chunks": 5,
  "chunks": [
    "Chunk 1 text...",
    "Chunk 2 text..."
  ]
}
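The paragraph strategy can be pictured as greedy accumulation: paragraphs are appended to the current chunk until adding the next one would exceed max_tokens, then a new chunk begins. Here is a simplified sketch of that idea — not the server's actual implementation — using word count as a stand-in for real token counting, with overlap handling omitted:

```python
def chunk_paragraphs(text: str, max_tokens: int) -> list[str]:
    """Greedy paragraph chunking sketch; the real server counts model tokens."""

    def count(s: str) -> int:
        return len(s.split())  # placeholder for a real tokenizer

    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = (current + "\n\n" + para) if current else para
        if current and count(candidate) > max_tokens:
            # Current chunk is full; start a new one with this paragraph
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```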

POST /v1/context/pack

Assembles a prompt from prioritized blocks, including them in priority order until the token budget is exhausted.

Request:

{
  "model": "gpt-4o",
  "max_tokens": 4000,
  "blocks": [
    {"content": "System prompt", "priority": 0, "id": "system"},
    {"content": "User query", "priority": 1, "id": "query"},
    {"content": "RAG chunk", "priority": 2, "id": "rag_0"}
  ]
}

Response (200 OK):

{
  "prompt": "System prompt\n\nUser query\n\nRAG chunk",
  "usage": {
    "tokens_used": 150,
    "included_ids": ["system", "query", "rag_0"],
    "excluded_ids": []
  }
}
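Priority packing can be understood as a greedy fit: judging from the example above, lower priority numbers are more important, so blocks are considered in ascending priority order and included while they still fit the budget. A simplified sketch of that behavior (word count stands in for real token counting; this is an illustration, not the server's implementation):

```python
def pack(blocks: list[dict], max_tokens: int) -> dict:
    """Greedy priority packing sketch. Lower priority number = more important."""

    def count(s: str) -> int:
        return len(s.split())  # placeholder for a real tokenizer

    included, excluded, used = [], [], 0
    for block in sorted(blocks, key=lambda b: b["priority"]):
        tokens = count(block["content"])
        if used + tokens <= max_tokens:
            included.append(block)
            used += tokens
        else:
            excluded.append(block)
    # Join included blocks with blank lines, as in the example response
    return {
        "prompt": "\n\n".join(b["content"] for b in included),
        "usage": {
            "tokens_used": used,
            "included_ids": [b["id"] for b in included],
            "excluded_ids": [b["id"] for b in excluded],
        },
    }
```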

GET /v1/models

Returns the complete model registry.

Response (200 OK):

{
  "models": {
    "gpt-4o": {
      "provider": "openai",
      "context_window": 128000,
      "tokenizer": "o200k_base"
    }
  }
}
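A client can use the registry to pick a prompt budget at runtime, for example by subtracting a reply reserve from the model's context window. A small sketch against a response in the shape shown above (the `budget_for` helper and its `reserve` parameter are illustrative, not part of the package):

```python
import json

# A response in the documented shape, truncated to one model
registry_json = """
{
  "models": {
    "gpt-4o": {
      "provider": "openai",
      "context_window": 128000,
      "tokenizer": "o200k_base"
    }
  }
}
"""

registry = json.loads(registry_json)["models"]


def budget_for(model: str, reserve: int = 1000) -> int:
    """Leave `reserve` tokens for the model's reply; the rest is prompt budget."""
    return registry[model]["context_window"] - reserve
```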