Cost Estimation

The CostCalculator lets you evaluate API costs before sending any requests. This is critical for batch processing, RAG pipelines, and any system where token volumes are unpredictable.

How Pricing Works

LLM APIs charge per token, with separate rates for input (prompt) and output (completion) tokens:

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
| Llama 3.1 405B | $0.00 | $0.00 |

Prices are taken from the model registry. Self-hosted models (such as Llama 3.1 405B above) default to $0.00.
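As a worked example of the billing model, using the GPT-4o rates from the table above, a hypothetical request with 10,000 input tokens and 2,000 output tokens costs:

```python
# Per-million-token rates for GPT-4o, from the pricing table above
input_rate = 2.50
output_rate = 10.00

input_tokens = 10_000   # hypothetical request
output_tokens = 2_000

# Input and output are billed separately, pro-rated per million tokens
cost = (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate
print(f"${cost:.4f}")  # $0.0450
```

Note that output tokens dominate here despite being only a fifth of the volume; output rates are typically 4x the input rate.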

Usage

```python
from llm_context_forge import CostCalculator

calc = CostCalculator("gpt-4o")

# Estimate cost of a single prompt
cost = calc.estimate_prompt("Your large prompt text goes here...")
print(f"Estimated input cost: ${cost.usd:.6f}")

# Compare costs across models
report = calc.compare_models(
    texts=["Document chunk A", "Document chunk B", "Document chunk C"],
    models=["gpt-4o", "gpt-4o-mini", "claude-3-haiku"],
)

for entry in report:
    print(f"{entry.model}: ${entry.total_usd:.4f} for {entry.total_tokens} tokens")
```
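Once you have a compare report, selecting the cheapest option is a one-liner. A minimal sketch, using hypothetical `(model, total_usd)` tuples as stand-ins for real report entries:

```python
# Hypothetical (model, total_usd) pairs standing in for compare_models() output
report = [
    ("gpt-4o", 0.012500),
    ("gpt-4o-mini", 0.000750),
    ("claude-3-haiku", 0.001250),
]

# Pick the entry with the lowest projected cost
cheapest_model, cheapest_cost = min(report, key=lambda entry: entry[1])
print(cheapest_model)  # gpt-4o-mini
```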

Batch Cost Projection

For pipelines processing thousands of documents, estimate total costs before execution:

```python
from llm_context_forge import TokenCounter

counter = TokenCounter("gpt-4o")

documents = load_documents()  # Your document corpus
total_tokens = sum(counter.count(doc) for doc in documents)

# Project input cost for the full batch
cost_per_token = 2.50 / 1_000_000  # GPT-4o input rate from the pricing table
projected_cost = total_tokens * cost_per_token

print(f"Total tokens: {total_tokens:,}")
print(f"Projected cost: ${projected_cost:.2f}")
```
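The projection above covers input tokens only. Output tokens are billed at a higher rate, so a fuller estimate should add an assumed completion volume. A sketch with hypothetical numbers, assuming completions average 20% of input length (a made-up ratio; measure your own workload):

```python
input_tokens = 4_200_000          # hypothetical corpus total
input_rate = 2.50 / 1_000_000     # GPT-4o input rate, per token
output_rate = 10.00 / 1_000_000   # GPT-4o output rate, per token

# Assume completions run ~20% of input length (tune to your workload)
output_tokens = int(input_tokens * 0.20)

projected = input_tokens * input_rate + output_tokens * output_rate
print(f"Projected cost: ${projected:.2f}")  # Projected cost: $18.90
```

Even at a 20% output ratio, output charges ($8.40) approach the input charges ($10.50) because of the 4x rate difference.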

:::tip Cost Optimization

Use `compare_models()` to find the cheapest model that meets your quality requirements. For many RAG use cases, `gpt-4o-mini` at $0.15/1M input tokens delivers 90% of GPT-4o's quality at 6% of the cost.

:::