# Cost Estimation
The CostCalculator lets you evaluate API costs before sending any requests. This is critical for batch processing, RAG pipelines, and any system where token volumes are unpredictable.
## How Pricing Works
LLM APIs charge per token, with separate rates for input (prompt) and output (completion) tokens:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
| Llama 3.1 405B | $0.00 | $0.00 |
Prices are sourced from the model registry. Self-hosted models (such as Llama 3.1 405B) default to $0.00.
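The arithmetic behind the table is simple: a request's cost is input tokens times the input rate plus output tokens times the output rate, with rates quoted per 1M tokens. A minimal sketch (the `request_cost` helper and the token counts are illustrative, not part of the library):

```python
# Rates per 1M tokens, (input, output), taken from the pricing table above.
RATES_PER_1M = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the given token counts."""
    in_rate, out_rate = RATES_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# e.g. a 10k-token prompt with a 1k-token completion on GPT-4o:
# 10_000 * 2.50/1M + 1_000 * 10.00/1M = $0.025 + $0.010 = $0.035
print(f"${request_cost('gpt-4o', 10_000, 1_000):.3f}")
```

Note the asymmetry: output tokens on GPT-4o cost 4x input tokens, so completion length often dominates the bill.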
## Usage

**Python**

```python
from llm_context_forge import CostCalculator

calc = CostCalculator("gpt-4o")

# Estimate cost of a single prompt
cost = calc.estimate_prompt("Your large prompt text goes here...")
print(f"Estimated input cost: ${cost.usd:.6f}")

# Compare costs across models
report = calc.compare_models(
    texts=["Document chunk A", "Document chunk B", "Document chunk C"],
    models=["gpt-4o", "gpt-4o-mini", "claude-3-haiku"],
)
for entry in report:
    print(f"{entry.model}: ${entry.total_usd:.4f} for {entry.total_tokens} tokens")
```

**TypeScript**

```typescript
import { CostCalculator } from "llm-context-forge";

const calc = new CostCalculator("gpt-4o");

// Estimate cost of a single prompt
const cost = calc.estimatePrompt("Your large prompt text goes here...");
console.log(`Estimated input cost: $${cost.usd.toFixed(6)}`);
```
## Batch Cost Projection
For pipelines processing thousands of documents, estimate total costs before execution:
```python
from llm_context_forge import CostCalculator, TokenCounter

calc = CostCalculator("gpt-4o")
counter = TokenCounter("gpt-4o")

documents = load_documents()  # Your document corpus
total_tokens = sum(counter.count(doc) for doc in documents)

# Project cost for the full batch
cost_per_token = 2.50 / 1_000_000  # GPT-4o input rate
projected_cost = total_tokens * cost_per_token

print(f"Total tokens: {total_tokens:,}")
print(f"Projected cost: ${projected_cost:.2f}")
```
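The projection above covers input tokens only. Output tokens are billed at a higher rate ($10.00/1M for GPT-4o), so a fuller projection needs an assumed average completion length. A sketch of that extension, where the 500-token average and the corpus size are assumptions you should replace with your own numbers:

```python
# Project output cost assuming an average completion length per document.
# GPT-4o output rate: $10.00 per 1M tokens (from the pricing table).
AVG_COMPLETION_TOKENS = 500          # assumption: tune to your observed outputs
OUTPUT_RATE_PER_TOKEN = 10.00 / 1_000_000

num_documents = 2_000                # example corpus size
projected_output_tokens = num_documents * AVG_COMPLETION_TOKENS
projected_output_cost = projected_output_tokens * OUTPUT_RATE_PER_TOKEN

print(f"Projected output tokens: {projected_output_tokens:,}")
print(f"Projected output cost: ${projected_output_cost:.2f}")
```

Because the output rate is 4x the input rate, even a modest average completion length can make output the larger line item; measure real completion lengths on a sample before trusting the estimate.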
:::tip Cost Optimization
Use `compare_models()` to find the cheapest model that meets your quality requirements. For many RAG use cases, `gpt-4o-mini` at $0.15/1M input tokens delivers most of GPT-4o's quality at 6% of the input cost.
:::