TL;DR
A new Model Context Protocol server enables automatic LLM provider switching for cost optimization, alongside Google's aggressive image model pricing.
Price Per Token has released a Model Context Protocol server that lets developers swap LLM providers at request time with a three-line code change. The MCP exposes a unified interface where any compatible client — Claude Desktop, Cursor, or custom agents — can query live pricing across OpenRouter, Amazon Bedrock, Google AI Studio, and other endpoints, then route each call to the lowest-cost option that meets latency and context-window constraints. The move turns model selection from a static architectural decision into a runtime optimization problem.
Google AI Studio simultaneously dropped pricing for Gemini 3.1 Flash Lite Image to $0.25 per million input tokens and $1.50 per million output tokens, undercutting most hosted image-generation APIs. The model, branded Nano Banana 2 Lite on OpenRouter, targets high-volume editing workflows where latency matters more than peak quality. Anthropic's Claude Sonnet 5 also appeared on Bedrock at $2.00 in and $10.00 out with a one-million-token context window, while Poolside released Laguna XS 2.1 as a free tier via OpenRouter.
The Price Per Token MCP works by wrapping each provider's native SDK behind a standardized tool schema. At inference time, the client sends a prompt plus optional constraints — max latency, minimum context, required modalities — and the server returns a ranked list of endpoints with current spot prices. Developers can hard-code a fallback chain or let the MCP pick dynamically. Early adopters report 30 to 60 percent cost reductions on mixed workloads that previously defaulted to a single vendor.
This shifts the economics of LLM ops toward spot-market dynamics familiar from cloud compute. Instead of negotiating enterprise contracts, teams can treat model access as a commodity with real-time price discovery. The risk is silent quality drift: a cheaper endpoint may serve a quantized or distilled variant that benchmarks well on public evals but fails on domain-specific tasks. Price Per Token's roadmap includes per-request quality probes and SLA-aware routing, but neither is live yet.
For practitioners, the immediate takeaway is that hard-coding a single provider is leaving money on the table. A wrapper that evaluates price, latency, and context length per request pays for itself at modest volume. Google's image pricing also makes batch generation workflows — synthetic data, marketing variants, asset pipelines — viable at scale without dedicated GPU clusters. The next bottleneck is observability: tracking which model actually served each request and correlating outputs with downstream metrics.
The market is converging on a two-layer stack: a thin routing layer that handles provider diversity, and a thick application layer that assumes model-agnostic interfaces. Price Per Token's MCP is an early implementation of the routing layer. Whether it becomes the standard or gets absorbed into frameworks like LangChain and LlamaIndex depends on how quickly the major providers expose consistent metadata APIs.
Will routing layers commoditize foundation models into interchangeable utilities, or will differentiation shift to proprietary tooling and data flywheels that no MCP can replicate?
The market reaction
Price Per Token's MCP server is open source and available on GitHub. Google AI Studio's new pricing takes effect immediately for all tiers. Anthropic's Bedrock pricing for Sonnet 5 matches the API rates announced last week. Poolside's free tier requires an OpenRouter account with rate limits of 60 requests per minute.
FAQ
What is the Price Per Token MCP and how do I install it?
The MCP is a Model Context Protocol server that exposes live LLM pricing as callable tools. Install via npm or Docker, then add the server config to your MCP-compatible client.
Which providers does the MCP currently support?
OpenRouter, Amazon Bedrock, Google AI Studio, Together AI, and Fireworks AI are supported at launch. The adapter interface is extensible for custom endpoints.
Does dynamic routing introduce latency overhead?
The MCP adds a single HTTP round-trip for price discovery, typically under 50 milliseconds. Caching and fallback chains mitigate tail latency.
How does Google's $0.25/$1.50 pricing compare to alternatives?
Midjourney and DALL-E 3 equivalents run $3 to $10 per million output tokens. Nano Banana 2 Lite is the cheapest hosted image model with a public API today.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
Connect on LinkedIn