DeepSeek V4 ships with 1M-token context and two model tiers

TL;DR

DeepSeek V4-Pro and V4-Flash bring 1M-token context windows and aggressive pricing to open-source AI, with benchmark claims that warrant independent verification.

DeepSeek released V4 in preview on April 23, splitting the release into two variants: V4-Pro and V4-Flash. Both carry a one-million-token context window, a specification that is quickly becoming the baseline for open-source models aimed at production use. An earlier release from the Chinese lab, R1, triggered a market sell-off that wiped more than $500 billion from Nvidia's valuation in a single session. V4 arrives with that precedent already built into expectations.

According to Yahoo Finance, DeepSeek's statement positions V4-Pro as the leading open-source model in agent capabilities, world knowledge, and reasoning performance. That claim comes with one honest qualifier the company included in its own announcement: on world knowledge benchmarks, V4-Pro trails Google's Gemini-Pro-3.1, making it first among open-source alternatives but second overall. Practitioners comparing frontier options should weigh that distinction when setting deployment costs against capability gaps.

The pricing spread between tiers is substantial. Price Per Token lists V4-Pro at $2.40 per million input tokens and $4.80 per million output tokens via Alibaba Cloud. V4-Flash runs at $0.20 input and $0.40 output on the same infrastructure, a twelve-to-one cost ratio that makes the Flash variant compelling for high-throughput retrieval pipelines where maximum reasoning depth is not the bottleneck.
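To make the spread concrete, the short Python sketch below applies the per-million-token prices quoted above to a single workload. The monthly volume (500 million input tokens, 50 million output tokens) is a hypothetical figure chosen for illustration, and the names TierPrice and workload_cost are invented for this example; actual bills may differ with caching, discounts, or provider-specific terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrice:
    """USD price per one million tokens (hypothetical helper type)."""
    input_per_million: float
    output_per_million: float


# Prices as quoted in the article (Alibaba Cloud listing via Price Per Token).
V4_PRO = TierPrice(input_per_million=2.40, output_per_million=4.80)
V4_FLASH = TierPrice(input_per_million=0.20, output_per_million=0.40)


def workload_cost(price: TierPrice, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the given per-million prices."""
    return (
        input_tokens / 1_000_000 * price.input_per_million
        + output_tokens / 1_000_000 * price.output_per_million
    )


# Assumed monthly volume, for illustration only.
INPUT_TOKENS = 500_000_000
OUTPUT_TOKENS = 50_000_000

pro_cost = workload_cost(V4_PRO, INPUT_TOKENS, OUTPUT_TOKENS)
flash_cost = workload_cost(V4_FLASH, INPUT_TOKENS, OUTPUT_TOKENS)

if __name__ == "__main__":
    print(f"V4-Pro:   ${pro_cost:,.2f}")
    print(f"V4-Flash: ${flash_cost:,.2f}")
    print(f"Ratio:    {pro_cost / flash_cost:.1f}x")
```

Under these assumptions the same workload costs about $1,440 on V4-Pro and $120 on V4-Flash. Because both the input and output prices differ by the same factor, the twelve-to-one ratio holds for any mix of input and output tokens.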

The architecture and chip angle

DeepSeek V4 is explicitly described as better optimized for Chinese domestic chips, a detail that extends beyond benchmark tables into supply-chain strategy. Export controls on high-end Nvidia hardware have forced Chinese labs to retool inference stacks for alternative silicon, and V4 appears to be the first major DeepSeek release designed with that constraint as a first-class requirement rather than an afterthought. The AI Release Tracker logs four V4 variants, including V4-Flash-Max and V4-Pro-Max, suggesting a modular release strategy that lets DeepSeek tune cost-performance tradeoffs without forking the base architecture.

V4-Pro introduces a maximum reasoning effort mode, which the company frames as a setting that pushes knowledge capabilities beyond the default configuration. The mechanism is underdocumented in the preview release. Independent evaluation against standard suites such as GPQA Diamond or SWE-Bench Verified has not yet appeared, and benchmark claims for this mode should be treated as preliminary until third-party numbers surface.

Placing V4 in context

The one-million-token context window is no longer a differentiator on its own. GPT-5.5 and Gemini-Pro-3.1 operate at comparable or longer windows, and LLM Stats shows the open-source field converging rapidly on the same specification. What distinguishes V4 is the combination of that window with open weights and pricing that undercuts most proprietary alternatives at scale, precisely where enterprise retrieval and multi-step agent workflows accumulate the most token volume.

DeepSeek's trajectory since R1 has changed how the broader AI research community calibrates expectations for Chinese labs. R1 was treated as a surprise; V4 is treated as a continuation. The volatility that followed R1 established inference cost and model openness as concerns for investors, not just preferences among practitioners. V4 extends that argument into longer-context territory and arrives with a Flash tier priced to capture volume customers before proprietary providers can respond.

The preview label leaves room for DeepSeek to revise benchmark numbers before a full release, and the chip-optimization story remains opaque without published hardware details. Whether V4-Pro holds its claimed position against Gemini-Pro-3.1 across a wider benchmark suite will determine how quickly enterprises begin migrating long-context workloads away from closed providers.

FAQ

What is the DeepSeek V4 context window?
Both V4-Pro and V4-Flash support a one-million-token context window, available in the current preview release.

How does DeepSeek V4-Pro pricing compare to proprietary models?
V4-Pro costs $2.40 per million input tokens and $4.80 per million output tokens via Alibaba Cloud, which undercuts most proprietary alternatives at scale. V4-Flash is cheaper still, at $0.20 input and $0.40 output per million tokens.

What is the difference between V4-Pro and V4-Flash?
V4-Pro is the full-capability variant with a maximum reasoning effort mode; V4-Flash is a cost-optimized option suited for high-throughput workloads where top-tier reasoning is not required.

Does DeepSeek V4 outperform GPT-5.5 or Gemini-Pro-3.1?
DeepSeek's own data shows V4-Pro trailing Gemini-Pro-3.1 on world knowledge benchmarks and leading other open-source models. No published direct comparison with GPT-5.5 is available from current sources.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
