When Silicon Valley Cheers a Chinese Model — How Open Weights Rewrite Your Cost Sheet

DeepSeek’s V4-Pro runs at $3.48 per million output tokens. OpenAI charges $30 for the same volume, Anthropic $25. Now that Silicon Valley is praising a Chinese model whose weights you can download and self-host, the founder’s question isn’t whether to use it. It’s whether to keep your cost sheet chained to a closed API.

What happened

Hangzhou-based DeepSeek shipped V4 in April in two flavors: V4-Pro, which it claims rivals top closed models, and a smaller, cheaper V4-Flash. The price is the story. V4-Pro costs $3.48 per million output tokens; V4-Flash, $0.28. The same volume runs $30 at OpenAI and $25 at Anthropic. That’s not a few percent. V4-Pro is roughly seven to nine times cheaper, close to an order of magnitude, and V4-Flash is about ninety to a hundred times cheaper. And DeepSeek keeps to its open-weight playbook: download the weights, modify them, run them on your own servers.

How good is it? MIT Technology Review pegs V4 as marginally behind GPT-5.4 and Gemini 3.1 Pro, roughly three to six months off the frontier. As with R1 a year ago, the point is that the gap between open and closed has narrowed from an uncrossable cliff to a months-long lag.

Layer on Huawei announcing full Ascend-chip support for DeepSeek’s models, and you can see the outline of a China-origin stack designed to cut Nvidia dependence. Silicon Valley’s enthusiasm isn’t ideological. The model delivers comparable output at roughly an eighth of the price.

What it means for founders

On the surface this is another “AI got cheaper” headline. Underneath sits a fork in your cost structure. A founder built on a closed API is a price-taker: you accept the vendor’s rate card and you don’t set token prices. When the model changes, your outputs change; when the terms change, your business wobbles.

Open weights invert that relationship. Hold the weights and the model stops being a service living in someone else’s cloud and becomes an asset running on your own infrastructure. For a product that burns inference at scale, that difference is margin. The gap between $25 and $3.48 per million tokens can be the line between profit and loss for a company burning hundreds of millions of tokens a month.

But there’s no free lunch. Running open weights yourself means renting GPUs and operating a serving stack, and at low traffic that fixed overhead can cost more than paying per call to a closed API. So the question isn’t “switch to DeepSeek.” It’s “who controls your cost sheet.”

Be honest about geopolitical risk, too. The weights may sit on your hardware, but the model’s provenance, regulatory fit, and data-handling policy are separate items to vet, especially if you serve US or EU customers or touch sensitive data. The reasons open weights are attractive and the reasons they’re risky come from the same source.
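To make the trade-off between per-token pricing and fixed self-hosting overhead concrete, here is a minimal Python sketch of the monthly cost of three paths at several traffic tiers. Only the $25 and $3.48 per-million-token prices come from the figures above. Everything in `SelfHostAssumptions` (GPU node rental rate, ops overhead, per-node capacity) is a hypothetical placeholder, not a measured or quoted number, so substitute your own.

```python
import math
from dataclasses import dataclass

# Per-million-output-token prices quoted in the article (USD).
CLOSED_API_PRICE = 25.00   # closed-API figure cited above
HOSTED_OPEN_PRICE = 3.48   # V4-Pro hosted price cited above


@dataclass(frozen=True)
class SelfHostAssumptions:
    """Hypothetical self-hosting inputs. Replace every value with your own quotes."""
    gpu_node_hourly_usd: float = 12.0            # ASSUMED rental rate for one serving node
    ops_monthly_usd: float = 4_000.0             # ASSUMED engineering / on-call overhead
    node_capacity_tokens: float = 4_000_000_000  # ASSUMED output tokens one node serves per month
    hours_per_month: float = 730.0


def api_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Pay-per-token cost: scales linearly with traffic."""
    return tokens_per_month / 1_000_000 * price_per_million


def self_host_cost(tokens_per_month: float,
                   a: SelfHostAssumptions = SelfHostAssumptions()) -> float:
    """Mostly fixed cost: you pay for whole nodes even when they sit idle."""
    nodes = max(1, math.ceil(tokens_per_month / a.node_capacity_tokens))
    return nodes * a.gpu_node_hourly_usd * a.hours_per_month + a.ops_monthly_usd


def cheapest_path(tokens_per_month: float,
                  a: SelfHostAssumptions = SelfHostAssumptions()):
    """Return (name of cheapest path, dict of all three monthly costs)."""
    costs = {
        "closed_api": api_cost(tokens_per_month, CLOSED_API_PRICE),
        "hosted_open_api": api_cost(tokens_per_month, HOSTED_OPEN_PRICE),
        "self_hosted": self_host_cost(tokens_per_month, a),
    }
    return min(costs, key=costs.get), costs


if __name__ == "__main__":
    for tier in (100_000_000, 1_000_000_000, 10_000_000_000):
        winner, costs = cheapest_path(tier)
        summary = ", ".join(f"{name}=${usd:,.0f}" for name, usd in costs.items())
        print(f"{tier:>14,} tokens/month: {summary} -> cheapest: {winner}")
```

With these placeholder inputs, self-hosting is the most expensive path at the lowest tier and only becomes the cheapest at the highest one. The ranking depends on the assumed numbers, which is why each tier needs its own calculation.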

What you can do now

First, hide inference calls behind an abstraction layer. Don’t wire your code directly to one model; build so a closed API and an open-weight model swap behind the same interface. When prices or terms shift, swapping the model without rewriting code is your real leverage.

Second, split your workloads. Route the quality-critical path to a top closed model and the high-volume, structured work to cheap open weights. Running everything through one model is the most expensive design there is.

Third, before adopting open weights, compute the actual all-in cost of both paths, self-serving versus a hosted API, at each traffic tier. Token price alone is a trap.

Fourth, document model provenance and data policy as a compliance line item. Adopting something because it’s cheap can cost you more later in regulatory exposure and customer trust.
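The first two steps, one interface and workload-based routing, can be sketched in a few lines of Python. This is an illustrative sketch, not any vendor’s SDK: `InferenceBackend`, `CallableBackend`, `Router`, and the workload labels are hypothetical names, and the two backends are stubs standing in for a real closed-API call and a real request to an open-weight model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class InferenceBackend(Protocol):
    """The only surface the rest of the codebase is allowed to depend on."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class CallableBackend:
    """Adapts any prompt -> text function (vendor SDK call, HTTP request
    to your own serving stack, test stub) to the shared interface."""
    name: str
    fn: Callable[[str], str]

    def complete(self, prompt: str) -> str:
        return self.fn(prompt)


class Router:
    """Picks a backend per workload label; unknown labels use the default."""

    def __init__(self, routes: Dict[str, InferenceBackend], default: str):
        if default not in routes:
            raise ValueError(f"default workload {default!r} has no backend")
        self._routes = dict(routes)
        self._default = default

    def backend_for(self, workload: str) -> InferenceBackend:
        return self._routes.get(workload, self._routes[self._default])

    def complete(self, prompt: str, workload: str = "") -> str:
        return self.backend_for(workload).complete(prompt)


# Stub backends. In a real system each lambda would be replaced by an
# actual API call; these placeholders only tag the prompt.
closed = CallableBackend("closed-api", lambda p: f"[closed-api stub] {p}")
open_weight = CallableBackend("open-weight", lambda p: f"[open-weight stub] {p}")

router = Router(
    routes={
        "quality_critical": closed,      # hypothetical workload labels
        "bulk_extraction": open_weight,
    },
    default="quality_critical",
)

if __name__ == "__main__":
    print(router.complete("Draft the contract summary.", workload="quality_critical"))
    print(router.complete("Extract invoice fields as JSON.", workload="bulk_extraction"))
```

Because callers only see `Router.complete`, moving a workload from one model to another is a one-line change in the `routes` mapping rather than a rewrite of application code.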