OpenAI Is Making Its Own Chips — Watch the Bottom of the AI Stack

OpenAI and Broadcom have unveiled Jalapeño, OpenAI’s first custom inference chip. OpenAI claims better performance-per-watt than today’s GPUs at roughly half the operating cost. For founders building on a model API, this is not chip trivia: it is a question of who controls inference pricing and the compute supply chain.

What Happened

On Wednesday, June 24, 2026, OpenAI revealed Jalapeño, the first inference chip it designed itself, built with Broadcom under a partnership the two announced last October. It went from concept to silicon in about nine months. This is an inference chip, not a training chip: it runs models that already exist, answering user requests in real time, while pre-training stays on Nvidia GPUs. A nice detail: OpenAI’s own models helped design it.

The company shared two early numbers: performance-per-watt clearly ahead of state-of-the-art GPUs, and operating cost roughly half that of a typical AI GPU. It leaned hard on the case for running real-time coding models like Codex cheaply. With this chip, OpenAI now owns the full stack: chip, kernels, memory, networking, scheduling, deployment.

OpenAI is not acting alone. Google has run TPUs since 2015, Amazon has Trainium and Inferentia, Microsoft has Maia, and Meta has MTIA. With inference now roughly two-thirds of all AI compute, every hyperscaler is moving down the stack to silicon that does not come from Nvidia. Even the players who don’t design their chips end to end lean on the same two co-design houses, Broadcom and Marvell, which control an estimated 95% of that market, and TSMC fabricates nearly all of it.

What This Means for Founders

On the surface this reads as a corporate-strategy story: OpenAI reducing its dependence on Nvidia. But if you’ve built a product on top of a model API, the floor of your cost structure just shifted. Inference already eats more than 20% of revenue at many AI-native companies. If the chip running that inference drops to half the cost, that’s good news in the short run, provided the savings are passed on to you.

The real signal is that control is concentrating further into one company’s hands. OpenAI owns the model. Now it owns the chip that runs it. It can cut prices, prioritize its own workloads on scarce silicon, or bind specific models to its own hardware. Building on someone else’s full stack means renting the entire ladder, and a rented ladder can be pulled. I’d weigh that against the upside everyone is celebrating.

The same verticalization wave is running across the industry. Google, Amazon, Microsoft, Meta, and now OpenAI all run serious custom-silicon programs, which means a thin wrapper over a single provider’s API is exactly the kind of business that gets squeezed when that provider integrates downward. A cheaper inference dollar doesn’t widen your moat if the provider sets the price and can change it tomorrow. Model access was never a moat; now the infrastructure the model runs on isn’t yours either. Build that into your cost assumptions before you raise on margins you don’t control.
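To see how much of your margin sits in the provider’s hands, here is a rough sketch of the arithmetic. The 20% inference share is the figure cited above; the revenue number, the 10% of other cost of goods sold, the price scenarios, and the function names are illustrative assumptions, not data from any company.

```python
def gross_margin(revenue: float, inference_cost: float, other_cogs: float = 0.0) -> float:
    """Gross margin as a fraction of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - inference_cost - other_cogs) / revenue


def margin_under_price_change(
    revenue: float,
    inference_share: float,
    price_multiplier: float,
    other_cogs_share: float = 0.0,
) -> float:
    """Margin if the provider multiplies its price while your usage stays constant."""
    inference_cost = revenue * inference_share * price_multiplier
    other_cogs = revenue * other_cogs_share
    return gross_margin(revenue, inference_cost, other_cogs)


# Hypothetical scenarios: the provider, not you, picks the multiplier.
scenarios = {
    "provider halves its price": 0.5,
    "price unchanged": 1.0,
    "provider raises price 50%": 1.5,
}

margins = {
    label: margin_under_price_change(
        revenue=100.0,
        inference_share=0.20,   # inference at 20% of revenue
        price_multiplier=multiplier,
        other_cogs_share=0.10,  # assumed, for illustration only
    )
    for label, multiplier in scenarios.items()
}

for label, margin in margins.items():
    print(f"{label}: gross margin {margin:.0%}")
```

Under these assumed numbers, gross margin lands at 80%, 70%, and 60% across the three scenarios. The spread between the best and worst case is 20 points of margin decided entirely by someone else’s pricing page.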

What You Can Do Now

First, don’t route your entire inference supply chain through one provider. With OpenAI now integrating all the way down to silicon, a product locked to a single model vendor is exposed to that vendor’s pricing and policy swings. Put an abstraction layer in front so the same task can run on a different model.

Second, don’t let “half the cost” lull you. Model your own per-token unit economics directly. You don’t set the price; the provider does, and when its assumptions change, so does your P&L.

Third, build the moat on assets you control: your data, integrations wired deep into the workflow, and domain knowledge. These are the things that survive anyone halving the price of a chip.

Fourth, read the verticalization wave as an opportunity too. The more the giants integrate their own stacks, the more the narrow, deep slots in specific industries, languages, and regulatory regimes are left open. When everyone consolidates the bottom of the stack, the founder’s seat isn’t the thin app layer on top. It’s the corner the giants won’t bother to enter.
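The first two points can be combined in a few dozen lines. What follows is a minimal sketch, not a production router: every provider name, price, and function here is hypothetical, and the `complete` callables are stand-ins for whatever vendor SDK calls you would actually wrap. It estimates per-request cost from per-token prices, tries the cheapest provider first, and falls back to the next one if a call fails.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Provider:
    """One model backend. Names and prices are hypothetical placeholders."""
    name: str
    usd_per_1k_input_tokens: float
    usd_per_1k_output_tokens: float
    complete: Callable[[str], str]  # wraps the real vendor SDK call

    def estimated_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Estimated cost in USD for one request of the given size."""
        return (
            input_tokens * self.usd_per_1k_input_tokens
            + output_tokens * self.usd_per_1k_output_tokens
        ) / 1000


def run_task(
    prompt: str,
    providers: Sequence[Provider],
    input_tokens: int,
    output_tokens: int,
) -> Tuple[str, str]:
    """Try providers cheapest-first; fall back to the next one on any failure."""
    ranked = sorted(
        providers, key=lambda p: p.estimated_cost(input_tokens, output_tokens)
    )
    errors = []
    for provider in ranked:
        try:
            return provider.name, provider.complete(prompt)
        except Exception as exc:  # broad on purpose: any vendor failure triggers fallback
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub backends standing in for real SDK calls.
def _vendor_a(prompt: str) -> str:
    raise TimeoutError("vendor A unavailable")


def _vendor_b(prompt: str) -> str:
    return f"[vendor B] {prompt}"


providers = [
    Provider("vendor-a", 0.5, 1.5, _vendor_a),
    Provider("vendor-b", 1.0, 3.0, _vendor_b),
]

used, reply = run_task("Summarize this contract.", providers, 1200, 300)
print(used, reply)
```

In this example vendor A is cheaper for the assumed request size, so it is tried first; it fails, and the task completes on vendor B. The point of the design is that your application code calls `run_task` and never names a vendor, so a price change or outage at one provider becomes a configuration change rather than a rewrite.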