Now You Have to Build the Chip Too — Vertical Integration Redraws the Stakes

OpenAI has taped out Jalapeño, its first inference chip, co-designed with Broadcom, nine months after design start. Following Google’s TPU and Amazon’s Trainium, AI companies owning their stack all the way down to silicon is now the pattern. For founders, the point isn’t the chip; it’s how few companies can play this game.

What Happened

On June 24, 2026, OpenAI unveiled Jalapeño, its first in-house chip, co-designed with Broadcom. It is inference-only, built for the stage where a trained model answers users, while pre-training stays on Nvidia GPUs. The eye-catching part is the speed: nine months from design start to tape-out (the point at which the finished design is handed off for manufacturing), which OpenAI calls one of the fastest cycles in the industry for a high-performance ASIC of this class. The company also says its own models helped with the design. The part itself is a large, reticle-sized chip. On performance, OpenAI said only that early perf-per-watt is “substantially better than current state-of-the-art” and disclosed no hard numbers. Deployment is slated for late 2026 at gigawatt scale with Microsoft and other partners. OpenAI is not alone here. Google has run TPUs since 2015, and Amazon runs even training workloads on its own Trainium silicon. One by one, the big tech companies that own frontier models have moved into owning the chips, and OpenAI has now joined them, late but emphatically.

What This Means for Founders

The headline reads “reduces Nvidia dependence,” but the real signal is that the price of a seat at the table just moved one layer deeper. A few years ago an AI company’s moat was a better model. Then it was data and distribution. Now there is a new layer underneath: your own silicon. Your model, the chip that runs it, the data center you plug it into: the companies worldwide that can stack all of that can be counted on two hands. With Broadcom and Marvell dominating custom-ASIC design and TSMC fabricating nearly all of it, only those who can show up with the capital and the volume get a seat. If perf-per-watt really beats incumbent chips, OpenAI gains room to cut inference prices, to route its own workloads to its own silicon first, or to bind specific models to its hardware. Building on someone else’s full stack means borrowing their whole ladder, and whoever lent it decides when it shakes. This is not abstract for a YC-stage team either. The default playbook (wrap a frontier API, ship fast) assumes the layer beneath you stays neutral. It is getting less neutral. FAANG-scale players integrating down to the wafer is exactly the kind of structural shift that quietly resets unit economics for everyone renting compute on top. Because perf-per-watt is undisclosed, how far inference economics will actually move is still unknown. The direction is not. The real moat in AI is migrating from the application layer down to the silicon layer, and the guest list for that layer keeps getting shorter.

What You Can Do Now

First, be honest about which layer you’re fighting on. Chips and full-stack integration are a capital game, and a startup won’t win it. Don’t try to enter; claim the ground above it that the platforms won’t bother to occupy. Second, don’t funnel all your inference through one provider. With OpenAI integrating down to the chip, a product tied to a single model is exposed to that provider’s pricing and policy swings; a single abstraction layer that lets you route a task to another model is cheap insurance. Third, build your moat on assets you control, not on the model or the chip: your data, integrations wired into a customer’s workflow, depth in a specific industry, language, or regulatory regime. None of that moves when someone halves the price of silicon. Fourth, don’t read the integration wave as only a threat. The deeper the giants descend into the substrate, the more narrow, deep application niches are left empty. The gigawatt infrastructure they’re building is, for whoever knows what to sell on top of it, simply cheaper raw material.
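To make the second point concrete, here is a minimal Python sketch of such a routing layer, under stated assumptions. Everything in it is hypothetical: `Provider`, `ModelRouter`, and `ProviderError` are illustrative names, not any vendor’s API, and each provider’s `call` function stands in for whatever SDK you actually use. The router maps each task to an ordered list of providers and falls back to the next one when a call fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class ProviderError(Exception):
    """Hypothetical error type a provider adapter raises when a call fails."""


@dataclass
class Provider:
    # Thin adapter around one vendor; `call` is a hypothetical stand-in
    # for the function that actually sends the prompt to that vendor.
    name: str
    call: Callable[[str], str]


@dataclass
class ModelRouter:
    # Maps a task label (e.g. "summarize") to providers in preference order.
    routes: Dict[str, List[Provider]] = field(default_factory=dict)

    def set_route(self, task: str, providers: List[Provider]) -> None:
        self.routes[task] = list(providers)

    def complete(self, task: str, prompt: str) -> Tuple[str, str]:
        """Try each provider for `task` in order; return (provider_name, output)."""
        errors = []
        for provider in self.routes.get(task, []):
            try:
                return provider.name, provider.call(prompt)
            except ProviderError as exc:
                # Record the failure and fall through to the next provider.
                errors.append(f"{provider.name}: {exc}")
        raise ProviderError(f"all providers failed for {task!r}: {errors}")


# Usage with fake providers (no network calls): the first one always fails.
def flaky(prompt: str) -> str:
    raise ProviderError("rate limited")


router = ModelRouter()
router.set_route("summarize", [
    Provider("vendor_a", flaky),
    Provider("vendor_b", lambda p: p.upper()),
])
print(router.complete("summarize", "hello"))  # ('vendor_b', 'HELLO')
```

The design choice that matters is that product code calls `router.complete(task, prompt)` and never a vendor SDK directly, so switching or reordering models becomes a configuration change rather than a rewrite.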