An Inference Chipmaker Just Gave Back Its Margin: Cerebras' Drop From 47% to 38-41% Is a Warning

Cerebras stock fell nearly 20% after earnings. The company guided full-year gross margin to 38-41%, down from the 47% it posted in Q1. The CEO called the reaction a misunderstanding, but a data-center space shortage is forcing Cerebras to rent its own systems back from a customer, shaving 10-15 points off margin. Startups leaning on cheap inference should take note.

What Happened

Cerebras beat expectations with its Q1 numbers on Tuesday. On Wednesday the stock dropped almost 20%. One figure did the damage: the company guided full-year gross margin in its core business to 38-41%, well below the 47% it had just reported for Q1.

CEO Andrew Feldman went on CNBC’s “Squawk on the Street” and said investors had “misunderstood” the guidance. “We laid out a plan at the start of ’26. We shared that plan as we went public a few months ago, and we’re beating that plan.”

The margin drop, the company insists, comes not from pricing pressure or cost overruns but from a temporary operational decision. To bring AI compute capacity to market faster, Cerebras chose to rent its own systems back from one of its largest customers while it builds and deploys its own data-center footprint. CFO Bob Komin explained that a severe shortage of data-center space is forcing the company to lease equipment back from customers and stand up its own capacity, a move that will drag margins down by 10 to 15 percentage points this year.

What This Means for Founders

On the surface this is one chip company’s accounting wrinkle. Underneath it is a crack in inference economics. The real reason Cerebras gave back margin, from 47% to a guided 38-41%, is that there is nowhere to put the compute. There isn’t enough physical space to run AI workloads, so the company is renting its own gear back from a customer despite having the chips.

That bottleneck isn’t unique to Cerebras. When compute gets scarce, the cost eventually flows down into per-token inference pricing and lands on the cost sheet of any startup running its product on someone else’s API. For founders who spent two years watching token prices fall, this is an uncomfortable signal. The model price curve has trended down, but the cost of the physical layer beneath it (chips, power, data-center floor space) trends up, and the two cannot keep diverging forever.

A startup that scaled on AWS often learned that S3 and egress fees, not the EC2 sticker price, decided its margin. AI infrastructure has similarly hidden physical costs that set yours. If you run an AI-native product where inference eats a large chunk of revenue, one supplier’s margin guidance is your cost risk. Feldman may be right that this isn’t structural pricing pressure. But if a chipmaker itself is giving up margin for lack of space, anyone building on top of it should stop treating “cheap inference” as a permanent assumption.
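To make that exposure concrete, here is a minimal sketch of how a startup's gross margin moves when per-token prices change and usage stays constant. Every figure is hypothetical and chosen only for illustration (100,000 in monthly revenue, 40,000 of it spent on inference); none of it comes from Cerebras or any real price list, and the function names are made up for this example.

```python
def gross_margin(revenue: float, inference_cost: float, other_cogs: float = 0.0) -> float:
    """Return gross margin as a fraction of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - inference_cost - other_cogs) / revenue


def margin_after_price_change(
    revenue: float,
    inference_cost: float,
    price_change: float,
    other_cogs: float = 0.0,
) -> float:
    """Gross margin if per-token prices move by price_change (0.20 means +20%).

    Assumes token usage and revenue stay constant, so the inference bill
    scales linearly with the price change.
    """
    return gross_margin(revenue, inference_cost * (1 + price_change), other_cogs)


if __name__ == "__main__":
    # Hypothetical startup: inference is 40% of revenue today.
    revenue = 100_000.0
    inference_cost = 40_000.0
    for change in (-0.20, 0.0, 0.20, 0.50):
        margin = margin_after_price_change(revenue, inference_cost, change)
        print(f"token price {change:+.0%}: gross margin {margin:.1%}")
```

In this made-up case a 20% rise in token prices takes gross margin from 60% to 52%, and a 50% rise takes it to 40%. The larger the share of revenue that goes to inference, the steeper that slope.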

What You Can Do Now

First, don’t funnel all your inference through a single supplier. Tie yourself to one chip or one cloud and that company’s data-center problems become yours.

Second, stop baking permanent price cuts into your model. Build a revenue plan on the assumption that token prices keep falling and your margin collapses the moment an infrastructure bottleneck pushes prices back up.

Third, make cost-efficient design (the same output for less compute) your moat. Caching, routing tasks to the right-sized model, and cutting needless calls are what survive when inference pricing wobbles.

Fourth, read supplier earnings and guidance as cost signals. A chipmaker’s margin print today can show up on your invoice six to twelve months later.
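The first and third points can be sketched together: a small router that answers repeated requests from a cache, sends each task to the cheapest route able to handle it, and skips a supplier that is unavailable. This is an illustrative sketch, not a production design. The route names, suppliers, prices, and the `call_model` callback are all hypothetical stand-ins for whatever client your providers actually offer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Route:
    name: str                 # hypothetical model label, not a real product
    supplier: str             # hypothetical vendor name
    usd_per_1k_tokens: float  # made-up price for illustration
    max_difficulty: int       # hardest task tier this route should handle


class CachedRouter:
    def __init__(self, routes: List[Route], call_model: Callable[[str, str], str]):
        # Sort cheapest first so the first adequate route is also the cheapest.
        self.routes = sorted(routes, key=lambda r: r.usd_per_1k_tokens)
        self.call_model = call_model
        self.cache: Dict[Tuple[str, int], str] = {}
        self.spend_usd = 0.0
        self.calls = 0

    def pick(self, difficulty: int, unavailable: FrozenSet[str] = frozenset()) -> Route:
        """Return the cheapest route that can handle the task and is reachable."""
        for route in self.routes:
            if route.supplier not in unavailable and route.max_difficulty >= difficulty:
                return route
        raise RuntimeError("no available route can handle this task")

    def complete(
        self,
        prompt: str,
        difficulty: int,
        est_tokens: int,
        unavailable: FrozenSet[str] = frozenset(),
    ) -> str:
        key = (prompt, difficulty)
        if key in self.cache:  # repeated request: no new call, no new spend
            return self.cache[key]
        route = self.pick(difficulty, unavailable)
        answer = self.call_model(route.name, prompt)
        self.spend_usd += route.usd_per_1k_tokens * est_tokens / 1000
        self.calls += 1
        self.cache[key] = answer
        return answer


if __name__ == "__main__":
    routes = [
        Route("small-model", "vendor_a", 0.10, max_difficulty=1),
        Route("large-model", "vendor_b", 1.00, max_difficulty=3),
    ]
    # Stand-in for a real API client; it only echoes which route was used.
    router = CachedRouter(routes, lambda name, prompt: f"[{name}] {prompt}")
    print(router.complete("classify this ticket", difficulty=1, est_tokens=1000))
    print(router.complete("classify this ticket", difficulty=1, est_tokens=1000))
    print(router.calls, router.spend_usd)
```

The cache here is an unbounded in-memory dictionary keyed on the exact prompt, which is enough to show the idea. A real system would need eviction, and would have to decide when a cached answer is still valid.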