AI chipmaker Cerebras’ blockbuster IPO this week is another example of the shift in the AI infrastructure market toward inference, and of how AI labs and vendors are diversifying beyond Nvidia GPUs. But the current interest in inference is unlikely to be enough to sustain Cerebras’ high IPO valuation.
The startup, founded in 2015, went public on May 14 with shares trading as high as $386, pushing the AI hardware vendor to a valuation of about $100 billion. It also raised about $5.55 billion, making its IPO the largest for a tech company so far in 2026.
While this is not the first time Cerebras has tried to go public (the vendor filed for an IPO in 2024 and pulled back in 2025), its success this time around says a lot about the current market, according to Gartner. The AI inference market is seeing significant interest: spending on inference is expected to surpass spending on training for the first time in 2026, and many enterprises are paying closer attention to their inference costs and how to manage them. AI hardware giants such as Nvidia are also responding to the shift. For instance, Nvidia agreed to license chip startup Groq’s inference technology in December 2025 for $20 billion in cash.
“Nvidia’s acquisition of Groq and the rollout of the Vera Rubin architecture signal that the incumbent is moving aggressively to occupy the fast inference category,” said Brendan Burke, an analyst at Futurum Group.
Cerebras’ Benefits
Cerebras is also betting on AI inference “becoming the dominant AI infrastructure for large reasoning models,” said Kashyap Kompella, CEO of RPA2AI Research. “That is a narrow but plausible and defensible bet.”
For models with tens to low hundreds of billions of parameters, Cerebras has a small advantage over giants such as Nvidia because its technology has proven able to handle those models with low latency. But it has yet to show that its technology is as efficient with larger models, especially when the model provider is deeply integrated with the Nvidia stack.
However, Cerebras’ recent partnerships with OpenAI and AWS have given it footing to compete with market leader Nvidia and other big chipmakers, even though Cerebras is considerably smaller. The vendor’s deal with OpenAI, signed in January for 750 megawatts of ultra-low-latency compute capacity, expanded in May to $20 billion through 2028, with OpenAI using Cerebras’ CS-3 systems for real-time inference. Meanwhile, a deal with AWS made the cloud provider a home for Cerebras’ architecture.
Despite this advantage, the vendor has not proven that frontier model providers such as OpenAI can move off Nvidia’s ecosystem entirely without significant economic cost or engineering problems across the whole AI stack, Kompella said.
Moreover, the vendor’s wafer-scale engine (WSE), the world’s largest computer chip, keeps an entire AI model on a single piece of silicon rather than splitting it across GPUs. That design could prove a differentiator against Nvidia in the age of agentic AI and reasoning models.
“The WSE-3 shines in real-time reasoning because it eliminates the inter-chip communication delays that cripple traditional GPU clusters during the token generation phase,” Burke said. “For the next generation of autonomous agents, the ability to think in milliseconds makes wafer-scale silicon a critical necessity for a viable user experience.”
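To make that latency argument concrete, here is a minimal sketch of how a team might measure time-to-first-token and decode throughput for a streaming inference endpoint. It assumes an OpenAI-compatible API; the base URL, model name, and API key below are illustrative placeholders, not confirmed values.

```python
import time
from openai import OpenAI

# Assumption: an OpenAI-compatible streaming endpoint. The base URL, model
# name, and API key are illustrative placeholders, not confirmed values.
client = OpenAI(base_url="https://api.cerebras.ai/v1", api_key="YOUR_API_KEY")

start = time.perf_counter()
first_token_at = None
tokens = 0

stream = client.chat.completions.create(
    model="llama-3.3-70b",  # placeholder model name; varies by provider
    messages=[{"role": "user", "content": "Outline a three-step plan."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first visible output
        tokens += 1  # roughly one token per streamed chunk

end = time.perf_counter()
if first_token_at is not None and tokens > 1:
    ttft_ms = (first_token_at - start) * 1000
    decode_tps = (tokens - 1) / (end - first_token_at)
    print(f"TTFT: {ttft_ms:.0f} ms, decode: {decode_tps:.0f} tokens/sec")
```

Time-to-first-token captures how quickly an agent can begin acting, while decode throughput determines how fast a multi-step reasoning chain completes; both matter for the millisecond-scale experience Burke describes.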
Cost and Other Challenges
However, the WSE architecture requires case-by-case technical support and carries a high price tag, so prospective buyers must determine whether it is worth using, Burke continued. Moreover, the vendor’s operating costs are high, indicating that it sells an “expensive, low-volume specialty product rather than a mass-market commodity,” he said.
“Cerebras faces increasing pressure to prove that its speed justifies a capital expenditure premium,” he said. “Cerebras must dominate the high-complexity market to overcome the pricing disadvantage inherent in its massive silicon footprint.”
The vendor must also show that its system can work well with other systems, said Gaurav Gupta, an analyst at Gartner.
“It isn’t about being a better play as a standalone hardware vendor, but how it integrates into the most efficient end-to-end system,” Gupta said. He added that Cerebras faces other obstacles, such as its niche architecture, which limits scalability.
A Need for Choice
For enterprises, though, the significance of Cerebras’ IPO is that it puts a spotlight on the cost of inference and the need for choice beyond Nvidia, Kompella said.
“Model choice and hardware choice are increasingly linked,” he said. “Knowing which hardware runs which model best, at what cost and latency point, is becoming a core infrastructure competency.”
For many enterprises, the answer will probably be a hybrid approach in which they use Nvidia for training and broad workloads and specialist accelerators such as those from Cerebras for high-volume commodity serving, Kompella added.
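What such a hybrid setup might look like at the routing layer, as a rough sketch: interactive, latency-sensitive calls go to a specialist fast-inference endpoint, while batch and background work stays on the default GPU-backed stack. The endpoint URLs and the 500 ms threshold here are hypothetical.

```python
# Minimal sketch of a hybrid routing policy: latency-sensitive requests go
# to a specialist fast-inference endpoint, batch work to a default
# GPU-backed endpoint. Endpoints and the threshold are assumptions.
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    latency_budget_ms: int  # how long the caller can wait for a response
    interactive: bool       # e.g., an agent step vs. an overnight batch job

FAST_ENDPOINT = "https://fast-inference.example.com/v1"   # hypothetical
DEFAULT_ENDPOINT = "https://gpu-cluster.example.com/v1"   # hypothetical

def route(req: Request) -> str:
    # Illustrative policy: interactive calls with tight budgets take the
    # low-latency path; everything else stays on the cheaper default stack.
    if req.interactive and req.latency_budget_ms < 500:
        return FAST_ENDPOINT
    return DEFAULT_ENDPOINT

print(route(Request("next agent step?", 200, interactive=True)))
print(route(Request("summarize the archive", 60_000, interactive=False)))
```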
“Enterprises running agentic workflows or real-time reasoning applications at scale should run a structured evaluation of Cerebras as an inference layer alongside their existing stack,” he said.
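A structured evaluation of that kind ultimately comes down to scoring providers on measured latency and cost for a given workload. The sketch below shows one way such scoring could be framed; every figure in it is a placeholder, not a real benchmark or published price.

```python
# Sketch of a scoring step for a structured inference evaluation: rank
# providers by cost per 1M output tokens and measured p50 latency against
# a workload profile. All figures are placeholders, not real data.

providers = {
    # name: (USD per 1M output tokens, measured p50 latency in ms)
    "provider_a": (0.60, 120),
    "provider_b": (0.90, 45),
}

WORKLOAD_TOKENS = 5_000_000_000  # assumed 5B output tokens per month
LATENCY_SLO_MS = 100             # assumed latency target for agentic steps

def monthly_cost(usd_per_mtok: float, tokens_per_month: int) -> float:
    return usd_per_mtok * tokens_per_month / 1_000_000

for name, (price, p50) in providers.items():
    cost = monthly_cost(price, WORKLOAD_TOKENS)
    meets_slo = p50 <= LATENCY_SLO_MS
    print(f"{name}: ${cost:,.0f}/month, p50 {p50} ms, meets SLO: {meets_slo}")
```

The tradeoff the output surfaces is exactly the one Kompella describes: the cheaper provider may miss the latency target, while the faster one carries a cost premium that only high-value, real-time workloads can justify.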

