
Cerebras outpaces Nvidia in video showdown at SuperAI Singapore, making its case against GPU dominance

DIGITIMES Asia, Taipei

Credit: Joseph Chen

Andy Hock, chief strategy officer at Cerebras Systems, walked onto the Main Stage at SuperAI Singapore 2026 on Wednesday carrying the company's Wafer Scale Engine — the physical chip itself — and held it up for an audience of 10,000 before placing it next to a slide showing it to scale against Nvidia's latest B200 platform. The size difference was stark.

The visual was deliberate. Cerebras has built its commercial case on the argument that GPU architectures, however capable, were not designed for AI workloads and cannot keep pace with where those workloads are heading. Showing the wafer in person, next to its most recognizable competitor, was the company's most direct statement of that argument yet.

Credit: Digitimes


The chip is large by any measure in semiconductor history. The Wafer Scale Engine spans an entire silicon wafer, making it more than 50 times larger than a conventional chip. It carries 900,000 cores, all designed for the sparse linear algebra operations common to AI training and inference, all directly connected over silicon, and all with access to on-chip SRAM memory. The effect, Hock argued, is to consolidate what would otherwise require an entire GPU cluster onto a single device. The supporting system, roughly the size of a hotel room refrigerator, fits into a standard data center rack.

Cerebras did not build large because large looked impressive, Hock said. "We built big chips because AI wants big chips. AI needs massive compute, and it needs that compute to be close together with high communication bandwidth."

From satellite imagery to custom silicon

The company's origin is directly relevant to the problem it is trying to solve. Cerebras was founded by a team doing satellite imagery analysis using AI who found that available chips could not deliver the inference speed their workloads required. Rather than wait for the chip industry to catch up, they built their own. That founding decision — custom silicon driven by a specific AI bottleneck — mirrors a pattern now playing out across the industry, from Google's TPUs to Amazon's Trainium to Apple's Neural Engine.

Hock joined the company nine years ago and has watched it move from delivering one or two systems at a time to building clusters every month, each equivalent in AI inference and training capacity to hundreds or thousands of GPUs.

Credit: Digitimes


The 1,000-fold compute escalation

The core of Hock's argument was a compute demand curve that makes the GPU bottleneck increasingly difficult to ignore. A standard single-shot query to a model like GPT-4o requires one unit of inference compute. A reasoning model that breaks a problem into steps, checks its own answers, and iterates requires roughly 100 times that amount per query. Agentic workflows — where a single prompt spawns multiple autonomous daughter agents that plan, write code, verify outputs, and collaborate over hours or days — push the requirement to 1,000 times or more.
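The arithmetic behind that curve can be sketched in a few lines of Python. The multipliers of 1, 100, and 1,000 are the figures Hock cited; the baseline latency, the function names, and the assumption that response time scales linearly with compute on fixed hardware are all hypothetical simplifications used here for illustration only.

```python
# Relative inference compute per query, using the multipliers cited in the talk.
COMPUTE_MULTIPLIER = {
    "single_shot": 1,
    "reasoning": 100,
    "agentic": 1_000,
}


def seconds_to_answer(workload: str, baseline_seconds: float) -> float:
    """Estimate response time if hardware speed stays fixed.

    baseline_seconds is a hypothetical latency for one single-shot query.
    Assumes (simplistically) that time grows linearly with compute.
    """
    return baseline_seconds * COMPUTE_MULTIPLIER[workload]


def speedup_needed(workload: str, baseline_seconds: float,
                   target_seconds: float) -> float:
    """How much faster the hardware must be to hit a target response time."""
    return seconds_to_answer(workload, baseline_seconds) / target_seconds


if __name__ == "__main__":
    # Illustrative baseline only: 2 seconds for a single-shot query.
    for name in COMPUTE_MULTIPLIER:
        print(name, seconds_to_answer(name, baseline_seconds=2.0))
```

Under those assumptions, a query that takes two seconds in single-shot form would take more than half an hour as an agentic workflow on the same hardware, which is the gap the speed argument is aimed at.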

At that scale, GPU latency is not a performance inconvenience. It is a structural barrier to the kind of interactive, real-time agentic applications that enterprises are beginning to demand. "Speed isn't just nice to have. It's required," Hock said. "Speed gives you differentiation. Not only is speed required, but speed is really intelligence."

Credit: Joseph Chen


Video demos against the fastest GPUs on the market

Hock showed two side-by-side video demonstrations on stage, both using Meta's Llama 4 Maverick model — the same model, the same query, run on Cerebras hardware and a GPU-powered system described as among the fastest GPU implementations currently available. The Nvidia name appeared earlier in the presentation in the context of the physical size comparison on slide, but the GPU systems in the video demonstrations were not identified by a specific brand.

In the first video, both systems were prompted to implement the Tetris video game in Python. The Cerebras system completed the task while the GPU-powered system was still processing. In the second, both systems were prompted to plan a detailed two-week road trip from New York to San Francisco. The Cerebras system finished the entire itinerary before the GPU system had worked past Tuesday of the first week.

The reaction from the audience was audible. Hock acknowledged it directly: "It's not magic. It's not a different model. What it is is the power of the right architecture under the hood."

Credit: Digitimes


OpenAI, AWS, and the disaggregated inference architecture

The company's two most significant commercial partnerships reflect different approaches to deploying its hardware at hyperscale.

OpenAI has selected Cerebras for fast inference workloads, with 750 megawatts of data center capacity being built over several years to support OpenAI's customers. The scale of that commitment places it among the larger dedicated infrastructure partnerships in the current AI buildout.

The AWS partnership is architecturally more significant for semiconductor readers. AWS will become the first hyperscale cloud provider to deploy Cerebras natively, using a disaggregated inference model that splits the workload between two different chips. Amazon's own Trainium silicon handles the prefill phase — the initial processing of the input prompt — before handing off to Cerebras hardware for the decode phase, where the model generates its output tokens. The split reflects a practical acknowledgment that Trainium alone is not fast enough for the latency-sensitive decode phase, where speed most directly determines how interactive an application feels to the end user.
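The prefill and decode split can be pictured with a toy Python sketch. Nothing here reflects the actual AWS or Cerebras software: the class names, the `serve` function, and the token rule are hypothetical stand-ins, and the "model" is a placeholder that simply echoes prompt tokens. The point is only to show the shape of the handoff, with one worker processing the prompt once and a second worker generating output tokens one at a time from the handed-over state.

```python
from dataclasses import dataclass


@dataclass
class PrefillState:
    """Stand-in for the cached state that the prefill phase hands off."""
    prompt_tokens: list[str]


class PrefillWorker:
    """Hypothetical stand-in for the chip that processes the input prompt."""

    def run(self, prompt: str) -> PrefillState:
        # The whole prompt is processed in a single pass.
        return PrefillState(prompt_tokens=prompt.split())


class DecodeWorker:
    """Hypothetical stand-in for the chip that generates output tokens."""

    def run(self, state: PrefillState, max_new_tokens: int) -> list[str]:
        if not state.prompt_tokens:
            return []
        output = []
        # Decode is sequential: one token per step, so per-step latency
        # directly determines how fast the response appears to the user.
        for step in range(max_new_tokens):
            # Toy rule in place of a real model: cycle through the prompt.
            output.append(state.prompt_tokens[step % len(state.prompt_tokens)])
        return output


def serve(prompt: str, max_new_tokens: int) -> list[str]:
    """Route one request through prefill, then hand off to decode."""
    state = PrefillWorker().run(prompt)
    return DecodeWorker().run(state, max_new_tokens)


if __name__ == "__main__":
    print(serve("plan a road trip", max_new_tokens=6))
```

Because the decode loop runs once per output token while prefill runs once per request, the decode side is where per-step speed matters most in this simplified picture.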

"Inference is the component where the model actually does something useful for the customer," AWS vice president of compute Dave Brown said in remarks Hock quoted on stage. "Speed and performance are a significant bottleneck."

The disaggregated architecture has broader implications for how hyperscalers will combine custom and third-party silicon going forward. Rather than a winner-take-all dynamic between competing chip architectures, the AWS model points toward a division of labor based on the specific performance characteristics of each phase of the inference workload.

Beyond OpenAI and AWS, Hock cited partnerships with Meta, Cognition, GSK, Notion, and MRO as part of a growing base of customers using Cerebras inference through cloud or direct channels.

Credit: Digitimes


The IPO and what comes next

Cerebras completed its initial public offering earlier this year, raising $5.5 billion in what organizers described as the largest IPO of the year to date. Shares rose 68% on the first day of trading, giving the company a valuation of close to $100 billion.

Hock closed by framing the company's next phase as a partnership problem as much as a technology one. "We're a great computer builder, but we're really bad at just delivering compute and then walking away," he said. "We're actually best where we partner."

The TSMC partnership behind the wafer

The chip Hock brought onto the stage on Wednesday would not exist without a meeting in Taipei in August 2017. Cerebras posted a note on X on June 8, two days before the conference opened, describing how the company's founders approached TSMC's senior leadership with nothing more than a PowerPoint presentation and a proposition: that they could build the largest chip in the history of the computer industry.

Credit: Digitimes


TSMC greenlit the project in that meeting.

"TSMC is perhaps the greatest manufacturing company in the world," the company wrote. "We could not have done it without them. They have been an extraordinary partner in every way."

The post was accompanied by a photograph taken at TSMC's headquarters in Taiwan, showing TSMC chief executive C.C. Wei alongside Cerebras chief operating officer Dhiraj Mallick and TSMC senior vice president Kevin Zhang. The timing of the post — published during Jensen Huang's high-profile Taiwan visit and days before Hock's SuperAI appearance — appeared designed to establish the depth of the TSMC relationship ahead of the conference.

The Wafer Scale Engine that resulted from that 2017 meeting now carries four trillion transistors, which Cerebras says makes it 58 times larger than its largest competitor and the fastest AI processor currently available. Every one of those wafers comes from TSMC.

Article edited by Joseph Chen