Nvidia on May 28, 2025, reported a strong performance in its Data Center segment during its first quarter fiscal 2026 earnings call, with revenue reaching US$39 billion, a 73% increase year-over-year. This significant growth was attributed to the accelerating adoption of AI workloads, a robust transition towards inference, and the ongoing buildout of AI factories by customers worldwide.
Blackwell architecture drives record ramp
The rapid deployment of Nvidia's new Blackwell architecture is a primary driver of this growth. Described as the fastest ramp in the company's history, Blackwell contributed nearly 70% of data center compute revenue in the first quarter of fiscal 2026, indicating that the transition from the previous Hopper architecture is nearly complete.
The GB200 NVL system represents a fundamental architectural change designed to enable data-center-scale workloads and achieve the lowest cost per inference token. Manufacturing yields for these complex systems are improving, and rack shipments to end customers are accelerating. GB200 NVL racks are now generally available for model builders, enterprises, and sovereign customers.
Major hyperscalers are deploying an average of nearly 1,000 NVL72 racks, or about 72,000 Blackwell GPUs, per week, and plan to further ramp output in the second quarter of fiscal 2026. Microsoft, for example, has already deployed tens of thousands of Blackwell GPUs and plans to scale to hundreds of thousands.
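The rack-to-GPU arithmetic above can be sanity-checked with a quick back-of-envelope calculation (figures as cited on the call; the weekly rate is an average, not a commitment):

```python
# Back-of-envelope check of the deployment figures cited on the call.
GPUS_PER_NVL72_RACK = 72   # one GB200 NVL72 rack links 72 Blackwell GPUs
RACKS_PER_WEEK = 1_000     # "nearly 1,000 NVL72 racks per week on average"

gpus_per_week = RACKS_PER_WEEK * GPUS_PER_NVL72_RACK
print(gpus_per_week)       # 72000, matching the roughly 72,000 GPUs per week reported
```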
Nvidia is also progressing with its product roadmap, with Blackwell Ultra and GB300 systems sampling at major CSPs and expected to begin production shipments later this quarter.
GB300 systems are designed for seamless transition, leveraging the same architecture and footprint as GB200. The B300 GPUs, featuring 50% more HBM, are expected to deliver a 50% increase in dense FP4 inference compute performance compared to the B200. The company remains committed to an annual product cadence extending through 2028.
Surge in inference demand drives reasoning and agentic AI
A critical demand driver is a sharp jump in inference workloads. Customers such as OpenAI, Microsoft, and Google are observing a step-function leap in token generation.
Microsoft, for instance, processed over 100 trillion tokens in Azure OpenAI during the first quarter of fiscal 2026, a five-fold increase year-over-year. This surge is linked to the rise of reasoning AI and agentic AI, which are significantly more compute-intensive than previous models and can require hundreds to thousands of times more tokens per task. Reasoning AI enables step-by-step problem-solving, planning, and tool use, transforming models into intelligent agents.
Nvidia highlights its Blackwell architecture, particularly GB200 NVL72, as the ideal "thinking machine" for reasoning AI. Grace Blackwell offers a significant step-up in inference performance over Hopper, with up to 40x higher speed and throughput reported for GB200.
Software optimizations like Nvidia Dynamo are turbocharging inference throughput for new reasoning models, with reported improvements of up to 30x on Blackwell NVL72 for models like Llama 3.1. Nvidia expects to keep improving Blackwell performance through software, similar to the 4x inference performance gain achieved with Hopper over two years. Inference-serving startups are also leveraging B200 to significantly increase token generation and revenue for high-value reasoning models.
AI factories accelerate global buildout
The pace and scale of AI factory deployments are accelerating, with nearly 100 Nvidia-powered AI factories in flight this quarter, a twofold increase year-over-year. The average number of GPUs powering each factory has also doubled.
These factories are being built across various industries and geographies, supporting strategic sovereign clouds and enterprise AI initiatives. Industry leaders such as AT&T, BYD, Capital One, Foxconn, MediaTek, and Telenor are building these factories. Strategic sovereign clouds are being deployed in countries like Saudi Arabia, Taiwan, and the UAE. Nvidia has a line of sight to projects potentially requiring tens of gigawatts of AI infrastructure.
Jensen Huang emphasized that AI is becoming essential infrastructure for every economy, similar to electricity and the internet. Countries are racing to build national AI platforms to elevate their digital capabilities. The buildout of this infrastructure is described as being in its very early stages.
Enterprise and industrial AI emerge as growth pillars
Beyond cloud and sovereign deployments, AI is expected to move into the enterprise, particularly on-premise, given the importance of data access control. Nvidia is introducing products such as the RTX PRO Enterprise AI server, DGX Spark, and DGX Station, designed for enterprise and developer on-premise needs. Enterprise AI is seen as ready to take off, supported by computing systems that integrate enterprise IT stacks with AI.
Industrial AI is also emerging as a key pillar of growth. The trend of onshoring manufacturing and building new plants worldwide is creating demand for AI-powered factories and robotics. Technologies like Omniverse and Isaac GR00T are powering next-generation factories and humanoid robotic systems. Every factory is expected to have an associated AI factory.
Networking solutions prove crucial for massive deployments
To support these massive AI factory deployments, Nvidia's networking solutions are crucial. Networking returned to sequential growth in the first quarter of fiscal 2026, with revenue up 64% quarter-over-quarter to US$5 billion. Key networking platforms include:
NVLink serves as Nvidia's scale-up platform for building larger computer systems. Now in its fifth generation, it offers high bandwidth and exceeded US$1 billion in shipments in the first quarter of fiscal 2026. NVLink Fusion allows partners to connect directly to the Nvidia platform.
Spectrum-X provides enhanced Ethernet optimized for AI workloads, delivering high throughput and low latency. It is annualizing over US$8 billion in revenue and saw widespread adoption, adding Google Cloud and Meta as customers in the first quarter of fiscal 2026. Spectrum-X improves Ethernet utilization in AI clusters from as low as 50% to 85-90%.
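The utilization claim translates directly into delivered cluster bandwidth. A minimal sketch, using an illustrative link speed and assuming delivered throughput scales linearly with utilization:

```python
# Effective fabric throughput at different Ethernet utilization levels,
# per the 50% -> 85-90% improvement cited for Spectrum-X.
def effective_throughput(raw_gbps: float, utilization: float) -> float:
    """Delivered bandwidth, assuming it scales linearly with utilization."""
    return raw_gbps * utilization

raw = 400.0  # hypothetical 400 Gb/s link, for illustration only
baseline = effective_throughput(raw, 0.50)   # traditional Ethernet: 200 Gb/s
improved = effective_throughput(raw, 0.875)  # midpoint of 85-90%: 350 Gb/s
print(f"{improved / baseline:.2f}x")         # 1.75x more delivered bandwidth
```

On these assumptions, raising utilization from 50% to the 85-90% range yields roughly 1.7-1.8x more usable bandwidth from the same physical fabric.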
BlueField functions as the control plane used for storage, security, and creating high-performance, multi-tenant clusters.
Together with InfiniBand, all four of these networking platforms are reportedly growing well.
Article edited by Jerry Chen