China's 2nm AI GPU hits prototype, production unclear

Shanghai Dishan Technology has advanced its 2nm-class AI GPU into the prototyping stage, marking a rare push by a China-based designer into cutting-edge AI silicon — though manufacturing access and software readiness remain major hurdles.

Founded in 2021, the Shanghai-based firm focuses on high-performance computing and sensor chips and is led by Zhang Zhenjun, whose background spans FD-SOI processes and image sensing. Its team includes engineers who previously worked at companies such as Texas Instruments and Infineon, though details on its wider leadership and foundry partners remain limited.

Prototype stage shifts focus to validation

According to the Shanghai Morning Post, the chip has completed key design stages and entered prototype verification. Dishan targets mass production by 2030, though industry sources point to a more immediate test: whether the design can reach tape-out within the next one to two years, with commercialization potentially between 2028 and 2029.

The focus has now shifted to late-stage validation, including system-level verification, timing closure, yield optimization, and software adaptation. These are the steps that determine whether a design can move from simulation to working silicon.

The company has presented its architecture and simulation data at industry events, offering a clearer view of its design capabilities, but it has not committed to a tape-out date.

Architecture targets AI training and inference

Built around Dishan's in-house DS-Core, the design combines a hybrid FinFET–GAA process with chiplet-based integration. The chip packs roughly 170 billion transistors into an 800 mm² die and uses 2.5D CoWoS-L packaging to boost interconnect density and thermal performance, in line with the industry's shift toward packaging-driven scaling.
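Dividing the reported transistor count by the die area gives a rough average density. This is a back-of-envelope sketch that takes both reported numbers at face value; it treats 800 mm² as a single silicon area, which the chiplet description leaves ambiguous, and the function name is illustrative rather than anything Dishan has published.

```python
# Back-of-envelope transistor density from the reported figures.
# Both inputs are reported numbers taken at face value, not measurements.
TRANSISTORS = 170e9      # "roughly 170 billion transistors"
DIE_AREA_MM2 = 800.0     # "an 800 mm² die" (assumed to be one silicon area)

def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return average density in millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6

print(f"{density_mtr_per_mm2(TRANSISTORS, DIE_AREA_MM2):.1f} MTr/mm²")
```

The result, about 212.5 million transistors per mm², is an average that blends logic, SRAM, and I/O, so it is generally not comparable with the peak logic-density figures foundries quote for a process node.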

Simulated peak throughput rises as numerical precision drops:

● FP32: about 50 TFLOPS
● FP16: about 100 TFLOPS
● FP4: up to 400 TFLOPS

This range positions the chip for both AI training and inference. Energy efficiency reportedly improves by about 40% (the comparison baseline is not given), while power consumption stays below 350W.
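The simulated figures are easy to sanity-check. The sketch below is a back-of-envelope calculation, not a Dishan tool: it divides each peak by the FP32 figure and by the 350W ceiling, assuming every peak is reachable at or under that ceiling. The dictionary and function names are illustrative.

```python
# Ratios implied by the simulated peak-throughput figures.
# Assumption: every peak is reached within the stated 350 W ceiling.
SIMULATED_TFLOPS = {"FP32": 50.0, "FP16": 100.0, "FP4": 400.0}
POWER_CEILING_W = 350.0

def speedup_vs(baseline: str, fmt: str, table: dict[str, float]) -> float:
    """Peak throughput of `fmt` relative to `baseline`."""
    return table[fmt] / table[baseline]

def tflops_per_watt(fmt: str, table: dict[str, float], watts: float) -> float:
    """Lower-bound efficiency: actual draw is stated only as below `watts`."""
    return table[fmt] / watts

for fmt in ("FP16", "FP4"):
    print(fmt, "vs FP32:", speedup_vs("FP32", fmt, SIMULATED_TFLOPS), "x")
for fmt in SIMULATED_TFLOPS:
    print(fmt, round(tflops_per_watt(fmt, SIMULATED_TFLOPS, POWER_CEILING_W), 3), "TFLOPS/W")
```

Run as written, FP16 comes out at 2× and FP4 at 8× the FP32 figure. Dividing by the ceiling rather than actual draw gives a floor on TFLOPS per watt; the 40% efficiency gain cannot be reproduced this way because no baseline is stated.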

The chip integrates HBM4 with up to 48GB per stack and bandwidth of 3.2TB/s, improving data throughput for large-model workloads. Interconnect latency falls below 0.25ns/mm, while a microfluidic cooling design helps stabilize operating temperatures and reduce thermal risk.
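Peak compute and memory bandwidth together shape how large-model workloads behave, and a roofline-style ratio is one common way to relate them. This minimal sketch assumes the quoted 3.2TB/s is aggregate bandwidth for the whole chip (the report does not say whether it is per stack or total) and reuses the simulated FP16 and FP4 peaks; the 10 FLOP/byte kernel is a hypothetical example.

```python
# Roofline-style ridge point: FLOPs a kernel must perform per byte fetched
# from HBM before compute, rather than bandwidth, becomes the limit.
# Assumption: 3.2 TB/s is the chip's aggregate HBM4 bandwidth.
HBM_BANDWIDTH_TBPS = 3.2
PEAK_TFLOPS = {"FP16": 100.0, "FP4": 400.0}

def ridge_point(peak_tflops: float, bandwidth_tbps: float) -> float:
    """FLOPs per byte at which peak compute and peak bandwidth balance."""
    # TFLOP/s divided by TB/s leaves FLOP/byte; the 1e12 factors cancel.
    return peak_tflops / bandwidth_tbps

def attainable_tflops(intensity: float, peak_tflops: float, bandwidth_tbps: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_tflops, bandwidth_tbps * intensity)

for fmt, peak in PEAK_TFLOPS.items():
    print(fmt, "ridge point:", ridge_point(peak, HBM_BANDWIDTH_TBPS), "FLOP/byte")

# Hypothetical memory-bound kernel doing 10 FLOPs per byte moved.
print(attainable_tflops(10.0, PEAK_TFLOPS["FP16"], HBM_BANDWIDTH_TBPS), "TFLOPS")
```

Under these assumptions, kernels whose arithmetic intensity falls below the ridge point, as many decode-phase inference steps do, would likely be limited by HBM bandwidth rather than by the headline TFLOPS figure.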

It also supports NVLink 6-compatible interconnects and is designed to run within the CUDA ecosystem, reducing migration barriers for developers.

Manufacturing remains the key uncertainty

Despite strong simulation results, the chip has yet to reach tape-out — the point where design claims meet manufacturing reality.

Earlier plans to complete pre-tape-out validation by end-2025 and begin mass production in early 2026 have slipped, reflecting constraints in advanced-node foundry access, EDA tooling, and yield ramp complexity.

Industry observers cited by Sohu note that only a handful of foundries — TSMC, Samsung Electronics, and Intel — are capable of producing 2nm-class chips, leaving manufacturing pathways uncertain for China-based designers.

While Dishan's simulated performance approaches that of Nvidia's H100 and H200, the gap remains wide in manufacturing scale, software ecosystem depth, and long-term reliability — the factors that ultimately determine commercial success.

Article edited by Jerry Chen