India's Ziroh Labs is positioning its Kompact AI runtime as a homegrown alternative to GPU-based AI compute, arguing that enterprise AI adoption in emerging markets will hinge as much on energy availability and hardware sovereignty as on model quality.
The Bengaluru-based company says its software delivers Nvidia A100-class inference performance on Intel Xeon CPUs, without resorting to quantization or distillation, and at a fraction of the power consumed by GPU racks.
Founded in 2019 with roots in fully homomorphic encryption (FHE), Ziroh Labs emerged from mathematical research led by CEO Hrishikesh Dewan, who is affiliated with the Indian Institute of Science, and guided by Whitfield Diffie, a Turing Award laureate and co-inventor of public-key cryptography. The early encryption work exposed the team to the computational demands of transformer architectures well before the generative AI boom.
"We wrote the entire stack from the ground up," senior VP Vineet Mittal told DIGITIMES Asia. "Most optimizations, about 70% of them, came from the science of AI. About 30% came from rewriting system software close to the CPU, carefully managing L1/L2 caches, parallelism, and memory."
A CPU-based path to AI scale
Ziroh Labs' Kompact AI runtime runs open-source LLMs at 164 tokens per second at a batch size of 1, comparable to Nvidia's A100. With Intel's newer Gen 6 servers, throughput approaches Nvidia's H100-class performance, the company claims.
Batching significantly increases throughput, reaching around 2,500 tokens per second for enterprise use cases. A second innovation, an internal caching layer called Elephant, detects semantic similarity across queries, intermediate outputs, and final outputs, cutting out redundant inference cycles. The company said repetitive workloads can hit up to 10,000 tokens per second, an order-of-magnitude jump over A100- and H100-class throughput.
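Ziroh Labs has not published Elephant's internals, but the general idea of a semantic cache can be sketched as follows. This is a toy illustration only: the bag-of-words "embedding," the cosine threshold, and all names here are assumptions, not the company's design (a production system would use a learned encoder and an approximate nearest-neighbor index).

```python
# Toy sketch of a semantic cache in the spirit of Elephant (hypothetical;
# Ziroh's actual design is not public).
import math
from collections import Counter
from typing import Optional

def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold       # similarity needed for a cache hit
        self.entries = []                # list of (vector, cached answer)

    def get(self, query: str) -> Optional[str]:
        q = embed(query)
        for vec, answer in self.entries:
            if cosine(q, vec) >= self.threshold:
                return answer            # near-duplicate: skip inference
        return None                      # miss: caller runs the model

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))

cache = SemanticCache()
cache.put("what is the capital of france", "Paris")
hit = cache.get("what is the capital of france ?")   # near-duplicate query
miss = cache.get("tell me a joke")                   # unrelated query
```

On a repetitive workload, every hit returns a stored answer instead of running the model, which is how a cache layer can multiply effective tokens-per-second far beyond raw inference throughput.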
"Data centers are worried about GPU power draw," Mittal said. "A single GPU card can consume 750W to 1kW. That's untenable for India if we scale AI the same way the West is doing."
Ziroh Labs argues that India and much of Asia need CPU-first AI infrastructure, leveraging existing x86 and ARM-based servers already deployed at enterprises and government data centers. The theme echoes concerns heard across India's semiconductor policy circles about constrained domestic power availability.
Avoiding quality loss from quantization
Most modern inference engines make CPU inferencing practical by quantizing LLMs. Kompact AI instead runs complete models in BF16, the datatype used in training, with no quantization. According to Mittal, quantization may accelerate responses but risks subtle semantic errors, particularly in long-context tasks such as legal or financial summarization.
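The numerical distinction can be illustrated directly. This is not Ziroh's code, just a sketch of the two datatypes: bfloat16 keeps float32's 8-bit exponent but only 8 mantissa bits, preserving a value's range while coarsening its precision, whereas integer quantization (the naive symmetric int8 scheme shown here, with an assumed scale) snaps values to a fixed grid.

```python
# Illustration of BF16 vs. int8 quantization (not Ziroh's implementation).
import struct

def to_bf16(x: float) -> float:
    # Round a float32 to bfloat16 by keeping the top 16 bits
    # (round-to-nearest), returned as the equivalent float value.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def quantize_int8(x: float, scale: float) -> float:
    # Naive symmetric int8 quantization: map to one of 256 grid points,
    # then dequantize. The scale here is an arbitrary assumption.
    q = max(-128, min(127, round(x / scale)))
    return q * scale

pi_bf16 = to_bf16(3.14159)        # small relative error, full float range
pi_int8 = quantize_int8(3.14159, 0.05)  # error bounded by the grid step
```

Each BF16 value still carries float32's exponent range, which is why models trained in BF16 can be served in BF16 without the calibration and clipping decisions that int8 quantization requires.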
However, if an enterprise prefers quantized or distilled models for specific workloads, Kompact AI fully supports those as well.
Mittal said, "Customers are already confused by fluctuating quality. It's too early to sacrifice correctness for speed."
The stance places Ziroh Labs in a different bucket from "llama.cpp-style" CPU accelerators, which prioritize small-footprint local inference. "Llama.cpp showed it's possible to run models on CPUs," Mittal said. "But throughput there is 4–10x lower than what we deliver."
India-first infrastructure, with an eye on sovereignty
The company believes CPU-first AI aligns with India's semiconductor and digital sovereignty ambitions. With the US tightening export controls on advanced accelerators headed to China and potentially other markets, relying on commodity x86 hardware mitigates geopolitical risk.
"Countries don't know today which chips will be banned tomorrow," Mittal said. "It's important to build infrastructure that runs on open standards and hardware you already control."
Ziroh Labs says its long-term vision still includes FHE, in which prompts, computation, and outputs all remain encrypted, once transformer efficiency improves. "FHE is the ultimate solution for national-security-grade AI," Mittal said. "But the math is too expensive today. We expect the window to open in one to two years."
A software-first path before custom silicon
Asked whether the company plans to build its own ASIC in the future, Mittal said: "Not until the software stabilizes."
The company believes the rate of innovation in LLM architectures makes custom silicon a risky bet for new entrants. Instead, it aims to become the "virtual machine layer" for AI workloads, portable across Intel, AMD, and ARM servers, and eventually, even hybrid CPU-GPU environments.
"We want to adapt our runtime to whatever hardware emerges. Hardware should follow software, not the other way around," Mittal said.
Ziroh Labs collaborated with IIT Madras for the launch of Kompact AI and is actively advancing partnerships with leading academic institutions across India.
Why this matters for the semiconductor ecosystem
For the semiconductor supply chain, Ziroh Labs' proposition intersects with three trends shaping India's AI strategy:
Power-constrained AI deployment: India's data center capacity is growing, but utility power is not, making CPU-based inference attractive for scale.
Shift toward open-source AI models: As enterprises reject dependence on closed-source models, CPU-native open-model inference becomes a strategic layer.
Diversification away from Nvidia: Even Indian hyperscalers face long lead times and rising costs for A100/H100-class GPUs.
By arguing that software-driven CPU optimization can delay or reduce GPU dependency, Ziroh Labs positions itself as a technical and policy-aligned player in India's semiconductor narrative, one where AI sovereignty, supply chain resilience, and energy efficiency are now core themes.
Article edited by Jack Wu

