Indian startup targets AI inference opportunity with full-stack compute platform

Prasanth Aby Thomas, DIGITIMES Asia, Bangalore
Sanchayan Sinha, co-founder and CEO, Turiyam.ai. Credit: Turiyam.

While global AI infrastructure investment remains concentrated around massive GPU clusters for training frontier models, Indian startup Turiyam.ai is betting on a different commercial reality: the dominance of inference.

The company is developing a full-stack AI compute platform specifically for sub-100-billion-parameter models, arguing that India's AI demand will be driven overwhelmingly by deployment rather than foundational training.

"If you are not building foundational models, then you are not doing training at all," Turiyam co-founder and CEO Sanchayan Sinha told DIGITIMES Asia. "If I have to make a prediction, 95% [of the Indian market] would be on inference only."

This distinction is becoming a strategic pivot point for India's semiconductor ecosystem. While training requires the kind of capital outlays currently reserved for US and Chinese hyperscalers, inference represents the first large-scale commercial opportunity for local infrastructure players to serve enterprises, governments, and consumer platforms.

Designing for enterprise scale

To capture this market, Turiyam is positioning itself as a "semiconductor slash full-stack company." The platform is being designed around models below 100 billion parameters, rather than the trillion-parameter models pursued by major global AI companies.

"Most enterprise use cases don't need such large models," Sinha said. "We are trying to optimize on less than 100 billion parameter models and build silicon, which is way more optimized on the TCO side."

Sinha argued that many enterprise workloads, including image, speech, and video, can be served by these smaller models if the underlying compute stack is optimized for the forward-pass nature of inference.

Proprietary architecture and software stack

Turiyam develops its own accelerator hardware, software stack, and orchestration layer, while using standard servers rather than designing them in-house. Co-founder Praveen Jain noted that Turiyam is following in Nvidia's vein by moving up the stack from silicon into a full-stack solution.

"We are definitely a semiconductor company first," Jain said. "If you look at Nvidia, they started as a semiconductor company, then they built the software stack. As they evolved, they kept going higher and higher in the layers to essentially create a full-stack solution."

Sinha described the proprietary chip architecture as a combination of CPUs for the control plane and tensor processing units arranged in tiles with local memory. Crucially, the architecture does not use cache coherence. "We can actually go ahead and divide the layers and optimize," Sinha explained, noting that inference workloads are largely forward-pass operations.
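The idea of dividing layers across tiles without cache coherence can be sketched in a few lines. This is an illustrative outline only, not Turiyam's actual design: because inference is a pure forward pass, each tile needs only the weights for its assigned layers plus the activations handed off by the previous tile, so no coherence protocol between tiles is required.

```python
# Illustrative sketch (not Turiyam's design): partitioning a model's layers
# across accelerator tiles, each holding its own weights in local memory.

def partition_layers(num_layers, num_tiles):
    """Assign contiguous blocks of layers to tiles."""
    base, rem = divmod(num_layers, num_tiles)
    assignment, start = [], 0
    for t in range(num_tiles):
        count = base + (1 if t < rem else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment

def forward(x, layers, assignment):
    """Forward pass, tile by tile: activations flow one way only."""
    for tile_layers in assignment:
        for i in tile_layers:
            x = layers[i](x)  # each tile reads only its local weights
    return x

# Example: 10 layers spread over 4 tiles
print(partition_layers(10, 4))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Since activations only move forward from tile to tile, no two tiles ever write the same memory, which is the property that lets such an architecture drop cache coherence entirely.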

C-DAC validation and NTT partnership

The company recently demonstrated technical validation with India's Centre for Development of Advanced Computing (C-DAC), inserting its PCIe-based accelerator card into the agency's indigenous Rudra 1 and Rudra 2 servers.

"What we did was put in our GPU accelerator over there, and run an Indic model on top of it, just to go ahead and show that you have the server validation in terms of the software," Sinha said.

Beyond lab validation, Turiyam has partnered with NTT Global Data Centers for deployment on standard servers from vendors such as ASUS.

"We rack them up, and then NTT provides all the power and space, a very standard data centre way of dealing with it," Sinha noted.

While the company is currently pre-revenue, Sinha expects the "first set of dollars" to arrive in roughly one quarter.

The TCO and supply-chain case

Turiyam's central commercial claim is that an inference-focused stack can offer significantly better total cost of ownership than Nvidia-based infrastructure, depending on the model.

"We are at least 5 to 10x," Sinha said of the company's indexed TCO comparison.

The company did not disclose detailed benchmark data during the interview, including latency, throughput, power consumption, utilization, or the Nvidia system used for comparison.

Sinha said Turiyam is targeting compute-heavy, media-related workloads such as micro-videos and image generation, where the economics of expensive GPU infrastructure can limit commercial viability.

Sinha also cited supply-chain resilience as a key differentiator. He said Turiyam's non-Nvidia approach and lack of dependence on HBM could give it a supply-chain advantage, and pointed to TSMC's 6nm process for its current chips. The company is also evaluating 4nm for future chips.

"Given that it's a non-Nvidia solution... we have a much better supply chain," he added.

Article edited by Jack Wu