Canada-based CentML harnesses software optimization to maximize LLM performance

CentML CEO Gennady Pekhimenko (second from right) and other co-founders of the company.

As AI adoption rises, the demand for computing power and enhanced performance also grows.

The situation creates opportunities for companies such as CentML, which helps customers optimize their machine learning (ML) models for the hardware available to them. The startup has attracted investment from tech giants such as Google and Nvidia.

Founded in 2022, CentML represents the collective effort of current and former PhD students and professors at the University of Toronto, along with industry veterans. CEO Gennady Pekhimenko, who is also a professor at the university, said his research group, the predecessor of CentML, predicted long ago that hardware would become one of the bottlenecks for the growth of machine learning, and therefore focused on improving the utilization of existing hardware.

Pekhimenko said they observed frequent mismatches between existing hardware and the ML models that run on it. CentML aims to tell customers the best performance per dollar they can achieve on various hardware, offering solutions that significantly optimize their ML models while reducing the cost of running them. The company targets both enterprises and large or medium-sized cloud providers.

Solutions that serve enterprise and cloud providers' needs

According to Pekhimenko, CentML has developed four software-based products. Two of them, CServe and CentML Platform, are commercial offerings. CServe is a large language model (LLM) inference engine and deployment interface that allows users to run their models on different hardware.

Pekhimenko said the end-to-end solution takes the major complexity of deploying LLMs off customers' hands, such as deciding which hardware to choose and how to reduce deployment costs. CServe can also serve the unique needs of enterprises and cloud providers.

Instead of simply asking for the lowest latency or the best throughput, as many startups do, those big companies have more complicated requirements. Pekhimenko said they might, for example, simultaneously require a specific level of latency for 99% of requests, the maximum possible throughput, and the lowest possible price on their hardware.

"They want to get the best performance per dollar," Pekhimenko added.
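To make that kind of combined requirement concrete, the following is a minimal sketch, not CentML's method or API. It assumes a customer has measured request latencies, sustained throughput, and an hourly price for a few candidate hardware options (all names and numbers here are hypothetical). It first filters out options that miss a 99th-percentile latency target, then picks the remaining option that serves the most requests per dollar.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HardwareOption:
    """Hypothetical measurements for one candidate deployment."""
    name: str
    latencies_ms: Sequence[float]  # observed per-request latencies
    throughput_rps: float          # sustained requests per second
    price_per_hour: float          # cost in dollars per hour


def p99(samples: Sequence[float]) -> float:
    """99th-percentile latency using the nearest-rank method."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer ceiling of 0.99 * n
    return ordered[rank - 1]


def requests_per_dollar(option: HardwareOption) -> float:
    """Requests served per dollar spent: one simple 'performance per dollar'."""
    return option.throughput_rps * 3600 / option.price_per_hour


def best_option(options: Sequence[HardwareOption],
                p99_target_ms: float) -> Optional[HardwareOption]:
    """Among options meeting the p99 latency target, pick the most cost-efficient."""
    feasible = [o for o in options if p99(o.latencies_ms) <= p99_target_ms]
    if not feasible:
        return None  # no candidate satisfies the latency requirement
    return max(feasible, key=requests_per_dollar)


# Hypothetical candidates; the figures are illustrative, not benchmarks.
candidates = [
    HardwareOption("option-a", [80.0] * 99 + [500.0], 20.0, 1.0),
    HardwareOption("option-b", [40.0] * 98 + [300.0] * 2, 50.0, 2.0),
    HardwareOption("option-c", [90.0] * 100, 30.0, 3.0),
]

choice = best_option(candidates, p99_target_ms=100.0)
print(choice.name if choice else "no feasible option")
```

In this toy data, the option with the highest raw throughput is ruled out by its tail latency once the target is 100 ms, which is why the latency constraint has to be applied before comparing cost efficiency.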

CentML Platform is also called CCluster. According to Pekhimenko, the solution can optimize any machine learning model and is tightly integrated with the hardware. It is built on cloud providers' offerings or raw hardware, such as Nvidia's DGX boxes.

Pekhimenko said the common practice is training and deploying LLMs on independent clusters or servers. With CCluster, customers can tackle both tasks on a single server without incurring additional expense. The solution creates almost no negative effect on the performance of training and inference, he added.

CentML also offers two open-source solutions: DeepView and Hidet. Pekhimenko said DeepView, a machine learning profiler and predictor, is integrated with the popular machine learning framework PyTorch. It can predict how well a model will run on any of the popular hardware used for ML.

Hidet is a machine learning compiler. According to Pekhimenko, Hidet generates compute kernels directly from Python code, which reduces the engineering effort needed to produce efficient code for many unique models and layers.

Smart approaches help attract tech giants' attention

As a young company, CentML has already caught the eye of big tech players. In September 2023, the company closed a US$27 million seed funding round led by Gradient Ventures, Google's AI-focused venture fund. Radical Ventures, Nvidia, Deloitte Ventures, and Thomson Reuters Ventures also participated in the round.

"It's hard and easy at the same time," Pekhimenko replied when asked how CentML won over tech giants.

He said that as a startup, CentML can invest in and deliver on its mission quickly. The company also needs to work smarter. For example, it cannot write kernels by hand because it does not have that many engineers, he added.

Pekhimenko said CentML's solutions can improve the performance and utilization of Nvidia GPUs with only a few engineers. He believes this was the main reason the AI computing leader invested in CentML.

Following the investment from Gradient, CentML has maintained an ongoing conversation and partnership with Google, which has shown the tech giant how CentML's technology can benefit its cloud customers.

Pekhimenko said CentML's profiler, DeepView, can help Google's engineers efficiently determine whether they should run specific workloads on Nvidia GPUs or Tensor Processing Units (TPUs), and when doing so would benefit customers. He said engineers usually have to do manual performance exploration or tuning to find the answer, which can be very time-consuming.

Expanding globally

Headquartered in Toronto, CentML is poised to expand internationally. Pekhimenko said the company opened an office in Palo Alto, California, in March 2024. It also aims to grow in the Asian market, with Taiwan being one of the top candidates. He said the island hosts a top-notch semiconductor industry while offering many opportunities in the software sector.

Asia has become one of the primary adopters of AI technology, Pekhimenko said. He added that the potential growth of the Asian AI market could be even higher than that of North America. CentML looks forward to bringing its technologies to customers in the region.