Since 2023, generative AI (GenAI) has been the main force driving the electronics industry forward. Its development has gradually extended from the cloud to the edge, with concepts such as AI smartphones and AI PCs becoming popular keywords in the market. However, the high hardware requirements and computing costs of GenAI models make introducing them into edge applications challenging.
Specifically, GenAI's massive demand for memory is an issue that is difficult to sidestep no matter how powerful the computing chips become. Nevertheless, AI technology is evolving rapidly and new techniques keep emerging, so there is real potential for these hardware thresholds to be lowered at the algorithm level.
As hardware manufacturers confirmed the prospects of AI in the electronics industry, they invested in relevant research across different fields, but bottlenecks quickly appeared. For instance, running a 7-billion-parameter LLM quantized to INT8 smoothly on an edge device requires roughly 7-8GB of memory.
For a mainstream smartphone, that means devoting nearly the device's entire memory capacity to a single function. To keep other functions running, the smartphone's RAM would likely need to be increased to 24GB, which is a significant cost burden for the widespread adoption of GenAI at the edge.
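As a rough illustration of where the 7-8GB figure comes from, the sketch below simply multiplies parameter count by bytes per parameter and applies an assumed overhead factor for activations, KV cache, and runtime buffers; the overhead value is an illustrative assumption, not a measured figure.

```python
# Back-of-envelope estimate of the memory footprint of a 7B-parameter LLM.
# The overhead factor (for KV cache, activations, and runtime buffers) is an
# assumption for illustration only.

def estimate_llm_memory_gb(num_params: float, bytes_per_param: float,
                           overhead_factor: float = 1.1) -> float:
    """Return an approximate memory requirement in gigabytes."""
    raw_bytes = num_params * bytes_per_param
    return raw_bytes * overhead_factor / 1e9

if __name__ == "__main__":
    # 7 billion parameters at 1 byte each (INT8 quantization): ~7.7 GB.
    print(f"INT8 7B model: ~{estimate_llm_memory_gb(7e9, 1.0):.1f} GB")
    # The same model at FP16 (2 bytes per parameter) for comparison: ~15.4 GB.
    print(f"FP16 7B model: ~{estimate_llm_memory_gb(7e9, 2.0):.1f} GB")
```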
However, with sufficient resources and manpower invested in development, AI seems to advance a generation nearly every quarter. At the recent MWC, MediaTek and Qualcomm demonstrated on-device Low-Rank Adaptation (LoRA) fine-tuning, a capability that was not yet part of their edge AI showcases in the fourth quarter of 2023.
LoRA, originally proposed by Microsoft researchers for fine-tuning large language models, is now widely used to fine-tune GenAI image models as well. Instead of updating all of a model's parameters, it trains a small set of low-rank matrices, which delivers accurate results faster and with far smaller memory requirements. The functional upgrades this brings to GenAI on smartphones were evident in the MWC demonstrations.
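A minimal sketch of the core LoRA idea, assuming a PyTorch-style linear layer: the pre-trained weight is frozen and only two small low-rank matrices are trained, which is why fine-tuning needs so much less memory. The layer sizes, rank, and scaling below are illustrative placeholders, not the settings used in the MWC demonstrations.

```python
# Illustrative LoRA layer: the base weight is frozen; only the low-rank
# matrices A and B receive gradients, so optimizer memory shrinks dramatically.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)            # frozen pre-trained weight
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scaling * x (B A)^T ; only A and B are trainable.
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(4096, 4096, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable:,} of {total:,} ({100 * trainable / total:.2f}%)")
```

Running the example shows the effect: for a 4096x4096 layer at rank 8, well under 1% of the parameters are trainable.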
Recently, on the training side, a team including Anima Anandkumar, former senior director of AI research at Nvidia, proposed Gradient Low-Rank Projection (GaLore), a memory-efficient pre-training technique. When pre-training LLaMA 1B and 7B architectures, GaLore reduces the memory consumed by optimizer states by roughly 65% while maintaining comparable training efficiency.
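The sketch below illustrates the gradient low-rank projection idea in simplified form: each weight gradient is projected onto a low-rank subspace, the Adam-style optimizer states are kept only in that small subspace, and the update is projected back to full rank before being applied. It is an assumption-laden toy, not the GaLore authors' implementation; the rank, learning rate, and projection-refresh policy are placeholders.

```python
# Toy gradient low-rank projection step: optimizer moments live in a rank-r
# subspace of the gradient, so their memory scales with r instead of the
# weight's full row dimension.
import torch

def low_rank_adam_step(weight, grad, state, rank=4, lr=1e-3,
                       betas=(0.9, 0.999), eps=1e-8):
    """One Adam-like step whose moment buffers are stored in low rank."""
    if "P" not in state:
        # Projection matrix: top-r left singular vectors of the gradient.
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                       # (out, rank)
        state["m"] = torch.zeros(rank, grad.shape[1])  # first moment, low rank
        state["v"] = torch.zeros(rank, grad.shape[1])  # second moment, low rank
        state["t"] = 0

    P, m, v = state["P"], state["m"], state["v"]
    state["t"] += 1
    g_low = P.T @ grad                                 # project gradient to rank-r space
    m.mul_(betas[0]).add_(g_low, alpha=1 - betas[0])
    v.mul_(betas[1]).addcmul_(g_low, g_low, value=1 - betas[1])
    m_hat = m / (1 - betas[0] ** state["t"])
    v_hat = v / (1 - betas[1] ** state["t"])
    update = P @ (m_hat / (v_hat.sqrt() + eps))        # project update back to full rank
    weight -= lr * update

# Usage: a 64x32 weight; moment buffers occupy 4x32 instead of 64x32.
w, g, state = torch.randn(64, 32), torch.randn(64, 32), {}
low_rank_adam_step(w, g, state, rank=4)
print(state["m"].shape)  # torch.Size([4, 32])
```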
Although the research has not yet been widely validated in practical use, the technique appears to significantly reduce the training costs of GenAI models, increasing the likelihood of broader adoption at the edge.
Sources familiar with AI chips noted that the AI research community is pursuing a diverse range of work on reducing the memory required to run LLMs. The hardware obstacles and thresholds the industry worried about at the end of 2023 may be resolved in the short term.
For products like AI smartphones and AI PCs to achieve widespread adoption, the real question is whether the applications themselves win over consumers, not whether hardware limitations can be overcome. If the resulting features fail to attract consumers, lower memory and chip requirements alone will not truly drive shipment growth.