Growing AI computing demands result in major improvements in servers and networking

Aaron Lee, Taipei; Jack Wu, DIGITIMES Asia

The large language models (LLMs) behind generative AI require massive amounts of training data and faster data transmission, which has created lucrative business opportunities for manufacturers. Peter Wu, president of ASUS Cloud and Taiwan Web Service Corporation (TWSC), highlighted the distinctions between traditional AI and generative AI by referring to the two as AI 1.0 and AI 2.0, respectively.

Wu pointed out that AI 1.0 involves building task-specific models through supervised learning, which requires setting up a project and labeling data for each new model; the process has to be repeated for every model and can only solve specific problems. The generative AI of AI 2.0 is different: once the model is built, it can learn autonomously. From an application perspective, generative AI is also smarter and more versatile.
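
As a rough sketch of that difference (toy data of my own, not anything from ASUS or TWSC): AI 1.0 needs a human-labeled dataset for every new task, while AI 2.0 models pretrain by predicting the next token in raw text, so the "labels" come for free.

```python
# Toy contrast between the two paradigms (illustrative only).

# AI 1.0: supervised learning needs a hand-labeled dataset per task.
labeled_reviews = [
    ("great product, works perfectly", "positive"),
    ("arrived broken and late", "negative"),
]

# AI 2.0: generative pretraining turns raw text into (context, next-token)
# pairs automatically; no human labeling step is required.
corpus = "the quick brown fox jumps over the lazy dog".split()
pretraining_pairs = [(corpus[:i], corpus[i]) for i in range(1, len(corpus))]

print(pretraining_pairs[0])   # (['the'], 'quick')
print(pretraining_pairs[1])   # (['the', 'quick'], 'brown')
```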

The volume of training data that needs to be processed for AI 2.0 is also significantly larger than for previous AI models. Past models had parameters numbering in the tens of thousands, while generative AI now demands tens of billions of parameters. Wu pointed out that many people feel the machine is smart when using ChatGPT; this is because once a model surpasses the 40–60 billion parameter threshold, it reaches a kind of "enlightenment."
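
For a sense of what that jump in parameter count means for hardware, here is a back-of-the-envelope calculation with my own assumptions (FP16 weights at 2 bytes per parameter, ignoring optimizer state and activations, which add several times more in practice):

```python
# Rough memory footprint of the model weights alone, assuming FP16 storage.
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    return n_params * bytes_per_param / 1e9

for label, n in [("tens of thousands", 50_000),
                 ("60 billion", 60e9),
                 ("175 billion (GPT-3)", 175e9)]:
    print(f"{label:>22}: {weight_memory_gb(n):>10.4f} GB of weights")
```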

OpenAI's GPT-3 has 175 billion parameters, and GPT-3.5 reportedly has around 200 billion. GPT-4 was announced on March 14, 2023, but OpenAI did not disclose its parameter count. Some estimates suggest GPT-4 has several times as many parameters as GPT-3, while others put the figure at one trillion.

Making AI smarter by increasing the number of parameters requires corresponding improvements in hardware. Workloads that once ran on a single GPU, or even just a CPU, now require parallel processing across hundreds of GPUs. This shift necessitates changes in servers, network switches, and even the entire data center architecture.
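
A minimal sketch of what that parallel processing looks like in code, assuming PyTorch's DistributedDataParallel launched with torchrun (one process per GPU); the model here is an illustrative stand-in, and real LLM training layers tensor and pipeline parallelism on top of this pattern:

```python
# Minimal data-parallel training step, e.g. `torchrun --nproc_per_node=8 train.py`.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")          # one process per GPU
    rank = dist.get_rank()
    torch.cuda.set_device(rank)              # assumes a single node here

    model = torch.nn.Linear(4096, 4096).cuda(rank)   # stand-in for an LLM block
    model = DDP(model, device_ids=[rank])            # syncs gradients over NCCL

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 4096, device=f"cuda:{rank}")
    loss = model(x).pow(2).mean()
    loss.backward()                          # gradient all-reduce happens here
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```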

Looking purely at server design, the cooling requirements of Nvidia's HGX module call for a 3D vapor chamber (3D VC) architecture. This architecture has a heatsink height of 3U (13.35cm), meaning the chassis needs to be 4U or taller. As chassis height increases, the internal mechanical design needs to be adjusted to account for factors like airflow and pressure, and the chassis's weight capacity and materials also need to be reconsidered.
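
The rack-unit arithmetic behind that constraint (1U = 1.75 inches = 4.445cm) can be checked quickly:

```python
# Rack-unit arithmetic for the 3D VC heatsink (1U = 1.75 in = 4.445 cm).
RACK_UNIT_CM = 1.75 * 2.54

print(f"3U heatsink: {3 * RACK_UNIT_CM:.3f} cm")   # ~13.335 cm
print(f"4U chassis:  {4 * RACK_UNIT_CM:.3f} cm")   # ~17.780 cm
# A heatsink that already fills 3U leaves no room in a 3U chassis for the
# motherboard, airflow, and lid clearance, hence the move to 4U or taller.
```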

The descriptions above apply to air-cooled designs. Considering power usage effectiveness (PUE), liquid cooling is a potential solution. However, liquid cooling introduces even more significant changes, involving cold plates, water blocks, pipes, and coolant distribution units (CDUs). There is also the question of how to address water cooling's notorious leakage issues. The other option, immersion cooling, presents its own set of challenges.
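
PUE is simply total facility power divided by IT equipment power; a rough sketch with made-up numbers shows why cutting cooling power pulls the ratio toward the ideal of 1.0:

```python
# PUE = total facility power / IT equipment power (illustrative numbers only).
def pue(it_kw: float, cooling_kw: float, other_kw: float) -> float:
    return (it_kw + cooling_kw + other_kw) / it_kw

print(f"Air-cooled example:    PUE = {pue(1000, 500, 100):.2f}")   # 1.60
print(f"Liquid-cooled example: PUE = {pue(1000, 150, 100):.2f}")   # 1.25
```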

Apart from mechanical components, the power supply also needs to be considered. As power increases, the power supply gets bigger, but space is limited. Furthermore, with the increase in wattage, the power supply's conversion efficiency will need to reach Titanium-grade levels of 97% or higher. On top of that, not only will the server itself use more power, but the entire data center's power design will also need to be upgraded.
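
To see why conversion efficiency matters at these wattages, a quick illustrative calculation (the 3,000W load is my own assumption, not a figure from the article):

```python
# Waste heat dissipated by the PSU itself at a given conversion efficiency.
def psu_loss_w(dc_load_w: float, efficiency: float) -> float:
    return dc_load_w / efficiency - dc_load_w

for eff in (0.94, 0.97):
    print(f"{eff:.0%} efficiency at 3000 W load -> {psu_loss_w(3000, eff):.0f} W lost as heat")
```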

Besides servers, the architecture of network switches also needs to change. Traditional data center architectures use PCIe switches to connect CPUs, GPUs, and NICs. However, for the data transmission involved in AI and machine learning, this architecture faces three limitations: limited scalability, lower performance, and higher transmission latency.
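
To get a feel for the scale of that transmission problem, here is a rough sketch under my own assumptions (FP16 gradients, a ring all-reduce for gradient synchronization) of how much data each GPU must exchange per training step:

```python
# Per-GPU traffic for one gradient all-reduce using a ring algorithm:
# each GPU sends/receives roughly 2 * (N - 1) / N times the gradient size.
# Figures are illustrative, not from the article or Accton.
def allreduce_gb_per_gpu(n_params: float, n_gpus: int, bytes_per_grad: int = 2) -> float:
    grad_gb = n_params * bytes_per_grad / 1e9
    return 2 * (n_gpus - 1) / n_gpus * grad_gb

for n_gpus in (8, 64, 256):
    traffic = allreduce_gb_per_gpu(175e9, n_gpus)
    print(f"{n_gpus:>3} GPUs, 175B-parameter model: ~{traffic:.0f} GB per GPU per step")
```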

Network switch manufacturer Accton pointed out that the new generation of data centers will need a different architecture to optimize AI and machine learning workloads. This involves using server fabric accelerators together with CXL memory technology and NICs, which allows flexible expansion for AI and machine learning and reduces transmission latency. While this design still needs testing, it is expected to set the direction of AI network architecture.