AMD EPYC processors deliver powerful performance to "Kawas", the soul of computing built by ASIAA


Astronomy is one of the oldest sciences in the world. The knowledge scientists have accumulated over centuries of astronomical study has helped drive the development of human society, and today astronomy is a core research area in scientifically advanced countries. The Academia Sinica Institute of Astronomy and Astrophysics (ASIAA) has achieved remarkable academic results in the few years since its formal establishment in 2010. To boost its research capabilities, ASIAA adopted AMD EPYC processors as the core of its next-generation computing system, which went online in 2022. ASIAA Director Dr. Ue-Li Pen explained that AMD EPYC processors not only meet ASIAA's huge computing needs, but their high computing density also lets the servers take up less rack space. On top of that, the new system requires only one-third of the electricity consumed by the previous equipment to deliver the same computing power. "The new system powered by AMD EPYC processors will be of tremendous benefit to academic research at ASIAA," Dr. Pen added.

Leveraging High Computing Power to Optimize Astronomical Research Capabilities

Dr. Pen then introduced the research areas ASIAA is involved in. Because astronomy spans a wide range of fields, he noted, ASIAA conducts research in most of them, in both observational and theoretical astrophysics, including planets, stars, black holes, and galaxies. ASIAA has also invested heavily in developing telescopes and related astronomical instrumentation to bolster its research capabilities. In recent years, it has been actively engaged in international research collaborations with many developed countries. Dr. Pen cited the high-resolution black hole image released in May this year as an example of such an initiative. The image was taken by the Event Horizon Telescope (EHT), an international collaboration aimed at capturing images of black holes, in which ASIAA not only participated through the Greenland Telescope but also played an integral role. Aside from the Greenland Telescope, ASIAA is also involved in the Atacama Large Millimeter/Submillimeter Array (ALMA), the largest ground-based observatory project in human history. Dr. Pen also revealed that ASIAA has launched a new project to study fast radio bursts (FRBs) and is currently building in Taiwan the world's first telescope dedicated to studying these mysterious signals. "We hope to become a leading astronomical research institution and devote ourselves to nurturing the next generation of Taiwanese scientists," Dr. Pen said.

However, delivering research results of the desired quality on schedule is extremely difficult, especially in astronomy. As Dr. Pen pointed out, simulations in astronomical research require massive amounts of data. Before computers became commercially available, research institutions had to employ large groups of people just to carry out all kinds of calculations. The advent of computing technology eased this burden, but early computers were still limited in what they could do. Dr. Pen recalled that when he was a young university researcher, the mainframe he used could only handle one-dimensional calculations. Computing power then began increasing rapidly, and ASIAA started building its own mainframe as early as 2001, when it was still a preparatory office under Academia Sinica. The facility has since been upgraded numerous times as research needs have changed.

According to ASIAA Associate Research Fellow Dr. Min-Kai Lin, astronomical research has entered the era of three-dimensional computation, which demands high system performance. For instance, Dr. Lin's research team studies how planets form using fluid dynamics simulations, which involve large amounts of data that must be analyzed with large-scale parallel computing techniques. ASIAA replaces its mainframe system approximately once every five years, and the previous generation went online in 2015. That system had a total of 1,664 cores across 103 nodes, each with only 16 or 24 cores. For storage, it used an open-source clustered storage architecture to build a parallel file system, and its nodes were connected using the InfiniBand FDR (Fourteen Data Rate) transmission standard.
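As a quick consistency check on these figures, the minimal Python sketch below solves for the node mix. Assuming, as stated, that every node had exactly 16 or 24 cores, 1,664 cores over 103 nodes implies 101 sixteen-core nodes and 2 twenty-four-core nodes. The function name `node_mix` is purely illustrative and not part of any ASIAA tooling.

```python
# Infer how many 16-core and 24-core nodes the 2015 system had,
# assuming (as the text states) every node had exactly 16 or 24 cores.
TOTAL_CORES = 1664
TOTAL_NODES = 103


def node_mix(total_cores: int, total_nodes: int, small: int = 16, large: int = 24) -> tuple[int, int]:
    """Solve small*a + large*b = total_cores subject to a + b = total_nodes."""
    # Substituting a = total_nodes - b gives (large - small) * b = total_cores - small * total_nodes.
    extra = total_cores - small * total_nodes
    if extra < 0 or extra % (large - small) != 0:
        raise ValueError("no integer solution for these totals")
    large_nodes = extra // (large - small)
    small_nodes = total_nodes - large_nodes
    if small_nodes < 0:
        raise ValueError("no non-negative solution for these totals")
    return small_nodes, large_nodes


small_nodes, large_nodes = node_mix(TOTAL_CORES, TOTAL_NODES)
print(f"{small_nodes} x 16-core + {large_nodes} x 24-core nodes")
```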

The system met research needs well in its early years. However, as ASIAA continued to expand and internationalize, the core count and computing input/output capacity of the existing equipment gradually fell behind the institute's research needs. "In the past few years, our planned calculations required hundreds or even thousands of cores, but the existing system was limited by its performance, so it could only run a few jobs at a time, leading to extended wait times," Dr. Lin said. To solve this problem, ASIAA designed a new mainframe around both its current needs and its future research plans. The result was a new-generation mainframe called "Kawas."

"Kawas means 'soul' in the Amis language and represents the core of ASIAA's high-performance computing system," Dr. Lin explained. He pointed out that compared with previous computing systems, Kawas, which went online in 2022, offers substantially higher performance: it is equipped with 2,048 CPU cores, 8 TB of memory, and 1.2 PB of parallel file system capacity. "The most distinctive feature of this system is that each node has 128 cores and the nodes are connected by InfiniBand HDR 200 Gbps network switches, which greatly improves the system's parallel computing performance," Dr. Lin added. With this hardware, Kawas reaches a total computing power of 61 TFLOPS.
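The quoted totals also imply some per-node and per-core figures. The sketch below derives them, assuming identical nodes with memory and compute spread evenly. The article does not say whether 61 TFLOPS is a theoretical peak or a measured benchmark result, so the per-core value is only an average. The resulting node count of 16 matches the figure Dr. Lin gives for Kawas.

```python
# Per-node and per-core figures derived from the Kawas totals quoted in the text.
# Assumes identical nodes with memory and compute split evenly.
TOTAL_CORES = 2048
CORES_PER_NODE = 128
TOTAL_MEMORY_TB = 8
TOTAL_TFLOPS = 61  # the text does not say whether this is peak or measured

nodes = TOTAL_CORES // CORES_PER_NODE                # number of nodes
memory_per_node_tb = TOTAL_MEMORY_TB / nodes         # TB of memory per node
tflops_per_node = TOTAL_TFLOPS / nodes               # average TFLOPS per node
gflops_per_core = TOTAL_TFLOPS * 1000 / TOTAL_CORES  # average GFLOPS per core

print(f"nodes: {nodes}")
print(f"memory per node: {memory_per_node_tb} TB")
print(f"compute per node: {tflops_per_node:.2f} TFLOPS")
print(f"compute per core: {gflops_per_core:.1f} GFLOPS")
```

This yields 16 nodes, 0.5 TB of memory per node, about 3.81 TFLOPS per node, and about 29.8 GFLOPS per core.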

ASIAA Builds Its Most Powerful Computing System on Four Major Strengths of AMD EPYC Processors

Kawas's performance comes from its AMD EPYC processors. Dr. Lin noted that, as a scientific institution, ASIAA is not beholden to brand names when evaluating products on the market; instead, it prioritizes overall performance. "We test different systems side by side, using the software our researchers run and the scientific problems they work on. After all, the end goal is to enhance research output," Dr. Lin said. After careful assessment, AMD EPYC processors came out on top for four main reasons: powerful performance, high density (their computing performance significantly reduces the total space the servers occupy), low power consumption, and compatibility.

Dr. Lin further explained that the powerful performance of 64-core AMD EPYC servers enables ASIAA's research teams to run large parallel computations efficiently. "The 2,048-core system is the largest computing architecture we have ever built at ASIAA," he emphasized. The most significant benefit of this computing capability is substantially improved parallel computing performance. In previous-generation systems, where each node had only 16 or 24 cores, computations using hundreds of cores had to span multiple nodes and therefore required the more complex Message Passing Interface (MPI). In Kawas, by contrast, each node has 128 AMD EPYC cores, so a high level of parallelism can be achieved within a single node using Open Multi-Processing (OpenMP). Because OpenMP programs are relatively easy to develop and OpenMP suits a wide range of applications, Kawas markedly reduces development time and allows ASIAA to expand rapidly into new research directions. This advantage strengthens ASIAA's efforts to develop its own original programs and libraries. The processors' high performance also gives ASIAA more flexibility in how it uses server rack space. Dr. Lin disclosed that Kawas has only 16 nodes; with products offering fewer cores per node, a much larger number of servers would be needed to reach Kawas's 61 TFLOPS. AMD EPYC processors, with their powerful performance, substantially reduce the number of servers required and the space they occupy. At the same time, for the same computing power, AMD EPYC processors consume less power, which in turn enables higher computing density for cost- and power-optimized platforms.
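The space argument can be made concrete with a rough Python sketch that counts how many nodes are needed to supply a given number of cores, and checks whether a job fits on a single node, where shared-memory OpenMP parallelism alone can suffice. It uses core count as the only yardstick, which is a simplifying assumption: it ignores per-core speed, memory capacity, and interconnect. The 16- and 24-core node sizes are simply those of the previous system, used for comparison, and the helper names are hypothetical.

```python
# Rough comparison of node counts, using core count as the only yardstick
# (ignores per-core speed, memory, and interconnect differences).


def nodes_needed(job_cores: int, cores_per_node: int) -> int:
    """Smallest number of nodes whose combined cores cover job_cores."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-job_cores // cores_per_node)


def fits_in_one_node(job_cores: int, cores_per_node: int) -> bool:
    """True if the job can stay on one node, where shared-memory
    parallelism (e.g. OpenMP) can be used without inter-node messaging."""
    return job_cores <= cores_per_node


KAWAS_TOTAL_CORES = 2048

for cores_per_node in (16, 24, 128):  # previous-generation node sizes vs. Kawas
    print(
        f"{cores_per_node:>3}-core nodes: "
        f"{nodes_needed(KAWAS_TOTAL_CORES, cores_per_node):>3} nodes for 2,048 cores; "
        f"128-core job on one node: {fits_in_one_node(128, cores_per_node)}"
    )
```

For 16-, 24-, and 128-core nodes, covering 2,048 cores takes 128, 86, and 16 nodes respectively, and only the 128-core node can hold a 128-core job on its own. Jobs larger than one node still need inter-node communication such as MPI.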

As for compatibility, Dr. Lin noted that it matters because astronomical research institutions around the world cooperate extremely closely. AMD compilers with an open-source architecture are readily compatible with systems in other countries, which in turn makes it easier for ASIAA to collaborate internationally. "Therefore, Kawas is the best system ever built by ASIAA in terms of computing performance, space utilization, power consumption, and compatibility," he stressed.

This system, with AMD EPYC processors at its core, has now been online for some time. Beyond the products themselves, the services provided by AMD and its partners are a primary reason the benefits of the new system have been so evident. Dr. Lin revealed that in the early stage of deployment, ASIAA installed the operating system using the QCT HPC Starter Kit, a tool available on QCT POD from AMD's partner Quanta Cloud Technology. "This simple and fast one-time deployment mode can significantly reduce the installation time of the overall HPC system and eliminates the overly complex system configuration that used to be a major problem," he said, elaborating on the kit's relatively comprehensive features. For system management, the QCT HPC Starter Kit offers a variety of tools suited to Kawas's needs. It not only helps set up HPC-related environments, but also lets administrators readily track system status and resource utilization.

"With concerted efforts by all our colleagues, we have delivered remarkable results over the past few years. In addition to conducting research on an ongoing basis, we will also devote ourselves to nurturing the next generation of Taiwanese scientists," Dr. Pen said. He concluded that the powerful performance of AMD EPYC processors and the accompanying technical services will help ASIAA achieve this vision and become a leading astronomical research institution in the world.

ASIAA Associate Research Fellow Dr. Min-Kai Lin (left) and Director Dr. Ue-Li Pen (right)
Photo: AMD