South Korean researchers unveiled a graphics processing unit (GPU) technology that can speed up large-scale artificial intelligence workloads. The technology is expected to reduce the cost of building large-scale AI systems by eliminating the need to connect multiple GPUs solely to secure memory capacity.
Professor Jung Myung-soo’s research team at KAIST announced on July 8 that they developed a technology that optimizes the memory read-write performance of high-capacity GPUs enabled with Compute Express Link (CXL). CXL is a next-generation interface technology that enhances memory capacity and data processing efficiency.
Large-scale AI services require tens of terabytes (TB) of memory as models and datasets continue to grow. But a single GPU carries only tens of gigabytes (GB) of memory, which is why operators connect many GPUs together to run AI services. This approach, while effective, significantly increases costs: Nvidia’s latest GPU, the H100, sells for 40 million to 50 million won (roughly $30,000 to $37,000) per unit.
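A back-of-the-envelope calculation makes the scale of the problem concrete. The figures below are illustrative assumptions, not from the article: a 10 TB working set and 80 GB of memory per GPU, which sits in the "tens of gigabytes" range the article cites.

```python
import math

# Illustrative assumptions (not from the article):
# a 10 TB working set and 80 GB of memory per GPU.
required_tb = 10
gpu_memory_gb = 80

required_gb = required_tb * 1024
gpus_needed = math.ceil(required_gb / gpu_memory_gb)
print(gpus_needed)  # 128 GPUs bought just to reach 10 TB of memory
```

At tens of thousands of dollars per card, buying over a hundred GPUs purely for their memory, not their compute, is the cost problem CXL-GPU aims to remove.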
The research team developed a ‘CXL-GPU’ structure that directly connects large memory to the GPU device by utilizing CXL. This technology integrates memory expansion devices into the GPU memory space via CXL, increasing memory capacity without connecting multiple GPUs.
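The structure can be sketched as a toy allocator. This is purely a software analogy with hypothetical names; in the actual CXL-GPU design the expander is mapped into the GPU's memory space by hardware, so software simply sees one large pool.

```python
class CxlGpuAddressSpace:
    """Toy model: one flat address space backed by local GPU memory
    plus a CXL memory expander (sizes in GB; illustrative only)."""

    def __init__(self, local_gb, expander_gb):
        self.local_gb = local_gb
        self.capacity = local_gb + expander_gb
        self.cursor = 0  # simple bump allocator

    def alloc(self, size_gb):
        if self.cursor + size_gb > self.capacity:
            raise MemoryError("out of memory")
        start = self.cursor
        self.cursor += size_gb
        # Addresses past local capacity are served by the expander,
        # but the caller sees one contiguous space either way.
        backing = "local" if start < self.local_gb else "cxl-expander"
        return start, backing

# One GPU with 80 GB of local memory plus a 1 TB expander:
space = CxlGpuAddressSpace(local_gb=80, expander_gb=1024)
print(space.alloc(80))   # (0, 'local')
print(space.alloc(500))  # (80, 'cxl-expander')
```

The point of the model: capacity grows by attaching memory devices, not by attaching more GPUs.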
While CXL-GPU technology has been actively researched in the industry, it has been difficult to apply in practice because of poor memory read and write performance. To address this, the research team designed the memory expansion device to take hints from the connected GPU so that it can prefetch data before reads arrive and decide on its own when writes are completed.
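A rough software analogy of those two ideas (purely illustrative; the real mechanism lives in the hardware controller, and all names here are hypothetical): the expander fills a small fast cache from read hints, and acknowledges writes immediately while flushing them in the background on a schedule it chooses itself.

```python
from collections import deque

class PrefetchingExpander:
    """Toy model of the two techniques described in the article:
    hint-driven prefetch for reads, early acknowledgement with
    device-chosen flush timing for writes. Illustrative only."""

    def __init__(self):
        self.backing = {}           # slow expander memory
        self.cache = {}             # fast prefetch buffer
        self.write_buffer = deque() # pending, already-acknowledged writes

    def hint(self, addr):
        # The GPU signals an upcoming read; the device prefetches proactively.
        if addr in self.backing:
            self.cache[addr] = self.backing[addr]

    def read(self, addr):
        # A cache hit avoids the slow path that held CXL-GPU back.
        if addr in self.cache:
            return self.cache[addr], "fast"
        return self.backing.get(addr), "slow"

    def write(self, addr, value):
        # Acknowledge immediately; the device decides when to flush.
        self.write_buffer.append((addr, value))
        return "ack"

    def flush(self):
        while self.write_buffer:
            addr, value = self.write_buffer.popleft()
            self.backing[addr] = value

dev = PrefetchingExpander()
dev.write(0x10, "weights")  # returns "ack" without waiting for the store
dev.flush()                 # device flushes on its own schedule
dev.hint(0x10)              # prefetch before the GPU's read arrives
print(dev.read(0x10))       # ('weights', 'fast')
```

In the model, the GPU never stalls on the expander's slow path: reads it hinted at are served from the fast buffer, and writes return as soon as they are buffered.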
South Korean chip startup Panmnesia, which has garnered attention for its ultra-fast CXL controller technology, built the prototype of the CXL-GPU. The research team said the prototype ran AI services 2.36 times faster than existing GPU memory expansion technologies, showing that memory capacity can be increased without chaining together additional GPUs.
The research team expects the CXL-GPU technology to help counter Nvidia’s dominance in the AI accelerator market. “CXL-GPU can contribute to dramatically lowering the cost of memory expansion for big tech companies operating large-scale AI services,” said Professor Jung. The team will present its results this month at the USENIX conference in Santa Clara, California.