Supercomputers Move Towards The Zettascale Era
Several organizations have deployed new supercomputers, including a system for the open science community as well as a massive zettascale cluster for cloud customers.
By Mark LaPedus
Several organizations have deployed the next wave of powerful supercomputers, including a system for use in the open science community as well as a massive zettascale cluster and a wafer-scale chip computer.
In one of the latest announcements, the Texas Advanced Computing Center (TACC) at The University of Texas at Austin rolled out Vista, a new Arm-based AI supercomputer designed for the open science community. Oracle, meanwhile, announced the first zettascale cloud computing cluster, which is faster than today’s supercomputers, at least on paper. In theory, a zettascale computer can process one sextillion, or 10²¹, calculations per second.
For years, companies, government agencies, and researchers have relied on supercomputers, which are large, powerful, and expensive systems that process complex data in a short time frame. Supercomputers are used for a range of applications, including defense/aerospace, math, physics, medical research, and weather forecasting.
In the market for decades, traditional supercomputers are different from quantum computers, which are receiving a lot of attention and hype. Supercomputers incorporate traditional processors, memory, and other components, just as PCs and smartphones do. In simple terms, these systems manipulate and store data using a binary language (1s and 0s).
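To make the “binary language” point concrete, the short Python snippet below (an illustrative sketch, not tied to any particular system) shows how ordinary numbers and text reduce to patterns of 1s and 0s:

```python
# Everything a conventional computer stores (numbers, text, images)
# is ultimately a pattern of bits, i.e. 1s and 0s.
number = 42
print(f"{number} in binary: {number:08b}")   # 00101010

text = "HPC"
for ch in text:
    # ord() gives the character's numeric code; show it as 8 bits
    print(f"{ch!r} -> {ord(ch):08b}")
```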
Quantum computers, in contrast, are built around qubits. In theory, they can outperform supercomputers on certain problems, but they are still in the early phases of development.
So for the foreseeable future, supercomputers will remain the mainstream systems for high-end computing. Typically, a supercomputer is built by a company such as Hewlett Packard Enterprise (HPE), which integrates the latest and greatest chips into the system. An agency, company, or research organization then buys the system for its exclusive use.
In many cases, though, the research community is able to gain access to a supercomputer at a university or elsewhere.
For example, TACC recently deployed Vista, a new AI-centric supercomputer that is available to the broad open science community. To access the system, users must submit an allocation request to TACC.
Funded by the National Science Foundation (NSF), the Vista supercomputer is powered by Nvidia’s GH200 chip architecture, which combines two separate devices on the same multi-chip module: the Grace CPU and the Hopper GPU. On the module, the CPU and GPU are connected with a high-speed link, allowing the two devices to operate as a single superchip.
The Hopper GPU (H100) is Nvidia’s ninth-generation data center GPU. The Grace CPU is a processor based on Arm’s architecture. The Grace CPU combines 72 Neoverse V2 Armv9 cores with up to 480GB of LPDDR5X DRAM.
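The practical payoff of the superchip design is cheap data movement between CPU memory and GPU memory. The PyTorch sketch below is a rough illustration of the host-to-device traffic that the Grace-Hopper link is built to accelerate; it is generic code that runs on any CUDA-capable system and does not exploit GH200-specific features such as cache-coherent shared memory:

```python
import torch

# Sketch of CPU -> GPU data movement, the traffic that the GH200's
# high-speed CPU-GPU link is designed to make fast. Assumes any
# CUDA-capable machine; GH200-specific coherent memory is not used.
assert torch.cuda.is_available()

x_cpu = torch.randn(4096, 4096)   # tensor in CPU-side DRAM (Grace's LPDDR5X)
x_gpu = x_cpu.to("cuda")          # copy to GPU-side memory (Hopper's HBM)
y = x_gpu @ x_gpu                 # compute where the data now lives
torch.cuda.synchronize()          # wait for the GPU to finish
print(y.shape)                    # torch.Size([4096, 4096])
```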
Vista, a new AI-centric system at TACC, is in full production for the open science community. Source: Texas Advanced Computing Center
“Vista expands TACC’s capacity for AI and will ensure that the broad science, engineering, and education research communities have access to the most advanced computing and AI technologies,” said Dan Stanzione, TACC’s executive director and associate vice president of research at UT Austin.
The TACC is part of the Office of the Vice President for Research at The University of Texas at Austin. Founded in 2001, TACC also houses several other supercomputers for the open science research community.
Vista isn’t the world’s fastest supercomputer. That title is held by Frontier, an exascale-class supercomputer built around HPE’s Cray EX architecture. Located at Oak Ridge National Laboratory in Tennessee, Frontier became operational in 2022.
The system can process a quintillion, or 10¹⁸, calculations each second. With an HPL score of 1.206 EFlop/s, Frontier has a total of 8,699,904 combined CPU and GPU cores, pairing AMD’s EPYC CPUs with its Instinct MI250X accelerators.
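A bit of back-of-the-envelope arithmetic with those figures puts the numbers in perspective (Python, using only the values quoted above):

```python
# Back-of-the-envelope arithmetic using the figures quoted above.
frontier_hpl = 1.206e18      # Frontier's HPL score in FLOP/s (1.206 EFlop/s)
frontier_cores = 8_699_904   # combined CPU and GPU cores
zettascale = 1e21            # one sextillion calculations per second

print(f"~{frontier_hpl / frontier_cores / 1e9:.0f} GFLOP/s per core")  # ~139
print(f"~{zettascale / frontier_hpl:.0f} Frontier-class systems "
      f"would equal one zettaFLOP/s")                                  # ~829
```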
Able to solve calculations up to five times faster than the previous generation of top supercomputers, Frontier enables scientists to develop new technologies for energy, medicine, and materials.
In one of the latest projects on Frontier, Oak Ridge National Laboratory used the supercomputer to calculate the magnetic properties of calcium-48’s atomic nucleus. Calcium-48 is an important isotope used for scientific research. Its nucleus is composed of 20 protons and 28 neutrons — a combination that scientists call “doubly magic.” Magic numbers — such as 20 and 28 — are specific numbers of protons or neutrons that provide stability by forming a complete shell within the nucleus.
Frontier supercomputer (Source: ORNL)
More supercomputers, AI hardware
Others are also announcing new and powerful supercomputers. Oracle recently rolled out what the company says is the largest AI supercomputer in the cloud.
The system is the first zettascale-class cloud computing cluster, available with up to 131,072 of Nvidia’s new Blackwell GPUs. At maximum scale, the system offers more than three times as many GPUs as Frontier. Oracle is now taking orders for access to the system.
Meanwhile, G42, the Abu Dhabi-based technology holding group, and Cerebras Systems recently rolled out Condor Galaxy 3 (CG-3), the third cluster in their Condor Galaxy constellation of AI supercomputers. Featuring 64 of Cerebras’ CS-3 systems, all powered by the Wafer-Scale Engine 3 (WSE-3), Condor Galaxy 3 will deliver 8 exaFLOPs of AI compute with 58 million AI-optimized cores.
Earlier this year, Cerebras introduced the WSE-3, a wafer-scale chip product. Built for training the industry’s largest AI models, the 5nm-based wafer-scale device incorporates 4 trillion transistors, delivering 125 petaflops of peak AI performance through 900,000 compute cores.
In addition, Cerebras recently announced Cerebras Inference, which the company bills as the fastest AI inference solution in the world. Delivering 1,800 tokens per second for Llama3.1 8B and 450 tokens per second for Llama3.1 70B, Cerebras Inference is said to be 20 times faster than Nvidia GPU-based solutions in hyperscale clouds. Starting at 10 cents per million tokens, Cerebras Inference is priced at a fraction of GPU solutions, which the company says works out to 100x higher price-performance for AI workloads.
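Taken at face value, those throughput and pricing figures are easy to sanity-check with a few lines of arithmetic (illustrative only):

```python
# Sanity-check the quoted Cerebras Inference figures (illustrative only).
tokens_per_second = 1_800        # quoted Llama3.1 8B throughput
price_per_million = 0.10         # dollars per million tokens

seconds_per_million = 1_000_000 / tokens_per_second
cost_per_hour = price_per_million / seconds_per_million * 3600
print(f"~{seconds_per_million:.0f} s to generate one million tokens")  # ~556 s
print(f"~${cost_per_hour:.2f} per hour of sustained generation")       # ~$0.65
```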
Tokens are words, character sets, or combinations of words and punctuation, according to Microsoft. They are used by large language models (LLMs) to decompose text. Tokenization is the first step in training. “The LLM analyzes the semantic relationships between tokens, such as how commonly they're used together or whether they're used in similar contexts. After training, the LLM uses those patterns and relationships to generate a sequence of output tokens based on the input sequence,” according to Microsoft.
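For readers who want to see tokenization in action, the short example below uses the open-source tiktoken library as a stand-in (Llama models use their own tokenizers, so the exact token counts would differ):

```python
import tiktoken  # pip install tiktoken; an open-source BPE tokenizer

# Split a sentence into tokens the way an LLM would before training
# or inference. Token IDs and counts vary by tokenizer.
enc = tiktoken.get_encoding("cl100k_base")
text = "Supercomputers move toward the zettascale era."
tokens = enc.encode(text)

print(tokens)                             # integer token IDs
print([enc.decode([t]) for t in tokens])  # the text piece behind each ID
print(f"{len(tokens)} tokens for {len(text)} characters")
```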
Meanwhile, not to be outdone, SambaNova Systems announced SambaNova Cloud, which it bills as the world’s fastest AI inference service, enabled by the speed of its SN40L AI chip. SambaNova Cloud runs Llama 3.1 70B at 461 tokens per second (t/s) and Llama 3.1 405B at 132 t/s at full precision.
SambaNova’s current inference configuration incorporates 16 SN40L chips, with tensor parallelism across chips and pipeline parallelism within each chip, according to the company. Each SN40L chip consists of two logic dies, high-bandwidth memory (HBM), and direct-attached DDR DRAM. The 16 chips are interconnected with a peer-to-peer network and together offer a compute roofline of 10.2 bf16 PFLOPS, the company added.
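Tensor parallelism, mentioned above, simply means splitting a single matrix multiplication across devices. The NumPy sketch below shows the idea with a column-wise split across two hypothetical chips; it is a generic illustration, not SambaNova’s actual scheme:

```python
import numpy as np

# Minimal sketch of tensor parallelism: one matmul split column-wise
# across two "chips", each holding half the weight matrix.
# Generic illustration, not SambaNova's actual implementation.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 512))      # activations, replicated on both chips
W = rng.standard_normal((512, 1024))   # full weight matrix

W0, W1 = np.hsplit(W, 2)               # each chip stores half the columns
y0 = x @ W0                            # partial result computed on chip 0
y1 = x @ W1                            # partial result computed on chip 1
y = np.concatenate([y0, y1], axis=1)   # gather the halves into the output

assert np.allclose(y, x @ W)           # matches the single-device result
```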