Fresh off their third Top500 win for Frontier – now with an 8.4% higher Linpack score – the HPC staff at Oak Ridge National Laboratory had some exciting news to share today. Frontier – the first U.S. exascale system and the first official Linpack exascale system – has passed its acceptance and is taking on grand scientific challenges.
“Acceptance of Frontier occurred at the end of December 2022, and the Frontier HPE Cray EX system fully entered the user program at the beginning of April 2023,” Oak Ridge shared with HPCwire in a statement. “Since then, Frontier has been made available to all of the OLCF allocation programs: INCITE, ALCC, and DD, along with ECP. We now have more than 1,000 users with access to Frontier.”
(Respectively, these programs are: the Innovative and Novel Computational Impact on Theory and Experiment program; the Advanced Scientific Computing Research [ASCR] Leadership Computing Challenge; the Director’s Discretionary program; and the Exascale Computing Project.)
When we spoke with Frontier Project Director Justin Whitt last June, he walked us through the steps that were necessary before the acceptance process could even begin. “We’ve got to get all of the production software on the system, from the network software to the programming environments to all that, get it to what we are going to use when we actually have researchers on the system. Once we have that done, and everything’s checked out, we will start the acceptance process on the machine,” he said.
Exascale Computing Project Director Doug Kothe further discussed how rigorous the acceptance process is: “There’s functionality: do basic things that we need work? There’s performance: are we getting the performance out of the system? Certainly all indications are, based on the HPL [High Performance Linpack] run, that we are. And then there’s stability, and stability is the one that’s most challenging. Essentially, surrogate workloads that mimic actual production workloads are run for weeks on the system. And there are very specific metrics in terms of the percent of jobs that have to complete and the percent of those jobs that get the right answer, etc. So acceptance is pretty onerous. And so we feel confident that after that period, the machine will be fairly well shaken out for us to get on.”
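To illustrate the kind of stability metrics Kothe describes – the percentage of surrogate-workload jobs that complete and the percentage of those that produce the right answer – here is a minimal sketch in Python. The job records and numbers are hypothetical illustrations, not ORNL’s actual acceptance thresholds or tooling.

```python
# Minimal sketch of stability-style metrics over a batch of surrogate workload jobs.
# The job records and counts below are hypothetical, not ORNL's acceptance criteria.
from dataclasses import dataclass


@dataclass
class JobResult:
    completed: bool       # did the job run to completion?
    answer_correct: bool  # did the completed job produce the expected answer?


def stability_metrics(jobs: list[JobResult]) -> tuple[float, float]:
    """Return (completion rate, correct-answer rate among completed jobs)."""
    completed = [j for j in jobs if j.completed]
    completion_rate = len(completed) / len(jobs)
    correct_rate = sum(j.answer_correct for j in completed) / len(completed)
    return completion_rate, correct_rate


# Example: 1,000 surrogate jobs, 985 complete, 980 of those verify correctly.
jobs = ([JobResult(True, True)] * 980
        + [JobResult(True, False)] * 5
        + [JobResult(False, False)] * 15)
comp, corr = stability_metrics(jobs)
print(f"completion rate: {comp:.1%}, correct-answer rate: {corr:.1%}")
```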
All that hard work putting Frontier through its paces also translated into an improved score on the new Top500 list, published yesterday in tandem with the International Supercomputing Conference (ISC) in Hamburg. Despite having a slightly smaller peak configuration (0.6% fewer flops, to be exact), Frontier turned in a Linpack score that was 92 petaflops higher than its previous entry, going from 1.102 Linpack exaflops on the November 2022 list to 1.194 Linpack exaflops on the new list. As testament to how big an improvement that is, if those extra flops were dropped into a stand-alone system, it would be sufficient for an eighth-place finish on the list.
The fact that the Top500 system configuration is smaller tells you that all that extra Linpack goodness was achieved through tuning, optimizations and – we’ve now learned – frequency adjustments.
We reached out to Al Geist, Chief Technology Officer for the Oak Ridge Leadership Computing Facility (OLCF) and the ECP, to get the scoop on how they squeezed 8.4% more flops out of a (slightly) smaller system with only 7% more power (Frontier’s energy efficiency actually went up slightly).
“Last year Frontier was able to hit 1.1 exaflops with 9,248 nodes, even though we weren’t running the nodes at their full speed. We had the maximum frequency dialed down about 7%, and we had lowered the maximum power to the GPUs to 500W,” Geist shared by email.
“Since that time, Frontier has become more robust and we now run the GPUs at 560W and at their full frequency for all our users on the system. Additionally, the ROCm libraries are getting more optimizations and the HPE team added their optimizations as well.
“So, when we reran HPL this year, we got the 92 petaflops speed boost because the nodes are running at full speed, because of improvements in the AMD libraries, and because of further optimizations from the HPE team. This result shows that Frontier continues to mature.
“We note that there is still more performance available in Frontier. The latest 1.19 exaflops result used only 9,212 nodes of the 9,472 nodes that are in Frontier,” Geist told us.
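The figures Geist cites line up with the Top500 numbers above. A quick back-of-the-envelope check in Python, using only values stated in this article:

```python
# Back-of-the-envelope check using only figures quoted in this article.
prev_hpl_ef = 1.102   # November 2022 Linpack result, in exaflops
new_hpl_ef = 1.194    # June 2023 Linpack result, in exaflops

gain_ef = new_hpl_ef - prev_hpl_ef
gain_pct = gain_ef / prev_hpl_ef * 100
print(f"Linpack gain: {gain_ef * 1000:.0f} petaflops ({gain_pct:.1f}%)")  # ~92 PF, ~8.3-8.4%

# Geist cites roughly 7% more power for the new run; if flops rise ~8.4%
# while power rises ~7%, flops-per-watt improves slightly.
power_ratio = 1.07
efficiency_change_pct = (1 + gain_pct / 100) / power_ratio * 100 - 100
print(f"Approximate energy-efficiency change: +{efficiency_change_pct:.1f}%")  # ~+1.3%
```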
What having an exascale machine means for science
“Every one of our [ECP] applications is targeting a very specific problem that is really unachievable and unattainable without exascale resources,” Kothe told HPCwire when we met with him at ORNL last year. “You need lots of memory and massive compute to go after these big problems. So without exascale, a lot of these problems would take months or years to tackle on a petascale system, or they’re just not even possible.”
Now Frontier has been put into service on a number of research projects that can only feasibly be advanced with a machine of this speed and scale.
As detailed by Oak Ridge, here are some of the studies underway on Frontier:
ExaSMR: Led by ORNL’s Steven Hamilton, this study seeks to cut out the long timelines and high front-end costs of advanced nuclear reactor design and use exascale computing power to simulate modular reactors that would not only be smaller but also safer, more versatile and customizable to sizes beyond the traditional large reactors that power cities.
Exascale Atomistic Capability for Accuracy, Length and Time (EXAALT): This molecular dynamics study, led by Danny Perez of Los Alamos National Laboratory, seeks to transform fundamental materials science for energy by using exascale computing speeds to enable vastly larger, faster and more accurate simulations for such applications as nuclear fission and fusion.
Combustion PELE: This study, named for the Hawaiian goddess of fire and led by Jacqueline Chen of Sandia National Laboratories, is designed to simulate the physics inside an internal combustion engine in pursuit of developing cleaner, more efficient engines that could reduce carbon emissions and conserve fossil fuels.
Whole Device Model Application (WDMApp): This study, led by Amitava Bhattacharjee of Princeton Plasma Physics Laboratory, is designed to simulate the magnetically confined fusion plasma – a boiling stew of charged nuclear particles hotter than the sun – necessary for the contained reactions to power nuclear fusion technologies for energy production.
WarpX: Led by Jean-Luc Vay of Lawrence Berkeley National Laboratory, this study seeks to simulate smaller, more versatile plasma-based particle accelerators, which would enable scientists to design particle accelerators for many applications, from radiation therapy to semiconductor chip manufacturing and beyond. The team’s work won the Association for Computing Machinery’s 2022 Gordon Bell Prize, which recognizes outstanding achievement in high-performance computing.
ExaSky: This study, led by Salman Habib of Argonne National Laboratory, seeks to expand the size, scope and accuracy of simulations for complex cosmological phenomena, such as dark energy and dark matter, to uncover new insights into the dynamics of the universe.
EQSIM: Led by LBNL’s David McCallen, this study is designed to simulate the physics and tectonic conditions that cause earthquakes, to enable assessment of areas at risk.
Energy Exascale Earth System Model (E3SM): This study, led by Sandia’s Mark Taylor, seeks to enable more accurate and detailed predictions of climate change and its effect on the national and global water cycle by simulating the complex interactions between the large-scale, mostly 2D motions of the atmosphere and the smaller, mostly 3D motions that occur in clouds and storms.
Cancer Distributed Learning Environment (CANDLE): Led by Argonne’s Rick Stevens, this study seeks to develop predictive simulations that could help identify and streamline trials for promising cancer treatments, reducing years of lengthy, expensive clinical studies.
“Frontier represents the culmination of more than a decade of hard work by dedicated professionals from across academia, private business and the national laboratory complex through the Exascale Computing Project to realize a goal that once seemed barely possible,” Kothe elaborated in a post published today on the ORNL website. “This machine will shrink the timeline for discoveries that will change the world for the better and touch everyone on Earth.”
“I don’t think we can overstate the impact Frontier promises to make for some of these studies,” Whitt added. “The science that will be done on this computer will be fundamentally different from what we have done before with computation. Our early research teams have already begun exploring fundamental questions on everything from nuclear fusion to forecasting earthquakes to building a better combustion engine.”
Frontier is an HPE Cray EX system that spans more than 9,400 nodes, each with one AMD Epyc CPU and four AMD Instinct MI250X GPUs.
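For orientation, here is a quick tally of the accelerator count and per-node HPL throughput, using only the node counts cited in this article – a rough sketch, not an official specification.

```python
# Rough tallies from figures quoted in this article; not official specifications.
total_nodes = 9_472      # node count cited by Al Geist above
gpus_per_node = 4        # AMD Instinct MI250X accelerators per node
print(f"MI250X accelerators: {total_nodes * gpus_per_node:,}")   # 37,888

# Per-node HPL throughput of the 1.194-exaflop run on 9,212 nodes:
hpl_exaflops, nodes_used = 1.194, 9_212
print(f"~{hpl_exaflops * 1e6 / nodes_used:.0f} teraflops of HPL per node")  # ~130
```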