Dynamic Power Capping Enables Better Energy Efficiency in HPC

Graph of power usage over time, with two horizontal lines indicating mean values. The green line shows the mean power usage with dynamic power capping; the red line shows the mean power usage during an experimental operation in uncapped mode in December 2024. The experiment demonstrated that dynamic power capping enabled energy savings of approximately 20% with no noticeable performance degradation. Image: HPE/HLRS

An intelligent power management solution developed by Hewlett Packard Enterprise in collaboration with HLRS regulates power distribution across a supercomputer to optimize system performance within a power budget.

As supercomputers grow ever larger, so do their power demands. This not only carries environmental costs in the form of CO₂ emissions but also has direct economic impacts, raising system operating costs as well as investment costs for the associated power and cooling infrastructure. Today’s larger systems mean that high-performance computing (HPC) centers must work toward two closely intertwined goals: ensuring that a supercomputer does not consume more power than desired, and that the power it does consume is used as efficiently as possible. Doing so delivers the greatest possible computational productivity from the resources that are available.

In 2020 Hewlett Packard Enterprise (HPE) started a collaboration with the High-Performance Computing Center Stuttgart (HLRS) to implement and evaluate a new approach to power management. The solution they developed continuously monitors which applications are running on HLRS’s Hawk supercomputer and uses a dynamic power capping approach to adjust the power allotted to each application based on its specific power demands.

The solution has been running in production on Hawk since February 2024. Through an experiment in December 2024 in which Hawk was temporarily operated in uncapped mode, the HPE/HLRS team determined that dynamic power capping reduced the overall power consumption of applications by about 20% with no noticeable degradation in performance. The resulting energy savings are comparable to the annual electricity consumption of approximately 1,500 single-family homes.

The HPE/HLRS team explains its approach to dynamic power capping in a recent paper published in the proceedings of the 2024 IEEE International Conference on Cluster Computing (CLUSTER Workshops).

Optimizing power consumption on overprovisioned HPC systems

According to HLRS’s Dr. Ralf Schneider, the idea to develop a dynamic power capping approach resulted from the fact that HLRS’s Hawk supercomputer was what HPC system operators describe as overprovisioned. “Hawk was so large that there was a risk that it could potentially overload our power capacity,” he explained. “This means we needed to set limits to the power it used. At the same time, we wanted to get the maximum performance out of the machine based on the power that we have available. HPE suggested a strategy that involves managing a balance between power-hungry applications and applications that require less power for efficient execution.”

One approach that computing centers have used to control power consumption is to set a cap on the amount of power a supercomputer uses. Because a supercomputer’s power consumption is determined by the number of processors in the system and the processors’ speed, a “static” approach to power capping can mean throttling processor speed so that the system runs slower than its full capabilities. Although this method can effectively reduce the absolute amount of power consumed, it can negatively affect the performance and throughput of application codes. In a sense, static power capping can reduce the ability of supercomputers to fulfill their key mission: running massively parallel simulations as fast as possible.

The dynamic power capping approach developed and tested by HPE and HLRS aims to resolve this problem, using the fact that different types of codes have different power requirements. In compute-bound codes, how fast simulation software delivers a result is simply a function of the available processor speed — basically, a code performs better when the system speed is faster. In memory-bound codes, however, the time it takes for an algorithm to run depends less on processor speed and more on an HPC system’s memory and data transfer capabilities. In such cases, maximizing CPU speed does not increase overall code performance, because the algorithm must constantly spend time waiting for data transfer before performing its next calculation.
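This distinction can be framed in roofline terms: a code phase whose arithmetic intensity (floating-point operations per byte moved to and from memory) lies below the machine's balance point is limited by memory traffic rather than processor speed, so reducing its power cap costs little performance. The following Python sketch illustrates only that general heuristic; the peak figures, threshold, and function names are illustrative assumptions, not part of the HPE/HLRS framework.

```python
# Minimal sketch (not the HPE/HLRS implementation): classifying an
# application phase as compute-bound or memory-bound from measured
# rates, using a roofline-style arithmetic-intensity heuristic.
# The peak figures below are hypothetical placeholders.

PEAK_FLOPS = 4.0e12    # assumed peak floating-point rate of a node [FLOP/s]
PEAK_MEM_BW = 4.0e11   # assumed peak memory bandwidth of a node [byte/s]

def classify_phase(flops_per_s: float, bytes_per_s: float) -> str:
    """Return 'compute-bound' or 'memory-bound' for a measured phase.

    A phase whose arithmetic intensity (FLOP per byte moved) lies below
    the machine balance point is limited by memory traffic, so lowering
    the CPU power cap barely affects its runtime.
    """
    machine_balance = PEAK_FLOPS / PEAK_MEM_BW       # FLOP/byte at the roofline ridge
    intensity = flops_per_s / max(bytes_per_s, 1.0)  # measured FLOP/byte
    return "compute-bound" if intensity >= machine_balance else "memory-bound"

# Example: 0.5 TFLOP/s while streaming 300 GB/s gives ~1.7 FLOP/byte,
# well below a balance point of 10 FLOP/byte, i.e. memory-bound.
print(classify_phase(5.0e11, 3.0e11))
```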

“HPE’s dynamic power capping approach is unique in that it balances the different power requirements of these two categories of codes within a given available power budget,” explained Dr. Christian Simmendinger, an HPC performance engineer at HPE. “For memory-bound codes the available power can be capped significantly, leading to significant energy savings without causing negative effects on application performance. The power capping level is periodically optimized in an automated way, reacting to changing phases in the running of an application.”
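As a rough illustration of what one such periodic rebalancing step might look like, the sketch below redistributes a fixed system power budget among running jobs: memory-bound jobs are held at a reduced cap, and the power freed up is handed to compute-bound jobs. All numbers, names, and the job classification are hypothetical assumptions for illustration; this is not the PowerSched implementation described in the team's paper.

```python
# Minimal sketch (assumptions only, not PowerSched itself): one iteration
# of a periodic loop that redistributes a fixed system power budget among
# running jobs. Memory-bound jobs are capped low, and the remaining budget
# is shared among compute-bound jobs, clamped to per-node limits.

from dataclasses import dataclass
from typing import Dict, List

NODE_MIN_W = 300.0   # lowest cap a node accepts [W], assumed
NODE_MAX_W = 560.0   # uncapped node power [W], assumed
MEMBOUND_W = 360.0   # cap applied to memory-bound nodes [W], assumed

@dataclass
class Job:
    name: str
    nodes: int
    kind: str  # "compute-bound" or "memory-bound", from phase detection

def plan_caps(jobs: List[Job], system_budget_w: float) -> Dict[str, float]:
    """Return a per-node power cap for each job that respects the system budget."""
    caps: Dict[str, float] = {}
    # 1. Memory-bound jobs: cap them low; performance is bandwidth-limited anyway.
    for job in (j for j in jobs if j.kind == "memory-bound"):
        caps[job.name] = MEMBOUND_W
    # 2. Compute-bound jobs: share whatever budget remains, clamped to safe limits.
    spent = sum(caps[j.name] * j.nodes for j in jobs if j.name in caps)
    cb_nodes = sum(j.nodes for j in jobs if j.kind == "compute-bound")
    if cb_nodes:
        per_node = (system_budget_w - spent) / cb_nodes
        per_node = max(NODE_MIN_W, min(NODE_MAX_W, per_node))
        for job in (j for j in jobs if j.kind == "compute-bound"):
            caps[job.name] = per_node
    return caps

# One planning step: re-plan and (hypothetically) apply the caps.
jobs = [Job("cfd_run", 128, "memory-bound"), Job("md_run", 256, "compute-bound")]
for name, cap in plan_caps(jobs, system_budget_w=180_000.0).items():
    print(f"{name}: cap each node at {cap:.0f} W")  # real code would call a vendor power API here
```

In a production setting such a step would presumably run every few seconds, take its classification from hardware counters, and apply the caps through the vendor's node-level power management interface.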

The graph shows the power consumption of multiple computing racks over a period of five days. The red highlighted area shows how the dynamic power capping function balances the higher and lower power demands of multiple smaller applications of different types. The yellow area demonstrates that the overall power limit is still met once a large-scale application is executed on all racks. Image: Simmendinger et al., 2024.

The team found that balancing available power between compute-bound and memory-bound codes also reduces sudden spikes and drops in overall system power usage, facilitating a consistent, steady-state power level that adheres to HLRS’s power consumption goals. The framework can also respond dynamically if HLRS changes its desired power limit.

HPE and HLRS deployed the solution they developed on Hawk in February 2024. As their recent IEEE paper reports, careful tracking and evaluation of performance since then has identified significant energy-efficiency benefits when using dynamic power capping in comparison to a static power capping approach. Although in some cases applications can run slightly slower than they might without power capping, the energy efficiency benefits to the overall system operation far outweigh such negligible losses.

Dynamic power capping on future HLRS systems

Following the solution’s successful deployment on Hawk, HLRS and HPE now plan to extend its capabilities for HLRS’s next-generation GPU-based supercomputers, Hunter and Herder.

Perfecting dynamic power capping will be particularly important for Herder, HLRS’s upcoming exascale supercomputer, which is scheduled to arrive in 2027. To run the future system, HLRS will soon begin construction of a new facility capable of delivering up to 8 MW of power. Because operating such a facility will be expensive, it might not be desirable to run Herder at full power all the time. Dynamic power capping will make it possible to optimize the energy efficiency of applications based on system usage and the University of Stuttgart’s power consumption goals.

This innovative approach to dynamic power capping has already begun attracting attention in the supercomputing community. In October 2024, HLRS was named winner of a Datacenter Strategy Award for “Transformation.” The award recognized HLRS’s dynamic power capping approach and other initiatives the high-performance computing center has taken to optimize energy efficiency and environmental sustainability in the planning of its future infrastructure.

Christopher Williams

Related publication

Simmendinger C, Marquardt M, Mäder J, Schneider R. 2024. PowerSched – managing power consumption on overprovisioned systems. 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops).