High-Performance Computing Center Stuttgart (HLRS): EE-HPC

EE-HPC is testing an approach for improving energy efficiency in HPC systems by automatically regulating system parameters and settings based on current job requirements.

Energy usage by high-performance computing (HPC) centers is a deciding factor in the procurement and operation of HPC systems. Over the life cycle of an HPC system, energy costs make up a substantial part of its total cost. Even when all consumed resources are taken into account, energy usage remains the dominant factor.

Some large Tier 0/1 HPC centers regulate the energy consumption of the entire system by limiting the energy usage of individual applications. This approach relies mainly on relatively simple measures, such as capping the CPU frequency or powering down entire compute nodes.

Modern systems, however, offer a growing number of options with high energy-saving potential. For example, adjusting system parameters and settings in the OpenMP and MPI runtime environments can improve performance and thereby make energy usage more efficient. Further opportunities include balancing global load imbalances and optimizing MPI collective operations. Determining the optimal settings is difficult, however, particularly on HPC systems that run highly diverse applications, where a single global setting is often undesirable.

The goal of EE-HPC is to improve the overall energy efficiency of HPC centers by optimally adjusting the system parameters that influence energy usage (not only for CPUs but also for memory, input/output (I/O), and network) according to the jobs and job phases running at any given time. These parameters are to be regulated and optimized comprehensively and transparently. The project will deliver an open-source production environment for job-specific performance and energy modeling, including a method for optimizing and controlling runtime and system parameters.
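The idea of choosing settings per job phase, as described above, can be illustrated with a deliberately simplified sketch. It assumes that each job phase has already been profiled under a few candidate settings (here, hypothetical CPU frequencies) and that energy is estimated as average power multiplied by runtime. The names `Measurement` and `select_setting`, the 10 % slowdown limit, and all numbers are illustrative assumptions and are not taken from EE-HPC.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical profile of one job phase under one candidate setting."""
    setting: str         # illustrative label, e.g. a CPU frequency such as "2.0GHz"
    avg_power_w: float   # average node power draw in watts
    runtime_s: float     # phase runtime in seconds

    @property
    def energy_j(self) -> float:
        # Energy-to-solution estimate: E = P_avg * t
        return self.avg_power_w * self.runtime_s


def select_setting(measurements: list[Measurement], max_slowdown: float = 1.10) -> Measurement:
    """Return the lowest-energy setting whose runtime stays within
    max_slowdown of the fastest measured setting for this phase."""
    fastest = min(m.runtime_s for m in measurements)
    allowed = [m for m in measurements if m.runtime_s <= max_slowdown * fastest]
    return min(allowed, key=lambda m: m.energy_j)


# Hypothetical profiling data for two phases of one job.
phases = {
    # Compute-bound: runtime grows noticeably as the frequency drops.
    "compute": [
        Measurement("2.4GHz", 400.0, 100.0),
        Measurement("2.0GHz", 340.0, 108.0),
        Measurement("1.6GHz", 290.0, 135.0),  # exceeds the 10 % slowdown limit
    ],
    # Memory-bound: runtime barely changes, so lower frequencies save energy.
    "memory": [
        Measurement("2.4GHz", 380.0, 100.0),
        Measurement("2.0GHz", 320.0, 101.0),
        Measurement("1.6GHz", 270.0, 103.0),
    ],
}

for name, candidates in phases.items():
    best = select_setting(candidates)
    print(f"{name}: {best.setting} ({best.energy_j:.0f} J)")
# compute: 2.0GHz (36720 J)
# memory: 1.6GHz (27810 J)
```

The slowdown limit reflects the trade-off behind per-job tuning: the lowest-energy setting is of little use if it delays jobs too much. Different phases of the same job can end up with different settings, which is why a single global setting is often undesirable. A real system would also have to cover memory, I/O, and network parameters and detect phase changes at runtime.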

The composition of the consortium (Tier 0/1 and Tier 2 centers, plus the DKRZ as a central national service provider), together with its connections to partners in the Gauss Centre for Supercomputing (GCS), the NHR Alliance, and Tier 3 centers (HPC.NRW, Konwihr, bwHPC), is intended to ensure that the project results are widely used over the long term.

Runtime

1 September 2022 to 31 August 2025

Contact

Jose Gracia