Despite the global challenges posed by climate change and the emission reductions achieved in various industries in recent years, the ever-growing demand for computation has had the opposite effect in this field: computational capacity and, with it, the associated energy requirement keep growing1. Meanwhile, recent legislation has introduced explicit responsibilities to avoid or re-use waste heat2. This amplifies the need to improve the sustainability of (scientific) computing. GreenHPC is only one part of GreenIT, which comprises efforts to increase energy efficiency across all domains of computation, from smartphones and desktops to industrial applications and the cloud. With the Software Carbon Intensity (SCI) score, the Green Software Foundation aims to provide a standardized3 metric to drive the reduction of carbon emissions. The formula $SCI=(O+M)/R$ accounts for both the operational emissions $O$ and the embodied emissions $M$ per functional unit $R$. The operational emissions are the product of the energy consumed and the region-specific carbon intensity, while the embodied emissions capture the share of the total emissions attributable to producing the hardware components themselves.
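To make the formula concrete, here is a minimal sketch of an SCI-style calculation in Python; all numbers (energy, carbon intensity, embodied share, functional units) are invented example values, not measurements.

```python
# Minimal SCI-style calculation; all values are invented for illustration.
energy_kwh = 120.0        # energy consumed by the workload (kWh)
carbon_intensity = 0.4    # regional grid carbon intensity (kg CO2e per kWh)
embodied_kg = 15.0        # share of the hardware's embodied emissions attributed to the workload (kg CO2e)
functional_units = 1000   # R, e.g. number of simulations or processed datasets

operational_kg = energy_kwh * carbon_intensity           # O = E * I
sci = (operational_kg + embodied_kg) / functional_units  # SCI = (O + M) / R
print(f"SCI = {sci:.3f} kg CO2e per functional unit")
```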
For supercomputers, however, energy consumption is not only an aggregate ecological but also an individual economic concern, since it can account for up to 50% of the total cost of ownership of such systems.4 The most intuitive first step is to power up computing resources only when they are actually needed, which is why HSUper shuts down idle nodes after fifteen minutes. Beyond this, improving energy efficiency is a necessary part of the path towards exascale computing. In that spirit, a companion to the Top500 list of the 500 fastest computers, the Green500 list, tracks the 500 most energy-efficient HPC systems, i.e. those with the highest GFLOPS per watt. While improvements in hardware efficiency lead to better theoretical values, energy efficiency is also determined by application software and its usage, making this an iterative (see below), collective challenge for everybody involved with HPC.
The first step researchers can take towards improving the energy efficiency of their scientific computations is to measure and report the associated energy consumption. “Putting an energy/carbon price on scientific output” incentivizes identifying potential for optimization and choosing different methods based on their energy efficiency.
By proactively reporting energy consumption and carbon emissions associated with their scientific output, researchers can promote these metrics in their respective communities, raising awareness for green computing and motivating innovation which may lead to further improvements in energy efficiency.
However, measuring the energy consumption of HPC workloads is not trivial. This already starts with the definition of the measurement scope, since a cluster consists not only of the CPU and RAM installed in each compute node, but also of various types of hardware shared between multiple or all compute nodes, such as networking and storage devices as well as cooling equipment. A metered power distribution unit (PDU) or power supply unit (PSU) can be used to measure the actual power consumption of connected devices with a resolution of seconds and watts, and the total impact can be roughly extrapolated by multiplying it with the power usage effectiveness (PUE) of the data center, which indicates the overhead from non-IT energy consumption. However, these devices/metrics are not always available, and accessing the measurement data may not be possible for regular users. Therefore, hardware counters provide an alternative way of measuring the energy consumed by the components that can be directly attributed to a single job: the CPU and RAM. While IPMI also offers power-monitoring capabilities, the CPU-hardware-counter-based RAPL interface is usually used on Intel and AMD systems; its counters can only be read on the system itself, therefore incur some minimal overhead, and are only available after boot-up.
An overview of energy measurement methods discussed above is provided in the following table:
| Scope | Measurement method | External measurement |
|---|---|---|
| Cluster level (incl. compute, cooling, networking, storage, etc.) | Electricity meter | ✅ |
| Rack level (multiple nodes and switches) | Metered PDU | ✅ |
| Chassis/node level (switch / four HSUper nodes) | Outlet-metered PDU or PSU over IPMI | ✅ |
| Sub-node level (processor, core, memory) | RAPL | ❌ |
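At the sub-node level, Linux typically exposes the RAPL counters via the powercap sysfs interface. The following is a minimal sketch of reading them directly (paths, available domains, and read permissions vary between systems, and counter wrap-around is ignored for brevity); it is not the mechanism used by HSUper's Slurm integration.

```python
import time
from pathlib import Path

# RAPL domains exposed by the Linux powercap framework,
# e.g. intel-rapl:0 for CPU package 0 (the same driver also covers AMD CPUs).
domains = sorted(Path("/sys/class/powercap").glob("intel-rapl:*"))

def read_energy_uj(domain: Path) -> int:
    # Cumulative energy counter in microjoules.
    return int((domain / "energy_uj").read_text())

before = {d: read_energy_uj(d) for d in domains}
time.sleep(5)  # run or wait for the workload of interest here
after = {d: read_energy_uj(d) for d in domains}

for d in domains:
    joules = (after[d] - before[d]) / 1e6
    print(f"{(d / 'name').read_text().strip()}: {joules:.2f} J over 5 s")
```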
On HSUper, the Slurm batch system provides an energy consumption and carbon emission estimate in the last line of your job's output file:
$ sbatch --output out.txt --wait my_job.sh >/dev/null
$ tail -n 1 ./out.txt
Energy (CPU+Mem) : 0.49kWh (0.20kg CO2, 0.25€)
⚠️ Note that this value does not reflect the actual total energy consumption of the hardware; in particular, the carbon emissions and electricity cost are only estimates derived using a simple linear model based on historical data.
More information about energy consumption estimation using RAPL can be found in the HSUper documentation.
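For a rough idea of how such an estimate is obtained, the sketch below converts the reported energy value into carbon emissions and electricity cost using assumed average conversion factors; the factors are placeholders and not the values used on HSUper.

```python
# Convert a reported energy value into rough CO2 and cost estimates.
# Both conversion factors are assumed averages for illustration only.
energy_kwh = 0.49      # value reported in the job's output file
kg_co2_per_kwh = 0.40  # assumed average grid carbon intensity
eur_per_kwh = 0.50     # assumed average electricity price

print(f"{energy_kwh * kg_co2_per_kwh:.2f} kg CO2, "
      f"{energy_kwh * eur_per_kwh:.2f} EUR")
```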
Additionally, there are several portable tools that provide access to energy measurements from these lower-level interfaces, such as PowerAPI and PowerJoular, as well as LIKWID, xbat, and PAPI, which also collect other performance metrics.
GPU power consumption can be measured using the NVIDIA System Management Interface with `nvidia-smi -q -d POWER`.
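The same data can also be read programmatically, for example via the NVML Python bindings (`pynvml`, shipped as `nvidia-ml-py`); the sketch below assumes an NVIDIA driver and at least one visible GPU.

```python
import pynvml  # provided by the nvidia-ml-py package

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)              # first visible GPU
power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"GPU 0: {power_w:.1f} W, {util.gpu}% utilization")
pynvml.nvmlShutdown()
```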
On platforms where the underlying interfaces are not available, such as ARM-based systems, (application- and system-specific) energy models may be developed based on PDU measurements for benchmark cases. However, developing such models is very much a non-trivial task, since a multitude of factors may affect energy consumption.
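As a toy illustration of such a model, the sketch below fits a simple linear power model to hypothetical PDU readings taken at different node utilization levels; real models typically require many more predictors (clock frequency, memory traffic, temperature, ...) and careful validation.

```python
import numpy as np

# Hypothetical benchmark data: node utilization (fraction) vs. PDU power reading (W).
utilization = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
power_w = np.array([95.0, 160.0, 230.0, 300.0, 370.0])

# Fit a linear model P(u) = idle_power + slope * u.
slope, idle_power = np.polyfit(utilization, power_w, 1)

def estimate_energy_kwh(avg_utilization: float, runtime_h: float) -> float:
    return (idle_power + slope * avg_utilization) * runtime_h / 1000.0

print(f"Estimated energy: {estimate_energy_kwh(0.8, 2.0):.2f} kWh")
```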
Since the carbon intensity of the grid also varies over time, estimating the true carbon emissions associated with individual HPC workloads is a non-trivial challenge. Tools such as Tracarbon or CodeCarbon estimate carbon emissions based on real-time energy measurements, while emissions can also be computed a posteriori using the Green Algorithms online calculator or a script processing Slurm accounting data on the cluster. Tracarbon and CodeCarbon furthermore offer a simple API to measure parts of Python applications, which is very relevant in the context of ML and AI applications.
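As an example of such an API, the snippet below uses CodeCarbon's `EmissionsTracker` as a context manager to attribute emissions to a specific part of a Python application; the project name and the workload function are placeholders.

```python
from codecarbon import EmissionsTracker

def run_workload():
    # placeholder for the actual computation to be measured
    sum(i * i for i in range(10_000_000))

# Everything executed inside the context manager is attributed to this
# measurement; by default, results are also written to an emissions.csv file.
with EmissionsTracker(project_name="my_hpc_job"):
    run_workload()
```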
Besides using more efficient algorithms and implementations, users can take several steps to improve the energy efficiency of their computations.
Be scheduler friendly:
Do not request excessive wall times for your jobs to avoid suboptimal scheduling and potentially unnecessary power-cycles of compute nodes.
If the wall time of your job cannot be determined a priori, consider making it restartable using checkpoints (see the sketch below).
Shorter wall times also result in lower wait times for your jobs due to backfilling by the scheduler.
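A minimal sketch of such a checkpoint/restart pattern, assuming the application state can be serialized (the file name, step count, and workload are arbitrary placeholders):

```python
import pickle
from pathlib import Path

CHECKPOINT = Path("state.pkl")  # arbitrary checkpoint file name
TOTAL_STEPS = 1_000

# Resume from the last checkpoint if one exists, otherwise start fresh.
state = pickle.loads(CHECKPOINT.read_bytes()) if CHECKPOINT.exists() else {"step": 0, "result": 0.0}

while state["step"] < TOTAL_STEPS:
    state["result"] += state["step"] ** 0.5  # placeholder for real work
    state["step"] += 1
    if state["step"] % 100 == 0:             # checkpoint periodically
        CHECKPOINT.write_bytes(pickle.dumps(state))

print(state["result"])
```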
Only allocate a GPU if you are using it:
This is self-explanatory, but if you allocate a GPU on HSUper, make sure that your code is actually using it. When in doubt, check with `nsys profile -o report ./my_app && nsys stats report.nsys-rep`.
Choose the right technology:
If possible, choose a more energy-efficient programming language5 and architecture6.
Consider a different clock frequency:
CPU power consumption depends on the processor load, the square of the operating voltage, and the clock frequency.7 Since the clock frequency scales with the operating voltage, the relation between energy consumption and clock frequency is nonlinear.
Therefore, reducing the clock frequency through Slurm, e.g. with `srun --cpu-freq=low-medium:OnDemand ./my_app`, may yield lower energy consumption depending on the type of application being run (see the toy model below).
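To illustrate why the effect depends on the application, here is a deliberately simplified toy model (all constants are invented): dynamic power is assumed to scale with the cube of the frequency (voltage roughly proportional to frequency), static power is constant, and only the compute-bound fraction of the runtime shrinks with higher clock rates.

```python
# Toy model of energy vs. clock frequency; all constants are invented.
def energy_joules(freq_ghz: float, compute_fraction: float) -> float:
    base_runtime_s = 100.0  # runtime at 1 GHz
    # Only the compute-bound fraction of the runtime scales with frequency.
    runtime = base_runtime_s * (compute_fraction / freq_ghz + 1.0 - compute_fraction)
    dynamic_power = 20.0 * freq_ghz ** 3  # ~ C * V^2 * f with V ~ f
    static_power = 40.0                   # constant idle/leakage power (W)
    return (dynamic_power + static_power) * runtime

for f in (0.5, 1.0, 1.5, 2.0, 2.5):
    print(f"{f:.1f} GHz: compute-bound {energy_joules(f, 1.0):7.0f} J, "
          f"memory-bound {energy_joules(f, 0.3):7.0f} J")
```

In this toy model, lowering the clock rate below a certain point starts to hurt the compute-bound case while still helping the memory-bound one, which is exactly the kind of application dependence that the measurement methods above can reveal.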
Consider over-/undersubscription:
Oversubscription of CPU cores may result in reduced time to solution and energy consumption.8
However, even undersubscription of CPU cores may lead to improvements in memory-bound scenarios.7
In this case, allocating compute- and memory-bound applications on the same resources may help increase utilization.
Model the trade-off between energy consumption and scientific output:
Depending on the research domain, it may be possible to explicitly model the trade-off between the computational requirements, which determine energy consumption, and the scientific output, which is limited by metrics such as accuracy.9
Given such a model, the desired insights can be generated at minimal energy cost (a toy example follows below).
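As a toy illustration of such a trade-off model (the functional forms and constants are invented), the sketch below selects the coarsest simulation resolution that still meets a prescribed accuracy requirement and therefore consumes the least energy.

```python
# Toy trade-off model: error and energy as functions of simulation resolution.
# Functional forms and constants are invented for illustration.
def expected_error(resolution: int) -> float:
    return 1.0 / resolution ** 2   # e.g. a second-order convergent method

def expected_energy_kwh(resolution: int) -> float:
    return 1e-6 * resolution ** 3  # cost grows cubically with resolution

TOLERANCE = 1e-4
# Choose the coarsest resolution that satisfies the accuracy requirement.
resolution = min(r for r in range(10, 501, 10) if expected_error(r) <= TOLERANCE)
print(f"resolution {resolution}: error {expected_error(resolution):.1e}, "
      f"energy {expected_energy_kwh(resolution):.2f} kWh")
```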
In this article, methods for measuring and improving the energy efficiency of applications in HPC have been presented. However, none of these methods is a “silver bullet”, since they make different trade-offs in terms of measurement scope, overhead, and frequency, and have different requirements regarding the system, its usage, and the instrumented application. Irrespective of this fact, administrators, developers, and users in the HPC space should take measures to quantify and improve energy efficiency, especially as regulations and energy-based rather than core-hour-based billing should be expected to make this an explicit responsibility in the future.
J. Manner. “Black software — the energy unsustainability of software systems in the 21st century,” in Oxford Open Energy, vol. 2, 2022. ↩︎
§11-12, Gesetz zur Steigerung der Energieeffizienz in Deutschland (Energieeffizienzgesetz - EnEfG) ↩︎
Information technology — Software Carbon Intensity (SCI) specification ISO/IEC 21031:2024, 2024. ↩︎
E. Suarez et al., “Energy-aware operation of HPC systems in Germany,” Frontiers in High Performance Computing, vol. 3, 2025. ↩︎
R. Pereira et al., “Energy efficiency across programming languages: how do energy, time, and memory relate?,” in Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering, 2017, pp. 256–267. ↩︎
D. Harris, “What’s up? watts down - more science, less energy,” NVIDIA Blog, https://blogs.nvidia.com/blog/gpu-energy-efficiency-nersc/, 2023, (accessed Mar. 12, 2025). ↩︎
R. Prichard, W. Strasser. “When Fewer Cores Is Faster: A Parametric Study of Undersubscription in High-Performance Computing,” in Cluster Computing, vol. 27, no. 7, pp. 9123–9136, 2024. ↩︎ ↩︎
E. Fought et al., “Saving time and energy with oversubscription and semi-direct Møller–Plesset second order perturbation methods,” in Journal of Computational Chemistry, vol. 38, no. 11, pp. 830–841, 2017. ↩︎
A. Das Sharma, R. Horn, and P. Neumann, “The Error-Energy Tradeoff in Molecular and Molecular-Continuum Fluid Simulations,” in Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, 2024, pp. 111–121. ↩︎