
Details about measuring energy (or power) consumption on the HSUper cluster can be found on this page.

Energy Measurement

RAPL (CPU and DRAM Energy)

Slurm also collects energy measurements via the RAPL interface.[1] The corresponding energy consumption is reported in the job output, as shown on the previous page.

Manual Usage

To obtain the raw RAPL measurements, one can use the Linux Power Capping Framework through the file system. All relevant attributes are located under /sys/devices/virtual/powercap/intel-rapl/, with the following subdirectories mapping to individual RAPL domains:

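| Subdirectory                    | Domain    | Explanation                                |
|---------------------------------|-----------|--------------------------------------------|
| intel-rapl\:0                   | package-0 | Energy consumption of CPU #1               |
| intel-rapl\:0/intel-rapl\:0\:0/ | dram      | Energy consumption of the DRAM for CPU #1  |
| intel-rapl\:1                   | package-1 | Energy consumption of CPU #2               |
| intel-rapl\:1/intel-rapl\:1\:0/ | dram      | Energy consumption of the DRAM for CPU #2  |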

For each domain, the name file holds the name of the corresponding RAPL domain, the energy_uj file holds the current value of the counter in microjoules (μJ), and the max_energy_range_uj file holds the maximum value of this counter. Note that these counters wrap around rather quickly, so they must be read more than once per minute, and any code that reads them must account for the wraparound.
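
The manual procedure described above can be sketched in Python roughly as follows. This is a minimal illustration, not a tested tool for HSUper: the helper names (find_domains, read_uj, energy_delta_uj, measure_interval) are hypothetical, the wraparound correction assumes the counter wrapped at most once between two readings, and on some systems energy_uj may be readable only by privileged users.

```python
import time
from pathlib import Path

# Root of the powercap hierarchy named in the text above.
RAPL_ROOT = Path("/sys/devices/virtual/powercap/intel-rapl")


def find_domains(root=RAPL_ROOT):
    """Map each RAPL domain directory to the content of its name file."""
    root = Path(root)
    # Package domains sit directly under the root, DRAM domains one level deeper.
    name_files = sorted(root.glob("intel-rapl:*/name"))
    name_files += sorted(root.glob("intel-rapl:*/intel-rapl:*/name"))
    return {f.parent: f.read_text().strip() for f in name_files}


def read_uj(domain, attribute="energy_uj"):
    """Read an integer attribute (in microjoules) of one domain."""
    return int((Path(domain) / attribute).read_text())


def energy_delta_uj(before, after, max_range):
    """Energy between two readings, assuming at most one wraparound."""
    if after >= before:
        return after - before
    # The counter wrapped around: add the remainder up to the maximum
    # to the amount counted since the restart at zero.
    return (max_range - before) + after


def measure_interval(seconds, root=RAPL_ROOT):
    """Return the joules consumed per domain during a short interval.

    Keep the interval well below one minute so that the counter
    cannot wrap around more than once.
    """
    domains = find_domains(root)
    start = {d: read_uj(d) for d in domains}
    time.sleep(seconds)
    joules = {}
    for d, name in domains.items():
        delta = energy_delta_uj(start[d], read_uj(d),
                                read_uj(d, "max_energy_range_uj"))
        joules[f"{d.name} ({name})"] = delta / 1e6
    return joules
```

For example, measure_interval(10) would return a dictionary with one entry per domain on a node where the counters are readable. For longer measurements, the readings have to be repeated periodically and the deltas summed up.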

pyRAPL

However, reading the RAPL counters manually is tedious and error-prone. To get the energy consumption of a specific command in your job (running on the CPUs of a single node), the pyRAPL package can be used as follows:

First, install the required Python packages in a virtual environment.

# (Install Python dependencies)
ml python/3.11
python -m venv .venv
source .venv/bin/activate
pip install pyRAPL pymongo pandas

Then, inside your job, wrap your command in a measurement to determine the energy it consumed.[2]

ml python/3.11
python -m venv .venv
source .venv/bin/activate
CMD="..." # TODO: Replace with the command to measure
# Run CMD and measure energy consumption
python3 - <<EOF
import pyRAPL, os
pyRAPL.setup()
with pyRAPL.Measurement("rapl-measurement"):
    os.system("""$CMD""")
EOF
# The energy consumption per RAPL domain is printed
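
If the raw values are needed for further processing instead of printed output, the measurement can also be started and stopped explicitly. The following is only a sketch: summarize and measure_command are hypothetical helper names, and it assumes that pyRAPL reports the duration in microseconds and the pkg and dram values in microjoules, with one list entry per CPU socket.

```python
import subprocess


def summarize(duration_us, pkg_uj, dram_uj):
    """Convert raw readings (microseconds, microjoules) to joules and watts."""
    pkg_uj = pkg_uj or []    # a domain that was not monitored may be None
    dram_uj = dram_uj or []
    seconds = duration_us / 1e6
    total_j = (sum(pkg_uj) + sum(dram_uj)) / 1e6
    return {
        "seconds": seconds,
        "pkg_joules": [e / 1e6 for e in pkg_uj],
        "dram_joules": [e / 1e6 for e in dram_uj],
        "total_joules": total_j,
        "avg_watts": total_j / seconds if seconds > 0 else float("nan"),
    }


def measure_command(cmd):
    """Run a shell command and return its exit code and an energy summary."""
    # Imported here so that summarize() can be used without pyRAPL installed.
    import pyRAPL

    pyRAPL.setup()
    meas = pyRAPL.Measurement("rapl-measurement")
    meas.begin()
    completed = subprocess.run(cmd, shell=True)
    meas.end()
    res = meas.result
    return completed.returncode, summarize(res.duration, res.pkg, res.dram)
```

Calling measure_command with the command string from the job script would then return the exit code of the command together with the energy per socket in joules and the average power over the run.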

Additional Resources

Additional information about RAPL can be found in the following resources:


  1. ⚠ Note: On HSUper, the Slurm AcctGatherEnergy RAPL plugin is used. Earlier versions of this plugin have a bug, and the implementation of the RAPL interface differs between architectures. Therefore, energy measurements (capturing the CPU and DRAM) may vary significantly between hardware platforms and cannot be directly compared or taken as the true energy consumption of the system.

  2. ⚠ Note: This only works for single-node jobs.