Slurm also includes energy measurements using the RAPL interface.[1] The corresponding energy consumption is reported in the job output, as shown on the previous page.
To obtain the raw RAPL measurements, one can use the Linux Power Capping Framework through the file system.
All relevant attributes are located under /sys/devices/virtual/powercap/intel-rapl/, with the following subdirectories mapping to individual RAPL domains:
Subdirectory | Domain | Explanation |
---|---|---|
intel-rapl\:0 | package-0 | Energy consumption of CPU #1 |
intel-rapl\:0/intel-rapl\:0\:0/ | dram | Energy consumption of the DRAM for CPU #1 |
intel-rapl\:1 | package-1 | Energy consumption of CPU #2 |
intel-rapl\:1/intel-rapl\:1\:0/ | dram | Energy consumption of the DRAM for CPU #2 |
For each domain, the name file holds the name of the corresponding RAPL domain, the energy_uj file holds the current value of the energy counter in microjoules (µJ), and the max_energy_range_uj file holds the maximum value of this counter.
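As a minimal sketch of how these files can be read, the following Python snippet collects the three attributes for every package domain and its subdomains. The function name `read_rapl_domains` and the `root` parameter are illustrative and not part of any library. Domains that cannot be read (for example because energy_uj is restricted to privileged users on the system at hand) are skipped.

```python
from pathlib import Path

# Location of the RAPL attributes as described above.
RAPL_ROOT = Path("/sys/devices/virtual/powercap/intel-rapl")


def read_rapl_domains(root=RAPL_ROOT):
    """Return {relative subdirectory: attributes} for each readable RAPL domain.

    Hypothetical helper for illustration; it only reads the name, energy_uj
    and max_energy_range_uj files of package domains and their subdomains.
    """
    root = Path(root)
    domains = {}
    for pkg_dir in sorted(root.glob("intel-rapl:*")):
        # A package domain plus its subdomains, e.g. intel-rapl:0/intel-rapl:0:0
        for domain_dir in [pkg_dir, *sorted(pkg_dir.glob("intel-rapl:*:*"))]:
            try:
                attrs = {
                    "name": (domain_dir / "name").read_text().strip(),
                    "energy_uj": int((domain_dir / "energy_uj").read_text()),
                    "max_energy_range_uj": int(
                        (domain_dir / "max_energy_range_uj").read_text()
                    ),
                }
            except (OSError, ValueError):
                # Missing file, insufficient permissions, or unexpected content.
                continue
            domains[str(domain_dir.relative_to(root))] = attrs
    return domains


if __name__ == "__main__":
    for subdir, attrs in read_rapl_domains().items():
        print(subdir, attrs)
```

On a machine without RAPL support, or without read permission, the function simply returns an empty dictionary.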
Note that these counters overflow (wrap around) rather quickly, so they must be read more than once per minute, and any code that reads them must account for the wraparound.
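The wraparound can be handled when computing the difference between two consecutive readings. The following is a sketch under the assumption that the counter wraps at most once between two reads, which is why frequent sampling matters. The function names `energy_delta_uj` and `accumulate_energy_uj` are hypothetical.

```python
def energy_delta_uj(previous_uj, current_uj, max_energy_range_uj):
    """Energy in microjoules between two readings of the same counter.

    Assumes at most one wraparound happened between the two readings.
    """
    delta = current_uj - previous_uj
    if delta < 0:
        # The counter passed max_energy_range_uj and restarted near zero.
        delta += max_energy_range_uj
    return delta


def accumulate_energy_uj(readings_uj, max_energy_range_uj):
    """Total energy over a sequence of periodic readings of one domain."""
    total = 0
    for previous, current in zip(readings_uj, readings_uj[1:]):
        total += energy_delta_uj(previous, current, max_energy_range_uj)
    return total
```

For example, with a (deliberately small) maximum of 1000, the readings 900 and 50 correspond to a consumption of 150 rather than -850.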
However, reading the RAPL measurements manually is not trivial. To get the energy consumption of a specific command in your job (running on the CPUs of a single node), proceed as follows:
First, install the required Python packages in a virtual environment.
# (Install Python dependencies)
ml python/3.11
python -m venv .venv
source .venv/bin/activate
pip install pyRAPL pymongo pandas
Then, inside your job, wrap your command in a measurement to determine the energy it consumed.[2]
ml python/3.11
python -m venv .venv
source .venv/bin/activate
CMD="..." # TODO: Replace with the command to measure
# Run CMD and measure energy consumption
python3 - <<EOF
import pyRAPL, os
pyRAPL.setup()
with pyRAPL.Measurement("rapl-measurement"):
    os.system("""$CMD""")
EOF
# The energy consumption per RAPL domain is printed
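To compare runs of different length, it can help to convert the measured energy into joules and average power. The following sketch assumes that the energy values are given in microjoules (as in the energy_uj files) and that the run time is known in seconds. The helper name `summarize_energy` and its input format are hypothetical and not part of pyRAPL.

```python
def summarize_energy(energy_uj_by_domain, duration_s):
    """Convert per-domain energy (µJ) into joules and average power (W).

    Assumption: energy_uj_by_domain maps a domain name such as "package-0"
    to the energy consumed during the run, in microjoules.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    summary = {}
    for domain, energy_uj in energy_uj_by_domain.items():
        joules = energy_uj / 1_000_000  # 1 J = 1,000,000 µJ
        summary[domain] = {"joules": joules, "avg_watts": joules / duration_s}
    return summary


if __name__ == "__main__":
    # Made-up numbers for illustration, not real measurements.
    print(summarize_energy({"package-0": 30_000_000, "dram": 6_000_000}, 2.0))
```

Average power is simply energy divided by run time, so short power peaks are not visible in this summary.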
Additional information about RAPL can be found in the following resources:
[1] ⚠ Note: On HSUper, the Slurm AcctGatherEnergy RAPL plugin is used. Earlier versions of this plugin have a bug, and the implementation of the RAPL interface differs between architectures. Therefore, energy measurements (capturing the CPU and DRAM) may vary significantly between hardware platforms and cannot be directly compared or taken as the true energy consumption of the system.
[2] ⚠ Note: This only works for single-node jobs.