Contemporary research pipelines often employ Python, either for data analysis or for data generation (e.g., simulation) itself. On HSUper, Python interpreters and the corresponding packages can be installed via Spack or via miniforge3; the latter is the most suitable option for typical users. This article briefly describes how to configure miniforge3 on HSUper and provides an example installation of typical deep learning libraries into a user-defined environment.
Please note that Python and pip are already installed on each node, but they are only available via an alias for the respective Python version, e.g., python3.6 and pip3.6. Be aware of this distinction to avoid confusion with system-wide installations.
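For example, you can check which interpreter and pip these version-suffixed commands refer to (assuming Python 3.6 is the system version, as in the example above):
python3.6 --version
pip3.6 --version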
miniforge3 is a management system for your scientific Python stack. It is available via the module system and the initial setup can be performed via
module load miniforge3
conda init
You need to execute the conda init command only once. To restore your shell configuration to its original state, you can run conda init --reverse --all.
Note that you need to close and re-open your SSH connection to HSUper in order for the changes to take effect.
The initial configuration adds conda permanently to your environment. However, to use the conda command you still need to load the miniforge3 module:
module load miniforge3
Since conda relies on a specific Python version and the python alias, attempting to use the conda command without first loading the miniforge3 module will result in an error message.
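As a quick sanity check after re-connecting, you can load the module and run a harmless conda command; both should complete without an error:
module load miniforge3
conda --version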
Typically, conda users define a conda environment for each project. Such an environment contains a dedicated Python interpreter and all the respective libraries. Environments do not interfere with each other (i.e., you can install different versions of the same library into different environments without causing any problems).
As an example, consider a project in which you perform image classification using the PyTorch library. Assume further that your specific workflow is only compatible with the (slightly outdated) Python 3.11. At the start of the project, you then define a new conda environment named img_class_project with the correct Python version by pasting the following command into your terminal:
conda create -n "img_class_project" python=3.11
When creating a new conda environment, you can optionally specify which Python version to use upfront, avoiding the need to install it separately later.
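You can list all environments known to conda at any time; the currently active environment is marked with an asterisk:
conda env list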
Installing Python with conda currently breaks the conda command. A workaround is to install everything within an environment that has no Python version installed by conda, using conda install <package> -n <myenv>. This was necessary when using miniforge3 version 24.3.0.
After confirming the installation, you can now switch into your newly created environment by using
conda activate img_class_project
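After activation, you can verify that the environment's interpreter is being used; the path should point into your conda environments directory and, for this example, the version should be 3.11:
which python
python --version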
As of miniforge3 version 24.3.0, running the conda command will exit with an error if Python in the active environment was installed using conda install.
Note that you can deactivate your currently active environment with the command
conda deactivate
Let us again assume the image classification project from above and switch into the newly created environment as described before. Typically, users want to install packages using pip. However, if you used miniforge3 version 24.3.0, the conda solver already installed pip if you defined a Python version for the environment or installed Python manually. If pip is not yet present in your environment, execute the following command to install it into your currently active environment:
conda install pip
or you can install packages into another environment from any environment using the following command:
conda install pip -n "img_class_project"
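If you are unsure which pip the shell resolves to inside the active environment, you can check its origin:
which pip
pip --version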
The conda solver installs pip alongside Python, which breaks (with miniforge3 version 24.3.0) the conda command [ModuleNotFoundError: No module named ‘conda’]. To avoid this issue, simply run conda commands from outside the environment where Python was installed using conda.
You can then install any pip-compatible package, e.g.,
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Note again, that the respective package will only be installed into your current environment. Other environments will remain unaffected.
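As an optional check that PyTorch was installed into the active environment, you can import it directly; note that the CUDA check will only report True on a GPU node:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"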
Using conda together with the job scheduler SLURM on HSUper is straightforward. However, users must distinguish between regular jobs (i.e., jobs submitted via jobscripts) and interactive jobs.
Users can use their regular jobscripts as described in the documentation, e.g.:
#!/bin/bash
#SBATCH --job-name=conda_tutorial
#SBATCH --partition=small_gpu
#SBATCH --nodes=1
#SBATCH --time=1-00:00:00
#SBATCH --gpus=1
module load miniforge3
python main.py
SLURM will execute the respective job in the conda environment that is active at the time of job submission, i.e., when sbatch is invoked.
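A typical submission therefore looks as follows (job.sh is a placeholder name for a file containing the jobscript above):
module load miniforge3
conda activate img_class_project
sbatch job.sh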
Unlike with jobscripts, SLURM does not transfer the current conda environment into an interactive job. Users must hence activate the desired environment again after ssh’ing to the allocated interactive node. Consider the following example:
salloc --time=10:00 --partition=dev
# ssh to the allocated node, e.g.:
ssh node002  # if node002 was allocated for the interactive job
module load miniforge3
conda activate img_class_project
python main.py  # execute your Python commands