
HSUper provides different node types and therefore also different partitions in which compute jobs can be run. Below you find a list of the available partitions and their restrictions.

Partitions

Partition Name | Nodes per Job | Wall-clock Limit | Concurrent Jobs Limit | Nodes   | Remarks
dev            | 1-2           | 1h               | 1                     | 1-571   | For testing purposes only, max. two queued jobs
small          | 1-5           | 72h              | -                     | 3-571   | Regular nodes, exclusive node reservation
small_shared   | 1-5           | 72h              | -                     | 3-571   | Same settings as small, but node resources are shared by default
small_fat      | 1-5           | 24h              | -                     | 572-576 | Fat-memory nodes, exclusive node reservation
small_gpu      | 1-5           | 24h              | -                     | gpu 1-5 | Up to two GPUs can be allocated per job
medium         | 6-256         | 24h              | -                     | 3-571   | Regular nodes, exclusive node reservation
medium-s       | 6-32          | 24h              | 5                     | 3-571   | Regular nodes, exclusive node reservation
medium-m       | 33-64         | 24h              | 3                     | 3-571   | Regular nodes, exclusive node reservation
medium-l       | 65-256        | 24h              | 1                     | 3-571   | Regular nodes, exclusive node reservation
large          | >256          | 24h              | -                     | 3-571   | Regular nodes, exclusive node reservation; available to selected users only
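
A partition is selected with the --partition option in your batch script. The following is a minimal sketch; the job name, node count, runtime and executable are placeholders and must be adjusted to the limits of the chosen partition.

    #!/bin/bash
    #SBATCH --job-name=example          # placeholder job name
    #SBATCH --partition=small           # partition from the table above
    #SBATCH --nodes=2                   # must fit the partition's nodes-per-job range
    #SBATCH --time=12:00:00             # must stay below the partition's wall-clock limit

    srun ./my_application               # placeholder executable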

Please note: We are currently working on a fairer job partition scheme. The idea is to limit the medium partition by the number of concurrent jobs; this limit is currently set to 70 and may be reduced to 30 for the small partition. The current medium partition would then be retired or limited to one concurrent job.

Concurrent Jobs

The Concurrent Jobs Limit column defines how many jobs a single user can run concurrently. In general, at most 1000 jobs can be submitted per user; this limit prevents very slow job scheduling.
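
To see how close you are to these limits, you can count your own pending and running jobs. The line below is a generic Slurm one-liner, not specific to HSUper:

    # number of your pending and running jobs
    squeue -u "$USER" -h | wc -l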

You may use the Quality of Service preempt (e.g. #SBATCH --qos=preempt) to be allowed to run up to 1000 jobs at the same time. However, as soon as a job with a higher priority than yours is queued, your job will be cancelled within 30 seconds. Make sure to handle signals to create checkpoints, or create checkpoints periodically, so that you can resume from the last state once your job is scheduled again. Jobs cancelled due to preemption are automatically requeued.
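
Below is a sketch of a preemptible job that reacts to a termination signal. The --signal option and the checkpoint function are illustrative assumptions, not HSUper-specific settings: replace the checkpoint logic with whatever your application needs, and note that the exact signal delivered on preemption may be site-specific.

    #!/bin/bash
    #SBATCH --partition=small
    #SBATCH --qos=preempt               # up to 1000 concurrent jobs, but preemptible
    #SBATCH --signal=B:TERM@30          # ask Slurm to signal the batch shell ~30 s before the job ends

    checkpoint() {
        # placeholder: replace with your application's checkpoint command
        touch checkpoint_written
        exit 0
    }
    trap checkpoint TERM

    # run the application in the background so the batch shell can receive the signal
    srun ./my_application &
    wait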