Slurm

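Troubleshooting

Error: slurm_receive_msg: Zero Bytes were transmitted or received

Cause: MUNGE authentication is failing (for example, munged is not running on one side, or the munge.key files differ between nodes)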
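To narrow this error down, it can help to check whether MUNGE credentials can be created and decoded at all. The sketch below wraps the usual `munge -n | unmunge` round trip in a small function. The function name `check_munge` and the host name in the usage line are made up for illustration, and passwordless ssh to the node is assumed.

```shell
# Sketch: verify MUNGE credentials locally and, optionally, on a remote node.
# check_munge is a hypothetical helper name, not part of MUNGE or Slurm.
check_munge() {
    local remote="${1:-}"   # optional host name, e.g. node001 (hypothetical)

    # Local round trip: create a credential and decode it on the same host.
    if ! munge -n | unmunge >/dev/null; then
        echo "local MUNGE round trip failed (is munged running?)" >&2
        return 1
    fi

    # Remote decode: typically fails if munge.key differs or clocks drift apart.
    if [ -n "$remote" ]; then
        if ! munge -n | ssh "$remote" unmunge >/dev/null; then
            echo "credential rejected on $remote (check munge.key and clock)" >&2
            return 2
        fi
    fi
    echo "MUNGE OK"
}
```

Usage: `check_munge` on the head node, then `check_munge node001` for a compute node that reports the error.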

Node down

scontrol update NodeName=NODENAME State=RESUME
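Before resuming a node it is usually worth checking why it went down (`sinfo -R` lists the recorded reasons). The following sketch filters node-oriented `sinfo` output for down nodes and resumes them. The helper names `list_down_nodes` and `resume_nodes` and the `DRY_RUN` switch are assumptions made for this example.

```shell
# Sketch: find nodes in a "down" state and resume them.
# Helper names and the DRY_RUN variable are hypothetical.

# Reads `sinfo -h -N -o "%N %T"` output on stdin, prints each down node once.
# A trailing "*" in the state (e.g. "down*") marks a non-responding node.
list_down_nodes() {
    awk '$2 ~ /^down/ { print $1 }' | sort -u
}

# Reads node names on stdin and resumes each one.
# With DRY_RUN set, the scontrol commands are only printed.
resume_nodes() {
    local node
    while read -r node; do
        [ -n "$node" ] || continue
        ${DRY_RUN:+echo} scontrol update NodeName="$node" State=RESUME </dev/null
    done
}
```

Usage: `sinfo -h -N -o "%N %T" | list_down_nodes | DRY_RUN=1 resume_nodes` prints the commands that would be run; drop `DRY_RUN=1` to execute them.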

Features

slurmctld server down → nodes (and jobs) keep running
mysql daemon down → no job submission possible, but running jobs are not aborted

Partitions (queues) and group restrictions

Partitions can be restricted to groups (or accounts) in slurm.conf (AllowGroups or AllowAccounts) → users who do not belong to the group do not see the partition at all
If a user tries to submit to a queue that has not been opened for them, they are told that they are not permitted to use that queue
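To see which restrictions are actually in effect, an administrator can inspect a partition with `scontrol show partition`. The sketch below pulls the access-control fields out of that output. The helper name `partition_acl` is an assumption, and the group name in the sample is invented.

```shell
# Sketch: show only the access-control fields of a partition.
# partition_acl is a hypothetical helper; it reads
# `scontrol show partition <name>` output on stdin.
partition_acl() {
    # The output consists of space-separated Key=Value pairs spread over
    # several lines, so split on spaces and keep the fields of interest.
    tr ' ' '\n' | grep -E '^(PartitionName|AllowGroups|AllowAccounts)='
}
```

Usage: `scontrol show partition amd | partition_acl` might print `PartitionName=amd`, `AllowGroups=hpcusers` and `AllowAccounts=ALL`, one per line.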

Todo

ACLs: user - account - partition
check MPI
→ routing queue
ACLs: miid-db sync
test GPU job
job preemption
SLURM Energy Accounting Plugin

Config/Plugins

use priority FIFO scheduler with backfill

SchedulerType=sched/backfill

sort queued jobs by priority, computed from 5 factors: age, fair-share, job size, partition, QOS

PriorityType=priority/multifactor

select nodes using consumable resources with CORES and MEMORY

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

manage a node's resources via cgroups

TaskPlugin=task/cgroup

track processes with cgroups

ProctrackType=proctrack/cgroup

GPU with general resources (gres)

GresTypes=gpu

preemption for bulldozer nodes: jobs scheduled on partition 'amd' may preempt running jobs

PreemptType=preempt/partition_prio
PreemptMode=SUSPEND

accounting via slurm db

AccountingStorageEnforce=associations
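After editing slurm.conf it is easy to lose track of which values the running controller really uses. This sketch extracts a single value from `Key = Value` style text, which covers both `scontrol show config` output and slurm.conf lines. The helper name `conf_value` is an assumption, and the match is case-sensitive although Slurm itself treats keys case-insensitively.

```shell
# Sketch: print the value of one configuration key.
# conf_value is a hypothetical helper; it reads "Key = Value" lines on stdin.
conf_value() {
    awk -v k="$1" '{
        i = index($0, "=")            # split at the first "=" only,
        if (i == 0) next              # values such as ports=... contain more
        name = substr($0, 1, i - 1)
        gsub(/[ \t]+/, "", name)      # drop padding around the key
        if (name == k) {
            val = substr($0, i + 1)
            sub(/^[ \t]+/, "", val)
            sub(/[ \t]+$/, "", val)
            print val
            exit
        }
    }'
}
```

Usage: `scontrol show config | conf_value PreemptType` for the live setting, or `conf_value PreemptType < slurm.conf` for the file on disk.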

Consumable Resource Allocation Plugin: select/cons_res

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

Multi-factor Job Priority Plugin

# Activate the Multi-factor Job Priority Plugin with decay
PriorityType=priority/multifactor
# 2 week half-life
PriorityDecayHalfLife=14-0
# The job's age factor reaches 1.0 after waiting in the
# queue for 2 weeks.
PriorityMaxAge=14-0
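The effect of these two values can be illustrated with a little arithmetic. The sketch assumes that the age factor grows linearly up to PriorityMaxAge and that recorded usage halves once per PriorityDecayHalfLife; both functions are illustrative helpers, not Slurm commands.

```shell
# Sketch of the arithmetic behind the two settings above.
# Assumption: linear age factor capped at 1.0, exponential usage decay.

# age_factor <days waited> <PriorityMaxAge in days>
age_factor() {
    awk -v a="$1" -v m="$2" 'BEGIN {
        f = a / m
        if (f > 1) f = 1              # capped once PriorityMaxAge is reached
        printf "%.2f\n", f
    }'
}

# decayed_usage <usage> <days elapsed> <PriorityDecayHalfLife in days>
decayed_usage() {
    awk -v u="$1" -v d="$2" -v h="$3" 'BEGIN {
        printf "%.2f\n", u * 2^(-d / h)   # halves every h days
    }'
}
```

With the 14-day values above, `age_factor 7 14` gives 0.50 and `decayed_usage 1000 28 14` gives 250.00. On a live system, `sprio -l` shows the factors Slurm actually computed for pending jobs.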

Comments

Chance findings: in slurm.conf

A port range has to be agreed on, otherwise Slurm's own MPI support does not work, e.g. MpiParams=ports=12000-12999 (the range must lie within the valid port numbers 1-65535)
For nodes to return to service automatically after a reboot, '1' is not enough; ReturnToService=2 is required

-- Cluster.stucki - 24 Apr 2014

Skeleton (outline) of the page still to be written on SBATCH/SRUN/MPI: SlurmIntro

-- Cluster.stucki - 25 Apr 2014

In some places AdminLevel={0,1,2} is shown. It can only be set with
sacctmgr …user... set AdminLevel={none|operator|admin}
Anyone who is 'admin' can create reservations and partitions with sview in admin mode.

-- Cluster.stucki - 29 Apr 2014
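As a sketch of the AdminLevel note above, the function below validates the level and then calls `sacctmgr modify user`. The function name `set_admin_level`, the `DRY_RUN` switch and the user name in the usage line are assumptions; `-i` makes sacctmgr commit without asking for confirmation.

```shell
# Sketch: set a user's AdminLevel via sacctmgr.
# set_admin_level and DRY_RUN are hypothetical names for this example.
set_admin_level() {
    local user="$1" level="$2"

    # Only the three levels named in the note above are accepted.
    case "$level" in
        none|operator|admin|None|Operator|Admin) ;;
        *) echo "invalid AdminLevel: $level" >&2; return 1 ;;
    esac

    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} sacctmgr -i modify user where name="$user" set AdminLevel="$level"
}
```

Usage: `DRY_RUN=1 set_admin_level alice operator` prints the command; `sacctmgr show user alice` can be used afterwards to check the result.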

Switching a partition (queue) on, pausing it, draining it, or switching it off:
scontrol update partitionname=NNN State={ UP | INACTIVE | DRAIN | DOWN }

-- Cluster.stucki - 30 Apr 2014
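The four partition states differ in whether new jobs are accepted and whether queued jobs may start. The sketch below records that in comments, as described in the scontrol documentation, and wraps the command from the note above. The function name `set_partition_state` and the `DRY_RUN` switch are assumptions.

```shell
# Sketch: change a partition's state with a sanity check.
# set_partition_state and DRY_RUN are hypothetical names for this example.
#
#   UP        new jobs are queued, queued jobs may start
#   DOWN      new jobs are queued, queued jobs do not start
#   DRAIN     new jobs are rejected, queued jobs may still start
#   INACTIVE  new jobs are rejected, queued jobs do not start
set_partition_state() {
    local part="$1" state="$2"

    case "$state" in
        UP|DOWN|DRAIN|INACTIVE) ;;
        *) echo "invalid partition state: $state" >&2; return 1 ;;
    esac

    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} scontrol update PartitionName="$part" State="$state"
}
```

Usage: `set_partition_state amd DRAIN` before maintenance, then `set_partition_state amd UP` afterwards.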

(From the FAQ) What process should I follow to add nodes to Slurm?

The slurmctld daemon has a multitude of bitmaps to track state of nodes and cores in the system. Adding nodes to a running system would require the slurmctld daemon to rebuild all of those bitmaps, which the developers feel would be safer to do by restarting the daemon. Communications from the slurmd daemons on the compute nodes to the slurmctld daemon include a configuration file checksum, so you probably also want to maintain a common slurm.conf file on all nodes. The following procedure is recommended:

Stop the slurmctld daemon (e.g. "/etc/init.d/slurm stop" on the head node)
Update the slurm.conf file on all nodes in the cluster
Restart the slurmctld daemon (e.g. "/etc/init.d/slurm start" on the head node)
Start the slurmd daemons on the new nodes (e.g. "/etc/init.d/slurm start" on those nodes)
Have all slurmd daemons read the new configuration file (e.g. "scontrol reconfig", no need to restart the daemons)

-- Cluster.stucki - 27 Jun 2014
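The five FAQ steps above can be sketched as one function run on the head node. The init script path is the one quoted in the FAQ. Copying with scp, the location /etc/slurm/slurm.conf, passwordless ssh, and the names `add_nodes` and `DRY_RUN` are assumptions for this example and will differ between sites.

```shell
# Sketch of the FAQ procedure for adding nodes, run on the head node.
# usage: add_nodes "<all nodes>" "<new nodes>"   (space-separated lists)
# add_nodes, DRY_RUN, scp and the slurm.conf path are assumptions.
add_nodes() {
    local all_nodes="$1" new_nodes="$2" node
    local run="${DRY_RUN:+echo}"    # print commands instead of running them
    local conf=/etc/slurm/slurm.conf

    # 1. Stop the slurmctld daemon on the head node.
    $run /etc/init.d/slurm stop

    # 2. Distribute the already edited slurm.conf to every node.
    for node in $all_nodes; do
        $run scp "$conf" "$node:$conf"
    done

    # 3. Restart the slurmctld daemon.
    $run /etc/init.d/slurm start

    # 4. Start slurmd on the new nodes.
    for node in $new_nodes; do
        $run ssh "$node" /etc/init.d/slurm start
    done

    # 5. Have all slurmd daemons re-read the configuration.
    $run scontrol reconfig
}
```

Usage: `DRY_RUN=1 add_nodes "node001 node002 node003" "node003"` prints the command sequence for review before running it for real.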

Topic revision: 27 Jun 2014, stucki
