Troubleshooting
Error: slurm_receive_msg: Zero Bytes were transmitted or received
Cause: MUNGE authentication is failing
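A quick way to narrow down a MUNGE problem is to encode a credential on the head node and decode it locally and on the affected node. The following is a minimal sketch, assuming ssh access to the node; the function name `check_munge` and the host name passed to it are hypothetical.

```shell
# Sketch: check that a credential created on this host decodes locally and on a node.
# The node name is a placeholder; ssh access to it is an assumption.
check_munge() {
    node=$1
    # Local round trip: encode an empty payload and decode it again.
    munge -n | unmunge >/dev/null || { echo "local munge check failed"; return 1; }
    # Remote decode: typically fails if the keys differ or the clocks are too far apart.
    munge -n | ssh "$node" unmunge >/dev/null || { echo "munge check failed on $node"; return 1; }
    echo "munge ok on $node"
}
# Usage: check_munge NODENAME
```

If the remote decode fails, comparing the munge key and the system time on both hosts is usually the next step.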
Node down
scontrol update NodeName=NODENAME State=RESUME
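Before resuming a node it helps to look at the reason Slurm recorded for the down state. A minimal sketch, with a hypothetical helper name `resume_node`:

```shell
# Sketch: show why a node is down, then return it to service.
resume_node() {
    node=$1
    # List the reason recorded for the down/drained node.
    sinfo -R -n "$node"
    # Clear the DOWN state once the cause has been fixed.
    scontrol update NodeName="$node" State=RESUME
}
# Usage: resume_node NODENAME
```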
Features
- slurmctld server down → nodes (and jobs) keep running
- mysql daemon down → no new jobs can be submitted, but running jobs are not aborted
Partitions (queues) and group restrictions
- Partitions can be restricted to groups (or accounts) in slurm.conf → users who are not members of the group do not even see the partition
- If a user tries to submit to a queue he is not authorized for, he is told that he is not authorized for that queue
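A group restriction of this kind could look like the following slurm.conf fragment. This is only a sketch: the partition names, node ranges and the group `gpuusers` are hypothetical and not taken from this cluster's configuration.

```conf
# slurm.conf (sketch; partition, node and group names are hypothetical)
# Only members of the Unix group "gpuusers" may use (and see) this partition.
PartitionName=gpu   Nodes=gpu[01-04]  AllowGroups=gpuusers State=UP
# Open to everybody.
PartitionName=batch Nodes=node[01-32] AllowGroups=ALL Default=YES State=UP
```

Administrators can still list restricted partitions with `sinfo --all`.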
Todo
- ACLs: user - account - partition
- check MPI
- → routing queue
- ACLs: miid-db sync
- test GPU job
- job preemption
- SLURM Energy Accounting Plugin
Config/Plugins
- use the FIFO priority scheduler with backfill
SchedulerType=sched/backfill
- sort queued jobs by priority, computed from 5 factors: age, fair-share, job size, partition, QOS
PriorityType=priority/multifactor
- select nodes using consumable resources: cores and memory
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
- manage a node's resources via cgroups
TaskPlugin=task/cgroup
- track processes with cgroups
ProctrackType=proctrack/cgroup
- GPU with general resources (gres)
GresTypes=gpu
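- preemption for bulldozer nodes: jobs scheduled on partition 'amd' may preempt running jobs
PreemptType=preempt/partition_prio
# SUSPEND is only valid in combination with GANG (the gang scheduler resumes suspended jobs)
PreemptMode=SUSPEND,GANG
- enforce accounting associations: jobs are only accepted from users with a valid association in the accounting database
AccountingStorageEnforce=associations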
- Multi-factor Job Priority Plugin
# Activate the Multi-factor Job Priority Plugin with decay
PriorityType=priority/multifactor
# 2 week half-life
PriorityDecayHalfLife=14-0
# The job's age factor reaches 1.0 after waiting in the
# queue for 2 weeks.
PriorityMaxAge=14-0
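To get a feeling for what the two 14-day values mean, the arithmetic can be sketched as below. This is an illustration, not Slurm's own code: it assumes the age factor grows linearly up to PriorityMaxAge and that past usage decays exponentially with the configured half-life. The function names are hypothetical; all times are in days.

```shell
# Sketch of the arithmetic only, not Slurm's code. Times are in days.
# Age factor: grows linearly with the waiting time and is capped at 1.0.
age_factor() {
    awk -v age="$1" -v max="$2" 'BEGIN { f = age / max; if (f > 1) f = 1; printf "%.2f\n", f }'
}
# Weight of past usage that lies "age" days back, for a given half-life.
usage_weight() {
    awk -v age="$1" -v half="$2" 'BEGIN { printf "%.2f\n", exp(-log(2) * age / half) }'
}
age_factor 7 14      # prints 0.50
age_factor 30 14     # prints 1.00
usage_weight 28 14   # prints 0.25
```

So with the settings above, a job that has waited one week has half of the maximum age factor, and usage from four weeks ago still counts for a quarter.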
Incidental findings in slurm.conf
- A port range must be reserved, otherwise Slurm's built-in MPI support does not work. E.g.
MpiParams=ports=12000-12999
- For nodes to return to service automatically after a reboot, '1' is not sufficient; this needs
ReturnToService=2
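Whether such settings are actually active can be checked against the running controller. A minimal sketch, with a hypothetical helper name `show_setting`; it assumes the usual `Name = value` layout of `scontrol show config`.

```shell
# Sketch: print a setting as the running slurmctld reports it.
show_setting() {
    # $1 is the parameter name, e.g. MpiParams or ReturnToService.
    scontrol show config | grep -i "^$1 "
}
# Usage: show_setting ReturnToService
```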
-- Cluster.stucki - 24 Apr 2014
Draft (outline) of the page to be written on sbatch/srun/MPI:
SlurmIntro
-- Cluster.stucki - 25 Apr 2014
In some places it says
AdminLevel={0,1,2}
- It can only be set with
sacctmgr …user... set AdminLevel={none|operator|admin}
Anyone who is 'admin' can create reservations and partitions with sview in admin mode.
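One possible full form of the sacctmgr call is sketched below. The helper name `set_admin_level` and the user name in the usage line are hypothetical; the `modify user where name=... set ...` form is an assumption about how the elided command above was meant.

```shell
# Sketch: set a user's admin level; the user name is a placeholder.
set_admin_level() {
    user=$1
    level=$2
    # Accept only the three levels named above.
    case $level in
        None|Operator|Admin) ;;
        *) echo "invalid level: $level" >&2; return 1 ;;
    esac
    sacctmgr modify user where name="$user" set AdminLevel="$level"
}
# Usage: set_admin_level alice Operator
```

sacctmgr normally asks for confirmation before committing the change.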
-- Cluster.stucki - 29 Apr 2014
Partition (queue): on / pause / drain / off
scontrol update partitionname=NNN State={ UP | INACTIVE | DRAIN | DOWN }
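Wrapped in a small helper that rejects anything but the four states, this could look as follows. A minimal sketch; the function name `set_partition_state` is hypothetical.

```shell
# Sketch: switch a partition between the four states listed above.
set_partition_state() {
    part=$1
    state=$2
    case $state in
        UP|INACTIVE|DRAIN|DOWN) ;;
        *) echo "invalid state: $state" >&2; return 1 ;;
    esac
    scontrol update PartitionName="$part" State="$state"
}
# Usage: set_partition_state NNN DRAIN
```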
-- Cluster.stucki - 30 Apr 2014
(From the FAQ) What process should I follow to add nodes to Slurm?
The slurmctld daemon has a multitude of bitmaps to track state of nodes and cores in the system. Adding nodes to a running system would require the slurmctld daemon to rebuild all of those bitmaps, which the developers feel would be safer to do by restarting the daemon. Communications from the slurmd daemons on the compute nodes to the slurmctld daemon include a configuration file checksum, so you probably also want to maintain a common slurm.conf file on all nodes. The following procedure is recommended:
- Stop the slurmctld daemon (e.g. "/etc/init.d/slurm stop" on the head node)
- Update the slurm.conf file on all nodes in the cluster
- Restart the slurmctld daemon (e.g. "/etc/init.d/slurm start" on the head node)
- Start the slurmd daemons on the new nodes (e.g. "/etc/init.d/slurm start" on those nodes)
- Have all slurmd daemons read the new configuration file (e.g. "scontrol reconfig", no need to restart the daemons)
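The five steps above can be sketched as a script for the head node. The assumptions are not part of the FAQ: password-less ssh/scp to all nodes, the configuration path in `CONF_FILE`, and node names given as space-separated lists. The function name `add_nodes` is hypothetical.

```shell
# Sketch of the FAQ procedure, to be run on the head node.
# Assumptions: password-less ssh/scp, the paths below, space-separated node lists.
SLURM_INIT=${SLURM_INIT:-/etc/init.d/slurm}
CONF_FILE=${CONF_FILE:-/etc/slurm/slurm.conf}

add_nodes() {
    all_nodes=$1
    new_nodes=$2
    "$SLURM_INIT" stop || return 1                 # 1. stop slurmctld
    for n in $all_nodes; do                        # 2. update slurm.conf everywhere
        scp "$CONF_FILE" "$n:$CONF_FILE" || return 1
    done
    "$SLURM_INIT" start || return 1                # 3. restart slurmctld
    for n in $new_nodes; do                        # 4. start slurmd on the new nodes
        ssh "$n" "$SLURM_INIT" start || return 1
    done
    scontrol reconfig                              # 5. all slurmd re-read the config
}
# Usage: add_nodes "node01 node02 node03" "node03"
```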
-- Cluster.stucki - 27 Jun 2014