Cook Book: Job Arrays
Job arrays allow you to create many jobs from one qsub command.
You generate them by calling qsub with the -t parameter, to which you can pass comma-delimited lists or ranges as explained here.
Simple Job Arrays
The following simple job script creates 10 identical jobs, each of which prints its id within the array.
The ids are 1, 2, …, 10.
#!/bin/bash
#PBS -t 1-10
#PBS -N array_test
#PBS -d /home/<USER>/tmp
#PBS -o testjob.out
#PBS -e testjob.err
#PBS -M <EMAIL ADDRESS>
#PBS -l walltime=00:01:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=10mb
echo ${PBS_ARRAYID}
We can submit this with:
$ qsub job_script.sh
You can also control which job ids to generate by passing the -t parameter on the qsub command line.
For example, the following script does the same work as the one above but contains no #PBS -t line (it also omits the #PBS -N job name).
#!/bin/bash
#PBS -d /home/<USER>/tmp
#PBS -o testjob.out
#PBS -e testjob.err
#PBS -M <EMAIL ADDRESS>
#PBS -l walltime=00:01:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=10mb
echo ${PBS_ARRAYID}
We can now submit this with:
$ qsub -t 10,33-44 job_script.sh
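To preview which ids a -t specification such as 10,33-44 stands for, the comma-delimited list can be expanded by hand. The following is a minimal sketch: expand_ids is a hypothetical helper written for illustration, not part of qsub, and it only handles plain numbers and N-M ranges.

```shell
#!/bin/bash
# expand_ids SPEC: hypothetical helper (not part of qsub) that prints one id
# per line for a spec made of comma-delimited numbers and N-M ranges.
expand_ids() {
    local spec=$1 part
    local -a parts
    IFS=',' read -r -a parts <<< "$spec"
    for part in "${parts[@]}"; do
        if [[ $part == *-* ]]; then
            # Range: text before the dash is the start, text after is the end.
            seq "${part%-*}" "${part#*-}"
        else
            echo "$part"
        fi
    done
}

expand_ids "10,33-44"
```

For the specification above this prints 13 ids: 10, followed by 33 through 44.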
Job Arrays And Strongly Differing Jobs
While it is sometimes easy to infer the part of the work to do from a numeric id (e.g. the id of a matrix tile), this can be harder in other cases.
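For the easy case, the id can be turned into work coordinates with integer arithmetic. The following is a minimal sketch, assuming a hypothetical grid with 4 tile columns that is numbered row by row, and array ids starting at 1. NCOLS, tile_of, and the fallback id are illustrative assumptions; PBS_ARRAYID is only set when the script runs as part of a job array.

```shell
#!/bin/bash
NCOLS=4   # assumed grid width, for illustration only

# tile_of ID: print the zero-based "row col" pair for a 1-based array id.
tile_of() {
    local id=$1
    echo "$(( (id - 1) / NCOLS )) $(( (id - 1) % NCOLS ))"
}

# Fall back to id 1 when the script is run outside of a job array.
ID=${PBS_ARRAYID:-1}
read -r ROW COL <<< "$(tile_of "$ID")"
echo "id=${ID} row=${ROW} col=${COL}"
```

When no such arithmetic mapping exists, the jobs need another way to look up their work.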
One good way is to first create many job script files, each of which contains the work to do for one id.
Such scripts can be called job_${JOB_ID}.sh, for example.
In our example, we create a script gen.sh which contains the following:
#!/bin/bash
# Write one job script per (index, animal) pair: 10 x 3 = 30 files.
JOB=0
for i in $(seq 10); do
    for animal in dog cat cow; do
        JOB=$((JOB + 1))
        echo '#!/bin/bash' > "job_${JOB}.sh"
        echo "echo 'The ${i}th ${animal} says Hello.'" >> "job_${JOB}.sh"
    done
done
Executing this script now gives us 30 files (10 indices times 3 animals) called job_1.sh through job_30.sh.
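Because the upper bound of the -t range has to match the number of generated files, it can help to count them before submitting. The following is a minimal sketch, assuming the generated files sit in the current directory; count_jobs is a hypothetical helper, and the qsub line is only printed, not executed.

```shell
#!/bin/bash
# count_jobs DIR: hypothetical helper that counts the job_<number>.sh files
# in DIR. The [0-9] in the pattern keeps job_script.sh out of the count.
count_jobs() {
    local dir=${1:-.} f n=0
    for f in "$dir"/job_[0-9]*.sh; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

N=$(count_jobs .)
if [ "$N" -gt 0 ]; then
    # Only print the command so the sketch is safe to run without a scheduler.
    echo "qsub -t 1-${N} job_script.sh"
else
    echo "no job files found" >&2
fi
```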
We then write a wrapper script to call these files:
#!/bin/bash
#PBS -d /home/<USER>/tmp
#PBS -o testjob.out
#PBS -e testjob.err
#PBS -M <EMAIL ADDRESS>
#PBS -l walltime=00:01:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=10mb
date
hostname
echo "PBS_ARRAYID=${PBS_ARRAYID}"
bash job_${PBS_ARRAYID}.sh
date
Note that printing the hostname is a good idea since it tells you which of your jobs were executed on which node.
This allows you to infer whether too many jobs were executed concurrently on one node if they run multithreaded.
Printing the date at the start and at the end shows how long your job ran.
The script above can now be called with different settings for -t:
# Launch jobs 1-10.
$ qsub -t 1-10 job_script.sh
# Launch all jobs.
$ qsub -t 1-30 job_script.sh
# Launch selected jobs.
$ qsub -t 1,3,5,13,19 job_script.sh
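The advice to print the date before and after the job can be taken one step further by computing the elapsed time directly. The following is a minimal sketch, assuming the job_<id>.sh naming from the generator above. The run_timed helper, the missing-file check, and the fallback id of 1 are illustrative additions, not part of the original wrapper.

```shell
#!/bin/bash
# run_timed SCRIPT: run SCRIPT with bash, then report the host, the elapsed
# wall-clock seconds, and the exit status of the script.
run_timed() {
    local script=$1 start end status=0
    if [ ! -f "$script" ]; then
        echo "missing job file: $script" >&2
        return 1
    fi
    start=$(date +%s)
    bash "$script" || status=$?
    end=$(date +%s)
    echo "host=$(hostname 2>/dev/null || echo unknown) script=${script} elapsed=$((end - start))s status=${status}"
    return "$status"
}

# PBS_ARRAYID is set by the scheduler; default to 1 for a dry run outside it.
run_timed "job_${PBS_ARRAYID:-1}.sh" || echo "job did not complete successfully" >&2
```

Checking that the file exists first makes a -t range that is larger than the number of generated scripts show up as a clear message in the error file.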