Submit script files
Introduction
SLURM is the utility used at LaPalma for batch processing support, so all jobs must be run through it. This document provides information for getting started with job execution at LaPalma. Here we describe the most important options and provide some examples of submission script files, but we recommend you also check the SLURM Quick Start User Guide.
In order to keep the load on the login nodes under control, a 10-minute CPU time limit is enforced for processes running interactively on these nodes. Any execution taking longer than this limit should be carried out through the queue system (see this FAQ).
Queues (QOS)
Limits are assigned automatically to each user (depending on the resources granted by the Access Committee). In addition, you are allowed to use the special debug queue to perform short, fast tests.
Queue | Max CPUs | Wall time limit
---|---|---
class_a | 2400 | 72 hours
class_b | 1200 | 48 hours
class_c | 1200 | 24 hours
debug | 64 | 30 min
interactive | 1 | 1 hour
The specific limits assigned to each user depend on the priority granted by the access committee. Users granted high-priority hours have access to a maximum of 2400 CPUs and a maximum wall clock limit of 72 hours. For users with low-priority hours the limits are 1200 CPUs and 24 hours. If you need to increase these limits, please contact the support group.
- `class_a`, `class_b` and `class_c`: Queues assigned by the access committee, where normal jobs are executed. No special directive is needed to use these queues; they are assigned automatically.
- `debug`: This queue is reserved for testing applications before submitting them to the production queues. Only one job per user is allowed to run simultaneously in this queue, the execution time is limited to 30 minutes, and the maximum number of CPUs per application is 64. Only a limited number of jobs may run at the same time in this queue. To use it, add a directive to your script file, or specify the queue when submitting without changing the script:
  `#SBATCH --qos=debug` - or - `[lapalma1]$ sbatch --qos=debug script.sub`
- `interactive`: Jobs submitted to this queue run on the interactive (login) node. It is intended for GUI applications that may exceed the interactive CPU time limit. Note that only sequential jobs are allowed. To use this queue, launch the following command from login1 (see this FAQ):
  `[lapalma1]$ salloc -p interactive`
Submission directives
A job must contain a series of directives to inform the batch system about the characteristics of the job. These directives appear as comments in the job script, with the following syntax:
#SBATCH --directive=<value>
Some common directives have a shorter version; you can use either form:
#SBATCH -d <value>
Additionally, the job script may contain a set of commands to execute. If not, an external script must be provided with the 'executable' directive. The most common directives are listed below (complete list here):
- `-J <name>`: Name of the job.
- `--qos <queue_name>`: The queue where the job is to be submitted. Leave this directive out unless you need to use the `debug` queue.
- `-t <time>`: Wall time limit (use the format `hh:mm:ss` or `days-hh:mm:ss`).
- `-n <ntasks>`: Number of tasks; this is the normal way to specify how many cores you want to use.
- `-o /path/to/file_out`: Redirect standard output (`stdout`) to `file_out` (use `/dev/null` to ignore this output).
- `-e /path/to/file_err`: Redirect standard error (`stderr`) to `file_err` (use `/dev/null` to ignore this output).
- `-D <directory>`: Execution will be performed in the specified directory (if it is not set, the current directory will be used).
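For reference, a minimal job script header combining these directives might look like the following sketch (the job name, resources and file names are placeholders):

```
#!/bin/bash
#SBATCH -J my_job              # job name
#SBATCH -n 32                  # number of tasks (cores)
#SBATCH -t 02:00:00            # wall time limit (hh:mm:ss)
#SBATCH -o my_job-%j.out       # standard output (%j appends the job ID)
#SBATCH -e my_job-%j.err       # standard error
#SBATCH -D .                   # run in the submission directory
##SBATCH --qos=debug           # uncomment (remove one #) only for short tests in the debug queue
```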
Note

- Walltime (the wall clock time limit) must be set, using the format `HH:MM:SS` or `DD-HH:MM:SS`, to a value greater than the real execution time of your application; bear in mind that your job will be killed once the period you specified expires. Shorter limits are likely to reduce the waiting time in the queue. If you do not specify any time limit, the maximum available in your assigned queue will be used.
- To avoid overwriting the standard and error output files when you submit several jobs, add `%j` to the filenames so that the job ID is automatically included in them (see the examples).
- You can use the script `idlenodes` to find out the number of idle nodes at a given moment, which can be useful when deciding how many nodes to ask for in order to spend less time waiting in the queue (you can also use `idlecores` to find out the number of idle cores, which is 16 times the number of idle nodes).
- If you are running hybrid MPI+OpenMP applications, where each process spawns a number of threads, use `--cpus-per-task=<number>` to specify the number of CPUs allocated to each task (it must be an integer between 1 and 16, since each node has 16 cores), and then set the number of tasks per node accordingly with `--ntasks-per-node=<ntasks>` (and/or the number of tasks per core with `--ntasks-per-core=<ntasks>`, if needed). In this case it may also be useful to specify the total number of nodes with `-N` instead of the number of tasks with `-n` (see the sketch after this note).
- Each node has 32 GB of memory, so when an application uses more than 1.7 GB of memory per process, it is not possible to have 16 processes on the same node. In that case you can combine the `--ntasks-per-node` and `--cpus-per-task` directives to run fewer processes per node, so that each of them has more memory available (some cores will stay idle, but they still count towards the total consumed time, so try to minimize the wasted cores).
- Before submitting large jobs, please perform some short tests to make sure your program runs correctly. While your jobs are running, check the outputs and logs from time to time, and cancel the job if the application fails.
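As an illustration of the hybrid MPI+OpenMP case mentioned in the note above, the following is a minimal sketch (the binary `myprogram_hybrid` and the resource values are placeholders) that runs 4 MPI tasks per node with 4 OpenMP threads each on 2 nodes:

```
#!/bin/bash
#SBATCH -J test_hybrid
#SBATCH -N 2                    # total number of nodes
#SBATCH --ntasks-per-node=4     # MPI tasks per node
#SBATCH --cpus-per-task=4       # OpenMP threads per task (4 x 4 = 16 cores per node)
#SBATCH -t 01:00:00
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err
#SBATCH -D .

module purge
module load gnu openmpi/gnu

# Use the number of CPUs allocated per task as the OpenMP thread count
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./myprogram_hybrid
# Depending on the SLURM version you may need to pass the value explicitly:
# srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./myprogram_hybrid
```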
There are many more options, such as specifying dependencies among jobs with the `-d` directive, giving a starting time with `--begin`, automatically requeueing a job if it fails with `--requeue`, etc. (see the complete list here and also this FAQ).
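For instance, these options can be passed directly to `sbatch` (the job ID and times below are only placeholders):

```
# Start only after job 12345 has finished successfully
[lapalma1]$ sbatch -d afterok:12345 script.sub

# Do not start before a given time
[lapalma1]$ sbatch --begin=now+2hours script.sub

# Allow the job to be requeued (e.g. after a node failure)
[lapalma1]$ sbatch --requeue script.sub
```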
How to specify the submission options
You can specify these options on the command line:
[lapalma1]$ sbatch -J <job_name> -t <days-HH:MM:SS> <your_executable>
But we highly recommend that you write all the options and commands in a file (called a submission script file) so you can reuse it when needed. That file should contain the following sections:
- The submission file must be an executable script (although no execute permission is needed), beginning with the line `#!/bin/bash`.
- SLURM options (as many as needed): `#SBATCH --directive=<value>`
- Modules to be loaded (as many as needed). Your environment variables are stored when you submit your job and then used when the program is executed. This can be a problem if your environment at submission time is not the right one to execute your programs (for instance, paths to executables or dynamic libraries are not set), so we recommend you begin by cleaning your environment with `module purge` and then load only the modules required by your program.
- Shell commands needed to run your application (see the template below).
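Putting these sections together, a minimal template could look like the following sketch (the program and module names are placeholders):

```
#!/bin/bash
#SBATCH -J my_job
#SBATCH -n 16
#SBATCH -t 01:00:00
#SBATCH -o my_job-%j.out
#SBATCH -e my_job-%j.err
#SBATCH -D .

# Clean the environment and load only what the program needs
module purge
module load gnu

# Shell commands that run the application
./my_program
```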
Once your script file is ready, you only need to run the following command to submit it to the queue:
[lapalma1]$ sbatch script_file
If you need more information about how to manage your jobs, check also the Useful Commands (executions) and the FAQs.
Environment variables
Although they are not needed in most situations, there are also some SLURM environment variables that you can use in your scripts.
Variable | Meaning
---|---
SLURM_JOBID | Specifies the job ID of the executing job
SLURM_NPROCS | Specifies the total number of processes in the job
SLURM_NNODES | Is the actual number of nodes assigned to run your job
SLURM_PROCID | Specifies the MPI rank (or relative process ID) of the current process. The range is from 0 to (SLURM_NPROCS - 1)
SLURM_NODEID | Specifies the relative node ID of the current job. The range is from 0 to (SLURM_NNODES - 1)
SLURM_LOCALID | Specifies the node-local task ID for the process within a job
SLURM_NODELIST | Specifies the list of nodes on which the job is actually running
SLURM_ARRAY_TASK_ID | Task ID inside the job array
SLURM_ARRAY_JOB_ID | Job ID of the array (it is the same for all jobs of the array, equal to the job ID returned at submission)
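For example, these variables can be used inside a job script to label the output (a minimal sketch; the program name is a placeholder):

```
#!/bin/bash
#SBATCH -J test_env
#SBATCH -n 32
#SBATCH -t 00:10:00
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err
#SBATCH -D .

# Report where and how the job is running
echo "Job $SLURM_JOBID runs $SLURM_NPROCS tasks on $SLURM_NNODES nodes: $SLURM_NODELIST"

module purge
module load gnu openmpi/gnu
srun ./myprogram_mpi > result-$SLURM_JOBID.log
```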
Examples of submission script files
Attention
This section is obsolete, and we are moving the examples to the Slurm Sample batch scripts section.
Here you will find some example script files for different situations.
Note
If you copy and paste these examples, be careful because some
unwanted spaces may be added at the beginning of each line: make sure
that lines that contain parameters begin with #SBATCH
and there are no
spaces before these symbols.
Basic example (MPI)
Suppose you want to run your MPI program called `myprogram_mpi` using 64 cores (4 nodes), and it should take about 5 hours (always add some extra time, because your application will be killed if it exceeds the wall time limit):
#!/bin/bash
#############################
#SBATCH -J test_mpi
#SBATCH -n 64
#SBATCH -t 05:30:00
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err
#SBATCH -D .
#############################
module purge
module load gnu openmpi/gnu
mpirun ./myprogram_mpi
# Use these other options if your MPI program does not run properly
# srun ./myprogram_mpi
# srun --mpi=pmi2 ./myprogram_mpi
Comments:
- `-n 64`: this script will run the application `myprogram_mpi` on 64 cores (4 nodes).
- `-D .`: the working directory will be the current one (the directory the submission was performed from).
- `-o` and `-e`: two output files will be created, one for the standard output (`-o`, extension `.out`) and another one for errors (`-e`, extension `.err`). Note that we have used `%x`, so those files will be named after the job name specified with `-J` (`test_mpi`). We have also included the parameter `%j` in the names, so the job ID is added to them, which avoids overwriting the output files if we execute this script several times, since each execution has a different job ID (this ID is shown when you submit the script with `sbatch`; you can also get it with `squeue` once the job has been submitted and has not finished yet). For instance, if your job name was `test_mpi` and the job ID was `1234`, the files will be named `test_mpi-1234.out` and `test_mpi-1234.err`.
- Remember that you cannot run MPI programs directly; you need to use `srun` or `mpirun` to execute them. If no more arguments are added, the number of slots specified by the `SBATCH` parameters will be used, but you can also force the value with `srun -n 20`, `srun -n $SLURM_NTASKS`, etc. If you have problems running MPI programs (they do not initialize, or they are executed sequentially), change the command or the options used to run your program; you can use one of the following: `mpirun`, `srun`, `srun --mpi=pmi2`, etc.
- Do not forget to load all the needed modules. For instance, if you want to execute VASP, you will need to run `module purge` and `module load intel mkl vasp`, and later use `mpirun` to run VASP (see the sketch below).
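As a sketch of that last point, a VASP submission script could look like the one below; note that the executable name (`vasp_std` here) and the resource values are assumptions and may differ on your installation:

```
#!/bin/bash
#SBATCH -J test_vasp
#SBATCH -n 64
#SBATCH -t 10:00:00
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err
#SBATCH -D .

module purge
module load intel mkl vasp

# The VASP binary name may differ depending on the installed version
mpirun vasp_std
```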
Basic example (OpenMP)
Suppose you want to run your OpenMP program called `myprogram_omp` (written in C or Fortran) using 16 slots (the maximum number of shared-memory slots available to run OpenMP). This program should take about 30 minutes (always add some extra time, because your application will be killed if it exceeds the wall time limit):
#!/bin/bash
#############################
#SBATCH -J test_omp
#SBATCH -n 16
#SBATCH -t 00:45:00
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err
#SBATCH -D .
#############################
module purge
module load gnu
export OMP_NUM_THREADS=16
./myprogram_omp
Comments:
Caution

Be sure to execute your OpenMP programs directly, and do NOT use `mpirun` or `srun` (unless you have a hybrid MPI-OpenMP program), since using them would launch several repeated instances of your OpenMP program.

- Using this script, our application `myprogram_omp` will be executed with 16 slots on one node (setting `OMP_NUM_THREADS` to 16 is not really needed, since by default the number of tasks is used; if for any reason you want to run with a different number of threads, you can use this variable to set it).
- The working directory will be the current one (the directory the submission was performed from).
- `-o` and `-e`: two output files will be created, one for the standard output (`-o`, extension `.out`) and another one for errors (`-e`, extension `.err`). Note that we have used `%x`, so those files will be named after the job name specified with `-J` (`test_omp`). We have also included the parameter `%j` in the names, so the job ID is added to them, which avoids overwriting the output files if we execute this script several times, since each execution has a different job ID (this ID is shown when you submit the script with `sbatch`; you can also get it with `squeue` once the job has been submitted and has not finished yet). For instance, if your job name was `test_omp` and the job ID was `1234`, the files will be named `test_omp-1234.out` and `test_omp-1234.err`.
Jobs array
Job arrays and task generation can be used to run applications over different inputs.
Example of Job Array (parallel programs)
For instance, assume that you have 10 different input files (named `input000.dat`, `input002.dat`, `input004.dat`, ..., `input018.dat`) and you want to process each file with your MPI parallel program. Each execution will use 32 cores and should not take more than 1 hour to finish (we will add some extra time just to be sure). Then your script should be similar to the following one:
#!/bin/bash
##########################################################
#SBATCH -J test_MPI_jobsarray
#SBATCH -n 32
#SBATCH -t 0-1:10:00
#SBATCH --array=0-18:2
#SBATCH -o test_jobsarray-%A-%j-%a.out
#SBATCH -e test_jobsarray-%A-%j-%a.err
#SBATCH -D .
##########################################################
module purge
module load gnu openmpi/gnu
echo "#1 EXECUTING TASK ID: $SLURM_ARRAY_TASK_ID"
fmtID=$(printf "%03d" $SLURM_ARRAY_TASK_ID)
srun ./mpi_program -i input$fmtID.dat
Let us explain the parameters that we have used:
- `-n 32` is used to specify that each task will be executed with 32 cores.
- With the `--array` parameter we specify that we want our job to generate tasks. The task IDs are generated with the format `--array=ini-end:step`, so using `0-18:2` the IDs will be `0, 2, 4, ..., 18`. Some more examples:
  - `--array=1,4,5,8,12,65` will produce IDs `1, 4, 5, 8, 12, 65`
  - `--array=1-10` will produce IDs `1, 2, 3, ..., 10` (`step` can be omitted when its value is `1`)

Important

Note that we are asking for 10 tasks x 32 cores = 320 cores. Only submit a large number of tasks when you are totally sure that everything is working fine. If you are still testing, submit only 2 or 3 short tasks to avoid wasting resources. If you need to submit a really large number of tasks, please consider limiting the maximum number of simultaneously running tasks. That can be done using the syntax `--array=ini-end%limit` or `--array=ini-end:step%limit` (for instance, `--array=0-18:2%3` would keep at most 3 tasks running at a time).

- We have used the environment variable `$SLURM_ARRAY_TASK_ID` to access the task ID. We have converted the original format of the task ID assigned by SLURM (`0, 2, 4, ..., 18`) to the required format (`000, 002, 004, ..., 018`) using the bash `printf` function, storing that value in `fmtID` (note that the right syntax is `fmtID=$(...)`, with no space before or after the "`=`" symbol).
- We have named our files `test_jobsarray-%A-%j-%a`:
  - `%A`: the job array ID; it has a fixed value, the same as the ID given at submission. You can access this value in your script through the environment variable `$SLURM_ARRAY_JOB_ID`.
  - `%j`: the job ID; the first task will have the value given when submitting the job, and the following tasks will have successive values incremented by one. You can access this value in your script through the environment variable `$SLURM_JOBID`.
  - `%a`: the task ID; it will take the values that you specified with the `--array` parameter. You can access this value in your script through the environment variable `$SLURM_ARRAY_TASK_ID`; this is the variable that you will typically use to specify your input files or arguments.
For instance, if you submit the previous example and get the following message:
Submitted batch job 416
then the generated files will be:
test_jobsarray-416-416-0.err #(%A = $SLURM_ARRAY_JOB_ID = 416, %j = $SLURM_JOBID = 416, %a = $SLURM_ARRAY_TASK_ID = 0)
test_jobsarray-416-416-0.out
test_jobsarray-416-417-2.err #(%A = $SLURM_ARRAY_JOB_ID = 416, %j = $SLURM_JOBID = 417, %a = $SLURM_ARRAY_TASK_ID = 2)
test_jobsarray-416-417-2.out
test_jobsarray-416-418-4.err #(%A = $SLURM_ARRAY_JOB_ID = 416, %j = $SLURM_JOBID = 418, %a = $SLURM_ARRAY_TASK_ID = 4)
test_jobsarray-416-418-4.out
test_jobsarray-416-419-6.err #(%A = $SLURM_ARRAY_JOB_ID = 416, %j = $SLURM_JOBID = 419, %a = $SLURM_ARRAY_TASK_ID = 6)
test_jobsarray-416-419-6.out
test_jobsarray-416-420-8.err #(%A = $SLURM_ARRAY_JOB_ID = 416, %j = $SLURM_JOBID = 420, %a = $SLURM_ARRAY_TASK_ID = 8)
test_jobsarray-416-420-8.out
test_jobsarray-416-421-10.err #(%A = $SLURM_ARRAY_JOB_ID = 416, %j = $SLURM_JOBID = 421, %a = $SLURM_ARRAY_TASK_ID = 10)
test_jobsarray-416-421-10.out
test_jobsarray-416-422-12.err #(%A = $SLURM_ARRAY_JOB_ID = 416, %j = $SLURM_JOBID = 422, %a = $SLURM_ARRAY_TASK_ID = 12)
test_jobsarray-416-422-12.out
test_jobsarray-416-423-14.err #(%A = $SLURM_ARRAY_JOB_ID = 416, %j = $SLURM_JOBID = 423, %a = $SLURM_ARRAY_TASK_ID = 14)
test_jobsarray-416-423-14.out
test_jobsarray-416-424-16.err #(%A = $SLURM_ARRAY_JOB_ID = 416, %j = $SLURM_JOBID = 424, %a = $SLURM_ARRAY_TASK_ID = 16)
test_jobsarray-416-424-16.out
test_jobsarray-416-425-18.err #(%A = $SLURM_ARRAY_JOB_ID = 416, %j = $SLURM_JOBID = 425, %a = $SLURM_ARRAY_TASK_ID = 18)
test_jobsarray-416-425-18.out
If you run the `squeue` command, you will see each task on a separate line, and the job ID will be formed by two values, `XX_YY`, where `XX` is the job array ID and `YY` is the task ID. If needed, you can cancel all tasks or just some of them. For instance, try:
[lapalma1]$ scancel 416
[lapalma1]$ scancel 416_2
[lapalma1]$ scancel 416_[6-8]
[lapalma1]$ scancel 416_8 416_16
For further information, check the jobs array documentation.