Burros
Introduction
Users who need to run CPU- or memory-intensive jobs that are unsuitable for their own PCs or for other IAC Supercomputing resources (such as LaPalma, TeideHPC, the HTCondor system, etc.) can access any of several high-performance Linux PCs. These machines are also suitable for developing, debugging and testing parallel applications before submitting them to other supercomputers.
Warning
Some of these machines have a huge disk space (about 20 TB). Don't abuse it! There are no backups of your data on any of these machines, so do not use them as a storage system. Once your executions are done, remember to delete your data or move it to another location to make room for other users.
At present there are two types of "burros" available:
Public burros.
These are machines purchased with general research area funds, and as such are available to all IAC personnel under the same conditions.
Open-project burros.
These are machines purchased with funds of a given project, which has graciously allowed researchers not belonging to the project to make use of the "burro", but with certain conditions, which are explained later in the Queue system section.
Note
The list of available public burros is only accessible from the internal documentation.
Note
The list of available open-project burros is only accessible from the internal documentation.
You can use the available IAC "burros" to develop and debug your parallel applications. They are normal machines with more resources (a larger number of cores, more RAM, etc.) that are available to our researchers for general tasks, such as testing their parallel codes.
Connecting (text based)
Simply use an ssh connection, as you do with any other IAC machine:
[...]$ ssh <user>@<burro>
Note
For burros with a queue system, only a small portion of the system will be available when you connect via ssh. It is enough to navigate to different folders and edit files, but for compiling and running applications you must use the queue system.
Connecting (X based)
As far as possible it is recommended to work in text mode, since it uses fewer
resources and the interaction will be faster, but sometimes you might need to
start a graphical application. In this case you might be tempted to simply
connect with ssh -X, which will work, but will be very slow.
A much better option is to use the Remote Desktop Protocol (RDP). RDP is available by default in all public burros, and from your computer you can use a number of different applications to connect to a "burro" with RDP (in Linux, Remmina is a popular one). If you need assistance with this, please do get in touch with us.
Note
As with text-based connections, remember that in burros with a queue system, only a small portion of the system will be available to your graphical session. This should be enough for very light work, but for anything more demanding you should use the queue system.
For example, if you need to run a demanding IDL job in graphical mode, you can use RDP to get a graphical session, then open a terminal, and inside it request an interactive session, and from it launch IDL.
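As a sketch, once inside a terminal in your RDP session (the core count, time limit and the idl command are just illustrative; adapt them to your case):
[user@burro ~]$ sinter -n 4 -t 04:00:00
(sinter) [user@burro ~]$ idl
When you are done, exit both IDL and the interactive session so that the allocated cores are freed for other users.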
Note
RDP sessions will remain active even after you disconnect from your client. This is very useful if you want to leave something running and want to connect again at a later time. But active sessions will consume resources, so when you are done with your session please remember to close it completely by logging out from the remote desktop environment.
Queue system
The queue system used for the public and open-project "burros" is Slurm, widely used in many research institutes and supercomputing centres. You can find a general guide on how to use Slurm at IAC machines here.
Below we describe the default configuration and some commands specific to the Slurm installation in the IAC public "burros". Open-project "burros" share most of the configuration of a public "burro", but with some usage conditions, which are described in the section on open-project burros below.
Default Slurm configuration in a public "burro"
Note
This configuration can be modified in project "burros" to take into account the needs of each research group.
When a workstation is shared amongst several users it is easy to oversubscribe it (e.g. to run more processes than there are available cores), which makes all applications perform worse and is usually detrimental to all users. To avoid this problem, the default Slurm configuration in a "burro" is as follows:
Limited CPU power
A default ssh or graphical session is limited to the CPU power equivalent to 3/4 of a core. This should be enough for editing files, checking on the state of the machine, etc. but certainly not enough for running any intensive computer program. For this, you will need to use the Slurm queueing system, either using an interactive session or submitting a batch job.
Important
Slurm accounts are independent of the Linux system accounts, so before you use Slurm you will need to get a Slurm account:
if using a project burro, contact the PI of the project.
if using a public or an open-project burro, you can activate an account by yourself with the slurm_addme command. The following example shows how a first call to the sinter command (see the section Getting an interactive session below) is not successful because the account for user was not yet created; slurm_addme creates the account and the following sinter command succeeds.
[user@burro ~]$ sinter -n 2
salloc: error: Job submit/allocate failed: Invalid account or account/partition combination specified
[user@burro ~]$ slurm_addme
Associations =
U = user A = ddgroup C = burro
Non Default Settings
[user@burro ~]$ sinter -n 2
salloc: Granted job allocation 127003
salloc: Nodes burro are ready for job
(sinter) [user@burro ~]$
Partitions configuration
In order to have a reasonable balance of batch/interactive jobs, we configure two "partitions" (or "queues").
batch, meant for long heavy-computation jobs, with the following limits:
Max CPUs: number of hardware threads of the node (if hardware threads are disabled in the node, this is the same as the number of cores)
Max time: 2 days (QOS: normal) or 7 days (QOS: long). With Slurm you can specify a Quality of Service (QOS) for each job. We have configured the batch partition with two QOSs: normal (the default QOS), with a maximum job time of 2 days; and long, with a maximum job time of 7 days, but with a greatly reduced priority when compared to the normal QOS. If possible, try to avoid using the long QOS (it is much better if you can cut your job into smaller parts, or if you can prepare your application to frequently create checkpoints that can be used to restart it from the state saved by a previous run). If really necessary, request this QOS with --qos=long (see the example after this list).
interactive, meant for lighter interactive jobs, with the following limits:
Max CPUs per job: number of hardware threads of the node
Max CPUs per partition: 2 * (number of hardware threads of the node) [via oversubscription]
Max time: 8 hours
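For instance, a minimal batch script requesting the long QOS could look like the sketch below (the job name, core count and the 5-day time limit are just illustrative); see the section Batch jobs for more details on batch scripts. Since the requested time exceeds the 2-day limit of the normal QOS, the --qos=long option is needed here.
#!/bin/bash
#############################
#SBATCH -J long_test
#SBATCH -n 8
#SBATCH -t 5-00:00:00
#SBATCH --qos=long
#############################
./your_job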
At any time you can check the actual configuration of the partitions
with the sinfo
command. For example:
$ sinfo -O SocketCoreThread,Partition,Maxcpuspernode,time
S:C:T PARTITION MAX_CPUS_PER_NODE TIMELIMIT
2:28:2 batch* 112 7-00:00:00
2:28:2 interactive 112 8:00:00
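If you also want to inspect the limits of the configured QOSs (for example the maximum wall time of normal and long), something along these lines should work (a sketch; the available columns depend on the Slurm version):
$ sacctmgr show qos format=Name,Priority,MaxWall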
Note
While having fine-grained control of where each process runs is not necessary for many jobs, those running parallel applications and advanced users might want to read our notes in the section CPU Management.
Important
It might seem obvious, but remember that the "interactive" queue is for "interactive" jobs only!
Acceptable use cases for the interactive queue are, for example: if you need to test whether your code compiles OK with different compilation options; if you are writing code and need to test if it works as you develop it; if you are debugging code; if you need to run a GUI application, etc.
Non-acceptable use cases are running long CPU-intensive jobs that do not require your interaction for a long time (10 minutes or more).
This is so because jobs in these two partitions can overlap, so batch jobs can run alongside interactive ones (this assumes that interactive jobs are not as computation-intensive as batch ones, and allows a controlled oversubscription of the workstation).
In total, up to three times the number of hardware threads could be allocated at the same time (once in the batch partition, and twice in the interactive partition). This works well only if the jobs in the interactive partition are light jobs: GUI app interactions, Python interactive sessions, etc.
Don't abuse the system by submitting long CPU-intensive jobs in the "interactive" queue or the system administrators will show you a yellow (or even a red) card!
Tracking Memory usage
In a shared "burro", memory usage also needs to be tracked, so that we ensure that a job cannot saturate the available RAM (which can negatively impact on other jobs' performance and even bring the whole system down). The way we track memory usage is as follows:
Note
Memory limitation is activated by default in all public and open-project "burro"s, but not in project ones. If you want your "burro" to have this limitation, please let us know.
by default, when submitting a job to Slurm, each allocated CPU will also be allocated a default RAM amount. Since we have to guarantee that a machine at full capacity does not consume more memory than available, and given that up to three times the number of hardware threads could be allocated at any given time (see above), the default RAM allocated per CPU is:
\[(1/3)\, Total\_memory / N_\mathrm{threads}\]
As an example, one of our "burro"s has 32 hardware threads and a total of 256 GB of RAM, so the default RAM allocated per CPU is 2.67 GB, which can be checked with the following command (output is in MB):
$ scontrol show config | grep DefMemPerCPU
DefMemPerCPU = 2667
if more/less memory is required, you can use the --mem-per-cpu or the --mem options to specify the required amount.
For example, if we only need one CPU but we require 60 GB of memory, the default value of 2.67 GB will not be enough, and the job will be killed when this value is exceeded. To avoid that, we can simply submit a job with a batch script similar to the one below (see the section Batch jobs below for more information on batch scripts).
#!/bin/bash
#############################
#SBATCH -J test
#SBATCH -n 1
#SBATCH -t 00:15:00
#SBATCH --mem=60G
#############################
./your_job
Sometimes it is perhaps easier to specify the amount of memory requested per process (for example, in MPI parallel applications), so in that case we could use a batch script similar to:
#!/bin/bash
#############################
#SBATCH -J test
#SBATCH -n 10
#SBATCH -t 00:15:00
#SBATCH --mem-per-cpu=6G
#############################
srun ./your_mpi_job
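If you are unsure how much memory to request, you can check the memory actually used by a running job with the standard sstat command and adjust --mem or --mem-per-cpu in future submissions. A sketch (for a batch job, the statistics are usually attached to the <jobid>.batch step):
$ sstat -j <jobid>.batch --format=JobID,MaxRSS,AveCPU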
Tracking GPU usage
In those "burro"s that have GPUs, their usage is also tracked with Slurm, so that a user can have exclusive access to some or all the GPUs. Slurm is configured in a way that jobs requesting a GPU have a much larger priority than other jobs, but non-GPU jobs can also use the "burro". In the Introduction section you can see the table of currently available "burro"s, and whether they have GPUs.
The way that GPU usage and tracking is integrated with Slurm in a "burro" is as follows:
You will be able to use the GPU cards only from inside a Slurm job, either interactive or batch. As a matter of fact, outside of Slurm it will look like the "burro" has no GPUs (e.g. the nvidia-smi command will report Failed to initialize NVML: Unknown Error).
You request a GPU by adding the Slurm option --gres=gpu (see the sketch below). Once allocated, you will have exclusive access to the GPU for the duration of the job.
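As a sketch, a minimal batch script requesting one GPU could look like the following (the job name, core count, time limit and the your_gpu_application executable are just illustrative):
#!/bin/bash
#############################
#SBATCH -J gpu_test
#SBATCH -n 1
#SBATCH -t 01:00:00
#SBATCH --gres=gpu
#############################
./your_gpu_application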
You can get more information and examples of use of the GPUs at the burros here.
Integration of Slurm with periodic reboots
Due to security considerations, the IAC "burros" need to be rebooted regularly, whenever security patches for any of the Ubuntu packages are published. The exact date of these reboots is not fixed, but we can expect on the order of one or two reboots per month.
For machines without Slurm these reboots can be quite disruptive, since they will kill all running processes (messages are broadcast to all open terminals in each "burro" to be rebooted, but it is quite easy to miss them if you are executing long-running processes).
"Burros" using Slurm nicely integrate the information about periodic reboots so as to make sure that no jobs will be running when the workstation is rebooted. See the "note" below to understand how this feature works.
Note
Whenever a reboot is scheduled, we automatically create a Slurm reservation of 30 minutes around the exact time when the reboot will take place. For example, if a reboot is scheduled for tomorrow at 07:00, a Slurm reservation of 30 minutes will be placed around that time and Slurm will make sure that no jobs can be running during that reservation.
So, if you now submit a 48-hour job, it will not start running before the reboot, because if it were to use the full allocated 48 hours it would clash with tomorrow's reservation. In this case, you can either reduce the time allocation of the job so that it will end before the reservation begins, or you can leave the job in the queue and it will start running after the reboot, whenever Slurm can find a suitable allocation slot for it.
You can see if there are planned reservations by using the command:
scontrol show res
Also, if you submit a job that clashes with a planned reservation,
you will see when using the squeue
command that the job stays in the queue
in the PENDING
(PD
) state, and that the NODELIST (REASON)
column reads
something like (ReqNodeNotAvail, Reserved for maintenance)
.
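If your pending job clashes with such a reservation and you know it will actually finish sooner than the requested time limit, one option (a sketch; the new limit shown is just illustrative) is to reduce its time limit with the standard scontrol command so that it fits before the reboot:
$ scontrol update JobId=<jobid> TimeLimit=12:00:00
Note that regular users can normally only reduce (not increase) the time limit of their own jobs.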
Getting an interactive session
Before the installation of
Slurm, users
ssh
-ed to the "burros" and executed jobs interactively in that
session. As stated above, this will now (by default) give you only the
CPU power equivalent to 3/4 of a core. In order to obtain a more
powerful interactive session you need to make use of the sinter
command, specifying the number of cores requested and the time limit for
the session. The syntax for sinter
is:
sinter -n <n_cores> -t <hh:mm:ss>
For example, if you want to request a 10-core interactive session for
two and a half hours, you could do the following (note that (sinter) is added to
your command prompt to remind you that you are now in a Slurm interactive
session):
[user@burro ~]$ sinter -n 10 -t 02:30:00
salloc: Granted job allocation 127221
salloc: Nodes burro are ready for job
(sinter) [user@burro ~]$
Note
Do not ask for more cores than you need, as other people may be waiting to use the "burro".
After the requested time has passed, the session will expire and sinter must be called again. The shorter the allocation time, the more likely that your job can start right away.
If you finish working, type exit to quit the interactive session, so that the cores become available for other users.
For advanced users, note that sinter is just a wrapper around the standard salloc Slurm command.
MPI Jobs
If you want to test MPI jobs while in the interactive session, you just
need to use the srun command, specifying the number of tasks
you want to use (less than or equal to the number of cores you requested for
the interactive session). So, for example, after requesting the 10-core
interactive session as per the example above, you could run a 4-core
MPI job simply by running:
(sinter) [user@burro ~]$ srun -n 4 ./your_mpi_application
Note
Not specifying the number of tasks and just executing srun ./your_mpi_application is equivalent to using all the cores requested for the interactive session.
Warning
If your application has been compiled with Intel MPI, you should add
--mpi=pmi2
to your srun
command (for example, srun -n 4 --mpi=pmi2
./your_mpi_application
).
Tip
Running a Python script using MPI would simply require srun python script.py.
Running IPython in parallel (using MPI) in an
interactive session is doable (see IPython Parallel), but certainly
not straightforward. If you just execute ipython
in an interactive
session and then try to load MPI with mpi4py
(either directly or via
another package that you use) you will receive an error message that says
The application appears to have been direct launched using "srun", but OMPI
was not built with SLURM's PMI support and therefore cannot execute
. If you
really need this, please do get in touch with us and we will try to get it
working.
For debugging purposes, you can launch IPython in an interactive session with
srun -n 1 --pty ipython
. This will give you only one MPI rank (the same
as if you were to launch IPython without Slurm), but your code will be able
to load the MPI libraries without errors and if it uses other
multi-processing libraries it will be able to use the number of cores
requested for the interactive session.
Batch jobs
For general information about Slurm and how to submit batch jobs, please refer to the IAC-wide guide to Slurm.
Below we provide some information applicable only to the "burros".
Getting job notifications by e-mail
If you want to receive notification by e-mail of when your job begins and ends, add the following to the "batch script" (see Batch jobs).
#SBATCH --mail-type=BEGIN,END
#SBATCH --mail-user=<e-mail>
Warning
Please note that mails will only be sent to IAC mail addresses.
As an example, when your job ends, you will receive an e-mail containing information like the following:
######################## JOB EFFICIENCY REPORT ########################
# Job ID: 127267
# Cluster: <clustername>
# User/Group: <username>/<groupname>
# State: COMPLETED (exit code 0)
# Cores: 1
# CPU Utilized: 00:00:11
# CPU Efficiency: 84.62% of 00:00:13 core-walltime
# Wall-clock time: 00:00:13
# Memory Utilized: 1.30 MB
#######################################################################
MPI Jobs
Slurm and OpenMPI in the "burros" are nicely integrated, so when
submitting a batch MPI job you simply have to use the command srun
,
without worrying about the number of cores needed, which is taken
directly from the Slurm allocation, or the Process Management Interface
(PMI). As a basic example, your submission script could be simply as
follows:
#!/bin/bash
#############################
#SBATCH -J your_job
#SBATCH -n 4
#SBATCH -t 00:01:00
#SBATCH -o %j.out
#SBATCH -e %j.err
#SBATCH -D .
#############################
srun ./your_mpi_application
Warning
If your application has been compiled with Intel MPI, you should add
--mpi=pmi2
to your srun
command (for example, srun -n 4 --mpi=pmi2
./your_mpi_application
).
Tip
If you want to run an MPI job written in Python (for example, using mpi4py) you would simply need a
command like srun python script.py
.
Job efficiency
Tip
You can check the real-time memory and CPU efficiency of your jobs directly on this webpage.
It is in your own interest, and in that of all other users, that all jobs running within Slurm are as efficient as possible, as this means that more science can be done with the limited resources available. You should take into account three different efficiencies: time, CPU and memory efficiency. For those "burro"s with GPUs, you should also consider GPU efficiency; we will include information about this shortly.
Time efficiency. This is the ratio of the actual wallclock time your job took until completion to the requested time. So, for example, if you set a time limit of one day for a job that only takes twelve hours to complete, the time efficiency of this job will be 50%.
CPU efficiency. This represents how efficiently allocated CPUs were being used while a job was running. For example, a job that was allocated two CPUs runs for ten hours, having a 100% load in both CPUs for the first five hours but one of the CPUs being completely idle for the remaining five hours, will have a CPU efficiency of 75%.
Memory efficiency. This represents the ratio of the peak memory consumption of your job to the allocated memory. Thus, for example, if a job's RAM consumption peaks at some point of the execution at 45GB when the allocated memory was 120GB, its Memory efficiency will be 37.5%.
Measuring your job's efficiencies
In order to help you make your jobs more efficient, you can use two tools that we have deployed in the "burros".
promseff for a single running job
After your job has finished (either successfully completed, cancelled or timed out), an efficiency report will be appended to the standard output of your job.
You can get the same report while the job is still running using the
promseff
utility.
[user@burro:~]$ promseff -j 132503
#################~ JOB EFFICIENCY REPORT ~##################
# JobID: 132503 #
# Cluster: burro #
# Cores: 16 #
# Wall-clock time: 17.86 hours #
# User-CPU Utilized: 170.77 CPU-hours #
# User-CPU Efficiency: 59.74 % #
# System-CPU Utilized: 1.56 CPU-hours #
# System-CPU Load: 0.54 % #
# Memory Utilized: 7.06 GB #
# Memory Efficiency: 2.30 % #
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# If your job has low CPU Efficiency or you have doubts #
# about setting up a job, do not hesitate and contact us: #
# res_support@iac.es #
############################################################
Note
Due to restrictions on access to network-shared resources, the reports will not be generated if the output of the job is in /home or /net. Please use the local disks for your jobs.
rseff for multiple completed jobs
In order to help you make your jobs more efficient, you can use the tool
rseff
installed in the burros (the original name of this tool is
reportseff as found
here). Check all the
options with the command rseff --help
, but below we show an example
where we want to collect the efficiencies of the jobs that ran in the
last day. We can see that some jobs had good CPU efficiency (> 80%), not
too bad memory efficiency (~30-50%), but terrible time efficiency in all
cases. When run in the terminal, the efficiencies will be colour coded
for easily spotting (in red) those efficiencies that you should strive
to improve in future jobs.
[user@burro:~]$ rseff --since d=1 -s CD
JobID State Elapsed TimeEff CPUEff MemEff
128652 COMPLETED 02:58:02 19.8% 60.0% 142.4%
128655 COMPLETED 03:14:01 1.0% 1.4% 28.2%
128656 COMPLETED 00:20:33 0.1% 2.3% 55.2%
128659 COMPLETED 00:01:29 0.2% 80.2% 31.2%
128661 COMPLETED 00:01:30 0.2% 83.8% 31.2%
128662 COMPLETED 00:09:30 5.3% 7.2% 109.0%
128663 COMPLETED 00:07:21 4.1% 7.7% 108.7%
Note
In the above rseff
command example, we are getting the efficiencies only
for COMPLETED jobs (-s CD
). This is due to a limitation of Slurm that
can only reliably measure job statistics for jobs that have finished with a
COMPLETED state. If your job reached a TIMEOUT or it was CANCELLED,
the statistics can be woefully wrong. Use the promseff
utility described
above or check your job output for a correct efficiency report in those
cases.
Usage conditions in Open-project "burros"
The default queue system configuration in open-project "burros" closely mirrors that of public "burros", but with some limitations for jobs of "external" users and some extra configuration settings, in order to guarantee that project members experience minimal disruption.
Usage of open-project "burros" by "external" researchers have the following limitations:
they cannot use the
interactive
partition.if the "burro" has GPU cards, "external" users will be able to use them, but as "preemptible" jobs. This means that if a "project" job requires the GPU, the "external" job will be killed and requeued to start afresh when the GPU becomes available again.
the
batch
partition will not accept jobs longer than 2 days (for "project" members the maximum time is 7 days without the need to specify a QOS)the maximum memory usage of all "external" jobs is limited to 50% of the total memory of the machine.
the available disk space is divided in directories
/scratch
and/project
(depending on the hardware available there might be other directories with the same prefix, i.e./scratch1
, etc.). Project members will have access to all directories, while "external" users will have access only to the/scratch
directory, which is configured with quotas, so that external users only make use of a limited portion of the disk (by default 4% of the whole disk, with a minimum of 2TB, but this is configurable by the project itself).
On top of this, the queue system is configured with some extra settings that guarantee that "project" jobs will have much higher priority than "external" jobs. We illustrate this with the three following scenarios:
If the machine is currently completely busy with "project" jobs, any other jobs (either "project" or "external") submitted to Slurm will get in the queue, but "project" jobs, due to their higher priority, will jump to the beginning of the queue even if they were submitted at a later time, guaranteeing that they will be executed first when the resources become available.
If the machine is currently busy executing an "external" job and a "project" job is then submitted, this job will not wait in the queue until the "external" job finishes. Instead, it will start execution concurrently with the "external" job (as long as the total memory requested is not larger than the total memory of the machine), sharing the resources. This will obviously imply some performance reduction, but the "project" job will be able to progress without waiting for the "external" job to finish, which reduces the disruption to the project jobs.
The way that this is implemented means that both jobs will alternately change between the "S" (Suspended) and the "R" (Running) state every 30 seconds, in such a way that each job will use approximately 50% of the CPU time.
On the contrary, if the machine is currently busy executing a "project" job and an "external" job is then submitted, the latter will not share resources with the "project" job; instead it will wait in the queue until resources are available.
GPUs
Warning
Please remember when using machines with GPUs that jobs that request their use will be given a much higher priority than non-GPU jobs (which will drop to the bottom of the queue if there are GPU jobs waiting to be executed). Thus, in most cases you are advised to run non-GPU jobs in machines without GPUs.
GPUs are becoming ubiquitous in research centres due to their high compute capability and low power consumption. They especially excel at the vector and matrix operations that are commonplace in many applications, such as machine learning and image processing.
Some of the burros at the IAC have GPUs available. In this documentation we detail how a GPU-ready code can be used in such machines.
Running and compiling
Note
In order to check the utilization of all the GPUs available in burros with Slurm, you can check the page https://pasa.ll.iac.es/grafana/gpus.
From the terminal, however, in burros that have a queue system you will only get access to the GPUs once you have one or more GPUs allocated (via either an interactive session or a batch job). Until then, you will not have any kind of access to them (not even to see their load). After starting a job that uses a GPU, it is useful to monitor its load; sharing your allocation with another terminal is very useful for this (see the sketch below).
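One way of doing this (a sketch; the --overlap option requires a reasonably recent Slurm version, and nvtop could be replaced by nvidia-smi) is to attach a second process to your existing allocation from another terminal:
[user@burro ~]$ srun --jobid=<jobid> --overlap --pty nvtop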
Check here how to access the GPUs with Slurm, but in
short, just adding --gres=gpu
to your submit script will grant you one GPU
for the duration of the job.
To check the GPUs utilization, you can use nvtop
or the nvidia-smi
tool. For
instance:
[...@burro] nvidia-smi -q -d UTILIZATION
...
Attached GPUs : 2
GPU 00000000:02:00.0
Utilization # Busy GPU
Gpu : 97 %
Memory : 78 %
...
GPU 00000000:83:00.0
Utilization # Idle GPU
Gpu : 0 %
Memory : 0 %
...
Machine learning applications
We provide a ready-to-use environment for machine learning applications (mainly meant for systems with GPUs, but also available in machines without a graphics card). It is created with conda and can be loaded using Environment modules:
module load ml_py/3.10
As the name suggests, it is based on Python 3.10 and provides the following
versions of common machine learning modules (these versions are installed in
machines with a recent CUDA driver; in older machines the modules might be
different, which you can check with the command pip list):
Module     | Version
-----------|----------
cudnn      | 9.3.0.75
cupy       | 13.0.0
h5py       | 3.12.1
matplotlib | 3.8.4
numpy      | 1.22.1
pandas     | 2.0.3
pycuda     | 2024.1
scipy      | 1.11.4
sympy      | 1.13.3
tensorflow | 2.13.0
torch      | 2.1.2
Note
Use this environment only for Python work. To use other system applications or tools you are advised to work in another session without loading this environment, in order to avoid possible incompatibilities due to different system and environment library versions.
Note
If you need other packages, we can install them in this environment if they are
sufficiently general. If not, you can also install packages in your personal area
with the command pip install --user <package>. The environment is configured
so that your personal packages will be installed in your ~/.local directory
(see the environment variable PYTHONUSERBASE). The HOME directory has a small
quota, so this might be sufficient only for small packages, but if disk space
becomes an issue, you might want to change the location for pip user packages
(either by modifying the PYTHONUSERBASE variable or by making its value a symbolic
link to another location).
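For example, to place your personal packages on a local disk instead of your home directory, something like the following should work (a sketch; the /scratch/$USER/python_user path is just illustrative and may differ between "burro"s):
$ export PYTHONUSERBASE=/scratch/$USER/python_user
$ pip install --user <package>
Remember to set PYTHONUSERBASE consistently (e.g. in your shell startup file) so that Python finds these packages in later sessions.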
CUDA programs
In the "burro"s you can use the NVIDIA HPC suite via the modules tool to compile GPU-ready code. These compilers allow you to generate code compatible with different CUDA Toolkit versions. If the compilation version doesn't match the version at execution, you can have runtime errors.
If you are not familiar with this procedure and just want to compile and run in the same "burro", the easiest option is to perform both the compilation and the execution inside a Slurm GPU allocation.
As an example, we show below how to run GPU sample code from the NVIDIA samples in an interactive session:
[user@burro]$ sinter -n 1 -t 00:10:00 --gres=gpu
(sinter) [user@burro]$ module load nvhpc
(sinter) [user@burro]$ nvcc matrixMul.cu -I../../../Common/ -arch=sm_35 -o matrixMul
(sinter) [user@burro]$ ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Kepler" with compute capability 3.5
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 264.67 GFlop/s, Time= 0.495 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
Note
The architecture may need to be specified if there are runtime errors.
In the previous example, the GPU has a compute capability of 3.5 (thus -arch=sm_35 is needed).
You can check the compute capability of the different NVIDIA cards on their website.
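With recent drivers you can also query the compute capability directly from inside a GPU allocation (a sketch; the compute_cap query field is only available in newer nvidia-smi versions):
(sinter) [user@burro ~]$ nvidia-smi --query-gpu=name,compute_cap --format=csv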
Acknowledging the use of the "Burros" in your work
Important
It is important that you acknowledge the use of the IAC Burros. This will help us improve the visibility of this High Performance facility and ensure that it remains available at the IAC for the foreseeable future.
Publications
Please acknowledge the use of the IAC "Burros" in any publication of your work where you have used them extensively (and we would be grateful if you could send us the details of the published paper). Although there is no standard acknowledgment format, we suggest the following:
The author(s) wish to acknowledge the contribution of the IAC High-Performance
Computing support team and hardware facilities to the results of this research.
"Informe anual"
The use of the "Burros" should also be acknowledged when filling the "Informe Anual" of the projects. When introducing a refereed publication in the section "Producción Científica", add as a used resource the following: "Supercomputing: Others".
Further information and support
If you need help or are having any kind of issue related to the use of the "Burros", the SIE gives direct support to IAC users.
To stay informed about updates to the "Burros", tips, etc., or to ask any questions regarding their use, please use the #computing/burros channel in IAC-Zulip.