Supercomputing is a general term that encompasses
any high-speed computational process, and its definition changes
as new computing methods are developed. There is a large bibliography
on this topic, but on this page we focus on the advantages that
IAC researchers can obtain by using Supercomputing. If you
are interested in the theoretical aspects,
this
Wikipedia article about Supercomputing gives a good overview
and many references and links to further information.
The main reason to use Supercomputing is to get your
computational results in less time.
That time could be reduced by a factor of 1.5, 2, 5, 10, 100, 1000, ...;
the limit depends on the restrictions of your problem and program,
the Supercomputing techniques you apply and the resources available
to compute it. Even if time is not a limiting factor in your computations,
Supercomputing may let you work on much bigger problems
than you previously could, in the same amount of time.
We describe here the Supercomputing resources available
at IAC, with links to more information about them. On March 6th, 2018,
we gave a presentation about the Supercomputing resources at IAC; you might
want to take a look at it
(sorry, link only available from IAC's internal network). If you are thinking about using Supercomputing at IAC and need any help, please
contact us ().
We can help you choose the best option depending on your problem,
compile your applications and transfer your data to these environments,
prepare your submit files and perform the submissions, etc.
Teide-HPC and LaPalma Supercomputers (Parallel Computing)
Parallel Computing is the simultaneous execution of the same
task (split up and specially adapted) on multiple processors in
order to obtain results faster (see also: Parallel
Computing on Wikipedia). If your problem requires a huge number
of calculations, but some of those operations are independent
and could be performed at the same time, then you should consider using Parallel
Computing to get your results in less time. Algorithms with large loops
whose iterations have no (or few) dependencies among them (like simulations of galaxies)
are good candidates for parallelization.
Once you have your parallel code, a Supercomputer is needed to run it.
Researchers at IAC have access to two Supercomputers, Teide-HPC and LaPalma:
Teide-HPC
(Teide High Performance Computing) is a supercomputer located in the Instituto Tecnológico de Energías
Renovables S.A.
(ITER). It is the second most powerful supercomputer in Spain
and appeared in the 169th position (June 2014)
of the Top500 list of the most powerful computers in the world.
It is composed of 1,100 Fujitsu servers with a total of 17,800 computing cores
(featuring Intel Sandy Bridge processors, which offer not only high performance
but also great energy efficiency), 36 TB of memory, a high-performance network and a
NetApp parallel storage system. According to the Top500,
it has a theoretical peak performance
of 340.8 TFLOPS, with a maximal achieved LINPACK performance of 274.0 TFLOPS.
IAC users need an account to connect and run their programs on Teide-HPC.
Please send an email to support (res_support@iac.es) to get this account and
also to resolve any issue related to Teide-HPC.
We have also prepared documentation about this machine for IAC users; it is available
at SIEwiki on vesta
(internal access only).
LaPalma, in its third version LaPalma3, belongs to the IAC
and is one of thirteen nodes located on Spanish territory
that are linked together to form the Spanish Supercomputing Network (RES).
The LaPalma node was previously part of MareNostrum,
one of the most powerful computers in Europe.
In its present status (March 2018), LaPalma has a peak of 83.85 TFLOPS, with 4,032
Intel Xeon E5-2670 cores and 2 GB of RAM per core. The total disk
space is 346 TB, and the Lustre parallel filesystem
is available. LaPalma has a fast InfiniBand network (40 Gb/s) for internal communication
(both computation and storage).
A large set of scientific programs and
libraries is already installed, and it is possible to install new software
packages on demand, if they are compatible and widely used.
50% of LaPalma's computation time is assigned to the RES, and the other
50% (4,554,547 hours per four-month period) is available
to IAC researchers, who can apply for it at any time, although it is advisable
to apply during the official periods to get higher priority.
Please visit the following links for more information (some of them are only available from the internal IAC network):
LaPalma User's Guide:
complete information about how to use this Supercomputer.
LaPalma Computing Time: where you
can find out how to get an account on LaPalma (depending on whether you are an IAC researcher or not)
and the time used per period.
Support: If you have further questions, or run into any issue when using LaPalma,
please send an email to res_support@iac.es.
HTCondor (Distributed Computing)
Distributed computing is the process of running a single
computational task on more than one computer (see also: Distributed Computing on Wikipedia). For instance,
suppose we need to reduce some data using an application we have
developed. We have a very large number of data sets to reduce, and
the processing time is several hours per set. If we compute all sets
one by one on our machine, it may take several weeks to get all the
results... Now imagine we had hundreds of machines where we
could run our program on different data sets at the same
time: we could get all the results in just a few hours! The HTCondor
software makes this possible and does all the work for you:
HTCondor copies the input files to the remote machines, executes
your program there with the different data, and brings the results back to
your machine when they are complete.
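As an illustrative sketch of this scenario (all file and program names here are hypothetical, not an IAC-provided setup), an HTCondor submit description file that queues one job per data set could look like this:

```
# reduce.sub -- hypothetical HTCondor submit description file
universe                = vanilla
executable              = reduce
arguments               = dataset_$(Process).fits
transfer_input_files    = dataset_$(Process).fits
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = reduce_$(Process).out
error                   = reduce_$(Process).err
log                     = reduce.log
# Queue 100 jobs; $(Process) expands to 0..99, one per data set.
queue 100
```

Submitting it with condor_submit reduce.sub would then let HTCondor run the 100 reductions on whatever idle machines are available.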
At the IAC, the HTCondor system
– a High-Throughput Computing (HTC)
system –
is installed on several "burros" and desktop machines, allowing us to
run our applications (shell scripts, astronomical software, our own
programs written in C, Fortran, Python, IDL, Matlab, etc.) on other
computers when they are not being used by their owners. At this time
(Apr. 2023), the HTCondor pool is made up of more than 1,000 cores, which are
ready to execute other users' programs when they are idle, i.e. you
could get the equivalent of one month of serial execution in well
under one hour! (This is the theoretical maximum, since not all HTCondor
cores are always idle; a more realistic estimate is an average
of around 500 idle cores.)
Please visit the following links for more information:
HTCondor@IAC:
documentation based on IAC users' experience with HTCondor, which we continuously
update: useful commands, FAQs, job submission examples, etc.
Support: Send us an email (sinfin@iac.es) and we will try to help you as much as possible.
"Burros" (High Performance Linux PCs)
Users who need to run CPU- or memory-intensive jobs that
are unsuitable for their own PCs or for the other IAC Supercomputing resources
(like LaPalma, Teide-HPC, the HTCondor system, etc.) can access
any of several High Performance Linux PCs. These machines are also suitable for
developing, debugging and testing parallel applications before submitting them
to other Supercomputers. They are open to any user and do not require
advance reservation, but please follow these simple rules:
Before running any job on them, please check their load (with
uptime or htop):
if it is higher than the number of cores, wait until it goes down before launching your application.
Also check that the load does not exceed the number of cores after your program starts.
If you are testing your parallel codes, check how many cores are being used and
do not take up all of them.
These machines should be used only when developing or testing your parallel programs:
if you need to run a parallel application for hours or days on a large number of cores,
there are better alternatives, such as Teide-HPC or LaPalma (please contact us).
Some of these machines have a huge disk space (about 20 TB).
Don't abuse it! There are no backups of your data on any of these machines, so don't use them as a storage system.
Do not forget to delete your data, or move it elsewhere, once your executions are done, to make room for other users.
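The load check in the first rule can also be scripted. This is a small sketch using only Python's standard library (the threshold policy, comparing the 1-minute load average against the core count, is our assumption, not an official IAC rule):

```python
import os

def machine_is_free(margin=1.0):
    """Return True when the 1-minute load average is below the number
    of cores (times a safety margin), i.e. it looks safe to start a job."""
    load_1min, _, _ = os.getloadavg()  # the same figures uptime reports
    cores = os.cpu_count()
    return load_1min < cores * margin

if machine_is_free():
    print("Load is below the core count: OK to launch your job.")
else:
    print("Machine is busy: wait until the load goes down.")
```

The same numbers can of course be read by eye from uptime or htop, as the rules above suggest.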
Some of these machines are listed here:
Sorry, this list is only available when connecting from the IAC's internal network.
Contact SIE for further details...
Other HPC Resources
There are more resources available in other institutions that can be accessed by researchers at IAC:
Universidad de La Laguna (ULL): Thanks to the close relationship
between the IAC and the ULL, our researchers can have access to some of the
ULL's resources:
GCAP: The "Grupo de Computación de Altas Prestaciones
(GCAP)" of the Universidad de La Laguna
(http://cap.pcg.ull.es)
is open to collaborations on HPC topics. They have several GPUs that
could be used, and other resources.
Red Española de Supercomputación (RES): The
RES
is an alliance of 8 organizations and their Supercomputers, distributed throughout
Spain, which have worked together since 2006 to offer a High Performance Computing
service to the scientific community. The IAC belongs to the RES and LaPalma is
one of the available Supercomputers, but we should also mention others that are
among the most powerful in Spain and even Europe, like MareNostrum III.
You can apply
to get access
to these Supercomputers, usually with 3 deadlines every year.
Partnership for Advanced Computing in Europe (PRACE): At a higher
level than RES, you can find
PRACE.
It consists of 25 member countries whose representative organizations create
a pan-European Supercomputing infrastructure, providing access to world class
computing and data management resources and services for large-scale scientific
and engineering applications at the highest performance level.
There are some individual institutions that own Supercomputers and
it might be possible to get access to them under certain conditions. Since
those conditions change from time to time, it is not easy to list all these
institutions, but some examples are:
ITER (Tenerife),
CIEMAT (Madrid),
CESGA (Galicia),
EPCC (Edinburgh),
etc.
Most of the Supercomputing Centres and Networks also offer training
on HPC topics, with many courses, education and training programs, seminars,
schools, PhD opportunities, etc. There are also projects and programmes
that support the mobility of researchers, so they can visit HPC Centres for a
period of time to receive training on HPC topics and gain access to the
facilities. Some examples are the
RES
Education and Training Programs,
PRACE Training Events, the
HPC-Europa Mobility
Programme, etc.